linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/4] x86: Add vDSO exception fixup for SGX
@ 2018-12-05 23:20 Sean Christopherson
  2018-12-05 23:20 ` [RFC PATCH 1/4] x86/vdso: Add support for exception fixup in vDSO functions Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Sean Christopherson @ 2018-12-05 23:20 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

First things first, this RFC is not intended to address whether or not
vDSO is the appropriate method for supporting SGX enclaves, but rather
its purpose is to hammer out the vDSO implementation *if* we decide it
is the best approach.

Though the code technically stands alone, the intent is to roll it
into the main SGX series once it has been beat on for a bit.  For that
reason there are no dependencies on CONFIG_SGX (which doesn't exist)
or any other bits that one might expect.

The quick and dirty is that for all intents and purposes, the SGX
architecture requires userspace to gracefully handle exceptions.  The
vast majority of enclaves are expected to leverage a library to handle
much of the plumbing.  Unfortunately for us (them?), putting two and
two together means SGX libraries would need to handle most signals on
behalf of the enclave's application, which is far from ideal.

One proposed solution for supporting SGX without requiring signals is
to wrap enclave transitions in a vDSO function so that SGX exceptions
can be intercepted via exception fixup and returned inline to the
caller.  This RFC series adds exception fixup and SGX support to the
vDSO.


Sean Christopherson (4):
  x86/vdso: Add support for exception fixup in vDSO functions
  x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling
  x86/traps: Attempt to fixup exceptions in vDSO before signaling
  x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions

 arch/x86/entry/vdso/Makefile          |   5 +-
 arch/x86/entry/vdso/extable.c         |  37 +++++++++
 arch/x86/entry/vdso/extable.h         |  17 ++++
 arch/x86/entry/vdso/vdso-layout.lds.S |   9 ++-
 arch/x86/entry/vdso/vdso.lds.S        |   1 +
 arch/x86/entry/vdso/vdso2c.h          |  58 ++++++++++++--
 arch/x86/entry/vdso/vsgx_eenter.c     | 108 ++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h           |   5 ++
 arch/x86/kernel/traps.c               |  11 +++
 arch/x86/mm/fault.c                   |   7 ++
 10 files changed, 247 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/entry/vdso/extable.c
 create mode 100644 arch/x86/entry/vdso/extable.h
 create mode 100644 arch/x86/entry/vdso/vsgx_eenter.c

-- 
2.19.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/4] x86/vdso: Add support for exception fixup in vDSO functions
  2018-12-05 23:20 [RFC PATCH 0/4] x86: Add vDSO exception fixup for SGX Sean Christopherson
@ 2018-12-05 23:20 ` Sean Christopherson
  2018-12-05 23:20 ` [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2018-12-05 23:20 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

The basic concept and implementation is very similar to the kernel's
exception fixup mechanism.  The key differences are that the kernel
handler is hardcoded and the fixup entry addresses are relative to
the overall table as opposed to individual entries.

Hardcoding the kernel handler avoids the need to figure out how to
get userspace code to point at a kernel function.  Given that the
expected usage is to propagate information to userspace, dumping all
fault information into registers is likely the desired behavior for
the vast majority of yet-to-be-created functions.  Use registers
DI, SI and DX to communicate fault information, which follows Linux's
ABI for register consumption and hopefully avoids conflict with
hardware features that might leverage the fixup capabilities, e.g.
register usage for SGX instructions was at least partially designed
with calling conventions in mind.

Making fixup addresses relative to the overall table allows the table
to be stripped from the final vDSO image (it's a kernel construct)
without complicating the offset logic, e.g. entry-relative addressing
would also need to account for the table's location relative to the
image.
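
To make the offset logic concrete, here is a hypothetical sketch of the
lookup arithmetic (not part of this series; the struct mirrors the patch
below, but the helper and its values are invented for illustration):

```c
#include <stdint.h>

/* Mirrors the entry layout added by this patch; both fields are offsets
 * relative to the start of __ex_table, not to the entry itself. */
struct vdso_exception_table_entry {
	int insn, fixup;
};

/*
 * Table-relative addressing means resolving a fixup target needs only
 * the image's load address and the table's offset within the image
 * (extable_base); where each entry sat in the stripped image is
 * irrelevant, which is what allows the table itself to be stripped.
 */
static uintptr_t vdso_fixup_target(uintptr_t vdso_base,
				   uintptr_t extable_base,
				   const struct vdso_exception_table_entry *e)
{
	return vdso_base + extable_base + (uintptr_t)e->fixup;
}
```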

Regarding stripping the table, modify vdso2c to extract the table from
the raw, a.k.a. unstripped, data and dump it as a standalone byte array
in the resulting .c file.  The original base of the table, its length
and a pointer to the byte array are captured in struct vdso_image.
Alternatively, the table could be dumped directly into the struct,
but because the number of entries can vary per image, that would
require either hardcoding a max sized table into the struct definition
or defining the table as a flexible length array.  The flexible length
array approach has zero benefits, e.g. the base/size are still needed,
and prevents reusing the extraction code, while hardcoding the max size
adds ongoing maintenance just to avoid exporting the explicit size.
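
The emission side of that flow can be sketched in standalone C
(illustrative only; the real helpers in the patch below are
parameterized by BITSFUNC for the 32/64-bit images and write into the
generated .c file):

```c
#include <stdio.h>
#include <string.h>

/* Roughly what vdso2c's copy()/extract() helpers do: dump a binary
 * section as a standalone C byte array, ten bytes per line. */
static void emit_blob(FILE *out, const char *name,
		      const unsigned char *data, size_t len)
{
	size_t i;

	fprintf(out, "static const unsigned char %s[%zu] = {", name, len);
	for (i = 0; i < len; i++) {
		if (i % 10 == 0)
			fprintf(out, "\n\t");
		fprintf(out, "0x%02X, ", data[i]);
	}
	fprintf(out, "\n};\n");
}
```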

The immediate use case is for Intel Software Guard Extensions (SGX).
SGX introduces a new CPL3-only "enclave" mode that runs as a sort of
black box shared object that is hosted by an untrusted "normal" CPL3
process.

Entering an enclave can only be done through SGX-specific instructions,
EENTER and ERESUME, and is a non-trivial process.  Because of the
complexity of transitioning to/from an enclave, the vast majority of
enclaves are expected to utilize a library to handle the actual
transitions.  This is roughly analogous to how e.g. libc implementations
are used by most applications.

Another crucial characteristic of SGX enclaves is that they can generate
exceptions as part of their normal (at least as "normal" as SGX can be)
operation that need to be handled *in* the enclave and/or are unique
to SGX.

And because they are essentially fancy shared objects, a process can
host any number of enclaves, each of which can execute multiple threads
simultaneously.

Putting everything together, userspace enclaves will utilize a library
that must be prepared to handle any and (almost) all exceptions any time
at least one thread may be executing in an enclave.  Leveraging signals
to handle the enclave exceptions is unpleasant, to put it mildly, e.g.
the SGX library must constantly (un)register its signal handler based
on whether or not at least one thread is executing in an enclave, and
filter and forward exceptions that aren't related to its enclaves.  This
becomes particularly nasty when using multiple levels of libraries that
register signal handlers, e.g. running an enclave via cgo inside of the
Go runtime.

Enabling exception fixup in vDSO allows the kernel to provide a vDSO
function that wraps the low-level transitions to/from the enclave, i.e.
the EENTER and ERESUME instructions.  The vDSO function can intercept
exceptions that would otherwise generate a signal and return the fault
information directly to its caller, thus avoiding the need to juggle
signal handlers.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/entry/vdso/Makefile          |  4 +-
 arch/x86/entry/vdso/extable.c         | 37 +++++++++++++++++
 arch/x86/entry/vdso/extable.h         | 17 ++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  9 ++++-
 arch/x86/entry/vdso/vdso2c.h          | 58 +++++++++++++++++++++++----
 arch/x86/include/asm/vdso.h           |  5 +++
 6 files changed, 119 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/entry/vdso/extable.c
 create mode 100644 arch/x86/entry/vdso/extable.h

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..eb543ee1bcec 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -20,7 +20,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
 
 # files to link into kernel
-obj-y				+= vma.o
+obj-y				+= vma.o extable.o
 OBJECT_FILES_NON_STANDARD_vma.o	:= n
 
 # vDSO images to build
@@ -115,7 +115,7 @@ $(obj)/%-x32.o: $(obj)/%.o FORCE
 
 targets += vdsox32.lds $(vobjx32s-y)
 
-$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: OBJCOPYFLAGS := -S --remove-section __ex_table
 $(obj)/%.so: $(obj)/%.so.dbg
 	$(call if_changed,objcopy)
 
diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
new file mode 100644
index 000000000000..49284d560d36
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <asm/current.h>
+#include <asm/vdso.h>
+
+struct vdso_exception_table_entry {
+	int insn, fixup;
+};
+
+bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+			  unsigned long error_code, unsigned long fault_addr)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	const struct vdso_exception_table_entry *extable;
+	unsigned int nr_entries, i;
+	unsigned long base;
+
+	if (!current->mm->context.vdso)
+		return false;
+
+	base = (unsigned long)current->mm->context.vdso + image->extable_base;
+	nr_entries = image->extable_len / (sizeof(*extable));
+	extable = image->extable;
+
+	for (i = 0; i < nr_entries; i++) {
+		if (regs->ip == base + extable[i].insn) {
+			regs->ip = base + extable[i].fixup;
+			regs->di = trapnr;
+			regs->si = error_code;
+			regs->dx = fault_addr;
+			return true;
+		}
+	}
+
+	return false;
+}
diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
new file mode 100644
index 000000000000..0f4ef8f34f2b
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_EXTABLE_H
+#define __VDSO_EXTABLE_H
+
+/*
+ * Inject exception fixup for vDSO code.  Unlike normal exception fixup,
+ * vDSO uses a dedicated handler and the addresses are relative to the overall
+ * exception table, not each individual entry.
+ */
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
+	".pushsection __ex_table,\"a\"\n"	\
+	".balign 4\n"				\
+	".long (" #from ") - __ex_table\n"	\
+	".long (" #to ") - __ex_table\n"	\
+	".popsection\n"
+
+#endif /* __VDSO_EXTABLE_H */
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index acfd5ba7d943..f1b228e8e599 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -85,7 +85,12 @@ SECTIONS
 	 * stuff that isn't used at runtime in between.
 	 */
 
-	.text		: { *(.text*) }			:text	=0x90909090,
+	.text		: {
+		*(.text*)
+		*(.fixup)
+	}						:text	=0x90909090,
+
+
 
 	/*
 	 * At the end so that eu-elflint stays happy when vdso2c strips
@@ -95,6 +100,8 @@ SECTIONS
 	.altinstructions	: { *(.altinstructions) }	:text
 	.altinstr_replacement	: { *(.altinstr_replacement) }	:text
 
+	__ex_table		: { *(__ex_table) }		:text
+
 	/DISCARD/ : {
 		*(.discard)
 		*(.discard.*)
diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index fa847a620f40..eca2f808bec3 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -5,6 +5,41 @@
  * are built for 32-bit userspace.
  */
 
+static void BITSFUNC(copy)(FILE *outfile, const unsigned char *data, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++) {
+		if (i % 10 == 0)
+			fprintf(outfile, "\n\t");
+		fprintf(outfile, "0x%02X, ", (int)(data)[i]);
+	}
+}
+
+
+/*
+ * Extract a section from the input data into a standalone blob.  Used to
+ * capture kernel-only data that needs to persist indefinitely, e.g. the
+ * exception fixup tables, but only in the kernel, i.e. the section can
+ * be stripped from the final vDSO image.
+ */
+static void BITSFUNC(extract)(const unsigned char *data, size_t data_len,
+			      FILE *outfile, ELF(Shdr) *sec, const char *name)
+{
+	unsigned long offset;
+	size_t len;
+
+	offset = (unsigned long)GET_LE(&sec->sh_offset);
+	len = (size_t)GET_LE(&sec->sh_size);
+
+	if (offset + len > data_len)
+		fail("section to extract overruns input data");
+
+	fprintf(outfile, "static const unsigned char %s[%lu] = {", name, len);
+	BITSFUNC(copy)(outfile, data + offset, len);
+	fprintf(outfile, "\n};\n\n");
+}
+
 static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 			 void *stripped_addr, size_t stripped_len,
 			 FILE *outfile, const char *name)
@@ -14,9 +49,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	unsigned long mapping_size;
 	ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
 	int i;
-	unsigned long j;
 	ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
-		*alt_sec = NULL;
+		*alt_sec = NULL, *extable_sec = NULL;
 	ELF(Dyn) *dyn = 0, *dyn_end = 0;
 	const char *secstrings;
 	INT_BITS syms[NSYMS] = {};
@@ -78,6 +112,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 		if (!strcmp(secstrings + GET_LE(&sh->sh_name),
 			    ".altinstructions"))
 			alt_sec = sh;
+		if (!strcmp(secstrings + GET_LE(&sh->sh_name), "__ex_table"))
+			extable_sec = sh;
 	}
 
 	if (!symtab_hdr)
@@ -149,13 +185,11 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	fprintf(outfile,
 		"static unsigned char raw_data[%lu] __ro_after_init __aligned(PAGE_SIZE) = {",
 		mapping_size);
-	for (j = 0; j < stripped_len; j++) {
-		if (j % 10 == 0)
-			fprintf(outfile, "\n\t");
-		fprintf(outfile, "0x%02X, ",
-			(int)((unsigned char *)stripped_addr)[j]);
-	}
+	BITSFUNC(copy)(outfile, stripped_addr, stripped_len);
 	fprintf(outfile, "\n};\n\n");
+	if (extable_sec)
+		BITSFUNC(extract)(raw_addr, raw_len, outfile,
+				  extable_sec, "extable");
 
 	fprintf(outfile, "const struct vdso_image %s = {\n", name);
 	fprintf(outfile, "\t.data = raw_data,\n");
@@ -166,6 +200,14 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 		fprintf(outfile, "\t.alt_len = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_size));
 	}
+	if (extable_sec) {
+		fprintf(outfile, "\t.extable_base = %lu,\n",
+			(unsigned long)GET_LE(&extable_sec->sh_offset));
+		fprintf(outfile, "\t.extable_len = %lu,\n",
+			(unsigned long)GET_LE(&extable_sec->sh_size));
+		fprintf(outfile, "\t.extable = extable,\n");
+	}
+
 	for (i = 0; i < NSYMS; i++) {
 		if (required_syms[i].export && syms[i])
 			fprintf(outfile, "\t.sym_%s = %" PRIi64 ",\n",
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 27566e57e87d..1c8a6a8f7b59 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -15,6 +15,8 @@ struct vdso_image {
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
 	unsigned long alt, alt_len;
+	unsigned long extable_base, extable_len;
+	const void *extable;
 
 	long sym_vvar_start;  /* Negative offset to the vvar area */
 
@@ -45,6 +47,9 @@ extern void __init init_vdso_image(const struct vdso_image *image);
 
 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
 
+extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+				 unsigned long error_code,
+				 unsigned long fault_addr);
 #endif /* __ASSEMBLER__ */
 
 #endif /* _ASM_X86_VDSO_H */
-- 
2.19.2



* [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling
  2018-12-05 23:20 [RFC PATCH 0/4] x86: Add vDSO exception fixup for SGX Sean Christopherson
  2018-12-05 23:20 ` [RFC PATCH 1/4] x86/vdso: Add support for exception fixup in vDSO functions Sean Christopherson
@ 2018-12-05 23:20 ` Sean Christopherson
  2018-12-06 18:17   ` Dave Hansen
  2018-12-05 23:20 ` [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions " Sean Christopherson
  2018-12-05 23:20 ` [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions Sean Christopherson
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2018-12-05 23:20 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

Call fixup_vdso_exception() in the SIGSEGV and SIGBUS paths to attempt
to fixup page faults in vDSO.  This allows vDSO functions to utilize
exception fixup to report unhandled page faults directly to userspace
for specific flows in lieu of generating a signal.

In the SIGSEGV flow, make sure to call fixup_vdso_exception() after
the error code has been sanitized.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/mm/fault.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..ff1cb10858d5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
 #include <asm/efi.h>			/* efi_recover_from_page_fault()*/
 #include <asm/desc.h>			/* store_idt(), ...		*/
+#include <asm/vdso.h>			/* fixup_vdso_exception()	*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		if (address >= TASK_SIZE_MAX)
 			error_code |= X86_PF_PROT;
 
+		if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+			return;
+
 		if (likely(show_unhandled_signals))
 			show_signal_msg(regs, error_code, address, tsk);
 
@@ -1045,6 +1049,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	if (is_prefetch(regs, error_code, address))
 		return;
 
+	if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+		return;
+
 	set_signal_archinfo(address, error_code);
 
 #ifdef CONFIG_MEMORY_FAILURE
-- 
2.19.2



* [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions in vDSO before signaling
  2018-12-05 23:20 [RFC PATCH 0/4] x86: Add vDSO exception fixup for SGX Sean Christopherson
  2018-12-05 23:20 ` [RFC PATCH 1/4] x86/vdso: Add support for exception fixup in vDSO functions Sean Christopherson
  2018-12-05 23:20 ` [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling Sean Christopherson
@ 2018-12-05 23:20 ` Sean Christopherson
  2018-12-06 18:22   ` Dave Hansen
  2018-12-05 23:20 ` [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions Sean Christopherson
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2018-12-05 23:20 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

Call fixup_vdso_exception() to attempt to fixup exceptions in vDSO code
before generating a signal for userspace faults.  This allows vDSO
functions to utilize exception fixup to report unhandled exceptions
directly to userspace for specific flows in lieu of generating a signal.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/traps.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..106a6cc20ad6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include <asm/mpx.h>
 #include <asm/vm86.h>
 #include <asm/umip.h>
+#include <asm/vdso.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -223,6 +224,10 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
 	tsk->thread.error_code = error_code;
 	tsk->thread.trap_nr = trapnr;
 
+	if (user_mode(regs) &&
+	    fixup_vdso_exception(regs, trapnr, error_code, 0))
+		return 0;
+
 	return -1;
 }
 
@@ -563,6 +568,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	tsk->thread.error_code = error_code;
 	tsk->thread.trap_nr = X86_TRAP_GP;
 
+	if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
+		return;
+
 	show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
 
 	force_sig(SIGSEGV, tsk);
@@ -854,6 +862,9 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 	if (!si_code)
 		return;
 
+	if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+		return;
+
 	force_sig_fault(SIGFPE, si_code,
 			(void __user *)uprobe_get_trap_addr(regs), task);
 }
-- 
2.19.2



* [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions
  2018-12-05 23:20 [RFC PATCH 0/4] x86: Add vDSO exception fixup for SGX Sean Christopherson
                   ` (2 preceding siblings ...)
  2018-12-05 23:20 ` [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions " Sean Christopherson
@ 2018-12-05 23:20 ` Sean Christopherson
  2018-12-05 23:40   ` Andy Lutomirski
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2018-12-05 23:20 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

Intel Software Guard Extensions (SGX) introduces a new CPL3-only
enclave mode that runs as a sort of black box shared object that is
hosted by an untrusted normal CPL3 process.

Enclave transitions have semantics that are a lovely blend of SYSCALL,
SYSRET and VM-Exit.  In a non-faulting scenario, entering and exiting
an enclave can only be done through SGX-specific instructions, EENTER
and EEXIT respectively.  EENTER+EEXIT is analogous to SYSCALL+SYSRET,
e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
RIP from R{B,C}X.

But in a faulting/interrupting scenario, enclave transitions act more
like VM-Exit and VMRESUME.  Maintaining the black box nature of the
enclave means that hardware must automatically switch CPU context when
an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
or exception (exceptions are AEEs because asynchronous in this context
is relative to the enclave and not CPU execution, e.g. the enclave
doesn't get an opportunity to save/fuzz CPU state).

Like VM-Exits, all AEEs jump to a common location, referred to as the
Asynchronous Exiting Point (AEP).  The AEP is specified at enclave entry
via register passed to EENTER/ERESUME, similar to how the hypervisor
specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
Resuming the enclave/VM after the exiting event is handled is done via
ERESUME/VMRESUME respectively.  For AEEs that are handled by the
kernel, e.g. INTR, NMI and most page faults, the kernel's IRET journeys
back to the AEP, which then ERESUMEs the enclave.

Enclaves also behave a bit like VMs in the sense that they can generate
exceptions as part of their normal operation that for all intents and
purposes need to be handled in the enclave/VM.  However, unlike VMX, SGX
doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
as doing so would circumvent the enclave's security.  So to handle an
exception, the enclave must first be re-entered through the normal
EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
(VMRESUME behavior) after the source of the exception is resolved.

All of the above is just the tip of the iceberg when it comes to running
an enclave.  But, SGX was designed in such a way that the host process
can utilize a library to build, launch and run an enclave.  This is
roughly analogous to how e.g. libc implementations are used by most
applications so that the application can focus on its business logic.

The big gotcha is that because enclaves can generate *and* handle
exceptions, any SGX library must be prepared to handle nearly any
exception at any time (well, any time a thread is executing in an
enclave).  In Linux, this means the SGX library must register a
signal handler in order to intercept relevant exceptions and forward
them to the enclave (or in some cases, take action on behalf of the
enclave).  Unfortunately, Linux's signal mechanism doesn't mesh well
with libraries, e.g. signal handlers are process wide, are difficult
to chain, etc...  This becomes particularly nasty when using multiple
levels of libraries that register signal handlers, e.g. running an
enclave via cgo inside of the Go runtime.

In comes vDSO to save the day.  Now that vDSO can fixup exceptions,
add a function to wrap enclave transitions and intercept any exceptions
that occur in the enclave or on EENTER/ERESUME.  The actual code is
blissfully short (especially compared to this changelog).

In addition to the obvious trapnr, error_code and address, propagate
the leaf number, i.e. RAX, back to userspace so that the caller can know
whether the fault occurred in the enclave or if it occurred on EENTER.
A fault on EENTER generally means the enclave has died and needs to be
restarted.
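
From the caller's perspective, usage boils down to something like this
(hypothetical sketch, not part of the series; fake_vdso_sgx_eenter()
stands in for the real vDSO symbol, which userspace would locate by
parsing the vDSO image at getauxval(AT_SYSINFO_EHDR)):

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the struct defined in the patch below. */
struct sgx_eenter_fault_info {
	uint32_t leaf;
	uint16_t trapnr;
	uint16_t error_code;
	uint64_t address;
};

typedef long (*vdso_sgx_eenter_t)(void *tcs, void *priv,
				  struct sgx_eenter_fault_info *fi);

enum { SGX_EENTER = 2, SGX_ERESUME = 3, SGX_EEXIT = 4 };

/* Stand-in for the real symbol: simulates a #GP (trap 13) raised by
 * EENTER itself, i.e. the reported leaf is SGX_EENTER. */
static long fake_vdso_sgx_eenter(void *tcs, void *priv,
				 struct sgx_eenter_fault_info *fi)
{
	(void)tcs; (void)priv;
	if (fi) {
		fi->leaf = SGX_EENTER;
		fi->trapnr = 13;
		fi->error_code = 0;
		fi->address = 0;
	}
	return -EFAULT;
}

/* Caller-side pattern: 0 on clean EEXIT; a fault with leaf ==
 * SGX_EENTER means entry itself failed (the enclave is likely dead),
 * any other leaf means the fault occurred inside the enclave. */
static int run_enclave(vdso_sgx_eenter_t eenter, void *tcs, void *priv)
{
	struct sgx_eenter_fault_info fi;

	if (!eenter(tcs, priv, &fi))
		return 0;		/* clean EEXIT */
	if (fi.leaf == SGX_EENTER)
		return -1;		/* rebuild the enclave */
	return 1;			/* handle fault, then ERESUME */
}
```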

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/entry/vdso/Makefile      |   1 +
 arch/x86/entry/vdso/vdso.lds.S    |   1 +
 arch/x86/entry/vdso/vsgx_eenter.c | 108 ++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+)
 create mode 100644 arch/x86/entry/vdso/vsgx_eenter.c

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index eb543ee1bcec..ba46673076bd 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 
 # files to link into the vdso
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-$(VDSO64-y)		+= vsgx_eenter.o
 
 # files to link into kernel
 obj-y				+= vma.o extable.o
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..e422c4454f34 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -25,6 +25,7 @@ VERSION {
 		__vdso_getcpu;
 		time;
 		__vdso_time;
+		__vdso_sgx_eenter;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vsgx_eenter.c b/arch/x86/entry/vdso/vsgx_eenter.c
new file mode 100644
index 000000000000..3df4a95a34cc
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx_eenter.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2018 Intel Corporation.
+
+#include <uapi/linux/errno.h>
+#include <uapi/linux/types.h>
+
+#include "extable.h"
+
+/*
+ * This struct will be defined elsewhere in the actual implementation,
+ * e.g. arch/x86/include/uapi/asm/sgx.h.
+ */
+struct sgx_eenter_fault_info {
+	__u32   leaf;
+	__u16   trapnr;
+	__u16   error_code;
+	__u64   address;
+};
+
+/*
+ * ENCLU (ENCLave User) is an umbrella instruction for a variety of CPL3
+ * SGX functions.  The ENCLU function that is executed is specified in EAX,
+ * with each function potentially having more leaf-specific operands beyond
+ * EAX.  In the vDSO we're only concerned with the leaves that are used to
+ * transition to/from the enclave.
+ */
+enum sgx_enclu_leaves {
+	SGX_EENTER	= 2,
+	SGX_ERESUME	= 3,
+	SGX_EEXIT	= 4,
+};
+
+notrace long __vdso_sgx_eenter(void *tcs, void *priv,
+			       struct sgx_eenter_fault_info *fault_info)
+{
+	u32 trapnr, error_code;
+	long leaf;
+	u64 addr;
+
+	/*
+	 *	%eax = EENTER
+	 *	%rbx = tcs
+	 *	%rcx = do_eresume
+	 *	%rdi = priv
+	 * do_eenter:
+	 *	enclu
+	 *	jmp	out
+	 *
+	 * do_eresume:
+	 *	enclu
+	 *	ud2
+	 *
+	 * out:
+	 *	<return to C code>
+	 *
+	 * fault_fixup:
+	 *	<extable loads RDI, RSI and RDX with fault info>
+	 *	jmp	out
+	 */
+	asm volatile(
+		/*
+		 * When an event occurs in an enclave, hardware first exits the
+		 * enclave to the AEP, switching CPU context along the way, and
+		 * *then* delivers the event as usual.  As part of the context
+		 * switching, registers are loaded with synthetic state (except
+		 * BP and SP, which are saved/restored).  The defined synthetic
+		 * state loads registers so that simply executing ENCLU will do
+		 * ERESUME, e.g. RAX=4, RBX=TCS and RCX=AEP after an AEE.  So,
+		 * we just need to load RAX, RBX and RCX for EENTER, and encode
+		 * an ENCLU at the AEP.  Throw in a ud2 to ensure we don't fall
+		 * through ENCLU somehow.
+		 */
+		"	lea	2f(%%rip), %%rcx\n"
+		"1:	enclu\n"
+		"	jmp	3f\n"
+		"2:	enclu\n"
+		"	ud2\n"
+		"3:\n"
+
+		".pushsection .fixup, \"ax\" \n"
+		"4:	jmp 3b\n"
+		".popsection\n"
+		_ASM_VDSO_EXTABLE_HANDLE(1b, 4b)"\n"
+		_ASM_VDSO_EXTABLE_HANDLE(2b, 4b)
+
+		: "=a"(leaf), "=D" (trapnr), "=S" (error_code), "=d" (addr)
+		: "a" (SGX_EENTER), "b" (tcs), "D" (priv)
+		: "cc", "memory",
+		  "rcx", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
+	);
+
+	/*
+	 * EEXIT means we left the assembly blob via EEXIT, anything else is
+	 * an unhandled exception (handled exceptions and interrupts simply
+	 * ERESUME from the AEP).
+	 */
+	if (leaf == SGX_EEXIT)
+		return 0;
+
+	if (fault_info) {
+		fault_info->leaf = leaf;
+		fault_info->trapnr = trapnr;
+		fault_info->error_code = error_code;
+		fault_info->address = addr;
+	}
+
+	return -EFAULT;
+}
-- 
2.19.2



* Re: [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions
  2018-12-05 23:20 ` [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions Sean Christopherson
@ 2018-12-05 23:40   ` Andy Lutomirski
  2018-12-06 13:55     ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2018-12-05 23:40 UTC (permalink / raw)
  To: Christopherson, Sean J
  Cc: Andrew Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	X86 ML, Dave Hansen, Peter Zijlstra, H. Peter Anvin, LKML,
	Jarkko Sakkinen, Josh Triplett

On Wed, Dec 5, 2018 at 3:20 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Intel Software Guard Extensions (SGX) introduces a new CPL3-only
> enclave mode that runs as a sort of black box shared object that is
> hosted by an untrusted normal CPL3 process.
>
> Enclave transitions have semantics that are a lovely blend of SYSCALL,
> SYSRET and VM-Exit.  In a non-faulting scenario, entering and exiting
> an enclave can only be done through SGX-specific instructions, EENTER
> and EEXIT respectively.  EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> RIP from R{B,C}X.
>
> But in a faulting/interrupting scenario, enclave transitions act more
> like VM-Exit and VMRESUME.  Maintaining the black box nature of the
> enclave means that hardware must automatically switch CPU context when
> an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> or exception (exceptions are AEEs because asynchronous in this context
> is relative to the enclave and not CPU execution, e.g. the enclave
> doesn't get an opportunity to save/fuzz CPU state).
>
> Like VM-Exits, all AEEs jump to a common location, referred to as the
> Asynchronous Exiting Point (AEP).  The AEP is specified at enclave entry
> via register passed to EENTER/ERESUME, similar to how the hypervisor
> specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> Resuming the enclave/VM after the exiting event is handled is done via
> ERESUME/VMRESUME respectively.  In SGX, for AEEs that are handled by
> the kernel, e.g. INTR, NMI and most page faults, IRET will journey back
> to the AEP, which then ERESUMEs the enclave.
>
> Enclaves also behave a bit like VMs in the sense that they can generate
> exceptions as part of their normal operation that for all intents and
> purposes need to be handled in the enclave/VM.  However, unlike VMX, SGX
> doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
> as doing so would circumvent the enclave's security.  So to handle an
> exception, the enclave must first be re-entered through the normal
> EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> (VMRESUME behavior) after the source of the exception is resolved.
>
> All of the above is just the tip of the iceberg when it comes to running
> an enclave.  But, SGX was designed in such a way that the host process
> can utilize a library to build, launch and run an enclave.  This is
> roughly analogous to how e.g. libc implementations are used by most
> applications so that the application can focus on its business logic.
>
> The big gotcha is that because enclaves can generate *and* handle
> exceptions, any SGX library must be prepared to handle nearly any
> exception at any time (well, any time a thread is executing in an
> enclave).  In Linux, this means the SGX library must register a
> signal handler in order to intercept relevant exceptions and forward
> them to the enclave (or in some cases, take action on behalf of the
> enclave).  Unfortunately, Linux's signal mechanism doesn't mesh well
> with libraries, e.g. signal handlers are process wide, are difficult
> to chain, etc...  This becomes particularly nasty when using multiple
> levels of libraries that register signal handlers, e.g. running an
> enclave via cgo inside of the Go runtime.
>
> In comes vDSO to save the day.  Now that vDSO can fixup exceptions,
> add a function to wrap enclave transitions and intercept any exceptions
> that occur in the enclave or on EENTER/ERESUME.  The actual code is
> blissfully short (especially compared to this changelog).
>
> In addition to the obvious trapnr, error_code and address, propagate
> the leaf number, i.e. RAX, back to userspace so that the caller can know
> whether the fault occurred in the enclave or if it occurred on EENTER.
> A fault on EENTER generally means the enclave has died and needs to be
> restarted.
>
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/entry/vdso/Makefile      |   1 +
>  arch/x86/entry/vdso/vdso.lds.S    |   1 +
>  arch/x86/entry/vdso/vsgx_eenter.c | 108 ++++++++++++++++++++++++++++++
>  3 files changed, 110 insertions(+)
>  create mode 100644 arch/x86/entry/vdso/vsgx_eenter.c
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index eb543ee1bcec..ba46673076bd 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)       := y
>
>  # files to link into the vdso
>  vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
> +vobjs-$(VDSO64-y)              += vsgx_eenter.o
>
>  # files to link into kernel
>  obj-y                          += vma.o extable.o
> diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
> index d3a2dce4cfa9..e422c4454f34 100644
> --- a/arch/x86/entry/vdso/vdso.lds.S
> +++ b/arch/x86/entry/vdso/vdso.lds.S
> @@ -25,6 +25,7 @@ VERSION {
>                 __vdso_getcpu;
>                 time;
>                 __vdso_time;
> +               __vdso_sgx_eenter;
>         local: *;
>         };
>  }
> diff --git a/arch/x86/entry/vdso/vsgx_eenter.c b/arch/x86/entry/vdso/vsgx_eenter.c
> new file mode 100644
> index 000000000000..3df4a95a34cc
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vsgx_eenter.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2018 Intel Corporation.
> +
> +#include <uapi/linux/errno.h>
> +#include <uapi/linux/types.h>
> +
> +#include "extable.h"
> +
> +/*
> + * This struct will be defined elsewhere in the actual implementation,
> + * e.g. arch/x86/include/uapi/asm/sgx.h.
> + */
> +struct sgx_eenter_fault_info {
> +       __u32   leaf;
> +       __u16   trapnr;
> +       __u16   error_code;
> +       __u64   address;
> +};
> +
> +/*
> + * ENCLU (ENCLave User) is an umbrella instruction for a variety of CPL3
> + * SGX functions.  The ENCLU function that is executed is specified in EAX,
> + * with each function potentially having more leaf-specific operands beyond
> + * EAX.  In the vDSO we're only concerned with the leaves that are used to
> + * transition to/from the enclave.
> + */
> +enum sgx_enclu_leaves {
> +       SGX_EENTER      = 2,
> +       SGX_ERESUME     = 3,
> +       SGX_EEXIT       = 4,
> +};
> +
> +notrace long __vdso_sgx_eenter(void *tcs, void *priv,
> +                              struct sgx_eenter_fault_info *fault_info)
> +{
> +       u32 trapnr, error_code;
> +       long leaf;
> +       u64 addr;
> +
> +       /*
> +        *      %eax = EENTER
> +        *      %rbx = tcs
> +        *      %rcx = do_eresume
> +        *      %rdi = priv
> +        * do_eenter:
> +        *      enclu
> +        *      jmp     out
> +        *
> +        * do_eresume:
> +        *      enclu
> +        *      ud2

Is the only reason for do_eresume to be different from do_eenter so
that you can do the ud2?

> +        *
> +        * out:
> +        *      <return to C code>
> +        *
> +        * fault_fixup:
> +        *      <extable loads RDI, RSI and RDX with fault info>
> +        *      jmp     out
> +        */

This has the IMO excellent property that it's extremely awkward to use
it for a model where the enclave is reentrant.  I think it's excellent
because reentrancy on the same enclave thread is just asking for
severe bugs.  Of course, I fully expect the SDK to emulate reentrancy,
> but then it's 100% their problem :)  On the flip side, it means that
you can't really recover from a reported fault, even if you want to,
because there's no way to ask for ERESUME.  So maybe the API should
allow that after all.

I think it might be polite to at least give some out regs, maybe RSI and RDI?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions
  2018-12-05 23:40   ` Andy Lutomirski
@ 2018-12-06 13:55     ` Sean Christopherson
  2018-12-06 14:17       ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2018-12-06 13:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Dave Hansen, Peter Zijlstra, H. Peter Anvin, LKML,
	Jarkko Sakkinen, Josh Triplett

On Wed, Dec 05, 2018 at 03:40:48PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 5, 2018 at 3:20 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > +notrace long __vdso_sgx_eenter(void *tcs, void *priv,
> > +                              struct sgx_eenter_fault_info *fault_info)
> > +{
> > +       u32 trapnr, error_code;
> > +       long leaf;
> > +       u64 addr;
> > +
> > +       /*
> > +        *      %eax = EENTER
> > +        *      %rbx = tcs
> > +        *      %rcx = do_eresume
> > +        *      %rdi = priv
> > +        * do_eenter:
> > +        *      enclu
> > +        *      jmp     out
> > +        *
> > +        * do_eresume:
> > +        *      enclu
> > +        *      ud2
> 
> Is the only reason for do_eresume to be different from do_eenter so
> that you can do the ud2?

No, it was a holdover from doing fixup via a magic prefix in user code.
The fixup could only skip the ENCLU and so a second ENCLU was needed to
differentiate between EENTER and ERESUME.  The need for two ENCLUs got
ingrained in my head.  I can't think of anything that will break if we
use a single ENCLU.

> > +        *
> > +        * out:
> > +        *      <return to C code>
> > +        *
> > +        * fault_fixup:
> > +        *      <extable loads RDI, RSI and RDX with fault info>
> > +        *      jmp     out
> > +        */
> 
> This has the IMO excellent property that it's extremely awkward to use
> it for a model where the enclave is reentrant.  I think it's excellent
> because reentrancy on the same enclave thread is just asking for
> severe bugs.  Of course, I fully expect the SDK to emulate reentrancy,
> > but then it's 100% their problem :)  On the flip side, it means that
> you can't really recover from a reported fault, even if you want to,
> because there's no way to ask for ERESUME.  So maybe the API should
> allow that after all.

Doh.  The ability to do ERESUME is an explicit requirement from the SDK
folks.  More code that I pulled from my userspace implementation and
didn't revisit.

> I think it might be polite to at least give some out regs, maybe RSI and RDI?

For the outbound path?  I was thinking @priv would be used for passing
data out as well as in.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions
  2018-12-06 13:55     ` Sean Christopherson
@ 2018-12-06 14:17       ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2018-12-06 14:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Dave Hansen, Peter Zijlstra, H. Peter Anvin, LKML,
	Jarkko Sakkinen, Josh Triplett

On Thu, Dec 06, 2018 at 05:55:47AM -0800, Sean Christopherson wrote:
> On Wed, Dec 05, 2018 at 03:40:48PM -0800, Andy Lutomirski wrote:
> > On Wed, Dec 5, 2018 at 3:20 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > > +notrace long __vdso_sgx_eenter(void *tcs, void *priv,
> > > +                              struct sgx_eenter_fault_info *fault_info)
> > > +{
> > > +       u32 trapnr, error_code;
> > > +       long leaf;
> > > +       u64 addr;
> > > +
> > > +       /*
> > > +        *      %eax = EENTER
> > > +        *      %rbx = tcs
> > > +        *      %rcx = do_eresume
> > > +        *      %rdi = priv
> > > +        * do_eenter:
> > > +        *      enclu
> > > +        *      jmp     out
> > > +        *
> > > +        * do_eresume:
> > > +        *      enclu
> > > +        *      ud2
> > 
> > Is the only reason for do_eresume to be different from do_eenter so
> > that you can do the ud2?
> 
> No, it was a holdover from doing fixup via a magic prefix in user code.
> The fixup could only skip the ENCLU and so a second ENCLU was needed to
> differentiate between EENTER and ERESUME.  The need for two ENCLUs got
> ingrained in my head.  I can't think of anything that will break if we
> use a single ENCLU.
> 
> > > +        *
> > > +        * out:
> > > +        *      <return to C code>
> > > +        *
> > > +        * fault_fixup:
> > > +        *      <extable loads RDI, RSI and RDX with fault info>
> > > +        *      jmp     out
> > > +        */
> > 
> > This has the IMO excellent property that it's extremely awkward to use
> > it for a model where the enclave is reentrant.  I think it's excellent
> > because reentrancy on the same enclave thread is just asking for
> > severe bugs.  Of course, I fully expect the SDK to emulate reentrancy,
> > > but then it's 100% their problem :)  On the flip side, it means that
> > you can't really recover from a reported fault, even if you want to,
> > because there's no way to ask for ERESUME.  So maybe the API should
> > allow that after all.
> 
> Doh.  The ability to do ERESUME is an explicit requirement from the SDK
> folks.  More code that I pulled from my userspace implementation and
> didn't revisit.

Is it ok to add a separate exported function for ERESUME?  ERESUME can't
explicitly pass anything to the enclave, i.e. doesn't need a @priv param.
A separate function is a little prettier, e.g.:

static inline
long vdso_enter_enclave(enum sgx_enclu_leaves op, void *tcs, void *priv,
			struct sgx_eenter_fault_info *fault_info)
{
	...
}

notrace long __vdso_sgx_eenter(void *tcs, void *priv,
			       struct sgx_eenter_fault_info *fault_info)
{
	return vdso_enter_enclave(SGX_EENTER, tcs, priv, fault_info);
}

notrace long __vdso_sgx_eresume(void *tcs,
				struct sgx_eenter_fault_info *fault_info)
{
	return vdso_enter_enclave(SGX_ERESUME, tcs, NULL, fault_info);
}
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling
  2018-12-05 23:20 ` [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling Sean Christopherson
@ 2018-12-06 18:17   ` Dave Hansen
  2018-12-06 18:20     ` Sean Christopherson
  2018-12-06 18:46     ` Andy Lutomirski
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2018-12-06 18:17 UTC (permalink / raw)
  To: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

>  #define CREATE_TRACE_POINTS
>  #include <asm/trace/exceptions.h>
> @@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
>  		if (address >= TASK_SIZE_MAX)
>  			error_code |= X86_PF_PROT;
>  
> +		if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
> +			return;
> +
>  		if (likely(show_unhandled_signals))
>  			show_signal_msg(regs, error_code, address, tsk);

I'd preferably like to get this plugged into the page fault code before
we get to the "bad_area" handling.  This plugs it in near the erratum
handling which seems really late to me.

> @@ -1045,6 +1049,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  	if (is_prefetch(regs, error_code, address))
>  		return;
>  
> +	if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
> +		return;
> +
>  	set_signal_archinfo(address, error_code);
>  
>  #ifdef CONFIG_MEMORY_FAILURE

This *seems* really late to me.  We've already called into the mm fault
handling code to try and handle the fault and they told us it was
VM_FAULT_SIGBUS.  Shouldn't we have just detected that it was in the
vDSO first and not even called the handling code?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling
  2018-12-06 18:17   ` Dave Hansen
@ 2018-12-06 18:20     ` Sean Christopherson
  2018-12-06 18:46     ` Andy Lutomirski
  1 sibling, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2018-12-06 18:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, Dave Hansen, Peter Zijlstra, H. Peter Anvin, linux-kernel,
	Andy Lutomirski, Jarkko Sakkinen, Josh Triplett

On Thu, Dec 06, 2018 at 10:17:34AM -0800, Dave Hansen wrote:
> >  #define CREATE_TRACE_POINTS
> >  #include <asm/trace/exceptions.h>
> > @@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> >  		if (address >= TASK_SIZE_MAX)
> >  			error_code |= X86_PF_PROT;
> >  
> > +		if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
> > +			return;
> > +
> >  		if (likely(show_unhandled_signals))
> >  			show_signal_msg(regs, error_code, address, tsk);
> 
> I'd preferably like to get this plugged into the page fault code before
> we get to the "bad_area" handling.  This plugs it in near the erratum
> handling which seems really late to me.
> 
> > @@ -1045,6 +1049,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
> >  	if (is_prefetch(regs, error_code, address))
> >  		return;
> >  
> > +	if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
> > +		return;
> > +
> >  	set_signal_archinfo(address, error_code);
> >  
> >  #ifdef CONFIG_MEMORY_FAILURE
> 
> This *seems* really late to me.  We've already called into the mm fault
> handling code to try and handle the fault and they told us it was
> VM_FAULT_SIGBUS.  Shouldn't we have just detected that it was in the
> vDSO first and not even called the handling code?

Not sure if Andy had other use cases in mind, but for SGX we only want
to invoke the fixup in lieu of a signal.  E.g. #PFs to fault in an EPC
page need to be handled in the kernel.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions in vDSO before signaling
  2018-12-05 23:20 ` [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions " Sean Christopherson
@ 2018-12-06 18:22   ` Dave Hansen
  2018-12-06 18:49     ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2018-12-06 18:22 UTC (permalink / raw)
  To: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Dave Hansen, Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Andy Lutomirski, Jarkko Sakkinen,
	Josh Triplett

On 12/5/18 3:20 PM, Sean Christopherson wrote:
> @@ -223,6 +224,10 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
>  	tsk->thread.error_code = error_code;
>  	tsk->thread.trap_nr = trapnr;
>  
> +	if (user_mode(regs) &&
> +	    fixup_vdso_exception(regs, trapnr, error_code, 0))
> +		return 0;
> +
>  	return -1;
>  }
>  
> @@ -563,6 +568,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
>  	tsk->thread.error_code = error_code;
>  	tsk->thread.trap_nr = X86_TRAP_GP;
>  
> +	if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
> +		return;
> +
>  	show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
>  
>  	force_sig(SIGSEGV, tsk);
> @@ -854,6 +862,9 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
>  	if (!si_code)
>  		return;
>  
> +	if (fixup_vdso_exception(regs, trapnr, error_code, 0))
> +		return;
> +
>  	force_sig_fault(SIGFPE, si_code,
>  			(void __user *)uprobe_get_trap_addr(regs), task);
>  }
> -- 

Needs commenting, please.

But, also, this seems really ad-hoc.  Probably, that's a result of our
signal generation being really ad-hoc itself.  But, if this claims
"Attempt to fixup exceptions in vDSO before signaling", how do we assure
ourselves that we hit all the ad-hoc signal generation cases?  How do we
know we didn't miss one or ten?

I want to hear more of the story of how you picked these sites and also
decided that this is a comprehensive-enough set of sites to patch.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling
  2018-12-06 18:17   ` Dave Hansen
  2018-12-06 18:20     ` Sean Christopherson
@ 2018-12-06 18:46     ` Andy Lutomirski
  1 sibling, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2018-12-06 18:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christopherson, Sean J, Andrew Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, Dave Hansen,
	Peter Zijlstra, H. Peter Anvin, LKML, Jarkko Sakkinen,
	Josh Triplett

On Thu, Dec 6, 2018 at 10:17 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> >  #define CREATE_TRACE_POINTS
> >  #include <asm/trace/exceptions.h>
> > @@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> >               if (address >= TASK_SIZE_MAX)
> >                       error_code |= X86_PF_PROT;
> >
> > +             if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
> > +                     return;
> > +
> >               if (likely(show_unhandled_signals))
> >                       show_signal_msg(regs, error_code, address, tsk);
>
> I'd preferably like to get this plugged into the page fault code before
> we get to the "bad_area" handling.  This plugs it in near the erratum
> handling which seems really late to me.
>
> > @@ -1045,6 +1049,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
> >       if (is_prefetch(regs, error_code, address))
> >               return;
> >
> > +     if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
> > +             return;
> > +
> >       set_signal_archinfo(address, error_code);
> >
> >  #ifdef CONFIG_MEMORY_FAILURE
>
> This *seems* really late to me.  We've already called into the mm fault
> handling code to try and handle the fault and they told us it was
> VM_FAULT_SIGBUS.  Shouldn't we have just detected that it was in the
> vDSO first and not even called the handling code?

I think we only want to do the fixup in cases where we would have
signalled.  I'm about 99.5% confident that if the page fault points to
valid user memory that just happened to have been swapped out, then we
want to swap it in and go straight to ERESUME rather than bailing out
of the vDSO.  What we really want is to IRET with a magic flag saying
"resume the enclave, silly", but we don't have any way to ask for
that.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions in vDSO before signaling
  2018-12-06 18:22   ` Dave Hansen
@ 2018-12-06 18:49     ` Andy Lutomirski
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2018-12-06 18:49 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christopherson, Sean J, Andrew Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, Dave Hansen,
	Peter Zijlstra, H. Peter Anvin, LKML, Jarkko Sakkinen,
	Josh Triplett

On Thu, Dec 6, 2018 at 10:22 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 12/5/18 3:20 PM, Sean Christopherson wrote:
> > @@ -223,6 +224,10 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
> >       tsk->thread.error_code = error_code;
> >       tsk->thread.trap_nr = trapnr;
> >
> > +     if (user_mode(regs) &&
> > +         fixup_vdso_exception(regs, trapnr, error_code, 0))
> > +             return 0;
> > +
> >       return -1;
> >  }
> >
> > @@ -563,6 +568,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
> >       tsk->thread.error_code = error_code;
> >       tsk->thread.trap_nr = X86_TRAP_GP;
> >
> > +     if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
> > +             return;
> > +
> >       show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
> >
> >       force_sig(SIGSEGV, tsk);
> > @@ -854,6 +862,9 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
> >       if (!si_code)
> >               return;
> >
> > +     if (fixup_vdso_exception(regs, trapnr, error_code, 0))
> > +             return;
> > +
> >       force_sig_fault(SIGFPE, si_code,
> >                       (void __user *)uprobe_get_trap_addr(regs), task);
> >  }
> > --
>
> Needs commenting, please.
>
> But, also, this seems really ad-hoc.  Probably, that's a result of our
> signal generation being really ad-hoc itself.  But, if this claims
> "Attempt to fixup exceptions in vDSO before signaling", how do we assure
> ourselves that we hit all the ad-hoc signal generation cases?  How do we
> know we didn't miss one or ten?
>
> I want to hear more of the story of how you picked these sites and also
> decided that this is a comprehensive-enough set of sites to patch.

With my maintainer hat on, it would be awesome if we could inspire
Sean to do a nice cleanup and unify the code such that there is a
single "send a signal to user code to report an exception that wasn't
fixed up" path.  But that's also quite a big request for an otherwise
not-terribly-huge patch...

But, in the absence of a cleanup like that, we should at least
enumerate all the signals that are intended to get fixed up somewhere
in the comments or the changelog.  I'm a bit suspicious that the
correct answer is "all of them", with the possible exception of MCE.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-12-06 18:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 23:20 [RFC PATCH 0/4] x86: Add vDSO exception fixup for SGX Sean Christopherson
2018-12-05 23:20 ` [RFC PATCH 1/4] x86/vdso: Add support for exception fixup in vDSO functions Sean Christopherson
2018-12-05 23:20 ` [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling Sean Christopherson
2018-12-06 18:17   ` Dave Hansen
2018-12-06 18:20     ` Sean Christopherson
2018-12-06 18:46     ` Andy Lutomirski
2018-12-05 23:20 ` [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions " Sean Christopherson
2018-12-06 18:22   ` Dave Hansen
2018-12-06 18:49     ` Andy Lutomirski
2018-12-05 23:20 ` [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions Sean Christopherson
2018-12-05 23:40   ` Andy Lutomirski
2018-12-06 13:55     ` Sean Christopherson
2018-12-06 14:17       ` Sean Christopherson
