All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/10] Remove syscall instructions at fixed addresses
@ 2011-05-30  3:48 Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 01/10] x86-64: Fix alignment of jiffies variable Andy Lutomirski
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

This series is really five different parts.



The first part (patch 1/10) is just a bugfix from the last vdso series.
The bug should be harmless but it's pretty dumb.  This is almost
certainly 3.0 material.



The second part removes a bunch of syscall instructions in kernel space
at fixed addresses that user code can execute.  This is not all that
well tested or inspected at this point.

Several are data that isn't marked NX.  Patch 2/10 makes vvars NX and
5/10 makes the HPET NX.

The time() vsyscall contains an explicit syscall fallback.  Patch 3/10
removes it.

The last one is the gettimeofday fallback.  We need that, but it doesn't
have to be a real syscall.  Patch 4/10 adds int 0xcc (callable only from
the vsyscall page) that implements the gettimeofday fallback and nothing
else.



The third part is a more aggressive cleanup of the vsyscall page.  It
removes the code implementing the vsyscalls and replaces it with magic
int 0xcc incantations.  These incantations are specifically designed so
that jumping into them at funny offsets will either work fine or
generate some kind of fault.  Patch 8/10 is optional and might want to
be hidden away in CONFIG_EMBEDDED for awhile.  This needs some more
testing in CONFIG_UNSAFE_VSYSCALLS=y mode and a lot of careful
inspection in CONFIG_UNSAFE_VSYSCALLS=n mode.

Patch 6/10 removes venosys.  It's been broken (crashes) for a couple
years and it doesn't do anything particularly useful anyway.

Patch 7/10 fills the vsyscall page with 0xcc instead of 0x00.  0xcc is
an explicit trap

Patch 8/10 adds a config option to emulate the vsyscalls.  The int 0xcc
incantation intentionally depends on the config option -- it is not ABI.



Patch 9/10 randomizes the int 0xcc incantation at bootup.  It is pretty
much worthless for security (there are only three choices for the random
number and it's easy to figure out which one is in use) but it prevents
overly clever userspace programs from thinking that the incantation is
ABI.  One instrumentation tool author offered to hard-code special
handling for int 0xcc; I want to discourage this approach.

Patch 10/10 adds some documentation for entry_64.S.  A lot of the magic
in there is far from obvious.


Changes from v1:
 - Patches 6-10 are new.
 - The int 0xcc code is much prettier and has lots of bugs fixed.
 - I've decided to let everyone compile turbostat on their own :)

Andy Lutomirski (10):
  x86-64: Fix alignment of jiffies variable
  x86-64: Give vvars their own page
  x86-64: Remove kernel.vsyscall64 sysctl
  x86-64: Replace vsyscall gettimeofday fallback with int 0xcc
  x86-64: Map the HPET NX
  x86-64: Remove vsyscall number 3 (venosys)
  x86-64: Fill unused parts of the vsyscall page with 0xcc
  x86-64: Emulate vsyscalls
  x86-64: Randomize int 0xcc magic al values at boot
  x86-64: Document some of entry_64.S

 Documentation/x86/entry_64.txt       |   95 +++++++++++
 arch/x86/Kconfig                     |   17 ++
 arch/x86/include/asm/fixmap.h        |    1 +
 arch/x86/include/asm/irq_vectors.h   |    6 +-
 arch/x86/include/asm/pgtable_types.h |    6 +-
 arch/x86/include/asm/traps.h         |    4 +
 arch/x86/include/asm/vgtod.h         |    1 -
 arch/x86/include/asm/vsyscall.h      |    6 +
 arch/x86/include/asm/vvar.h          |   24 ++--
 arch/x86/kernel/Makefile             |    3 +
 arch/x86/kernel/entry_64.S           |    4 +
 arch/x86/kernel/hpet.c               |    2 +-
 arch/x86/kernel/traps.c              |    4 +
 arch/x86/kernel/vmlinux.lds.S        |   47 +++---
 arch/x86/kernel/vsyscall_64.c        |  289 +++++++++++++++++++++++++++++-----
 arch/x86/kernel/vsyscall_emu_64.S    |   40 +++++
 arch/x86/vdso/vclock_gettime.c       |   55 +++----
 17 files changed, 486 insertions(+), 118 deletions(-)
 create mode 100644 Documentation/x86/entry_64.txt
 create mode 100644 arch/x86/kernel/vsyscall_emu_64.S

-- 
1.7.5.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 01/10] x86-64: Fix alignment of jiffies variable
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 02/10] x86-64: Give vvars their own page Andy Lutomirski
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

It's declared __attribute__((aligned(16)) but it's explicitly not
aligned.  This is probably harmless but it's a but embarrassing.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/vvar.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 341b355..a4eaca4 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -45,7 +45,7 @@
 /* DECLARE_VVAR(offset, type, name) */
 
 DECLARE_VVAR(0, volatile unsigned long, jiffies)
-DECLARE_VVAR(8, int, vgetcpu_mode)
+DECLARE_VVAR(16, int, vgetcpu_mode)
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
 
 #undef DECLARE_VVAR
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 02/10] x86-64: Give vvars their own page
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 01/10] x86-64: Fix alignment of jiffies variable Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 03/10] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

Move vvars out of the vsyscall page into their own page and mark it
NX.

Without this patch, an attacker who can force a daemon to call some
fixed address could wait until the time contains, say, 0xCD80, and
then execute the current time.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/fixmap.h        |    1 +
 arch/x86/include/asm/pgtable_types.h |    2 ++
 arch/x86/include/asm/vvar.h          |   22 ++++++++++------------
 arch/x86/kernel/vmlinux.lds.S        |   27 ++++++++++++++++-----------
 arch/x86/kernel/vsyscall_64.c        |    5 +++++
 5 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 4729b2b..460c74e 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -78,6 +78,7 @@ enum fixed_addresses {
 	VSYSCALL_LAST_PAGE,
 	VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
 			    + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
+	VVAR_PAGE,
 	VSYSCALL_HPET,
 #endif
 	FIX_DBGP_BASE,
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d56187c..6a29aed6 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -108,6 +108,7 @@
 #define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -130,6 +131,7 @@
 #define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
 #define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
+#define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
 
 #define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index a4eaca4..de656ac 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -10,15 +10,14 @@
  * In normal kernel code, they are used like any other variable.
  * In user code, they are accessed through the VVAR macro.
  *
- * Each of these variables lives in the vsyscall page, and each
- * one needs a unique offset within the little piece of the page
- * reserved for vvars.  Specify that offset in DECLARE_VVAR.
- * (There are 896 bytes available.  If you mess up, the linker will
- * catch it.)
+ * These variables live in a page of kernel data that has an extra RO
+ * mapping for userspace.  Each variable needs a unique offset within
+ * that page; specify that offset with the DECLARE_VVAR macro.  (If
+ * you mess up, the linker will catch it.)
  */
 
-/* Offset of vars within vsyscall page */
-#define VSYSCALL_VARS_OFFSET (3072 + 128)
+/* Base address of vvars.  This is not ABI. */
+#define VVAR_ADDRESS (-10*1024*1024 - 4096)
 
 #if defined(__VVAR_KERNEL_LDS)
 
@@ -26,17 +25,17 @@
  * right place.
  */
 #define DECLARE_VVAR(offset, type, name) \
-	EMIT_VVAR(name, VSYSCALL_VARS_OFFSET + offset)
+	EMIT_VVAR(name, offset)
 
 #else
 
 #define DECLARE_VVAR(offset, type, name)				\
 	static type const * const vvaraddr_ ## name =			\
-		(void *)(VSYSCALL_START + VSYSCALL_VARS_OFFSET + (offset));
+		(void *)(VVAR_ADDRESS + (offset));
 
 #define DEFINE_VVAR(type, name)						\
-	type __vvar_ ## name						\
-	__attribute__((section(".vsyscall_var_" #name), aligned(16)))
+	type name							\
+	__attribute__((section(".vvar_" #name), aligned(16)))
 
 #define VVAR(name) (*vvaraddr_ ## name)
 
@@ -49,4 +48,3 @@ DECLARE_VVAR(16, int, vgetcpu_mode)
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
 
 #undef DECLARE_VVAR
-#undef VSYSCALL_VARS_OFFSET
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1432bd4..3c1ec1c 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -161,12 +161,6 @@ SECTIONS
 
 #define VVIRT_OFFSET (VSYSCALL_ADDR - __vsyscall_0)
 #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET)
-#define EMIT_VVAR(x, offset) .vsyscall_var_ ## x	\
-	ADDR(.vsyscall_0) + offset		 	\
-	: AT(VLOAD(.vsyscall_var_ ## x)) {     		\
-		*(.vsyscall_var_ ## x)			\
-	}						\
-	x = VVIRT(.vsyscall_var_ ## x);
 
 	. = ALIGN(4096);
 	__vsyscall_0 = .;
@@ -192,17 +186,28 @@ SECTIONS
 		*(.vsyscall_3)
 	}
 
-#define __VVAR_KERNEL_LDS
-#include <asm/vvar.h>
-#undef __VVAR_KERNEL_LDS
-
-	. = __vsyscall_0 + PAGE_SIZE;
+	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
 #undef VLOAD_OFFSET
 #undef VLOAD
 #undef VVIRT_OFFSET
 #undef VVIRT
+
+	__vvar_page = .;
+
+#define EMIT_VVAR(name, offset) .vvar_ ## name		\
+	(__vvar_page + offset) :			\
+	 AT(ADDR(.vvar_ ## name) - LOAD_OFFSET) {	\
+		*(.vvar_ ## x)				\
+	} :data
+
+#define __VVAR_KERNEL_LDS
+#include <asm/vvar.h>
+#undef __VVAR_KERNEL_LDS
+
+       . = ALIGN(__vvar_page + PAGE_SIZE, PAGE_SIZE);
+
 #undef EMIT_VVAR
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 5f6ad03..ee22180 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -284,9 +284,14 @@ void __init map_vsyscall(void)
 {
 	extern char __vsyscall_0;
 	unsigned long physaddr_page0 = __pa_symbol(&__vsyscall_0);
+	extern char __vvar_page;
+	unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);
 
 	/* Note that VSYSCALL_MAPPED_PAGES must agree with the code below. */
 	__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
+	__set_fixmap(VVAR_PAGE, physaddr_vvar_page, PAGE_KERNEL_VVAR);
+	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) !=
+		     (unsigned long)VVAR_ADDRESS);
 }
 
 static int __init vsyscall_init(void)
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 03/10] x86-64: Remove kernel.vsyscall64 sysctl
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 01/10] x86-64: Fix alignment of jiffies variable Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 02/10] x86-64: Give vvars their own page Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 04/10] x86-64: Replace vsyscall gettimeofday fallback with int 0xcc Andy Lutomirski
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

It's unnecessary overhead in code that's supposed to be highly
optimized.  Removing it allows us to remove one of the two syscall
instructions in the vsyscall page.

The only sensible use for it is for UML users, and it doesn't fully
address inconsistent vsyscall results on UML.  The real fix for UML
is to stop using vsyscalls entirely.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/vgtod.h   |    1 -
 arch/x86/kernel/vsyscall_64.c  |   34 +------------------------
 arch/x86/vdso/vclock_gettime.c |   55 +++++++++++++++------------------------
 3 files changed, 22 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 646b4c1..aa5add8 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -11,7 +11,6 @@ struct vsyscall_gtod_data {
 	time_t		wall_time_sec;
 	u32		wall_time_nsec;
 
-	int		sysctl_enabled;
 	struct timezone sys_tz;
 	struct { /* extract of a clocksource struct */
 		cycle_t (*vread)(void);
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index ee22180..3e8dac7 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -53,7 +53,6 @@ DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 {
 	.lock = SEQLOCK_UNLOCKED,
-	.sysctl_enabled = 1,
 };
 
 void update_vsyscall_tz(void)
@@ -103,15 +102,6 @@ static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
 	return ret;
 }
 
-static __always_inline long time_syscall(long *t)
-{
-	long secs;
-	asm volatile("syscall"
-		: "=a" (secs)
-		: "0" (__NR_time),"D" (t) : __syscall_clobber);
-	return secs;
-}
-
 static __always_inline void do_vgettimeofday(struct timeval * tv)
 {
 	cycle_t now, base, mask, cycle_delta;
@@ -122,8 +112,7 @@ static __always_inline void do_vgettimeofday(struct timeval * tv)
 		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
 
 		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled ||
-			     !vread)) {
+		if (unlikely(!vread)) {
 			gettimeofday(tv,NULL);
 			return;
 		}
@@ -165,8 +154,6 @@ time_t __vsyscall(1) vtime(time_t *t)
 {
 	unsigned seq;
 	time_t result;
-	if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled))
-		return time_syscall(t);
 
 	do {
 		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
@@ -227,22 +214,6 @@ static long __vsyscall(3) venosys_1(void)
 	return -ENOSYS;
 }
 
-#ifdef CONFIG_SYSCTL
-static ctl_table kernel_table2[] = {
-	{ .procname = "vsyscall64",
-	  .data = &vsyscall_gtod_data.sysctl_enabled, .maxlen = sizeof(int),
-	  .mode = 0644,
-	  .proc_handler = proc_dointvec },
-	{}
-};
-
-static ctl_table kernel_root_table2[] = {
-	{ .procname = "kernel", .mode = 0555,
-	  .child = kernel_table2 },
-	{}
-};
-#endif
-
 /* Assume __initcall executes before all user space. Hopefully kmod
    doesn't violate that. We'll find out if it does. */
 static void __cpuinit vsyscall_set_cpu(int cpu)
@@ -301,9 +272,6 @@ static int __init vsyscall_init(void)
 	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
 	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
 	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
-#ifdef CONFIG_SYSCTL
-	register_sysctl_table(kernel_root_table2);
-#endif
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index a724905..cf54813 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -116,21 +116,21 @@ notrace static noinline int do_monotonic_coarse(struct timespec *ts)
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
-	if (likely(gtod->sysctl_enabled))
-		switch (clock) {
-		case CLOCK_REALTIME:
-			if (likely(gtod->clock.vread))
-				return do_realtime(ts);
-			break;
-		case CLOCK_MONOTONIC:
-			if (likely(gtod->clock.vread))
-				return do_monotonic(ts);
-			break;
-		case CLOCK_REALTIME_COARSE:
-			return do_realtime_coarse(ts);
-		case CLOCK_MONOTONIC_COARSE:
-			return do_monotonic_coarse(ts);
-		}
+	switch (clock) {
+	case CLOCK_REALTIME:
+		if (likely(gtod->clock.vread))
+			return do_realtime(ts);
+		break;
+	case CLOCK_MONOTONIC:
+		if (likely(gtod->clock.vread))
+			return do_monotonic(ts);
+		break;
+	case CLOCK_REALTIME_COARSE:
+		return do_realtime_coarse(ts);
+	case CLOCK_MONOTONIC_COARSE:
+		return do_monotonic_coarse(ts);
+	}
+
 	return vdso_fallback_gettime(clock, ts);
 }
 int clock_gettime(clockid_t, struct timespec *)
@@ -139,7 +139,7 @@ int clock_gettime(clockid_t, struct timespec *)
 notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	long ret;
-	if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
+	if (likely(gtod->clock.vread)) {
 		if (likely(tv != NULL)) {
 			BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
 				     offsetof(struct timespec, tv_nsec) ||
@@ -161,27 +161,14 @@ notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 int gettimeofday(struct timeval *, struct timezone *)
 	__attribute__((weak, alias("__vdso_gettimeofday")));
 
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-
-static __always_inline long time_syscall(long *t)
-{
-	long secs;
-	asm volatile("syscall"
-		     : "=a" (secs)
-		     : "0" (__NR_time), "D" (t) : "cc", "r11", "cx", "memory");
-	return secs;
-}
-
+/*
+ * This will break when the xtime seconds get inaccurate, but that is
+ * unlikely
+ */
 notrace time_t __vdso_time(time_t *t)
 {
-	time_t result;
-
-	if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled))
-		return time_syscall(t);
-
 	/* This is atomic on x86_64 so we don't need any locks. */
-	result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
+	time_t result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
 
 	if (t)
 		*t = result;
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 04/10] x86-64: Replace vsyscall gettimeofday fallback with int 0xcc
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (2 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 03/10] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 05/10] x86-64: Map the HPET NX Andy Lutomirski
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

Now the only way to issue a syscall with side effects through the
vsyscall page is to call a misaligned instruction.  I haven't
checked for that.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/irq_vectors.h |    6 ++-
 arch/x86/include/asm/traps.h       |    4 ++
 arch/x86/include/asm/vsyscall.h    |    6 +++
 arch/x86/kernel/entry_64.S         |    2 +
 arch/x86/kernel/traps.c            |    4 ++
 arch/x86/kernel/vsyscall_64.c      |   83 +++++++++++++++++++++++++++++++++---
 6 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6e976ee..a563c50 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -17,7 +17,8 @@
  *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
  *  Vectors  32 ... 127 : device interrupts
  *  Vector  128         : legacy int80 syscall interface
- *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 : device interrupts
+ *  Vector  204         : legacy x86_64 vsyscall emulation
+ *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 except 204 : device interrupts
  *  Vectors INVALIDATE_TLB_VECTOR_START ... 255 : special interrupts
  *
  * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
@@ -50,6 +51,9 @@
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR			0x80
 #endif
+#ifdef CONFIG_X86_64
+# define VSYSCALL_EMU_VECTOR		0xcc
+#endif
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..2bae0a5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/kprobes.h>
+
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -38,6 +40,7 @@ asmlinkage void alignment_check(void);
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
+asmlinkage void emulate_vsyscall(void);
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
@@ -64,6 +67,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *, long);
 dotraplinkage void do_machine_check(struct pt_regs *, long);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *, long);
+dotraplinkage void do_emulate_vsyscall(struct pt_regs *, long);
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *, long);
 #endif
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d555973..293ae08 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -31,6 +31,12 @@ extern struct timezone sys_tz;
 
 extern void map_vsyscall(void);
 
+/* Emulation */
+static inline bool in_vsyscall_page(unsigned long addr)
+{
+	return (addr & ~(PAGE_SIZE - 1)) == VSYSCALL_START;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8a445a0..bee7e81 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1121,6 +1121,8 @@ zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
 zeroentry coprocessor_error do_coprocessor_error
 errorentry alignment_check do_alignment_check
 zeroentry simd_coprocessor_error do_simd_coprocessor_error
+zeroentry emulate_vsyscall do_emulate_vsyscall
+
 
 	/* Reload gs selector with exception handling */
 	/* edi:  new selector */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..72f0f6a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -872,6 +872,10 @@ void __init trap_init(void)
 	set_bit(SYSCALL_VECTOR, used_vectors);
 #endif
 
+	BUG_ON(test_bit(VSYSCALL_EMU_VECTOR, used_vectors));
+	set_system_intr_gate(VSYSCALL_EMU_VECTOR, &emulate_vsyscall);
+	set_bit(VSYSCALL_EMU_VECTOR, used_vectors);
+
 	/*
 	 * Should be a barrier for any external CPU state:
 	 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3e8dac7..53d2237 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -32,6 +32,8 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -44,10 +46,10 @@
 #include <asm/desc.h>
 #include <asm/topology.h>
 #include <asm/vgtod.h>
+#include <asm/traps.h>
 
 #define __vsyscall(nr) \
 		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-#define __syscall_clobber "r11","cx","memory"
 
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
@@ -84,6 +86,26 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
+static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
+			      const char *message)
+{
+	struct task_struct *tsk;
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	if (!show_unhandled_signals || !__ratelimit(&rs))
+		return;
+
+	tsk = current;
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
+	       is_warning ? KERN_WARNING : KERN_INFO,
+	       tsk->comm, task_pid_nr(tsk),
+	       message,
+	       regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
+	if (!in_vsyscall_page(regs->ip - 2))
+		print_vma_addr(" in ", regs->ip - 2);
+	printk("\n");
+}
+
 /* RED-PEN may want to readd seq locking, but then the variable should be
  * write-once.
  */
@@ -92,13 +114,14 @@ static __always_inline void do_get_tz(struct timezone * tz)
 	*tz = VVAR(vsyscall_gtod_data).sys_tz;
 }
 
-static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
+static __always_inline int fallback_gettimeofday(struct timeval *tv)
 {
 	int ret;
-	asm volatile("syscall"
-		: "=a" (ret)
-		: "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
-		: __syscall_clobber );
+	/* Invoke do_emulate_vsyscall. */
+	asm volatile("movb $0xce, %%al;\n\t"
+		     "int %[vec]"
+		     : "=a" (ret)
+		     : "D" (tv), [vec] "i" (VSYSCALL_EMU_VECTOR));
 	return ret;
 }
 
@@ -113,7 +136,7 @@ static __always_inline void do_vgettimeofday(struct timeval * tv)
 
 		vread = VVAR(vsyscall_gtod_data).clock.vread;
 		if (unlikely(!vread)) {
-			gettimeofday(tv,NULL);
+			fallback_gettimeofday(tv);
 			return;
 		}
 
@@ -214,6 +237,52 @@ static long __vsyscall(3) venosys_1(void)
 	return -ENOSYS;
 }
 
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
+{
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	if ((regs->ax & 0xFF) != 0xce) {
+		warn_bad_vsyscall(regs, false, "illegal int 0xcc "
+				  "(exploit attempt?)");
+		force_sig(SIGSEGV, current);
+		goto out;
+	}
+
+	if (!in_vsyscall_page(regs->ip)) {
+		/*
+		 * We allow the call because tools like ThreadSpotter
+		 * might copy the int 0xcc instruction to user memory.
+		 * We make it annoying, though, to try to persuade
+		 * the authors to stop doing that...
+		 */
+		warn_bad_vsyscall(regs, true, "int 0xcc in user code (exploit"
+				  " attempt? legacy instrumented code?)");
+	}
+
+	if (current->seccomp.mode) {
+		do_exit(SIGKILL);
+		goto out;
+	}
+
+	ret = sys_gettimeofday((struct timeval __user *)regs->di, NULL);
+	if (ret == -EFAULT) {
+		warn_bad_vsyscall(regs, true, "int 0xcc faulted (exploit "
+				  "attempt?)");
+		force_sig(SIGSEGV, current);
+		goto out;
+	}
+
+	regs->ax = ret;
+
+out:
+	local_irq_disable();
+}
+
 /* Assume __initcall executes before all user space. Hopefully kmod
    doesn't violate that. We'll find out if it does. */
 static void __cpuinit vsyscall_set_cpu(int cpu)
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 05/10] x86-64: Map the HPET NX
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (3 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 04/10] x86-64: Replace vsyscall gettimeofday fallback with int 0xcc Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 06/10] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

Currently the HPET mapping is a user-accessible syscall instruction
at a fixed address some of the time.  A sufficiently determined
hacker might be able to guess when.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/pgtable_types.h |    4 ++--
 arch/x86/kernel/hpet.c               |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 6a29aed6..013286a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -107,8 +107,8 @@
 #define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
-#define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
+#define __PAGE_KERNEL_VVAR_NOCACHE	(__PAGE_KERNEL_VVAR | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -130,8 +130,8 @@
 #define PAGE_KERNEL_LARGE_NOCACHE	__pgprot(__PAGE_KERNEL_LARGE_NOCACHE)
 #define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
 #define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
 #define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
+#define PAGE_KERNEL_VVAR_NOCACHE	__pgprot(__PAGE_KERNEL_VVAR_NOCACHE)
 
 #define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index bfe8f72..bf71830 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -71,7 +71,7 @@ static inline void hpet_set_mapping(void)
 {
 	hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
 #ifdef CONFIG_X86_64
-	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VVAR_NOCACHE);
 #endif
 }
 
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 06/10] x86-64: Remove vsyscall number 3 (venosys)
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (4 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 05/10] x86-64: Map the HPET NX Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 07/10] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

It just segfaults since April 2008 (a4928cff), so I'm pretty sure
that nothing uses it.  And having an empty section makes the linker
script a bit fragile.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/kernel/vmlinux.lds.S |    4 ----
 arch/x86/kernel/vsyscall_64.c |    5 -----
 2 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3c1ec1c..1583238 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -182,10 +182,6 @@ SECTIONS
 		*(.vsyscall_2)
 	}
 
-	.vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) {
-		*(.vsyscall_3)
-	}
-
 	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 53d2237..71fa506 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -232,11 +232,6 @@ vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
 	return 0;
 }
 
-static long __vsyscall(3) venosys_1(void)
-{
-	return -ENOSYS;
-}
-
 void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
 	long ret;
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 07/10] x86-64: Fill unused parts of the vsyscall page with 0xcc
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (5 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 06/10] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 08/10] x86-64: Emulate vsyscalls Andy Lutomirski
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

Jumping to 0x00 might do something depending on the following bytes.
Jumping to 0xcc is a trap.  So fill the unused parts of the vsyscall
page with 0xcc to make it useless for exploits to jump there.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/kernel/vmlinux.lds.S |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1583238..8d6a0b7 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -166,22 +166,20 @@ SECTIONS
 	__vsyscall_0 = .;
 
 	. = VSYSCALL_ADDR;
-	.vsyscall_0 : AT(VLOAD(.vsyscall_0)) {
+	.vsyscall : AT(VLOAD(.vsyscall)) {
 		*(.vsyscall_0)
-	} :user
 
-	. = ALIGN(L1_CACHE_BYTES);
-	.vsyscall_fn : AT(VLOAD(.vsyscall_fn)) {
+		. = ALIGN(L1_CACHE_BYTES);
 		*(.vsyscall_fn)
-	}
 
-	.vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) {
+		. = 1024;
 		*(.vsyscall_1)
-	}
-	.vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) {
+
+		. = 2048;
 		*(.vsyscall_2)
-	}
 
+		. = 4096;  /* Pad the whole page. */
+	} :user =0xcc
 	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (6 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 07/10] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  7:35   ` Borislav Petkov
                     ` (2 more replies)
  2011-05-30  3:48 ` [PATCH v2 09/10] x86-64: Randomize int 0xcc magic al values at boot Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 10/10] x86-64: Document some of entry_64.S Andy Lutomirski
  9 siblings, 3 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

There's a fair amount of code in the vsyscall page, and who knows
what will happen if an exploit jumps into the middle of it.  Reduce
the risk by replacing most of it with short magic incantations that
are useless if entered in the middle.  This change can be disabled
by CONFIG_UNSAFE_VSYSCALLS (default y).

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.

Some care is taken to make sure that tools like valgrind and
ThreadSpotter still work.

This patch is not perfect: the vread_tsc and vread_hpet functions
are still at a fixed address.  Fixing that might involve making
alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/Kconfig                  |   17 +++++
 arch/x86/kernel/Makefile          |    3 +
 arch/x86/kernel/vsyscall_64.c     |  121 +++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/vsyscall_emu_64.S |   40 ++++++++++++
 4 files changed, 176 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/vsyscall_emu_64.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc6c53a..186018b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1650,6 +1650,23 @@ config COMPAT_VDSO
 
 	  If unsure, say Y.
 
+config UNSAFE_VSYSCALLS
+	def_bool y
+	prompt "Unsafe fast legacy vsyscalls"
+	depends on X86_64
+	---help---
+	  Legacy user code expects to be able to issue three syscalls
+	  by calling fixed addresses in kernel space.  If you say N,
+	  then the kernel traps and emulates these calls.  If you say
+	  Y, then there is actual executable code at a fixed address
+	  to implement these calls efficiently.
+
+	  On a system with recent enough glibc (probably 2.14 or
+	  newer) and no static binaries, you can say N without a
+	  performance penalty to improve security
+
+	  If unsure, say Y.
+
 config CMDLINE_BOOL
 	bool "Built-in kernel command line"
 	---help---
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a24521b..b901781 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -42,6 +42,9 @@ obj-$(CONFIG_X86_32)	+= probe_roms_32.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
+ifndef CONFIG_UNSAFE_VSYSCALLS
+	obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
+endif
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 71fa506..5b3d62a 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -48,9 +48,6 @@
 #include <asm/vgtod.h>
 #include <asm/traps.h>
 
-#define __vsyscall(nr) \
-		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 {
@@ -96,6 +93,7 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
 		return;
 
 	tsk = current;
+
 	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
 	       is_warning ? KERN_WARNING : KERN_INFO,
 	       tsk->comm, task_pid_nr(tsk),
@@ -106,6 +104,12 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
 	printk("\n");
 }
 
+#ifdef CONFIG_UNSAFE_VSYSCALLS
+
+#define __vsyscall(nr)							\
+	__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
+
+
 /* RED-PEN may want to readd seq locking, but then the variable should be
  * write-once.
  */
@@ -117,8 +121,11 @@ static __always_inline void do_get_tz(struct timezone * tz)
 static __always_inline int fallback_gettimeofday(struct timeval *tv)
 {
 	int ret;
-	/* Invoke do_emulate_vsyscall. */
-	asm volatile("movb $0xce, %%al;\n\t"
+	/*
+	 * Invoke do_emulate_vsyscall.  Intentionally incompatible with
+	 * the CONFIG_UNSAFE_VSYSCALLS=n case.
+	 */
+	asm volatile("mov $0xce, %%al;\n\t"
 		     "int %[vec]"
 		     : "=a" (ret)
 		     : "D" (tv), [vec] "i" (VSYSCALL_EMU_VECTOR));
@@ -237,6 +244,8 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 	long ret;
 
 	/* Kernel code must never get here. */
+	if (!user_mode(regs))
+		early_printk("oh crap!\n");
 	BUG_ON(!user_mode(regs));
 
 	local_irq_enable();
@@ -278,6 +287,106 @@ out:
 	local_irq_disable();
 }
 
+#else /* CONFIG_UNSAFE_VSYSCALLS=n below */
+
+static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
+{
+	return VSYSCALL_START + 1024*vsyscall_nr + 2;
+}
+
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
+{
+	u8 vsyscall_nr, al;
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	/*
+	 * Decode the vsyscall number.
+	 * 0xcc -> 0, 0xce -> 1, 0xf0 -> 2; see vsyscall_emu_64.S for why.
+	 */
+	al = regs->ax & 0xff;
+	vsyscall_nr = (al - 0xcc) >> 1;
+	if (vsyscall_nr > 2 || al != (vsyscall_nr << 1) + 0xcc) {
+		warn_bad_vsyscall(regs, false, "illegal int 0xcc "
+				  "(exploit attempt?)");
+		force_sig(SIGSEGV, current);
+		goto out;
+	}
+
+	if (regs->ip - 2 != vsyscall_intcc_addr(vsyscall_nr)) {
+		if (in_vsyscall_page(regs->ip - 2)) {
+			/* This should not be possible. */
+			warn_bad_vsyscall(regs, true, "int 0xcc severe badness"
+					  " (exploit attempt?)");
+			force_sig(SIGSEGV, current);
+			goto out;
+		} else {
+			/*
+			 * We allow the call because tools like ThreadSpotter
+			 * might copy the int 0xcc instruction to user memory.
+			 * We make it annoying, though, to try to persuade
+			 * the authors to stop doing that...
+			 */
+			warn_bad_vsyscall(regs, true, "int 0xcc in user code "
+					  "(exploit attempt? legacy "
+					  "instrumented code?)");
+		}
+	}
+
+	if (current->seccomp.mode) {
+		do_exit(SIGKILL);
+		goto out;
+	}
+
+	switch(vsyscall_nr)
+	{
+	case 0:
+		ret = sys_gettimeofday(
+			(struct timeval __user *)regs->di,
+			(struct timezone __user *)regs->si);
+		break;
+
+	case 1:
+		ret = sys_time((time_t __user *)regs->di);
+		break;
+
+	case 2:
+		ret = sys_getcpu((unsigned __user *)regs->di,
+				 (unsigned __user *)regs->si,
+				 0);
+		break;
+
+	default:
+		unreachable();
+	}
+
+	if (ret == -EFAULT) {
+		/*
+		 * Bad news -- userspace fed a bad pointer to a vsyscall.
+		 *
+		 * With a real vsyscall, that would have caused SIGSEGV.
+		 * To make writing reliable exploits using the emulated
+		 * vsyscalls harder, generate SIGSEGV here as well.
+		 */
+		warn_bad_vsyscall(regs, false,
+				  "vsyscall fault (exploit attempt?)");
+		force_sig(SIGSEGV, current);
+		goto out;
+	}
+
+	regs->ax = ret;
+
+out:
+	local_irq_disable();
+	return;
+}
+
+#endif /* CONFIG_UNSAFE_VSYSCALLS */
+
 /* Assume __initcall executes before all user space. Hopefully kmod
    doesn't violate that. We'll find out if it does. */
 static void __cpuinit vsyscall_set_cpu(int cpu)
@@ -331,11 +440,13 @@ void __init map_vsyscall(void)
 
 static int __init vsyscall_init(void)
 {
+#ifdef CONFIG_UNSAFE_VSYSCALLS
 	BUG_ON(((unsigned long) &vgettimeofday !=
 			VSYSCALL_ADDR(__NR_vgettimeofday)));
 	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
 	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
 	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
+#endif
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
new file mode 100644
index 0000000..3e1cad2
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -0,0 +1,40 @@
+/*
+ * vsyscall_emu_64.S: Vsyscall emulation page
+ * Copyright (c) 2011 Andy Lutomirski
+ * Subject to the GNU General Public License, version 2
+*/
+
+#include <linux/linkage.h>
+#include <asm/irq_vectors.h>
+
+/*
+ * These magic incantations are chosen so that they fault if entered anywhere
+ * other than an instruction boundary.  The movb instruction is two bytes, and
+ * the int imm8 instruction is also two bytes, so the only misaligned places
+ * to enter are the immediate values for the two instructions.  0xcc is int3
+ * (always faults), 0xce is into (faults on x64-64, and 32-bit code can't get
+ * here), and 0xf0 is lock (lock int is invalid).
+ *
+ * The unused parts of the page are filled with 0xcc by the linker script.
+ */
+
+.section .vsyscall_0, "a"
+ENTRY(vsyscall_0)
+	movb $0xcc, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_0)
+
+.section .vsyscall_1, "a"
+ENTRY(vsyscall_1)
+	movb $0xce, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_1)
+
+.section .vsyscall_2, "a"
+ENTRY(vsyscall_2)
+	movb $0xf0, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_2)
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 09/10] x86-64: Randomize int 0xcc magic al values at boot
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (7 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 08/10] x86-64: Emulate vsyscalls Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  3:48 ` [PATCH v2 10/10] x86-64: Document some of entry_64.S Andy Lutomirski
  9 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

This is not a security feature.  It's to prevent the int 0xcc
sequence from becoming ABI.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/kernel/vsyscall_64.c |   65 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 5b3d62a..f53f490 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -34,6 +34,8 @@
 #include <linux/notifier.h>
 #include <linux/syscalls.h>
 #include <linux/ratelimit.h>
+#include <linux/random.h>
+#include <linux/highmem.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -54,6 +56,19 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 	.lock = SEQLOCK_UNLOCKED,
 };
 
+static u8 vsyscall_nr_offset;  /* Cyclic permutation of al. */
+
+static inline u8 mangle_al(u8 al)
+{
+	/* Permute among 0xcc, 0xce, and 0xf0. */
+	return (al - 0xcc + 2*vsyscall_nr_offset) % 6 + 0xcc;
+}
+
+static inline u8 demangle_vsyscall_nr(u8 nr)
+{
+	return (nr + 3 - vsyscall_nr_offset) % 3;
+}
+
 void update_vsyscall_tz(void)
 {
 	unsigned long flags;
@@ -94,11 +109,12 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
 
 	tsk = current;
 
-	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx offset:%d si:%lx di:%lx",
 	       is_warning ? KERN_WARNING : KERN_INFO,
 	       tsk->comm, task_pid_nr(tsk),
 	       message,
-	       regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
+	       regs->ip - 2, regs->sp, regs->ax, (int)vsyscall_nr_offset,
+	       regs->si, regs->di);
 	if (!in_vsyscall_page(regs->ip - 2))
 		print_vma_addr(" in ", regs->ip - 2);
 	printk("\n");
@@ -125,7 +141,8 @@ static __always_inline int fallback_gettimeofday(struct timeval *tv)
 	 * Invoke do_emulate_vsyscall.  Intentionally incompatible with
 	 * the CONFIG_UNSAFE_VSYSCALLS=n case.
 	 */
-	asm volatile("mov $0xce, %%al;\n\t"
+	asm volatile("fallback_gtod_movb:\n\t"
+		     "movb $0xce, %%al;\n\t"
 		     "int %[vec]"
 		     : "=a" (ret)
 		     : "D" (tv), [vec] "i" (VSYSCALL_EMU_VECTOR));
@@ -250,7 +267,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 
 	local_irq_enable();
 
-	if ((regs->ax & 0xFF) != 0xce) {
+	if ((regs->ax & 0xFF) != mangle_al(0xce)) {
 		warn_bad_vsyscall(regs, false, "illegal int 0xcc "
 				  "(exploit attempt?)");
 		force_sig(SIGSEGV, current);
@@ -316,6 +333,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 		force_sig(SIGSEGV, current);
 		goto out;
 	}
+	vsyscall_nr = demangle_vsyscall_nr(vsyscall_nr);
 
 	if (regs->ip - 2 != vsyscall_intcc_addr(vsyscall_nr)) {
 		if (in_vsyscall_page(regs->ip - 2)) {
@@ -438,15 +456,54 @@ void __init map_vsyscall(void)
 		     (unsigned long)VVAR_ADDRESS);
 }
 
+static void __init mangle_vsyscall_movb(void *mapping,
+					unsigned long movb_addr, u8 initial)
+{
+	u8 *imm8;
+	BUG_ON(movb_addr >= 4095);
+
+	imm8 = (char*)(mapping) + movb_addr + 1;
+
+	BUG_ON(*imm8 != initial);
+	*imm8 = mangle_al(*imm8);
+}
+
 static int __init vsyscall_init(void)
 {
+	struct page *vsyscall_page;
+	extern char __vsyscall_0;
+	void *mapping;
+
 #ifdef CONFIG_UNSAFE_VSYSCALLS
+	extern char fallback_gtod_movb;
+
 	BUG_ON(((unsigned long) &vgettimeofday !=
 			VSYSCALL_ADDR(__NR_vgettimeofday)));
 	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
 	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
 	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
 #endif
+
+	/*
+	 * Randomize the magic al values for int 0xcc invocation.  This
+	 * isn't really a security feature; it's to make sure that
+	 * dynamic binary instrumentation tools don't start to think
+	 * that the int 0xcc magic incantation is ABI.
+	 */
+	vsyscall_nr_offset = get_random_int() % 3;
+	vsyscall_page = pfn_to_page(__pa_symbol(&__vsyscall_0) >> PAGE_SHIFT);
+	mapping = kmap_atomic(vsyscall_page);
+#ifdef CONFIG_UNSAFE_VSYSCALLS
+	mangle_vsyscall_movb(mapping,
+			     &fallback_gtod_movb - (char*)&vgettimeofday, 0xce);
+#else
+	/* It's easier to hardcode the addresses -- they're ABI. */
+	mangle_vsyscall_movb(mapping, 0, 0xcc);
+	mangle_vsyscall_movb(mapping, 1024, 0xce);
+	mangle_vsyscall_movb(mapping, 2048, 0xf0);
+#endif
+	kunmap_atomic(mapping);
+
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 10/10] x86-64: Document some of entry_64.S
  2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (8 preceding siblings ...)
  2011-05-30  3:48 ` [PATCH v2 09/10] x86-64: Randomize int 0xcc magic al values at boot Andy Lutomirski
@ 2011-05-30  3:48 ` Andy Lutomirski
  2011-05-30  7:59   ` Borislav Petkov
  9 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2011-05-30  3:48 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 Documentation/x86/entry_64.txt |   95 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/entry_64.S     |    2 +
 2 files changed, 97 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/entry_64.txt

diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
new file mode 100644
index 0000000..081c405
--- /dev/null
+++ b/Documentation/x86/entry_64.txt
@@ -0,0 +1,95 @@
+This file documents some of the kernel entries in
+arch/x86/kernel/entry_64.S.  A lot of this explanation is adapted from
+an email from Ingo Molnar:
+
+http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
+
+The x86 architecture has quite a few different ways to jump into
+kernel code.  Most of these entry points are registered in
+arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S.
+
+The IDT vector assignments are listed in arch/x86/include/irq_vectors.h.
+
+Some of these entries are:
+
+ - system_call: syscall instruction from 64-bit code.
+
+ - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+   either way.
+
+ - ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit code
+
+ - interrupt: An array of entries.  Every IDT vector that doesn't
+   explicitly point somewhere else gets set to the corresponding value
+   in interrupts.  These point to a whole array of magically-generated
+   functions that make their way to do_IRQ with the interrupt number
+   as a parameter.
+
+ - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
+   vsyscall emulation.
+
+ - APIC interrupts: Various special-purpose interrupts for things like
+   TLB shootdown.
+
+ - Architecturally-defined exceptions like divide_error.
+
+There are a few complexities here.  The different x86-64 entries have
+different calling conventions.  The syscall and sysenter instructions
+have their own peculiar calling conventions.  Some of the IDT entries
+push an error code onto the stack; others don't.  IDT entries using
+the IST alternative stack mechanism need their own magic to get the
+stack frames right.  (You can find some documentation in the Intel
+SDM, Volume 3, Chapter 6.)
+
+Dealing with the swapgs instruction is especially tricky.  Swapgs
+toggles whether gs is the kernel gs or the user gs.  The swapgs
+instruction is rather fragile: it must nest perfectly and only in
+single depth, it should only be used if entering from user mode to
+kernel mode and then when returning to user-space, and precisely
+so. If we mess that up even slightly, we crash.
+
+So when we have a secondary entry, already in kernel mode, we *must
+not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
+not switched/swapped yet.
+
+Now, there's a secondary complication: there's a cheap way to test
+which mode the CPU is in and an expensive way.
+
+The cheap way is to pick this info off the entry frame on the kernel
+stack, from the CS of the ptregs area of the kernel stack:
+
+	xorl %ebx,%ebx
+	testl $3,CS+8(%rsp)
+	je error_kernelspace
+	SWAPGS
+
+The expensive (paranoid) way is to read back the MSR_GS_BASE value
+(which is what SWAPGS modifies):
+
+	movl $1,%ebx
+	movl $MSR_GS_BASE,%ecx
+	rdmsr
+	testl %edx,%edx
+	js 1f   /* negative -> in kernel */
+	SWAPGS
+	xorl %ebx,%ebx
+1:	ret
+
+and the whole paranoid non-paranoid macro complexity is about whether
+to suffer that RDMSR cost.
+
+If we are at an interrupt or user-trap/gate-alike boundary then we can
+use the faster check: the stack will be a reliable indicator of
+whether SWAPGS was already done: if we see that we are a secondary
+entry interrupting kernel mode execution, then we know that the GS
+base has already been switched. If it says that we interrupted
+user-space execution then we must do the SWAPGS.
+
+But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
+which might have triggered right after a normal entry wrote CS to the
+stack but before we executed SWAPGS, then the only safe way to check
+for GS is the slower method: the RDMSR.
+
+So we try only to mark those entry methods 'paranoid' that absolutely
+need the more expensive check for the GS base - and we generate all
+'normal' entry points with the regular (faster) entry macros.
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index bee7e81..e949793 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -9,6 +9,8 @@
 /*
  * entry.S contains the system-call and fault low-level handling routines.
  *
+ * Some of this is documented in Documentation/x86/entry_64.txt
+ *
  * NOTE: This code handles signal-recognition, which happens every time
  * after an interrupt and after each system call.
  *
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  3:48 ` [PATCH v2 08/10] x86-64: Emulate vsyscalls Andy Lutomirski
@ 2011-05-30  7:35   ` Borislav Petkov
  2011-05-30 10:43     ` Andrew Lutomirski
  2011-05-30  7:46   ` Ingo Molnar
  2011-05-30  7:51   ` Jan Beulich
  2 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2011-05-30  7:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Sun, May 29, 2011 at 11:48:45PM -0400, Andy Lutomirski wrote:
> There's a fair amount of code in the vsyscall page, and who knows
> what will happen if an exploit jumps into the middle of it.  Reduce
> the risk by replacing most of it with short magic incantations that
> are useless if entered in the middle.  This change can be disabled
> by CONFIG_UNSAFE_VSYSCALLS (default y).
> 
> This causes vsyscalls to be a little more expensive than real
> syscalls.  Fortunately sensible programs don't use them.
> 
> Some care is taken to make sure that tools like valgrind and
> ThreadSpotter still work.
> 
> This patch is not perfect: the vread_tsc and vread_hpet functions
> are still at a fixed address.  Fixing that might involve making
> alternative patching work in the vDSO.
> 
> Signed-off-by: Andy Lutomirski <luto@mit.edu>
> ---
>  arch/x86/Kconfig                  |   17 +++++
>  arch/x86/kernel/Makefile          |    3 +
>  arch/x86/kernel/vsyscall_64.c     |  121 +++++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/vsyscall_emu_64.S |   40 ++++++++++++
>  4 files changed, 176 insertions(+), 5 deletions(-)
>  create mode 100644 arch/x86/kernel/vsyscall_emu_64.S
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cc6c53a..186018b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1650,6 +1650,23 @@ config COMPAT_VDSO
>  
>  	  If unsure, say Y.
>  
> +config UNSAFE_VSYSCALLS
> +	def_bool y
> +	prompt "Unsafe fast legacy vsyscalls"
> +	depends on X86_64
> +	---help---
> +	  Legacy user code expects to be able to issue three syscalls
> +	  by calling fixed addresses in kernel space.  If you say N,
> +	  then the kernel traps and emulates these calls.  If you say
> +	  Y, then there is actual executable code at a fixed address
> +	  to implement these calls efficiently.
> +
> +	  On a system with recent enough glibc (probably 2.14 or
> +	  newer) and no static binaries, you can say N without a
> +	  performance penalty to improve security
> +
> +	  If unsure, say Y.
> +
>  config CMDLINE_BOOL
>  	bool "Built-in kernel command line"
>  	---help---
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index a24521b..b901781 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -42,6 +42,9 @@ obj-$(CONFIG_X86_32)	+= probe_roms_32.o
>  obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
>  obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
>  obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
> +ifndef CONFIG_UNSAFE_VSYSCALLS
> +	obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
> +endif
>  obj-y			+= bootflag.o e820.o
>  obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
>  obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
> index 71fa506..5b3d62a 100644
> --- a/arch/x86/kernel/vsyscall_64.c
> +++ b/arch/x86/kernel/vsyscall_64.c
> @@ -48,9 +48,6 @@
>  #include <asm/vgtod.h>
>  #include <asm/traps.h>
>  
> -#define __vsyscall(nr) \
> -		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
> -
>  DEFINE_VVAR(int, vgetcpu_mode);
>  DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
>  {
> @@ -96,6 +93,7 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
>  		return;
>  
>  	tsk = current;
> +
>  	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
>  	       is_warning ? KERN_WARNING : KERN_INFO,
>  	       tsk->comm, task_pid_nr(tsk),
> @@ -106,6 +104,12 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
>  	printk("\n");
>  }
>  
> +#ifdef CONFIG_UNSAFE_VSYSCALLS
> +
> +#define __vsyscall(nr)							\
> +	__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
> +
> +
>  /* RED-PEN may want to readd seq locking, but then the variable should be
>   * write-once.
>   */
> @@ -117,8 +121,11 @@ static __always_inline void do_get_tz(struct timezone * tz)
>  static __always_inline int fallback_gettimeofday(struct timeval *tv)
>  {
>  	int ret;
> -	/* Invoke do_emulate_vsyscall. */
> -	asm volatile("movb $0xce, %%al;\n\t"
> +	/*
> +	 * Invoke do_emulate_vsyscall.  Intentionally incompatible with
> +	 * the CONFIG_UNSAFE_VSYSCALLS=n case.
> +	 */
> +	asm volatile("mov $0xce, %%al;\n\t"
>  		     "int %[vec]"
>  		     : "=a" (ret)
>  		     : "D" (tv), [vec] "i" (VSYSCALL_EMU_VECTOR));
> @@ -237,6 +244,8 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
>  	long ret;
>  
>  	/* Kernel code must never get here. */
> +	if (!user_mode(regs))
> +		early_printk("oh crap!\n");
>  	BUG_ON(!user_mode(regs));
>  
>  	local_irq_enable();
> @@ -278,6 +287,106 @@ out:
>  	local_irq_disable();
>  }
>  
> +#else /* CONFIG_UNSAFE_VSYSCALLS=n below */
> +
> +static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
> +{
> +	return VSYSCALL_START + 1024*vsyscall_nr + 2;
> +}
> +
> +void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
> +{
> +	u8 vsyscall_nr, al;
> +	long ret;
> +
> +	/* Kernel code must never get here. */
> +	BUG_ON(!user_mode(regs));
> +
> +	local_irq_enable();
> +
> +	/*
> +	 * Decode the vsyscall number.
> +	 * 0xcc -> 0, 0xce -> 1, 0xf0 -> 2; see vsyscall_emu_64.S for why.
> +	 */
> +	al = regs->ax & 0xff;
> +	vsyscall_nr = (al - 0xcc) >> 1;

Ok, but

	(0xf0 - 0xcc) >> 1 == 0x12

Don't you mean 0xd0 here? Although 0xd0 is opcode for all those
rotate/shift insns. What am I missing?

> +	if (vsyscall_nr > 2 || al != (vsyscall_nr << 1) + 0xcc) {
> +		warn_bad_vsyscall(regs, false, "illegal int 0xcc "
> +				  "(exploit attempt?)");
> +		force_sig(SIGSEGV, current);
> +		goto out;
> +	}
> +
> +	if (regs->ip - 2 != vsyscall_intcc_addr(vsyscall_nr)) {
> +		if (in_vsyscall_page(regs->ip - 2)) {
> +			/* This should not be possible. */
> +			warn_bad_vsyscall(regs, true, "int 0xcc severe badness"
> +					  " (exploit attempt?)");
> +			force_sig(SIGSEGV, current);
> +			goto out;
> +		} else {
> +			/*
> +			 * We allow the call because tools like ThreadSpotter
> +			 * might copy the int 0xcc instruction to user memory.
> +			 * We make it annoying, though, to try to persuade
> +			 * the authors to stop doing that...
> +			 */
> +			warn_bad_vsyscall(regs, true, "int 0xcc in user code "
> +					  "(exploit attempt? legacy "
> +					  "instrumented code?)");
> +		}
> +	}
> +
> +	if (current->seccomp.mode) {
> +		do_exit(SIGKILL);
> +		goto out;
> +	}
> +
> +	switch(vsyscall_nr)
> +	{
> +	case 0:
> +		ret = sys_gettimeofday(
> +			(struct timeval __user *)regs->di,
> +			(struct timezone __user *)regs->si);
> +		break;
> +
> +	case 1:
> +		ret = sys_time((time_t __user *)regs->di);
> +		break;
> +
> +	case 2:
> +		ret = sys_getcpu((unsigned __user *)regs->di,
> +				 (unsigned __user *)regs->si,
> +				 0);
> +		break;
> +
> +	default:
> +		unreachable();
> +	}
> +
> +	if (ret == -EFAULT) {
> +		/*
> +		 * Bad news -- userspace fed a bad pointer to a vsyscall.
> +		 *
> +		 * With a real vsyscall, that would have caused SIGSEGV.
> +		 * To make writing reliable exploits using the emulated
> +		 * vsyscalls harder, generate SIGSEGV here as well.
> +		 */
> +		warn_bad_vsyscall(regs, false,
> +				  "vsyscall fault (exploit attempt?)");
> +		force_sig(SIGSEGV, current);
> +		goto out;
> +	}
> +
> +	regs->ax = ret;
> +
> +out:
> +	local_irq_disable();
> +	return;
> +}
> +
> +#endif /* CONFIG_UNSAFE_VSYSCALLS */
> +
>  /* Assume __initcall executes before all user space. Hopefully kmod
>     doesn't violate that. We'll find out if it does. */
>  static void __cpuinit vsyscall_set_cpu(int cpu)
> @@ -331,11 +440,13 @@ void __init map_vsyscall(void)
>  
>  static int __init vsyscall_init(void)
>  {
> +#ifdef CONFIG_UNSAFE_VSYSCALLS
>  	BUG_ON(((unsigned long) &vgettimeofday !=
>  			VSYSCALL_ADDR(__NR_vgettimeofday)));
>  	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
>  	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
>  	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
> +#endif
>  	on_each_cpu(cpu_vsyscall_init, NULL, 1);
>  	/* notifier priority > KVM */
>  	hotcpu_notifier(cpu_vsyscall_notifier, 30);
> diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
> new file mode 100644
> index 0000000..3e1cad2
> --- /dev/null
> +++ b/arch/x86/kernel/vsyscall_emu_64.S
> @@ -0,0 +1,40 @@
> +/*
> + * vsyscall_emu_64.S: Vsyscall emulation page
> + * Copyright (c) 2011 Andy Lutomirski
> + * Subject to the GNU General Public License, version 2
> +*/
> +
> +#include <linux/linkage.h>
> +#include <asm/irq_vectors.h>
> +
> +/*
> + * These magic incantations are chosen so that they fault if entered anywhere
> + * other than an instruction boundary.  The movb instruction is two bytes, and
> + * the int imm8 instruction is also two bytes, so the only misaligned places
> + * to enter are the immediate values for the two instructions.  0xcc is int3
> + * (always faults), 0xce is into (faults on x64-64, and 32-bit code can't get
> + * here), and 0xf0 is lock (lock int is invalid).
> + *
> + * The unused parts of the page are filled with 0xcc by the linker script.
> + */
> +
> +.section .vsyscall_0, "a"
> +ENTRY(vsyscall_0)
> +	movb $0xcc, %al
> +	int $VSYSCALL_EMU_VECTOR
> +	ret
> +END(vsyscall_0)
> +
> +.section .vsyscall_1, "a"
> +ENTRY(vsyscall_1)
> +	movb $0xce, %al
> +	int $VSYSCALL_EMU_VECTOR
> +	ret
> +END(vsyscall_1)
> +
> +.section .vsyscall_2, "a"
> +ENTRY(vsyscall_2)
> +	movb $0xf0, %al
> +	int $VSYSCALL_EMU_VECTOR
> +	ret
> +END(vsyscall_2)
> -- 
> 1.7.5.1
> 
> 

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  3:48 ` [PATCH v2 08/10] x86-64: Emulate vsyscalls Andy Lutomirski
  2011-05-30  7:35   ` Borislav Petkov
@ 2011-05-30  7:46   ` Ingo Molnar
  2011-05-30 10:57     ` Andrew Lutomirski
  2011-05-30  7:51   ` Jan Beulich
  2 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2011-05-30  7:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson


* Andy Lutomirski <luto@MIT.EDU> wrote:

> There's a fair amount of code in the vsyscall page, and who knows
> what will happen if an exploit jumps into the middle of it.  Reduce
> the risk by replacing most of it with short magic incantations that
> are useless if entered in the middle.  This change can be disabled
> by CONFIG_UNSAFE_VSYSCALLS (default y).

btw., please flip the default or consider removing the option 
altogether.

We want to improve security and we want safe vsyscalls the default, 
and it's no good if we make it too easy for users to keep the fire 
door open all the time! :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  3:48 ` [PATCH v2 08/10] x86-64: Emulate vsyscalls Andy Lutomirski
  2011-05-30  7:35   ` Borislav Petkov
  2011-05-30  7:46   ` Ingo Molnar
@ 2011-05-30  7:51   ` Jan Beulich
  2011-05-30  8:07     ` Ingo Molnar
  2011-05-31  2:29     ` Andrew Lutomirski
  2 siblings, 2 replies; 26+ messages in thread
From: Jan Beulich @ 2011-05-30  7:51 UTC (permalink / raw)
  To: Ingo Molnar, Andy Lutomirski
  Cc: Borislav Petkov, Jesper Juhl, richard -rw- weinberger,
	Arjan van de Ven, Mikael Pettersson, x86, Thomas Gleixner,
	Andrew Morton, Linus Torvalds, linux-kernel

>>> On 30.05.11 at 05:48, Andy Lutomirski <luto@MIT.EDU> wrote:
> This causes vsyscalls to be a little more expensive than real
> syscalls.  Fortunately sensible programs don't use them.

Hmm - weren't vsyscalls there for performance reasons?

Besides that, just a mostly cosmetic remark:

> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1650,6 +1650,23 @@ config COMPAT_VDSO
>  
>  	  If unsure, say Y.
>  
> +config UNSAFE_VSYSCALLS
> +	def_bool y
> +	prompt "Unsafe fast legacy vsyscalls"
> +	depends on X86_64
> +	---help---
> +	  Legacy user code expects to be able to issue three syscalls
> +	  by calling fixed addresses in kernel space.  If you say N,
> +	  then the kernel traps and emulates these calls.  If you say
> +	  Y, then there is actual executable code at a fixed address
> +	  to implement these calls efficiently.
> +
> +	  On a system with recent enough glibc (probably 2.14 or
> +	  newer) and no static binaries, you can say N without a
> +	  performance penalty to improve security
> +
> +	  If unsure, say Y.
> +
>  config CMDLINE_BOOL
>  	bool "Built-in kernel command line"
>  	---help---
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -42,6 +42,9 @@ obj-$(CONFIG_X86_32)	+= probe_roms_32.o
>  obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
>  obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
>  obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
> +ifndef CONFIG_UNSAFE_VSYSCALLS
> +	obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
> +endif

With the Kconfig dependency on X86_64 above and the new
variable being a boolean one, these three lines can be written
as just

obj-$(CONFIG_UNSAFE_VSYSCALLS) += vsyscall_emu_64.o

Jan

>  obj-y			+= bootflag.o e820.o
>  obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
>  obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 10/10] x86-64: Document some of entry_64.S
  2011-05-30  3:48 ` [PATCH v2 10/10] x86-64: Document some of entry_64.S Andy Lutomirski
@ 2011-05-30  7:59   ` Borislav Petkov
  2011-05-30 10:40     ` Andrew Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2011-05-30  7:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Sun, May 29, 2011 at 11:48:47PM -0400, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <luto@mit.edu>

Oh yeah!

Acked-by: Borislav Petkov <bp@alien8.de>

> ---
>  Documentation/x86/entry_64.txt |   95 ++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/entry_64.S     |    2 +
>  2 files changed, 97 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/x86/entry_64.txt
> 
> diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
> new file mode 100644
> index 0000000..081c405
> --- /dev/null
> +++ b/Documentation/x86/entry_64.txt
> @@ -0,0 +1,95 @@
> +This file documents some of the kernel entries in
> +arch/x86/kernel/entry_64.S.  A lot of this explanation is adapted from
> +an email from Ingo Molnar:
> +
> +http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
> +
> +The x86 architecture has quite a few different ways to jump into
> +kernel code.  Most of these entry points are registered in
> +arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S

and arch/x86/ia32/ia32entry.S

> +
> +The IDT vector assignments are listed in arch/x86/include/irq_vectors.h.
> +
> +Some of these entries are:
> +
> + - system_call: syscall instruction from 64-bit code.
> +
> + - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
> +   either way.
> +
> + - ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit code
> +
> + - interrupt: An array of entries.  Every IDT vector that doesn't
> +   explicitly point somewhere else gets set to the corresponding value
> +   in interrupts.  These point to a whole array of magically-generated
> +   functions that make their way to do_IRQ with the interrupt number
> +   as a parameter.
> +
> + - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
> +   vsyscall emulation.
> +
> + - APIC interrupts: Various special-purpose interrupts for things like
> +   TLB shootdown.
> +
> + - Architecturally-defined exceptions like divide_error.
> +
> +There are a few complexities here.  The different x86-64 entries have
> +different calling conventions.  The syscall and sysenter instructions
> +have their own peculiar calling conventions.  Some of the IDT entries
> +push an error code onto the stack; others don't.  IDT entries using
> +the IST alternative stack mechanism need their own magic to get the
> +stack frames right.  (You can find some documentation in the Intel
> +SDM, Volume 3, Chapter 6.)

and the AMD APM Volume 2, Chapter 8.

> +
> +Dealing with the swapgs instruction is especially tricky.  Swapgs
> +toggles whether gs is the kernel gs or the user gs.  The swapgs
> +instruction is rather fragile: it must nest perfectly and only in
> +single depth, it should only be used if entering from user mode to
> +kernel mode and then when returning to user-space, and precisely
> +so. If we mess that up even slightly, we crash.
> +
> +So when we have a secondary entry, already in kernel mode, we *must
> +not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
> +not switched/swapped yet.
> +
> +Now, there's a secondary complication: there's a cheap way to test
> +which mode the CPU is in and an expensive way.
> +
> +The cheap way is to pick this info off the entry frame on the kernel
> +stack, from the CS of the ptregs area of the kernel stack:
> +
> +	xorl %ebx,%ebx
> +	testl $3,CS+8(%rsp)
> +	je error_kernelspace
> +	SWAPGS
> +
> +The expensive (paranoid) way is to read back the MSR_GS_BASE value
> +(which is what SWAPGS modifies):
> +
> +	movl $1,%ebx
> +	movl $MSR_GS_BASE,%ecx
> +	rdmsr
> +	testl %edx,%edx
> +	js 1f   /* negative -> in kernel */
> +	SWAPGS
> +	xorl %ebx,%ebx
> +1:	ret
> +
> +and the whole paranoid non-paranoid macro complexity is about whether
> +to suffer that RDMSR cost.
> +
> +If we are at an interrupt or user-trap/gate-alike boundary then we can
> +use the faster check: the stack will be a reliable indicator of
> +whether SWAPGS was already done: if we see that we are a secondary
> +entry interrupting kernel mode execution, then we know that the GS
> +base has already been switched. If it says that we interrupted
> +user-space execution then we must do the SWAPGS.
> +
> +But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
> +which might have triggered right after a normal entry wrote CS to the
> +stack but before we executed SWAPGS, then the only safe way to check
> +for GS is the slower method: the RDMSR.
> +
> +So we try only to mark those entry methods 'paranoid' that absolutely
> +need the more expensive check for the GS base - and we generate all
> +'normal' entry points with the regular (faster) entry macros.
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index bee7e81..e949793 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -9,6 +9,8 @@
>  /*
>   * entry.S contains the system-call and fault low-level handling routines.
>   *
> + * Some of this is documented in Documentation/x86/entry_64.txt
> + *
>   * NOTE: This code handles signal-recognition, which happens every time
>   * after an interrupt and after each system call.
>   *
> -- 
> 1.7.5.1
> 
> 

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  7:51   ` Jan Beulich
@ 2011-05-30  8:07     ` Ingo Molnar
  2011-05-31  2:29     ` Andrew Lutomirski
  1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2011-05-30  8:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andy Lutomirski, Borislav Petkov, Jesper Juhl,
	richard -rw- weinberger, Arjan van de Ven, Mikael Pettersson,
	x86, Thomas Gleixner, Andrew Morton, Linus Torvalds,
	linux-kernel


* Jan Beulich <JBeulich@novell.com> wrote:

> >>> On 30.05.11 at 05:48, Andy Lutomirski <luto@MIT.EDU> wrote:
> > This causes vsyscalls to be a little more expensive than real
> > syscalls.  Fortunately sensible programs don't use them.
> 
> Hmm - weren't vsyscalls there for performance reasons?

New code uses the vDSO. The fixed-address vsyscalls are apparently 
only used in esoteric cases like statically linked glibc binaries. 
(which should arguably be using the vDSO as well)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 10/10] x86-64: Document some of entry_64.S
  2011-05-30  7:59   ` Borislav Petkov
@ 2011-05-30 10:40     ` Andrew Lutomirski
  2011-05-30 10:50       ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lutomirski @ 2011-05-30 10:40 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Ingo Molnar, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Mon, May 30, 2011 at 3:59 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, May 29, 2011 at 11:48:47PM -0400, Andy Lutomirski wrote:
>> Signed-off-by: Andy Lutomirski <luto@mit.edu>
>
> Oh yeah!
>
> Acked-by: Borislav Petkov <bp@alien8.de>
>
>> +The x86 architecture has quite a few different ways to jump into
>> +kernel code.  Most of these entry points are registered in
>> +arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
>
> and arch/x86/ia32/ia32entry.S

I was hedging by saying "most".  Will fix.

>> +
>> +There are a few complexities here.  The different x86-64 entries have
>> +different calling conventions.  The syscall and sysenter instructions
>> +have their own peculiar calling conventions.  Some of the IDT entries
>> +push an error code onto the stack; others don't.  IDT entries using
>> +the IST alternative stack mechanism need their own magic to get the
>> +stack frames right.  (You can find some documentation in the Intel
>> +SDM, Volume 3, Chapter 6.)
>
> and the AMD APM Volume 2, Chapter 8.

Of course.  I think I like that version even better.  Page 243 is a
great summary.

(And whose idea was it to have pushing the error code be optional?)


--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  7:35   ` Borislav Petkov
@ 2011-05-30 10:43     ` Andrew Lutomirski
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Lutomirski @ 2011-05-30 10:43 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Ingo Molnar, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Mon, May 30, 2011 at 3:35 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, May 29, 2011 at 11:48:45PM -0400, Andy Lutomirski wrote:
>> There's a fair amount of code in the vsyscall page, and who knows
>> what will happen if an exploit jumps into the middle of it.  Reduce
>> the risk by replacing most of it with short magic incantations that
>> are useless if entered in the middle.  This change can be disabled
>> by CONFIG_UNSAFE_VSYSCALLS (default y).
>>
>> This causes vsyscalls to be a little more expensive than real
>> syscalls.  Fortunately sensible programs don't use them.
>>
>> Some care is taken to make sure that tools like valgrind and
>> ThreadSpotter still work.
>>
>> This patch is not perfect: the vread_tsc and vread_hpet functions
>> are still at a fixed address.  Fixing that might involve making
>> alternative patching work in the vDSO.
>>
>> Signed-off-by: Andy Lutomirski <luto@mit.edu>
>> ---
>>  arch/x86/Kconfig                  |   17 +++++
>>  arch/x86/kernel/Makefile          |    3 +
>>  arch/x86/kernel/vsyscall_64.c     |  121 +++++++++++++++++++++++++++++++++++--
>>  arch/x86/kernel/vsyscall_emu_64.S |   40 ++++++++++++
>>  4 files changed, 176 insertions(+), 5 deletions(-)
>>  create mode 100644 arch/x86/kernel/vsyscall_emu_64.S
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index cc6c53a..186018b 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1650,6 +1650,23 @@ config COMPAT_VDSO
>>
>>         If unsure, say Y.
>>
>> +config UNSAFE_VSYSCALLS
>> +     def_bool y
>> +     prompt "Unsafe fast legacy vsyscalls"
>> +     depends on X86_64
>> +     ---help---
>> +       Legacy user code expects to be able to issue three syscalls
>> +       by calling fixed addresses in kernel space.  If you say N,
>> +       then the kernel traps and emulates these calls.  If you say
>> +       Y, then there is actual executable code at a fixed address
>> +       to implement these calls efficiently.
>> +
>> +       On a system with recent enough glibc (probably 2.14 or
>> +       newer) and no static binaries, you can say N without a
>> +       performance penalty to improve security
>> +
>> +       If unsure, say Y.
>> +
>>  config CMDLINE_BOOL
>>       bool "Built-in kernel command line"
>>       ---help---
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index a24521b..b901781 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -42,6 +42,9 @@ obj-$(CONFIG_X86_32)        += probe_roms_32.o
>>  obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o
>>  obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
>>  obj-$(CONFIG_X86_64) += syscall_64.o vsyscall_64.o vread_tsc_64.o
>> +ifndef CONFIG_UNSAFE_VSYSCALLS
>> +     obj-$(CONFIG_X86_64)    += vsyscall_emu_64.o
>> +endif
>>  obj-y                        += bootflag.o e820.o
>>  obj-y                        += pci-dma.o quirks.o topology.o kdebugfs.o
>>  obj-y                        += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
>> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
>> index 71fa506..5b3d62a 100644
>> --- a/arch/x86/kernel/vsyscall_64.c
>> +++ b/arch/x86/kernel/vsyscall_64.c
>> @@ -48,9 +48,6 @@
>>  #include <asm/vgtod.h>
>>  #include <asm/traps.h>
>>
>> -#define __vsyscall(nr) \
>> -             __attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
>> -
>>  DEFINE_VVAR(int, vgetcpu_mode);
>>  DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
>>  {
>> @@ -96,6 +93,7 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
>>               return;
>>
>>       tsk = current;
>> +
>>       printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
>>              is_warning ? KERN_WARNING : KERN_INFO,
>>              tsk->comm, task_pid_nr(tsk),
>> @@ -106,6 +104,12 @@ static void warn_bad_vsyscall(struct pt_regs *regs, bool is_warning,
>>       printk("\n");
>>  }
>>
>> +#ifdef CONFIG_UNSAFE_VSYSCALLS
>> +
>> +#define __vsyscall(nr)                                                       \
>> +     __attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
>> +
>> +
>>  /* RED-PEN may want to readd seq locking, but then the variable should be
>>   * write-once.
>>   */
>> @@ -117,8 +121,11 @@ static __always_inline void do_get_tz(struct timezone * tz)
>>  static __always_inline int fallback_gettimeofday(struct timeval *tv)
>>  {
>>       int ret;
>> -     /* Invoke do_emulate_vsyscall. */
>> -     asm volatile("movb $0xce, %%al;\n\t"
>> +     /*
>> +      * Invoke do_emulate_vsyscall.  Intentionally incompatible with
>> +      * the CONFIG_UNSAFE_VSYSCALLS=n case.
>> +      */
>> +     asm volatile("mov $0xce, %%al;\n\t"
>>                    "int %[vec]"
>>                    : "=a" (ret)
>>                    : "D" (tv), [vec] "i" (VSYSCALL_EMU_VECTOR));
>> @@ -237,6 +244,8 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
>>       long ret;
>>
>>       /* Kernel code must never get here. */
>> +     if (!user_mode(regs))
>> +             early_printk("oh crap!\n");
>>       BUG_ON(!user_mode(regs));
>>
>>       local_irq_enable();
>> @@ -278,6 +287,106 @@ out:
>>       local_irq_disable();
>>  }
>>
>> +#else /* CONFIG_UNSAFE_VSYSCALLS=n below */
>> +
>> +static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
>> +{
>> +     return VSYSCALL_START + 1024*vsyscall_nr + 2;
>> +}
>> +
>> +void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
>> +{
>> +     u8 vsyscall_nr, al;
>> +     long ret;
>> +
>> +     /* Kernel code must never get here. */
>> +     BUG_ON(!user_mode(regs));
>> +
>> +     local_irq_enable();
>> +
>> +     /*
>> +      * Decode the vsyscall number.
>> +      * 0xcc -> 0, 0xce -> 1, 0xf0 -> 2; see vsyscall_emu_64.S for why.
>> +      */
>> +     al = regs->ax & 0xff;
>> +     vsyscall_nr = (al - 0xcc) >> 1;
>
> Ok, but
>
>        (0xf0 - 0xcc) >> 1 == 0x12
>
> Don't you mean 0xd0 here? Although 0xd0 is opcode for all those
> rotate/shift insns. What am I missing?

Wow, -ESHOULDUSECALCULATOR  I knew I was bad at arithmetic, but 0xcc +
4 == 0xf0 is pretty amazing.  I was so excited that I found three
"safe" opcodes in an arithmetic sequence.

I'll fix that.  I suspect the only reason I didn't crash anything is
because I didn't give vgetcpu enough exercise here.

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 10/10] x86-64: Document some of entry_64.S
  2011-05-30 10:40     ` Andrew Lutomirski
@ 2011-05-30 10:50       ` Ingo Molnar
  0 siblings, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2011-05-30 10:50 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Borislav Petkov, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson


* Andrew Lutomirski <luto@mit.edu> wrote:

> (And whose idea was it to have pushing the error code be optional?)

I think it must have been some genius trying to save 0.1 cycles while 
spending a thousand transistors on it in the critical path and 
introducing a dozen crashes and security holes into various kernels 
;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  7:46   ` Ingo Molnar
@ 2011-05-30 10:57     ` Andrew Lutomirski
  2011-05-30 10:59       ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lutomirski @ 2011-05-30 10:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Mon, May 30, 2011 at 3:46 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andy Lutomirski <luto@MIT.EDU> wrote:
>
>> There's a fair amount of code in the vsyscall page, and who knows
>> what will happen if an exploit jumps into the middle of it.  Reduce
>> the risk by replacing most of it with short magic incantations that
>> are useless if entered in the middle.  This change can be disabled
>> by CONFIG_UNSAFE_VSYSCALLS (default y).
>
> btw., please flip the default or consider removing the option
> altogether.
>
> We want to improve security and we want safe vsyscalls the default,
> and it's no good if we make it too easy for users to keep the fire
> door open all the time! :-)

I'd advocate waiting until glibc 2.14 comes out with this change:

http://sourceware.org/git/?p=glibc.git;a=commit;h=a8509ca540427502bd955f35296ff7b727c7a8a1

I want to add a warning (ratelimited to an extremely low rate) in v3
whenever any of the vsyscalls get used telling users that their legacy
code is suffering a performance impact, but it seems like bad form to
tell people to build glibc from git to avoid a regression.

The other option is to make the change unconditional for gettimeofday
and vgetcpu but leave time alone for awhile.  There's no a priori
reason why leaving vtime around is worse than vread_tsc and
vread_hpet.


--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30 10:57     ` Andrew Lutomirski
@ 2011-05-30 10:59       ` Ingo Molnar
  2011-05-30 11:35         ` Andrew Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2011-05-30 10:59 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson


* Andrew Lutomirski <luto@mit.edu> wrote:

> On Mon, May 30, 2011 at 3:46 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Andy Lutomirski <luto@MIT.EDU> wrote:
> >
> >> There's a fair amount of code in the vsyscall page, and who knows
> >> what will happen if an exploit jumps into the middle of it.  Reduce
> >> the risk by replacing most of it with short magic incantations that
> >> are useless if entered in the middle.  This change can be disabled
> >> by CONFIG_UNSAFE_VSYSCALLS (default y).
> >
> > btw., please flip the default or consider removing the option
> > altogether.
> >
> > We want to improve security and we want safe vsyscalls the default,
> > and it's no good if we make it too easy for users to keep the fire
> > door open all the time! :-)
> 
> I'd advocate waiting until glibc 2.14 comes out with this change:
> 
> http://sourceware.org/git/?p=glibc.git;a=commit;h=a8509ca540427502bd955f35296ff7b727c7a8a1
> 
> I want to add a warning (ratelimited to an extremely low rate) in v3
> whenever any of the vsyscalls get used telling users that their legacy
> code is suffering a performance impact, but it seems like bad form to
> tell people to build glibc from git to avoid a regression.

But only statically built binaries would be impacted in practice, 
right? The number of statically built binaries that heavily rely on 
vsyscalls ought to be a very small set ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30 10:59       ` Ingo Molnar
@ 2011-05-30 11:35         ` Andrew Lutomirski
  2011-05-30 12:15           ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lutomirski @ 2011-05-30 11:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Mon, May 30, 2011 at 6:59 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Lutomirski <luto@mit.edu> wrote:
>
>>
>> I'd advocate waiting until glibc 2.14 comes out with this change:
>>
>> http://sourceware.org/git/?p=glibc.git;a=commit;h=a8509ca540427502bd955f35296ff7b727c7a8a1
>>
>> I want to add a warning (ratelimited to an extremely low rate) in v3
>> whenever any of the vsyscalls get used telling users that their legacy
>> code is suffering a performance impact, but it seems like bad form to
>> tell people to build glibc from git to avoid a regression.
>
> But only statically built binaries would be impacted in practice,
> right? The number of statically built binaries that heavily rely on
> vsyscalls ought to be a very small set ...

With current glibc even dynamic binaries take the hit on time().

With the emulation warning (coming in v3), I get (on a Fedora 15-based VM):

[    0.635493] init[1] emulated legacy vsyscall time(); upgrade your
code to avoid a performance hit. ip:ffffffffff600404 sp:7fff277fe9c8
caller:3da3e9e27d in libc.so.6[3da3e00000+192000]

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30 11:35         ` Andrew Lutomirski
@ 2011-05-30 12:15           ` Ingo Molnar
  2011-05-30 12:25             ` Andrew Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2011-05-30 12:15 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson


* Andrew Lutomirski <luto@mit.edu> wrote:

> On Mon, May 30, 2011 at 6:59 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Andrew Lutomirski <luto@mit.edu> wrote:
> >
> >>
> >> I'd advocate waiting until glibc 2.14 comes out with this change:
> >>
> >> http://sourceware.org/git/?p=glibc.git;a=commit;h=a8509ca540427502bd955f35296ff7b727c7a8a1
> >>
> >> I want to add a warning (ratelimited to an extremely low rate) in v3
> >> whenever any of the vsyscalls get used telling users that their legacy
> >> code is suffering a performance impact, but it seems like bad form to
> >> tell people to build glibc from git to avoid a regression.
> >
> > But only statically built binaries would be impacted in practice,
> > right? The number of statically built binaries that heavily rely on
> > vsyscalls ought to be a very small set ...
> 
> With current glibc even dynamic binaries take the hit on time().

Indeed, you are right, i completely forgot that again :)

> With the emulation warning (coming in v3), I get (on a Fedora 
> 15-based VM):
> 
> [    0.635493] init[1] emulated legacy vsyscall time(); upgrade your
> code to avoid a performance hit. ip:ffffffffff600404 sp:7fff277fe9c8
> caller:3da3e9e27d in libc.so.6[3da3e00000+192000]

Ok, so we should leave the option enabled by default and distros can 
flip it as they upgrade/fix glibc, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30 12:15           ` Ingo Molnar
@ 2011-05-30 12:25             ` Andrew Lutomirski
  2011-05-30 14:12               ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lutomirski @ 2011-05-30 12:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson

On Mon, May 30, 2011 at 8:15 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Lutomirski <luto@mit.edu> wrote:
>
>> On Mon, May 30, 2011 at 6:59 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Andrew Lutomirski <luto@mit.edu> wrote:
>> >
>> >>
>> >> I'd advocate waiting until glibc 2.14 comes out with this change:
>> >>
>> >> http://sourceware.org/git/?p=glibc.git;a=commit;h=a8509ca540427502bd955f35296ff7b727c7a8a1
>> >>
>> >> I want to add a warning (ratelimited to an extremely low rate) in v3
>> >> whenever any of the vsyscalls get used telling users that their legacy
>> >> code is suffering a performance impact, but it seems like bad form to
>> >> tell people to build glibc from git to avoid a regression.
>> >
>> > But only statically built binaries would be impacted in practice,
>> > right? The number of statically built binaries that heavily rely on
>> > vsyscalls ought to be a very small set ...
>>
>> With current glibc even dynamic binaries take the hit on time().
>
> Indeed, you are right, i completely forgot that again :)
>
>> With the emulation warning (coming in v3), I get (on a Fedora
>> 15-based VM):
>>
>> [    0.635493] init[1] emulated legacy vsyscall time(); upgrade your
>> code to avoid a performance hit. ip:ffffffffff600404 sp:7fff277fe9c8
>> caller:3da3e9e27d in libc.so.6[3da3e00000+192000]
>
> Ok, so we should leave the option enabled by default and distros can
> flip it as they upgrade/fix glibc, right?

Yes.

But I'll make the option control just time() instead of all three.
That will also reduce the size of the patch :)

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30 12:25             ` Andrew Lutomirski
@ 2011-05-30 14:12               ` Ingo Molnar
  0 siblings, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2011-05-30 14:12 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson


* Andrew Lutomirski <luto@mit.edu> wrote:

> But I'll make the option control just time() instead of all three.
> That will also reduce the size of the patch :)

Yes :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 08/10] x86-64: Emulate vsyscalls
  2011-05-30  7:51   ` Jan Beulich
  2011-05-30  8:07     ` Ingo Molnar
@ 2011-05-31  2:29     ` Andrew Lutomirski
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Lutomirski @ 2011-05-31  2:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ingo Molnar, Borislav Petkov, Jesper Juhl,
	richard -rw- weinberger, Arjan van de Ven, Mikael Pettersson,
	x86, Thomas Gleixner, Andrew Morton, Linus Torvalds,
	linux-kernel

On Mon, May 30, 2011 at 3:51 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>> On 30.05.11 at 05:48, Andy Lutomirski <luto@MIT.EDU> wrote:
>> This causes vsyscalls to be a little more expensive than real
>> syscalls.  Fortunately sensible programs don't use them.
>
> Hmm - weren't vsyscalls there for performance reasons?
>
> Besides that, just a mostly cosmetic remark:
>
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1650,6 +1650,23 @@ config COMPAT_VDSO
>>
>>         If unsure, say Y.
>>
>> +config UNSAFE_VSYSCALLS
>> +     def_bool y
>> +     prompt "Unsafe fast legacy vsyscalls"
>> +     depends on X86_64
>> +     ---help---
>> +       Legacy user code expects to be able to issue three syscalls
>> +       by calling fixed addresses in kernel space.  If you say N,
>> +       then the kernel traps and emulates these calls.  If you say
>> +       Y, then there is actual executable code at a fixed address
>> +       to implement these calls efficiently.
>> +
>> +       On a system with recent enough glibc (probably 2.14 or
>> +       newer) and no static binaries, you can say N without a
>> +       performance penalty to improve security
>> +
>> +       If unsure, say Y.
>> +
>>  config CMDLINE_BOOL
>>       bool "Built-in kernel command line"
>>       ---help---
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -42,6 +42,9 @@ obj-$(CONFIG_X86_32)        += probe_roms_32.o
>>  obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o
>>  obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
>>  obj-$(CONFIG_X86_64) += syscall_64.o vsyscall_64.o vread_tsc_64.o
>> +ifndef CONFIG_UNSAFE_VSYSCALLS
>> +     obj-$(CONFIG_X86_64)    += vsyscall_emu_64.o
>> +endif
>
> With the Kconfig dependency on X86_64 above and the new
> variable being a boolean one, these three lines can be written
> as just
>
> obj-$(CONFIG_UNSAFE_VSYSCALLS) += vsyscall_emu_64.o

I think that's backwards.  But in v3 that file will be included unconditionally.

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2011-05-31  2:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-30  3:48 [PATCH v2 00/10] Remove syscall instructions at fixed addresses Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 01/10] x86-64: Fix alignment of jiffies variable Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 02/10] x86-64: Give vvars their own page Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 03/10] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 04/10] x86-64: Replace vsyscall gettimeofday fallback with int 0xcc Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 05/10] x86-64: Map the HPET NX Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 06/10] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 07/10] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 08/10] x86-64: Emulate vsyscalls Andy Lutomirski
2011-05-30  7:35   ` Borislav Petkov
2011-05-30 10:43     ` Andrew Lutomirski
2011-05-30  7:46   ` Ingo Molnar
2011-05-30 10:57     ` Andrew Lutomirski
2011-05-30 10:59       ` Ingo Molnar
2011-05-30 11:35         ` Andrew Lutomirski
2011-05-30 12:15           ` Ingo Molnar
2011-05-30 12:25             ` Andrew Lutomirski
2011-05-30 14:12               ` Ingo Molnar
2011-05-30  7:51   ` Jan Beulich
2011-05-30  8:07     ` Ingo Molnar
2011-05-31  2:29     ` Andrew Lutomirski
2011-05-30  3:48 ` [PATCH v2 09/10] x86-64: Randomize int 0xcc magic al values at boot Andy Lutomirski
2011-05-30  3:48 ` [PATCH v2 10/10] x86-64: Document some of entry_64.S Andy Lutomirski
2011-05-30  7:59   ` Borislav Petkov
2011-05-30 10:40     ` Andrew Lutomirski
2011-05-30 10:50       ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.