* [PATCH v5 0/9] Remove syscall instructions at fixed addresses
@ 2011-06-05 17:50 Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable Andy Lutomirski
                   ` (9 more replies)
  0 siblings, 10 replies; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

Patch 1 is just a bugfix from the last vdso series.
The bug should be harmless but it's pretty dumb.  This is almost
certainly 3.0 material.

Patch 2 adds documentation for entry_64.S.  A lot of the magic in there is far from obvious.

Patches 3, 4, and 5 remove a bunch of syscall instructions in kernel space
at fixed addresses that user code can execute.  Several are data that
isn't marked NX.  Patch 3 makes vvars NX and patch 5 makes the HPET NX.

The time() vsyscall contains an explicit syscall fallback.  Patch 4
removes it.

At this point, there is only one explicit syscall left in the vsyscall
page: the fallback case for vgettimeofday.  The rest of the series is to
remove it and most of the remaining vsyscall code.

Patch 6 is pure cleanup.  venosys (one of the four vsyscalls) has been
broken for years, so patch 6 removes it.

Patch 7 pads the empty parts of the vsyscall page with 0xcc.  0xcc is an
explicit trap.

Patch 8 removes the code implementing the vsyscalls and replaces it with
magic int 0xcc incantations.  These incantations are specifically
designed so that jumping into them at funny offsets will either work
fine or generate some kind of fault.  This is a significant performance
penalty (~220ns here) for all vsyscall users, but there aren't many
left.  Because current glibc still uses the time vsyscall (although it's
fixed in glibc's git), the option CONFIG_UNSAFE_VSYSCALLS (default y)
will leave time() alone.

This patch is also nice because it removes a bunch of duplicated code
from vsyscall_64.c.

With CONFIG_UNSAFE_VSYSCALLS=y, I can boot Fedora 15 into GNOME and run
Firefox without a single warning about emulated vsyscalls.

Earlier versions of this series also randomized the int 0xcc incantation
at bootup.  That was pretty much worthless for security (there were only
three choices for the random number and it was easy to figure out which
one was in use), but it prevented overly clever userspace programs from
thinking that the incantation is ABI.  One instrumentation tool author
offered to hard-code special handling for int 0xcc; I want to discourage
this approach.  v5 drops the randomization.

Patch 9 adds CONFIG_UNSAFE_VSYSCALLS to
feature-removal-schedule.txt.  Removing it won't break anything but it
will slow some older code down.

Changes from v4:
 - Remove int 0xcc randomization.
 - Make .vvar one section instead of three (and fix a typo in EMIT_VVAR).
 - Add a missing #ifdef CONFIG_X86_64.
 - Update feature removal changelog to clarify that CONFIG_UNSAFE_VSYSCALLS
   is new.

Changes from v3:
 - Rebased onto tip/master (1a0c84d)

Changes from v2:
 - Reordered the patches.
 - Removed the option to leave gettimeofday and getcpu as native code.
 - Cleaned up the int 0xcc handler and registration.
 - sched_getcpu works now (thanks, Borislav, for catching my blatant arithmetic error).
 - Numerous small fixes from review comments.
 - Abandon my plan to spread turbostat to the masses.

Changes from v1:
 - Patches 6-10 are new.
 - The int 0xcc code is much prettier and has lots of bugs fixed.
 - I've decided to let everyone compile turbostat on their own :)


Andy Lutomirski (9):
  x86-64: Fix alignment of jiffies variable
  x86-64: Document some of entry_64.S
  x86-64: Give vvars their own page
  x86-64: Remove kernel.vsyscall64 sysctl
  x86-64: Map the HPET NX
  x86-64: Remove vsyscall number 3 (venosys)
  x86-64: Fill unused parts of the vsyscall page with 0xcc
  x86-64: Emulate legacy vsyscalls
  x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule

 Documentation/feature-removal-schedule.txt |    9 +
 Documentation/x86/entry_64.txt             |   98 ++++++++++
 arch/x86/Kconfig                           |   17 ++
 arch/x86/include/asm/fixmap.h              |    1 +
 arch/x86/include/asm/irq_vectors.h         |    6 +-
 arch/x86/include/asm/pgtable_types.h       |    6 +-
 arch/x86/include/asm/traps.h               |    4 +
 arch/x86/include/asm/vgtod.h               |    1 -
 arch/x86/include/asm/vsyscall.h            |    6 +
 arch/x86/include/asm/vvar.h                |   24 +--
 arch/x86/kernel/Makefile                   |    1 +
 arch/x86/kernel/entry_64.S                 |    4 +
 arch/x86/kernel/hpet.c                     |    2 +-
 arch/x86/kernel/traps.c                    |    6 +
 arch/x86/kernel/vmlinux.lds.S              |   48 +++---
 arch/x86/kernel/vsyscall_64.c              |  289 +++++++++++++++-------------
 arch/x86/kernel/vsyscall_emu_64.S          |   42 ++++
 arch/x86/vdso/vclock_gettime.c             |   55 ++----
 18 files changed, 407 insertions(+), 212 deletions(-)
 create mode 100644 Documentation/x86/entry_64.txt
 create mode 100644 arch/x86/kernel/vsyscall_emu_64.S

-- 
1.7.5.2



* [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:31   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 2/9] x86-64: Document some of entry_64.S Andy Lutomirski
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

It's declared __attribute__((aligned(16))) but it's explicitly not
aligned.  This is probably harmless but it's a bit embarrassing.
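
To make the mismatch concrete, here is a minimal userspace sketch (not
kernel code).  The helper name offset_is_aligned is made up for
illustration; the offsets 8 and 16 and the 16-byte alignment are the ones
in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the kernel: returns true if
 * `offset` is a multiple of `align`, where `align` is a power of two.
 */
static bool offset_is_aligned(size_t offset, size_t align)
{
	/* The mask trick below only works for powers of two. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset & (align - 1)) == 0;
}
```

With a 16-byte alignment, offset_is_aligned(8, 16) is false and
offset_is_aligned(16, 16) is true, which is why the vgetcpu_mode offset
moves from 8 to 16.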

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/vvar.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 341b355..a4eaca4 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -45,7 +45,7 @@
 /* DECLARE_VVAR(offset, type, name) */
 
 DECLARE_VVAR(0, volatile unsigned long, jiffies)
-DECLARE_VVAR(8, int, vgetcpu_mode)
+DECLARE_VVAR(16, int, vgetcpu_mode)
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
 
 #undef DECLARE_VVAR
-- 
1.7.5.2



* [PATCH v5 2/9] x86-64: Document some of entry_64.S
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:31   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 3/9] x86-64: Give vvars their own page Andy Lutomirski
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 Documentation/x86/entry_64.txt |   98 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/entry_64.S     |    2 +
 2 files changed, 100 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/entry_64.txt

diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
new file mode 100644
index 0000000..7869f14
--- /dev/null
+++ b/Documentation/x86/entry_64.txt
@@ -0,0 +1,98 @@
+This file documents some of the kernel entries in
+arch/x86/kernel/entry_64.S.  A lot of this explanation is adapted from
+an email from Ingo Molnar:
+
+http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
+
+The x86 architecture has quite a few different ways to jump into
+kernel code.  Most of these entry points are registered in
+arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
+and arch/x86/ia32/ia32entry.S.
+
+The IDT vector assignments are listed in arch/x86/include/asm/irq_vectors.h.
+
+Some of these entries are:
+
+ - system_call: syscall instruction from 64-bit code.
+
+ - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+   either way.
+
+ - ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit
+   code
+
+ - interrupt: An array of entries.  Every IDT vector that doesn't
+   explicitly point somewhere else gets set to the corresponding
+   value in interrupts.  These point to a whole array of
+   magically-generated functions that make their way to do_IRQ with
+   the interrupt number as a parameter.
+
+ - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
+   vsyscall emulation.
+
+ - APIC interrupts: Various special-purpose interrupts for things
+   like TLB shootdown.
+
+ - Architecturally-defined exceptions like divide_error.
+
+There are a few complexities here.  The different x86-64 entries
+have different calling conventions.  The syscall and sysenter
+instructions have their own peculiar calling conventions.  Some of
+the IDT entries push an error code onto the stack; others don't.
+IDT entries using the IST alternative stack mechanism need their own
+magic to get the stack frames right.  (You can find some
+documentation in the AMD APM, Volume 2, Chapter 8 and the Intel SDM,
+Volume 3, Chapter 6.)
+
+Dealing with the swapgs instruction is especially tricky.  Swapgs
+toggles whether gs is the kernel gs or the user gs.  The swapgs
+instruction is rather fragile: it must nest perfectly and only in
+single depth, it should only be used if entering from user mode to
+kernel mode and then when returning to user-space, and precisely
+so. If we mess that up even slightly, we crash.
+
+So when we have a secondary entry, already in kernel mode, we *must
+not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
+not switched/swapped yet.
+
+Now, there's a secondary complication: there's a cheap way to test
+which mode the CPU is in and an expensive way.
+
+The cheap way is to pick this info off the entry frame on the kernel
+stack, from the CS of the ptregs area of the kernel stack:
+
+	xorl %ebx,%ebx
+	testl $3,CS+8(%rsp)
+	je error_kernelspace
+	SWAPGS
+
+The expensive (paranoid) way is to read back the MSR_GS_BASE value
+(which is what SWAPGS modifies):
+
+	movl $1,%ebx
+	movl $MSR_GS_BASE,%ecx
+	rdmsr
+	testl %edx,%edx
+	js 1f   /* negative -> in kernel */
+	SWAPGS
+	xorl %ebx,%ebx
+1:	ret
+
+and the whole paranoid non-paranoid macro complexity is about whether
+to suffer that RDMSR cost.
+
+If we are at an interrupt or user-trap/gate-alike boundary then we can
+use the faster check: the stack will be a reliable indicator of
+whether SWAPGS was already done: if we see that we are a secondary
+entry interrupting kernel mode execution, then we know that the GS
+base has already been switched. If it says that we interrupted
+user-space execution then we must do the SWAPGS.
+
+But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
+which might have triggered right after a normal entry wrote CS to the
+stack but before we executed SWAPGS, then the only safe way to check
+for GS is the slower method: the RDMSR.
+
+So we try only to mark those entry methods 'paranoid' that absolutely
+need the more expensive check for the GS base - and we generate all
+'normal' entry points with the regular (faster) entry macros.
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8a445a0..72c4a77 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -9,6 +9,8 @@
 /*
  * entry.S contains the system-call and fault low-level handling routines.
  *
+ * Some of this is documented in Documentation/x86/entry_64.txt
+ *
  * NOTE: This code handles signal-recognition, which happens every time
  * after an interrupt and after each system call.
  *
-- 
1.7.5.2



* [PATCH v5 3/9] x86-64: Give vvars their own page
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 2/9] x86-64: Document some of entry_64.S Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

Move vvars out of the vsyscall page into their own page and mark it
NX.
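
As a rough illustration of where the new page ends up, the userspace
sketch below evaluates the same expression the patch uses for
VVAR_ADDRESS.  It assumes the legacy vsyscall page starts at -10 MiB
(0xffffffffff600000), which is not stated in this patch; the sketch_*
names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Same arithmetic as VVAR_ADDRESS in the patch, as a 64-bit address. */
static uint64_t sketch_vvar_address(void)
{
	return (uint64_t)(-10LL * 1024 * 1024 - 4096);
}

/* Assumed start of the legacy vsyscall page: -10 MiB. */
static uint64_t sketch_vsyscall_start(void)
{
	return (uint64_t)(-10LL * 1024 * 1024);
}

/* Address of the page immediately below a page-aligned address. */
static uint64_t sketch_page_below(uint64_t addr)
{
	assert(addr % SKETCH_PAGE_SIZE == 0);
	return addr - SKETCH_PAGE_SIZE;
}
```

Under that assumption the vvar page is the page directly below the
vsyscall page, at 0xffffffffff5ff000.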

Without this patch, an attacker who can force a daemon to call some
fixed address could wait until the time contains, say, 0xCD80, and
then execute the current time.
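
The byte pair CD 80 is the encoding of int $0x80.  A minimal userspace
sketch of the check an attacker is effectively waiting on might look
like this; contains_int80 is a hypothetical helper, and the buffer
stands in for the raw bytes of the time data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if the buffer contains the
 * two-byte sequence CD 80 (int $0x80) at any offset.
 */
static bool contains_int80(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	for (size_t i = 0; i + 1 < len; i++) {
		if (buf[i] == 0xCD && buf[i + 1] == 0x80)
			return true;
	}
	return false;
}
```

Once the page is mapped NX, it no longer matters what the data bytes
happen to spell, because they cannot be executed at all.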

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/fixmap.h        |    1 +
 arch/x86/include/asm/pgtable_types.h |    2 ++
 arch/x86/include/asm/vvar.h          |   22 ++++++++++------------
 arch/x86/kernel/vmlinux.lds.S        |   28 +++++++++++++++++-----------
 arch/x86/kernel/vsyscall_64.c        |    5 +++++
 5 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 4729b2b..460c74e 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -78,6 +78,7 @@ enum fixed_addresses {
 	VSYSCALL_LAST_PAGE,
 	VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
 			    + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
+	VVAR_PAGE,
 	VSYSCALL_HPET,
 #endif
 	FIX_DBGP_BASE,
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d56187c..6a29aed6 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -108,6 +108,7 @@
 #define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -130,6 +131,7 @@
 #define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
 #define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
+#define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
 
 #define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index a4eaca4..de656ac 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -10,15 +10,14 @@
  * In normal kernel code, they are used like any other variable.
  * In user code, they are accessed through the VVAR macro.
  *
- * Each of these variables lives in the vsyscall page, and each
- * one needs a unique offset within the little piece of the page
- * reserved for vvars.  Specify that offset in DECLARE_VVAR.
- * (There are 896 bytes available.  If you mess up, the linker will
- * catch it.)
+ * These variables live in a page of kernel data that has an extra RO
+ * mapping for userspace.  Each variable needs a unique offset within
+ * that page; specify that offset with the DECLARE_VVAR macro.  (If
+ * you mess up, the linker will catch it.)
  */
 
-/* Offset of vars within vsyscall page */
-#define VSYSCALL_VARS_OFFSET (3072 + 128)
+/* Base address of vvars.  This is not ABI. */
+#define VVAR_ADDRESS (-10*1024*1024 - 4096)
 
 #if defined(__VVAR_KERNEL_LDS)
 
@@ -26,17 +25,17 @@
  * right place.
  */
 #define DECLARE_VVAR(offset, type, name) \
-	EMIT_VVAR(name, VSYSCALL_VARS_OFFSET + offset)
+	EMIT_VVAR(name, offset)
 
 #else
 
 #define DECLARE_VVAR(offset, type, name)				\
 	static type const * const vvaraddr_ ## name =			\
-		(void *)(VSYSCALL_START + VSYSCALL_VARS_OFFSET + (offset));
+		(void *)(VVAR_ADDRESS + (offset));
 
 #define DEFINE_VVAR(type, name)						\
-	type __vvar_ ## name						\
-	__attribute__((section(".vsyscall_var_" #name), aligned(16)))
+	type name							\
+	__attribute__((section(".vvar_" #name), aligned(16)))
 
 #define VVAR(name) (*vvaraddr_ ## name)
 
@@ -49,4 +48,3 @@ DECLARE_VVAR(16, int, vgetcpu_mode)
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
 
 #undef DECLARE_VVAR
-#undef VSYSCALL_VARS_OFFSET
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 89aed99..98b378d 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -161,12 +161,6 @@ SECTIONS
 
 #define VVIRT_OFFSET (VSYSCALL_ADDR - __vsyscall_0)
 #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET)
-#define EMIT_VVAR(x, offset) .vsyscall_var_ ## x	\
-	ADDR(.vsyscall_0) + offset		 	\
-	: AT(VLOAD(.vsyscall_var_ ## x)) {     		\
-		*(.vsyscall_var_ ## x)			\
-	}						\
-	x = VVIRT(.vsyscall_var_ ## x);
 
 	. = ALIGN(4096);
 	__vsyscall_0 = .;
@@ -192,19 +186,31 @@ SECTIONS
 		*(.vsyscall_3)
 	}
 
-#define __VVAR_KERNEL_LDS
-#include <asm/vvar.h>
-#undef __VVAR_KERNEL_LDS
-
-	. = __vsyscall_0 + PAGE_SIZE;
+	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
 #undef VLOAD_OFFSET
 #undef VLOAD
 #undef VVIRT_OFFSET
 #undef VVIRT
+
+	__vvar_page = .;
+
+	.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
+
+	      /* Place all vvars at the offsets in asm/vvar.h. */
+#define EMIT_VVAR(name, offset) 		\
+		. = offset;		\
+		*(.vvar_ ## name)
+#define __VVAR_KERNEL_LDS
+#include <asm/vvar.h>
+#undef __VVAR_KERNEL_LDS
 #undef EMIT_VVAR
 
+	} :data
+
+       . = ALIGN(__vvar_page + PAGE_SIZE, PAGE_SIZE);
+
 #endif /* CONFIG_X86_64 */
 
 	/* Init code and data - will be freed after init */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3e68218..3cf1cef 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -284,9 +284,14 @@ void __init map_vsyscall(void)
 {
 	extern char __vsyscall_0;
 	unsigned long physaddr_page0 = __pa_symbol(&__vsyscall_0);
+	extern char __vvar_page;
+	unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);
 
 	/* Note that VSYSCALL_MAPPED_PAGES must agree with the code below. */
 	__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
+	__set_fixmap(VVAR_PAGE, physaddr_vvar_page, PAGE_KERNEL_VVAR);
+	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) !=
+		     (unsigned long)VVAR_ADDRESS);
 }
 
 static int __init vsyscall_init(void)
-- 
1.7.5.2



* [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (2 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 3/9] x86-64: Give vvars their own page Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-12-05 18:27   ` [PATCH v5 4/9] " Matthew Maurer
  2011-06-05 17:50 ` [PATCH v5 5/9] x86-64: Map the HPET NX Andy Lutomirski
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

It's unnecessary overhead in code that's supposed to be highly
optimized.  Removing it allows us to remove one of the two syscall
instructions in the vsyscall page.

The only sensible use for it is for UML users, and it doesn't fully
address inconsistent vsyscall results on UML.  The real fix for UML
is to stop using vsyscalls entirely.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/vgtod.h   |    1 -
 arch/x86/kernel/vsyscall_64.c  |   34 +------------------------
 arch/x86/vdso/vclock_gettime.c |   55 +++++++++++++++------------------------
 3 files changed, 22 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 646b4c1..aa5add8 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -11,7 +11,6 @@ struct vsyscall_gtod_data {
 	time_t		wall_time_sec;
 	u32		wall_time_nsec;
 
-	int		sysctl_enabled;
 	struct timezone sys_tz;
 	struct { /* extract of a clocksource struct */
 		cycle_t (*vread)(void);
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3cf1cef..9b2f3f5 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -53,7 +53,6 @@ DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 {
 	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
-	.sysctl_enabled = 1,
 };
 
 void update_vsyscall_tz(void)
@@ -103,15 +102,6 @@ static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
 	return ret;
 }
 
-static __always_inline long time_syscall(long *t)
-{
-	long secs;
-	asm volatile("syscall"
-		: "=a" (secs)
-		: "0" (__NR_time),"D" (t) : __syscall_clobber);
-	return secs;
-}
-
 static __always_inline void do_vgettimeofday(struct timeval * tv)
 {
 	cycle_t now, base, mask, cycle_delta;
@@ -122,8 +112,7 @@ static __always_inline void do_vgettimeofday(struct timeval * tv)
 		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
 
 		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled ||
-			     !vread)) {
+		if (unlikely(!vread)) {
 			gettimeofday(tv,NULL);
 			return;
 		}
@@ -165,8 +154,6 @@ time_t __vsyscall(1) vtime(time_t *t)
 {
 	unsigned seq;
 	time_t result;
-	if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled))
-		return time_syscall(t);
 
 	do {
 		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
@@ -227,22 +214,6 @@ static long __vsyscall(3) venosys_1(void)
 	return -ENOSYS;
 }
 
-#ifdef CONFIG_SYSCTL
-static ctl_table kernel_table2[] = {
-	{ .procname = "vsyscall64",
-	  .data = &vsyscall_gtod_data.sysctl_enabled, .maxlen = sizeof(int),
-	  .mode = 0644,
-	  .proc_handler = proc_dointvec },
-	{}
-};
-
-static ctl_table kernel_root_table2[] = {
-	{ .procname = "kernel", .mode = 0555,
-	  .child = kernel_table2 },
-	{}
-};
-#endif
-
 /* Assume __initcall executes before all user space. Hopefully kmod
    doesn't violate that. We'll find out if it does. */
 static void __cpuinit vsyscall_set_cpu(int cpu)
@@ -301,9 +272,6 @@ static int __init vsyscall_init(void)
 	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
 	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
 	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
-#ifdef CONFIG_SYSCTL
-	register_sysctl_table(kernel_root_table2);
-#endif
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index a724905..cf54813 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -116,21 +116,21 @@ notrace static noinline int do_monotonic_coarse(struct timespec *ts)
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
-	if (likely(gtod->sysctl_enabled))
-		switch (clock) {
-		case CLOCK_REALTIME:
-			if (likely(gtod->clock.vread))
-				return do_realtime(ts);
-			break;
-		case CLOCK_MONOTONIC:
-			if (likely(gtod->clock.vread))
-				return do_monotonic(ts);
-			break;
-		case CLOCK_REALTIME_COARSE:
-			return do_realtime_coarse(ts);
-		case CLOCK_MONOTONIC_COARSE:
-			return do_monotonic_coarse(ts);
-		}
+	switch (clock) {
+	case CLOCK_REALTIME:
+		if (likely(gtod->clock.vread))
+			return do_realtime(ts);
+		break;
+	case CLOCK_MONOTONIC:
+		if (likely(gtod->clock.vread))
+			return do_monotonic(ts);
+		break;
+	case CLOCK_REALTIME_COARSE:
+		return do_realtime_coarse(ts);
+	case CLOCK_MONOTONIC_COARSE:
+		return do_monotonic_coarse(ts);
+	}
+
 	return vdso_fallback_gettime(clock, ts);
 }
 int clock_gettime(clockid_t, struct timespec *)
@@ -139,7 +139,7 @@ int clock_gettime(clockid_t, struct timespec *)
 notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	long ret;
-	if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
+	if (likely(gtod->clock.vread)) {
 		if (likely(tv != NULL)) {
 			BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
 				     offsetof(struct timespec, tv_nsec) ||
@@ -161,27 +161,14 @@ notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 int gettimeofday(struct timeval *, struct timezone *)
 	__attribute__((weak, alias("__vdso_gettimeofday")));
 
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-
-static __always_inline long time_syscall(long *t)
-{
-	long secs;
-	asm volatile("syscall"
-		     : "=a" (secs)
-		     : "0" (__NR_time), "D" (t) : "cc", "r11", "cx", "memory");
-	return secs;
-}
-
+/*
+ * This will break when the xtime seconds get inaccurate, but that is
+ * unlikely
+ */
 notrace time_t __vdso_time(time_t *t)
 {
-	time_t result;
-
-	if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled))
-		return time_syscall(t);
-
 	/* This is atomic on x86_64 so we don't need any locks. */
-	result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
+	time_t result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
 
 	if (t)
 		*t = result;
-- 
1.7.5.2



* [PATCH v5 5/9] x86-64: Map the HPET NX
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (3 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:33   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 6/9] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

Currently the HPET mapping is a user-accessible syscall instruction
at a fixed address some of the time.  A sufficiently determined
hacker might be able to guess when.
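
The effect of switching the mapping from the VSYSCALL_NOCACHE
protection to the VVAR_NOCACHE protection can be sketched with plain
bit masks.  The SK_* names below are simplified stand-ins invented for
this example (they omit the accessed, dirty and global bits); only the
bit positions are the architectural x86-64 page-table ones.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page-table entry bits at their architectural positions. */
#define SK_PAGE_PRESENT (1ULL << 0)
#define SK_PAGE_RW      (1ULL << 1)
#define SK_PAGE_USER    (1ULL << 2)
#define SK_PAGE_PWT     (1ULL << 3)
#define SK_PAGE_PCD     (1ULL << 4)
#define SK_PAGE_NX      (1ULL << 63)

/* Simplified stand-ins for the old and new HPET protections. */
#define SK_VSYSCALL_NOCACHE \
	(SK_PAGE_PRESENT | SK_PAGE_USER | SK_PAGE_PCD | SK_PAGE_PWT)
#define SK_VVAR_NOCACHE (SK_VSYSCALL_NOCACHE | SK_PAGE_NX)

/* Returns 1 if user code could execute from a page with this protection. */
static int sk_user_can_execute(uint64_t prot)
{
	assert(prot & SK_PAGE_PRESENT);
	return (prot & SK_PAGE_USER) && !(prot & SK_PAGE_NX);
}
```

In this simplified model the two protections differ only in the NX bit:
user code can still read the HPET page, but can no longer execute it.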

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/include/asm/pgtable_types.h |    4 ++--
 arch/x86/kernel/hpet.c               |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 6a29aed6..013286a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -107,8 +107,8 @@
 #define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
-#define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
+#define __PAGE_KERNEL_VVAR_NOCACHE	(__PAGE_KERNEL_VVAR | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -130,8 +130,8 @@
 #define PAGE_KERNEL_LARGE_NOCACHE	__pgprot(__PAGE_KERNEL_LARGE_NOCACHE)
 #define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
 #define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
 #define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
+#define PAGE_KERNEL_VVAR_NOCACHE	__pgprot(__PAGE_KERNEL_VVAR_NOCACHE)
 
 #define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 6781765..e9f5605 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -71,7 +71,7 @@ static inline void hpet_set_mapping(void)
 {
 	hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
 #ifdef CONFIG_X86_64
-	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VVAR_NOCACHE);
 #endif
 }
 
-- 
1.7.5.2



* [PATCH v5 6/9] x86-64: Remove vsyscall number 3 (venosys)
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (4 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 5/9] x86-64: Map the HPET NX Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:33   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 7/9] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

It has just segfaulted since April 2008 (a4928cff), so I'm pretty sure
that nothing uses it.  And having an empty section makes the linker
script a bit fragile.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/kernel/vmlinux.lds.S |    4 ----
 arch/x86/kernel/vsyscall_64.c |    3 ---
 2 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 98b378d..4f90082 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -182,10 +182,6 @@ SECTIONS
 		*(.vsyscall_2)
 	}
 
-	.vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) {
-		*(.vsyscall_3)
-	}
-
 	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 9b2f3f5..c7fe325 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -209,9 +209,6 @@ vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
 	return 0;
 }
 
-static long __vsyscall(3) venosys_1(void)
-{
-	return -ENOSYS;
 }
 
 /* Assume __initcall executes before all user space. Hopefully kmod
-- 
1.7.5.2



* [PATCH v5 7/9] x86-64: Fill unused parts of the vsyscall page with 0xcc
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (5 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 6/9] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

Jumping to 0x00 might do something depending on the following bytes.
Jumping to 0xcc is a trap.  So fill the unused parts of the vsyscall
page with 0xcc to make it useless for exploits to jump there.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/kernel/vmlinux.lds.S |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4f90082..8017471 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -166,22 +166,20 @@ SECTIONS
 	__vsyscall_0 = .;
 
 	. = VSYSCALL_ADDR;
-	.vsyscall_0 : AT(VLOAD(.vsyscall_0)) {
+	.vsyscall : AT(VLOAD(.vsyscall)) {
 		*(.vsyscall_0)
-	} :user
 
-	. = ALIGN(L1_CACHE_BYTES);
-	.vsyscall_fn : AT(VLOAD(.vsyscall_fn)) {
+		. = ALIGN(L1_CACHE_BYTES);
 		*(.vsyscall_fn)
-	}
 
-	.vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) {
+		. = 1024;
 		*(.vsyscall_1)
-	}
-	.vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) {
+
+		. = 2048;
 		*(.vsyscall_2)
-	}
 
+		. = 4096;  /* Pad the whole page. */
+	} :user =0xcc
 	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
-- 
1.7.5.2
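
The layout that the merged .vsyscall output section produces can be
sketched in userspace C.  This is only an illustration, not kernel
code: fill_vsyscall_page() and the blob arguments are hypothetical
names, and the sketch simplifies the script by giving every section a
1024-byte slot (in the script, .vsyscall_fn shares the first slot and
.vsyscall_2 may run up to the end of the page).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u	/* matches ". = 4096" in the linker script */
#define SLOT_BYTES 1024u	/* .vsyscall_1 at 1024, .vsyscall_2 at 2048 */
#define FILL_BYTE  0xccu	/* int3: a one-byte trap instruction */

/*
 * Hypothetical helper: pad the whole page with 0xcc, then copy up to
 * three code blobs to their fixed slots, the way "=0xcc" fills the
 * gaps between input sections.  Returns 0 on success, or -1 if a blob
 * does not fit in its slot.
 */
static int fill_vsyscall_page(uint8_t *page, const uint8_t *blobs[3],
			      const size_t lens[3])
{
	assert(page != NULL && blobs != NULL && lens != NULL);

	memset(page, FILL_BYTE, PAGE_BYTES);
	for (size_t i = 0; i < 3; i++) {
		if (lens[i] > SLOT_BYTES)
			return -1;
		if (lens[i] != 0)
			memcpy(page + i * SLOT_BYTES, blobs[i], lens[i]);
	}
	return 0;
}
```

Any byte that no section claims stays 0xcc, so a stray jump into the
padding hits a trap instead of whatever the zero bytes plus their
neighbours would have decoded to.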



* [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (6 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 7/9] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-05 19:30   ` Ingo Molnar
                     ` (4 more replies)
  2011-06-05 17:50 ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule Andy Lutomirski
  2011-06-05 20:05 ` [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andrew Lutomirski
  9 siblings, 5 replies; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

There's a fair amount of code in the vsyscall page.  It contains a
syscall instruction (in the gettimeofday fallback) and who knows
what will happen if an exploit jumps into the middle of some other
code.

Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real vsyscalls.
These incantations are useless if entered in the middle.

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.

Less fortunately, current glibc uses the vsyscall for time() even in
dynamic binaries.  So there's a CONFIG_UNSAFE_VSYSCALLS (default y)
option that leaves in the native code for time().  That should go
away in a while when glibc gets fixed.

Some care is taken to make sure that tools like valgrind and
ThreadSpotter still work.

This patch is not perfect: the vread_tsc and vread_hpet functions
are still at a fixed address.  Fixing that might involve making
alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 arch/x86/Kconfig                   |   17 +++
 arch/x86/include/asm/irq_vectors.h |    6 +-
 arch/x86/include/asm/traps.h       |    4 +
 arch/x86/include/asm/vsyscall.h    |    6 +
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/entry_64.S         |    2 +
 arch/x86/kernel/traps.c            |    6 +
 arch/x86/kernel/vsyscall_64.c      |  253 +++++++++++++++++++++---------------
 arch/x86/kernel/vsyscall_emu_64.S  |   42 ++++++
 9 files changed, 233 insertions(+), 104 deletions(-)
 create mode 100644 arch/x86/kernel/vsyscall_emu_64.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index da34972..79e5d8a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1646,6 +1646,23 @@ config COMPAT_VDSO
 
 	  If unsure, say Y.
 
+config UNSAFE_VSYSCALLS
+	def_bool y
+	prompt "Unsafe fast legacy vsyscalls"
+	depends on X86_64
+	---help---
+	  Legacy user code expects to be able to issue three syscalls
+	  by calling fixed addresses in kernel space.  If you say N,
+	  then the kernel traps and emulates these calls.  If you say
+	  Y, then there is actual executable code at a fixed address
+	  to implement time() efficiently.
+
+	  On a system with recent enough glibc (probably 2.14 or
+	  newer) and no static binaries, you can say N without a
+	  performance penalty to improve security.
+
+	  If unsure, say Y.
+
 config CMDLINE_BOOL
 	bool "Built-in kernel command line"
 	---help---
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6e976ee..a563c50 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -17,7 +17,8 @@
  *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
  *  Vectors  32 ... 127 : device interrupts
  *  Vector  128         : legacy int80 syscall interface
- *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 : device interrupts
+ *  Vector  204         : legacy x86_64 vsyscall emulation
+ *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 except 204 : device interrupts
  *  Vectors INVALIDATE_TLB_VECTOR_START ... 255 : special interrupts
  *
  * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
@@ -50,6 +51,9 @@
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR			0x80
 #endif
+#ifdef CONFIG_X86_64
+# define VSYSCALL_EMU_VECTOR		0xcc
+#endif
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..2bae0a5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/kprobes.h>
+
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -38,6 +40,7 @@ asmlinkage void alignment_check(void);
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
+asmlinkage void emulate_vsyscall(void);
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
@@ -64,6 +67,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *, long);
 dotraplinkage void do_machine_check(struct pt_regs *, long);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *, long);
+dotraplinkage void do_emulate_vsyscall(struct pt_regs *, long);
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *, long);
 #endif
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d555973..293ae08 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -31,6 +31,12 @@ extern struct timezone sys_tz;
 
 extern void map_vsyscall(void);
 
+/* Emulation */
+static inline bool in_vsyscall_page(unsigned long addr)
+{
+	return (addr & ~(PAGE_SIZE - 1)) == VSYSCALL_START;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 90b06d4..cc0469a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,7 @@ obj-y			+= probe_roms.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
+obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 72c4a77..e949793 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1123,6 +1123,8 @@ zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
 zeroentry coprocessor_error do_coprocessor_error
 errorentry alignment_check do_alignment_check
 zeroentry simd_coprocessor_error do_simd_coprocessor_error
+zeroentry emulate_vsyscall do_emulate_vsyscall
+
 
 	/* Reload gs selector with exception handling */
 	/* edi:  new selector */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..fbc097a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -872,6 +872,12 @@ void __init trap_init(void)
 	set_bit(SYSCALL_VECTOR, used_vectors);
 #endif
 
+#ifdef CONFIG_X86_64
+	BUG_ON(test_bit(VSYSCALL_EMU_VECTOR, used_vectors));
+	set_system_intr_gate(VSYSCALL_EMU_VECTOR, &emulate_vsyscall);
+	set_bit(VSYSCALL_EMU_VECTOR, used_vectors);
+#endif
+
 	/*
 	 * Should be a barrier for any external CPU state:
 	 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index c7fe325..52ba392 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -32,6 +32,8 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -44,10 +46,7 @@
 #include <asm/desc.h>
 #include <asm/topology.h>
 #include <asm/vgtod.h>
-
-#define __vsyscall(nr) \
-		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-#define __syscall_clobber "r11","cx","memory"
+#include <asm/traps.h>
 
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
@@ -84,73 +83,45 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
-/* RED-PEN may want to readd seq locking, but then the variable should be
- * write-once.
- */
-static __always_inline void do_get_tz(struct timezone * tz)
+static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
+			      const char *message)
 {
-	*tz = VVAR(vsyscall_gtod_data).sys_tz;
+	struct task_struct *tsk;
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
+	if (!show_unhandled_signals || !__ratelimit(&rs))
+		return;
+
+	tsk = current;
+
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
+	       level, tsk->comm, task_pid_nr(tsk),
+	       message,
+	       regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
+	if (!in_vsyscall_page(regs->ip - 2))
+		print_vma_addr(" in ", regs->ip - 2);
+	printk("\n");
 }
 
-static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
-{
-	int ret;
-	asm volatile("syscall"
-		: "=a" (ret)
-		: "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
-		: __syscall_clobber );
-	return ret;
-}
+/* al values for each vsyscall; see vsyscall_emu_64.S for why. */
+static u8 vsyscall_nr_to_al[] = {0xcc, 0xce, 0xf0};
 
-static __always_inline void do_vgettimeofday(struct timeval * tv)
+static int al_to_vsyscall_nr(u8 al)
 {
-	cycle_t now, base, mask, cycle_delta;
-	unsigned seq;
-	unsigned long mult, shift, nsec;
-	cycle_t (*vread)(void);
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
-
-		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!vread)) {
-			gettimeofday(tv,NULL);
-			return;
-		}
-
-		now = vread();
-		base = VVAR(vsyscall_gtod_data).clock.cycle_last;
-		mask = VVAR(vsyscall_gtod_data).clock.mask;
-		mult = VVAR(vsyscall_gtod_data).clock.mult;
-		shift = VVAR(vsyscall_gtod_data).clock.shift;
-
-		tv->tv_sec = VVAR(vsyscall_gtod_data).wall_time_sec;
-		nsec = VVAR(vsyscall_gtod_data).wall_time_nsec;
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
-
-	/* calculate interval: */
-	cycle_delta = (now - base) & mask;
-	/* convert to nsecs: */
-	nsec += (cycle_delta * mult) >> shift;
-
-	while (nsec >= NSEC_PER_SEC) {
-		tv->tv_sec += 1;
-		nsec -= NSEC_PER_SEC;
-	}
-	tv->tv_usec = nsec / NSEC_PER_USEC;
+	int i;
+	for (i = 0; i < ARRAY_SIZE(vsyscall_nr_to_al); i++)
+		if (vsyscall_nr_to_al[i] == al)
+			return i;
+	return -1;
 }
 
-int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
-{
-	if (tv)
-		do_vgettimeofday(tv);
-	if (tz)
-		do_get_tz(tz);
-	return 0;
-}
+#ifdef CONFIG_UNSAFE_VSYSCALLS
 
 /* This will break when the xtime seconds get inaccurate, but that is
  * unlikely */
-time_t __vsyscall(1) vtime(time_t *t)
+time_t __attribute__ ((unused, __section__(".vsyscall_1"))) notrace
+vtime(time_t *t)
 {
 	unsigned seq;
 	time_t result;
@@ -167,48 +138,127 @@ time_t __vsyscall(1) vtime(time_t *t)
 	return result;
 }
 
-/* Fast way to get current CPU and node.
-   This helps to do per node and per CPU caches in user space.
-   The result is not guaranteed without CPU affinity, but usually
-   works out because the scheduler tries to keep a thread on the same
-   CPU.
+#endif /* CONFIG_UNSAFE_VSYSCALLS */
+
+/* If CONFIG_UNSAFE_VSYSCALLS=y, then this is incorrect for vsyscall_nr == 1. */
+static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
+{
+	return VSYSCALL_START + 1024*vsyscall_nr + 2;
+}
 
-   tcache must point to a two element sized long array.
-   All arguments can be NULL. */
-long __vsyscall(2)
-vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
-	unsigned int p;
-	unsigned long j = 0;
-
-	/* Fast cache - only recompute value once per jiffies and avoid
-	   relatively costly rdtscp/cpuid otherwise.
-	   This works because the scheduler usually keeps the process
-	   on the same CPU and this syscall doesn't guarantee its
-	   results anyways.
-	   We do this here because otherwise user space would do it on
-	   its own in a likely inferior way (no access to jiffies).
-	   If you don't like it pass NULL. */
-	if (tcache && tcache->blob[0] == (j = VVAR(jiffies))) {
-		p = tcache->blob[1];
-	} else if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
-		/* Load per CPU data from RDTSCP */
-		native_read_tscp(&p);
-	} else {
-		/* Load per CPU data from GDT */
-		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+	static DEFINE_RATELIMIT_STATE(rs, 3600 * HZ, 3);
+	struct task_struct *tsk;
+	const char *vsyscall_name;
+	int vsyscall_nr;
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	vsyscall_nr = al_to_vsyscall_nr(regs->ax & 0xff);
+	if (vsyscall_nr < 0) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "illegal int 0xcc "
+				  "(exploit attempt?)");
+		goto sigsegv;
 	}
-	if (tcache) {
-		tcache->blob[0] = j;
-		tcache->blob[1] = p;
+
+	if (regs->ip - 2 != vsyscall_intcc_addr(vsyscall_nr)) {
+		if (in_vsyscall_page(regs->ip - 2)) {
+			/* This should not be possible. */
+			warn_bad_vsyscall(KERN_WARNING, regs,
+					  "int 0xcc bogus magic "
+					  "(exploit attempt?)");
+			goto sigsegv;
+		} else {
+			/*
+			 * We allow the call because tools like ThreadSpotter
+			 * might copy the int 0xcc instruction to user memory.
+			 * We make it annoying, though, to try to persuade
+			 * the authors to stop doing that...
+			 */
+			warn_bad_vsyscall(KERN_WARNING, regs,
+					  "int 0xcc in user code "
+					  "(exploit attempt? legacy "
+					  "instrumented code?)");
+		}
 	}
-	if (cpu)
-		*cpu = p & 0xfff;
-	if (node)
-		*node = p >> 12;
-	return 0;
-}
 
+	tsk = current;
+	if (tsk->seccomp.mode) {
+		do_exit(SIGKILL);
+		goto out;
+	}
+
+	switch (vsyscall_nr) {
+	case 0:
+		vsyscall_name = "gettimeofday";
+		ret = sys_gettimeofday(
+			(struct timeval __user *)regs->di,
+			(struct timezone __user *)regs->si);
+		break;
+
+	case 1:
+#ifdef CONFIG_UNSAFE_VSYSCALLS
+		warn_bad_vsyscall(KERN_WARNING, regs, "bogus time() vsyscall "
+				  "emulation (exploit attempt?)");
+		goto sigsegv;
+#else
+		vsyscall_name = "time";
+		ret = sys_time((time_t __user *)regs->di);
+		break;
+#endif
+
+	case 2:
+		vsyscall_name = "getcpu";
+		ret = sys_getcpu((unsigned __user *)regs->di,
+				 (unsigned __user *)regs->si,
+				 0);
+		break;
+
+	default:
+		BUG();
+	}
+
+	if (ret == -EFAULT) {
+		/*
+		 * Bad news -- userspace fed a bad pointer to a vsyscall.
+		 *
+		 * With a real vsyscall, that would have caused SIGSEGV.
+		 * To make writing reliable exploits using the emulated
+		 * vsyscalls harder, generate SIGSEGV here as well.
+		 */
+		warn_bad_vsyscall(KERN_INFO, regs,
+				  "vsyscall fault (exploit attempt?)");
+		goto sigsegv;
+	}
+
+	regs->ax = ret;
+
+	if (__ratelimit(&rs)) {
+		unsigned long caller;
+		if (get_user(caller, (unsigned long __user *)regs->sp))
+			caller = 0;  /* no need to crash on this fault. */
+		printk(KERN_INFO "%s[%d] emulated legacy vsyscall %s(); "
+		       "upgrade your code to avoid a performance hit. "
+		       "ip:%lx sp:%lx caller:%lx",
+		       tsk->comm, task_pid_nr(tsk), vsyscall_name,
+		       regs->ip - 2, regs->sp, caller);
+		if (caller)
+			print_vma_addr(" in ", caller);
+		printk("\n");
+	}
+
+out:
+	local_irq_disable();
+	return;
+
+sigsegv:
+	regs->ip -= 2;  /* The faulting instruction should be the int 0xcc. */
+	force_sig(SIGSEGV, current);
 }
 
 /* Assume __initcall executes before all user space. Hopefully kmod
@@ -264,11 +314,8 @@ void __init map_vsyscall(void)
 
 static int __init vsyscall_init(void)
 {
-	BUG_ON(((unsigned long) &vgettimeofday !=
-			VSYSCALL_ADDR(__NR_vgettimeofday)));
-	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
-	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
-	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
+	BUG_ON(VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE));
+
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
new file mode 100644
index 0000000..7ebde61
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -0,0 +1,42 @@
+/*
+ * vsyscall_emu_64.S: Vsyscall emulation page
+ * Copyright (c) 2011 Andy Lutomirski
+ * Subject to the GNU General Public License, version 2
+*/
+
+#include <linux/linkage.h>
+#include <asm/irq_vectors.h>
+
+/*
+ * These magic incantations are chosen so that they fault if entered anywhere
+ * other than an instruction boundary.  The movb instruction is two bytes, and
+ * the int imm8 instruction is also two bytes, so the only misaligned places
+ * to enter are the immediate values for the two instructions.  0xcc is int3
+ * (always faults), 0xce is into (faults on x86-64, and 32-bit code can't get
+ * here), and 0xf0 is lock (lock int is invalid).
+ *
+ * The unused parts of the page are filled with 0xcc by the linker script.
+ */
+
+.section .vsyscall_0, "a"
+ENTRY(vsyscall_0)
+	movb $0xcc, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_0)
+
+#ifndef CONFIG_UNSAFE_VSYSCALLS
+.section .vsyscall_1, "a"
+ENTRY(vsyscall_1)
+	movb $0xce, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_1)
+#endif
+
+.section .vsyscall_2, "a"
+ENTRY(vsyscall_2)
+	movb $0xf0, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_2)
-- 
1.7.5.2
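
The checks that do_emulate_vsyscall() performs before dispatching can
be sketched as a small userspace function.  This is a simplified
model, not the kernel code: classify() and the verdict names are
hypothetical, VSYSCALL_START is assumed to be 0xffffffffff600000
(consistent with the vsyscall page addresses quoted elsewhere in this
thread), and the seccomp, CONFIG_UNSAFE_VSYSCALLS and -EFAULT paths
are left out.

```c
#include <assert.h>
#include <stdint.h>

#define VSYSCALL_START	0xffffffffff600000ull	/* assumed value */
#define PAGE_BYTES	4096ull

enum verdict {
	EMULATE,		/* genuine stub: run the real syscall */
	EMULATE_AND_WARN,	/* int 0xcc copied into user memory */
	REJECT_BAD_AL,		/* %al is not one of the magic values */
	REJECT_BAD_MAGIC,	/* wrong int 0xcc inside the page */
};

/* al values for each vsyscall, as in the patch. */
static const uint8_t vsyscall_nr_to_al[] = {0xcc, 0xce, 0xf0};

static int al_to_vsyscall_nr(uint8_t al)
{
	for (int i = 0; i < 3; i++)
		if (vsyscall_nr_to_al[i] == al)
			return i;
	return -1;
}

static int in_vsyscall_page(uint64_t addr)
{
	return (addr & ~(PAGE_BYTES - 1)) == VSYSCALL_START;
}

/* Address of the int 0xcc in stub nr: slot base plus the 2-byte movb. */
static uint64_t vsyscall_intcc_addr(int nr)
{
	assert(nr >= 0 && nr < 3);
	return VSYSCALL_START + 1024ull * (uint64_t)nr + 2;
}

/* ip is the saved instruction pointer, i.e. just past the int 0xcc. */
static enum verdict classify(uint64_t ax, uint64_t ip)
{
	int nr = al_to_vsyscall_nr((uint8_t)(ax & 0xff));
	uint64_t insn = ip - 2;	/* int imm8 is two bytes long */

	if (nr < 0)
		return REJECT_BAD_AL;
	if (insn == vsyscall_intcc_addr(nr))
		return EMULATE;
	if (in_vsyscall_page(insn))
		return REJECT_BAD_MAGIC;
	return EMULATE_AND_WARN;
}
```

Both rejected cases correspond to the "goto sigsegv" paths in the
patch; EMULATE_AND_WARN is the case kept for tools that copy the
int 0xcc instruction into user memory.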



* [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (7 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
@ 2011-06-05 17:50 ` Andy Lutomirski
  2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
  2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
  2011-06-05 20:05 ` [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andrew Lutomirski
  9 siblings, 2 replies; 112+ messages in thread
From: Andy Lutomirski @ 2011-06-05 17:50 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
temporary hack to avoid penalizing users who don't build glibc from
git.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 Documentation/feature-removal-schedule.txt |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 1a9446b..94b4470 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -600,3 +600,12 @@ Why:	Superseded by the UVCIOC_CTRL_QUERY ioctl.
 Who:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 
 ----------------------------
+
+What:	CONFIG_UNSAFE_VSYSCALLS (x86_64)
+When:	When glibc 2.14 or newer is ubiquitous.  Perhaps mid-2012.
+Why:	Having user-executable code at a fixed address is a security problem.
+	Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
+	make the time() function slower on glibc versions 2.13 and below.
+Who:	Andy Lutomirski <luto@mit.edu>
+
+----------------------------
-- 
1.7.5.2



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
@ 2011-06-05 19:30   ` Ingo Molnar
  2011-06-05 20:01     ` Andrew Lutomirski
  2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-05 19:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec


* Andy Lutomirski <luto@MIT.EDU> wrote:

> This patch is not perfect: the vread_tsc and vread_hpet functions 
> are still at a fixed address.  Fixing that might involve making 
> alternative patching work in the vDSO.

Can you see any problem with them? Here is what they look like
currently:

ffffffffff600100 <vread_tsc>:
ffffffffff600100:	55                   	push   %rbp
ffffffffff600101:	48 89 e5             	mov    %rsp,%rbp
ffffffffff600104:	66 66 90             	data32 xchg %ax,%ax
ffffffffff600107:	66 66 90             	data32 xchg %ax,%ax
ffffffffff60010a:	0f 31                	rdtsc  
ffffffffff60010c:	89 c1                	mov    %eax,%ecx
ffffffffff60010e:	48 89 d0             	mov    %rdx,%rax
ffffffffff600111:	48 8b 14 25 28 0d 60 	mov    0xffffffffff600d28,%rdx
ffffffffff600118:	ff 
ffffffffff600119:	48 c1 e0 20          	shl    $0x20,%rax
ffffffffff60011d:	48 09 c8             	or     %rcx,%rax
ffffffffff600120:	48 39 d0             	cmp    %rdx,%rax
ffffffffff600123:	73 03                	jae    ffffffffff600128 <vread_tsc+0x28>
ffffffffff600125:	48 89 d0             	mov    %rdx,%rax
ffffffffff600128:	5d                   	pop    %rbp
ffffffffff600129:	c3                   	retq   

ffffffffff60012a <vread_hpet>:
ffffffffff60012a:	55                   	push   %rbp
ffffffffff60012b:	48 89 e5             	mov    %rsp,%rbp
ffffffffff60012e:	8b 04 25 f0 f0 5f ff 	mov    0xffffffffff5ff0f0,%eax
ffffffffff600135:	89 c0                	mov    %eax,%eax
ffffffffff600137:	5d                   	pop    %rbp
ffffffffff600138:	c3                   	retq   

There's no obvious syscall instruction in them that i can see. No 
0x0f 0x05 pattern (even misaligned), no 0xcd-anything.
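
That check is mechanical enough to sketch in C.  The byte array below
is transcribed from the disassembly above (the two functions are
contiguous, 0x100 through 0x138); find_syscall_bytes() is a
hypothetical helper name, and it looks only for the two patterns
mentioned here, at every byte offset rather than only at instruction
boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vread_tsc followed by vread_hpet, copied from the listing above. */
static const uint8_t vread_bytes[] = {
	0x55, 0x48, 0x89, 0xe5, 0x66, 0x66, 0x90, 0x66, 0x66, 0x90,
	0x0f, 0x31, 0x89, 0xc1, 0x48, 0x89, 0xd0, 0x48, 0x8b, 0x14,
	0x25, 0x28, 0x0d, 0x60, 0xff, 0x48, 0xc1, 0xe0, 0x20, 0x48,
	0x09, 0xc8, 0x48, 0x39, 0xd0, 0x73, 0x03, 0x48, 0x89, 0xd0,
	0x5d, 0xc3,
	0x55, 0x48, 0x89, 0xe5, 0x8b, 0x04, 0x25, 0xf0, 0xf0, 0x5f,
	0xff, 0x89, 0xc0, 0x5d, 0xc3,
};

/*
 * Return the offset of the first "syscall" (0x0f 0x05) or "int imm8"
 * opcode byte (0xcd) found at any byte offset, or -1 if neither
 * pattern occurs in the buffer.
 */
static long find_syscall_bytes(const uint8_t *code, size_t len)
{
	assert(code != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		if (code[i] == 0xcd)
			return (long)i;
		if (i + 1 < len && code[i] == 0x0f && code[i + 1] == 0x05)
			return (long)i;
	}
	return -1;
}
```

Calling find_syscall_bytes(vread_bytes, sizeof(vread_bytes)) returns
-1 for the bytes listed; the only 0x0f in the buffer is the first
byte of rdtsc (0x0f 0x31).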

We could even 'tie down' the actual assembly by moving this all to a 
.S - this way we protect against GCC accidentally generating 
something dangerous in there. I suggested that before.

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-05 19:30   ` Ingo Molnar
@ 2011-06-05 20:01     ` Andrew Lutomirski
  2011-06-06  7:39       ` Ingo Molnar
  2011-06-06  9:42       ` pageexec
  0 siblings, 2 replies; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-05 20:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec

On Sun, Jun 5, 2011 at 3:30 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andy Lutomirski <luto@MIT.EDU> wrote:
>
>> This patch is not perfect: the vread_tsc and vread_hpet functions
>> are still at a fixed address.  Fixing that might involve making
>> alternative patching work in the vDSO.
>
> Can you see any problem with them? Here is what they look like
> currently:
>
> ffffffffff600100 <vread_tsc>:
> ffffffffff600100:       55                      push   %rbp
> ffffffffff600101:       48 89 e5                mov    %rsp,%rbp
> ffffffffff600104:       66 66 90                data32 xchg %ax,%ax
> ffffffffff600107:       66 66 90                data32 xchg %ax,%ax
> ffffffffff60010a:       0f 31                   rdtsc
> ffffffffff60010c:       89 c1                   mov    %eax,%ecx
> ffffffffff60010e:       48 89 d0                mov    %rdx,%rax
> ffffffffff600111:       48 8b 14 25 28 0d 60    mov    0xffffffffff600d28,%rdx
> ffffffffff600118:       ff
> ffffffffff600119:       48 c1 e0 20             shl    $0x20,%rax
> ffffffffff60011d:       48 09 c8                or     %rcx,%rax
> ffffffffff600120:       48 39 d0                cmp    %rdx,%rax
> ffffffffff600123:       73 03                   jae    ffffffffff600128 <vread_tsc+0x28>
> ffffffffff600125:       48 89 d0                mov    %rdx,%rax
> ffffffffff600128:       5d                      pop    %rbp
> ffffffffff600129:       c3                      retq
>
> ffffffffff60012a <vread_hpet>:
> ffffffffff60012a:       55                      push   %rbp
> ffffffffff60012b:       48 89 e5                mov    %rsp,%rbp
> ffffffffff60012e:       8b 04 25 f0 f0 5f ff    mov    0xffffffffff5ff0f0,%eax
> ffffffffff600135:       89 c0                   mov    %eax,%eax
> ffffffffff600137:       5d                      pop    %rbp
> ffffffffff600138:       c3                      retq
>
> There's no obvious syscall instruction in them that i can see. No
> 0x0f 0x05 pattern (even misaligned), no 0xcd-anything.

I can't see any problem, but exploit writers are exceedingly clever,
and maybe someone has a use for a piece of the code that isn't a
syscall.  Just as a completely artificial example, here's some buggy
code:

void buggy_function()
{
  attacker_controlled_pointer();
}

long should_be_insecure()
{
  buggy_function();
  return 0;  // We don't want to be exploitable.
}

int main()
{
  if (should_be_insecure())
    chmod("/etc/passwd", 0666);  // Live on the edge!
}

Assume that this code has frame pointers omitted but no other
optimizations.  An exploit could set attacker_controlled_pointer to
0xffffffffff60012e.  Then buggy_function will call the last bit of
vread_hpet, which will set eax to something nonzero, pop the return
address (i.e. the pointer to should_be_insecure) off the stack, then
return to main.  main checks the return value, decides it's nonzero,
and roots the system.

Of course, this is totally artificial and I haven't double-checked my
math, but it's kind of fun to be paranoid.

>
> We could even 'tie down' the actual assembly by moving this all to a
> .S - this way we protect against GCC accidentally generating
> something dangerous in there. I suggested that before.

I have no problem with that suggestion, except that once the current
series makes it into -tip I intend to move vread_tsc and vread_hpet to
the vDSO.  So leaving them alone for now saves work, and they'll be
more maintainable later if they're written in C.

--Andy


* Re: [PATCH v5 0/9] Remove syscall instructions at fixed addresses
  2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
                   ` (8 preceding siblings ...)
  2011-06-05 17:50 ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule Andy Lutomirski
@ 2011-06-05 20:05 ` Andrew Lutomirski
  9 siblings, 0 replies; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-05 20:05 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec,
	Andy Lutomirski

On Sun, Jun 5, 2011 at 1:50 PM, Andy Lutomirski <luto@mit.edu> wrote:

>  x86-64: Remove vsyscall number 3 (venosys)

Torsten Kaiser pointed out that this patch will break bisection due to
a typo.  So please don't apply this series as is.

If there are no further changes I need to make, I'll send out v6 with
the fix in a day or two (unless someone wants it sooner).

--Andy


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-05 20:01     ` Andrew Lutomirski
@ 2011-06-06  7:39       ` Ingo Molnar
  2011-06-06  9:42       ` pageexec
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06  7:39 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec


* Andrew Lutomirski <luto@mit.edu> wrote:

> > We could even 'tie down' the actual assembly by moving this all 
> > to a .S - this way we protect against GCC accidentally generating 
> > something dangerous in there. I suggested that before.
> 
> I have no problem with that suggestion, except that once the 
> current series makes it into -tip I intend to move vread_tsc and 
> vread_hpet to the vDSO.  So leaving them alone for now saves work, 
> and they'll be more maintainable later if they're written in C.

Sure, if they move to the vDSO real soon then moving them to assembly 
becomes moot.

Thanks,

	Ingo


* [tip:x86/vdso] x86-64: Fix alignment of jiffies variable
  2011-06-05 17:50 ` [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable Andy Lutomirski
@ 2011-06-06  8:31   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, hpa, linux-kernel, luto, andi, bp,
	arjan, mingo

Commit-ID:  6879eb2deed7171a81b2f904c9ad14b9648689a7
Gitweb:     http://git.kernel.org/tip/6879eb2deed7171a81b2f904c9ad14b9648689a7
Author:     Andy Lutomirski <luto@mit.edu>
AuthorDate: Sun, 5 Jun 2011 13:50:17 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 5 Jun 2011 21:30:31 +0200

x86-64: Fix alignment of jiffies variable

It's declared __attribute__((aligned(16))) but it's explicitly
not aligned.  This is probably harmless but it's a bit
embarrassing.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/5f3bc5542e9aaa9382d53f153f54373165cdef89.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/vvar.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 341b355..a4eaca4 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -45,7 +45,7 @@
 /* DECLARE_VVAR(offset, type, name) */
 
 DECLARE_VVAR(0, volatile unsigned long, jiffies)
-DECLARE_VVAR(8, int, vgetcpu_mode)
+DECLARE_VVAR(16, int, vgetcpu_mode)
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
 
 #undef DECLARE_VVAR

^ permalink raw reply related	[flat|nested] 112+ messages in thread
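
The alignment fix above comes down to simple arithmetic: DEFINE_VVAR
marks every vvar aligned(16), so each DECLARE_VVAR offset should be a
multiple of 16, and the old offset 8 was not.  A minimal sketch of that
check in plain C follows; offset_is_aligned() and check_vvar_offsets()
are hypothetical helper names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: nonzero if `offset` is a multiple of `align`.
 * `align` must be a power of two.
 */
int offset_is_aligned(size_t offset, size_t align)
{
	return (offset & (align - 1)) == 0;
}

/* Check the offsets that appear in the diff above. */
void check_vvar_offsets(void)
{
	assert(offset_is_aligned(0, 16));	/* jiffies */
	assert(!offset_is_aligned(8, 16));	/* vgetcpu_mode, before the fix */
	assert(offset_is_aligned(16, 16));	/* vgetcpu_mode, after the fix */
	assert(offset_is_aligned(128, 16));	/* vsyscall_gtod_data */
}
```

Calling check_vvar_offsets() passes silently; changing the 16 back to 8
in the third assertion would make it fail.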

* [tip:x86/vdso] x86-64: Document some of entry_64.S
  2011-06-05 17:50 ` [PATCH v5 2/9] x86-64: Document some of entry_64.S Andy Lutomirski
@ 2011-06-06  8:31   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  8b4777a4b50cb0c84c1152eac85d24415fb6ff7d
Gitweb:     http://git.kernel.org/tip/8b4777a4b50cb0c84c1152eac85d24415fb6ff7d
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:18 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 5 Jun 2011 21:30:32 +0200

x86-64: Document some of entry_64.S

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/fc134867cc550977cc996866129e11a16ba0f9ea.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/x86/entry_64.txt |   98 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/entry_64.S     |    2 +
 2 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
new file mode 100644
index 0000000..7869f14
--- /dev/null
+++ b/Documentation/x86/entry_64.txt
@@ -0,0 +1,98 @@
+This file documents some of the kernel entries in
+arch/x86/kernel/entry_64.S.  A lot of this explanation is adapted from
+an email from Ingo Molnar:
+
+http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
+
+The x86 architecture has quite a few different ways to jump into
+kernel code.  Most of these entry points are registered in
+arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
+and arch/x86/ia32/ia32entry.S.
+
+The IDT vector assignments are listed in arch/x86/include/asm/irq_vectors.h.
+
+Some of these entries are:
+
+ - system_call: syscall instruction from 64-bit code.
+
+ - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+   either way.
+
+ - ia32_cstar, ia32_sysenter: syscall and sysenter from 32-bit
+   code
+
+ - interrupt: An array of entries.  Every IDT vector that doesn't
+   explicitly point somewhere else gets set to the corresponding
+   value in interrupts.  These point to a whole array of
+   magically-generated functions that make their way to do_IRQ with
+   the interrupt number as a parameter.
+
+ - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
+   vsyscall emulation.
+
+ - APIC interrupts: Various special-purpose interrupts for things
+   like TLB shootdown.
+
+ - Architecturally-defined exceptions like divide_error.
+
+There are a few complexities here.  The different x86-64 entries
+have different calling conventions.  The syscall and sysenter
+instructions have their own peculiar calling conventions.  Some of
+the IDT entries push an error code onto the stack; others don't.
+IDT entries using the IST alternative stack mechanism need their own
+magic to get the stack frames right.  (You can find some
+documentation in the AMD APM, Volume 2, Chapter 8 and the Intel SDM,
+Volume 3, Chapter 6.)
+
+Dealing with the swapgs instruction is especially tricky.  Swapgs
+toggles whether gs is the kernel gs or the user gs.  The swapgs
+instruction is rather fragile: it must nest perfectly and only in
+single depth, it should only be used if entering from user mode to
+kernel mode and then when returning to user-space, and precisely
+so. If we mess that up even slightly, we crash.
+
+So when we have a secondary entry, already in kernel mode, we *must
+not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
+not switched/swapped yet.
+
+Now, there's a secondary complication: there's a cheap way to test
+which mode the CPU is in and an expensive way.
+
+The cheap way is to pick this info off the entry frame on the kernel
+stack, from the CS of the ptregs area of the kernel stack:
+
+	xorl %ebx,%ebx
+	testl $3,CS+8(%rsp)
+	je error_kernelspace
+	SWAPGS
+
+The expensive (paranoid) way is to read back the MSR_GS_BASE value
+(which is what SWAPGS modifies):
+
+	movl $1,%ebx
+	movl $MSR_GS_BASE,%ecx
+	rdmsr
+	testl %edx,%edx
+	js 1f   /* negative -> in kernel */
+	SWAPGS
+	xorl %ebx,%ebx
+1:	ret
+
+and the whole paranoid/non-paranoid macro complexity is about whether
+to suffer that RDMSR cost.
+
+If we are at an interrupt or user-trap/gate-alike boundary then we can
+use the faster check: the stack will be a reliable indicator of
+whether SWAPGS was already done: if we see that we are a secondary
+entry interrupting kernel mode execution, then we know that the GS
+base has already been switched. If it says that we interrupted
+user-space execution then we must do the SWAPGS.
+
+But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
+which might have triggered right after a normal entry wrote CS to the
+stack but before we executed SWAPGS, then the only safe way to check
+for GS is the slower method: the RDMSR.
+
+So we try only to mark those entry methods 'paranoid' that absolutely
+need the more expensive check for the GS base - and we generate all
+'normal' entry points with the regular (faster) entry macros.
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8a445a0..72c4a77 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -9,6 +9,8 @@
 /*
  * entry.S contains the system-call and fault low-level handling routines.
  *
+ * Some of this is documented in Documentation/x86/entry_64.txt
+ *
  * NOTE: This code handles signal-recognition, which happens every time
  * after an interrupt and after each system call.
  *

^ permalink raw reply related	[flat|nested] 112+ messages in thread
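
The two SWAPGS checks described in entry_64.txt above can be modelled in
C to make the decision logic easier to follow.  This is only a sketch of
the logic, not the entry code itself: the function names are
hypothetical, and the saved CS and GS base are passed in as plain
integers instead of being read from the stack frame or from MSR_GS_BASE.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cheap check: the low two bits of the saved CS hold the privilege
 * level of the interrupted code (0 = kernel, 3 = user).  This mirrors
 * "testl $3,CS+8(%rsp)".
 */
int entered_from_user(uint16_t saved_cs)
{
	return (saved_cs & 3) != 0;
}

/*
 * Paranoid check: rdmsr returns the high half of MSR_GS_BASE in EDX.
 * Kernel addresses have the top bit set, so a negative EDX means the
 * kernel GS base is already active.  This mirrors "testl %edx,%edx; js".
 */
int gs_base_is_kernel(uint64_t gs_base)
{
	uint32_t edx = (uint32_t)(gs_base >> 32);

	return (edx & 0x80000000u) != 0;
}

/* Hypothetical wrapper: nonzero if the entry path must run SWAPGS. */
int need_swapgs(int paranoid, uint16_t saved_cs, uint64_t gs_base)
{
	if (paranoid)
		return !gs_base_is_kernel(gs_base);
	return entered_from_user(saved_cs);
}
```

The NMI case from the text is the one where the two checks disagree: the
saved CS already says "kernel" but GS base is still the user value, so
only the paranoid path gives the right answer.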

* [tip:x86/vdso] x86-64: Give vvars their own page
  2011-06-05 17:50 ` [PATCH v5 3/9] x86-64: Give vvars their own page Andy Lutomirski
@ 2011-06-06  8:32   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  9fd67b4ed0714ab718f1f9bd14c344af336a6df7
Gitweb:     http://git.kernel.org/tip/9fd67b4ed0714ab718f1f9bd14c344af336a6df7
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:19 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 5 Jun 2011 21:30:32 +0200

x86-64: Give vvars their own page

Move vvars out of the vsyscall page into their own page and mark
it NX.

Without this patch, an attacker who can force a daemon to call
some fixed address could wait until the time contains, say,
0xCD80, and then execute the current time.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/b1460f81dc4463d66ea3f2b5ce240f58d48effec.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/fixmap.h        |    1 +
 arch/x86/include/asm/pgtable_types.h |    2 ++
 arch/x86/include/asm/vvar.h          |   22 ++++++++++------------
 arch/x86/kernel/vmlinux.lds.S        |   28 +++++++++++++++++-----------
 arch/x86/kernel/vsyscall_64.c        |    5 +++++
 5 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 4729b2b..460c74e 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -78,6 +78,7 @@ enum fixed_addresses {
 	VSYSCALL_LAST_PAGE,
 	VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
 			    + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
+	VVAR_PAGE,
 	VSYSCALL_HPET,
 #endif
 	FIX_DBGP_BASE,
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d56187c..6a29aed6 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -108,6 +108,7 @@
 #define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -130,6 +131,7 @@
 #define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
 #define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
+#define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
 
 #define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index a4eaca4..de656ac 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -10,15 +10,14 @@
  * In normal kernel code, they are used like any other variable.
  * In user code, they are accessed through the VVAR macro.
  *
- * Each of these variables lives in the vsyscall page, and each
- * one needs a unique offset within the little piece of the page
- * reserved for vvars.  Specify that offset in DECLARE_VVAR.
- * (There are 896 bytes available.  If you mess up, the linker will
- * catch it.)
+ * These variables live in a page of kernel data that has an extra RO
+ * mapping for userspace.  Each variable needs a unique offset within
+ * that page; specify that offset with the DECLARE_VVAR macro.  (If
+ * you mess up, the linker will catch it.)
  */
 
-/* Offset of vars within vsyscall page */
-#define VSYSCALL_VARS_OFFSET (3072 + 128)
+/* Base address of vvars.  This is not ABI. */
+#define VVAR_ADDRESS (-10*1024*1024 - 4096)
 
 #if defined(__VVAR_KERNEL_LDS)
 
@@ -26,17 +25,17 @@
  * right place.
  */
 #define DECLARE_VVAR(offset, type, name) \
-	EMIT_VVAR(name, VSYSCALL_VARS_OFFSET + offset)
+	EMIT_VVAR(name, offset)
 
 #else
 
 #define DECLARE_VVAR(offset, type, name)				\
 	static type const * const vvaraddr_ ## name =			\
-		(void *)(VSYSCALL_START + VSYSCALL_VARS_OFFSET + (offset));
+		(void *)(VVAR_ADDRESS + (offset));
 
 #define DEFINE_VVAR(type, name)						\
-	type __vvar_ ## name						\
-	__attribute__((section(".vsyscall_var_" #name), aligned(16)))
+	type name							\
+	__attribute__((section(".vvar_" #name), aligned(16)))
 
 #define VVAR(name) (*vvaraddr_ ## name)
 
@@ -49,4 +48,3 @@ DECLARE_VVAR(16, int, vgetcpu_mode)
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
 
 #undef DECLARE_VVAR
-#undef VSYSCALL_VARS_OFFSET
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 89aed99..98b378d 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -161,12 +161,6 @@ SECTIONS
 
 #define VVIRT_OFFSET (VSYSCALL_ADDR - __vsyscall_0)
 #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET)
-#define EMIT_VVAR(x, offset) .vsyscall_var_ ## x	\
-	ADDR(.vsyscall_0) + offset		 	\
-	: AT(VLOAD(.vsyscall_var_ ## x)) {     		\
-		*(.vsyscall_var_ ## x)			\
-	}						\
-	x = VVIRT(.vsyscall_var_ ## x);
 
 	. = ALIGN(4096);
 	__vsyscall_0 = .;
@@ -192,19 +186,31 @@ SECTIONS
 		*(.vsyscall_3)
 	}
 
-#define __VVAR_KERNEL_LDS
-#include <asm/vvar.h>
-#undef __VVAR_KERNEL_LDS
-
-	. = __vsyscall_0 + PAGE_SIZE;
+	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
 #undef VLOAD_OFFSET
 #undef VLOAD
 #undef VVIRT_OFFSET
 #undef VVIRT
+
+	__vvar_page = .;
+
+	.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
+
+	      /* Place all vvars at the offsets in asm/vvar.h. */
+#define EMIT_VVAR(name, offset) 		\
+		. = offset;		\
+		*(.vvar_ ## name)
+#define __VVAR_KERNEL_LDS
+#include <asm/vvar.h>
+#undef __VVAR_KERNEL_LDS
 #undef EMIT_VVAR
 
+	} :data
+
+       . = ALIGN(__vvar_page + PAGE_SIZE, PAGE_SIZE);
+
 #endif /* CONFIG_X86_64 */
 
 	/* Init code and data - will be freed after init */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3e68218..3cf1cef 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -284,9 +284,14 @@ void __init map_vsyscall(void)
 {
 	extern char __vsyscall_0;
 	unsigned long physaddr_page0 = __pa_symbol(&__vsyscall_0);
+	extern char __vvar_page;
+	unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);
 
 	/* Note that VSYSCALL_MAPPED_PAGES must agree with the code below. */
 	__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
+	__set_fixmap(VVAR_PAGE, physaddr_vvar_page, PAGE_KERNEL_VVAR);
+	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) !=
+		     (unsigned long)VVAR_ADDRESS);
 }
 
 static int __init vsyscall_init(void)

^ permalink raw reply related	[flat|nested] 112+ messages in thread
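
Two details of the vvar patch above are easy to check by hand.  First,
VVAR_ADDRESS is -10 MiB minus one page, which as a 64-bit address is
0xffffffffff5ff000.  Second, the attack in the changelog relies on the
bytes CD 80 (the encoding of int 0x80) showing up in executable data.
The sketch below works through both; the SK_ macro and the function
names are hypothetical, and contains_int80() assumes x86's little-endian
byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as VVAR_ADDRESS in the patch, as an unsigned value. */
#define SK_VVAR_ADDRESS ((uint64_t)(-10LL * 1024 * 1024 - 4096))

/* User-visible address of a vvar declared at `offset`. */
uint64_t vvar_user_address(uint64_t offset)
{
	return SK_VVAR_ADDRESS + offset;
}

/*
 * Nonzero if the in-memory bytes of `value` (least significant byte
 * first, as on x86) contain the sequence CD 80, i.e. "int $0x80".
 */
int contains_int80(uint64_t value)
{
	int i;

	for (i = 0; i < 7; i++) {
		uint8_t first = (uint8_t)(value >> (8 * i));
		uint8_t second = (uint8_t)(value >> (8 * (i + 1)));

		if (first == 0xCD && second == 0x80)
			return 1;
	}
	return 0;
}
```

With the vvar page mapped NX, a value for which contains_int80() is true
can still be read by user code but can no longer be executed.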

* [tip:x86/vdso] x86-64: Remove kernel.vsyscall64 sysctl
  2011-06-05 17:50 ` [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
@ 2011-06-06  8:32   ` tip-bot for Andy Lutomirski
  2011-12-05 18:27   ` [PATCH v5 4/9] " Matthew Maurer
  1 sibling, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  0d7b8547fb67d5c2a7d954c56b3715b0e708be4a
Gitweb:     http://git.kernel.org/tip/0d7b8547fb67d5c2a7d954c56b3715b0e708be4a
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:20 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 5 Jun 2011 21:30:33 +0200

x86-64: Remove kernel.vsyscall64 sysctl

It's unnecessary overhead in code that's supposed to be highly
optimized.  Removing it allows us to remove one of the two
syscall instructions in the vsyscall page.

The only sensible use for it is for UML users, and it doesn't
fully address inconsistent vsyscall results on UML.  The real
fix for UML is to stop using vsyscalls entirely.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/973ae803fe76f712da4b2740e66dccf452d3b1e4.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/vgtod.h   |    1 -
 arch/x86/kernel/vsyscall_64.c  |   34 +------------------------
 arch/x86/vdso/vclock_gettime.c |   55 +++++++++++++++------------------------
 3 files changed, 22 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 646b4c1..aa5add8 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -11,7 +11,6 @@ struct vsyscall_gtod_data {
 	time_t		wall_time_sec;
 	u32		wall_time_nsec;
 
-	int		sysctl_enabled;
 	struct timezone sys_tz;
 	struct { /* extract of a clocksource struct */
 		cycle_t (*vread)(void);
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3cf1cef..9b2f3f5 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -53,7 +53,6 @@ DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 {
 	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
-	.sysctl_enabled = 1,
 };
 
 void update_vsyscall_tz(void)
@@ -103,15 +102,6 @@ static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
 	return ret;
 }
 
-static __always_inline long time_syscall(long *t)
-{
-	long secs;
-	asm volatile("syscall"
-		: "=a" (secs)
-		: "0" (__NR_time),"D" (t) : __syscall_clobber);
-	return secs;
-}
-
 static __always_inline void do_vgettimeofday(struct timeval * tv)
 {
 	cycle_t now, base, mask, cycle_delta;
@@ -122,8 +112,7 @@ static __always_inline void do_vgettimeofday(struct timeval * tv)
 		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
 
 		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled ||
-			     !vread)) {
+		if (unlikely(!vread)) {
 			gettimeofday(tv,NULL);
 			return;
 		}
@@ -165,8 +154,6 @@ time_t __vsyscall(1) vtime(time_t *t)
 {
 	unsigned seq;
 	time_t result;
-	if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled))
-		return time_syscall(t);
 
 	do {
 		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
@@ -227,22 +214,6 @@ static long __vsyscall(3) venosys_1(void)
 	return -ENOSYS;
 }
 
-#ifdef CONFIG_SYSCTL
-static ctl_table kernel_table2[] = {
-	{ .procname = "vsyscall64",
-	  .data = &vsyscall_gtod_data.sysctl_enabled, .maxlen = sizeof(int),
-	  .mode = 0644,
-	  .proc_handler = proc_dointvec },
-	{}
-};
-
-static ctl_table kernel_root_table2[] = {
-	{ .procname = "kernel", .mode = 0555,
-	  .child = kernel_table2 },
-	{}
-};
-#endif
-
 /* Assume __initcall executes before all user space. Hopefully kmod
    doesn't violate that. We'll find out if it does. */
 static void __cpuinit vsyscall_set_cpu(int cpu)
@@ -301,9 +272,6 @@ static int __init vsyscall_init(void)
 	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
 	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
 	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
-#ifdef CONFIG_SYSCTL
-	register_sysctl_table(kernel_root_table2);
-#endif
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index a724905..cf54813 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -116,21 +116,21 @@ notrace static noinline int do_monotonic_coarse(struct timespec *ts)
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
-	if (likely(gtod->sysctl_enabled))
-		switch (clock) {
-		case CLOCK_REALTIME:
-			if (likely(gtod->clock.vread))
-				return do_realtime(ts);
-			break;
-		case CLOCK_MONOTONIC:
-			if (likely(gtod->clock.vread))
-				return do_monotonic(ts);
-			break;
-		case CLOCK_REALTIME_COARSE:
-			return do_realtime_coarse(ts);
-		case CLOCK_MONOTONIC_COARSE:
-			return do_monotonic_coarse(ts);
-		}
+	switch (clock) {
+	case CLOCK_REALTIME:
+		if (likely(gtod->clock.vread))
+			return do_realtime(ts);
+		break;
+	case CLOCK_MONOTONIC:
+		if (likely(gtod->clock.vread))
+			return do_monotonic(ts);
+		break;
+	case CLOCK_REALTIME_COARSE:
+		return do_realtime_coarse(ts);
+	case CLOCK_MONOTONIC_COARSE:
+		return do_monotonic_coarse(ts);
+	}
+
 	return vdso_fallback_gettime(clock, ts);
 }
 int clock_gettime(clockid_t, struct timespec *)
@@ -139,7 +139,7 @@ int clock_gettime(clockid_t, struct timespec *)
 notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	long ret;
-	if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
+	if (likely(gtod->clock.vread)) {
 		if (likely(tv != NULL)) {
 			BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
 				     offsetof(struct timespec, tv_nsec) ||
@@ -161,27 +161,14 @@ notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 int gettimeofday(struct timeval *, struct timezone *)
 	__attribute__((weak, alias("__vdso_gettimeofday")));
 
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-
-static __always_inline long time_syscall(long *t)
-{
-	long secs;
-	asm volatile("syscall"
-		     : "=a" (secs)
-		     : "0" (__NR_time), "D" (t) : "cc", "r11", "cx", "memory");
-	return secs;
-}
-
+/*
+ * This will break when the xtime seconds get inaccurate, but that is
+ * unlikely
+ */
 notrace time_t __vdso_time(time_t *t)
 {
-	time_t result;
-
-	if (unlikely(!VVAR(vsyscall_gtod_data).sysctl_enabled))
-		return time_syscall(t);
-
 	/* This is atomic on x86_64 so we don't need any locks. */
-	result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
+	time_t result = ACCESS_ONCE(VVAR(vsyscall_gtod_data).wall_time_sec);
 
 	if (t)
 		*t = result;

^ permalink raw reply related	[flat|nested] 112+ messages in thread
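
From user space, the functions touched by the sysctl removal above are
reached through the ordinary libc calls.  A small sketch follows,
assuming a POSIX system; whether time() and clock_gettime() actually go
through the vDSO depends on the libc and kernel in use, and
realtime_skew_seconds() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Read the wall clock two ways and return the absolute difference in
 * whole seconds, or -1 if clock_gettime() fails.
 */
long realtime_skew_seconds(void)
{
	struct timespec ts;
	time_t coarse = time(NULL);

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return -1;

	return labs((long)(ts.tv_sec - coarse));
}
```

Both calls should report the same second, or adjacent seconds if the
reads straddle a tick, with or without the removed sysctl.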

* [tip:x86/vdso] x86-64: Map the HPET NX
  2011-06-05 17:50 ` [PATCH v5 5/9] x86-64: Map the HPET NX Andy Lutomirski
@ 2011-06-06  8:33   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, hpa, linux-kernel, luto, andi, bp,
	arjan, mingo

Commit-ID:  d319bb79afa4039bda6f85661d6bf0c13299ce93
Gitweb:     http://git.kernel.org/tip/d319bb79afa4039bda6f85661d6bf0c13299ce93
Author:     Andy Lutomirski <luto@mit.edu>
AuthorDate: Sun, 5 Jun 2011 13:50:21 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 5 Jun 2011 21:30:33 +0200

x86-64: Map the HPET NX

Currently the HPET mapping is a user-accessible syscall
instruction at a fixed address some of the time.

A sufficiently determined hacker might be able to guess when.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/ab41b525a4ca346b1ca1145d16fb8d181861a8aa.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/pgtable_types.h |    4 ++--
 arch/x86/kernel/hpet.c               |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 6a29aed6..013286a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -107,8 +107,8 @@
 #define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
-#define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
+#define __PAGE_KERNEL_VVAR_NOCACHE	(__PAGE_KERNEL_VVAR | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -130,8 +130,8 @@
 #define PAGE_KERNEL_LARGE_NOCACHE	__pgprot(__PAGE_KERNEL_LARGE_NOCACHE)
 #define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
 #define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
 #define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
+#define PAGE_KERNEL_VVAR_NOCACHE	__pgprot(__PAGE_KERNEL_VVAR_NOCACHE)
 
 #define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 6781765..e9f5605 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -71,7 +71,7 @@ static inline void hpet_set_mapping(void)
 {
 	hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
 #ifdef CONFIG_X86_64
-	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VVAR_NOCACHE);
 #endif
 }
 

^ permalink raw reply related	[flat|nested] 112+ messages in thread
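
The effect of switching the HPET fixmap from PAGE_KERNEL_VSYSCALL_NOCACHE
to PAGE_KERNEL_VVAR_NOCACHE can be seen by looking at the page table
bits.  The sketch below uses the architectural x86-64 PTE bit positions;
the SK_ names and the way the composite values are built are a
simplified model for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86-64 page table entry bits. */
#define SK_PAGE_PRESENT		(1ULL << 0)
#define SK_PAGE_RW		(1ULL << 1)
#define SK_PAGE_USER		(1ULL << 2)
#define SK_PAGE_PWT		(1ULL << 3)
#define SK_PAGE_PCD		(1ULL << 4)
#define SK_PAGE_ACCESSED	(1ULL << 5)
#define SK_PAGE_DIRTY		(1ULL << 6)
#define SK_PAGE_GLOBAL		(1ULL << 8)
#define SK_PAGE_NX		(1ULL << 63)

/* Simplified model of the protections named in the patch. */
#define SK_KERNEL_RX	(SK_PAGE_PRESENT | SK_PAGE_ACCESSED | \
			 SK_PAGE_DIRTY | SK_PAGE_GLOBAL)
#define SK_KERNEL_RO	(SK_KERNEL_RX | SK_PAGE_NX)
#define SK_VSYSCALL_NOCACHE \
	(SK_KERNEL_RX | SK_PAGE_USER | SK_PAGE_PCD | SK_PAGE_PWT)
#define SK_VVAR_NOCACHE \
	(SK_KERNEL_RO | SK_PAGE_USER | SK_PAGE_PCD | SK_PAGE_PWT)

/* Nonzero if user code may fetch instructions from such a page. */
int user_can_execute(uint64_t prot)
{
	return (prot & SK_PAGE_PRESENT) &&
	       (prot & SK_PAGE_USER) &&
	       !(prot & SK_PAGE_NX);
}
```

In this model the only difference between the old and new HPET mapping
is the NX bit; the caching bits and user access are unchanged.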

* [tip:x86/vdso] x86-64: Remove vsyscall number 3 (venosys)
  2011-06-05 17:50 ` [PATCH v5 6/9] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
@ 2011-06-06  8:33   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, hpa, linux-kernel, luto, andi, bp,
	arjan, mingo

Commit-ID:  bb5fe2f78eadf5a52d8dcbf9a57728fd107af97b
Gitweb:     http://git.kernel.org/tip/bb5fe2f78eadf5a52d8dcbf9a57728fd107af97b
Author:     Andy Lutomirski <luto@mit.edu>
AuthorDate: Sun, 5 Jun 2011 13:50:22 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 6 Jun 2011 09:43:14 +0200

x86-64: Remove vsyscall number 3 (venosys)

It has just segfaulted since April 2008 (a4928cff), so I'm pretty
sure that nothing uses it.  And having an empty section makes
the linker script a bit fragile.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/4a4abcf47ecadc269f2391a313576fe6d06acef7.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/vmlinux.lds.S |    4 ----
 arch/x86/kernel/vsyscall_64.c |    5 -----
 2 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 98b378d..4f90082 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -182,10 +182,6 @@ SECTIONS
 		*(.vsyscall_2)
 	}
 
-	.vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) {
-		*(.vsyscall_3)
-	}
-
 	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 9b2f3f5..70a5f6e 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -209,11 +209,6 @@ vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
 	return 0;
 }
 
-static long __vsyscall(3) venosys_1(void)
-{
-	return -ENOSYS;
-}
-
 /* Assume __initcall executes before all user space. Hopefully kmod
    doesn't violate that. We'll find out if it does. */
 static void __cpuinit vsyscall_set_cpu(int cpu)

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [tip:x86/vdso] x86-64: Fill unused parts of the vsyscall page with 0xcc
  2011-06-05 17:50 ` [PATCH v5 7/9] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
@ 2011-06-06  8:34   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  5dfcea629a08b4684a019cd0cb59d0c9129a6c02
Gitweb:     http://git.kernel.org/tip/5dfcea629a08b4684a019cd0cb59d0c9129a6c02
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:23 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 6 Jun 2011 09:43:14 +0200

x86-64: Fill unused parts of the vsyscall page with 0xcc

Jumping to 0x00 might do something depending on the following
bytes. Jumping to 0xcc is a trap.  So fill the unused parts of
the vsyscall page with 0xcc to make it useless for exploits to
jump there.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/ed54bfcfbe50a9070d20ec1edbe0d149e22a4568.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/vmlinux.lds.S |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4f90082..8017471 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -166,22 +166,20 @@ SECTIONS
 	__vsyscall_0 = .;
 
 	. = VSYSCALL_ADDR;
-	.vsyscall_0 : AT(VLOAD(.vsyscall_0)) {
+	.vsyscall : AT(VLOAD(.vsyscall)) {
 		*(.vsyscall_0)
-	} :user
 
-	. = ALIGN(L1_CACHE_BYTES);
-	.vsyscall_fn : AT(VLOAD(.vsyscall_fn)) {
+		. = ALIGN(L1_CACHE_BYTES);
 		*(.vsyscall_fn)
-	}
 
-	.vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) {
+		. = 1024;
 		*(.vsyscall_1)
-	}
-	.vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) {
+
+		. = 2048;
 		*(.vsyscall_2)
-	}
 
+		. = 4096;  /* Pad the whole page. */
+	} :user =0xcc
 	. = ALIGN(__vsyscall_0 + PAGE_SIZE, PAGE_SIZE);
 
 #undef VSYSCALL_ADDR

^ permalink raw reply related	[flat|nested] 112+ messages in thread
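
Editorial sketch of what the linker-script change above produces: a single 4096-byte output section whose gaps are filled with 0xcc, with the three entry points at offsets 0, 1024 and 2048. This is a user-space illustration only. The helper name demo_build_page and the DEMO_* constants are hypothetical, and the .vsyscall_fn placement after the first entry is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_BYTES 4096u
#define DEMO_FILL_BYTE  0xccu	/* int3 on x86: a one-byte trap */

/*
 * Hypothetical helper: lay out three code blobs the way the patched
 * linker script does (". = 1024;", ". = 2048;", ". = 4096;" with
 * "=0xcc" as the fill pattern).
 */
static void demo_build_page(uint8_t page[DEMO_PAGE_BYTES],
			    const uint8_t *code0, size_t len0,
			    const uint8_t *code1, size_t len1,
			    const uint8_t *code2, size_t len2)
{
	/* Each blob must fit in the slot reserved for it. */
	assert(len0 <= 1024 && len1 <= 1024 && len2 <= 2048);

	/* Start from a page that traps everywhere ... */
	memset(page, DEMO_FILL_BYTE, DEMO_PAGE_BYTES);

	/* ... then drop the real code at the fixed offsets. */
	memcpy(page + 0, code0, len0);
	memcpy(page + 1024, code1, len1);
	memcpy(page + 2048, code2, len2);
}
```

Any byte of the page that is not covered by one of the three blobs then decodes as int3, so a jump into the padding traps instead of running whatever zero bytes would have decoded to.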

* [tip:x86/vdso] x86-64: Emulate legacy vsyscalls
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
  2011-06-05 19:30   ` Ingo Molnar
@ 2011-06-06  8:34   ` tip-bot for Andy Lutomirski
  2011-06-06  8:35   ` [tip:x86/vdso] x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build tip-bot for Ingo Molnar
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  d55ed1d30b82f941b37f9dbf5ed4aaf8b3f917e3
Gitweb:     http://git.kernel.org/tip/d55ed1d30b82f941b37f9dbf5ed4aaf8b3f917e3
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:24 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 6 Jun 2011 09:43:14 +0200

x86-64: Emulate legacy vsyscalls

There's a fair amount of code in the vsyscall page.  It contains
a syscall instruction (in the gettimeofday fallback) and who
knows what will happen if an exploit jumps into the middle of
some other code.

Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real
vsyscalls. These incantations are useless if entered in the
middle.

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.

Less fortunately, current glibc uses the vsyscall for time()
even in dynamic binaries.  So there's a CONFIG_UNSAFE_VSYSCALLS
(default y) option that leaves in the native code for time().
That should go away in a while, once glibc gets fixed.

Some care is taken to make sure that tools like valgrind and
ThreadSpotter still work.

This patch is not perfect: the vread_tsc and vread_hpet
functions are still at a fixed address.  Fixing that might
involve making alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/Kconfig                   |   17 +++
 arch/x86/include/asm/irq_vectors.h |    6 +-
 arch/x86/include/asm/traps.h       |    4 +
 arch/x86/include/asm/vsyscall.h    |    6 +
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/entry_64.S         |    2 +
 arch/x86/kernel/traps.c            |    6 +
 arch/x86/kernel/vsyscall_64.c      |  253 +++++++++++++++++++++---------------
 arch/x86/kernel/vsyscall_emu_64.S  |   42 ++++++
 9 files changed, 234 insertions(+), 103 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index da34972..79e5d8a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1646,6 +1646,23 @@ config COMPAT_VDSO
 
 	  If unsure, say Y.
 
+config UNSAFE_VSYSCALLS
+	def_bool y
+	prompt "Unsafe fast legacy vsyscalls"
+	depends on X86_64
+	---help---
+	  Legacy user code expects to be able to issue three syscalls
+	  by calling fixed addresses in kernel space.  If you say N,
+	  then the kernel traps and emulates these calls.  If you say
+	  Y, then there is actual executable code at a fixed address
+	  to implement time() efficiently.
+
+	  On a system with recent enough glibc (probably 2.14 or
+	  newer) and no static binaries, you can say N without a
+	  performance penalty to improve security.
+
+	  If unsure, say Y.
+
 config CMDLINE_BOOL
 	bool "Built-in kernel command line"
 	---help---
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6e976ee..a563c50 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -17,7 +17,8 @@
  *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
  *  Vectors  32 ... 127 : device interrupts
  *  Vector  128         : legacy int80 syscall interface
- *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 : device interrupts
+ *  Vector  204         : legacy x86_64 vsyscall emulation
+ *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 except 204 : device interrupts
  *  Vectors INVALIDATE_TLB_VECTOR_START ... 255 : special interrupts
  *
  * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
@@ -50,6 +51,9 @@
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR			0x80
 #endif
+#ifdef CONFIG_X86_64
+# define VSYSCALL_EMU_VECTOR		0xcc
+#endif
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..2bae0a5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/kprobes.h>
+
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -38,6 +40,7 @@ asmlinkage void alignment_check(void);
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
+asmlinkage void emulate_vsyscall(void);
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
@@ -64,6 +67,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *, long);
 dotraplinkage void do_machine_check(struct pt_regs *, long);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *, long);
+dotraplinkage void do_emulate_vsyscall(struct pt_regs *, long);
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *, long);
 #endif
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d555973..293ae08 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -31,6 +31,12 @@ extern struct timezone sys_tz;
 
 extern void map_vsyscall(void);
 
+/* Emulation */
+static inline bool in_vsyscall_page(unsigned long addr)
+{
+	return (addr & ~(PAGE_SIZE - 1)) == VSYSCALL_START;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 90b06d4..cc0469a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,7 @@ obj-y			+= probe_roms.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
+obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 72c4a77..e949793 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1123,6 +1123,8 @@ zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
 zeroentry coprocessor_error do_coprocessor_error
 errorentry alignment_check do_alignment_check
 zeroentry simd_coprocessor_error do_simd_coprocessor_error
+zeroentry emulate_vsyscall do_emulate_vsyscall
+
 
 	/* Reload gs selector with exception handling */
 	/* edi:  new selector */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..fbc097a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -872,6 +872,12 @@ void __init trap_init(void)
 	set_bit(SYSCALL_VECTOR, used_vectors);
 #endif
 
+#ifdef CONFIG_X86_64
+	BUG_ON(test_bit(VSYSCALL_EMU_VECTOR, used_vectors));
+	set_system_intr_gate(VSYSCALL_EMU_VECTOR, &emulate_vsyscall);
+	set_bit(VSYSCALL_EMU_VECTOR, used_vectors);
+#endif
+
 	/*
 	 * Should be a barrier for any external CPU state:
 	 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 70a5f6e..52ba392 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -32,6 +32,8 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -44,10 +46,7 @@
 #include <asm/desc.h>
 #include <asm/topology.h>
 #include <asm/vgtod.h>
-
-#define __vsyscall(nr) \
-		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-#define __syscall_clobber "r11","cx","memory"
+#include <asm/traps.h>
 
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
@@ -84,73 +83,45 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
-/* RED-PEN may want to readd seq locking, but then the variable should be
- * write-once.
- */
-static __always_inline void do_get_tz(struct timezone * tz)
+static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
+			      const char *message)
 {
-	*tz = VVAR(vsyscall_gtod_data).sys_tz;
+	struct task_struct *tsk;
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
+	if (!show_unhandled_signals || !__ratelimit(&rs))
+		return;
+
+	tsk = current;
+
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx",
+	       level, tsk->comm, task_pid_nr(tsk),
+	       message,
+	       regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
+	if (!in_vsyscall_page(regs->ip - 2))
+		print_vma_addr(" in ", regs->ip - 2);
+	printk("\n");
 }
 
-static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
-{
-	int ret;
-	asm volatile("syscall"
-		: "=a" (ret)
-		: "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
-		: __syscall_clobber );
-	return ret;
-}
+/* al values for each vsyscall; see vsyscall_emu_64.S for why. */
+static u8 vsyscall_nr_to_al[] = {0xcc, 0xce, 0xf0};
 
-static __always_inline void do_vgettimeofday(struct timeval * tv)
+static int al_to_vsyscall_nr(u8 al)
 {
-	cycle_t now, base, mask, cycle_delta;
-	unsigned seq;
-	unsigned long mult, shift, nsec;
-	cycle_t (*vread)(void);
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
-
-		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!vread)) {
-			gettimeofday(tv,NULL);
-			return;
-		}
-
-		now = vread();
-		base = VVAR(vsyscall_gtod_data).clock.cycle_last;
-		mask = VVAR(vsyscall_gtod_data).clock.mask;
-		mult = VVAR(vsyscall_gtod_data).clock.mult;
-		shift = VVAR(vsyscall_gtod_data).clock.shift;
-
-		tv->tv_sec = VVAR(vsyscall_gtod_data).wall_time_sec;
-		nsec = VVAR(vsyscall_gtod_data).wall_time_nsec;
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
-
-	/* calculate interval: */
-	cycle_delta = (now - base) & mask;
-	/* convert to nsecs: */
-	nsec += (cycle_delta * mult) >> shift;
-
-	while (nsec >= NSEC_PER_SEC) {
-		tv->tv_sec += 1;
-		nsec -= NSEC_PER_SEC;
-	}
-	tv->tv_usec = nsec / NSEC_PER_USEC;
+	int i;
+	for (i = 0; i < ARRAY_SIZE(vsyscall_nr_to_al); i++)
+		if (vsyscall_nr_to_al[i] == al)
+			return i;
+	return -1;
 }
 
-int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
-{
-	if (tv)
-		do_vgettimeofday(tv);
-	if (tz)
-		do_get_tz(tz);
-	return 0;
-}
+#ifdef CONFIG_UNSAFE_VSYSCALLS
 
 /* This will break when the xtime seconds get inaccurate, but that is
  * unlikely */
-time_t __vsyscall(1) vtime(time_t *t)
+time_t __attribute__ ((unused, __section__(".vsyscall_1"))) notrace
+vtime(time_t *t)
 {
 	unsigned seq;
 	time_t result;
@@ -167,46 +138,127 @@ time_t __vsyscall(1) vtime(time_t *t)
 	return result;
 }
 
-/* Fast way to get current CPU and node.
-   This helps to do per node and per CPU caches in user space.
-   The result is not guaranteed without CPU affinity, but usually
-   works out because the scheduler tries to keep a thread on the same
-   CPU.
+#endif /* CONFIG_UNSAFE_VSYSCALLS */
+
+/* If CONFIG_UNSAFE_VSYSCALLS=y, then this is incorrect for vsyscall_nr == 1. */
+static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
+{
+	return VSYSCALL_START + 1024*vsyscall_nr + 2;
+}
 
-   tcache must point to a two element sized long array.
-   All arguments can be NULL. */
-long __vsyscall(2)
-vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
-	unsigned int p;
-	unsigned long j = 0;
-
-	/* Fast cache - only recompute value once per jiffies and avoid
-	   relatively costly rdtscp/cpuid otherwise.
-	   This works because the scheduler usually keeps the process
-	   on the same CPU and this syscall doesn't guarantee its
-	   results anyways.
-	   We do this here because otherwise user space would do it on
-	   its own in a likely inferior way (no access to jiffies).
-	   If you don't like it pass NULL. */
-	if (tcache && tcache->blob[0] == (j = VVAR(jiffies))) {
-		p = tcache->blob[1];
-	} else if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
-		/* Load per CPU data from RDTSCP */
-		native_read_tscp(&p);
-	} else {
-		/* Load per CPU data from GDT */
-		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+	static DEFINE_RATELIMIT_STATE(rs, 3600 * HZ, 3);
+	struct task_struct *tsk;
+	const char *vsyscall_name;
+	int vsyscall_nr;
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	vsyscall_nr = al_to_vsyscall_nr(regs->ax & 0xff);
+	if (vsyscall_nr < 0) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "illegal int 0xcc "
+				  "(exploit attempt?)");
+		goto sigsegv;
 	}
-	if (tcache) {
-		tcache->blob[0] = j;
-		tcache->blob[1] = p;
+
+	if (regs->ip - 2 != vsyscall_intcc_addr(vsyscall_nr)) {
+		if (in_vsyscall_page(regs->ip - 2)) {
+			/* This should not be possible. */
+			warn_bad_vsyscall(KERN_WARNING, regs,
+					  "int 0xcc bogus magic "
+					  "(exploit attempt?)");
+			goto sigsegv;
+		} else {
+			/*
+			 * We allow the call because tools like ThreadSpotter
+			 * might copy the int 0xcc instruction to user memory.
+			 * We make it annoying, though, to try to persuade
+			 * the authors to stop doing that...
+			 */
+			warn_bad_vsyscall(KERN_WARNING, regs,
+					  "int 0xcc in user code "
+					  "(exploit attempt? legacy "
+					  "instrumented code?)");
+		}
 	}
-	if (cpu)
-		*cpu = p & 0xfff;
-	if (node)
-		*node = p >> 12;
-	return 0;
+
+	tsk = current;
+	if (tsk->seccomp.mode) {
+		do_exit(SIGKILL);
+		goto out;
+	}
+
+	switch (vsyscall_nr) {
+	case 0:
+		vsyscall_name = "gettimeofday";
+		ret = sys_gettimeofday(
+			(struct timeval __user *)regs->di,
+			(struct timezone __user *)regs->si);
+		break;
+
+	case 1:
+#ifdef CONFIG_UNSAFE_VSYSCALLS
+		warn_bad_vsyscall(KERN_WARNING, regs, "bogus time() vsyscall "
+				  "emulation (exploit attempt?)");
+		goto sigsegv;
+#else
+		vsyscall_name = "time";
+		ret = sys_time((time_t __user *)regs->di);
+		break;
+#endif
+
+	case 2:
+		vsyscall_name = "getcpu";
+		ret = sys_getcpu((unsigned __user *)regs->di,
+				 (unsigned __user *)regs->si,
+				 0);
+		break;
+
+	default:
+		BUG();
+	}
+
+	if (ret == -EFAULT) {
+		/*
+		 * Bad news -- userspace fed a bad pointer to a vsyscall.
+		 *
+		 * With a real vsyscall, that would have caused SIGSEGV.
+		 * To make writing reliable exploits using the emulated
+		 * vsyscalls harder, generate SIGSEGV here as well.
+		 */
+		warn_bad_vsyscall(KERN_INFO, regs,
+				  "vsyscall fault (exploit attempt?)");
+		goto sigsegv;
+	}
+
+	regs->ax = ret;
+
+	if (__ratelimit(&rs)) {
+		unsigned long caller;
+		if (get_user(caller, (unsigned long __user *)regs->sp))
+			caller = 0;  /* no need to crash on this fault. */
+		printk(KERN_INFO "%s[%d] emulated legacy vsyscall %s(); "
+		       "upgrade your code to avoid a performance hit. "
+		       "ip:%lx sp:%lx caller:%lx",
+		       tsk->comm, task_pid_nr(tsk), vsyscall_name,
+		       regs->ip - 2, regs->sp, caller);
+		if (caller)
+			print_vma_addr(" in ", caller);
+		printk("\n");
+	}
+
+out:
+	local_irq_disable();
+	return;
+
+sigsegv:
+	regs->ip -= 2;  /* The faulting instruction should be the int 0xcc. */
+	force_sig(SIGSEGV, current);
 }
 
 /* Assume __initcall executes before all user space. Hopefully kmod
@@ -262,11 +314,8 @@ void __init map_vsyscall(void)
 
 static int __init vsyscall_init(void)
 {
-	BUG_ON(((unsigned long) &vgettimeofday !=
-			VSYSCALL_ADDR(__NR_vgettimeofday)));
-	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
-	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
-	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
+	BUG_ON(VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE));
+
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
new file mode 100644
index 0000000..7ebde61
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -0,0 +1,42 @@
+/*
+ * vsyscall_emu_64.S: Vsyscall emulation page
+ * Copyright (c) 2011 Andy Lutomirski
+ * Subject to the GNU General Public License, version 2
+*/
+
+#include <linux/linkage.h>
+#include <asm/irq_vectors.h>
+
+/*
+ * These magic incantations are chosen so that they fault if entered anywhere
+ * other than an instruction boundary.  The movb instruction is two bytes, and
+ * the int imm8 instruction is also two bytes, so the only misaligned places
+ * to enter are the immediate values for the two instructions.  0xcc is int3
+ * (always faults), 0xce is into (faults on x86-64, and 32-bit code can't get
+ * here), and 0xf0 is lock (lock int is invalid).
+ *
+ * The unused parts of the page are filled with 0xcc by the linker script.
+ */
+
+.section .vsyscall_0, "a"
+ENTRY(vsyscall_0)
+	movb $0xcc, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_0)
+
+#ifndef CONFIG_UNSAFE_VSYSCALLS
+.section .vsyscall_1, "a"
+ENTRY(vsyscall_1)
+	movb $0xce, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_1)
+#endif
+
+.section .vsyscall_2, "a"
+ENTRY(vsyscall_2)
+	movb $0xf0, %al
+	int $VSYSCALL_EMU_VECTOR
+	ret
+END(vsyscall_2)

^ permalink raw reply related	[flat|nested] 112+ messages in thread
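
Editorial sketch of the checks described in the patch above, written as plain user-space C so the byte layout and the address test can be followed without the kernel tree. Everything prefixed demo_ or DEMO_ is hypothetical. The value used for VSYSCALL_START is an assumption (the usual x86-64 vsyscall page address), and the seccomp, CONFIG_UNSAFE_VSYSCALLS and -EFAULT paths of do_emulate_vsyscall() are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed address of the fixed vsyscall page on x86-64. */
#define DEMO_VSYSCALL_START 0xffffffffff600000ULL
#define DEMO_PAGE_SIZE      4096ULL

/* AL marker for each vsyscall number, as in vsyscall_nr_to_al[]. */
static const uint8_t demo_nr_to_al[3] = { 0xcc, 0xce, 0xf0 };

/* Encode one stub: movb $imm8,%al (b0 ib); int $0xcc (cd cc); ret (c3). */
static void demo_encode_stub(int nr, uint8_t out[5])
{
	assert(nr >= 0 && nr < 3);
	out[0] = 0xb0;			/* movb opcode */
	out[1] = demo_nr_to_al[nr];	/* immediate: int3, into or lock */
	out[2] = 0xcd;			/* int imm8 opcode */
	out[3] = 0xcc;			/* vector; also int3 if entered here */
	out[4] = 0xc3;			/* ret */
}

/* Reverse lookup, mirroring al_to_vsyscall_nr(). */
static int demo_al_to_nr(uint8_t al)
{
	for (int i = 0; i < 3; i++)
		if (demo_nr_to_al[i] == al)
			return i;
	return -1;
}

/* Address of the int instruction inside slot nr (slot base + 2). */
static uint64_t demo_intcc_addr(int nr)
{
	return DEMO_VSYSCALL_START + 1024ULL * (uint64_t)nr + 2;
}

static int demo_in_vsyscall_page(uint64_t addr)
{
	return (addr & ~(DEMO_PAGE_SIZE - 1)) == DEMO_VSYSCALL_START;
}

enum demo_verdict { DEMO_EMULATE, DEMO_EMULATE_WITH_WARNING, DEMO_SIGSEGV };

/* ip_after is the saved ip, which points just past the 2-byte int. */
static enum demo_verdict demo_classify(uint64_t ip_after, uint8_t al)
{
	int nr = demo_al_to_nr(al);
	uint64_t int_addr = ip_after - 2;

	if (nr < 0)
		return DEMO_SIGSEGV;		/* unknown AL marker */
	if (int_addr == demo_intcc_addr(nr))
		return DEMO_EMULATE;		/* the expected slot */
	if (demo_in_vsyscall_page(int_addr))
		return DEMO_SIGSEGV;		/* wrong place in the page */
	return DEMO_EMULATE_WITH_WARNING;	/* int 0xcc copied to user code */
}
```

The point of tying the AL marker to the trap address is that an attacker who reaches the int instruction with an arbitrary AL, or from the wrong slot, gets SIGSEGV rather than a syscall of their choosing.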

* [tip:x86/vdso] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-05 17:50 ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule Andy Lutomirski
@ 2011-06-06  8:34   ` tip-bot for Andy Lutomirski
  2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
  1 sibling, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-06  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, hpa, linux-kernel, luto, andi, bp,
	arjan, mingo

Commit-ID:  38172403a97828ae2ea12281b19528582d6625d4
Gitweb:     http://git.kernel.org/tip/38172403a97828ae2ea12281b19528582d6625d4
Author:     Andy Lutomirski <luto@mit.edu>
AuthorDate: Sun, 5 Jun 2011 13:50:25 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 6 Jun 2011 09:43:15 +0200

x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule

CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
temporary hack to avoid penalizing users who don't build glibc
from git.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/4de62bfbf6974f14d0e9d9ae37cc137dbc926a30.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/feature-removal-schedule.txt |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 1a9446b..94b4470 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -600,3 +600,12 @@ Why:	Superseded by the UVCIOC_CTRL_QUERY ioctl.
 Who:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 
 ----------------------------
+
+What:	CONFIG_UNSAFE_VSYSCALLS (x86_64)
+When:	When glibc 2.14 or newer is ubiquitous.  Perhaps mid-2012.
+Why:	Having user-executable code at a fixed address is a security problem.
+	Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
+	make the time() function slower on glibc versions 2.13 and below.
+Who:	Andy Lutomirski <luto@mit.edu>
+
+----------------------------

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [tip:x86/vdso] x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
  2011-06-05 19:30   ` Ingo Molnar
  2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
@ 2011-06-06  8:35   ` tip-bot for Ingo Molnar
  2011-06-07  7:49   ` [tip:x86/vdso] x86-64: Emulate legacy vsyscalls tip-bot for Andy Lutomirski
  2011-06-07  8:03   ` tip-bot for Andy Lutomirski
  4 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Ingo Molnar @ 2011-06-06  8:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, hpa, linux-kernel, luto, andi, bp,
	arjan, mingo

Commit-ID:  764611c8dfb5be43611296affd92272b7215daea
Gitweb:     http://git.kernel.org/tip/764611c8dfb5be43611296affd92272b7215daea
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Mon, 6 Jun 2011 10:00:01 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 6 Jun 2011 10:00:01 +0200

x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build

Factor out seccomp->mode access into an inline function and make
it work in the !CONFIG_SECCOMP case as well.

Cc: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/vsyscall_64.c |    2 +-
 include/linux/seccomp.h       |   10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 52ba392..285af7a 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -188,7 +188,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 	}
 
 	tsk = current;
-	if (tsk->seccomp.mode) {
+	if (seccomp_mode(&tsk->seccomp)) {
 		do_exit(SIGKILL);
 		goto out;
 	}
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 167c333..cc7a4e9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -19,6 +19,11 @@ static inline void secure_computing(int this_syscall)
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long);
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return s->mode;
+}
+
 #else /* CONFIG_SECCOMP */
 
 #include <linux/errno.h>
@@ -37,6 +42,11 @@ static inline long prctl_set_seccomp(unsigned long arg2)
 	return -EINVAL;
 }
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SECCOMP */
 
 #endif /* _LINUX_SECCOMP_H */

^ permalink raw reply related	[flat|nested] 112+ messages in thread
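
Editorial sketch of the accessor pattern used in the fix above: callers go through one inline function, and the configured-out build supplies a stub that returns 0, so the call site needs no #ifdef. The DEMO_SECCOMP macro and all demo_ names are hypothetical stand-ins for CONFIG_SECCOMP, seccomp_t and seccomp_mode(). The placeholder member in the stub struct is there only to keep the example valid ISO C.

```c
#include <assert.h>

/* Stand-in for the Kconfig symbol; build with -DDEMO_SECCOMP=1 to flip it. */
#ifndef DEMO_SECCOMP
#define DEMO_SECCOMP 0
#endif

#if DEMO_SECCOMP

typedef struct { int mode; } demo_seccomp_t;

static inline int demo_seccomp_mode(const demo_seccomp_t *s)
{
	return s->mode;
}

#else /* !DEMO_SECCOMP */

/* No real state when the feature is compiled out. */
typedef struct { char unused; } demo_seccomp_t;

static inline int demo_seccomp_mode(const demo_seccomp_t *s)
{
	(void)s;	/* nothing to read */
	return 0;	/* never in a seccomp mode */
}

#endif /* DEMO_SECCOMP */

struct demo_task {
	demo_seccomp_t seccomp;
};

/* The call site compiles unchanged in both configurations. */
static int demo_must_kill(const struct demo_task *tsk)
{
	return demo_seccomp_mode(&tsk->seccomp) != 0;
}
```

With the feature disabled the compiler can fold the stub to a constant, so the check costs nothing in that configuration.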

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-05 17:50 ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule Andy Lutomirski
  2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
@ 2011-06-06  8:46   ` Linus Torvalds
  2011-06-06  9:31     ` Andi Kleen
                       ` (2 more replies)
  1 sibling, 3 replies; 112+ messages in thread
From: Linus Torvalds @ 2011-06-06  8:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec

On Mon, Jun 6, 2011 at 2:50 AM, Andy Lutomirski <luto@mit.edu> wrote:
> CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
> temporary hack to avoid penalizing users who don't build glibc from
> git.

I really hate that name.

Do you have *any* reason to call this "unsafe"?

Seriously. The whole patch series just seems annoying.

                  Linus

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
@ 2011-06-06  9:31     ` Andi Kleen
  2011-06-06 10:39       ` pageexec
  2011-06-06 10:24     ` [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS Ingo Molnar
  2011-06-06 14:34     ` [tip:x86/vdso] " tip-bot for Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Andi Kleen @ 2011-06-06  9:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ingo Molnar, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Andrew Morton, Arjan van de Ven,
	Jan Beulich, richard -rw- weinberger, Mikael Pettersson,
	Andi Kleen, Brian Gerst, Louis Rilling, Valdis.Kletnieks,
	pageexec

On Mon, Jun 06, 2011 at 05:46:41PM +0900, Linus Torvalds wrote:
> On Mon, Jun 6, 2011 at 2:50 AM, Andy Lutomirski <luto@mit.edu> wrote:
> > CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
> > temporary hack to avoid penalizing users who don't build glibc from
> > git.
> 
> I really hate that name.
> 
> Do you have *any* reason to call this "unsafe"?
> 
> Seriously. The whole patch series just seems annoying.

and assumes everyone is using glibc which is just wrong.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-05 20:01     ` Andrew Lutomirski
  2011-06-06  7:39       ` Ingo Molnar
@ 2011-06-06  9:42       ` pageexec
  2011-06-06 11:19         ` Andrew Lutomirski
  2011-06-06 15:41         ` Ingo Molnar
  1 sibling, 2 replies; 112+ messages in thread
From: pageexec @ 2011-06-06  9:42 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Lutomirski
  Cc: x86, Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Linus Torvalds, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 5 Jun 2011 at 16:01, Andrew Lutomirski wrote:

> On Sun, Jun 5, 2011 at 3:30 PM, Ingo Molnar <mingo@elte.hu> wrote:
[...]
> > ffffffffff60012a <vread_hpet>:
> > ffffffffff60012a:       55                      push   %rbp
> > ffffffffff60012b:       48 89 e5                mov    %rsp,%rbp
> > ffffffffff60012e:       8b 04 25 f0 f0 5f ff    mov    0xffffffffff5ff0f0,%eax
> > ffffffffff600135:       89 c0                   mov    %eax,%eax
> > ffffffffff600137:       5d                      pop    %rbp
> > ffffffffff600138:       c3                      retq
> >
> > There's no obvious syscall instruction in them that i can see. No
> > 0x0f 0x05 pattern (even misaligned), no 0xcd-anything.
> 
> I can't see any problem, but exploit writers are exceedingly clever,
> and maybe someone has a use for a piece of the code that isn't a
> syscall.  Just as a completely artificial example, here's some buggy
> code:

what you're describing here is a classical ret2libc (in modern marketing
speak, ROP) attack. in general, having an executable ret insn (with an
optional pop even) at a fixed address is very useful, especially for the
all too classical case of stack overflows where the attacker may already
know of a 'good' function pointer somewhere on the stack but in order to
have the cpu reach it, he needs to pop enough bytes off of it. guess what
they'll use this ret at a fixed address for...

as i said in private already, for security there's only one real solution
here: make the vsyscall page non-executable (as i did in PaX years ago)
and move or redirect every entry point to the vdso. yes, that kills the
fast path performance until glibc stops using the vsyscall page.

another thing to consider for using the int xx redirection scheme (speaking
of which, it should just be an int3): it enables new kinds of 'nop sled'
sequences that IDS/IPS systems will be unaware of, not exactly a win for
the security conscious/aware people who this change is supposed to serve.

> I have no problem with that suggestion, except that once the current
> series makes it into -tip I intend to move vread_tsc and vread_hpet to
> the vDSO.  So leaving them alone for now saves work, and they'll be
> more maintainable later if they're written in C.

imho, moving everything to and executing from the vdso page is the only
viable solution if you really want to fix the security aspect of the
vsyscall mess. it's worked fine for PaX for years now ;).


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
  2011-06-06  9:31     ` Andi Kleen
@ 2011-06-06 10:24     ` Ingo Molnar
  2011-06-06 11:20       ` pageexec
  2011-06-06 12:19       ` Ted Ts'o
  2011-06-06 14:34     ` [tip:x86/vdso] " tip-bot for Ingo Molnar
  2 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 10:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Jun 6, 2011 at 2:50 AM, Andy Lutomirski <luto@mit.edu> wrote:
> > CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
> > temporary hack to avoid penalizing users who don't build glibc from
> > git.
> 
> I really hate that name.
> 
> Do you have *any* reason to call this "unsafe"?

No, there's no reason at all for that. That naming is borderline 
security FUD and last time i saw the series i considered renaming
it but got distracted :-)

How about the patch below? COMPAT_VSYSCALLS looks like a good logical 
extension to the COMPAT_VDSO we already have.

CONFIG_FIXED_VSYSCALLS seemed a bit awkward to me, and it does not
convey the compat nature of them.

Thanks,

	Ingo

--------------->
From 1593843e2ada6d6832d0de4d633aacd997dc3a45 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 6 Jun 2011 12:13:40 +0200
Subject: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS

Linus pointed out that the UNSAFE_VSYSCALLS naming was inherently
bad: it suggests that there's something unsafe about enabling them,
while in reality they only have any security effect in the presence
of some *other* security hole.

So rename it to CONFIG_COMPAT_VSYSCALLS and fix the documentation
and Kconfig text to correctly explain the purpose of this change.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/BANLkTimrhO8QfBqQsH_Q13ghRH2P%2BZP7AA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/feature-removal-schedule.txt |    7 ++++---
 arch/x86/Kconfig                           |   17 ++++++++++-------
 arch/x86/kernel/vsyscall_64.c              |    8 ++++----
 arch/x86/kernel/vsyscall_emu_64.S          |    2 +-
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 94b4470..4282ab2 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -601,10 +601,11 @@ Who:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 
 ----------------------------
 
-What:	CONFIG_UNSAFE_VSYSCALLS (x86_64)
+What:	CONFIG_COMPAT_VSYSCALLS (x86_64)
 When:	When glibc 2.14 or newer is ubitquitous.  Perhaps mid-2012.
-Why:	Having user-executable code at a fixed address is a security problem.
-	Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
+Why:	Having user-executable syscall invoking code at a fixed address makes
+	it easier for attackers to exploit security holes.
+	Turning off CONFIG_COMPAT_VSYSCALLS mostly removes the risk but will
 	make the time() function slower on glibc versions 2.13 and below.
 Who:	Andy Lutomirski <luto@mit.edu>
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 79e5d8a..30041d8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1646,20 +1646,23 @@ config COMPAT_VDSO
 
 	  If unsure, say Y.
 
-config UNSAFE_VSYSCALLS
+config COMPAT_VSYSCALLS
 	def_bool y
-	prompt "Unsafe fast legacy vsyscalls"
+	prompt "Fixed address legacy vsyscalls"
 	depends on X86_64
 	---help---
 	  Legacy user code expects to be able to issue three syscalls
-	  by calling fixed addresses in kernel space.  If you say N,
-	  then the kernel traps and emulates these calls.  If you say
-	  Y, then there is actual executable code at a fixed address
-	  to implement time() efficiently.
+	  by calling fixed addresses.  If you say N, then the kernel
+	  traps and emulates these calls.  If you say Y, then there is
+	  actual executable code at a fixed address to implement time()
+	  efficiently.
 
 	  On a system with recent enough glibc (probably 2.14 or
 	  newer) and no static binaries, you can say N without a
-	  performance penalty to improve security
+	  performance penalty to improve security: having no fixed
+	  address userspace-executable syscall invoking code makes
+	  it harder for both remote and local attackers to exploit
+	  security holes.
 
 	  If unsure, say Y.
 
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 285af7a..27d49b7 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -116,7 +116,7 @@ static int al_to_vsyscall_nr(u8 al)
 	return -1;
 }
 
-#ifdef CONFIG_UNSAFE_VSYSCALLS
+#ifdef CONFIG_COMPAT_VSYSCALLS
 
 /* This will break when the xtime seconds get inaccurate, but that is
  * unlikely */
@@ -138,9 +138,9 @@ vtime(time_t *t)
 	return result;
 }
 
-#endif /* CONFIG_UNSAFE_VSYSCALLS */
+#endif /* CONFIG_COMPAT_VSYSCALLS */
 
-/* If CONFIG_UNSAFE_VSYSCALLS=y, then this is incorrect for vsyscall_nr == 1. */
+/* If CONFIG_COMPAT_VSYSCALLS=y, then this is incorrect for vsyscall_nr == 1. */
 static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
 {
 	return VSYSCALL_START + 1024*vsyscall_nr + 2;
@@ -202,7 +202,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 		break;
 
 	case 1:
-#ifdef CONFIG_UNSAFE_VSYSCALLS
+#ifdef CONFIG_COMPAT_VSYSCALLS
 		warn_bad_vsyscall(KERN_WARNING, regs, "bogus time() vsyscall "
 				  "emulation (exploit attempt?)");
 		goto sigsegv;
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
index 7ebde61..2d53e26 100644
--- a/arch/x86/kernel/vsyscall_emu_64.S
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -25,7 +25,7 @@ ENTRY(vsyscall_0)
 	ret
 END(vsyscall_0)
 
-#ifndef CONFIG_UNSAFE_VSYSCALLS
+#ifndef CONFIG_COMPAT_VSYSCALLS
 .section .vsyscall_1, "a"
 ENTRY(vsyscall_1)
 	movb $0xce, %al

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06  9:31     ` Andi Kleen
@ 2011-06-06 10:39       ` pageexec
  2011-06-06 13:56         ` Linus Torvalds
                           ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: pageexec @ 2011-06-06 10:39 UTC (permalink / raw)
  To: Linus Torvalds, Andi Kleen
  Cc: Andy Lutomirski, Ingo Molnar, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Andrew Morton, Arjan van de Ven,
	Jan Beulich, richard -rw- weinberger, Mikael Pettersson,
	Andi Kleen, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 11:31, Andi Kleen wrote:

> On Mon, Jun 06, 2011 at 05:46:41PM +0900, Linus Torvalds wrote:
> > On Mon, Jun 6, 2011 at 2:50 AM, Andy Lutomirski <luto@mit.edu> wrote:
> > > CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
> > > temporary hack to avoid penalizing users who don't build glibc from
> > > git.

[didn't get your mail directly (yet?), so i'm replying here]

> > I really hate that name.
> > 
> > Do you have *any* reason to call this "unsafe"?

any userland executable code at a universally (read: across any and all 2.6+ linux
boxes) fixed address is not secure (no really, it's worse, it's simply insane design,
there's a reason the vdso got randomized eventually), it's the prime vehicle used by
both reliable userland and kernel exploits who need to execute syscalls and/or pop
the stack until something useful is reached, etc. not to mention the generic snippets
of both code and data (marketing word: ROP) that one may find in there.

> > Seriously. The whole patch series just seems annoying.

what is annoying is your covering up of security fixes on grounds that you don't want
to help script kiddies (a bullshit argument as it were) but at the same time question
proactive security measures (one can debate the implementation, see my other mail) that
would *actually* prevent the same kiddies from writing textbook exploits.

but hey, spouting security to journalists works so much better for marketing, doesn't it.

> and assumes everyone is using glibc which is just wrong.

the libc is irrelevant, they can all be fixed up to use the vdso entry points if they
haven't been doing it already. already deployed systems will simply continue to use
their flawed kernel and libc, they're not affected.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06  9:42       ` pageexec
@ 2011-06-06 11:19         ` Andrew Lutomirski
  2011-06-06 11:56           ` pageexec
  2011-06-06 15:41         ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-06 11:19 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Jan Beulich, richard -rw- weinberger, Mikael Pettersson,
	Andi Kleen, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Mon, Jun 6, 2011 at 5:42 AM,  <pageexec@freemail.hu> wrote:
>>
>> I can't see any problem, but exploit writers are exceedingly clever,
>> and maybe someone has a use for a piece of the code that isn't a
>> syscall.  Just as a completely artificial example, here's some buggy
>> code:
>
> what you're describing here is a classical ret2libc (in modern marketing
> speak, ROP) attack. in general, having an executable ret insn (with an
> optional pop even) at a fixed address is very useful, especially for the
> all too classical case of stack overflows where the attacker may already
> know of a 'good' function pointer somewhere on the stack but in order to
> have the cpu reach it, he needs to pop enough bytes off of it. guess what
> they'll use this ret at a fixed address for...

I'm even more convinced now that exploit writers are exceedingly clever.

>
> as i said in private already, for security there's only one real solution
> here: make the vsyscall page non-executable (as i did in PaX years ago)
> and move or redirect every entry point to the vdso. yes, that kills the
> fast path performance until glibc stops using the vsyscall page.

I'm still unconvinced.

I would be happy to submit a version where the entire sequence is just
int 0xcc and the kernel emulates the ret instruction as well.  But I'm
not convinced that using a page fault to emulate the vsyscalls is any
better, and it's less flexible, slower, and it could impact a fast
path in the kernel.

>
> another thing to consider for using the int xx redirection scheme (speaking
> of which, it should just be an int3):

Why?  0xcd 0xcc traps no matter what offset you enter it at.
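
For illustration, here is a minimal C sketch of that byte-level
argument.  The names classify_at and int_cc are hypothetical (they are
not from the patch), and the classifier only knows the two trap
encodings discussed here: the two-byte "int imm8" form (0xcd followed
by the vector) and the one-byte int3 (0xcc).

```c
#include <stddef.h>
#include <stdint.h>

enum insn { INSN_OTHER, INSN_INT3, INSN_INT_IMM8 };

/* The two bytes of "int $0xcc": opcode 0xcd, then the vector 0xcc. */
static const uint8_t int_cc[] = { 0xcd, 0xcc };

/*
 * Hypothetical helper: classify the instruction that would be decoded
 * if execution started at code[off].  Only the two trap encodings are
 * recognized; everything else is reported as INSN_OTHER.
 */
static enum insn classify_at(const uint8_t *code, size_t len, size_t off)
{
	if (off >= len)
		return INSN_OTHER;
	if (code[off] == 0xcc)			/* one-byte int3 */
		return INSN_INT3;
	if (code[off] == 0xcd && off + 1 < len)	/* int imm8 */
		return INSN_INT_IMM8;
	return INSN_OTHER;
}
```

Entering the sequence at offset 0 decodes as int $0xcc; entering at
offset 1 lands on the vector byte 0xcc, which is itself the int3
opcode.  Either way the CPU traps.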

> it enables new kinds of 'nop sled'
> sequences that IDS/IPS systems will be unaware of, not exactly a win for
> the security conscious/aware people who this change is supposed to serve.

I think that's only because the patch allows int 0xcc to exist at any
address.  That's only because not doing so will apparently break one
particular commercial program.

I'm happy to break said program, and it sounds like the maintainer
will fix it up quickly.  I checked, and at least recent versions of
valgrind would not be affected (contrary to what I said earlier).

I don't think that making the page NX is viable until at least 2012.
We really want to wait for that glibc release.

(Yes, I know that not everyone uses glibc.  But the only remotely
relevant alternative out there that I can find that would be affected
is Go, and I'm sure that'll get fixed up in short order.)

--Andy

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 10:24     ` [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS Ingo Molnar
@ 2011-06-06 11:20       ` pageexec
  2011-06-06 12:47         ` Ingo Molnar
  2011-06-06 12:19       ` Ted Ts'o
  1 sibling, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 11:20 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andy Lutomirski, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 12:24, Ingo Molnar wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Mon, Jun 6, 2011 at 2:50 AM, Andy Lutomirski <luto@mit.edu> wrote:
> > > CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
> > > temporary hack to avoid penalizing users who don't build glibc from
> > > git.
> > 
> > I really hate that name.
> > 
> > Do you have *any* reason to call this "unsafe"?
> 
> No, there's no reason at all for that. That naming is borderline 
> security FUD and last time i saw the series i considered renaming
> it but got distracted :-)

security FUD? for real? ;) does that mean that you guys would accept a patch
that would map the vdso at a fixed address for old times' sake? if not, on
what grounds would you refuse it? see, you can't have it both ways.

the fixed address of the vsyscall page *is* a very real security problem, it
should have never been accepted as such and it's high time it went away finally
in 2011AD.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 11:19         ` Andrew Lutomirski
@ 2011-06-06 11:56           ` pageexec
  2011-06-06 12:43             ` Andrew Lutomirski
  2011-06-06 14:01             ` Linus Torvalds
  0 siblings, 2 replies; 112+ messages in thread
From: pageexec @ 2011-06-06 11:56 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Jan Beulich, richard -rw- weinberger, Mikael Pettersson,
	Andi Kleen, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 7:19, Andrew Lutomirski wrote:

> On Mon, Jun 6, 2011 at 5:42 AM,  <pageexec@freemail.hu> wrote:
> > as i said in private already, for security there's only one real solution
> > here: make the vsyscall page non-executable (as i did in PaX years ago)
> > and move or redirect every entry point to the vdso. yes, that kills the
> > fast path performance until glibc stops using the vsyscall page.
> 
> I'm still unconvinced.
> 
> I would be happy to submit a version where the entire sequence is just
> int 0xcc and the kernel emulates the ret instruction as well.  But I'm
> not convinced that using a page fault to emulate the vsyscalls is any
> better,

> and it's less flexible

why? as in, what kind of flexibility do you need that int xx can provide but a page
fault cannot?

> slower

vs. int xx? it's the same kind of cpu exception (lots of cycles to transition to
kernel mode), and the vsyscall address checks can be moved arbitrarily close
to the entry point, if that really matters.

> and it could impact a fast path in the kernel.

a page fault is never a fast path, after all the cpu has just taken an exception
(vs. the syscall/sysenter style actually fast user->kernel transition) and is
about to make page table changes (and possibly TLB flushes).
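
as a rough sketch of what such an address check could look like (the
helper name is hypothetical, and it assumes that the three remaining
entry points sit at the start of 1024-byte slots in the 4096-byte page
at 0xffffffffff600000, matching the 1024*vsyscall_nr spacing used by
vsyscall_intcc_addr()):

```c
#include <stdint.h>

#define VSYSCALL_START	0xffffffffff600000ULL	/* fixed x86-64 vsyscall page */
#define VSYSCALL_SIZE	4096ULL			/* one page */
#define VSYSCALL_SLOT	1024ULL			/* spacing between entry points */
#define NR_VSYSCALLS	3			/* assumption: venosys slot is gone */

/*
 * Hypothetical helper: map the address of a faulting instruction fetch
 * to a vsyscall number, or return -1 if the address is not exactly one
 * of the entry points.
 */
static int fault_addr_to_vsyscall_nr(uint64_t addr)
{
	uint64_t off;

	if (addr < VSYSCALL_START || addr - VSYSCALL_START >= VSYSCALL_SIZE)
		return -1;

	off = addr - VSYSCALL_START;
	if (off % VSYSCALL_SLOT != 0 || off / VSYSCALL_SLOT >= NR_VSYSCALLS)
		return -1;

	return (int)(off / VSYSCALL_SLOT);
}
```

anything that is not an exact entry point returns -1 here, so jumping
into the middle of a slot would not be emulated.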

> > another thing to consider for using the int xx redirection scheme (speaking
> > of which, it should just be an int3):
> 
> Why?  0xcd 0xcc traps no matter what offset you enter it at.

but you're wasting/abusing an IDT entry for no real gain (and it's lots of code
for such a little change). also placing sw interrupts among hw ones is what can
result in (ab)use like this:

http://www.invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf

(yes, a highly theoretical case but since you were wondering about exploit writers... ;)

> > it enables new kinds of 'nop sled'
> > sequences that IDS/IPS systems will be unaware of, not exactly a win for
> > the security conscious/aware people who this change is supposed to serve.
> 
> I think that's only because the patch allows int 0xcc to exist at any
> address.  That's only because not doing so will apparently break one
> particular commercial program.

well, that's one you know about now (from a small userbase that tracks very
recent lkml discussions), but you don't know what else out there uses the same
tricks...

> I don't think that making the page NX is viable until at least 2012.
> We really want to wait for that glibc release.

sure, if for mainline users performance impact is that much more important
then timing the nx approach for later is no problem (i'll just have to do
more work till then to revert/adapt this in PaX ;).

> (Yes, I know that not everyone uses glibc.  But the only remotely
> relevant alternative out there that I can find that would be affected
> is Go, and I'm sure that'll get fixed up in short order.)

i don't understand why people are so 'worked up' about glibc or not glibc.

it's *irrelevant*. this change you propose would go into future kernels,
it would not affect existing ones, obviously. therefore anyone possibly
affected would have to update his kernel first at which point they have
no excuse to not update their libc of whatever flavour as well.

in other words, it's at most a distro issue (in that they have to be actively
aware of not forgetting about the libc update for this future kernel), no
existing user will care or be affected.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 10:24     ` [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS Ingo Molnar
  2011-06-06 11:20       ` pageexec
@ 2011-06-06 12:19       ` Ted Ts'o
  2011-06-06 12:33         ` Andrew Lutomirski
  2011-06-06 12:37         ` Ingo Molnar
  1 sibling, 2 replies; 112+ messages in thread
From: Ted Ts'o @ 2011-06-06 12:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, pageexec

On Mon, Jun 06, 2011 at 12:24:19PM +0200, Ingo Molnar wrote:
>  
> -What:	CONFIG_UNSAFE_VSYSCALLS (x86_64)
> +What:	CONFIG_COMPAT_VSYSCALLS (x86_64)
>  When:	When glibc 2.14 or newer is ubitquitous.  Perhaps mid-2012.
> -Why:	Having user-executable code at a fixed address is a security problem.
> -	Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
> +Why:	Having user-executable syscall invoking code at a fixed address makes
> +	it easier for attackers to exploit security holes.
> +	Turning off CONFIG_COMPAT_VSYSCALLS mostly removes the risk but will
>  	make the time() function slower on glibc versions 2.13 and below.
>  Who:	Andy Lutomirski <luto@mit.edu>

I'd suggest 2013 or 2014, at least.  People using Ubuntu LTS and RHEL
6 are stuck back at glibc 2.11, and many of those users do like being
able to upgrade to newer kernels.  And there are probably a large
number of static binaries around.

Maybe in 2012 or so we change the default to be 'no' (and I'd suggest
adding a comment in the feature-removal-schedule.txt file that this
will also break static binaries).

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 12:19       ` Ted Ts'o
@ 2011-06-06 12:33         ` Andrew Lutomirski
  2011-06-06 12:37         ` Ingo Molnar
  1 sibling, 0 replies; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-06 12:33 UTC (permalink / raw)
  To: Ted Ts'o, Ingo Molnar, Linus Torvalds, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec

On Mon, Jun 6, 2011 at 8:19 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Jun 06, 2011 at 12:24:19PM +0200, Ingo Molnar wrote:
>>
>> -What:        CONFIG_UNSAFE_VSYSCALLS (x86_64)
>> +What:        CONFIG_COMPAT_VSYSCALLS (x86_64)
>>  When:        When glibc 2.14 or newer is ubitquitous.  Perhaps mid-2012.
>> -Why: Having user-executable code at a fixed address is a security problem.
>> -     Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
>> +Why: Having user-executable syscall invoking code at a fixed address makes
>> +     it easier for attackers to exploit security holes.
>> +     Turning off CONFIG_COMPAT_VSYSCALLS mostly removes the risk but will
>>       make the time() function slower on glibc versions 2.13 and below.
>>  Who: Andy Lutomirski <luto@mit.edu>
>
> I'd suggest 2013 or 2014, at least.  People using Ubuntu LTS and RHEL
> 6 are stuck back at glibc 2.11, and many of those users do like being
> able to upgrade to newer kernels.  And there are probably a large
> number of static binaries around.
>
> Maybe in 2012 or so we change the default to be 'no' (and I'd suggest
> adding a comment in the feature-removal-schedule.txt file that this
> will also break static binaries).

It doesn't actually break them; it just slows them down.  But I'm not
particular about the date.

--Andy

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 12:19       ` Ted Ts'o
  2011-06-06 12:33         ` Andrew Lutomirski
@ 2011-06-06 12:37         ` Ingo Molnar
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 12:37 UTC (permalink / raw)
  To: Ted Ts'o, Linus Torvalds, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks, pageexec


* Ted Ts'o <tytso@mit.edu> wrote:

> On Mon, Jun 06, 2011 at 12:24:19PM +0200, Ingo Molnar wrote:
> >  
> > -What:	CONFIG_UNSAFE_VSYSCALLS (x86_64)
> > +What:	CONFIG_COMPAT_VSYSCALLS (x86_64)
> >  When:	When glibc 2.14 or newer is ubitquitous.  Perhaps mid-2012.
> > -Why:	Having user-executable code at a fixed address is a security problem.
> > -	Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
> > +Why:	Having user-executable syscall invoking code at a fixed address makes
> > +	it easier for attackers to exploit security holes.
> > +	Turning off CONFIG_COMPAT_VSYSCALLS mostly removes the risk but will
> >  	make the time() function slower on glibc versions 2.13 and below.
> >  Who:	Andy Lutomirski <luto@mit.edu>
> 
> I'd suggest 2013 or 2014, at least.  People using Ubuntu LTS and 
> RHEL 6 are stuck back at glibc 2.11, and many of those users do 
> like being able to upgrade to newer kernels.  And there are 
> probably a large number of static binaries around.
> 
> Maybe in 2012 or so we change the default to be 'no' (and I'd 
> suggest adding a comment in the feature-removal-schedule.txt file 
> that this will also break static binaries).

There is no breakage of binaries at all, just a small slowdown for 
time() calls - but even that is fixable via a simple, backportable 
few-liner glibc patch.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 11:56           ` pageexec
@ 2011-06-06 12:43             ` Andrew Lutomirski
  2011-06-06 13:58               ` pageexec
  2011-06-06 14:01             ` Linus Torvalds
  1 sibling, 1 reply; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-06 12:43 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Jan Beulich, richard -rw- weinberger, Mikael Pettersson,
	Andi Kleen, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Mon, Jun 6, 2011 at 7:56 AM,  <pageexec@freemail.hu> wrote:
> On 6 Jun 2011 at 7:19, Andrew Lutomirski wrote:
>
>> On Mon, Jun 6, 2011 at 5:42 AM,  <pageexec@freemail.hu> wrote:
>> > as i said in private already, for security there's only one real solution
>> > here: make the vsyscall page non-executable (as i did in PaX years ago)
>> > and move or redirect every entry point to the vdso. yes, that kills the
>> > fast path performance until glibc stops using the vsyscall page.
>>
>> I'm still unconvinced.
>>
>> I would be happy to submit a version where the entire sequence is just
>> int 0xcc and the kernel emulates the ret instruction as well.  But I'm
>> not convinced that using a page fault to emulate the vsyscalls is any
>> better,
>
>> and it's less flexible
>
> why? as in, what kind of flexibility do you need that int xx can provide but a page
> fault cannot?

The ability to make time() fast when configured that way.

>
>> slower
>
> vs. int xx? it's the same kind of cpu exception (lots of cycles to transition to
> kernel mode), and the vsyscall address checks can be moved arbitrarily close
> to the entry point, if that really matters.
>
>> and it could impact a fast path in the kernel.
>
> a page fault is never a fast path, after all the cpu has just taken an exception
> (vs. the syscall/sysenter style actually fast user->kernel transition) and is
> about to make page table changes (and possibly TLB flushes).

Sure it is.  It's a path that's optimized carefully and needs to be as
fast as possible.  Just because it's annoyingly slow doesn't mean we
get to make it even slower.

>
>> > another thing to consider for using the int xx redirection scheme (speaking
>> > of which, it should just be an int3):
>>
>> Why?  0xcd 0xcc traps no matter what offset you enter it at.
>
> but you're wasting/abusing an IDT entry for no real gain (and it's lots of code
> for such a little change). also placing sw interrupts among hw ones is what can
> result in (ab)use like this:

I think it's less messy than mucking with the page fault handler.

>
> http://www.invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf
>
> (yes, a highly theoretical case but since you were wondering about exploit writers... ;)

Cute.  Impossible to use against the more paranoid variant of int
0xcc, I think, because it verifies that RIP does, in fact, point to
somewhere where an int 0xcc belongs.
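
A minimal sketch of that RIP check, as an illustration only: the
address arithmetic is the one from vsyscall_intcc_addr() in the patch,
while rip_is_expected() and INT_INSN_LEN are hypothetical names.  It
assumes the saved RIP points just past the two-byte int instruction,
and it ignores the CONFIG_COMPAT_VSYSCALLS=y special case for
vsyscall_nr == 1 that the comment in the patch mentions.

```c
#include <stdbool.h>
#include <stdint.h>

#define VSYSCALL_START	0xffffffffff600000ULL	/* fixed x86-64 vsyscall page */
#define INT_INSN_LEN	2			/* 0xcd 0xcc */

/* Same arithmetic as vsyscall_intcc_addr() in the patch. */
static uint64_t vsyscall_intcc_addr(int vsyscall_nr)
{
	return VSYSCALL_START + 1024ULL * (uint64_t)vsyscall_nr + 2;
}

/*
 * Hypothetical check: after the trap, the saved RIP points just past
 * the int instruction, so step back over it and compare with the one
 * address where that vsyscall's int 0xcc is supposed to live.
 */
static bool rip_is_expected(uint64_t saved_rip, int vsyscall_nr)
{
	if (vsyscall_nr < 0)
		return false;
	return saved_rip - INT_INSN_LEN == vsyscall_intcc_addr(vsyscall_nr);
}
```

With a check like this, an int 0xcc issued from anywhere else (for
example from attacker-controlled memory) fails the comparison instead
of being emulated.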

>
>> > it enables new kinds of 'nop sled'
>> > sequences that IDS/IPS systems will be unaware of, not exactly a win for
>> > the security conscious/aware people who this change is supposed to serve.
>>
>> I think that's only because the patch allows int 0xcc to exist at any
>> address.  That's only because not doing so will apparently break one
>> particular commercial program.
>
> well, that's one you know about now (from a small userbase that tracks very
> recent lkml discussions), but you don't know what else out there uses the same
> tricks...
>
>> I don't think that making the page NX is viable until at least 2012.
>> We really want to wait for that glibc release.
>
> sure, if for mainline users performance impact is that much more important
> then timing the nx approach for later is no problem (i'll just have to do
> more work till then to revert/adapt this in PaX ;).

I think my approach is at least as paranoid as yours.  Why won't it
work (if int 0xcc is disallowed from outside the vsyscall page)?

>
>> (Yes, I know that not everyone uses glibc.  But the only remotely
>> relevant alternative out there that I can find that would be affected
>> is Go, and I'm sure that'll get fixed up in short order.)
>
> i don't understand why people are so 'worked up' about glibc or not glibc.
>
> it's *irrelevant*. this change you propose would go into future kernels,
> it would not affect existing ones, obviously. therefore anyone possibly
> affected would have to update his kernel first at which point they have
> no excuse to not update their libc of whatever flavour as well.

That's not true.  New kernels are explicitly supposed to work with old
userspace.  Lots of users of old RHEL versions, for example,
nonetheless run new kernels.

--Andy

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 11:20       ` pageexec
@ 2011-06-06 12:47         ` Ingo Molnar
  2011-06-06 12:48           ` Ingo Molnar
  2011-06-06 18:04           ` pageexec
  0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 12:47 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 12:24, Ingo Molnar wrote:
> 
> > 
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > On Mon, Jun 6, 2011 at 2:50 AM, Andy Lutomirski <luto@mit.edu> wrote:
> > > > CONFIG_UNSAFE_VSYSCALLS was added in the previous patch as a
> > > > temporary hack to avoid penalizing users who don't build glibc from
> > > > git.
> > > 
> > > I really hate that name.
> > > 
> > > Do you have *any* reason to call this "unsafe"?
> > 
> > No, there's no reason at all for that. That naming is borderline 
> > security FUD and last time i saw the series i considered renaming
> > it but got distracted :-)
> 
> security FUD? for real? ;) [...]

'Borderline' security FUD! :-)

> [...] does that mean that you guys would accept a patch that would 
> map the vdso at a fixed address for old times' sake? if not, on 
> what grounds would you refuse it? see, you can't have it both ways.

You can actually do that by enabling CONFIG_COMPAT_VDSO=y.

> the fixed address of the vsyscall page *is* a very real security 
> problem, it should have never been accepted as such and it's high 
> time it went away finally in 2011AD.

It's only a security problem if there's a security hole elsewhere.

The thing is, and i'm not sure whether you realize or recognize it, 
but these measures *are* two-edged swords.

Yes, the upside is that they reduce the risks associated with 
security holes - but only statistically so.

The downside is that having such a measure in place makes it somewhat 
less likely that those bugs will be found and fixed in the future: if 
a bug is not exploitable then people like Spender won't spend time 
exploiting and making a big deal out of them, right?

And yes, it might be embarrassing to see easy exploits and we might 
roll eyes at the associated self-promotion circus but it will be one 
more bug found, the reasons for the bug will be examined, potentially 
avoiding a whole class of similar bugs *for sure*.

Can you guarantee that security bugs will be found and fixed with the 
same kind of intensity even if we make their exploitation (much) 
harder? I don't think you can make such a guarantee.

So as long as we are trading bugs-fixed-for-sure against statistical 
safety we have to be mindful of the downsides of such a tradeoff ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 12:47         ` Ingo Molnar
@ 2011-06-06 12:48           ` Ingo Molnar
  2011-06-06 18:04           ` pageexec
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 12:48 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* Ingo Molnar <mingo@elte.hu> wrote:

> > [...] does that mean that you guys would accept a patch that 
> > would map the vdso at a fixed address for old times' sake? if 
> > not, on what grounds would you refuse it? see, you can't have it 
> > both ways.
> 
> You can actually do that by enabling CONFIG_COMPAT_VDSO=y.

(on 32-bit x86)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 10:39       ` pageexec
@ 2011-06-06 13:56         ` Linus Torvalds
  2011-06-06 18:46           ` pageexec
  2011-06-06 14:44         ` Ingo Molnar
  2011-06-06 14:52         ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Linus Torvalds @ 2011-06-06 13:56 UTC (permalink / raw)
  To: pageexec
  Cc: Andi Kleen, Andy Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Mon, Jun 6, 2011 at 7:39 PM,  <pageexec@freemail.hu> wrote:
>
> what is annoying is your covering up of security fixes on grounds that you don't want
> to help script kiddies (a bullshit argument as it were) but at the same time question
> proactive security measures (one can debate the implementation, see my other mail) that
> would *actually* prevent the same kiddies from writing textbook exploits.

Shut up unless you have any real arguments. I know you have your
hangups, and I just don't care.

Calling the old vdso "UNSAFE" as a config option is just plain stupid.
It's a politicized name, with no good reason except for your political
agenda. And when I call it out as such, you just spout the same tired
old security nonsense.

I'm happy with perhaps moving away from the fixed-address vdso, but
that does not excuse bad naming and non-descriptive crap like the
feature-removal thing, and all the insanity going on in the thread. If
the config option is about removing the legacy vdso, then CALL IT
THAT, instead of spouting idiotic and irrelevant nonsense.

                       Linus

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 12:43             ` Andrew Lutomirski
@ 2011-06-06 13:58               ` pageexec
  2011-06-06 14:07                 ` Brian Gerst
  2011-06-06 15:26                 ` Ingo Molnar
  0 siblings, 2 replies; 112+ messages in thread
From: pageexec @ 2011-06-06 13:58 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, x86, Thomas Gleixner, linux-kernel, Jesper Juhl,
	Borislav Petkov, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Jan Beulich, richard -rw- weinberger, Mikael Pettersson,
	Andi Kleen, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 8:43, Andrew Lutomirski wrote:

> >> and it's less flexible
> >
> > why? as in, what kind of flexibility do you need that int xx can provide but a page
> > fault cannot?
> 
> The ability to make time() fast when configured that way.

true, nx and fast time() at vsyscall addresses will never mix. but it's a temporary
problem for anyone who cares, a trivial glibc patch fixes it.

> >> and it could impact a fast path in the kernel.
> >
> > a page fault is never a fast path, after all the cpu has just taken an exception
> > (vs. the syscall/sysenter style actually fast user->kernel transition) and is
> > about to make page table changes (and possibly TLB flushes).
> 
> Sure it is.  It's a path that's optimized carefully and needs to be as
> fast as possible.  Just because it's annoyingly slow doesn't mean we
> get to make it even slower.

sorry, but stating that the pf handler is a fast path doesn't make it so ;).
the typical pf is caused by userland to either fill in non-present pages
or do c-o-w, a few well predicted conditional branches in those paths are
simply not measurable (actually, those conditional branches would not be
on those paths, at least they aren't in PaX). seriously, try it ;).

> >> > another thing to consider for using the int xx redirection scheme (speaking
> >> > of which, it should just be an int3):
> >>
> >> Why?  0xcd 0xcc traps no matter what offset you enter it at.
> >
> > but you're wasting/abusing an IDT entry for no real gain (and it's lots of code
> > for such a little change). also placing sw interrupts among hw ones is what can
> > result in (ab)use like this:
> 
> I think it's less messy than mucking with the page fault handler.

do you know what that mucking looks like? ;) prepare for the most complex code
you've ever seen (it's in __bad_area_nosemaphore):

#ifdef CONFIG_X86_64
	if (mm && (error_code & PF_INSTR) && mm->context.vdso) {
		if (regs->ip == (unsigned long)vgettimeofday) {
			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, gettimeofday);
			return;
		} else if (regs->ip == (unsigned long)vtime) {
			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, clock_gettime);
			return;
		} else if (regs->ip == (unsigned long)vgetcpu) {
			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, getcpu);
			return;
		}
	}
#endif

if there's complexity involved with the nx vsyscall page approach, it's certainly
not in the pf handler, rather in the moving of data/code into the vdso (something
that you have done or will do too eventually, so it's not an argument really against
my approach).
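
The redirection idea in the snippet above can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not the PaX
code: struct vdso_layout, its offsets and redirect_vsyscall_ip() are
hypothetical names invented for illustration. The only facts relied on
are that the legacy entry points sit at fixed addresses 1024 bytes apart
(the patch in this thread computes them as VSYSCALL_START + 1024*nr) and
that the fault must be an instruction fetch at exactly one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed base of the legacy x86-64 vsyscall page. */
#define VSYSCALL_START UINT64_C(0xffffffffff600000)
/* Entry point of legacy vsyscall number nr (entries are 1024 bytes apart). */
#define VSYSCALL_ADDR(nr) (VSYSCALL_START + UINT64_C(1024) * (uint64_t)(nr))

enum { VSYS_GETTIMEOFDAY = 0, VSYS_TIME = 1, VSYS_GETCPU = 2, VSYS_COUNT = 3 };

/* Hypothetical description of one process's vdso mapping. */
struct vdso_layout {
	uint64_t base;               /* where the vdso is mapped in this process */
	uint64_t offset[VSYS_COUNT]; /* made-up offsets of the replacement symbols */
};

/*
 * Return the address execution should resume at, or 0 if the fault is
 * not an instruction fetch at one of the three legacy entry points.
 */
uint64_t redirect_vsyscall_ip(uint64_t ip, int instr_fetch,
			      const struct vdso_layout *vdso)
{
	int nr;

	assert(vdso != NULL);
	if (!instr_fetch || vdso->base == 0)
		return 0;

	for (nr = 0; nr < VSYS_COUNT; nr++)
		if (ip == VSYSCALL_ADDR(nr))
			return vdso->base + vdso->offset[nr];

	return 0;
}
```

A caller would treat a zero return as "not ours" and fall through to the
normal bad-area handling, which mirrors how the quoted code simply falls
out of the #ifdef block when no address matches.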

> >> I don't think that making the page NX is viable until at least 2012.
> >> We really want to wait for that glibc release.
> >
> > sure, if for mainline users performance impact is that much more important
> > then timing the nx approach for later is no problem (i'll just have to do
> > more work till then to revert/adapt this in PaX ;).
> 
> I think my approach is at least as paranoid as yours.  Why won't it
> work (if int 0xcc is disallowed from outside the vsyscall page)?

it's not about paranoia or what works ;). the question is, what goals are you
trying to achieve and what is the best way to achieve them. to me it appeared so
far that the fundamental problem you guys realized and wanted to do something
about is that 'attacker can rely on known code at known addresses in his exploit
attempts'. now if you only worry about the 'syscalls at fixed address' subset
of this problem, then sure, anything that removes them or makes them uniquely
identifiable solves that subset of the problem. but if you worry about the
bigger problem (as stated above) then your approach is not enough (in fact,
even my approach is not good enough since data can still be read or relied
upon in the vsyscall page at known addresses, so nothing short of removing it
cuts it really and i'm glad we're getting close to that goal finally).

> > it's *irrelevant*. this change you propose would go into future kernels,
> > it would not affect existing ones, obviously. therefore anyone possibly
> > affected would have to update his kernel first at which point they have
> > no excuse to not update their libc of whatever flavour as well.
> 
> That's not true.  New kernels are explicitly supposed to work with old
> userspace.

trust me, the nx vsyscall approach i implemented in PaX works with those old
userlands, even without users knowing it as it's not a configurable feature ;).

> Lots of users of old RHEL versions, for example, nonetheless run new kernels.  

if they go to the trouble of running fresh new vanilla kernels, they can surely
afford to patch a few lines in glibc? or if RH backports this to the RHEL kernel,
they can surely do the same with glibc?


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 11:56           ` pageexec
  2011-06-06 12:43             ` Andrew Lutomirski
@ 2011-06-06 14:01             ` Linus Torvalds
  2011-06-06 14:55               ` pageexec
  1 sibling, 1 reply; 112+ messages in thread
From: Linus Torvalds @ 2011-06-06 14:01 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On Mon, Jun 6, 2011 at 8:56 PM,  <pageexec@freemail.hu> wrote:
>
> it's *irrelevant*. this change you propose would go into future kernels,
> it would not affect existing ones, obviously. therefore anyone possibly
> affected would have to update his kernel first at which point they have
> no excuse to not update their libc of whatever flavour as well.

Christ.

Heard about that thing called "backwards compatibility"?

We *require* that those "future kernels" run the old unmodified binaries.

Yeah, I know you don't have that requirement, but anything that
actually wants to be considered *relevant* and actually merged into a
mainline kernel does have that requirement. So your argument is utter
crap, ignorant, and stupid.

No, we don't update any libraries for a kernel upgrade. Ever. End of story.

               Linus

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 13:58               ` pageexec
@ 2011-06-06 14:07                 ` Brian Gerst
  2011-06-07 23:32                   ` pageexec
  2011-06-06 15:26                 ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: Brian Gerst @ 2011-06-06 14:07 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Louis Rilling, Valdis.Kletnieks

On Mon, Jun 6, 2011 at 9:58 AM,  <pageexec@freemail.hu> wrote:
> On 6 Jun 2011 at 8:43, Andrew Lutomirski wrote:
>
>> >> and it's less flexible
>> >
>> > why? as in, what kind of flexibility do you need that int xx can provide but a page
>> > fault cannot?
>>
>> The ability to make time() fast when configured that way.
>
> true, nx and fast time() at vsyscall addresses will never mix. but it's a temporary
> problem for anyone who cares, a trivial glibc patch fixes it.
>
>> >> and it could impact a fast path in the kernel.
>> >
>> > a page fault is never a fast path, after all the cpu has just taken an exception
>> > (vs. the syscall/sysenter style actually fast user->kernel transition) and is
>> > about to make page table changes (and possibly TLB flushes).
>>
>> Sure it is.  It's a path that's optimized carefully and needs to be as
>> fast as possible.  Just because it's annoyingly slow doesn't mean we
>> get to make it even slower.
>
> sorry, but stating that the pf handler is a fast path doesn't make it so ;).
> the typical pf is caused by userland to either fill in non-present pages
> or do c-o-w, a few well predicted conditional branches in those paths are
> simply not measurable (actually, those conditional branches would not be
> on those paths, at least they aren't in PaX). seriously, try it ;).
>
>> >> > another thing to consider for using the int xx redirection scheme (speaking
>> >> > of which, it should just be an int3):
>> >>
>> >> Why?  0xcd 0xcc traps no matter what offset you enter it at.
>> >
>> > but you're wasting/abusing an IDT entry for no real gain (and it's lots of code
>> > for such a little change). also placing sw interrupts among hw ones is what can
>> > result in (ab)use like this:
>>
>> I think it's less messy than mucking with the page fault handler.
>
> do you know what that mucking looks like? ;) prepare for the most complex code
> you've ever seen (it's in __bad_area_nosemaphore):
>
> #ifdef CONFIG_X86_64
> 	if (mm && (error_code & PF_INSTR) && mm->context.vdso) {
> 		if (regs->ip == (unsigned long)vgettimeofday) {
> 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, gettimeofday);
> 			return;
> 		} else if (regs->ip == (unsigned long)vtime) {
> 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, clock_gettime);
> 			return;
> 		} else if (regs->ip == (unsigned long)vgetcpu) {
> 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, getcpu);
> 			return;
> 		}
> 	}
> #endif

I like this approach, however since we're already in the kernel it
makes sense just to run the normal syscall instead of redirecting to
the vdso.
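
As a rough illustration of that alternative, the first step would be to
turn the faulting address into a syscall number rather than a vdso
address. The sketch below is an assumption-laden model and not kernel
code: vsyscall_ip_to_syscall_nr() is a hypothetical helper, and the
numbers are the x86-64 syscall numbers for gettimeofday, time and getcpu.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed base of the legacy x86-64 vsyscall page. */
#define VSYSCALL_START UINT64_C(0xffffffffff600000)

/* x86-64 syscall numbers for the three legacy vsyscalls. */
enum { NR_GETTIMEOFDAY = 96, NR_TIME = 201, NR_GETCPU = 309 };

/*
 * Map a legacy vsyscall entry address to the syscall it stands for.
 * Returns -1 for any address that is not exactly an entry point.
 */
int vsyscall_ip_to_syscall_nr(uint64_t ip)
{
	static const int nr[] = { NR_GETTIMEOFDAY, NR_TIME, NR_GETCPU };
	uint64_t off;

	if (ip < VSYSCALL_START)
		return -1;

	off = ip - VSYSCALL_START;
	/* Entries are 1024 bytes apart and only three of them are in use. */
	if (off % 1024 != 0 || off / 1024 >= 3)
		return -1;

	return nr[off / 1024];
}
```

Rejecting every address that is not exactly an entry point keeps the
property discussed earlier in the thread: jumping into the page at a
funny offset gets a fault, not a syscall.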

--
Brian Gerst

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [tip:x86/vdso] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
  2011-06-06  9:31     ` Andi Kleen
  2011-06-06 10:24     ` [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS Ingo Molnar
@ 2011-06-06 14:34     ` tip-bot for Ingo Molnar
  2 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Ingo Molnar @ 2011-06-06 14:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, hpa, linux-kernel, luto, andi, bp,
	arjan, mingo

Commit-ID:  1593843e2ada6d6832d0de4d633aacd997dc3a45
Gitweb:     http://git.kernel.org/tip/1593843e2ada6d6832d0de4d633aacd997dc3a45
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Mon, 6 Jun 2011 12:13:40 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 6 Jun 2011 12:13:40 +0200

x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS

Linus pointed out that the UNSAFE_VSYSCALL naming was inherently
bad: it suggests that there's something unsafe about enabling them,
while in reality they only have any security effect in the presence
of some *other* security hole.

So rename it to CONFIG_COMPAT_VSYSCALLS and fix the documentation
and Kconfig text to correctly explain the purpose of this change.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/BANLkTimrhO8QfBqQsH_Q13ghRH2P%2BZP7AA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/feature-removal-schedule.txt |    7 ++++---
 arch/x86/Kconfig                           |   17 ++++++++++-------
 arch/x86/kernel/vsyscall_64.c              |    8 ++++----
 arch/x86/kernel/vsyscall_emu_64.S          |    2 +-
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 94b4470..4282ab2 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -601,10 +601,11 @@ Who:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 
 ----------------------------
 
-What:	CONFIG_UNSAFE_VSYSCALLS (x86_64)
+What:	CONFIG_COMPAT_VSYSCALLS (x86_64)
 When:	When glibc 2.14 or newer is ubitquitous.  Perhaps mid-2012.
-Why:	Having user-executable code at a fixed address is a security problem.
-	Turning off CONFIG_UNSAFE_VSYSCALLS mostly removes the risk but will
+Why:	Having user-executable syscall invoking code at a fixed addresses makes
+	it easier for attackers to exploit security holes.
+	Turning off CONFIG_COMPAT_VSYSCALLS mostly removes the risk but will
 	make the time() function slower on glibc versions 2.13 and below.
 Who:	Andy Lutomirski <luto@mit.edu>
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 79e5d8a..30041d8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1646,20 +1646,23 @@ config COMPAT_VDSO
 
 	  If unsure, say Y.
 
-config UNSAFE_VSYSCALLS
+config COMPAT_VSYSCALLS
 	def_bool y
-	prompt "Unsafe fast legacy vsyscalls"
+	prompt "Fixed address legacy vsyscalls"
 	depends on X86_64
 	---help---
 	  Legacy user code expects to be able to issue three syscalls
-	  by calling fixed addresses in kernel space.  If you say N,
-	  then the kernel traps and emulates these calls.  If you say
-	  Y, then there is actual executable code at a fixed address
-	  to implement time() efficiently.
+	  by calling a fixed addresses.  If you say N, then the kernel
+	  traps and emulates these calls.  If you say Y, then there is
+	  actual executable code at a fixed address to implement time()
+	  efficiently.
 
 	  On a system with recent enough glibc (probably 2.14 or
 	  newer) and no static binaries, you can say N without a
-	  performance penalty to improve security
+	  performance penalty to improve security: having no fixed
+	  address userspace-executable syscall invoking code makes
+	  it harder for both remote and local attackers to exploit
+	  security holes.
 
 	  If unsure, say Y.
 
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 285af7a..27d49b7 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -116,7 +116,7 @@ static int al_to_vsyscall_nr(u8 al)
 	return -1;
 }
 
-#ifdef CONFIG_UNSAFE_VSYSCALLS
+#ifdef CONFIG_COMPAT_VSYSCALLS
 
 /* This will break when the xtime seconds get inaccurate, but that is
  * unlikely */
@@ -138,9 +138,9 @@ vtime(time_t *t)
 	return result;
 }
 
-#endif /* CONFIG_UNSAFE_VSYSCALLS */
+#endif /* CONFIG_COMPAT_VSYSCALLS */
 
-/* If CONFIG_UNSAFE_VSYSCALLS=y, then this is incorrect for vsyscall_nr == 1. */
+/* If CONFIG_COMPAT_VSYSCALLS=y, then this is incorrect for vsyscall_nr == 1. */
 static inline unsigned long vsyscall_intcc_addr(int vsyscall_nr)
 {
 	return VSYSCALL_START + 1024*vsyscall_nr + 2;
@@ -202,7 +202,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 		break;
 
 	case 1:
-#ifdef CONFIG_UNSAFE_VSYSCALLS
+#ifdef CONFIG_COMPAT_VSYSCALLS
 		warn_bad_vsyscall(KERN_WARNING, regs, "bogus time() vsyscall "
 				  "emulation (exploit attempt?)");
 		goto sigsegv;
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
index 7ebde61..2d53e26 100644
--- a/arch/x86/kernel/vsyscall_emu_64.S
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -25,7 +25,7 @@ ENTRY(vsyscall_0)
 	ret
 END(vsyscall_0)
 
-#ifndef CONFIG_UNSAFE_VSYSCALLS
+#ifndef CONFIG_COMPAT_VSYSCALLS
 .section .vsyscall_1, "a"
 ENTRY(vsyscall_1)
 	movb $0xce, %al

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 10:39       ` pageexec
  2011-06-06 13:56         ` Linus Torvalds
@ 2011-06-06 14:44         ` Ingo Molnar
  2011-06-06 15:01           ` pageexec
  2011-06-06 18:59           ` pageexec
  2011-06-06 14:52         ` Ingo Molnar
  2 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 14:44 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > > Seriously. The whole patch series just seems annoying.
> 
> what is annoying is your covering up of security fixes on grounds 
> that you don't want to help script kiddies (a bullshit argument as 
> it were) but at the same time question proactive security measures 
> (one can debate the implementation, see my other mail) that would 
> *actually* prevent the same kiddies from writing textbook exploits.

You are mixing up several issues here, and rather unfairly so.

Firstly, see my other mail, there's an imperfect balance to be
found between statistical 'proactive' measures and the incentives
that remove the *real* bugs. You have not replied to that mail of
mine so can i assume that you concur and accept my points? If yes
then why are you still arguing the same thing?

Secondly, *once* a real security bug has been found the correct 
action is different from the considerations of proactive measures. 
How can you possibly draw equivalence between disclosure policies
and the handling of statistical security measures?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 10:39       ` pageexec
  2011-06-06 13:56         ` Linus Torvalds
  2011-06-06 14:44         ` Ingo Molnar
@ 2011-06-06 14:52         ` Ingo Molnar
  2 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 14:52 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 11:31, Andi Kleen wrote:
>
> > and assumes everyone is using glibc which is just wrong.
> 
> the libc is irrelevant, they can all be fixed up to use the vdso 
> entry points if they haven't been doing it already. already 
> deployed systems will simply continue to use their flawed kernel 
> and libc, they're not affected.

Correct, the libc is irrelevant here really - a distro will obviously 
enable or disable CONFIG_COMPAT_VSYSCALLS=y based on the type and 
version of libc it is using.

This has been pointed out to Andi before.

Unfortunately Andi has been spouting nonsense in this thread without 
replying to mails that challenge his points, such as:

         http://marc.info/?l=linux-kernel&m=130686838202409
    and: http://marc.info/?l=linux-kernel&m=130686827002311
    and: http://marc.info/?l=linux-kernel&m=130687014804697

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 14:01             ` Linus Torvalds
@ 2011-06-06 14:55               ` pageexec
  2011-06-06 15:33                 ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 14:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 6 Jun 2011 at 23:01, Linus Torvalds wrote:

> On Mon, Jun 6, 2011 at 8:56 PM,  <pageexec@freemail.hu> wrote:
> >
> > it's *irrelevant*. this change you propose would go into future kernels,
> > it would not affect existing ones, obviously. therefore anyone possibly
> > affected would have to update his kernel first at which point they have
> > no excuse to not update their libc of whatever flavour as well.
> 
> Christ.
> 
> Heard about that thing called "backwards compatibility"?
> 
> We *require* that those "future kernels" run the old unmodified binaries.

both Andy's and my approach work with 'old unmodified binaries'. by virtue
of those 'new modified binaries' not existing (hint: glibc's intentionally
not changed for static binaries).

all the compatibility talk is about performance impact, not black&white 'runs
or fails'. but more on this 'requirement' of yours below. you just so shot
yourself in the foot, it's not even funny ;).

> Yeah, I know you don't have that requirement,

you don't know jack then. i do allow old binaries to work for every PaX feature
i've ever introduced, even at the non-considerable expense of having to make
special cases for them.

> but anything that actually wants to be considered *relevant* and
> actually merged into a mainline kernel does have that requirement. So
> your argument is utter crap, ignorant, and stupid. 

watch your words Linus, you're about to eat them ;).

> No, we don't update any libraries for a kernel upgrade. Ever. End of story.

then you *must* revert the utterly *wrong* heap/stack gap 'fix' of yours that
you cooked up without any public discussion a year ago and have been 'fixing'
it for various userland breakage ever since.

you know what you did? you *broke* a userland API, namely /proc/pid/maps. you
broke it so badly that it breaks every app that wanted to do its own stack
expansion tracking.

one particular case i'm aware of is the Sun JVM that tries to map a guard page
below what it thinks is the stack. except thanks to your very broken idea of
lying about it in maps, it will actually map *over* the real last page of the
stack, effectively *moving* its lower bound up, without having intended to do
so (and without having done so under earlier kernels).
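
To make the failure mode concrete, here is a sketch of the kind of
userland logic being described. It is not the JVM's actual code; the
function names are hypothetical. It reads the "[stack]" line from
/proc/self/maps and computes where a runtime would place its own guard
page, directly below the reported start. If the reported start is one
page higher than the real start of the mapping, that address lands
inside the stack mapping instead of below it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/maps. Returns 1 and fills start/end if
 * the line describes the "[stack]" mapping, 0 otherwise.
 */
int parse_stack_line(const char *line, unsigned long *start,
		     unsigned long *end)
{
	assert(line != NULL && start != NULL && end != NULL);
	if (strstr(line, "[stack]") == NULL)
		return 0;
	return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Address at which a runtime would map its own guard page. */
unsigned long guard_page_addr(unsigned long reported_start,
			      unsigned long page_size)
{
	return reported_start - page_size;
}

/* Scan /proc/self/maps for the stack mapping; returns 1 if found. */
int find_stack_range(unsigned long *start, unsigned long *end)
{
	char line[512];
	int found = 0;
	FILE *f = fopen("/proc/self/maps", "r");

	if (f == NULL)
		return 0;
	while (fgets(line, sizeof(line), f) != NULL) {
		if (parse_stack_line(line, start, end)) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}
```

The guard address is only as good as the range the kernel reports, which
is why a change in what /proc/pid/maps shows is visible to such code.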

in other words, a simple userland mmap request can now change *another* map.
*that* is a clear violation of your own principles (not that i think you have
any, your argument style is to throw out random shit whenever you think it
serves your purpose, and not able to defend it when faced with real arguments).

but that's still not the end of the story. whenever userland code hits that
carelessly hidden guard page, it'll cause a page fault that the JVM's segfault
handler can't identify as its own since the fault didn't occur in what it
thinks is the stack guard page. a really brilliant solution Linus, you must
be very proud of it. what a pity that now you get to revert the whole shit
and implement it properly (i don't need to tell you where you can find such
a working solution, do i).

cheers,

 PaX Team


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 14:44         ` Ingo Molnar
@ 2011-06-06 15:01           ` pageexec
  2011-06-06 15:15             ` Ingo Molnar
  2011-06-06 18:59           ` pageexec
  1 sibling, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 15:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 16:44, Ingo Molnar wrote:

> You have not replied to that mail of
> mine so can i assume that you concur and accept my points?

my bandwidth/quota for replying to idiocy is limited (and is
close to exhaustion for today ;), be patient, i'll reply to
you as well.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 15:01           ` pageexec
@ 2011-06-06 15:15             ` Ingo Molnar
  2011-06-06 15:29               ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:15 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 16:44, Ingo Molnar wrote:
> 
> > You have not replied to that mail of
> > mine so can i assume that you concur and accept my points?
> 
> my bandwidth/quota for replying to idiocy is limited (and is
> close to exhaustion for today ;), be patient, i'll reply to
> you as well.

You might want to save the insults to after we are done with the 
discussion.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 13:58               ` pageexec
  2011-06-06 14:07                 ` Brian Gerst
@ 2011-06-06 15:26                 ` Ingo Molnar
  2011-06-06 15:48                   ` pageexec
  1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:26 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > > a page fault is never a fast path, after all the cpu has just 
> > > taken an exception (vs. the syscall/sysenter style actually 
> > > fast user->kernel transition) and is about to make page table 
> > > changes (and possibly TLB flushes).
> >
> > Sure it is.  It's a path that's optimized carefully and needs to 
> > be as fast as possible.  Just because it's annoyingly slow 
> > doesn't mean we get to make it even slower.
> 
> sorry, but stating that the pf handler is a fast path doesn't make 
> it so ;) [...]

Are you talking about the Linux kernel?

FYI, an incredible amount of work has gone into making pagefaults as 
fast and scalable as possible. If you are following Linux kernel 
development you'd have to be virtually blind to not see all that 
effort.

[ And yes, a serious amount of work has gone into the hardware side as
  well - P4's used to suck *really* bad at pagefaults. ]

You claiming that pagefaults are a slow-path and ridiculing those who 
point out your mistake does not make it a slowpath.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 15:15             ` Ingo Molnar
@ 2011-06-06 15:29               ` pageexec
  2011-06-06 16:54                 ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 15:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 17:15, Ingo Molnar wrote:

> You might want to save the insults to after we are done with the 
> discussion.

haha, Ingo, seriously, you wrote the above 20 minutes after this one?

> Unfortunately Andi has been spouting nonsense in this thread without 
> replying to mails that challenge his points, such as:

tell you what Ingo, heed your own advice. better, you can even keep it
to yourself. i do whatever i want, including what you reserve for
seemingly yourself only.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 14:55               ` pageexec
@ 2011-06-06 15:33                 ` Ingo Molnar
  2011-06-06 15:58                   ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:33 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > No, we don't update any libraries for a kernel upgrade. Ever. End 
> > of story.
> 
> then you *must* revert the utterly *wrong* heap/stack gap 'fix' of 
> yours that you cooked up without any public discussion a year ago 
> and have been 'fixing' it for various userland breakage ever since.

Is it this commit:

 320b2b8de126: mm: keep a guard page below a grow-down stack segment

?

It has a few followup fixes indeed:

 a1fde08c74e9: VM: skip the stack guard page lookup in get_user_pages only for mlock
 a626ca6a6564: vm: fix vm_pgoff wrap in stack expansion
 95042f9eb78a: vm: fix mlock() on stack guard page
 0e8e50e20c83: mm: make stack guard page logic use vm_prev pointer
 7798330ac811: mm: make the mlock() stack guard page checks stricter
 d7824370e263: mm: fix up some user-visible effects of the stack guard page
 11ac552477e3: mm: fix page table unmap for stack guard page properly
 5528f9132cf6: mm: fix missing page table unmap for stack guard page failure case

But you say that there's a Sun JVM breakage still left, right? Is 
there a bugzilla # or simple .c reproducer for that?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06  9:42       ` pageexec
  2011-06-06 11:19         ` Andrew Lutomirski
@ 2011-06-06 15:41         ` Ingo Molnar
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:41 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > I can't see any problem, but exploit writers are exceedingly 
> > clever, and maybe someone has a use for a piece of the code that 
> > isn't a syscall.  Just as a completely artificial example, here's 
> > some buggy code:
> 
> what you're describing here is a classical ret2libc (in modern 
> marketing speak, ROP) attack. in general, having an executable ret 
> insn (with an optional pop even) at a fixed address is very useful, 
> especially for the all too classical case of stack overflows where 
> the attacker may already know of a 'good' function pointer 
> somewhere on the stack but in order to have the cpu reach it, he 
> needs to pop enough bytes off of it. guess what they'll use this 
> ret at a fixed address for...

Good point and i agree that we should get rid of the RETQ there. The 
do_intcc() code can fetch the return address without much fuss - this 
is much faster than doing a #PF.

Please keep reviewing these patches, the security-technical aspects 
of your reviews are extremely useful.

> imho, moving everything to and executing from the vdso page is the 
> only viable solution if you really want to fix the security aspect 
> of the vsyscall mess. it's worked fine for PaX for years now ;).

FYI, this probably means that no-one ever benchmarked postgresql 
scalability on a PaX kernel i suspect? Past versions of postgresql 
would suffer big time if you drive the vsyscall time() through a
#PF ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 15:26                 ` Ingo Molnar
@ 2011-06-06 15:48                   ` pageexec
  2011-06-06 15:59                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 15:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 6 Jun 2011 at 17:26, Ingo Molnar wrote:

> 
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > > > a page fault is never a fast path, after all the cpu has just 
> > > > taken an exception (vs. the syscall/sysenter style actually 
> > > > fast user->kernel transition) and is about to make page table 
> > > > changes (and possibly TLB flushes).
> > >
> > > Sure it is.  It's a path that's optimized carefully and needs to 
> > > be as fast as possible.  Just because it's annoyingly slow 
> > > doesn't mean we get to make it even slower.
> > 
> > sorry, but stating that the pf handler is a fast path doesn't make 
> > it so ;) [...]
> 
> Are you talking about the Linux kernel?

yes, what else? ;)

> FYI, incredible amount of work has gone into making pagefaults as 
> fast and scalable as possible.

i wasn't talking about scalability (it's irrelevant anyway here), only
speed. you're the man with the (in)famous measuring stick, tell me how
many cycles it takes to serve a C-O-W fault or instantiation of an anon
page and a file page (both cached and non-cached). then stick a few well
predicted conditional branches in that path (something that wouldn't
actually happen with my code but let's play this out fully) and show me
measured differences. then we'll talk about the 'impact on fast path'.
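
One rough way to get numbers of the kind asked for above is a userspace microbenchmark. The following is only a sketch under stated assumptions: it covers just the anonymous-page instantiation case, it reports wall-clock nanoseconds rather than cycles, and the helper name anon_fault_ns_per_page() is hypothetical. The figure includes the exception entry/exit cost and the loop overhead, and transparent huge pages (if enabled) can reduce the number of faults actually taken, so treat the result as an upper-level estimate only.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns the average wall-clock cost, in nanoseconds, of the first write
 * to each page of a fresh anonymous mapping, or -1.0 on failure.
 * Hypothetical helper for illustration; not taken from the thread.
 */
double anon_fault_ns_per_page(size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || pages == 0)
        return -1.0;

    size_t len = pages * (size_t)page_size;
    char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < len; i += (size_t)page_size)
        ((volatile char *)mem)[i] = 1;  /* first touch faults the page in */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(mem, len);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)pages;
}
```

Calling anon_fault_ns_per_page(1 << 16) a few times and taking the minimum gives a more stable figure than a single run. Comparing it against an int xx round trip would need a separate measurement.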

> If you are following Linux kernel development you'd have to be
> virtually blind to not see all that effort. 

nice attempt to question me but it won't get you anywhere (didn't i just
tell you to heed your own advice?). case in point, i'd fixed the anon_vma
refcount leak (exploitable and silently fixed as usual in mainline) a
few months before you guys did. just by reading the code. now show me
that you know your stuff around and can find the commit i'm talking about.
unless of course you are not "following Linux kernel development" and
are "virtually blind to not see all that effort".
 
> You claiming that pagefaults are a slow-path and ridiculing those who 
> point out your mistake does not make it a slowpath.

you must have realized by now that slow/fast path in this context were
relative to int xx vs. pf processing. and i'm yet to receive the numbers
that show how much faster the latter is processed than the former and why
sticking a few well predictable conditional jumps (no really, they would
not be there, but let's assume they are) shows a measurable impact on
these all so fast pf paths.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 15:33                 ` Ingo Molnar
@ 2011-06-06 15:58                   ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-06 15:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 6 Jun 2011 at 17:33, Ingo Molnar wrote:

> Is it this commit:
> 
>  320b2b8de126: mm: keep a guard page below a grow-down stack segment

yes and all the related ones.

> But you say that there's a Sun JVM breakage still left, right? Is 
> there a bugzilla # or simple .c reproducer for that?

i don't know if only that JVM is affected, the fact is that breaking
the maps API breaks everyone who relied on it the same way.

also it's not fixable without reverting the *entire* approach. see,
it's very simple: if the kernel lies about the stack boundary, it
breaks the JVM and similar approaches, if it doesn't lie about it
then it breaks other apps as you already found out.

as for bz/reproduction, neither exists, i read the JVM code carefully
at the time (had actually remembered from other times) and just went
ahead and fixed it properly in PaX.

for reproduction you'd have to trigger a stack overflow (not to be
confused with a buffer overflow) on the main jvm thread, iirc, i have
no idea how to pull that off. but you can easily write a small test
app based on what i explained and test it but i hope it's obvious
how the JVM logic breaks down with the maps changes.
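
A small test app along those lines could start from something like the sketch below. It assumes the check in question reads the "[stack]" line of /proc/self/maps to learn the stack boundary; that is an assumption, since the JVM logic itself is not reproduced in this mail. The helper name find_stack_range() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Looks up the mapping labelled "[stack]" in a maps-format file and stores
 * its boundaries. Returns 0 on success, -1 if the file cannot be read or
 * no such line exists. Hypothetical helper for illustration only.
 */
int find_stack_range(const char *maps_path,
                     unsigned long *start, unsigned long *end)
{
    FILE *f = fopen(maps_path, "r");
    if (!f)
        return -1;

    char line[512];
    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        if (!strstr(line, "[stack]"))
            continue;
        /* A maps line begins with "start-end" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", start, end) == 2)
            rc = 0;
        break;
    }
    fclose(f);
    return rc;
}
```

Printing the reported start next to the lowest address the program can actually touch on its main-thread stack would show whether the maps output and the real boundary agree on a given kernel.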


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 15:48                   ` pageexec
@ 2011-06-06 15:59                     ` Ingo Molnar
  2011-06-06 16:19                       ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:59 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 17:26, Ingo Molnar wrote:
> 
> > 
> > * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > 
> > > > > a page fault is never a fast path, after all the cpu has just 
> > > > > taken an exception (vs. the syscall/sysenter style actually 
> > > > > fast user->kernel transition) and is about to make page table 
> > > > > changes (and possibly TLB flushes).
> > > >
> > > > Sure it is.  It's a path that's optimized carefully and needs to 
> > > > be as fast as possible.  Just because it's annoyingly slow 
> > > > doesn't mean we get to make it even slower.
> > > 
> > > sorry, but stating that the pf handler is a fast path doesn't 
> > > make it so ;) [...]
> > 
> > Are you talking about the Linux kernel?
> 
> yes, what else? ;)

Dunno, Windows perhaps? You were talking about a page fault handler 
that was a slowpath, you cannot possibly have meant Linux with that.

> > FYI, incredible amount of work has gone into making pagefaults as 
> > fast and scalable as possible.
> 
> i wasn't talking about scalability (it's irrelevant anyway here), 
> only speed. [...]

Which part of "fast and scalable" did you not understand?

Just a couple of days ago i noticed a single cycle inefficiency in 
the pagefault fastpath, introduced in the 3.0 merge window. I 
requested (and got) an urgent fix for that:

 b80ef10e84d8: x86: Move do_page_fault()'s error path under unlikely()

 | Ingo suggested SIGKILL check should be moved into slowpath
 | function. This will reduce the page fault fastpath impact
 | of this recent commit:
 |
 |   37b23e0525d3: x86,mm: make pagefault killable

I treated it as a performance regression.

So i ask you again, what is your basis for calling the #PF path on 
Linux a 'slowpath'? Is Linus's and my word and 5 years of Git history 
showing that it's optimized as a fastpath not enough proof for you?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 15:59                     ` Ingo Molnar
@ 2011-06-06 16:19                       ` pageexec
  2011-06-06 16:47                         ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 16:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 6 Jun 2011 at 17:59, Ingo Molnar wrote:

> > > FYI, incredible amount of work has gone into making pagefaults as 
> > > fast and scalable as possible.
> > 
> > i wasn't talking about scalability (it's irrelevant anyway here), 
> > only speed. [...]
> 
> Which part of "fast and scalable" did you not understand?

uhm, not sure why you're so worked up here. is it because i said
'scalability' was completely irrelevant for the nx vsyscall page
approach? elaborate!

> Just a couple of days ago i noticed a single cycle inefficiency in 
> the pagefault fastpath, introduced in the 3.0 merge window. I 
> requested (and got) an urgent fix for that:

so you must have measurements. what's the mentioned page faults take
in cycles before/after your fix? what's a normal int xx path take?

> So i ask you again, what is your basis for calling the #PF path on 
> Linux a 'slowpath'? Is Linus's and my word and 5 years of Git history 
> showing that it's optimized as a fastpath not enough proof for you?

which part of

> you must have realized by now that slow/fast path in this context were
> relative to int xx vs. pf processing.

did you not understand? do you have "Linus's and my word and 5 years of
Git history" to show that pf processing is seriously that much faster
than int xx processing?


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 16:19                       ` pageexec
@ 2011-06-06 16:47                         ` Ingo Molnar
  2011-06-06 22:49                           ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 16:47 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 17:59, Ingo Molnar wrote:
> 
> > > > FYI, incredible amount of work has gone into making pagefaults as 
> > > > fast and scalable as possible.
> > > 
> > > i wasn't talking about scalability (it's irrelevant anyway here), 
> > > only speed. [...]
> > 
> > Which part of "fast and scalable" did you not understand?
> 
> uhm, not sure why you're so worked up here. is it because i said
> 'scalability' was completely irrelevant for the nx vsyscall page
> approach? elaborate!

Firstly, 'fast' is a necessary first step towards good scalability, 
secondly i was talking about *both* speed and scalability so your 
insistence to only discuss speed is banging on open doors ...

You are simply wrong about:

> > > sorry, but stating that the pf handler is a fast path doesn't 
> > > make it so ;).

and 5-6 mails down the line you are still unwilling to admit it. Why?

A fastpath is defined by optimization considerations applied to a 
codepath (the priority it gets compared to other codepaths), *not* by 
its absolute performance.

For example even though kmalloc() is about two orders of magnitude 
slower than an unlikely() branch in the scheduler wakeup path, it 
is still kmalloc() that is the fastpath and the unlikely() branch in 
try_to_wake_up() is a slowpath.
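
As a minimal sketch of that distinction: the likely()/unlikely() macros below follow the kernel's __builtin_expect() definitions, while process() and handle_rare_case() are hypothetical names made up for illustration. The annotated branch is cheap in absolute terms, but it is the slowpath because the code is laid out and tuned for the other case.

```c
#include <stddef.h>

/* Branch hints in the style of the kernel's compiler.h. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Counts how often the rarely-taken path runs (illustration only). */
int slow_calls;

/* Slowpath: kept out of line, not tuned for speed. */
int handle_rare_case(int v)
{
    slow_calls++;
    return -v;
}

/* Fastpath: the common case falls straight through the hinted branch. */
int process(int v)
{
    if (unlikely(v < 0))
        return handle_rare_case(v);
    return v * 2;
}
```

The hint only tells the compiler which side to favour when arranging the code; it does not make either side faster or slower by itself.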

You seem to be confused on several levels here.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 15:29               ` pageexec
@ 2011-06-06 16:54                 ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 16:54 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 17:15, Ingo Molnar wrote:

> > pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> >
> > > my bandwidth/quota for replying to idiocy is limited (and is 
> > > close to exhaustion for today ;), be patient, i'll reply to
> > > you as well.
> > 
> > You might want to save the insults to after we are done with the 
> > discussion.
> 
> haha, Ingo, seriously, you wrote the above 20 minutes after this 
> one?
>
> > Unfortunately Andi has been spouting nonsense in this thread 
> > without replying to mails that challenge his points, such as:
> 
> tell you what Ingo, heed your own advice. better, you can even keep 
> it to yourself. i do whatever i want, including what you reserve 
> for seemingly yourself only.

The difference is that:

 - You wrote an insult without waiting for the discussion to come to
   a conclusion. I think you are wrong and i am willing to argue it.

 - Andi first tried to inject fear, uncertainty and doubt into the
   discussion a week ago, then ignored 3 mails from 3 separate people
   and thus when he repeated his nonsense today (while still ignoring
   the feedback he got) i sure can call his opinion 'nonsense'.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 12:47         ` Ingo Molnar
  2011-06-06 12:48           ` Ingo Molnar
@ 2011-06-06 18:04           ` pageexec
  2011-06-06 19:12             ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 18:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 6 Jun 2011 at 14:47, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > [...] does that mean that you guys would accept a patch that would 
> > map the vdso at a fixed address for old times' sake? if not, on 
> > what grounds would you refuse it? see, you can't have it both ways.
> 
> You can actually do that by enabling CONFIG_COMPAT_VDSO=y.

as you noted later, we're talking about amd64 here. but ignoring that,
let's see what you've just shown here.

1. why does CONFIG_COMPAT_VDSO exist?

because you guys realized some time in the past, after several public
exploits, that keeping known code at fixed addresses wasn't the brightest
of ideas. so you implemented without much if any resistance vdso
randomization and kept a backwards compatibility option for userland
that knew better and relied on those fixed addresses.

sound familiar? security issue with known code/addresses triggering move
to randomization? right in this patch series! why lie about it then and
paint it something else than what it is? oh yes, covering up security
related fixes/changes is a long held tradition in kernel circles.

2. who enables CONFIG_COMPAT_VDSO?

RHEL? Fedora? SLES? Debian? Ubuntu? (i don't know, i'm asking)

and whoever enables them, what do you think they're more likely to get in
return? some random and rare old binaries that still run for a minuscule
subset of users or every run-of-the-mill exploit working against *every*
user, metasploit style (did you know that it has a specific target for
the i386 compat vdso)?

so once again, tell me whether the randomized placement of the vdso wasn't
about security (in which case can we please have it back at a fixed mmap'd
address, since it doesn't matter for security you have no reason to refuse ;).
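
The contrast between the two mappings can be sketched from userspace. The block below assumes glibc's getauxval() is available and treats the vsyscall area as a single 4 KiB page at its well-known x86-64 address; the helper names vdso_base() and in_vsyscall_page() are hypothetical.

```c
#include <stdint.h>
#include <sys/auxv.h>

/* Fixed x86-64 address of the legacy vsyscall page. */
#define VSYSCALL_ADDR 0xffffffffff600000ULL

/* Base address of the vDSO mapped into this process, or 0 if none. */
uint64_t vdso_base(void)
{
    return (uint64_t)getauxval(AT_SYSINFO_EHDR);
}

/* True if addr falls inside the (assumed one-page) vsyscall area. */
int in_vsyscall_page(uint64_t addr)
{
    return addr >= VSYSCALL_ADDR && addr < VSYSCALL_ADDR + 4096;
}
```

With vdso randomization active, printing vdso_base() from two separate runs would typically show two different values, whereas VSYSCALL_ADDR is the same in every process.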

> > the fixed address of the vsyscall page *is* a very real security 
> > problem, it should have never been accepted as such and it's high 
> > time it went away finally in 2011AD.
> 
> It's only a security problem if there's a security hole elsewhere.

it's not an 'if', there *is* a security hole 'elsewhere', else the CVE
list had been abandoned long ago and no one would be doing proactive
security measures such as intrusion prevention mechanisms.

so it *is* a security problem.

> The thing is, and i'm not sure whether you realize or recognize it, 
> but these measures *are* two-edged swords.

they aren't, see below why.

> Yes, the upside is that they reduce the risks associated with 
> security holes - but only statistically so.

not sure what 'these measures' are here (if you mean ASLR related ones,
please say so), some are randomization based (so their impact on security
is probabilistic), some aren't (their impact is deterministic).

> The downside is that having such a measure in place makes it somewhat 
> less likely that those bugs will be found and fixed in the future:

i'm not sure i follow you here, it seems to me that you're mixing up
bug finding/fixing with exploit development and prevention measures.

these things are orthogonal to each other and neither affects the other
unless they're perfect (which neither side is). i.e., if we could find
all bugs, intrusion prevention and exploit writing would die out. or if
we could exploit all bugs under any circumstances, intrusion prevention
would die out. or if we could defeat all exploit techniques, exploit
writing would die out, etc. but there's no such perfection in the real
world.

so you can go find and fix bugs without ever writing exploits for them
or without ever implementing countermeasures against exploit techniques
for a given bug class (actually, it's not even correct to put it this way,
exploit techniques are orthogonal to bug classes, a bug can be exploited
by several techniques and an exploit technique can be used against different
kinds of bugs, so prevention mechanisms like ASLR are against techniques,
not bugs, for the latter we have to do some kind of analysis/instrumentation).

also not finding or fixing bugs in the presence of intrusion prevention
mechanisms means that an exploited bug is (usually) transformed into
some kind of denial of service problem, not something you can go easy
about if you have paying customers and/or vocal users. so having such
measures is no reason to become lax about finding and/or fixing bugs.
what these measures buy you (your customers/users, that is) are time
and reduced risk of getting owned.

> if a bug is not exploitable then people like Spender wont spend time
> exploiting and making a big deal out of them, right? 

i'm not sure i get this example, if a bug is not exploitable, how could
anyone possibly spend time on, well, exploiting it?

btw, what's with this being fixed on specific individuals, circus and
what not? do you seriously base your decision about fixing bugs whether
you hear about them in the news? or are your collective egos being hurt
by showing the world what kind of facade you put up when you talk about
'full disclosure' but cover up security fixes? also, i never understood
the circus part, can you tell me what exactly you find in the security
world as 'circus'? specific examples will do.

> And yes, it might be embarrassing to see easy exploits and we might 
> roll eyes at the associated self-promotion circus but it will be one 
> more bug found, the reasons for the bug will be examined, potentially 
> avoiding a whole class of similar bugs *for sure*.

it's a nice theory, it has never worked anywhere (just look at OpenBSD ;).
show me a single class of bugs that you think you'd fixed in linux. for
that you'd have to know about them, try CWE (not to be confused with CVE)
in google.

in the meantime i can tell you what you did not fix *for sure*:

- use-after-free bugs
- double free bugs
- heap buffer overflows
- stack buffer overflows
- stack overflows (yes, it's not the buffer overflow kind)
- refcount overflows (as a subset of use-after-free bugs)
- integer overflows and wraparounds
- information leaking from heap/stack areas
- bugs resulting from undefined behaviour in C
- resource exhaustion bugs
- etc

> Can you guarantee that security bugs will be found and fixed with the 
> same kind of intensity even if we make their exploitation (much) 
> harder? I don't think you can make such a guarantee.

why would *i* have to guarantee anything? i'm not santa claus or something ;).
i'm not even into the business of finding & fixing bugs, i, at most, fix stuff
i (or users) run across while developing PaX but i don't go out of my way and
audit the kernel (or anything else) for bugs. life's too short and i placed my
bets long ago on intrusion prevention ;).

but if you're speaking of a hypothetical 'you', i think i explained above why
these processes are independent. also this particular feature (getting rid of
the vsyscall) is a very small dent in the exploit writer's arsenal, it's an
anti-script kiddie measure at most and a feature box you can tick off when you
talk about 'full ASLR'. real exploit writers will continue to find info leaking
bugs, use brute forcing, heap/JIT spraying, and other techniques.

> So as long as we are trading bugs-fixed-for-sure against statistical 
> safety we have to be mindful of the downsides of such a tradeoff ...

while i'm still trying to put together the argument you're making, i hope you're
not saying that leaving users in exploitable conditions is actually *better*
for security...


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 13:56         ` Linus Torvalds
@ 2011-06-06 18:46           ` pageexec
  2011-06-06 20:40             ` Linus Torvalds
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 18:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Andy Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 22:56, Linus Torvalds wrote:

> On Mon, Jun 6, 2011 at 7:39 PM,  <pageexec@freemail.hu> wrote:
> >
> > what is annoying is your covering up of security fixes on grounds that you don't want
> > to help script kiddies (a bullshit argument as it were) but at the same time question
> > proactive security measures (one can debate the implementation, see my other mail) that
> > would *actually* prevent the same kiddies from writing textbook exploits.
> 
> Shut up unless you have any real arguments. I know you have your
> hangups, and I just don't care.

i have real arguments, i told them to you but i have yet to see anything
except silly name calling from you. is that the best you can do? seriously?

> Calling the old vdso "UNSAFE" as a config option is just plain stupid.
> It's a politicized name, with no good reason except for your political
> agenda. And when I call it out as such, you just spout the same tired
> old security nonsense.

i didn't choose this name, Andy did but i happen to agree with it. whether
you like it or not is frankly and quite obviously irrelevant to me ;). as
for political agenda, tell me more, i'd like to know what it is. exposing
your lies to the public about doing full disclosure but still covering up
the security fixes is not politics, it's called honesty. not yours, mine.
maybe that's what bothers you.

> I'm happy with perhaps moving away from the fixed-address vdso,

it's not about the vdso that has been mmap'ed and randomized for quite some
time now. it's about the amd64 specific vsyscall page.

> but that does not excuse bad naming and non-descriptive crap like the
> feature-removal thing, and all the insanity going on in the thread. If
> the config option is about removing the legacy vdso, then CALL IT
> THAT, instead of spouting idiotic and irrelevant nonsense.

no one wants to remove the legacy vdso as one can simply configure out that
option already. it's about introducing a similar option for vsyscall.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 14:44         ` Ingo Molnar
  2011-06-06 15:01           ` pageexec
@ 2011-06-06 18:59           ` pageexec
  2011-06-06 19:25             ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-06 18:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 16:44, Ingo Molnar wrote:
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > > > Seriously. The whole patch series just seems annoying.
> > 
> > what is annoying is your covering up of security fixes on grounds 
> > that you don't want to help script kiddies (a bullshit argument as 
> > it were) but at the same time question proactive security measures 
> > (one can debate the implementation, see my other mail) that would 
> > *actually* prevent the same kiddies from writing textbook exploits.
> 
> You are mixing up several issues here, and rather unfairly so.

but it's very simple logic Ingo. it goes like 'I am not willing to
do A because it would help script kiddies but I'd rather do B that
would help script kiddies'. with A = 'disclose security bugs' and
B = 'keep the last roadblock that prevents full ASLR'.

if someone's that worried about script kiddies as Linus claims to be
(which i always called a BS argument, but let's accept here), he can't
possibly argue for keeping the vsyscall page at a fixed address around,
simple as that.

and it is for security, no other reason, else you'd have to accept a patch
that maps the vdso at a fixed address again or come up with some very
convincing arguments why the vdso must stay randomized but the vsyscall
page is fine at a fixed address (i guess neither is forthcoming but you
guys can act in surprising ways, so i'm not placing any bets ;).

> Firstly, see my other mail, there's an imperfect balance to be
> found between statistical 'proactive' measures and the incentives
> that remove the *real* bugs.

i hope i replied to this already now to your satisfaction else feel free
to elaborate.

> Secondly, *once* a real security bug has been found the correct 
> action is different from the considerations of proactive measures. 

as i said already, you're mixing up fixing bugs and fighting exploit
techniques. apples vs. oranges.

> How can you possibly draw equivalence between disclosure policies
> and the handling of statistical security measures?

see the simple logic above.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 18:04           ` pageexec
@ 2011-06-06 19:12             ` Ingo Molnar
  2011-06-07  0:02               ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 19:12 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 14:47, Ingo Molnar wrote:
> 
> > * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > > [...] does that mean that you guys would accept a patch that would 
> > > map the vdso at a fixed address for old times' sake? if not, on 
> > > what grounds would you refuse it? see, you can't have it both ways.
> > 
> > You can actually do that by enabling CONFIG_COMPAT_VDSO=y.

> [...]
> 
> 2. who enables CONFIG_COMPAT_VDSO?
> 
> RHEL? Fedora? SLES? Debian? Ubuntu? (i don't know, i'm asking)

Fedora has not enabled it for a long time.

> and whoever enables them, what do you think they're more likely to 
> get in return? some random and rare old binaries that still run for 
> a minuscule subset of users or every run-of-the-mill exploit 
> working against *every* user, metasploit style (did you know that 
> it has a specific target for the i386 compat vdso)?

That's what binary compatibility means, yes.

> so once again, tell me whether the randomized placement of the vdso 
> wasn't about security (in which case can we please have it back at 
> a fixed mmap'd address, since it doesn't matter for security you 
> have no reason to refuse ;).

It's a statistical security measure, and was such a measure from the 
day it was committed:

 | commit e6e5494cb23d1933735ee47cc674ffe1c4afed6f
 | Author: Ingo Molnar <mingo@elte.hu>
 | Date:   Tue Jun 27 02:53:50 2006 -0700
 |
 |    [PATCH] vdso: randomize the i386 vDSO by moving it into a vma
 |    
 |    Move the i386 VDSO down into a vma and thus randomize it.
 |    
 |    Besides the security implications, this feature also helps debuggers, which
 |    can COW a vma-backed VDSO just like a normal DSO and can thus do
 |    single-stepping and other debugging features.

So what's your point?

> > > the fixed address of the vsyscall page *is* a very real 
> > > security problem, it should have never been accepted as such 
> > > and it's high time it went away finally in 2011AD.
> > 
> > It's only a security problem if there's a security hole 
> > elsewhere.
> 
> it's not an 'if', there *is* a security hole 'elsewhere', else the 
> CVE list had been abandoned long ago and noone would be doing 
> proactive security measures such as intrusion prevention 
> mechanisms.
> 
> so it *is* a security problem.

Two arguments.

Firstly, you generalize too much, it's only a security problem if you 
actually have an attack surface:

  Many Linux systems don't have any: non-networked appliances that 
  are not physically connected to any hostile medium.

  For such a system a gaping root hole bug is *not even a bug*, while 
  a rare memory leak that you'd shrug off on a desktop might be a 
  showstopper.

Secondly, and more importantly, we try to maintain the kernel in a 
way so that it can converge to a no bugs state in the long run.

You can only do that by making sure that even in the very last 
stages, when there are virtually no bugs left, the incentives and 
mechanisms are still there to fix even those bugs.

If we add obstruction features that turn bugs into less severe 
statistical bugs then that automatically reduces the speed of 
convergence.

We might still do it, but you have to see and acknowledge that it's a 
*cost*. You seem to argue that it's a bona fide bug and that the fix 
is deterministic, that it "needs fixing" - and that is wrong on both 
counts.

You generally seem to assume that security is an absolute goal with 
no costs attached.

> > The thing is, and i'm not sure whether you realize or recognize 
> > it, but these measures *are* two-edged swords.
> 
> they aren't, see below why.
> 
> > Yes, the upside is that they reduce the risks associated with 
> > security holes - but only statistically so.
> 
> not sure what 'these measures' are here (if you mean ASLR related 
> ones, please say so), some are randomization based (so their impact 
> on security is probabilistic), some aren't (their impact is 
> deterministic).

Which of these changes are deterministic?

Removing a syscall or a RET from a fixed address is *still* only a 
probabilistic fix: the attacker can still do brute-force attacks 
against the many executable pages in user-space, even if everything 
is ASLR obfuscated.

> > The downside is that having such a measure in place makes it 
> > somewhat less likely that those bugs will be found and fixed in 
> > the future:
> 
> i'm not sure i follow you here, it seems to me that you're mixing 
> up bug finding/fixing with exploit development and prevention 
> measures.

It helps if you read the bit i provided after the colon:

  > > if a bug is not exploitable then people like Spender won't spend 
  > > time exploiting and making a big deal out of them, right?

If a bug is hidden via ASLR (and *all* of the changes in this thread 
had only that effect) and can not be exploited using the simple fixed 
address techniques disabled by these patches, then people like you or 
Spender won't spend time exploiting them, right?

But it can still be exploited brute-force: just cycle through the 
possible addresses until you find the right instruction that elevates 
privileges.
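
As a rough sketch of what "statistical" means here (a toy model, not 
something measured in this thread): if the target instruction can sit 
at any of N = 2^bits equally likely positions, a sweep against an 
unchanged layout needs (N + 1) / 2 guesses on average, and N guesses 
on average if the layout is re-randomized after every failed attempt. 
The function names and the 'bits' parameter are made up for 
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Number of candidate positions for 'bits' bits of assumed entropy. */
static double positions(unsigned bits)
{
	assert(bits <= 52);	/* keep the value exact in a double */
	return (double)((uint64_t)1 << bits);
}

/*
 * Layout does not change between attempts: the attacker sweeps the
 * candidates without repeating a guess.  The target is equally likely
 * to be the 1st, 2nd, ... Nth candidate, so the mean is (N + 1) / 2.
 */
static double expected_guesses_fixed_layout(unsigned bits)
{
	return (positions(bits) + 1.0) / 2.0;
}

/*
 * Layout is re-randomized after every failed attempt: each guess
 * succeeds with probability 1/N independently, so the mean is N.
 */
static double expected_guesses_rerandomized(unsigned bits)
{
	return positions(bits);
}
```

With, say, 16 bits of randomization that is roughly 32768 or 65536 
attempts: harder, but not impossible.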

> > And yes, it might be embarrassing to see easy exploits and we 
> > might roll eyes at the associated self-promotion circus but it 
> > will be one more bug found, the reasons for the bug will be 
> > examined, potentially avoiding a whole class of similar bugs *for 
> > sure*.
> 
> it's a nice theory, it has never worked anywhere (just look at 
> OpenBSD ;). show me a single class of bugs that you think you'd 
> fixed in linux. [...]

For example after this meta-fix:

  c41d68a: compat: Make compat_alloc_user_space() incorporate the access_ok()

We certainly have eliminated the class of bugs where we'd return 
out-of-bounds pointers allocated via compat_alloc_user_space() and 
exploited via large or negative 'len' values.
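
The principle behind that kind of meta-fix can be modelled in user 
space roughly as below. This is only a sketch of the idea (do the range 
check once, inside the allocator, instead of trusting every caller to 
do it); alloc_below() and its parameters are hypothetical names, not 
the kernel's compat_alloc_user_space() or access_ok():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Carve 'len' bytes downward from 'top' and refuse any result that
 * would fall outside [floor, top].  Returns 0 and stores the start
 * address in *out on success, -1 on rejection.
 */
static int alloc_below(uintptr_t top, uintptr_t floor, long len,
		       uintptr_t *out)
{
	assert(out != NULL);

	if (top < floor)
		return -1;		/* inconsistent region */
	if (len < 0)
		return -1;		/* negative length */
	if ((uintptr_t)len > top - floor)
		return -1;		/* would land below the floor */

	*out = top - (uintptr_t)len;
	return 0;
}
```

Because the check lives in the allocator, a caller that forgets to 
validate 'len' gets a failure back rather than an out-of-bounds 
pointer.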

> > Can you guarantee that security bugs will be found and fixed with 
> > the same kind of intensity even if we make their exploitation 
> > (much) harder? I don't think you can make such a guarantee.
> 
> why would *i* have to guarantee anything? [...]

It was a generic/indefinite 'you'.

To understand my point you need to look at the context i replied to:

 > > > the fixed address of the vsyscall page *is* a very real 
 > > > security problem, it should have never been accepted as such 
 > > > and it's high time it went away finally in 2011AD.

You claimed that it is a very real security problem. I pointed out 
that this is not a real primary fix for some security bug but a 
statistical method that makes exploits of other bugs harder (but not 
impossible), and as such it has the cost of making *real* fixes 
slower to arrive.

I don't think this was a terribly complicated argument, yet you do 
not even seem to acknowledge that it exists.

Thanks,

	Ingo


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 18:59           ` pageexec
@ 2011-06-06 19:25             ` Ingo Molnar
  2011-06-07  0:34               ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 19:25 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> [...] it goes like 'I am not willing to do A because it would help 
> script kiddies but I'd rather do B that would help script kiddies'. 
> with A = 'disclose security bugs' and B = 'keep the last roadblock 
> that prevents full ASLR'.

No, that's wrong, the logic goes like this:

  if i do A then it has X1 advantages and Y1 disadvantages.
  if i do B then it has X2 advantages and Y2 disadvantages.

The Y1 and Y2 set of disadvantages can both include "making it easier 
for script kiddies" but the sets of advantages and disadvantages can 
also include MANY OTHER considerations, making the decision unique in 
each case.

To translate it to this specific case (extremely simplified, so please 
don't nit-pick that my descriptions of advantages and disadvantages 
are not precise nor complete):

 A) "i put a zero day exploit and a CVE code into a changelog"

     Advantages: - it describes the problem more fully

  Disadvantages: - it makes it easier for people (including script kiddies) to do harm faster
                 - creates a false, misleading category for "security bugs"

 B) "i obfuscate the vsyscall page"

     Advantages: - it makes it statistically harder for people (including script kiddies) to do harm

  Disadvantages: - it reduces the incentive to fix *real* security bugs
                 - complicates the code

Do you see how A) and B) are not equivalent at all? Different cases, 
different attributes, different probabilities and different 
considerations.

> but it's very simple logic Ingo.

Please drop the condescending tone, i think it should be clear to you 
by now that i have a good basis to disagree with you.

Thanks,

	Ingo


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 18:46           ` pageexec
@ 2011-06-06 20:40             ` Linus Torvalds
  2011-06-06 20:51               ` Andrew Lutomirski
                                 ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Linus Torvalds @ 2011-06-06 20:40 UTC (permalink / raw)
  To: pageexec
  Cc: Andi Kleen, Andy Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Tue, Jun 7, 2011 at 3:46 AM,  <pageexec@freemail.hu> wrote:
>
>> I'm happy with perhaps moving away from the fixed-address vdso,
>
> it's not about the vdso that has been mmap'ed and randomized for quite some
> time now. it's about the amd64 specific vsyscall page.

Duh. What do you think that thing is? It's a special fixed-address
vdso. Stop the whole jumping from issue to issue and making up random
irrelevant arguments. First it was you jumping up and down about
"covering up security issues", now you start instead complaining about
some random word choice. Stop it.

What I complained about in the patch series was (specifically) that I
think the naming sucks and (non-specifically) that the whole series is
annoying.

The config name is misleading and pointlessly scary - the whole thing
is not in itself "unsafe", so calling it that is just wrong. If we
want to make it a legacy option that you can turn off (which sounds
sane in itself), then name it that way. But if so, the name and
explanation should be that it's about legacy stuff and that you can
only do so once it's no longer used. Not "UNSAFE", which it isn't.

We *definitely* don't want to name it in a way that makes some random
person just turn it off because it's scary, since the random person
*shouldn't* turn it off today. Comprende?

And the annoying part about the whole patch series is how the whole
re-sending has gone on forever. Just pick some approach, do it, and
don't even bother making it a config option for now. If we can replace
the vsyscall page with a page fault or int3 or whatever, and it's only
used for the 'time()' system call, just do it.

The series is now extended with the cleanup patches so the end result
looks reasonable, but why have the whole "first implement it, then
clean it up" and sending it as a whole series. That's annoying. Just
send the cleaned-up end result to begin with.

                     Linus

PS. The reason you don't see direct replies seems to be this from gmail:

     ----- The following addresses had permanent fatal errors -----
    <pageexec@freemail.hu>
       (reason: 553 sorry, that domain isn't in my list of allowed
rcpthosts (#5.7.1))

which is probably because of some spamming or other bad behavior from
within the same domain.


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 20:40             ` Linus Torvalds
@ 2011-06-06 20:51               ` Andrew Lutomirski
  2011-06-06 21:54                 ` Ingo Molnar
  2011-06-06 21:45               ` Ingo Molnar
  2011-06-06 21:53               ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule pageexec
  2 siblings, 1 reply; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-06 20:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: pageexec, Andi Kleen, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Mon, Jun 6, 2011 at 4:40 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> We *definitely* don't want to name it in a way that makes some random
> person just turn it off because it's scary, since the random person
> *shouldn't* turn it off today. Comprende?

Yes, and fixed in the cleaned up version.

>
> And the annoying part about the whole patch series is how the whole
> re-sending has gone on forever.

If I have the patch-resending protocol wrong, please enlighten me.
I'm not sure how to make future work less annoying.

> Just pick some approach, do it, and
> don't even bother making it a config option for now. If we can replace
> the vsyscall page with a page fault or int3 or whatever, and it's only
> used for the 'time()' system call, just do it.

Really?

I won't personally complain about the 200+ ns hit, but I'm sure
someone will cc: me on a regression report if there's no option.

>
> The series is now extended with the cleanup patches so the end result
> looks reasonable, but why have the whole "first implement it, then
> clean it up" and sending it as a whole series. That's annoying. Just
> send the cleaned-up end result to begin with.

--Andy


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 20:40             ` Linus Torvalds
  2011-06-06 20:51               ` Andrew Lutomirski
@ 2011-06-06 21:45               ` Ingo Molnar
  2011-06-06 21:48                 ` Ingo Molnar
       [not found]                 ` <BANLkTi==uw_h78oaep1cCOCzwY0edLUU_Q@mail.gmail.com>
  2011-06-06 21:53               ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule pageexec
  2 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 21:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: pageexec, Andi Kleen, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> We *definitely* don't want to name it in a way that makes some 
> random person just turn it off because it's scary, since the random 
> person *shouldn't* turn it off today. Comprende?

Agreed, and that's fixed now.

> And the annoying part about the whole patch series is how the whole 
> re-sending has gone on forever. Just pick some approach, do it, and 
> don't even bother making it a config option for now. If we can 
> replace the vsyscall page with a page fault or int3 or whatever, 
> and it's only used for the 'time()' system call, just do it.

Ok, we can certainly remove CONFIG_LEGACY_VTIME - that would further 
simplify things!

I was unsure how big of a problem the time() slowdown was and the 
config option was easy enough to provide. My preference would be to 
just remove the config option and simplify the code - complexity is 
the #1 enemy of security.

> The series is now extended with the cleanup patches so the end 
> result looks reasonable, but why have the whole "first implement 
> it, then clean it up" and sending it as a whole series. That's 
> annoying. Just send the cleaned-up end result to begin with.

Do you think x86/vdso is worth rebasing at this stage? Right now it 
has:

 feba7e97df8c: x86-64: Rename COMPAT_VSYSCALLS to LEGACY_VTIME and clarify documentation
 7dc0452808b7: x86-64: Clean up vsyscall emulation and remove fixed-address ret
 8d6316596441: x86-64: Fix outdated comments in vsyscall_64.c
 1593843e2ada: x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
 764611c8dfb5: x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build
 38172403a978: x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
 d55ed1d30b82: x86-64: Emulate legacy vsyscalls
 5dfcea629a08: x86-64: Fill unused parts of the vsyscall page with 0xcc
 bb5fe2f78ead: x86-64: Remove vsyscall number 3 (venosys)
 d319bb79afa4: x86-64: Map the HPET NX
 0d7b8547fb67: x86-64: Remove kernel.vsyscall64 sysctl
 9fd67b4ed071: x86-64: Give vvars their own page
 8b4777a4b50c: x86-64: Document some of entry_64.S
 6879eb2deed7: x86-64: Fix alignment of jiffies variable

it's reasonably tested by now. We'd keep about 80% of the commits 
after the rebase.

Thanks,

	Ingo


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 21:45               ` Ingo Molnar
@ 2011-06-06 21:48                 ` Ingo Molnar
       [not found]                 ` <BANLkTi==uw_h78oaep1cCOCzwY0edLUU_Q@mail.gmail.com>
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 21:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: pageexec, Andi Kleen, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks


* Ingo Molnar <mingo@elte.hu> wrote:

> I was unsure how big of a problem the time() slowdown was and the 
> config option was easy enough to provide. My preference would be to 
> just remove the config option and simplify the code - complexity is 
> the #1 enemy of security.

The patch below is what the config removal brings us.

Thanks,

	Ingo

--
 Documentation/feature-removal-schedule.txt |   13 -------------
 arch/x86/Kconfig                           |   23 -----------------------
 arch/x86/kernel/vsyscall_64.c              |   26 --------------------------
 arch/x86/kernel/vsyscall_emu_64.S          |    2 --
 4 files changed, 0 insertions(+), 64 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 7a7446b..1a9446b 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -600,16 +600,3 @@ Why:	Superseded by the UVCIOC_CTRL_QUERY ioctl.
 Who:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 
 ----------------------------
-
-What:	CONFIG_LEGACY_VTIME (x86_64)
-When:	When glibc 2.14 or newer is ubiquitous.  Perhaps 2013.
-Why:	Having user-executable syscall invoking code at a fixed addresses makes
-	it easier for attackers to exploit security holes.  Turning off
-	CONFIG_LEGACY_VTIME reduces the risk without breaking binary
-	compatibility but will make the time() function slightly slower on
-	glibc versions 2.13 and below.
-
-	We may flip the default setting to N before 2013.
-Who:	Andy Lutomirski <luto@mit.edu>
-
-----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6746d35..da34972 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1646,29 +1646,6 @@ config COMPAT_VDSO
 
 	  If unsure, say Y.
 
-config LEGACY_VTIME
-	def_bool y
-	prompt "Fast legacy sys_time() vsyscall"
-	depends on X86_64
-	---help---
-	  Glibc 2.13 and older, statically linked binaries, and a few
-	  other things use a legacy ABI to implement time().
-
-	  If you say N here, the kernel will emulate that interface in
-	  order to make certain types of userspace bugs more difficult
-	  to exploit.  This will cause some legacy software to run
-	  slightly more slowly.
-
-	  If you say Y here, then the kernel will provide native code to
-	  allow legacy programs to run without any performance impact.
-	  This could make it easier to exploit certain types of
-	  userspace bugs.
-
-	  If unsure, say Y.
-
-	  NOTE: disabling this option will not break any ABI; the kernel
-	        will be fully compatible with all binaries either way.
-
 config CMDLINE_BOOL
 	bool "Built-in kernel command line"
 	---help---
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index f56644e..41f6a98 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -102,30 +102,6 @@ static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
 	       regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
 }
 
-#ifdef CONFIG_LEGACY_VTIME
-
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-time_t __attribute__ ((unused, __section__(".vsyscall_1"))) notrace
-vtime(time_t *t)
-{
-	unsigned seq;
-	time_t result;
-
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
-
-		result = VVAR(vsyscall_gtod_data).wall_time_sec;
-
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
-
-	if (t)
-		*t = result;
-	return result;
-}
-
-#endif /* CONFIG_LEGACY_VTIME */
-
 void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
 	static DEFINE_RATELIMIT_STATE(rs, 3600 * HZ, 3);
@@ -169,12 +145,10 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 			(struct timezone __user *)regs->si);
 		break;
 
-#ifndef CONFIG_LEGACY_VTIME
 	case 1:
 		vsyscall_name = "time";
 		ret = sys_time((time_t __user *)regs->di);
 		break;
-#endif
 
 	case 2:
 		vsyscall_name = "getcpu";
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
index b192283..bc10dba 100644
--- a/arch/x86/kernel/vsyscall_emu_64.S
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -14,12 +14,10 @@ ENTRY(vsyscall_0)
 	int $VSYSCALL_EMU_VECTOR
 END(vsyscall_0)
 
-#ifndef CONFIG_LEGACY_VTIME
 .section .vsyscall_1, "a"
 ENTRY(vsyscall_1)
 	int $VSYSCALL_EMU_VECTOR
 END(vsyscall_1)
-#endif
 
 .section .vsyscall_2, "a"
 ENTRY(vsyscall_2)


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 20:40             ` Linus Torvalds
  2011-06-06 20:51               ` Andrew Lutomirski
  2011-06-06 21:45               ` Ingo Molnar
@ 2011-06-06 21:53               ` pageexec
  2 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-06 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Andy Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 7 Jun 2011 at 5:40, Linus Torvalds wrote:

> On Tue, Jun 7, 2011 at 3:46 AM,  <pageexec@freemail.hu> wrote:
> >
> >> I'm happy with perhaps moving away from the fixed-address vdso,
> >
> > it's not about the vdso that has been mmap'ed and randomized for quite some
> > time now. it's about the amd64 specific vsyscall page.
> 
> Duh. What do you think that thing is? It's a special fixed-address
> vdso.

that we call the vsyscall page and not some random vdso thing, they're quite
different, that's why there's this whole patch series, duh.

> What I complain about in the patch series was (specifically) that I
> think the naming sucks and (non-specifically) that the whole series is
> annoying.
> 
> The config name is misleading and pointlessly scary - the whole thing
> is not in itself "unsafe", so calling it that is just wrong.

if it's safe to have the vsyscall page at a fixed address, then you surely
wouldn't object to have its replacement at a fixed address as well, would
you? yes/no? (if it's a 'yes' then you'd better have some non-security
arguments too ;)

> We *definitely* don't want to name it in a way that makes some random
> person just turn it off because it's scary, since the random person
> *shouldn't* turn it off today. Comprende?

actually you confused yourself and got it backwards. we want everyone sane
who cares an iota about security to turn off the legacy/fixed address vsyscall
as soon as possible else it's a pointless exercise. capito?

> If we can replace the vsyscall page with a page fault or int3 or
> whatever, and it's only used for the 'time()' system call, just do it. 

i agree fully, there's no real reason for a config option imho, i never
had one in PaX and noone ever complained let alone noticed it (except
perhaps for failed exploit attempts but that's by design).



* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 20:51               ` Andrew Lutomirski
@ 2011-06-06 21:54                 ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-06 21:54 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, pageexec, Andi Kleen, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks


* Andrew Lutomirski <luto@mit.edu> wrote:

> > don't even bother making it a config option for now. If we can 
> > replace the vsyscall page with a page fault or int3 or whatever, 
> > and it's only used for the 'time()' system call, just do it.
> 
> Really?
> 
> I won't personally complain about the 200+ ns hit, but I'm sure 
> someone will cc: me on a regression report if there's no option.

Well, for 0.2 usecs to show up one would have to do like a million of 
them per second.

Why an application would want to query the *same value* from the 
kernel a million times per second is beyond me. vgettimeofday()? 
Sure. But vtime()?
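
To put rough numbers on that: the extra cost scales linearly with the 
call rate. A small sketch of the arithmetic, using the ~200 ns figure 
quoted above (overhead_fraction() is a made-up helper, and the call 
rates are illustrative, not measurements):

```c
#include <assert.h>

/*
 * Fraction of one CPU spent on added per-call overhead.
 *   overhead_ns:   extra cost per call, in nanoseconds
 *   calls_per_sec: how often the application makes the call
 * A result of 0.2 means 20% of one core.
 */
static double overhead_fraction(double overhead_ns, double calls_per_sec)
{
	assert(overhead_ns >= 0.0 && calls_per_sec >= 0.0);
	return overhead_ns * calls_per_sec / 1e9;
}
```

At one million calls per second, 200 ns per call is 0.2 s of CPU time 
per second (20% of a core); at a thousand calls per second it is 
0.02%.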

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 16:47                         ` Ingo Molnar
@ 2011-06-06 22:49                           ` pageexec
  2011-06-06 22:57                             ` david
                                               ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: pageexec @ 2011-06-06 22:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 6 Jun 2011 at 18:47, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > uhm, not sure why you're so worked up here. is it because i said
> > 'scalability' was completely irrelevant for the nx vsyscall page
> > approach? elaborate!
> 
> Firstly, 'fast' is a necessary first step towards good scalability, 
> secondly i was talking about *both* speed and scalability so your 
> insistence to only discuss speed is banging on open doors ...

uhm, why the heck do you keep bringing this up? what does it matter?
i talk about whatever i find relevant, and your scalability fetish
has no business with the vsyscall thing we're talking about here.
if you think it does, then you still haven't explained it.

> You are simply wrong about:
> 
> > > > sorry, but stating that the pf handler is a fast path doesn't 
> > > > make it so ;).
> 
> and 5-6 mails down the line you are still unwilling to admit it. Why?

why are you cutting out in all those mails of yours what i already told
you many times? the original statement from Andy was about the int cc path
vs. the pf path: he said that the latter would not tolerate a few well
predicted branches (if they were put there at all, that is) because the
pf handler is such a critical fast path code. it is *not*. it can't be
by almost definition given how much processing it has to do (it is by
far one of the most complex of cpu exceptions to process).

it seems to me that you're unwilling to admit that you tried to pick on
the wrong thing, probably in the heat of the discussion and now you try
to insist to save face or something. if you really want to get out of this
then please, go do the measurements i asked you and you'll see yourself.

> A fastpath is defined by optimization considerations applied to a 
> codepath (the priority it gets compared to other codepaths), *not* by 
> its absolute performance.

we're not talking about random arbitrarily defined paths here but the
impact of putting well predicted branches into the pf handler vs. int xx
(are you perhaps confused by 'fast path' vs. 'fastpath'?).

that impact only matters if it's measurable. you have yet to show that it
is. and all this silliness is for a hypothetical situation since those
conditional branches don't even need to be in the general page fault
processing paths.

> You seem to be confused on several levels here.

you're talking about something else, probably because it's you who's
very confused about this whole fast path business. kinda surprising
given how much time you supposedly spent on this topic in the past.



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 22:49                           ` pageexec
@ 2011-06-06 22:57                             ` david
  2011-06-07  9:07                               ` Ingo Molnar
  2011-06-07  6:59                             ` Pekka Enberg
  2011-06-07  8:30                             ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: david @ 2011-06-06 22:57 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Tue, 7 Jun 2011, pageexec@freemail.hu wrote:

> On 6 Jun 2011 at 18:47, Ingo Molnar wrote:
>
>> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
>>
>>>>> sorry, but stating that the pf handler is a fast path doesn't
>>>>> make it so ;).
>>
>> and 5-6 mails down the line you are still unwilling to admit it. Why?
>
> why are you cutting out in all those mails of yours what i already told
> you many times? the original statement from Andy was about the int cc path
> vs. the pf path: he said that the latter would not tolerate a few well
> predicted branches (if they were put there at all, that is) because the
> pf handler is such a critical fast path code. it is *not*. it can't be
> by almost definition given how much processing it has to do (it is by
> far one of the most complex of cpu exceptions to process).

it seems to me that such a complicated piece of code that is executed so 
frequently is especially sensitive to anything that makes it take longer.

>> A fastpath is defined by optimization considerations applied to a
>> codepath (the priority it gets compared to other codepaths), *not* by
>> its absolute performance.
>
> we're not talking about random arbitrarily defined paths here but the
> impact of putting well predicted branches into the pf handler vs. int xx
> (are you perhaps confused by 'fast path' vs. 'fastpath'?).

as someone watching, I know that I don't see what difference adding or 
removing a space makes.

the fast path or fastpath refers to the path that is the most performance 
critical (no matter how long the processing takes).

> that impact only matters if it's measurable. you have yet to show that it
> is. and all this silliness is for a hypothetical situation since those
> conditional branches don't even need to be in the general page fault
> processing paths.

he has shown examples of other 'minor' changes in this code that have been 
considered very significant. given that so much processing already takes 
place here, there are unlikely to be extra processor cycles that can be used 
without making any impact.

>> You seem to be confused on several levels here.
>
> you're talking about something else, probably because it's you who's
> very confused about this whole fast path business. kinda surprising
> given how much time you supposedly spent on this topic in the past.

please educate the rest of us on what you think 'fast path' and 'fastpath' 
mean (and why you think they are so different).

David Lang


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-06 19:12             ` Ingo Molnar
@ 2011-06-07  0:02               ` pageexec
  2011-06-07  9:56                 ` Ingo Molnar
                                   ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: pageexec @ 2011-06-07  0:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 6 Jun 2011 at 21:12, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > and whoever enables them, what do you think they're more likely to 
> > get in return? some random and rare old binaries that still run for 
> > a minuscule subset of users or every run-of-the-mill exploit 
> > working against *every* user, metasploit style (did you know that 
> > it has a specific target for the i386 compat vdso)?
> 
> That's what binary compatibility means, yes.

so fedora is not binary compatible. did you just admit that in real life security
won out? we're on the right track! ;)

> > so once again, tell me whether the randomized placement of the vdso 
> > wasn't about security (in which case can we please have it back at 
> > a fixed mmap'd address, since it doesn't matter for security you 
> > have no reason to refuse ;).
> 
> It's a statistical security measure, and was such a measure from the 
> day it was committed:
> 
>  | commit e6e5494cb23d1933735ee47cc674ffe1c4afed6f
>  | Author: Ingo Molnar <mingo@elte.hu>
>  | Date:   Tue Jun 27 02:53:50 2006 -0700
>  |
>  |    [PATCH] vdso: randomize the i386 vDSO by moving it into a vma
>  |    
>  |    Move the i386 VDSO down into a vma and thus randomize it.
>  |    
>  |    Besides the security implications, this feature also helps debuggers, which
>  |    can COW a vma-backed VDSO just like a normal DSO and can thus do
>  |    single-stepping and other debugging features.
> 
> So what's your point?

you called this feature "borderline security FUD" but have yet to prove it.
on the contrary you did prove that it is a security feature and there's at
least one distro where it matters. of course you can call fedora's and even
mainline's vdso randomization FUD, but then please fix it and map it at a
constant address. you wouldn't want to live with "borderline security FUD"
features, would you? ;)

> > > It's only a security problem if there's a security hole 
> > > elsewhere.
> > 
> > it's not an 'if', there *is* a security hole 'elsewhere', else the 
> > CVE list had been abandoned long ago and noone would be doing 
> > proactive security measures such as intrusion prevention 
> > mechanisms.
> > 
> > so it *is* a security problem.
> 
> Two arguments.
> 
> Firstly, you generalize too much, it's only a security problem if you 
> actually have an attack surface:

a security problem without an attack surface is an animal that doesn't
exist. a potential problem becomes a security problem when there's a way
to attack/abuse it. your security vocabulary is seriously lacking and/or
you're very confused about very basic terminology but i'll try to do my
best to make sense out of what you're trying to say and also correct it
as needed.

>   Many Linux systems don't have any: non-networked appliances that 
>   are not physically connected to any hostile medium.

that only means that some problems are not security problems for those
systems. i find it somewhat ironic that you accuse me of too much
generalization yet you're the one who believes in black and white
terms, such as 'security bugs'. let me clear up your profound confusion:

pretty much every term in security is relative, of the 'it depends on this
or that condition' kind. e.g., when we call something an exploitable bug
it's because we know it can be exploited for sure under some conditions
(given OS, given userland app, given config options, etc, whatever applies
to the given situation) not because it cannot be exploited under some other
conditions, duh.

>   For such a system a gaping root hole bug is *not even a bug*, while 
>   a rare memory leak that you'd shrug off on a desktop might be a 
>   showstopper.

for such a system there's no 'gaping root hole' and we don't say that such
a system 'has an exploitable bug but not really'. such categories may exist
in your head only, but i can assure you that they don't exist among professionals.
what's more, there're things that some consider as a feature while others
consider a bug (or even security bug), then what ;). 

> Secondly, and more importantly, we try to maintain the kernel in a 
> way so that it can converge to a no bugs state in the long run.

no, you're not doing that. you don't even know that such a state is a
pipedream and cannot be achieved by any practical means we know of. i'm
somewhat saddened (if true) that this is a driving idea among kernel
developers.

consider that before eliminating old bugs you'd better not let new ones
in, in the first place. but you have no processes to ensure this, you don't
even know how to do it or if it's even possible to pull off such a feat.

> You can only do that by making sure that even in the very last 
> stages, when there are virtually no bugs left, the incentives and 
> mechanisms are still there to fix even those bugs.

more pipedreams. do you have *any* idea what you're talking about? seriously,
can you provide a program (think task list, not actual computer stuff)
that you think will get you anywhere near to this goal? i bet you cannot.
i bet you don't even know what the state of the art is in creating
such systems (on the linux scale, not thousand line long specialized
microkernels).

> If we add obstruction features that turn bugs into less severe 
> statistical bugs then that automatically reduces the speed of 
> convergence.

what? (you're still very confused about the bug vs. exploit thing btw,
ASLR doesn't affect bugs, it affects exploit techniques)

what's the connection again? you're just repeating what you said before
without anything to back it up. one more time: intrusion prevention is
orthogonal to bug finding & fixing. also for the latter group of people,
you only really care about those that actually disclose to you what they
find, and they are not influenced by exploit prevention techniques given
how they're not interested in writing said exploits (else they would not
disclose the bugs they're exploiting).

> We might still do it, but you have to see and acknowledge that it's a 
> *cost*.

i don't have to acknowledge non-existent things ;). you made up all this
'cost' thing and have yet to explain it.

> You seem to argue that it's a bona fide bug and that the fix 
> is deterministic that it "needs fixing" - and that is wrong on both 
> counts.

sorry i really lost you here. what bug are you talking about? and what's
with the 'fix is deterministic'? what else can a fix be? you either fix
a bug or you don't, as much as i hate black&white statements, i don't
know how you can make bugfixing non-deterministic ;). i hope you're not
mixing up ASLR (which works against exploit techniques) with bugfixing.

> You generally seem to assume that security is an absolute goal with 
> no costs attached.

quote me on that back please or admit you made this up. i'm very well
aware of every security feature and its cost in PaX for example, i have
written about them extensively and educated both users and non-users for
over a decade now. if you want to paint me something do yourself a favour
and at least read up on the project and person you're discussing. you
can start with the PaX documentation, its kernel config help, grsec
forum threads, etc.

> > > Yes, the upside is that they reduce the risks associated with 
> > > security holes - but only statistically so.
> > 
> > not sure what 'these measures' are here (if you mean ASLR related
> > ones, please say so), some are randomization based (so their impact 
> > on security is probabilistic), some aren't (their impact is 
> > deterministic).
> 
> Which of these changes are deterministic?

you tell me, you seemed to talk in generic terms without naming
anything in particular, so i was left guessing and made a similarly
generic statement ;). but to give you an idea here, my approach of
making the vsyscall page nx is a deterministic approach. there's no
randomness involved in that step. or Andy's approach of replacing
the syscall insns (which can be found and executed from anywhere)
with a specially chosen one is also a deterministic solution, there
is no randomness involved in whether an attacker can or cannot abuse
them. heck, you can consider having the vsyscall page at a fixed
address as a deterministic help library for exploit writers.

> Removing a syscall or a RET from a fixed address is *still* only a 
> probabilistic fix: the attacker can still do brute-force attacks 
> against the many executable pages in user-space, even if everything 
> is ASLR obfuscated.

no, it's not a fix, you're once again confusing bugs with exploit
techniques and bugfixes with exploit prevention techniques. other
than that yes, you stated a tautology, the point of which was?

> It helps if you read the bit i provided after the colon:
> 
>   > > if a bug is not exploitable then people like Spender wont spend 
>   > > time exploiting and making a big deal out of them, right?
> 
> If a bug is hidden via ASLR (and *all* of the changes in this thread 
> had only that effect) and can not be exploited using the simple fixed 
> address techniques disabled by these patches, then people like you or 
> Spender wont spend time exploiting them, right?

you're really really confused about this whole bug/exploit thing. ASLR
has nothing to do with bugs. it has everything to do with exploits and
more precisely, exploit techniques. do you understand the difference
between bugs and exploits? because if you do, you couldn't have asked the
above question, so i'm guessing you don't understand the difference,
even if this is the most crucial point in understanding what all the
intrusion prevention approach is about.

ASLR prevents exploit techniques that rely on knowing addresses in the
attacked process, regardless of the underlying bug. from another
angle, a bug may very well be exploitable by one exploit technique even
under ASLR but not exploitable by another technique (obviously an exploit
writer's job is to find out what the case is and choose wisely by
considering all the factors).

so what you wanted to say above but didn't know how to put it correctly
is that given an exploit technique prevented (in a probabilistic sense)
by ASLR, what would an exploit writer do (neither of us is, btw but your
continued insistence on painting us as such shows how far detached you
are from the world of computer security, i bet you can't even name a
single company that actually trades in exploits and employs such people).

and the answer to that is that 'it depends', as so many things do in security.

it depends on, among others:
- whether there's another bug that can leak back useful addresses
- whether there's another exploit technique that doesn't need fixed
  addresses (think partial pointer overwrites for example)
- whether the given target can sustain brute force or not
- anything else that a real exploit writer (unlike us) would consider

so your question has no black&white answer even if you'd like to get
one, that's just the way things often are in security.

> But it can still be exploited brute-force: just cycle through the 
> possible addresses until you find the right instruction that elevates 
> privileges.

that's not true, at least not in systems that do ASLR properly and have
a brute force prevention mechanism. there the probability can be bounded
below 1 (did you even read the ASLR doc i wrote some 8 years ago? it's
all explained there).
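
the bound can be illustrated with a small user-space sketch. it assumes an
idealized model that is not spelled out in this mail: every guess succeeds
independently with probability 2^-bits (the layout is re-randomized after
each failed guess) and a brute force deterrent caps the number of guesses.
the function name and both parameters are hypothetical.

```c
/*
 * Probability that at least one of `attempts` independent guesses hits,
 * when each guess succeeds with probability 2^-bits.
 * Assumed model (not from the thread): the address space is re-randomized
 * after every failed guess, so the guesses are independent.
 */
static double brute_force_success(unsigned bits, unsigned long attempts)
{
	double p_hit = 1.0;
	double p_all_miss = 1.0;
	unsigned i;
	unsigned long k;

	/* p_hit = 2^-bits, computed without libm */
	for (i = 0; i < bits; i++)
		p_hit /= 2.0;

	/* probability that every single guess misses */
	for (k = 0; k < attempts; k++)
		p_all_miss *= 1.0 - p_hit;

	return 1.0 - p_all_miss;
}
```

under this model the result never exceeds attempts / 2^bits, so a cap on
the number of attempts keeps the success probability strictly below 1.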

> > it's a nice theory, it has never worked anywhere (just look at 
> > OpenBSD ;). show me a single class of bugs that you think you'd 
> > fixed in linux. [...]
> 
> For example after this meta-fix:
> 
>   c41d68a: compat: Make compat_alloc_user_space() incorporate the access_ok()
> 
> We certainly have eliminated the class of bugs where we'd return 
> out-of-bounds pointers allocated via compat_alloc_user_space() and 
> exploited via large or negative 'len' values.

it wasn't a class but a single instance of a bug. a class of bugs needs
more instances. have you got any more of this kind? alternatively you
can try and find your supposed class in the CWE, i'm all ears ;).

> > > Can you guarantee that security bugs will be found and fixed with 
> > > the same kind of intensity even if we make their exploitation 
> > > (much) harder? I don't think you can make such a guarantee.
> > 
> > why would *i* have to guarantee anything? [...]
> 
> It was a generic/indefinite 'you'.
> 
> To understand my point you need to look at the context i replied to:
> 
>  > > > the fixed address of the vsyscall page *is* a very real 
>  > > > security problem, it should have never been accepted as such 
>  > > > and it's high time it went away finally in 2011AD.
> 
> You claimed that it is a very real security problem. I pointed out 
> that this is not a real primary fix for some security bug

ASLR was never about bugs, but exploit techniques, not sure where you
read anything to the contrary from me. so 'citation needed'. or are
you once again confusing bugfixes with intrusion prevention techniques?

> but a statistical method that makes exploits of other bugs harder (but
> not impossible),

that's not true as explained above.

> and as such it has the cost of making *real* fixes slower to arrive. 

yes, you keep saying that but you never presented any evidence why that
would be the case. and you probably meant 'finding bugs' not 'fixing
bugs' as it'd be a real shame if the kernel bug fixing process would
depend on the presence of intrusion prevention mechanisms. as i pointed
out earlier, customers/users don't take DoS that much more kindly
either ;).

> I don't think this was a terribly complicated argument, yet you do 
> not even seem to acknowledge that it exists.

that's because your 'argument' is bogus. you imagine things and believe
them to be true, without having even taken a look at real life out there.

host-based intrusion prevention techniques are what, like little over a
decade old? what do you think happened in all that timeframe with bugs?
did they die out? or did more and more people jump on the bug finding
bandwagon and eventually make the bug counts explode? what happened to
the exploits over the same time frame? did they die out? or did their
numbers explode? what happened to malware? etc, etc.

you're so uninformed yet insist on playing the knowledgeable one that
one begins to wonder whether you have some psychological issue with trying
to be the smart person in every field of life. it should now be becoming
clear to you that security won't be that field if you keep ignoring reality.



* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-06 19:25             ` Ingo Molnar
@ 2011-06-07  0:34               ` pageexec
  2011-06-07  9:51                 ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-07  0:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 21:25, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > [...] it goes like 'I am not willing to do A because it would help 
> > script kiddies but I'd rather do B that would help script kiddies'. 
> > with A = 'disclose security bugs' and B = 'keep the last roadblock 
> > that prevents full ASLR'.
> 
> No, that's wrong, the logic goes like this:
> 
>   if i do A then it has X1 advantages and Y1 disadvantages.
>   if i do B then it has X2 advantages and Y2 disadvantages.
> 
> The Y1 and Y2 set of disadvantages can both include "making it easier 
> for script kiddies" but the sets of advantages and disadvantages can 
> also include MANY OTHER considerations, making the decision unique in 
> each case.

sure, i was only reflecting on what Linus himself kept insisting on in
the past.

> To translate it to this specific case (extremely simplified, so please 
> don't nit-pick that my descriptions of advantages and disadvantages 
> are not precise nor complete):

i don't even need to get there, you already failed right in the very
first sentence, very impressive. no. 'not precise' is an understatement.

>  A) "i put a zero day exploit and a CVE code into a changelog"
> 
>      Advantages: - it describes the problem more fully
> 
>   Disadvantages: - it makes it easier for people (including script kiddies) to do harm faster
>                  - creates a false, misleading category for "security bugs"
> 

you try to set things up to serve your argument but it's not the things
we've ever talked about (IOW, this is a strawman).

in particular, i've never ever requested exploits in commit logs (and i
don't remember anyone else who has, do you?). why do you keep thinking in
only extremes? is it so impossible to simply state a CVE and the generic
bug class (CWE) that the commit fixes? what Linus has insisted on is 'no
greppable words', that's diametrically opposite to 'full disclosure' that
you guys say you're supposedly doing.

so if you omit the exploits that no one really requested (and i don't even
know why they'd be useful in a commit) then suddenly the script kiddies
are no longer helped.

and you have yet to explain what is false and misleading about the security
bug category. you used these words yourself several times today, how do you
explain that? why does the CVE exist? why does bugtraq exist? are all those
people discussing 'false and misleading' things? why does your employer
release security errata? etc, etc.

>  B) "i obfuscate the vsyscall page"
> 
>      Advantages: - it makes it statistically harder for people (including script kiddies) to do harm
> 
>   Disadvantages: - it reduces the incentive to fix *real* security bugs

as i pointed out in an earlier mail, this supposed disadvantage doesn't
exist so come up with something better, preferably real.

>                  - complicates the code

removing code simplifies things. next try? ;)

> Do you see how A) and B) are not equivalent at all? Different cases, 
> different attributes, different probabilities and different 
> considerations.

i only see a strawman that you thought would help your cause but since
it's just that, a strawman, something you made up for the sake of argument,
i don't think there's much more to see about it.

> > but it's very simple logic Ingo.
> 
> Please drop the condescending tone, i think it should be clear to you 
> by now that i have a good basis to disagree with you.

i'm a firm believer in instant karma, it seems to work on people like yourself
or Linus really well. in somewhat profane but simple english: if you behave as
an asshole i will treat you as one, if you believe i treated you as an asshole
it's because i think you acted as one, and if you don't understand why then you're
welcome to 1. look into yourself and figure it out yourself, 2. ask me. what is
not going to get you anywhere is if you talk to me and others from the high horse,
you must be a lot better than your current self for anyone to tolerate it.



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 22:49                           ` pageexec
  2011-06-06 22:57                             ` david
@ 2011-06-07  6:59                             ` Pekka Enberg
  2011-06-07  8:30                             ` Ingo Molnar
  2 siblings, 0 replies; 112+ messages in thread
From: Pekka Enberg @ 2011-06-07  6:59 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Tue, Jun 7, 2011 at 1:49 AM,  <pageexec@freemail.hu> wrote:
>> A fastpath is defined by optimization considerations applied to a
>> codepath (the priority it gets compared to other codepaths), *not* by
>> its absolute performance.
>
> we're not talking about random arbitrarily defined paths here but the
> impact of putting well predicted branches into the pf handler vs. int xx
> (are you perhaps confused by 'fast path' vs. 'fastpath'?).

Well, I'm sure confused by 'fast path' vs 'fastpath'! Can you please
explain the difference? Thanks!

                                Pekka


* [tip:x86/vdso] x86-64: Emulate legacy vsyscalls
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
                     ` (2 preceding siblings ...)
  2011-06-06  8:35   ` [tip:x86/vdso] x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build tip-bot for Ingo Molnar
@ 2011-06-07  7:49   ` tip-bot for Andy Lutomirski
  2011-06-07  8:03   ` tip-bot for Andy Lutomirski
  4 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-07  7:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  64b0e3256b8ac594cae6ebb4e154b98ec4da008b
Gitweb:     http://git.kernel.org/tip/64b0e3256b8ac594cae6ebb4e154b98ec4da008b
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:24 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 7 Jun 2011 09:46:53 +0200

x86-64: Emulate legacy vsyscalls

There's a fair amount of code in the vsyscall page.  It contains
a syscall instruction (in the gettimeofday fallback) and who
knows what will happen if an exploit jumps into the middle of
some other code.

Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real
vsyscalls. These incantations are useless if entered in the
middle.

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.
The only exception is time() which is still called by glibc
through the vsyscall - but calling time() millions of times
per second is not sensible. glibc has this fixed in the
development tree.

This patch is not perfect: the vread_tsc and vread_hpet
functions are still at a fixed address.  Fixing that might
involve making alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
[ Removed the CONFIG option - it's simpler to just do it unconditionally. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/irq_vectors.h |    6 +-
 arch/x86/include/asm/traps.h       |    4 +
 arch/x86/include/asm/vsyscall.h    |   12 ++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/entry_64.S         |    2 +
 arch/x86/kernel/traps.c            |    6 +
 arch/x86/kernel/vsyscall_64.c      |  236 +++++++++++++++++-------------------
 arch/x86/kernel/vsyscall_emu_64.S  |   25 ++++
 include/linux/seccomp.h            |   10 ++
 9 files changed, 179 insertions(+), 123 deletions(-)
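
For readers following the address arithmetic, here is a minimal user-space
sketch of how the saved trap IP maps to a vsyscall slot.  It assumes that
VSYSCALL_START is 0xffffffffff600000 (the value is not shown in this patch)
and that the four slots sit 1024 bytes apart, which is what the 0xC00 mask
implies.  slot_from_trap_ip() is a hypothetical helper, not a kernel
function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: fixed x86-64 vsyscall page address; not shown in this patch. */
#define VSYSCALL_START 0xffffffffff600000ULL

/* Mirrors is_vsyscall_entry(): only bits 10-11 may differ from the base. */
static bool is_vsyscall_entry(uint64_t addr)
{
	return (addr & ~0xC00ULL) == VSYSCALL_START;
}

/* Mirrors vsyscall_entry_nr(): bits 10-11 select one of four 1024-byte slots. */
static int vsyscall_entry_nr(uint64_t addr)
{
	return (int)((addr & 0xC00ULL) >> 10);
}

/*
 * Hypothetical helper: int 0xcc is two bytes long and the saved ip points
 * past it, so the instruction itself starts at ip - 2.
 * Returns the slot number, or -1 if the trap did not come from a slot start.
 */
static int slot_from_trap_ip(uint64_t ip)
{
	uint64_t insn = ip - 2;

	return is_vsyscall_entry(insn) ? vsyscall_entry_nr(insn) : -1;
}
```

Under these assumptions a trap with ip 0xffffffffff600402 decodes to slot 1
(time), while an int 0xcc executed anywhere else yields -1, the case that
the patch answers with SIGSEGV.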

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6e976ee..a563c50 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -17,7 +17,8 @@
  *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
  *  Vectors  32 ... 127 : device interrupts
  *  Vector  128         : legacy int80 syscall interface
- *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 : device interrupts
+ *  Vector  204         : legacy x86_64 vsyscall emulation
+ *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 except 204 : device interrupts
  *  Vectors INVALIDATE_TLB_VECTOR_START ... 255 : special interrupts
  *
  * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
@@ -50,6 +51,9 @@
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR			0x80
 #endif
+#ifdef CONFIG_X86_64
+# define VSYSCALL_EMU_VECTOR		0xcc
+#endif
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..2bae0a5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/kprobes.h>
+
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -38,6 +40,7 @@ asmlinkage void alignment_check(void);
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
+asmlinkage void emulate_vsyscall(void);
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
@@ -64,6 +67,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *, long);
 dotraplinkage void do_machine_check(struct pt_regs *, long);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *, long);
+dotraplinkage void do_emulate_vsyscall(struct pt_regs *, long);
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *, long);
 #endif
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d555973..bb710cb 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -31,6 +31,18 @@ extern struct timezone sys_tz;
 
 extern void map_vsyscall(void);
 
+/* Emulation */
+
+static inline bool is_vsyscall_entry(unsigned long addr)
+{
+	return (addr & ~0xC00UL) == VSYSCALL_START;
+}
+
+static inline int vsyscall_entry_nr(unsigned long addr)
+{
+	return (addr & 0xC00UL) >> 10;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 90b06d4..cc0469a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,7 @@ obj-y			+= probe_roms.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
+obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 72c4a77..e949793 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1123,6 +1123,8 @@ zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
 zeroentry coprocessor_error do_coprocessor_error
 errorentry alignment_check do_alignment_check
 zeroentry simd_coprocessor_error do_simd_coprocessor_error
+zeroentry emulate_vsyscall do_emulate_vsyscall
+
 
 	/* Reload gs selector with exception handling */
 	/* edi:  new selector */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..fbc097a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -872,6 +872,12 @@ void __init trap_init(void)
 	set_bit(SYSCALL_VECTOR, used_vectors);
 #endif
 
+#ifdef CONFIG_X86_64
+	BUG_ON(test_bit(VSYSCALL_EMU_VECTOR, used_vectors));
+	set_system_intr_gate(VSYSCALL_EMU_VECTOR, &emulate_vsyscall);
+	set_bit(VSYSCALL_EMU_VECTOR, used_vectors);
+#endif
+
 	/*
 	 * Should be a barrier for any external CPU state:
 	 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 70a5f6e..41f6a98 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -11,10 +11,11 @@
  *  vsyscalls. One vsyscall can reserve more than 1 slot to avoid
  *  jumping out of line if necessary. We cannot add more with this
  *  mechanism because older kernels won't return -ENOSYS.
- *  If we want more than four we need a vDSO.
  *
- *  Note: the concept clashes with user mode linux. If you use UML and
- *  want per guest time just set the kernel.vsyscall64 sysctl to 0.
+ *  This mechanism is deprecated in favor of the vDSO.
+ *
+ *  Note: the concept clashes with user mode linux.  UML users should
+ *  use the vDSO.
  */
 
 /* Disable profiling for userspace code: */
@@ -32,6 +33,8 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -44,10 +47,7 @@
 #include <asm/desc.h>
 #include <asm/topology.h>
 #include <asm/vgtod.h>
-
-#define __vsyscall(nr) \
-		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-#define __syscall_clobber "r11","cx","memory"
+#include <asm/traps.h>
 
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
@@ -84,129 +84,124 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
-/* RED-PEN may want to readd seq locking, but then the variable should be
- * write-once.
- */
-static __always_inline void do_get_tz(struct timezone * tz)
+static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
+			      const char *message)
 {
-	*tz = VVAR(vsyscall_gtod_data).sys_tz;
-}
+	struct task_struct *tsk;
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
 
-static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
-{
-	int ret;
-	asm volatile("syscall"
-		: "=a" (ret)
-		: "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
-		: __syscall_clobber );
-	return ret;
-}
+	if (!show_unhandled_signals || !__ratelimit(&rs))
+		return;
 
-static __always_inline void do_vgettimeofday(struct timeval * tv)
-{
-	cycle_t now, base, mask, cycle_delta;
-	unsigned seq;
-	unsigned long mult, shift, nsec;
-	cycle_t (*vread)(void);
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
-
-		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!vread)) {
-			gettimeofday(tv,NULL);
-			return;
-		}
-
-		now = vread();
-		base = VVAR(vsyscall_gtod_data).clock.cycle_last;
-		mask = VVAR(vsyscall_gtod_data).clock.mask;
-		mult = VVAR(vsyscall_gtod_data).clock.mult;
-		shift = VVAR(vsyscall_gtod_data).clock.shift;
-
-		tv->tv_sec = VVAR(vsyscall_gtod_data).wall_time_sec;
-		nsec = VVAR(vsyscall_gtod_data).wall_time_nsec;
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
-
-	/* calculate interval: */
-	cycle_delta = (now - base) & mask;
-	/* convert to nsecs: */
-	nsec += (cycle_delta * mult) >> shift;
-
-	while (nsec >= NSEC_PER_SEC) {
-		tv->tv_sec += 1;
-		nsec -= NSEC_PER_SEC;
-	}
-	tv->tv_usec = nsec / NSEC_PER_USEC;
-}
+	tsk = current;
 
-int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
-{
-	if (tv)
-		do_vgettimeofday(tv);
-	if (tz)
-		do_get_tz(tz);
-	return 0;
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+	       level, tsk->comm, task_pid_nr(tsk),
+	       message,
+	       regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
 }
 
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-time_t __vsyscall(1) vtime(time_t *t)
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
-	unsigned seq;
-	time_t result;
+	static DEFINE_RATELIMIT_STATE(rs, 3600 * HZ, 3);
+	struct task_struct *tsk;
+	const char *vsyscall_name;
+	unsigned long caller;
+	int vsyscall_nr;
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	/*
+	 * x86-ism here: regs->ip points to the instruction after the int 0xcc,
+	 * and int 0xcc is two bytes long.
+	 */
+	if (!is_vsyscall_entry(regs->ip - 2)) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "illegal int 0xcc "
+				  "(exploit attempt?)");
+		goto sigsegv;
+	}
+	vsyscall_nr = vsyscall_entry_nr(regs->ip - 2);
 
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
+	if (get_user(caller, (unsigned long __user *)regs->sp) != 0) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "int 0xcc with bad stack "
+				  "(exploit attempt?)");
+		goto sigsegv;
+	}
 
-		result = VVAR(vsyscall_gtod_data).wall_time_sec;
+	tsk = current;
+	if (seccomp_mode(&tsk->seccomp))
+		do_exit(SIGKILL);
+
+	switch (vsyscall_nr) {
+	case 0:
+		vsyscall_name = "gettimeofday";
+		ret = sys_gettimeofday(
+			(struct timeval __user *)regs->di,
+			(struct timezone __user *)regs->si);
+		break;
+
+	case 1:
+		vsyscall_name = "time";
+		ret = sys_time((time_t __user *)regs->di);
+		break;
+
+	case 2:
+		vsyscall_name = "getcpu";
+		ret = sys_getcpu((unsigned __user *)regs->di,
+				 (unsigned __user *)regs->si,
+				 0);
+		break;
+
+	default:
+		/*
+		 * If we get here, then vsyscall_nr indicates that int 0xcc
+		 * happened at an address in the vsyscall page that doesn't
+		 * contain int 0xcc.  That can't happen.
+		 */
+		BUG();
+	}
 
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
+	if (ret == -EFAULT) {
+		/*
+		 * Bad news -- userspace fed a bad pointer to a vsyscall.
+		 *
+		 * With a real vsyscall, that would have caused SIGSEGV.
+		 * To make writing reliable exploits using the emulated
+		 * vsyscalls harder, generate SIGSEGV here as well.
+		 */
+		warn_bad_vsyscall(KERN_INFO, regs,
+				  "vsyscall fault (exploit attempt?)");
+		goto sigsegv;
+	}
 
-	if (t)
-		*t = result;
-	return result;
-}
+	regs->ax = ret;
+
+	if (__ratelimit(&rs)) {
+		printk(KERN_INFO "%s[%d] emulated legacy vsyscall %s(); "
+		       "upgrade your code to avoid a performance hit. "
+		       "ip:%lx sp:%lx caller:%lx",
+		       tsk->comm, task_pid_nr(tsk), vsyscall_name,
+		       regs->ip - 2, regs->sp, caller);
+		if (caller)
+			print_vma_addr(" in ", caller);
+		printk("\n");
+	}
 
-/* Fast way to get current CPU and node.
-   This helps to do per node and per CPU caches in user space.
-   The result is not guaranteed without CPU affinity, but usually
-   works out because the scheduler tries to keep a thread on the same
-   CPU.
+	/* Emulate a ret instruction. */
+	regs->ip = caller;
+	regs->sp += 8;
 
-   tcache must point to a two element sized long array.
-   All arguments can be NULL. */
-long __vsyscall(2)
-vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
-{
-	unsigned int p;
-	unsigned long j = 0;
-
-	/* Fast cache - only recompute value once per jiffies and avoid
-	   relatively costly rdtscp/cpuid otherwise.
-	   This works because the scheduler usually keeps the process
-	   on the same CPU and this syscall doesn't guarantee its
-	   results anyways.
-	   We do this here because otherwise user space would do it on
-	   its own in a likely inferior way (no access to jiffies).
-	   If you don't like it pass NULL. */
-	if (tcache && tcache->blob[0] == (j = VVAR(jiffies))) {
-		p = tcache->blob[1];
-	} else if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
-		/* Load per CPU data from RDTSCP */
-		native_read_tscp(&p);
-	} else {
-		/* Load per CPU data from GDT */
-		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
-	}
-	if (tcache) {
-		tcache->blob[0] = j;
-		tcache->blob[1] = p;
-	}
-	if (cpu)
-		*cpu = p & 0xfff;
-	if (node)
-		*node = p >> 12;
-	return 0;
+	local_irq_disable();
+	return;
+
+sigsegv:
+	regs->ip -= 2;  /* The faulting instruction should be the int 0xcc. */
+	force_sig(SIGSEGV, current);
 }
 
 /* Assume __initcall executes before all user space. Hopefully kmod
@@ -262,11 +257,8 @@ void __init map_vsyscall(void)
 
 static int __init vsyscall_init(void)
 {
-	BUG_ON(((unsigned long) &vgettimeofday !=
-			VSYSCALL_ADDR(__NR_vgettimeofday)));
-	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
-	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
-	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
+	BUG_ON(VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE));
+
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
new file mode 100644
index 0000000..bc10dba
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -0,0 +1,25 @@
+/*
+ * vsyscall_emu_64.S: Vsyscall emulation page
+ * Copyright (c) 2011 Andy Lutomirski
+ * Subject to the GNU General Public License, version 2
+*/
+
+#include <linux/linkage.h>
+#include <asm/irq_vectors.h>
+
+/* The unused parts of the page are filled with 0xcc by the linker script. */
+
+.section .vsyscall_0, "a"
+ENTRY(vsyscall_0)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_0)
+
+.section .vsyscall_1, "a"
+ENTRY(vsyscall_1)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_1)
+
+.section .vsyscall_2, "a"
+ENTRY(vsyscall_2)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_2)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 167c333..cc7a4e9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -19,6 +19,11 @@ static inline void secure_computing(int this_syscall)
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long);
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return s->mode;
+}
+
 #else /* CONFIG_SECCOMP */
 
 #include <linux/errno.h>
@@ -37,6 +42,11 @@ static inline long prctl_set_seccomp(unsigned long arg2)
 	return -EINVAL;
 }
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SECCOMP */
 
 #endif /* _LINUX_SECCOMP_H */

* [PATCH, v6] x86-64: Emulate legacy vsyscalls
       [not found]                 ` <BANLkTi==uw_h78oaep1cCOCzwY0edLUU_Q@mail.gmail.com>
@ 2011-06-07  8:03                   ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07  8:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Andy Lutomirski, Andrew Morton, Louis Rilling,
	Brian Gerst, Valdis.Kletnieks, Arjan van de Ven, Borislav Petkov,
	x86, pageexec, Thomas Gleixner, richard -rw- weinberger,
	Jesper Juhl, linux-kernel, Mikael Pettersson, Jan Beulich


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Jun 7, 2011 6:46 AM, "Ingo Molnar" <mingo@elte.hu> wrote:
> >
> > Do you think x86/vdso is worth rebasing at this stage? Right now 
> > it has:
> 
> Yes, I'd rebase it. By now, much of what the initial patches are 
> adding is then removed in the final result, so the patch series 
> ends up first adding things, then changing the things it adds, and 
> then removing three quarters of it all.

Ok - the rebase was easy and kept most of the existing commits:

64b0e3256b8a: x86-64: Emulate legacy vsyscalls
5dfcea629a08: x86-64: Fill unused parts of the vsyscall page with 0xcc
bb5fe2f78ead: x86-64: Remove vsyscall number 3 (venosys)
d319bb79afa4: x86-64: Map the HPET NX
0d7b8547fb67: x86-64: Remove kernel.vsyscall64 sysctl
9fd67b4ed071: x86-64: Give vvars their own page
8b4777a4b50c: x86-64: Document some of entry_64.S
6879eb2deed7: x86-64: Fix alignment of jiffies variable

The 64b0e3256b8a commit (attached below) is the only rebased one - it 
merges all these commits:

d1a70649f49b: x86-64: Remove LEGACY_VTIME
feba7e97df8c: x86-64: Rename COMPAT_VSYSCALLS to LEGACY_VTIME and clarify documentation
7dc0452808b7: x86-64: Clean up vsyscall emulation and remove fixed-address ret
8d6316596441: x86-64: Fix outdated comments in vsyscall_64.c
1593843e2ada: x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
764611c8dfb5: x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build
38172403a978: x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
d55ed1d30b82: x86-64: Emulate legacy vsyscalls

and does a few obvious cleanups to the code.

I've added your Acked-by to this commit because you seemed to agree 
with the commit.

I removed this portion:

+	if (__ratelimit(&rs)) {
+		printk(KERN_INFO "%s[%d] emulated legacy vsyscall %s(); "
+		       "upgrade your code to avoid a performance hit. "
+		       "ip:%lx sp:%lx caller:%lx",
+		       tsk->comm, task_pid_nr(tsk), vsyscall_name,
+		       regs->ip - 2, regs->sp, caller);
+		if (caller)
+			print_vma_addr(" in ", caller);
+		printk("\n");
+	}

as it would now needlessly annoy everyone.

Thanks,

	Ingo

----------------------------->
From 5cec93c216db77c45f7ce970d46283bcb1933884 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto@MIT.EDU>
Date: Sun, 5 Jun 2011 13:50:24 -0400
Subject: [PATCH] x86-64: Emulate legacy vsyscalls

There's a fair amount of code in the vsyscall page.  It contains
a syscall instruction (in the gettimeofday fallback) and who
knows what will happen if an exploit jumps into the middle of
some other code.

Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real
vsyscalls. These incantations are useless if entered in the
middle.

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.
The only exception is time() which is still called by glibc
through the vsyscall - but calling time() millions of times
per second is not sensible. glibc has this fixed in the
development tree.

This patch is not perfect: the vread_tsc and vread_hpet
functions are still at a fixed address.  Fixing that might
involve making alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
[ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/irq_vectors.h |    6 +-
 arch/x86/include/asm/traps.h       |    4 +
 arch/x86/include/asm/vsyscall.h    |   12 ++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/entry_64.S         |    2 +
 arch/x86/kernel/traps.c            |    6 +
 arch/x86/kernel/vsyscall_64.c      |  261 +++++++++++++++++-------------------
 arch/x86/kernel/vsyscall_emu_64.S  |   27 ++++
 include/linux/seccomp.h            |   10 ++
 9 files changed, 189 insertions(+), 140 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6e976ee..a563c50 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -17,7 +17,8 @@
  *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
  *  Vectors  32 ... 127 : device interrupts
  *  Vector  128         : legacy int80 syscall interface
- *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 : device interrupts
+ *  Vector  204         : legacy x86_64 vsyscall emulation
+ *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 except 204 : device interrupts
  *  Vectors INVALIDATE_TLB_VECTOR_START ... 255 : special interrupts
  *
  * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
@@ -50,6 +51,9 @@
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR			0x80
 #endif
+#ifdef CONFIG_X86_64
+# define VSYSCALL_EMU_VECTOR		0xcc
+#endif
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..2bae0a5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/kprobes.h>
+
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -38,6 +40,7 @@ asmlinkage void alignment_check(void);
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
+asmlinkage void emulate_vsyscall(void);
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
@@ -64,6 +67,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *, long);
 dotraplinkage void do_machine_check(struct pt_regs *, long);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *, long);
+dotraplinkage void do_emulate_vsyscall(struct pt_regs *, long);
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *, long);
 #endif
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d555973..bb710cb 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -31,6 +31,18 @@ extern struct timezone sys_tz;
 
 extern void map_vsyscall(void);
 
+/* Emulation */
+
+static inline bool is_vsyscall_entry(unsigned long addr)
+{
+	return (addr & ~0xC00UL) == VSYSCALL_START;
+}
+
+static inline int vsyscall_entry_nr(unsigned long addr)
+{
+	return (addr & 0xC00UL) >> 10;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 90b06d4..cc0469a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,7 @@ obj-y			+= probe_roms.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
+obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 72c4a77..e949793 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1123,6 +1123,8 @@ zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
 zeroentry coprocessor_error do_coprocessor_error
 errorentry alignment_check do_alignment_check
 zeroentry simd_coprocessor_error do_simd_coprocessor_error
+zeroentry emulate_vsyscall do_emulate_vsyscall
+
 
 	/* Reload gs selector with exception handling */
 	/* edi:  new selector */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..fbc097a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -872,6 +872,12 @@ void __init trap_init(void)
 	set_bit(SYSCALL_VECTOR, used_vectors);
 #endif
 
+#ifdef CONFIG_X86_64
+	BUG_ON(test_bit(VSYSCALL_EMU_VECTOR, used_vectors));
+	set_system_intr_gate(VSYSCALL_EMU_VECTOR, &emulate_vsyscall);
+	set_bit(VSYSCALL_EMU_VECTOR, used_vectors);
+#endif
+
 	/*
 	 * Should be a barrier for any external CPU state:
 	 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 70a5f6e..10cd8ac 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -2,6 +2,8 @@
  *  Copyright (C) 2001 Andrea Arcangeli <andrea@suse.de> SuSE
  *  Copyright 2003 Andi Kleen, SuSE Labs.
  *
+ *  [ NOTE: this mechanism is now deprecated in favor of the vDSO. ]
+ *
  *  Thanks to hpa@transmeta.com for some useful hint.
  *  Special thanks to Ingo Molnar for his early experience with
  *  a different vsyscall implementation for Linux/IA32 and for the name.
@@ -11,10 +13,9 @@
  *  vsyscalls. One vsyscall can reserve more than 1 slot to avoid
  *  jumping out of line if necessary. We cannot add more with this
  *  mechanism because older kernels won't return -ENOSYS.
- *  If we want more than four we need a vDSO.
  *
- *  Note: the concept clashes with user mode linux. If you use UML and
- *  want per guest time just set the kernel.vsyscall64 sysctl to 0.
+ *  Note: the concept clashes with user mode linux.  UML users should
+ *  use the vDSO.
  */
 
 /* Disable profiling for userspace code: */
@@ -32,6 +33,8 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -44,10 +47,7 @@
 #include <asm/desc.h>
 #include <asm/topology.h>
 #include <asm/vgtod.h>
-
-#define __vsyscall(nr) \
-		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-#define __syscall_clobber "r11","cx","memory"
+#include <asm/traps.h>
 
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
@@ -71,146 +71,129 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	unsigned long flags;
 
 	write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
+
 	/* copy vsyscall data */
-	vsyscall_gtod_data.clock.vread = clock->vread;
-	vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
-	vsyscall_gtod_data.clock.mask = clock->mask;
-	vsyscall_gtod_data.clock.mult = mult;
-	vsyscall_gtod_data.clock.shift = clock->shift;
-	vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
-	vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
-	vsyscall_gtod_data.wall_to_monotonic = *wtm;
-	vsyscall_gtod_data.wall_time_coarse = __current_kernel_time();
+	vsyscall_gtod_data.clock.vread		= clock->vread;
+	vsyscall_gtod_data.clock.cycle_last	= clock->cycle_last;
+	vsyscall_gtod_data.clock.mask		= clock->mask;
+	vsyscall_gtod_data.clock.mult		= mult;
+	vsyscall_gtod_data.clock.shift		= clock->shift;
+	vsyscall_gtod_data.wall_time_sec	= wall_time->tv_sec;
+	vsyscall_gtod_data.wall_time_nsec	= wall_time->tv_nsec;
+	vsyscall_gtod_data.wall_to_monotonic	= *wtm;
+	vsyscall_gtod_data.wall_time_coarse	= __current_kernel_time();
+
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
-/* RED-PEN may want to readd seq locking, but then the variable should be
- * write-once.
- */
-static __always_inline void do_get_tz(struct timezone * tz)
+static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
+			      const char *message)
 {
-	*tz = VVAR(vsyscall_gtod_data).sys_tz;
-}
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
+	struct task_struct *tsk;
 
-static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
-{
-	int ret;
-	asm volatile("syscall"
-		: "=a" (ret)
-		: "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
-		: __syscall_clobber );
-	return ret;
-}
+	if (!show_unhandled_signals || !__ratelimit(&rs))
+		return;
 
-static __always_inline void do_vgettimeofday(struct timeval * tv)
-{
-	cycle_t now, base, mask, cycle_delta;
-	unsigned seq;
-	unsigned long mult, shift, nsec;
-	cycle_t (*vread)(void);
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
-
-		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!vread)) {
-			gettimeofday(tv,NULL);
-			return;
-		}
-
-		now = vread();
-		base = VVAR(vsyscall_gtod_data).clock.cycle_last;
-		mask = VVAR(vsyscall_gtod_data).clock.mask;
-		mult = VVAR(vsyscall_gtod_data).clock.mult;
-		shift = VVAR(vsyscall_gtod_data).clock.shift;
-
-		tv->tv_sec = VVAR(vsyscall_gtod_data).wall_time_sec;
-		nsec = VVAR(vsyscall_gtod_data).wall_time_nsec;
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
-
-	/* calculate interval: */
-	cycle_delta = (now - base) & mask;
-	/* convert to nsecs: */
-	nsec += (cycle_delta * mult) >> shift;
-
-	while (nsec >= NSEC_PER_SEC) {
-		tv->tv_sec += 1;
-		nsec -= NSEC_PER_SEC;
-	}
-	tv->tv_usec = nsec / NSEC_PER_USEC;
-}
+	tsk = current;
 
-int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
-{
-	if (tv)
-		do_vgettimeofday(tv);
-	if (tz)
-		do_get_tz(tz);
-	return 0;
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+	       level, tsk->comm, task_pid_nr(tsk),
+	       message, regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
 }
 
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-time_t __vsyscall(1) vtime(time_t *t)
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
-	unsigned seq;
-	time_t result;
+	const char *vsyscall_name;
+	struct task_struct *tsk;
+	unsigned long caller;
+	int vsyscall_nr;
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	/*
+	 * x86-ism here: regs->ip points to the instruction after the int 0xcc,
+	 * and int 0xcc is two bytes long.
+	 */
+	if (!is_vsyscall_entry(regs->ip - 2)) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "illegal int 0xcc (exploit attempt?)");
+		goto sigsegv;
+	}
+	vsyscall_nr = vsyscall_entry_nr(regs->ip - 2);
 
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
+	if (get_user(caller, (unsigned long __user *)regs->sp) != 0) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "int 0xcc with bad stack (exploit attempt?)");
+		goto sigsegv;
+	}
 
-		result = VVAR(vsyscall_gtod_data).wall_time_sec;
+	tsk = current;
+	if (seccomp_mode(&tsk->seccomp))
+		do_exit(SIGKILL);
+
+	switch (vsyscall_nr) {
+	case 0:
+		vsyscall_name = "gettimeofday";
+		ret = sys_gettimeofday(
+			(struct timeval __user *)regs->di,
+			(struct timezone __user *)regs->si);
+		break;
+
+	case 1:
+		vsyscall_name = "time";
+		ret = sys_time((time_t __user *)regs->di);
+		break;
+
+	case 2:
+		vsyscall_name = "getcpu";
+		ret = sys_getcpu((unsigned __user *)regs->di,
+				 (unsigned __user *)regs->si,
+				 0);
+		break;
+
+	default:
+		/*
+		 * If we get here, then vsyscall_nr indicates that int 0xcc
+		 * happened at an address in the vsyscall page that doesn't
+		 * contain int 0xcc.  That can't happen.
+		 */
+		BUG();
+	}
 
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
+	if (ret == -EFAULT) {
+		/*
+		 * Bad news -- userspace fed a bad pointer to a vsyscall.
+		 *
+		 * With a real vsyscall, that would have caused SIGSEGV.
+		 * To make writing reliable exploits using the emulated
+		 * vsyscalls harder, generate SIGSEGV here as well.
+		 */
+		warn_bad_vsyscall(KERN_INFO, regs,
+				  "vsyscall fault (exploit attempt?)");
+		goto sigsegv;
+	}
 
-	if (t)
-		*t = result;
-	return result;
-}
+	regs->ax = ret;
 
-/* Fast way to get current CPU and node.
-   This helps to do per node and per CPU caches in user space.
-   The result is not guaranteed without CPU affinity, but usually
-   works out because the scheduler tries to keep a thread on the same
-   CPU.
+	/* Emulate a ret instruction. */
+	regs->ip = caller;
+	regs->sp += 8;
 
-   tcache must point to a two element sized long array.
-   All arguments can be NULL. */
-long __vsyscall(2)
-vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
-{
-	unsigned int p;
-	unsigned long j = 0;
-
-	/* Fast cache - only recompute value once per jiffies and avoid
-	   relatively costly rdtscp/cpuid otherwise.
-	   This works because the scheduler usually keeps the process
-	   on the same CPU and this syscall doesn't guarantee its
-	   results anyways.
-	   We do this here because otherwise user space would do it on
-	   its own in a likely inferior way (no access to jiffies).
-	   If you don't like it pass NULL. */
-	if (tcache && tcache->blob[0] == (j = VVAR(jiffies))) {
-		p = tcache->blob[1];
-	} else if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
-		/* Load per CPU data from RDTSCP */
-		native_read_tscp(&p);
-	} else {
-		/* Load per CPU data from GDT */
-		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
-	}
-	if (tcache) {
-		tcache->blob[0] = j;
-		tcache->blob[1] = p;
-	}
-	if (cpu)
-		*cpu = p & 0xfff;
-	if (node)
-		*node = p >> 12;
-	return 0;
+	local_irq_disable();
+	return;
+
+sigsegv:
+	regs->ip -= 2;  /* The faulting instruction should be the int 0xcc. */
+	force_sig(SIGSEGV, current);
 }
 
-/* Assume __initcall executes before all user space. Hopefully kmod
-   doesn't violate that. We'll find out if it does. */
+/*
+ * Assume __initcall executes before all user space. Hopefully kmod
+ * doesn't violate that. We'll find out if it does.
+ */
 static void __cpuinit vsyscall_set_cpu(int cpu)
 {
 	unsigned long d;
@@ -221,13 +204,15 @@ static void __cpuinit vsyscall_set_cpu(int cpu)
 	if (cpu_has(&cpu_data(cpu), X86_FEATURE_RDTSCP))
 		write_rdtscp_aux((node << 12) | cpu);
 
-	/* Store cpu number in limit so that it can be loaded quickly
-	   in user space in vgetcpu.
-	   12 bits for the CPU and 8 bits for the node. */
+	/*
+	 * Store cpu number in limit so that it can be loaded quickly
+	 * in user space in vgetcpu. (12 bits for the CPU and 8 bits for the node)
+	 */
 	d = 0x0f40000000000ULL;
 	d |= cpu;
 	d |= (node & 0xf) << 12;
 	d |= (node >> 4) << 48;
+
 	write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
 }
 
@@ -241,8 +226,10 @@ static int __cpuinit
 cpu_vsyscall_notifier(struct notifier_block *n, unsigned long action, void *arg)
 {
 	long cpu = (long)arg;
+
 	if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN)
 		smp_call_function_single(cpu, cpu_vsyscall_init, NULL, 1);
+
 	return NOTIFY_DONE;
 }
 
@@ -256,21 +243,17 @@ void __init map_vsyscall(void)
 	/* Note that VSYSCALL_MAPPED_PAGES must agree with the code below. */
 	__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
 	__set_fixmap(VVAR_PAGE, physaddr_vvar_page, PAGE_KERNEL_VVAR);
-	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) !=
-		     (unsigned long)VVAR_ADDRESS);
+	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) != (unsigned long)VVAR_ADDRESS);
 }
 
 static int __init vsyscall_init(void)
 {
-	BUG_ON(((unsigned long) &vgettimeofday !=
-			VSYSCALL_ADDR(__NR_vgettimeofday)));
-	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
-	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
-	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
+	BUG_ON(VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE));
+
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
+
 	return 0;
 }
-
 __initcall(vsyscall_init);
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
new file mode 100644
index 0000000..ffa845e
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -0,0 +1,27 @@
+/*
+ * vsyscall_emu_64.S: Vsyscall emulation page
+ *
+ * Copyright (c) 2011 Andy Lutomirski
+ *
+ * Subject to the GNU General Public License, version 2
+ */
+
+#include <linux/linkage.h>
+#include <asm/irq_vectors.h>
+
+/* The unused parts of the page are filled with 0xcc by the linker script. */
+
+.section .vsyscall_0, "a"
+ENTRY(vsyscall_0)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_0)
+
+.section .vsyscall_1, "a"
+ENTRY(vsyscall_1)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_1)
+
+.section .vsyscall_2, "a"
+ENTRY(vsyscall_2)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_2)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 167c333..cc7a4e9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -19,6 +19,11 @@ static inline void secure_computing(int this_syscall)
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long);
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return s->mode;
+}
+
 #else /* CONFIG_SECCOMP */
 
 #include <linux/errno.h>
@@ -37,6 +42,11 @@ static inline long prctl_set_seccomp(unsigned long arg2)
 	return -EINVAL;
 }
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SECCOMP */
 
 #endif /* _LINUX_SECCOMP_H */

* [tip:x86/vdso] x86-64: Emulate legacy vsyscalls
  2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
                     ` (3 preceding siblings ...)
  2011-06-07  7:49   ` [tip:x86/vdso] x86-64: Emulate legacy vsyscalls tip-bot for Andy Lutomirski
@ 2011-06-07  8:03   ` tip-bot for Andy Lutomirski
  4 siblings, 0 replies; 112+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-06-07  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, torvalds, mikpe, richard.weinberger, jj,
	JBeulich, tglx, Louis.Rilling, luto, hpa, linux-kernel, luto,
	andi, bp, arjan, mingo

Commit-ID:  5cec93c216db77c45f7ce970d46283bcb1933884
Gitweb:     http://git.kernel.org/tip/5cec93c216db77c45f7ce970d46283bcb1933884
Author:     Andy Lutomirski <luto@MIT.EDU>
AuthorDate: Sun, 5 Jun 2011 13:50:24 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 7 Jun 2011 10:02:35 +0200

x86-64: Emulate legacy vsyscalls

There's a fair amount of code in the vsyscall page.  It contains
a syscall instruction (in the gettimeofday fallback) and who
knows what will happen if an exploit jumps into the middle of
some other code.

Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real
vsyscalls. These incantations are useless if entered in the
middle.

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.
The only exception is time() which is still called by glibc
through the vsyscall - but calling time() millions of times
per second is not sensible. glibc has this fixed in the
development tree.

This patch is not perfect: the vread_tsc and vread_hpet
functions are still at a fixed address.  Fixing that might
involve making alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
[ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/irq_vectors.h |    6 +-
 arch/x86/include/asm/traps.h       |    4 +
 arch/x86/include/asm/vsyscall.h    |   12 ++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/entry_64.S         |    2 +
 arch/x86/kernel/traps.c            |    6 +
 arch/x86/kernel/vsyscall_64.c      |  261 +++++++++++++++++-------------------
 arch/x86/kernel/vsyscall_emu_64.S  |   27 ++++
 include/linux/seccomp.h            |   10 ++
 9 files changed, 189 insertions(+), 140 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6e976ee..a563c50 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -17,7 +17,8 @@
  *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
  *  Vectors  32 ... 127 : device interrupts
  *  Vector  128         : legacy int80 syscall interface
- *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 : device interrupts
+ *  Vector  204         : legacy x86_64 vsyscall emulation
+ *  Vectors 129 ... INVALIDATE_TLB_VECTOR_START-1 except 204 : device interrupts
  *  Vectors INVALIDATE_TLB_VECTOR_START ... 255 : special interrupts
  *
  * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
@@ -50,6 +51,9 @@
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR			0x80
 #endif
+#ifdef CONFIG_X86_64
+# define VSYSCALL_EMU_VECTOR		0xcc
+#endif
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..2bae0a5 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/kprobes.h>
+
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -38,6 +40,7 @@ asmlinkage void alignment_check(void);
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
+asmlinkage void emulate_vsyscall(void);
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
@@ -64,6 +67,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *, long);
 dotraplinkage void do_machine_check(struct pt_regs *, long);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *, long);
+dotraplinkage void do_emulate_vsyscall(struct pt_regs *, long);
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *, long);
 #endif
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d555973..bb710cb 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -31,6 +31,18 @@ extern struct timezone sys_tz;
 
 extern void map_vsyscall(void);
 
+/* Emulation */
+
+static inline bool is_vsyscall_entry(unsigned long addr)
+{
+	return (addr & ~0xC00UL) == VSYSCALL_START;
+}
+
+static inline int vsyscall_entry_nr(unsigned long addr)
+{
+	return (addr & 0xC00UL) >> 10;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 90b06d4..cc0469a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,7 @@ obj-y			+= probe_roms.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o vread_tsc_64.o
+obj-$(CONFIG_X86_64)	+= vsyscall_emu_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 72c4a77..e949793 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1123,6 +1123,8 @@ zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
 zeroentry coprocessor_error do_coprocessor_error
 errorentry alignment_check do_alignment_check
 zeroentry simd_coprocessor_error do_simd_coprocessor_error
+zeroentry emulate_vsyscall do_emulate_vsyscall
+
 
 	/* Reload gs selector with exception handling */
 	/* edi:  new selector */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..fbc097a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -872,6 +872,12 @@ void __init trap_init(void)
 	set_bit(SYSCALL_VECTOR, used_vectors);
 #endif
 
+#ifdef CONFIG_X86_64
+	BUG_ON(test_bit(VSYSCALL_EMU_VECTOR, used_vectors));
+	set_system_intr_gate(VSYSCALL_EMU_VECTOR, &emulate_vsyscall);
+	set_bit(VSYSCALL_EMU_VECTOR, used_vectors);
+#endif
+
 	/*
 	 * Should be a barrier for any external CPU state:
 	 */
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 70a5f6e..10cd8ac 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -2,6 +2,8 @@
  *  Copyright (C) 2001 Andrea Arcangeli <andrea@suse.de> SuSE
  *  Copyright 2003 Andi Kleen, SuSE Labs.
  *
+ *  [ NOTE: this mechanism is now deprecated in favor of the vDSO. ]
+ *
  *  Thanks to hpa@transmeta.com for some useful hint.
  *  Special thanks to Ingo Molnar for his early experience with
  *  a different vsyscall implementation for Linux/IA32 and for the name.
@@ -11,10 +13,9 @@
  *  vsyscalls. One vsyscall can reserve more than 1 slot to avoid
  *  jumping out of line if necessary. We cannot add more with this
  *  mechanism because older kernels won't return -ENOSYS.
- *  If we want more than four we need a vDSO.
  *
- *  Note: the concept clashes with user mode linux. If you use UML and
- *  want per guest time just set the kernel.vsyscall64 sysctl to 0.
+ *  Note: the concept clashes with user mode linux.  UML users should
+ *  use the vDSO.
  */
 
 /* Disable profiling for userspace code: */
@@ -32,6 +33,8 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
 
 #include <asm/vsyscall.h>
 #include <asm/pgtable.h>
@@ -44,10 +47,7 @@
 #include <asm/desc.h>
 #include <asm/topology.h>
 #include <asm/vgtod.h>
-
-#define __vsyscall(nr) \
-		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
-#define __syscall_clobber "r11","cx","memory"
+#include <asm/traps.h>
 
 DEFINE_VVAR(int, vgetcpu_mode);
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
@@ -71,146 +71,129 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	unsigned long flags;
 
 	write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
+
 	/* copy vsyscall data */
-	vsyscall_gtod_data.clock.vread = clock->vread;
-	vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
-	vsyscall_gtod_data.clock.mask = clock->mask;
-	vsyscall_gtod_data.clock.mult = mult;
-	vsyscall_gtod_data.clock.shift = clock->shift;
-	vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
-	vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
-	vsyscall_gtod_data.wall_to_monotonic = *wtm;
-	vsyscall_gtod_data.wall_time_coarse = __current_kernel_time();
+	vsyscall_gtod_data.clock.vread		= clock->vread;
+	vsyscall_gtod_data.clock.cycle_last	= clock->cycle_last;
+	vsyscall_gtod_data.clock.mask		= clock->mask;
+	vsyscall_gtod_data.clock.mult		= mult;
+	vsyscall_gtod_data.clock.shift		= clock->shift;
+	vsyscall_gtod_data.wall_time_sec	= wall_time->tv_sec;
+	vsyscall_gtod_data.wall_time_nsec	= wall_time->tv_nsec;
+	vsyscall_gtod_data.wall_to_monotonic	= *wtm;
+	vsyscall_gtod_data.wall_time_coarse	= __current_kernel_time();
+
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
-/* RED-PEN may want to readd seq locking, but then the variable should be
- * write-once.
- */
-static __always_inline void do_get_tz(struct timezone * tz)
+static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
+			      const char *message)
 {
-	*tz = VVAR(vsyscall_gtod_data).sys_tz;
-}
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
+	struct task_struct *tsk;
 
-static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
-{
-	int ret;
-	asm volatile("syscall"
-		: "=a" (ret)
-		: "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
-		: __syscall_clobber );
-	return ret;
-}
+	if (!show_unhandled_signals || !__ratelimit(&rs))
+		return;
 
-static __always_inline void do_vgettimeofday(struct timeval * tv)
-{
-	cycle_t now, base, mask, cycle_delta;
-	unsigned seq;
-	unsigned long mult, shift, nsec;
-	cycle_t (*vread)(void);
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
-
-		vread = VVAR(vsyscall_gtod_data).clock.vread;
-		if (unlikely(!vread)) {
-			gettimeofday(tv,NULL);
-			return;
-		}
-
-		now = vread();
-		base = VVAR(vsyscall_gtod_data).clock.cycle_last;
-		mask = VVAR(vsyscall_gtod_data).clock.mask;
-		mult = VVAR(vsyscall_gtod_data).clock.mult;
-		shift = VVAR(vsyscall_gtod_data).clock.shift;
-
-		tv->tv_sec = VVAR(vsyscall_gtod_data).wall_time_sec;
-		nsec = VVAR(vsyscall_gtod_data).wall_time_nsec;
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
-
-	/* calculate interval: */
-	cycle_delta = (now - base) & mask;
-	/* convert to nsecs: */
-	nsec += (cycle_delta * mult) >> shift;
-
-	while (nsec >= NSEC_PER_SEC) {
-		tv->tv_sec += 1;
-		nsec -= NSEC_PER_SEC;
-	}
-	tv->tv_usec = nsec / NSEC_PER_USEC;
-}
+	tsk = current;
 
-int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
-{
-	if (tv)
-		do_vgettimeofday(tv);
-	if (tz)
-		do_get_tz(tz);
-	return 0;
+	printk("%s%s[%d] %s ip:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+	       level, tsk->comm, task_pid_nr(tsk),
+	       message, regs->ip - 2, regs->sp, regs->ax, regs->si, regs->di);
 }
 
-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
-time_t __vsyscall(1) vtime(time_t *t)
+void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 {
-	unsigned seq;
-	time_t result;
+	const char *vsyscall_name;
+	struct task_struct *tsk;
+	unsigned long caller;
+	int vsyscall_nr;
+	long ret;
+
+	/* Kernel code must never get here. */
+	BUG_ON(!user_mode(regs));
+
+	local_irq_enable();
+
+	/*
+	 * x86-ism here: regs->ip points to the instruction after the int 0xcc,
+	 * and int 0xcc is two bytes long.
+	 */
+	if (!is_vsyscall_entry(regs->ip - 2)) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "illegal int 0xcc (exploit attempt?)");
+		goto sigsegv;
+	}
+	vsyscall_nr = vsyscall_entry_nr(regs->ip - 2);
 
-	do {
-		seq = read_seqbegin(&VVAR(vsyscall_gtod_data).lock);
+	if (get_user(caller, (unsigned long __user *)regs->sp) != 0) {
+		warn_bad_vsyscall(KERN_WARNING, regs, "int 0xcc with bad stack (exploit attempt?)");
+		goto sigsegv;
+	}
 
-		result = VVAR(vsyscall_gtod_data).wall_time_sec;
+	tsk = current;
+	if (seccomp_mode(&tsk->seccomp))
+		do_exit(SIGKILL);
+
+	switch (vsyscall_nr) {
+	case 0:
+		vsyscall_name = "gettimeofday";
+		ret = sys_gettimeofday(
+			(struct timeval __user *)regs->di,
+			(struct timezone __user *)regs->si);
+		break;
+
+	case 1:
+		vsyscall_name = "time";
+		ret = sys_time((time_t __user *)regs->di);
+		break;
+
+	case 2:
+		vsyscall_name = "getcpu";
+		ret = sys_getcpu((unsigned __user *)regs->di,
+				 (unsigned __user *)regs->si,
+				 0);
+		break;
+
+	default:
+		/*
+		 * If we get here, then vsyscall_nr indicates that int 0xcc
+		 * happened at an address in the vsyscall page that doesn't
+		 * contain int 0xcc.  That can't happen.
+		 */
+		BUG();
+	}
 
-	} while (read_seqretry(&VVAR(vsyscall_gtod_data).lock, seq));
+	if (ret == -EFAULT) {
+		/*
+		 * Bad news -- userspace fed a bad pointer to a vsyscall.
+		 *
+		 * With a real vsyscall, that would have caused SIGSEGV.
+		 * To make writing reliable exploits using the emulated
+		 * vsyscalls harder, generate SIGSEGV here as well.
+		 */
+		warn_bad_vsyscall(KERN_INFO, regs,
+				  "vsyscall fault (exploit attempt?)");
+		goto sigsegv;
+	}
 
-	if (t)
-		*t = result;
-	return result;
-}
+	regs->ax = ret;
 
-/* Fast way to get current CPU and node.
-   This helps to do per node and per CPU caches in user space.
-   The result is not guaranteed without CPU affinity, but usually
-   works out because the scheduler tries to keep a thread on the same
-   CPU.
+	/* Emulate a ret instruction. */
+	regs->ip = caller;
+	regs->sp += 8;
 
-   tcache must point to a two element sized long array.
-   All arguments can be NULL. */
-long __vsyscall(2)
-vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
-{
-	unsigned int p;
-	unsigned long j = 0;
-
-	/* Fast cache - only recompute value once per jiffies and avoid
-	   relatively costly rdtscp/cpuid otherwise.
-	   This works because the scheduler usually keeps the process
-	   on the same CPU and this syscall doesn't guarantee its
-	   results anyways.
-	   We do this here because otherwise user space would do it on
-	   its own in a likely inferior way (no access to jiffies).
-	   If you don't like it pass NULL. */
-	if (tcache && tcache->blob[0] == (j = VVAR(jiffies))) {
-		p = tcache->blob[1];
-	} else if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
-		/* Load per CPU data from RDTSCP */
-		native_read_tscp(&p);
-	} else {
-		/* Load per CPU data from GDT */
-		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
-	}
-	if (tcache) {
-		tcache->blob[0] = j;
-		tcache->blob[1] = p;
-	}
-	if (cpu)
-		*cpu = p & 0xfff;
-	if (node)
-		*node = p >> 12;
-	return 0;
+	local_irq_disable();
+	return;
+
+sigsegv:
+	regs->ip -= 2;  /* The faulting instruction should be the int 0xcc. */
+	force_sig(SIGSEGV, current);
 }
 
-/* Assume __initcall executes before all user space. Hopefully kmod
-   doesn't violate that. We'll find out if it does. */
+/*
+ * Assume __initcall executes before all user space. Hopefully kmod
+ * doesn't violate that. We'll find out if it does.
+ */
 static void __cpuinit vsyscall_set_cpu(int cpu)
 {
 	unsigned long d;
@@ -221,13 +204,15 @@ static void __cpuinit vsyscall_set_cpu(int cpu)
 	if (cpu_has(&cpu_data(cpu), X86_FEATURE_RDTSCP))
 		write_rdtscp_aux((node << 12) | cpu);
 
-	/* Store cpu number in limit so that it can be loaded quickly
-	   in user space in vgetcpu.
-	   12 bits for the CPU and 8 bits for the node. */
+	/*
+	 * Store cpu number in limit so that it can be loaded quickly
+	 * in user space in vgetcpu. (12 bits for the CPU and 8 bits for the node)
+	 */
 	d = 0x0f40000000000ULL;
 	d |= cpu;
 	d |= (node & 0xf) << 12;
 	d |= (node >> 4) << 48;
+
 	write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
 }
 
@@ -241,8 +226,10 @@ static int __cpuinit
 cpu_vsyscall_notifier(struct notifier_block *n, unsigned long action, void *arg)
 {
 	long cpu = (long)arg;
+
 	if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN)
 		smp_call_function_single(cpu, cpu_vsyscall_init, NULL, 1);
+
 	return NOTIFY_DONE;
 }
 
@@ -256,21 +243,17 @@ void __init map_vsyscall(void)
 	/* Note that VSYSCALL_MAPPED_PAGES must agree with the code below. */
 	__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
 	__set_fixmap(VVAR_PAGE, physaddr_vvar_page, PAGE_KERNEL_VVAR);
-	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) !=
-		     (unsigned long)VVAR_ADDRESS);
+	BUILD_BUG_ON((unsigned long)__fix_to_virt(VVAR_PAGE) != (unsigned long)VVAR_ADDRESS);
 }
 
 static int __init vsyscall_init(void)
 {
-	BUG_ON(((unsigned long) &vgettimeofday !=
-			VSYSCALL_ADDR(__NR_vgettimeofday)));
-	BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
-	BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
-	BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
+	BUG_ON(VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE));
+
 	on_each_cpu(cpu_vsyscall_init, NULL, 1);
 	/* notifier priority > KVM */
 	hotcpu_notifier(cpu_vsyscall_notifier, 30);
+
 	return 0;
 }
-
 __initcall(vsyscall_init);
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/kernel/vsyscall_emu_64.S
new file mode 100644
index 0000000..ffa845e
--- /dev/null
+++ b/arch/x86/kernel/vsyscall_emu_64.S
@@ -0,0 +1,27 @@
+/*
+ * vsyscall_emu_64.S: Vsyscall emulation page
+ *
+ * Copyright (c) 2011 Andy Lutomirski
+ *
+ * Subject to the GNU General Public License, version 2
+ */
+
+#include <linux/linkage.h>
+#include <asm/irq_vectors.h>
+
+/* The unused parts of the page are filled with 0xcc by the linker script. */
+
+.section .vsyscall_0, "a"
+ENTRY(vsyscall_0)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_0)
+
+.section .vsyscall_1, "a"
+ENTRY(vsyscall_1)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_1)
+
+.section .vsyscall_2, "a"
+ENTRY(vsyscall_2)
+	int $VSYSCALL_EMU_VECTOR
+END(vsyscall_2)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 167c333..cc7a4e9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -19,6 +19,11 @@ static inline void secure_computing(int this_syscall)
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long);
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return s->mode;
+}
+
 #else /* CONFIG_SECCOMP */
 
 #include <linux/errno.h>
@@ -37,6 +42,11 @@ static inline long prctl_set_seccomp(unsigned long arg2)
 	return -EINVAL;
 }
 
+static inline int seccomp_mode(seccomp_t *s)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SECCOMP */
 
 #endif /* _LINUX_SECCOMP_H */
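
The address checks that the patch adds to asm/vsyscall.h can be tried out in user space. The following is only a sketch, not kernel code: it assumes VSYSCALL_START is 0xffffffffff600000 (the value is not shown in this patch), and trapped_entry_nr() is a hypothetical helper that mirrors the "regs->ip - 2" adjustment in do_emulate_vsyscall().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the fixed vsyscall page base on x86-64. */
#define VSYSCALL_START 0xffffffffff600000ULL

/* Same test as the patch: mask off bits 10-11, compare the rest to the base. */
static bool is_vsyscall_entry(uint64_t addr)
{
	return (addr & ~0xC00ULL) == VSYSCALL_START;
}

/* Bits 10-11 select one of four 1024-byte slots in the page. */
static int vsyscall_entry_nr(uint64_t addr)
{
	return (int)((addr & 0xC00ULL) >> 10);
}

/*
 * Hypothetical helper: ip is the saved instruction pointer, which points
 * just past the two-byte "int 0xcc".  Returns the slot number, or -1 if
 * the trap did not come from the start of a vsyscall slot.
 */
static int trapped_entry_nr(uint64_t ip)
{
	uint64_t insn = ip - 2;

	if (!is_vsyscall_entry(insn))
		return -1;
	return vsyscall_entry_nr(insn);
}
```

Under these assumptions, trapped_entry_nr(0xffffffffff600402) gives slot 1 (time), while an int 0xcc executed anywhere else gives -1, which corresponds to the sigsegv path in the patch. Slot 3 also passes the mask test; the patch treats it as unreachable because that slot contains no int 0xcc.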


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 22:49                           ` pageexec
  2011-06-06 22:57                             ` david
  2011-06-07  6:59                             ` Pekka Enberg
@ 2011-06-07  8:30                             ` Ingo Molnar
  2011-06-07 23:24                               ` pageexec
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07  8:30 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > A fastpath is defined by optimization considerations applied to a 
> > codepath (the priority it gets compared to other codepaths), 
> > *not* by its absolute performance.
> 
> we're not talking about random arbitrarily defined paths here but 
> the impact of putting well predicted branches into the pf handler 
> vs. int xx (are you perhaps confused by 'fast path' vs. 
> 'fastpath'?).

So please educate me, what is the difference between 'fast path' 
versus 'fastpath', as used by kernel developers, beyond the space?

> that impact only matters if it's measurable. you have yet to show 
> that it is. and all this silliness is for a hypothetical situation 
> since those conditional branches don't even need to be in the 
> general page fault processing paths.

Is this some sort of sick joke?

Do you *really* claim that the number of instructions executed in a 
fastpath does not matter and that our years-long effort to shave off an 
instruction here and there from the x86 do_page_fault() code was 
meaningless and that we can add branches with zero cost?

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 22:57                             ` david
@ 2011-06-07  9:07                               ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07  9:07 UTC (permalink / raw)
  To: david
  Cc: pageexec, Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* david@lang.hm <david@lang.hm> wrote:

> > why are you cutting out in all those mails of yours what i 
> > already told you many times? the original statement from Andy was 
> > about the int cc path vs. the pf path: he said that the latter 
> > would not tolerate a few well predicted branches (if they were 
> > put there at all, that is) because the pf handler is such a 
> > critical fast path code. it is *not*. it can't be by almost 
> > definition given how much processing it has to do (it is by far 
> > one of the most complex of cpu exceptions to process).
> 
> it seems to me that such a complicated piece of code that is 
> executed so frequently is especially sensitive to anything that 
> makes it take longer

Exactly.

Firstly, fully handling the most important types of minor page faults 
takes about 2000 cycles on modern x86 hardware - just two cycles of 
overhead is 0.1%, and in the kernel we are frequently doing 0.01% 
optimizations as well ...

Secondly, we optimize the branch count, even if they are 
well-predicted: the reason is to reduce the BTB footprint which is a 
limited CPU resource like the TLB. Every BTB entry we use up reduces 
the effective BTB size visible to user-space applications.

Thirdly, we always try to optimize L1 instruction cache footprint in 
fastpaths as well and new instructions increase the icache footprint. 

Fourthly, the "single branch overhead" is the *best case* that is 
rarely achieved in practice: often there are other instructions such 
as the compare instruction that precedes the branch ...
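
The 0.1% figure in the first point is plain arithmetic and can be checked with a few lines of C. This is only an illustrative sketch; overhead_percent() is a hypothetical helper, and the cycle counts are the rough numbers quoted above, not measurements.

```c
/*
 * Relative cost, in percent, of adding extra_cycles to a code path
 * that currently takes path_cycles to execute.
 */
static double overhead_percent(double extra_cycles, double path_cycles)
{
	return 100.0 * extra_cycles / path_cycles;
}
```

With the numbers from the mail, overhead_percent(2, 2000) comes out at about 0.1. A 0.01% optimization on the same path would correspond to saving roughly 0.2 cycles on average.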

These are the reasons why we did various micro-optimizations in the 
past like:

  b80ef10e84d8: x86: Move do_page_fault()'s error path under unlikely()
  92181f190b64: x86: optimise x86's do_page_fault (C entry point for the page fault path)
  74a0b5762713: x86: optimize page faults like all other achitectures and kill notifier cruft

So if he argues that a single condition does not matter to our page 
fault fastpath then that is just crazy talk and i'd not let him close 
to the page fault code with a ten foot pole.

Thanks,

	Ingo


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-07  0:34               ` pageexec
@ 2011-06-07  9:51                 ` Ingo Molnar
  2011-06-07 23:24                   ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07  9:51 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 21:25, Ingo Molnar wrote:
> 
> > * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > 
> > > [...] it goes like 'I am not willing to do A because it would 
> > > help script kiddies but I'd rather do B that would help script 
> > > kiddies'. with A = 'disclose security bugs' and B = 'keep the 
> > > last roadblock that prevents full ASLR'.
> > 
> > No, that's wrong, the logic goes like this:
> > 
> >   if i do A then it has X1 advantages and Y1 disadvantages.
> >   if i do B then it has X2 advantages and Y2 disadvantages.
> > 
> > The Y1 and Y2 set of disadvantages can both include "making it 
> > easier for script kiddies" but the sets of advantages and 
> > disadvantages can also include MANY OTHER considerations, making 
> > the decision unique in each case.
> 
> Sure, i was only reflecting on what Linus himself kept insisting on 
> in the past.

From what i've seen him say in past discussions he clearly applied 
the common-sense logic i outlined above, not the twisted logic you 
provided.

You paraphrased Linus in such a way:

  " it goes like 'I am not willing to do A because it would 
    help script kiddies but I'd rather do B that would help script 
    kiddies'. with A = 'disclose security bugs' and B = 'keep the 
    last roadblock that prevents full ASLR'.
  "

IMO you are blatantly misrepresenting Linus's opinion.

> > To translate it to this specific case (extremely simplified, so 
> > please don't nit-pick that my descriptions of advantages and 
> > disadvantages are not precise nor complete):
> 
> i don't even need to get there, you already failed right in the 
> very first sentence, very impressive. no. 'not precise' is an 
> understatement.
> 
> >  A) "i put a zero day exploit and a CVE code into a changelog"
> > 
> >      Advantages: - it describes the problem more fully
> > 
> >   Disadvantages: - it makes it easier for people (including script kiddies) to do harm faster
> >                  - creates a false, misleading category for "security bugs"
> > 
> 
> you try to set things up to serve your argument but it's not the things
> we've ever talked about (IOW, this is a strawman).
> 
> in particular, i've never ever requested exploits in commit logs 
> (and i don't remember anyone else who has, do you?). why do you 
> keep thinking in only extremes? is it so impossible to simply state 
> a CVE and the generic bug class (CWE) that the commit fixes? what 
> Linus has insisted on is 'no greppable words', that's diametrically 
> opposite to 'full disclosure' that you guys say you're supposedly 
> doing.

You contradict yourself in that paragraph (see below).

I simply disagree with putting easily greppable words like 'CVE' into 
the changelogs of bugs, due to what i already said above:

     Disadvantages: - it makes it easier for people (including script kiddies) to do harm faster
                    - creates a false, misleading category for "security bugs"

> so if you omit the exploits that no one really requested (and i 
> don't even know why they'd be useful in a commit) then suddenly the 
> script kiddies are no longer helped.
> 
> and you have yet to explain what is false and misleading about the 
> security bug category. you used these words yourself several times 
> today, how do you explain that? why does the CVE exist? why does 
> bugtraq exist? are all those people discussing 'false and 
> misleading' things? why does your employer release security errata? 
> etc, etc.

My arguments against putting easily greppable CVE numbers into 
changelogs are very simple:

Firstly:

  - in many cases it is equivalent to providing a zero-day exploit
    in the changelog, to any reasonably skilled exploit writer

And you yourself said that you don't want to put zero-day exploits 
into changelogs, so why are you even arguing?

Secondly:

  - it's misleading because IMO CVE tagged bugs do not cover all 
    bugs that matter, they are self-selected bugs often highlighted 
    by attention-seeking participants of the security circus.
    The primary driving force in that industry is attention seeking, 
    *not* the maximization of security - and often they act in direct 
    conflict to better security ...

Maximizing security is hard: whether a bug has security implications 
is highly use-case and bug dependent, and the true security impact of 
bugs is not discovered in the majority of cases. I estimate that in 
*ANY* OS there's probably at least 10 times more bugs with some 
potential security impact than ever get a CVE number...

So putting CVEs into the changelog is harmful, pointless, misleading 
and would just create a fake "scare users" and "gain attention" 
industry (coupled with a "delay bug fixes for a long time" aspect, if 
paid well enough) that operates based on issuing CVEs and 'solving' 
them - which disincentivises the *real* bugfixes and the 
non-self-selected bug fixers.

I'd like to strengthen the natural 'bug fixing' industry, not the 
security circus industry.

[ But this is a higher level meta argument based on opinion so it's 
  probably rather pointless to argue it with you as such arguments
  need a certain level of mutual trust to discuss efficiently. ]

> > > but it's very simple logic Ingo.
> > 
> > Please drop the condescending tone, i think it should be clear to 
> > you by now that i have a good basis to disagree with you.
> 
> i'm a firm believer of instant karma, it seems to work on people 
> like yourself or Linus really well. in somewhat profane but simple 
> english: if you behave as an asshole i will treat you as one, if 
> you believe i treated you as an asshole it's because i think you 
> acted as one, and if you don't understand why then you're welcome 
> to 1. look into yourself and figure it out yourself, 2. ask me. 
> what is not going to get you anywhere is if you talk to me and 
> others from the high horse, you must be a lot better than your 
> current self for anyone to tolerate it.

I simply disagreed with you and argued with you without insulting you 
in such a tone.

Does disagreeing with you make me an 'asshole'?

But the thing is, i probably shouldn't bother arguing with you since i 
have trouble convincing you about very obvious things like the simple 
fact that putting more instructions into the page fault path ... 
slows it down, why should i bother arguing with you here?

You are not willing to listen and amazingly, in all these recent 
discussions you've never *ever* conceded a single point - even in 
cases where you were proven wrong beyond recognition!

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07  0:02               ` pageexec
@ 2011-06-07  9:56                 ` Ingo Molnar
  2011-06-07 23:24                   ` pageexec
  2011-06-07 10:05                 ` Ingo Molnar
  2011-06-07 10:13                 ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07  9:56 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 6 Jun 2011 at 21:12, Ingo Molnar wrote:
> 
> > * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > >
> > > and whoever enables them, what do you think they're more likely 
> > > to get in return? some random and rare old binaries that still 
> > > run for a minuscule subset of users or every run-of-the-mill 
> > > exploit working against *every* user, metasploit style (did you 
> > > know that it has a specific target for the i386 compat vdso)?
> > 
> > That's what binary compatibility means, yes.
> 
> so fedora is not binary compatible. did you just admit that in real 
> life security won out? we're on the right track! ;)

No, you are wrong, and you are really confused about what binary 
compatibility of the kernel means.

The kernel itself will try hard to stay binary compatible, so that if 
someone with older userspace upgrades to a new kernel old user-space 
still works fine.

Fedora was able to disable the fixed-address vdso in its newer 32-bit 
distro kernels because it *upgraded glibc*. It has not disabled that 
option for its older versions with old glibcs. There was no breakage 
of binary compatibility.

So we were able to improve real life security *without* breaking 
binary compatibility.

Do you understand this distinction?

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07  0:02               ` pageexec
  2011-06-07  9:56                 ` Ingo Molnar
@ 2011-06-07 10:05                 ` Ingo Molnar
  2011-06-07 23:24                   ` pageexec
  2011-06-07 10:13                 ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07 10:05 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> you called this feature "borderline security FUD" but have yet to 
> prove it.

No, i did not claim that this feature is "borderline security FUD", 
at all.

I said, and i quote my original reply to Linus:

 |
 | > Do you have *any* reason to call this "unsafe"?
 |
 | No, there's no reason at all for that. That naming is borderline
 | security FUD and last time i saw the series i considered renaming
 | it but got distracted :-)
 |

 https://lkml.org/lkml/2011/6/6/141

That the *NAMING* is borderline security FUD. (I already applied the 
patches before i wrote that mail, see the commit notifications on 
lkml.)

Either you have basic reading comprehension problems or you are
lying deliberately, putting words into my mouth that i never said.
Which one is it?

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07  0:02               ` pageexec
  2011-06-07  9:56                 ` Ingo Molnar
  2011-06-07 10:05                 ` Ingo Molnar
@ 2011-06-07 10:13                 ` Ingo Molnar
  2011-06-07 23:24                   ` pageexec
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-07 10:13 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > You generally seem to assume that security is an absolute goal 
> > with no costs attached.
> 
> quote me on that back please or admit you made this up. [...]

Just one quick example of your delusion:

   |
   | "a page fault is never a fast path"
   |

   (PageExec, Jun 6, 2011)

   http://lkml.org/lkml/2011/6/6/209

I think that sentence will become a classic quote to chuckle about.

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07 10:05                 ` Ingo Molnar
@ 2011-06-07 23:24                   ` pageexec
  2011-06-09  7:02                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-07 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 7 Jun 2011 at 12:05, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > you called this feature "borderline security FUD" but have yet to 
> > prove it.
> 
> No, i did not claim that this feature is "borderline security FUD", 
> at all.

so can i take it as your concession that the vsyscall feature is indeed
a security problem and it's being randomized/(re)moved for security reasons?

in that case the naming of this feature is correct and you have no reason
to call it "borderline security FUD". so make up your mind!

> That the *NAMING* is borderline security FUD. (I already applied the 
> patches before i wrote that mail, see the commit notifications on 
> lkml.)

how can the name be "borderline security FUD" but what the name refers
to not be that? you see, we name things for a reason, mostly because
we think the name has something to do with the thing it names, duh?



* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07  9:56                 ` Ingo Molnar
@ 2011-06-07 23:24                   ` pageexec
  2011-06-09  6:48                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-07 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 7 Jun 2011 at 11:56, Ingo Molnar wrote:

> Fedora was able to disable the fixed-address vdso in its newer 32-bit 
> distro kernels because it *upgraded glibc*.

and what happened to those apps that users statically linked against
the older glibc? what happened to their chroots that had dynamically
linked binaries with an older glibc? did you not break those either?



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-07  8:30                             ` Ingo Molnar
@ 2011-06-07 23:24                               ` pageexec
  2011-06-08  5:55                                 ` Pekka Enberg
                                                   ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: pageexec @ 2011-06-07 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 7 Jun 2011 at 10:30, Ingo Molnar wrote:

> 
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > > A fastpath is defined by optimization considerations applied to a 
> > > codepath (the priority it gets compared to other codepaths), 
> > > *not* by its absolute performance.
> > 
> > we're not talking about random arbitrarily defined paths here but 
> > the impact of putting well predicted branches into the pf handler 
> > vs. int xx (are you perhaps confused by 'fast path' vs. 
> > 'fastpath'?).
> 
> So please educate me, what is the difference between 'fast path' 
> versus 'fastpath', as used by kernel developers, beyond the space?

you seemed to have made a distinction, you tell me ;), i was simply
using it as a generic english phrase.

to give you an idea:
- if a code path executes in 1M or 1K cycles once every hour, then
  it's not a fast path, it doesn't matter to anyone whether it runs
  1 or 10 cycles faster or not,
- if a code path executes in 1M cycles 100 times a second then it's
  still not a fast path where single cycle speedups would mean anything,
- now if a code path executes in 1K cycles 100K times a second then
  suddenly there's a huge multiplier on even single cycle improvements
  that *may* be measurable and therefore relevant for some users

obviously these are just sample points in the parameter space, but
you get the idea (and no, there's no black&white clearly defined
subspace that could be labeled as 'fast path', it's a per-user fuzzy
set).
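
The sample points above come down to simple arithmetic: the share of a CPU
that a saving buys is (cycles saved per call) x (calls per second) / (clock
rate). A minimal sketch of that calculation follows; the function name
saved_fraction and the clock rate passed to it are assumptions made for
illustration, not numbers from this thread.

```c
#include <assert.h>

/*
 * Fraction of one CPU's time recovered by shaving `saved_cycles` off a
 * code path that runs `calls_per_sec` times per second on a core clocked
 * at `cpu_hz` cycles per second. All three inputs are hypothetical.
 */
static double saved_fraction(double saved_cycles, double calls_per_sec,
			     double cpu_hz)
{
	assert(cpu_hz > 0.0);	/* a zero or negative clock rate is meaningless */
	return saved_cycles * calls_per_sec / cpu_hz;
}
```

On an assumed 3 GHz core, a one-cycle saving at 100K calls per second comes
to about 3.3e-5 of that core's time, versus about 3.3e-8 at 100 calls per
second. Whether either figure is measurable depends on the workload.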

now for your pf handler. you said the pf handler was a 'fastpath'. you
didn't provide any numbers yet to back that up though so let's take it
at face value. you also said, no scratch that, *boasted* about a single
cycle improvement somewhere in that pf path. problem is that you have
never presented evidence for this claim. what is the reason for this
resistance? you must have measured the impact of your change to be able
to claim your numbers, so you must be able to present them, right? and
you must also be able to show its real life impact because it must have
done wonders to typical userland workloads. so where are the numbers Ingo?
or do you realize but have no balls to admit that your single cycle
'improvement' is simply bloody bullshit that no one cares about let alone
can experience? it's scary that one of the x86 maintainers has nothing
better to do than implement bogus 'optimizations'.

> > that impact only matters if it's measurable. you have yet to show 
> > that it is. and all this silliness is for a hypothetical situation 
> > since those conditional branches don't even need to be in the 
> > general page fault processing paths.
> 
> Is this some sort of sick joke?

hey lookie, someone found his voice for a change ;). more seriously,
you should probably take reading comprehension lessons:

> Do you *really* claim that the number of instructions executed in a 
> fastpath does not matter

did i say that? let's see what i said, will be very easy as i'll just
have to copy-paste from above:

   that impact only matters if it's measurable.

do you understand the words 'impact' and 'measurable'? do you see the
words 'number' or 'instructions' in there?  do you understand that it's
not the insn count that matters per se but their measurable impact (which
is a function of more than just the insn count)?

> and that our years-long effort to shave off an 
> instruction here and there from the x86 do_page_fault() code were 
> meaningless

if they have no measurable impact then yes, it was all a pointless
exercise. if they do have measurable impact then it depends on what
that impact is and what a given user cares about. there's no black
and white answer, even if you think only in extremes.

> and that we can add branches with zero cost?

see above.



* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-07  9:51                 ` Ingo Molnar
@ 2011-06-07 23:24                   ` pageexec
  2011-06-10 11:19                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-07 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 7 Jun 2011 at 11:51, Ingo Molnar wrote:

> > Sure, i was only reflecting on what Linus himself kept insisting on 
> > in the past.
> 
> From what i've seen him say in past discussions he clearly applied 
> the common-sense logic i outlined above, not the twisted logic you 
> provided.

you must have been reading a different discussion over the years than
what i have. he's never once provided anything remotely related to
'common sense'. every time i pointed out the holes in his logic, he
just shut up. not exactly the way to have a meaningful conversation
but let's see if you fare any better ;).

> You paraphrased Linus in such a way:
> 
>   " it goes like 'I am not willing to do A because it would 
>     help script kiddies but I'd rather do B that would help script 
>     kiddies'. with A = 'disclose security bugs' and B = 'keep the 
>     last roadblock that prevents full ASLR'.
>   "
> 
> IMO you are blatantly misrepresenting Linus's opinion.

i think the man can defend himself if he really feels that way, but let
me show you the origin of this idiotic script kiddie defense of his:
http://lkml.org/lkml/2008/7/15/293 notice how he didn't even know the
difference between publishing exploits (something script kiddies could
try) vs. describing the security aspects of the fix. but as we can see
below, you're not that much better either, so let's continue there.

> > in particular, i've never ever requested exploits in commit logs 
> > (and i don't remember anyone else who has, do you?). why do you 
> > keep thinking in only extremes? is it so impossible to simply state 
> > a CVE and the generic bug class (CWE) that the commit fixes? what 
> > Linus has insisted on is 'no greppable words', that's diametrically 
> > opposite to 'full disclosure' that you guys say you're supposedly 
> > doing.
> 
> You contradict yourself in that paragraph (see below).

you wish i did ;).

> I simply disagree with putting easily greppable words like 'CVE' into 
> the changelogs of bugs, due to what i already said above:

i note that most of the world does this already, so you'd better come up with
an excuse that applies only to you but not the other projects/companies/etc.
i also note that you provided no explanation for your employer's behaviour
(http://www.redhat.com/security/data/cve/). do you disagree with them?

>      Disadvantages: - it makes it easier for people (including script kiddies) to do harm faster

what evidence do you have that people are doing harm faster when you put
CVEs/etc into commits? and what do script kiddies have to do with this again?
do you know what that term even means?

>                     - creates a false, misleading category for "security bugs"

let's see this one below.

> My arguments against putting easily greppable CVE numbers into 
> changelogs are very simple:
> 
> Firstly:
> 
>   - in many cases it is equivalent to providing a zero-day exploit
>     in the changelog, to any reasonably skilled exploit writer

bullshit again. you don't even know what the term '0-day exploit' means.
(do you understand that if knowing the CVE leads people to exploits then
they can just track the CVEs instead and forget about kernel commits?).

but that aside, you're assuming the existence of a person who can read
the code in a commit and create an exploit out of it (when the CVE/etc
are included) but at the same time when he reads the same commit, he's
suddenly unable to create an exploit out of it (when the CVE/etc are not
included). are you for real Ingo? do you know this thing called 'logic'?
do you understand that your 'argument' rests on an impossible situation,
the existence of a person with mutually exclusive attributes?

btw, what do you know about exploit writers? and then what is a reasonably
skilled exploit writer according to you? how would you even be able to
determine that?

let me tell you now a real disadvantage of your coverup: you're hurting
the good guys (the defenders) a lot more than you're hurting the bad guys
(the attackers). why? because of the usual asymmetry of the situation we
often face in security. an attacker needs to find only a single commit
silently fixing a security bug (never mind finding the earlier commit
that introduced it) whereas the defenders would have to find all of them.

thanks to your policy you can guess which side has a distinct advantage
from the start and how well the other side fares.

> And you yourself said that you don't want to put zero-day exploits 
> into changelogs, so why are you even arguing?

it's not even arguing Ingo, that would assume sides with equal knowledge
but you have only delusions about the real world out there. i always
wondered where you guys cook up these ideas about script kiddies, exploits,
0day, etc. do you read about them in the news? do you follow conferences
and read papers? do you actively research vulnerabilities? seriously,
what's the source of all this nonsense i see?

> Secondly:
> 
>   - it's misleading because IMO CVE tagged bugs do not cover all 
>     bugs that matter,

sure but that's not your problem. as a kernel developer your task is
to keep your users informed about what you yourselves know. covering
it up does not serve users, it serves attackers.

>     they are self-selected bugs
               ^^^^^^^^^^^^^
what does this mean?

>     often highlighted by attention-seeking participants of the security circus.

what do external actors have to do with what you tell your users?
nothing? also, what is this 'security circus'? the security industry
is very big, it has many many actors in it (playing on a wide spectrum
of attack and defense), which ones are you referring to in particular?

>     The primary driving force in that industry is attention seeking, 

as always, it's not a black&white world, so yes, some actors can be
described as above but then others cannot. whatever their interest
however, i don't see what it matters to you. your job is not to force
people to behave one way or another (considering all the negative
response you've got so far, you've failed at it) but to keep your
users secure. covering up security fixes doesn't do that, informing
them does.

>     *not* the maximization of security - and often they act in direct 
>     conflict to better security ...

i don't understand who you're referring to here. i believe the context
is people who provide you with security bugs, get a CVE and then go
public with it with more or less fanfare? is that what you're referring
to here? in any case, what does 'maximization of security' mean to you?
and how do these actors (whoever they are, define them please) act against
better security?

> Maximizing security is hard: whether a bug has security implications 
> is highly usecase and bug dependent, and the true security impact of 
> bugs is not discovered in the majority of cases. I estimate that in 
> *ANY* OS there's probably at least 10 times more bugs with some 
> potential security impact than ever get a CVE number...

sure but what does that imply for the bugs that you do know are security
related? i don't see how the coverup follows from it.

> So putting CVEs into the changelog is harmful,

it's not harmful, or at least not more harmful than withholding it. remember
the asymmetric situation. also by extension anyone else who publishes CVEs
is actively harming their users, is that what you're saying? if not then
explain why the rule applies to kernel development only but not the rest
of the world.

> pointless,

why? not knowing about security fixes harms exactly those who you claim
to protect, your users, the good guys.

> misleading

why? who's being misled and about what? a security bug is a security bug,
there's nothing misleading about it.
 
> and would just create a fake "scare users"

who's being scared by attaching a CVE to a commit? i've never seen such a
person in my life, have you? and what percentage of the userbase do you
think are 'scared' whenever a security bug's fixed? and why should the existence
of such users affect how the rest is informed? do you think that the weatherman
should retire as well because someone might be scared when told about the coming
storm ahead of time?

> and "gain attention" industry

that part of the industry is there, has been for a long while and is not
going away. why does all that matter to you guys though? in particular, how
do you think covering up security fixes will change anything in that part
of the industry? hint: it won't, in fact it will allow them to grow even bigger
and louder since you continually provide ammunition to them.

> (coupled with a "delay bug fixes for a long time" aspect, if 
> paid well enough)

uh, are you still talking about kernel security bugs? what are you referring
to here in particular? what's the connection between the above and covering
up security fixes? i think i'm getting lost here a bit, sorry ;).

> that operates based on issuing CVEs and 'solving' 
> them - which disincentivises the *real* bugfixes and the 
> non-self-selected bug fixers.

self-selected again, what's that mean here? and how does 'whatever you are
talking about above' discourage the 'real' bugfixes? what are these 'real
bugfixes'?

> I'd like to strengthen the natural 'bug fixing' industry, not the 
> security circus industry.

what is this 'natural bug fixing industry' exactly? who are you thinking
of in particular? and how do you think covering up security fixes strengthens
this 'industry'? what does it even mean to strengthen them?

> I simply disagreed with you and argued with you without insulting you 
> in such a tone.

yeah we all saw your tone now and in the past too. that's why you get
what you deserve ;). but then you're an adult person and can take the
heat, can't you? judging by the average Linus and other flames on lkml,
i think i was being pretty mild so far in fact ;).

> Does disagreeing with you make me an 'asshole'?

disagreements require (at least) two equally real and relevant choices
over which one can reasonably disagree. you have yet to present your
side of that relevant choice, until then i won't treat you as an equal
peer, sorry (you can start by answering the above questions, although
considering how many issues you skipped over so far i don't have high
hopes). so no, it's not about disagreements, it's about trying to
teach you some basics of security and in general, logical thinking.

> But the thing is, i probably shouldn't bother arguing with you since i 
> have trouble convincing you about very obvious things like the simple 
> fact that putting more instructions into the page fault path ... 
> slows it down, why should i bother arguing with you here?

as i said in another mail, the devil is in the details, here, whether
that slowdown matters or not. man, every time you bring this up i have
to stop typing as i'm still smiling over your pride about that absolutely
bogus and irrelevant single cycle 'speedup' ;).

> You are not willing to listen and amazingly, in all these recent 
> discussions you've never *ever* conceded a single point - even in 
> cases where you were proven wrong beyond recognition!

show me a single point then ;). so far i only saw ex cathedra statements
without numbers (single cycle improvement, remember? haha, sorry couldn't
help it ;). anything else i missed? also a discussion is not a trade of
'now i'll agree with you on this point if you agree with me on that other
point'. i'll agree with what i think is correct and i'll point out bullshit
when i see it, simple as that.



* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07 10:13                 ` Ingo Molnar
@ 2011-06-07 23:24                   ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-07 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 7 Jun 2011 at 12:13, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > > You generally seem to assume that security is an absolute goal 
          ^^^^^^^^^
> > > with no costs attached.
> > 
> > quote me on that back please or admit you made this up. [...]
> 
> Just one quick example of your delusion:
> 
>    |
>    | "a page fault is never a fast path"
>    |

i don't see 'security', 'absolute', 'goal' and 'cost' in the above, do you?
(btw, nice try to extract a single sentence out of context, looks like you're
running out of steam if you have to descend this low ;)

but more importantly, did you see 'generally' above? do you think a single
sample would justify it? i think even you're not that dumb. or maybe that's
how you cook up your performance measurements too?

so try harder. say, find all the PaX features i implemented over the years,
see what kind of decisions i made, determine which one was for or against
performance (vs. security, usability, etc) and then let's see if you can
draw your conclusion or not. until then, you stay in the hole you dug
yourself into ;).

>    (PageExec, Jun 6, 2011)
> 
>    http://lkml.org/lkml/2011/6/6/209
> 
> I think that sentence will become a classic quote to chuckle about.

heh, if Ingo 'single cycle' Molnar says so... i'm still ROTFL whenever
i think about it, it was really priceless, thank you! ;)))))



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-06 14:07                 ` Brian Gerst
@ 2011-06-07 23:32                   ` pageexec
  2011-06-07 23:49                     ` Andrew Lutomirski
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-07 23:32 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andrew Lutomirski, Ingo Molnar, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Louis Rilling, Valdis.Kletnieks

On 6 Jun 2011 at 10:07, Brian Gerst wrote:

> > do you know what that mucking looks like? ;) prepare for the most complex code
> > you've ever seen (it's in __bad_area_nosemaphore):
> >
> > #ifdef CONFIG_X86_64
> > 	if (mm && (error_code & PF_INSTR) && mm->context.vdso) {
> > 		if (regs->ip == (unsigned long)vgettimeofday) {
> > 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, gettimeofday);
> > 			return;
> > 		} else if (regs->ip == (unsigned long)vtime) {
> > 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, clock_gettime);
> > 			return;
> > 		} else if (regs->ip == (unsigned long)vgetcpu) {
> > 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, getcpu);
> > 			return;
> > 		}
> > 	}
> > #endif
> 
> I like this approach, however since we're already in the kernel it
> makes sense just to run the normal syscall instead of redirecting to
> the vdso.

it's not that simple as the page fault occurs not at the actual syscall
insn but at the first insn of the given vsyscall function, so you'd have
to emulate it carefully to be able to return to the original caller in
userland.
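
To make the "return to the original caller" step concrete: when the fault
hits the first insn of a vsyscall function, the caller's `call` has already
pushed a return address, so an in-kernel emulation has to perform the `ret`
itself. The following is a user-space sketch only, not kernel code. The
names struct mock_regs and emulate_ret are hypothetical, the byte array
stands in for the user stack, and real kernel code would read user memory
through a user-access helper and would also have to place the syscall
result in the return register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the two registers that matter here; the real register frame has more. */
struct mock_regs {
	uint64_t ip;	/* faulting instruction pointer (inside the vsyscall page) */
	uint64_t sp;	/* user stack pointer at the time of the fault */
};

/*
 * Emulate the `ret` the vsyscall function would have executed: pop the
 * return address that the caller's `call` pushed and resume there.
 * `stack` stands in for user memory and `stack_base` is the pretend user
 * address of stack[0]. Returns 0 on success, -1 if the stack pointer does
 * not leave room for an 8-byte read (where a kernel would signal a fault).
 */
static int emulate_ret(struct mock_regs *regs, const uint8_t *stack,
		       size_t stack_len, uint64_t stack_base)
{
	uint64_t off, caller;

	assert(regs != NULL && stack != NULL);

	if (regs->sp < stack_base)
		return -1;
	off = regs->sp - stack_base;
	if (off > stack_len || stack_len - off < sizeof(caller))
		return -1;

	memcpy(&caller, stack + off, sizeof(caller));	/* read the return address */
	regs->ip = caller;				/* resume in the caller */
	regs->sp += sizeof(caller);			/* pop it, as `ret` would */
	return 0;
}
```

The bounds check is the part that needs care: a bogus stack pointer must
produce an error for the task rather than a stray read.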



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-07 23:32                   ` pageexec
@ 2011-06-07 23:49                     ` Andrew Lutomirski
  2011-06-08  6:32                       ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Andrew Lutomirski @ 2011-06-07 23:49 UTC (permalink / raw)
  To: pageexec
  Cc: Brian Gerst, Ingo Molnar, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Louis Rilling, Valdis.Kletnieks

On Tue, Jun 7, 2011 at 7:32 PM,  <pageexec@freemail.hu> wrote:
>> > do you know what that mucking looks like? ;) prepare for the most complex code
>> > you've ever seen (it's in __bad_area_nosemaphore):
>> >
>> > #ifdef CONFIG_X86_64
>> > 	if (mm && (error_code & PF_INSTR) && mm->context.vdso) {
>> > 		if (regs->ip == (unsigned long)vgettimeofday) {
>> > 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, gettimeofday);
>> > 			return;
>> > 		} else if (regs->ip == (unsigned long)vtime) {
>> > 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, clock_gettime);
>> > 			return;
>> > 		} else if (regs->ip == (unsigned long)vgetcpu) {
>> > 			regs->ip = (unsigned long)VDSO64_SYMBOL(mm->context.vdso, getcpu);
>> > 			return;
>> > 		}
>> > 	}
>> > #endif
>>
>> I like this approach, however since we're already in the kernel it
>> makes sense just to run the normal syscall instead of redirecting to
>> the vdso.
>
> it's not that simple as the page fault occurs not at the actual syscall
> insn but at the first insn of the given vsyscall function, so you'd have
> to emulate it carefully to be able to return to the original caller in
> userland.
>
>

My patch (the version that's in tip/x86/vdso) more-or-less does that.
It's something like six lines of code, including error handling.

__bad_area_nosemaphore is not a fast path, and in fact I tried that
for the very first version of this code that I wrote.  My recollection
is that it's noticeably slower than int 0xcc because it has to go
through the whole VMA lookup.

If you want to submit a patch to switch from int 0xcc to
__bad_area_nosemaphore, be my guest :)  It will have almost no effect
on the complexity of the code, and, in fact, you'll probably get a net
deletion of lines because you can remove all the crud in vmlinux.lds.S
and some of the mapping code in vsyscall_64.c.

--Andy


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-07 23:24                               ` pageexec
@ 2011-06-08  5:55                                 ` Pekka Enberg
  2011-06-08  6:19                                   ` pageexec
  2011-06-08  6:48                                 ` Ingo Molnar
  2011-06-08  7:16                                 ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Pekka Enberg @ 2011-06-08  5:55 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On Wed, Jun 8, 2011 at 2:24 AM,  <pageexec@freemail.hu> wrote:
> to give you an idea:
> - if a code path executes in 1M or 1K cycles once every hour, then
>  it's not a fast path, it doesn't matter to anyone whether it runs
>  1 or 10 cycles faster or not,

I'm pretty sure people who try to optimize kernel boot times, for
example, don't agree with you.

                                        Pekka


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  5:55                                 ` Pekka Enberg
@ 2011-06-08  6:19                                   ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-08  6:19 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 8 Jun 2011 at 8:55, Pekka Enberg wrote:

> On Wed, Jun 8, 2011 at 2:24 AM,  <pageexec@freemail.hu> wrote:
> > to give you an idea:
> > - if a code path executes in 1M or 1K cycles once every hour, then
> >  it's not a fast path, it doesn't matter to anyone whether it runs
> >  1 or 10 cycles faster or not,
> 
> I'm pretty sure people who try to optimize kernel boot times, for
> example, don't agree with you.

let's see, we're talking about 100M or more cycles at least (being
generous here as my kernels here seem to take more on the order of
1G or more cycles). i'm pretty sure they're unable to measure single
cycle (or even 10 or a 100 cycle) improvements as well. but they're
welcome to prove me wrong ;).



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-07 23:49                     ` Andrew Lutomirski
@ 2011-06-08  6:32                       ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-08  6:32 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Brian Gerst, Ingo Molnar, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Louis Rilling, Valdis.Kletnieks

On 7 Jun 2011 at 19:49, Andrew Lutomirski wrote:

> My patch (the version that's in tip/x86/vdso) more-or-less does that.
> It's something like six lines of code, including error handling.

my recollection is that it was more complex (else i would have done it that way
to avoid the current result of double kernel entries) but anyway it's
a moot point now that this code is headed for mainline.

> __bad_area_nosemaphore is not a fast path, and in fact I tried that
> for the very first version of this code that I wrote.  My recollection
> is that it's noticeably slower than int 0xcc because it has to go
> through the whole VMA lookup.

you can put this logic much earlier, even before taking any locks (but
then you'll have to fear the dreaded single cycle improvement brigade ;).

> If you want to submit a patch to switch from int 0xcc to
> __bad_area_nosemaphore, be my guest :)  It will have almost no effect
> on the complexity of the code, and, in fact, you'll probably get a net
> deletion of lines because you can remove all the crud in vmlinux.lds.S
> and some of the mapping code in vsyscall_64.c.

i'll forward port my changes when 3.0 comes out, we'll see how much more or
less complex they become and can then decide if it's worth it.



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-07 23:24                               ` pageexec
  2011-06-08  5:55                                 ` Pekka Enberg
@ 2011-06-08  6:48                                 ` Ingo Molnar
  2011-06-08  9:02                                   ` pageexec
  2011-06-08  7:16                                 ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-08  6:48 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 7 Jun 2011 at 10:30, Ingo Molnar wrote:
> 
> > 
> > * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > 
> > > > A fastpath is defined by optimization considerations applied 
> > > > to a codepath (the priority it gets compared to other 
> > > > codepaths), *not* by its absolute performance.
> > > 
> > > we're not talking about random arbitrarily defined paths here 
> > > but the impact of putting well predicted branches into the pf 
> > > handler vs. int xx (are you perhaps confused by 'fast path' vs. 
> > > 'fastpath'?).
> > 
> > So please educate me, what is the difference between 'fast path' 
> > versus 'fastpath', as used by kernel developers, beyond the 
> > space?
> 
> you seemed to have made a distinction, you tell me ;), [...]

I have not made any distinction at all, *you* wrote:

> > > (are you perhaps confused by 'fast path' vs. 'fastpath'?).

and several people have asked you why you wrote that and you have not 
replied to any of those questions yet.

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-07 23:24                               ` pageexec
  2011-06-08  5:55                                 ` Pekka Enberg
  2011-06-08  6:48                                 ` Ingo Molnar
@ 2011-06-08  7:16                                 ` Ingo Molnar
  2011-06-08  9:29                                   ` pageexec
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-08  7:16 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> to give you an idea:
> - if a code path executes in 1M or 1K cycles once every hour, then
>   it's not a fast path, it doesn't matter to anyone whether it runs
>   1 or 10 cycles faster or not,
> - if a code path executes in 1M cycles 100 times a second then it's
>   still not a fast path where single cycle speedups would mean anything,
> - now if a code path executes in 1K cycles 100K times a second then
>   suddenly there's a huge multiplier on even single cycle improvements
>   that *may* be measurable and therefore relevant for some users

The thing is, as i explained it before, your claim:

  > a page fault is never a fast path

is simply ridiculous on its face and crazy talk.

Beyond all the reasons why we don't want to touch the page fault path 
we have a working, implemented, tested IDT based alternative approach 
here that is faster and more compartmented so there's no reason 
whatsoever to touch the page fault path. We *do* add code to the page 
fault path in justified cases so this is not an absolute rule, but we 
try to avoid doing it, for all the reasons that me and others 
outlined.

Even if you do not take my word for it, several prominent kernel 
developers told you already that you are wrong, and i also showed you 
the commits that prove you wrong.

Your reply to that was to try to change the topic, laced with 
frequent insults thrown at me. You called me an 'asshole' yet the 
only thing i did was that i argued with you patiently.

Is there *any* point where you are willing to admit that you are 
wrong or should i just start filtering out your emails to save me all 
this trouble? When you comment on technical details you generally 
make very good suggestions so i'd hate to stop listening to your 
feedback, but there's a S/N ratio threshold under which i will need 
to do it ...

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  6:48                                 ` Ingo Molnar
@ 2011-06-08  9:02                                   ` pageexec
  2011-06-08  9:11                                     ` Andi Kleen
  2011-06-08  9:15                                     ` Ingo Molnar
  0 siblings, 2 replies; 112+ messages in thread
From: pageexec @ 2011-06-08  9:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 8 Jun 2011 at 8:48, Ingo Molnar wrote:

> > you seemed to have made a distinction, you tell me ;), [...]
> 
> I have not made any distinction at all, *you* wrote:

i asked you that question because for all this time you seemed to
have been very worked up by the fact that i called the page fault
path as not 'fast'. i thought maybe what caused your nervous reaction
and desperate attempts at trying to justify it was due to some
misunderstanding in wording, but i now see that we probably talked
about the same thing. with the exception that you *still* have not
provided any evidence for your claim. why is that Ingo? do you have
nothing to prove your single cycle 'improvement'? (sorry, had a
chuckle again ;).



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  9:02                                   ` pageexec
@ 2011-06-08  9:11                                     ` Andi Kleen
  2011-06-08  9:35                                       ` pageexec
  2011-06-08  9:15                                     ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: Andi Kleen @ 2011-06-08  9:11 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks


There's clear evidence that any kind of exception in gettimeofday()/time()
is too slow. A few years ago gtod() had CPUID which led to intercepts
with many hypervisors. This turned out to be an unacceptable slowdown,
so it was fixed.

A pagefault is a bit faster than a virtualization intercept, but not
much and therefore too slow.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  9:02                                   ` pageexec
  2011-06-08  9:11                                     ` Andi Kleen
@ 2011-06-08  9:15                                     ` Ingo Molnar
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-08  9:15 UTC (permalink / raw)
  To: pageexec
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 8 Jun 2011 at 8:48, Ingo Molnar wrote:
> 
> > > you seemed to have made a distinction, you tell me ;), [...]
> > 
> > I have not made any distinction at all, *you* wrote:
> 
> i asked you that question because for all this time you seemed to 
> have been very worked up by the fact that i called the page fault 
> path as not 'fast'. i thought maybe what caused your nervous 
> reaction and desperate attempts at trying to justify it was due to 
> some misunderstanding in wording, but i now see that we probably 
> talked about the same thing. with the exception that you *still* 
> have not provided any evidence for your claim. why is that Ingo? do 
> you have nothing to prove your single cycle 'improvement'? (sorry, 
> had a chuckle again ;).

You are again trying to shift the topic. Your original claim, which 
you snipped from your reply:

  > a page fault is never a fast path

is simply ridiculous on its face and crazy talk, and no amount of 
insults you hurl at me will change that fact - you ignored the 
various pieces of evidence that i cited that the page fault code is 
very much a fastpath: past commits, cycles estimations, a list of 
various (obvious) types of impact, the statements of several 
prominent kernel developers (including Linus) that establish that the 
page fault path is very much treated as a fastpath by everyone who 
develops it and you also ignored the fact that there's a working 
alternative that has none of those disadvantages.

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  7:16                                 ` Ingo Molnar
@ 2011-06-08  9:29                                   ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-08  9:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks

On 8 Jun 2011 at 9:16, Ingo Molnar wrote:

> The thing is, as i explained it before, your claim:
> 
>   > a page fault is never a fast path
> 
> is simply ridiculous on its face and crazy talk.

you didn't *explain* a thing. you *claimed* something but offered *no*
proof. where's your measurement showing the single cycle improvement?
how many times do i get to ask you for it before you're willing to provide
it? does it even exist? you see, i'm beginning to think that you simply
just made up that claim, or in plain english, you lied about it. is that
really the case?

> Beyond all the reasons why we don't want to touch the page fault path 
> we have a working, implemented, tested IDT based alternative approach 
> here that is faster

btw, the pf based approach can be made as fast as well since the necessary
checks can be moved up early. but then we'll face the single cycle brigade ;).

> and more compartmented 

what does that even mean here? the pf based approach is less code btw.

> Even if you do not take my word for it, several prominent kernel 
> developers told you already that you are wrong,

you must have been reading a different thread or i wasn't cc'd on those
claims. care to quote them back (i only remember Pekka's mail and he has
yet to back up his claim about single/low cycle counts being important
for the bootup case)?

also claiming something and proving something are different things. as
i told you already, ex cathedra statements don't work here.

> and i also showed you  the commits that prove you wrong.

unfounded single cycle improvement claims don't a proof make. show your
measurements instead. provided they exist that is.

> Your reply to that was to try to change the topic,

what change are you talking about? you insisted on calling the pf path
fast and your single cycle improvements relevant, you get to prove it.

> laced with frequent insults thrown at me. You called me an 'asshole' yet
> the only thing i did was that i argued with you patiently.

i wish you had argued (i.e., presented well thought out, true and relevant
statements) but instead you only threw out completely baseless accusations,
insinuations, or even outright lies, never mind the several ad hominem
statements that i generously overlooked since unlike you, i can handle
the heat of a discussion ;). IOW, stop pretending to be the hurt angel
here, you're very far from it.

> Is there *any* point where you are willing to admit that you are 
> wrong or should i just start filtering out your emails to save me all 
> this trouble?

sure, just prove me wrong on a claim and i'll admit it ;).

> When you comment on technical details you generally 
> make very good suggestions so i'd hate to stop listening to your 
> feedback, but there's a S/N ratio threshold under which i will need 
> to do it ...

you sound like i care about who you listen to. if you're a mature person
you might as well act as one. like start answering the questions i posed
you in the last round of emails then we'll see about that S/N ratio.



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  9:11                                     ` Andi Kleen
@ 2011-06-08  9:35                                       ` pageexec
  2011-06-08 10:06                                         ` Andi Kleen
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-08  9:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Andi Kleen,
	Brian Gerst, Louis Rilling, Valdis.Kletnieks

On 8 Jun 2011 at 11:11, Andi Kleen wrote:

> There's clear evidence that any kind of exception in gettimeofday()/time()
> is too slow. A few years ago gtod() had CPUID which led to intercepts
> with many hypervisors. This turned out to be an unacceptable slowdown,
> so it was fixed.

yes, i don't think anyone claimed any performance improvements from moving
away from the vsyscall page into taking exceptions ;). i think everyone
considers this as a stop-gap compatibility measure only that will be less
and less relevant and eventually may even go away as time progresses and
linux systems begin to fully rely on the vdso instead.



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08  9:35                                       ` pageexec
@ 2011-06-08 10:06                                         ` Andi Kleen
  2011-06-08 10:26                                           ` pageexec
  2011-06-08 10:35                                           ` Ingo Molnar
  0 siblings, 2 replies; 112+ messages in thread
From: Andi Kleen @ 2011-06-08 10:06 UTC (permalink / raw)
  To: pageexec
  Cc: Andi Kleen, Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

> eventually may even go away as time progresses and
> linux systems begin to fully rely on the vdso instead.

That assumes that everyone uses glibc and also updates their userland. 
As pointed out many times that's a deeply flawed assumption.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08 10:06                                         ` Andi Kleen
@ 2011-06-08 10:26                                           ` pageexec
  2011-06-08 10:39                                             ` Ingo Molnar
  2011-06-08 10:35                                           ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-08 10:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Ingo Molnar, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 8 Jun 2011 at 12:06, Andi Kleen wrote:

> > eventually may even go away as time progresses and
> > linux systems begin to fully rely on the vdso instead.
> 
> That assumes that everyone uses glibc and also updates their userland. 
> As pointed out many times that's a deeply flawed assumption.

i think the assumption is not that everyone uses glibc but that everyone
else (as in, every other libc) can simply take the necessary changes from
glibc, provided they need such changes at all (i.e., they're using the
vsyscall entry points over the vdso ones).

i frankly didn't check any of the alternatives myself (uclibc/klibc/bionic/etc)
but i can't imagine that it'd be that much harder to patch them than glibc.

as i said, this was a compromise solution but then i think you already made
it clear that you didn't even think there was a problem here to solve, so i
guess we should work that out first, if you want to ;).



* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08 10:06                                         ` Andi Kleen
  2011-06-08 10:26                                           ` pageexec
@ 2011-06-08 10:35                                           ` Ingo Molnar
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-08 10:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: pageexec, Andrew Lutomirski, x86, Thomas Gleixner, linux-kernel,
	Jesper Juhl, Borislav Petkov, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Brian Gerst, Louis Rilling, Valdis.Kletnieks


* Andi Kleen <andi@firstfloor.org> wrote:

> > eventually may even go away as time progresses and linux systems 
> > begin to fully rely on the vdso instead.
> 
> That assumes that everyone uses glibc and also updates their 
> userland. As pointed out many times that's a deeply flawed 
> assumption.

No, it does not assume it: if a particular usecase cares *so* little 
about updates that it won't ever update their userland and kernel 
then they have no problem: they'll have what they had before.

If they are willing to update the *kernel* then they will have to 
consider what every kernel update brings with itself: legacy 
facilities are de-emphasised all the time (while the ABI is still 
fully guaranteed) so user-space should not assume that newer kernels 
will offer the exact same performance tradeoffs as before.

Nor do i think you have cited any *real* example - you are just 
talking hypotheticals with very little specifics. ABI does not mean 
'executes the same instructions', guaranteeing that would be crazy.

Thanks,

	Ingo


* Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
  2011-06-08 10:26                                           ` pageexec
@ 2011-06-08 10:39                                             ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2011-06-08 10:39 UTC (permalink / raw)
  To: pageexec
  Cc: Andi Kleen, Andrew Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 8 Jun 2011 at 12:06, Andi Kleen wrote:
> 
> > > eventually may even go away as time progresses and linux 
> > > systems begin to fully rely on the vdso instead.
> > 
> > That assumes that everyone uses glibc and also updates their 
> > userland. As pointed out many times that's a deeply flawed 
> > assumption.
> 
> i think the assumption is not that everyone uses glibc but that 
> everyone else (as in, every other libc) can simply take the 
> necessary changes from glibc, provided they need such changes at 
> all (i.e., they're using the vsyscall entry points over the vdso 
> ones).
> 
> i frankly didn't check any of the alternatives myself 
> (uclibc/klibc/bionic/etc) but i can't imagine that it'd be that 
> much harder to patch them than glibc.

Correct. Also, as i pointed it out in the previous mail, ABI does not 
mean 'will execute the same instructions', that would be silly. We 
*do* fix serious wide-scale performance regressions (obviously), but 
if it's about some weird crazy legacy path that we had good security 
reasons to change, and which is trivial to performance-improve in the 
library then we are well within our boundaries to keep the change.

> as i said, this was a compromise solution but then i think you 
> already made it clear that you didn't even think there was a 
> problem here to solve, so i guess we should work that out first, if 
> you want to ;).

Heh, indeed ;-)

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07 23:24                   ` pageexec
@ 2011-06-09  6:48                     ` Ingo Molnar
  2011-06-09 23:33                       ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-09  6:48 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 7 Jun 2011 at 11:56, Ingo Molnar wrote:
> 
> > Fedora was able to disable the fixed-address vdso in its newer 32-bit 
> > distro kernels because it *upgraded glibc*.
> 
> and what happened to those apps that users statically linked against
> the older glibc? what happened to their chroots that had dynamically
> linked binaries with an older glibc? did you not break those either?

There's two reasons why a distributor will generally not worry about 
that case:

 - No such binaries come with a default distro install

 - Keeping such old libraries linked or chrooted can be a security
   hole in itself, so i doubt a distributor would guarantee something like this.

If another distribution considers this a serious enough issue it can 
keep the COMPAT_VDSO option enabled forever. Few (none?) did.

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-07 23:24                   ` pageexec
@ 2011-06-09  7:02                     ` Ingo Molnar
  2011-06-09 23:33                       ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-09  7:02 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 7 Jun 2011 at 12:05, Ingo Molnar wrote:
> 
> > * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > 
> > > you called this feature "borderline security FUD" but have yet 
> > > to prove it.
> > 
> > No, i did not claim that this feature is "borderline security 
> > FUD", at all.
> 
> so can i take it as your concession that the vsyscall feature is 
> indeed a security problem and it's being randomized/(re)moved for 
> security reasons?

Again, i made two statements:

  "That naming is borderline security FUD"
  "It's only a security problem if there's a security hole elsewhere."

I stand by those statements and i reject your repeated attempts to 
put words in my mouth that i did not say, such as:

   > you called this feature "borderline security FUD" [...]

> in that case the naming of this feature is correct and you have no 
> reason to call it "borderline security FUD". so make up your mind!
> 
> > That the *NAMING* is borderline security FUD. (I already applied 
> > the patches before i wrote that mail, see the commit 
> > notifications on lkml.)
> 
> how can the name be "borderline security FUD" but what the name 
> refers to not be that? you see, we name things for a reason, mostly 
> because we think the name has something to do with the thing it 
> names, duh?

It's borderline security FUD because it suggests that keeping the 
vsyscall around is in itself a security hole. As i outlined whether 
there's *another* bug that *can be exploited* is highly dependent on 
the usecase - while the Kconfig name made no such distinction. (For 
example a device maker might choose to keep the option enabled in 
some embedded usecase, those are pretty limited environments that 
have few vectors of attack.)

Anyway, repeating and explaining my arguments a dozen times did not 
make any difference to you, and there's a point where i have to stop 
wasting time on a person, so i've started filtering out your mails. 
If you want to send me any patches then please send them to any of my 
co-maintainers who will be able to review them.

Thanks,

	Ingo


* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-09  7:02                     ` Ingo Molnar
@ 2011-06-09 23:33                       ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-09 23:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 9 Jun 2011 at 9:02, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> > so can i take it as your concession that the vsyscall feature is 
> > indeed a security problem and it's being randomized/(re)moved for 
> > security reasons?
> 
> Again, i made two statements:
> 
>   "That naming is borderline security FUD"
>   "It's only a security problem if there's a security hole elsewhere."
> 
> I stand by those statements and i reject your repeated attempts to 
> put words in my mouth that i did not say, such as:
> 
>    > you called this feature "borderline security FUD" [...]

it's no more putting words into your mouth than your own attempt:

> You generally seem to assume that security is an absolute goal 
> with no costs attached.

(for which i'm still waiting for an actual proof btw ;)

also, notice i didn't quote you ("..." or >... style), i simply paraphrased
what you said and explained why. i got no response to those arguments though
so i guess you conceded them. then why do you keep arguing the same nonsense?

> > how can the name be "borderline security FUD" but what the name 
> > refers to not be that? you see, we name things for a reason, mostly 
> > because we think the name has something to do with the thing it 
> > names, duh?
> 
> It's borderline security FUD because it suggests that keeping the 
> vsyscall around is in itself a security hole.

it's not a hole per se, rather it's an accessory to writing reliable
exploits (not only userland ones btw). that's the *only* reason this
patchset even exists. do you deny that? now, if you don't deny it then
you also agree that the current vsyscall page *is* a security problem
(problem != hole). therefore the suggested name for the config option
that removes it for good cannot be "borderline security FUD".

> As i outlined whether there's *another* bug that *can be exploited* is
> highly dependent on the usecase - while the Kconfig name made no such
> distinction. (For example a device maker might choose to keep the
> option enabled in some embedded usecase, those are pretty limited
> environments that have few vectors of attack.) 

but you see, your own argument defeats your point: if for another use
case the presence of the vsyscall page *is* a security hazard (like,
i don't know, every single server and desktop out there?) then *not*
making it clear in the option name *is* a problem. see how it cuts
both ways? what now? ;) prudent people err on the side of safety.

also if the above device maker disables it even if their devices would
never ever be exploited using the vsyscall page, they haven't lost anything.
heck, they'll have reduced their kernel image and gained some all too
important cycles ;).

> Anyway, repeating and explaining my arguments a dozen times

repeated - yes, explained - no, that's the problem. you didn't even address
what *i* said (how many questions of mine have you left unanswered?). i'm
guessing because you realized how wrong you were but you're not the kind of
person who'd admit it.

> did not make any difference to you, and there's a point where i have
> to stop wasting time on a person, so i've started filtering out your
> mails. If you want to send me any patches then please send them to any
> of my co-maintainers who will be able to review them. 

PS: thanks for the cycles^Wchuckles ;)



* Re: [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS
  2011-06-09  6:48                     ` Ingo Molnar
@ 2011-06-09 23:33                       ` pageexec
  0 siblings, 0 replies; 112+ messages in thread
From: pageexec @ 2011-06-09 23:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, x86, Thomas Gleixner,
	linux-kernel, Jesper Juhl, Borislav Petkov, Andrew Morton,
	Arjan van de Ven, Jan Beulich, richard -rw- weinberger,
	Mikael Pettersson, Andi Kleen, Brian Gerst, Louis Rilling,
	Valdis.Kletnieks, Peter Zijlstra

On 9 Jun 2011 at 8:48, Ingo Molnar wrote:
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > On 7 Jun 2011 at 11:56, Ingo Molnar wrote:
> > 
> > > Fedora was able to disable the fixed-address vdso in its newer 32-bit 
> > > distro kernels because it *upgraded glibc*.
> > 
> > and what happened to those apps that users statically linked against
> > the older glibc? what happened to their chroots that had dynamically
> > linked binaries with an older glibc? did you not break those either?
> 
> There's two reasons why a distributor will generally not worry about 
> that case:

so you went from "There was no breakage of binary compatibility." to
saying that you didn't care about it. you could have just admitted it
from the beginning.



* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-07 23:24                   ` pageexec
@ 2011-06-10 11:19                     ` Ingo Molnar
  2011-06-14  0:48                       ` pageexec
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2011-06-10 11:19 UTC (permalink / raw)
  To: pageexec
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> let me tell you now a real disadvantage of your coverup: [...]

Our opinion is that the scheme you are suggesting is flawed and 
reduces security, so we refuse to use it. That is not a 'coverup', to 
the contrary, it *helps* security - see below.

> [...] you're hurting the good guys (the defenders) a lot more than 
> you're hurting the bad guys (the attackers). why? because of the 
> usual asymmetry of the situation we often face in security. an 
> attacker needs to find only a single commit silently fixing a 
> security bug (never mind finding the earlier commit that introduced 
> it) whereas the defenders would have to find all of them.
> 
> thanks to your policy you can guess which side has a distinct 
> advantage from the start and how well the other side fares.

Firstly, the asymmetry is fundamental: attackers *always* have an 
easier time destroying stuff than the good guys have building new 
things. This is the second law of thermodynamics.

Secondly, you are missing one fundamental aspect: the 'good guys' are 
not just the 'defenders'. The good guys are a *much* broader group of 
people: the 'bug fixers'.

Thirdly, you never replied in substance to our arguments that CVE 
numbers are woefully inadequate: they miss the majority of bugs that 
can have a security impact. In fact i argue that the way software is 
written and fixed today it's not possible to effectively map out 
'bugs with a security impact' at all: pretty much *any* bug that 
modifies the kernel image can have a security impact. Bug fixers are 
not at all concentrated on thinking like parasitic attackers, so 
security side effects often remain undiscovered. Why pretend we have 
a list of CVEs when we know that it's only fake?

Fourth, exactly how does putting CVE numbers make it harder for 
attackers? It makes it distinctly *easier*: people will update their 
systems based on a list of say 10 CVEs that affect them, totally 
blind to the 100+ other bugs that may (or may not) have an effect on 
them. An attacker will now only have to find an *already fixed* bug 
that has a security impact and which didn't get a CVE and attack a 
large body of systems that people think are safe.

With the current upstream kernel policy we do not deceive users: we 
say that the way to be most secure is to be up to date. Attackers will 
have to find an entirely new, not yet fixed security hole, instead of 
just a bug that missed the CVE filter ...

I.e. our opinion is, on very good and honest grounds, that your 
scheme creates a false sense of security and actually harms real 
security and we simply refuse to support such a scheme.

Thanks,

	Ingo


* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-10 11:19                     ` Ingo Molnar
@ 2011-06-14  0:48                       ` pageexec
  2011-06-15 19:42                         ` Valdis.Kletnieks
  0 siblings, 1 reply; 112+ messages in thread
From: pageexec @ 2011-06-14  0:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling, Valdis.Kletnieks

On 10 Jun 2011 at 13:19, Ingo Molnar wrote:
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > let me tell you now a real disadvantage of your coverup: [...]
> 
> Our opinion is that the scheme you are suggesting [...]

why are you trying to make it 'my' scheme? it's not mine, i didn't come up
with it, it's what pretty much everyone else (other than you, that is) in
the world does, including your own employer, Red Hat.

i already asked you about this and you never responded so here it is again:
what do you think about Red Hat publishing security errata (including kernel
vulnerabilities)? with CVEs, description of fault, etc.

it's diametrically opposite to what you've been claiming so there seems to be
a disconnect here. do you actively disagree with your own employer's security
bug handling policy? you see, they're doing exactly what you're not willing to.

> [...] is flawed and reduces security, so we refuse to use it. That is not a
> 'coverup', to the contrary, it *helps* security - see below. 

yeah well, we'll see about it. it looks like year after year you guys manage
to outdo yourselves in absurdity, one wonders if there'll be a new category
needed for this year's pwnie awards because you're likely to no longer fit the
lamest vendor response category.

> > [...] you're hurting the good guys (the defenders) a lot more than 
> > you're hurting the bad guys (the attackers). why? because of the 
> > usual asymmetry of the situation we often face in security. an 
> > attacker needs to find only a single commit silently fixing a 
> > security bug (never mind finding the earlier commit that introduced 
> > it) whereas the defenders would have to find all of them.
> > 
> > thanks to your policy you can guess which side has a distinct 
> > advantage from the start and how well the other side fares.
> 
> Firstly, the asymmetry is fundamental: attackers *always* have an 
> easier way destroying stuff than the good guys are at building new 
> things. This is the second law of thermodynamics.

what garbage. both sides are building stuff! in fact, finding vulnerabilities,
writing exploits is an even higher level creation process than normal development
as it gives us knowledge well beyond what we'd have if we were doing only the
usual development. what extra knowledge is that? without this kind of research
we'd have to accept at face value a developer's claim (expressed in source code)
that he's just written code that does this or that.

the extra info we learn through all the work done by security research is whether
said code lives up to its developer's claims or not. e.g., every vulnerability
someone finds that allows arbitrary code execution is basically a proof that a
Turing machine thought to be non-universal is actually a universal one. and with
a working exploit the proof is even machine verifiable. this is one of the rare
instances when we can actually pull such stunts off for non-trivial codebases in
fact.

i find it amazing that this fact is even up for debate when in another subfield
of security, cryptology, both sides (cryptography and cryptanalysis) are well
accepted and studied subjects in academic, commercial, military, etc settings
worldwide, without all the negative connotations that seem to plague vulnerability
research in some minds.

as for the asymmetry: whether it's present in all situations or not is something
you don't know because you don't know all situations (in fact, you seem to know
very little about this whole subject). since i tend to err on the side of safety,
i said 'usual' and 'often' just because i can't exclude the possibility of a
situation where such asymmetry is not present or is much less pronounced than
what we face with vulnerabilities and exploits.

> Secondly, you are missing one fundamental aspect: the 'good guys' are 
> not just the 'defenders'. The good guys are a *much* broader group of 
> people: the 'bug fixers'.

is this language lawyering? what do you think bug fixers do? they reduce the
attack surface of a system and therefore are part of the defender group.

> Thirdly, you never replied in substance to our arguments that CVE 
> numbers are woefully inadequate:

heh, i replied to you many times already but you didn't respond to dozens of
questions already (did you respond to this one because it was featured on LWN?).
the answers are all there Ingo, you just have to read them.

and it's never been about CVE per se btw, it was about 'some' information that
would help one reading the commit clearly understand that it's a fix for a
security bug, as far as the committer knew.

whether a CVE or similar piece of information is inadequate depends on what the
goal is. clearly, you're thinking in extreme black&white terms once again.
somehow you seem to believe that if you can't provide perfect and complete
information about security bugs then providing *no* information is somehow the
better choice? and better for what? end users' security? truly mind boggling!

> they miss the majority of bugs that can have a security impact.

you don't understand the whole purpose of CVE and similar information. it's not
about providing guaranteed full coverage about any and all vulnerabilities that
exist. that knowledge is kinda non-existent as far as we know. instead, CVE is
a mechanism that lets the world organize what is *known* and communicate it
between all parties (vendors, developers, users, etc). your stubborn refusal to
even contemplate the idea of communicating your own knowledge to your own users
is very stupid and hasn't earned you many friends out there (it, however, serves
as an excellent basis for every sales speech by every security vendor out there).

> In fact i argue that the way software is written and fixed today it's
> not possible to effectively map out 'bugs with a security impact' at
> all: pretty much *any* bug that modifies the kernel image can have a
> security impact. 

this is a strawman, no one asks for this kind of work as you're not even in the
position to be able to do this even if you tried. last but not least, it's also

  "not possible to effectively map out 'bugs that can cause filesystem
   corruption' at all: pretty much *any* bug that modifies the kernel
   image can cause filesystem corruption".

however that little fact somehow never prevented you guys from describing such
fixes in the commits which contradicts your desire to not give your users a false
sense of (filesystem) security. so if you want to follow your own words, you'll
have to *stop* letting the world know when you fix a known filesystem corruption
bug since based on what you argued so far, you can't guarantee that those are the
*only* such bugs/fixes. what's more, covering up filesystem corruption bugs will
also help everyone who has to backport them to their own supported kernels (for
yet to be explained reasons, i'm sure the world's dying to know now how they're
supposed to pull that off).

> Bug fixers are not at all concentrated on thinking like parasitic
> attackers, so security side effects often remain undiscovered.

no one ever expects it from them as it's never been a matter of concentration,
it's a matter of being skilled at it which you are not, there's nothing wrong
with it.

calling people who do the hard work of vulnerability research 'parasitic' shows
only how insecure (no pun intended) you feel about this whole situation: you
(presumably) do your best to write code and then comes someone out of the blue
and pokes a hundred holes in it and your subconscious self-defense begins to
distort your view of yourself and the others.

btw, would you call every respected cryptographer out there a parasite? because
that's what you effectively said.

> Why pretend we have a list of CVEs when we know that it's only fake? 

because CVEs are not about what you seem to think they are. knowing that a given
bug is exploitable is not 'fake', communicating it to your users is not 'fake'.
lying about it is however dishonesty of the utmost proportions. btw, how does all this
sit with the 'full disclosure' declared in Documentation/SecurityBugs? ever
thought of clearing it up?

> Fourth, exactly how does putting CVE numbers make it harder for 
> attackers?

a little help in reading comprehension of what i said:

> > you're hurting the good guys (the defenders) a lot more than 
> > you're hurting the bad guys (the attackers).

what (you think) makes life harder for attackers is *withholding* CVE or similar
information from commits, not their inclusion.

> It makes it distinctly *easier*: people will update their systems
> based on a list of say 10 CVEs that affect them, totally blind to the
> 100+ other bugs that may (or may not) have an effect on them. 

'people' are not updating their systems based on any list. 'people' update their
systems based on what their kernel supplier (vendor/distro/company's internal
team/etc) provides them with (there's an extreme minority of users who build
their own kernels of the latest vanilla tree).

now the big question becomes whether these suppliers are helped or obstructed by
your policy of covering up security fixes. given that pretty much none of them
supports the latest vanilla tree, not even -stable, it means that in order to
release new kernels they'll have to backport fixes. fixes that they don't even
know they should be backporting because of your covering them up. so what happens
is that everyone has to read every commit and do his best estimate whether it's
worthy of backporting or not (notice the waste of duplicated effort). don't you
think they could spend more time on finding actually important fixes if they
could skip over the already known ones and just backport them?

> An attacker will now only have to find an *already fixed* bug

what makes you think an attacker is interested at all in already fixed bugs?
who's gonna pay for an exploit against such a bug? that's not exactly where
the market is.

> that has a security impact and which didn't get a CVE and attack a 
> large body of systems that people think are safe.

people don't think that systems are safe just because there're no outstanding
CVEs against them (where did this idea come from?) because everyone who has ever
heard of CVEs knows about 0-day bugs as well (if for no other reason than the
simple fact that there're CVEs issued for 0-day bugs as they became public).

> With the current upstream kernel policy we do not deceive users: we 
> say that the way to be most secure is to be up to date.

where do you say that?

what you do say, though, is that you practice full disclosure, which you're
apparently not doing in practice as you cover up security fixes. besides, if,
as you said above, you don't actually figure out the security impact of all the
bugs you fix, what's the guarantee that the latest kernel (whatever up-to-date
means, git HEAD?) didn't introduce more bugs than it fixed? if you can't provide
such a guarantee (no, saying it is not it) then using the latest kernel is as
good as using anything else, if not worse (since older kernels at least had
more eyes scrutinize them by virtue of being around for longer).

the biggest flaw with your argument is that no one uses up-to-date kernels because
they have to rely on vendors/distros/etc. and for them to be able to produce an
up-to-date kernel they'd need to know the exact information that you're omitting.
so for the majority of users you make it impossible to be the most secure.

> Attackers will have to find an entirely new, not yet fixed security
> hole, instead of just a bug that missed the CVE filter ... 

why would an attacker need to find a 0-day bug when he can just sit back and
watch as the kernel suppliers struggle with backporting covered up security
fixes and pick up the ones they missed? unless you want to claim that attackers
are worse at identifying said silent fixes than kernel suppliers but i hope you
realize how ridiculous that would be.

> I.e. our opinion is, on very good and honest grounds,

'good' and 'honest' are not exactly the words i'd use here ;)

> that your scheme creates a false sense of security and actually harms
> real security and we simply refuse to support such a scheme. 

false sense of security is a term that you should understand before you use it
in context. in particular, you didn't demonstrate the origin of any sense of
security, never mind a false one. second, you didn't demonstrate any harm from
properly disclosing security fixes (vs. covering them up).



* Re: [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule
  2011-06-14  0:48                       ` pageexec
@ 2011-06-15 19:42                         ` Valdis.Kletnieks
  0 siblings, 0 replies; 112+ messages in thread
From: Valdis.Kletnieks @ 2011-06-15 19:42 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Andy Lutomirski, x86,
	Thomas Gleixner, linux-kernel, Jesper Juhl, Borislav Petkov,
	Andrew Morton, Arjan van de Ven, Jan Beulich,
	richard -rw- weinberger, Mikael Pettersson, Brian Gerst,
	Louis Rilling


On Tue, 14 Jun 2011 02:48:35 +0200, pageexec@freemail.hu said:
> > It makes it distinctly *easier*: people will update their systems
> > based on a list of say 10 CVEs that affect them, totally blind to the
> > 100+ other bugs that may (or may not) have an effect on them. 
>
> 'people' are not updating their systems based on any list. 'people' update their
> systems based on what their kernel supplier (vendor/distro/company's internal
> team/etc) provides them with (there's an extreme minority of users who build
> their own kernels of the latest vanilla tree).
>
> now the big question becomes whether these suppliers are helped or obstructed by
> your policy of covering up security fixes. given that pretty much none of them
> supports the latest vanilla tree, not even -stable, it means that in order to
> release new kernels they'll have to backport fixes. fixes that they don't even
> know they should be backporting because of your covering them up. so what happens
> is that everyone has to read every commit and do his best estimate whether it's
> worthy of backporting or not (notice the waste of duplicated effort). don't you
> think they could spend more time on finding actually important fixes if they
> could skip over the already known ones and just backport them?

You point out that "people" don't do updates based on the list - but then
recommend that *vendors* base their updates on the same list for the same
reasons. So instead of a user at risk, now it's every customer of a vendor at
risk, because the *vendor* will just cherry-pick the fixes labelled "security".
The vendor *still* has to go through the entire pile of fixes *anyhow*, because
we already *know* that we may fail to recognize that a fix is
security-relevant. So cherry-picking "security" is doing the customers a
disservice.

Second, the vendor's engineers *still* have to go through all the fixes
*anyhow*, because in the *real world*, "security" is *not* the only thing -
it's *one small piece*.  A vendor's customers don't really care if the vendor
includes a fix for some hard-to-exploit kernel hole if the vendor fails to ship
the patch that keeps the fiberchannel controller card from locking up the
system, or the patch that makes MySQL not SEGV, or the patch that fixes dropped
characters on the tty line their data acquisition system uses, or the patch that
keeps the filesystem from corrupting directories because of a bad refcount.

If the system *does not work*, customers *don't care* about its security.  And
you're going to have a hard time if you keep trying to sell your "security is
everything" viewpoint to people who have to support systems where security is
only one piece of it.




* Re: [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl
  2011-06-05 17:50 ` [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
  2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
@ 2011-12-05 18:27   ` Matthew Maurer
  1 sibling, 0 replies; 112+ messages in thread
From: Matthew Maurer @ 2011-12-05 18:27 UTC (permalink / raw)
  To: linux-kernel

Andy Lutomirski <luto <at> MIT.EDU> writes:

> 
> It's unnecessary overhead in code that's supposed to be highly
> optimized.  Removing it allows us to remove one of the two syscall
> instructions in the vsyscall page.
> 
> The only sensible use for it is for UML users, and it doesn't fully
> address inconsistent vsyscall results on UML.  The real fix for UML
> is to stop using vsyscalls entirely.

UML is not the only use case for this. Anybody who wants to be able to
trace _all_ system calls (for debugging, ptrace-based sandboxing, or a
number of other purposes) needs this.

We either need this back, or need another way to make it possible to
trace these system calls.

Apologies for the late reply, I didn't notice this change until it got
into my distribution's kernel.

--Matthew Maurer




Thread overview: 112+ messages
2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable Andy Lutomirski
2011-06-06  8:31   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 2/9] x86-64: Document some of entry_64.S Andy Lutomirski
2011-06-06  8:31   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 3/9] x86-64: Give vvars their own page Andy Lutomirski
2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-12-05 18:27   ` [PATCH v5 4/9] " Matthew Maurer
2011-06-05 17:50 ` [PATCH v5 5/9] x86-64: Map the HPET NX Andy Lutomirski
2011-06-06  8:33   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 6/9] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
2011-06-06  8:33   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 7/9] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
2011-06-05 19:30   ` Ingo Molnar
2011-06-05 20:01     ` Andrew Lutomirski
2011-06-06  7:39       ` Ingo Molnar
2011-06-06  9:42       ` pageexec
2011-06-06 11:19         ` Andrew Lutomirski
2011-06-06 11:56           ` pageexec
2011-06-06 12:43             ` Andrew Lutomirski
2011-06-06 13:58               ` pageexec
2011-06-06 14:07                 ` Brian Gerst
2011-06-07 23:32                   ` pageexec
2011-06-07 23:49                     ` Andrew Lutomirski
2011-06-08  6:32                       ` pageexec
2011-06-06 15:26                 ` Ingo Molnar
2011-06-06 15:48                   ` pageexec
2011-06-06 15:59                     ` Ingo Molnar
2011-06-06 16:19                       ` pageexec
2011-06-06 16:47                         ` Ingo Molnar
2011-06-06 22:49                           ` pageexec
2011-06-06 22:57                             ` david
2011-06-07  9:07                               ` Ingo Molnar
2011-06-07  6:59                             ` Pekka Enberg
2011-06-07  8:30                             ` Ingo Molnar
2011-06-07 23:24                               ` pageexec
2011-06-08  5:55                                 ` Pekka Enberg
2011-06-08  6:19                                   ` pageexec
2011-06-08  6:48                                 ` Ingo Molnar
2011-06-08  9:02                                   ` pageexec
2011-06-08  9:11                                     ` Andi Kleen
2011-06-08  9:35                                       ` pageexec
2011-06-08 10:06                                         ` Andi Kleen
2011-06-08 10:26                                           ` pageexec
2011-06-08 10:39                                             ` Ingo Molnar
2011-06-08 10:35                                           ` Ingo Molnar
2011-06-08  9:15                                     ` Ingo Molnar
2011-06-08  7:16                                 ` Ingo Molnar
2011-06-08  9:29                                   ` pageexec
2011-06-06 14:01             ` Linus Torvalds
2011-06-06 14:55               ` pageexec
2011-06-06 15:33                 ` Ingo Molnar
2011-06-06 15:58                   ` pageexec
2011-06-06 15:41         ` Ingo Molnar
2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-06  8:35   ` [tip:x86/vdso] x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build tip-bot for Ingo Molnar
2011-06-07  7:49   ` [tip:x86/vdso] x86-64: Emulate legacy vsyscalls tip-bot for Andy Lutomirski
2011-06-07  8:03   ` tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule Andy Lutomirski
2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
2011-06-06  9:31     ` Andi Kleen
2011-06-06 10:39       ` pageexec
2011-06-06 13:56         ` Linus Torvalds
2011-06-06 18:46           ` pageexec
2011-06-06 20:40             ` Linus Torvalds
2011-06-06 20:51               ` Andrew Lutomirski
2011-06-06 21:54                 ` Ingo Molnar
2011-06-06 21:45               ` Ingo Molnar
2011-06-06 21:48                 ` Ingo Molnar
     [not found]                 ` <BANLkTi==uw_h78oaep1cCOCzwY0edLUU_Q@mail.gmail.com>
2011-06-07  8:03                   ` [PATCH, v6] x86-64: Emulate legacy vsyscalls Ingo Molnar
2011-06-06 21:53               ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule pageexec
2011-06-06 14:44         ` Ingo Molnar
2011-06-06 15:01           ` pageexec
2011-06-06 15:15             ` Ingo Molnar
2011-06-06 15:29               ` pageexec
2011-06-06 16:54                 ` Ingo Molnar
2011-06-06 18:59           ` pageexec
2011-06-06 19:25             ` Ingo Molnar
2011-06-07  0:34               ` pageexec
2011-06-07  9:51                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-10 11:19                     ` Ingo Molnar
2011-06-14  0:48                       ` pageexec
2011-06-15 19:42                         ` Valdis.Kletnieks
2011-06-06 14:52         ` Ingo Molnar
2011-06-06 10:24     ` [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS Ingo Molnar
2011-06-06 11:20       ` pageexec
2011-06-06 12:47         ` Ingo Molnar
2011-06-06 12:48           ` Ingo Molnar
2011-06-06 18:04           ` pageexec
2011-06-06 19:12             ` Ingo Molnar
2011-06-07  0:02               ` pageexec
2011-06-07  9:56                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-09  6:48                     ` Ingo Molnar
2011-06-09 23:33                       ` pageexec
2011-06-07 10:05                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-09  7:02                     ` Ingo Molnar
2011-06-09 23:33                       ` pageexec
2011-06-07 10:13                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-06 12:19       ` Ted Ts'o
2011-06-06 12:33         ` Andrew Lutomirski
2011-06-06 12:37         ` Ingo Molnar
2011-06-06 14:34     ` [tip:x86/vdso] " tip-bot for Ingo Molnar
2011-06-05 20:05 ` [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andrew Lutomirski
