All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
@ 2015-07-30 21:31 Andy Lutomirski
  2015-07-30 21:31 ` [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall Andy Lutomirski
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper,
	Jan Beulich, xen-devel, Andy Lutomirski

This is intended for x86/urgent.  Sorry for taking so long, but it
seemed nice to avoid breaking Xen.

This fixes the "dazed and confused" issue which was exposed by the
CVE-2015-5157 fix.  It's also probably a good general attack surface
reduction, and it replaces some scary code with IMO less scary code.

Also, servers and embedded systems should probably turn off modify_ldt.
This makes that possible.

Xen people, can you test patch 1?  It works for me on my evil 32-bit
Xen virtio setup.

Willy and Kees: I left the config option alone.  The -tiny people will
like it, and we can always add a sysctl of some sort later.

Changes from v5:
 - Improve Xen bits, this time with comments that seem plausible.
 - Fix debug_locks warning on exec.
 - Add a test for exec preserving the LDT (!)

Changes from v4:
 - Fix Xen even better (patch 1 is new).
 - Reorder the patches to make a little more sense.

Changes from v3:
 - Hopefully fixed Xen.
 - Fixed 32-bit test case on 32-bit native kernel.
 - Fix bogus vumnap for some LDT sizes.
 - Strengthen test case to check all LDT sizes (catches bogus vunmap).
 - Lots of cleanups, mostly from Borislav.
 - Simplify IPI code using on_each_cpu_mask.

Changes from v2:
 - Allocate ldt_struct and the LDT entries separately.  This should fix Xen.
 - Stop using write_ldt_entry, since I'm pretty sure it's unnecessary now
   that we no longer mutate an in-use LDT.  (Xen people, can you check?)

Changes from v1:
 - The config option is new.
 - The test case is new.
 - Fixed a missing allocation failure check.
 - Fixed a use-after-free on fork().

Andy Lutomirski (4):
  x86/xen: Probe target addresses in set_aliased_prot before the
    hypercall
  x86/ldt: Make modify_ldt synchronous
  selftests/x86, x86/ldt: Add a selftest for modify_ldt
  x86/ldt: Make modify_ldt optional

 arch/x86/Kconfig                      |  17 +
 arch/x86/include/asm/desc.h           |  15 -
 arch/x86/include/asm/mmu.h            |   5 +-
 arch/x86/include/asm/mmu_context.h    |  68 +++-
 arch/x86/kernel/Makefile              |   3 +-
 arch/x86/kernel/cpu/common.c          |   4 +-
 arch/x86/kernel/cpu/perf_event.c      |  16 +-
 arch/x86/kernel/ldt.c                 | 262 +++++++++-------
 arch/x86/kernel/process_64.c          |   6 +-
 arch/x86/kernel/step.c                |   8 +-
 arch/x86/power/cpu.c                  |   3 +-
 arch/x86/xen/enlighten.c              |  40 +++
 kernel/sys_ni.c                       |   1 +
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/ldt_gdt.c | 576 ++++++++++++++++++++++++++++++++++
 15 files changed, 871 insertions(+), 155 deletions(-)
 create mode 100644 tools/testing/selftests/x86/ldt_gdt.c

-- 
2.4.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
  2015-07-30 21:31 ` [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31  9:41   ` [Xen-devel] " David Vrabel
                     ` (3 more replies)
  2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
                   ` (7 subsequent siblings)
  9 siblings, 4 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper,
	Jan Beulich, xen-devel, Andy Lutomirski, stable, David Vrabel

The update_va_mapping hypercall can fail if the VA isn't present in
the guest's page tables.  Under certain loads, this can result in an
OOPS when the target address is in unpopulated vmap space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Cc: stable@vger.kernel.org
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <dvrabel@cantab.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0b95c9b8283f..09f7fd6aa5c8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 	pte_t pte;
 	unsigned long pfn;
 	struct page *page;
+	unsigned char dummy;
 
 	ptep = lookup_address((unsigned long)v, &level);
 	BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
 	pte = pfn_pte(pfn, prot);
 
+	/*
+	 * Careful: update_va_mapping will fail if the virtual address
+	 * we're poking isn't populated in the page tables.  We don't
+	 * need to worry about the direct map (that's always in the page
+	 * tables), but we need to be careful about vmap space.  In
+	 * particular, the top level page table can lazily propagate
+	 * entries between processes, so if we've switched mms since we
+	 * vmapped the target in the first place, we might not have the
+	 * top-level page table entry populated.
+	 *
+	 * We disable preemption because we want the same mm active when
+	 * we probe the target and when we issue the hypercall.  We'll
+	 * have the same nominal mm, but if we're a kernel thread, lazy
+	 * mm dropping could change our pgd.
+	 *
+	 * Out of an abundance of caution, this uses __get_user to fault
+	 * in the target address just in case there's some obscure case
+	 * in which the target address isn't readable.
+	 */
+
+	preempt_disable();
+
+	pagefault_disable();	/* Avoid warnings due to being atomic. */
+	__get_user(dummy, (unsigned char __user __force *)v);
+	pagefault_enable();
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
 		BUG();
 
@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 				BUG();
 	} else
 		kmap_flush_unused();
+
+	preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 	int i;
 
+	/*
+	 * We need to mark the all aliases of the LDT pages RO.  We
+	 * don't need to call vm_flush_aliases(), though, since that's
+	 * only responsible for flushing aliases out the TLBs, not the
+	 * page tables, and Xen will flush the TLB for us if needed.
+	 *
+	 * To avoid confusing future readers: none of this is necessary
+	 * to load the LDT.  The hypervisor only checks this when the
+	 * LDT is faulted in due to subsequent descriptor access.
+	 */
+
 	for(i = 0; i < entries; i += entries_per_page)
 		set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
2.4.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-30 21:31 ` Andy Lutomirski
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Andy Lutomirski, Peter Zijlstra, Andrew Cooper,
	Steven Rostedt, xen-devel, David Vrabel, stable, Jan Beulich,
	Sasha Levin, Boris Ostrovsky

The update_va_mapping hypercall can fail if the VA isn't present in
the guest's page tables.  Under certain loads, this can result in an
OOPS when the target address is in unpopulated vmap space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Cc: stable@vger.kernel.org
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <dvrabel@cantab.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0b95c9b8283f..09f7fd6aa5c8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 	pte_t pte;
 	unsigned long pfn;
 	struct page *page;
+	unsigned char dummy;
 
 	ptep = lookup_address((unsigned long)v, &level);
 	BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
 	pte = pfn_pte(pfn, prot);
 
+	/*
+	 * Careful: update_va_mapping will fail if the virtual address
+	 * we're poking isn't populated in the page tables.  We don't
+	 * need to worry about the direct map (that's always in the page
+	 * tables), but we need to be careful about vmap space.  In
+	 * particular, the top level page table can lazily propagate
+	 * entries between processes, so if we've switched mms since we
+	 * vmapped the target in the first place, we might not have the
+	 * top-level page table entry populated.
+	 *
+	 * We disable preemption because we want the same mm active when
+	 * we probe the target and when we issue the hypercall.  We'll
+	 * have the same nominal mm, but if we're a kernel thread, lazy
+	 * mm dropping could change our pgd.
+	 *
+	 * Out of an abundance of caution, this uses __get_user to fault
+	 * in the target address just in case there's some obscure case
+	 * in which the target address isn't readable.
+	 */
+
+	preempt_disable();
+
+	pagefault_disable();	/* Avoid warnings due to being atomic. */
+	__get_user(dummy, (unsigned char __user __force *)v);
+	pagefault_enable();
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
 		BUG();
 
@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 				BUG();
 	} else
 		kmap_flush_unused();
+
+	preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 	int i;
 
+	/*
+	 * We need to mark the all aliases of the LDT pages RO.  We
+	 * don't need to call vm_flush_aliases(), though, since that's
+	 * only responsible for flushing aliases out the TLBs, not the
+	 * page tables, and Xen will flush the TLB for us if needed.
+	 *
+	 * To avoid confusing future readers: none of this is necessary
+	 * to load the LDT.  The hypervisor only checks this when the
+	 * LDT is faulted in due to subsequent descriptor access.
+	 */
+
 	for(i = 0; i < entries; i += entries_per_page)
 		set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
2.4.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
  2015-07-30 21:31 ` [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall Andy Lutomirski
  2015-07-30 21:31 ` Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-30 22:50   ` Andrew Cooper
                     ` (3 more replies)
  2015-07-30 21:31 ` [PATCH v6 2/4] " Andy Lutomirski
                   ` (6 subsequent siblings)
  9 siblings, 4 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper,
	Jan Beulich, xen-devel, Andy Lutomirski, stable

modify_ldt has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Note to -stable maintainers: by itself, this patch makes a
pre-existing Xen bug much easier to trigger; on a 32-bit Xen guest,
the new ldt_gdt selftest is likely to OOPS.  Even without this
patch, the test can OOPS, but it's much less likely to happen.  The
Xen maintainers should have a fix for that issue shortly.

Cc: stable@vger.kernel.org
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h        |  15 ---
 arch/x86/include/asm/mmu.h         |   3 +-
 arch/x86/include/asm/mmu_context.h |  54 +++++++-
 arch/x86/kernel/cpu/common.c       |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c              | 262 ++++++++++++++++++++-----------------
 arch/x86/kernel/process_64.c       |   4 +-
 arch/x86/kernel/step.c             |   6 +-
 arch/x86/power/cpu.c               |   3 +-
 9 files changed, 210 insertions(+), 153 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89fd2647..4e10d73cf018 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
 	set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-	set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-	preempt_disable();
-	load_LDT_nolock(pc);
-	preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
 	return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620a73b4..364d27481a52 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-	void *ldt;
-	int size;
+	struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 804a3a6030ca..984abfe47edc 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,50 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+	/*
+	 * Xen requires page-aligned LDTs with special permissions.  This is
+	 * needed to prevent us from installing evil descriptors such as
+	 * call gates.  On native, we could merge the ldt_struct and LDT
+	 * allocations, but it's not worth trying to optimize.
+	 */
+	struct desc_struct *entries;
+	int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+	struct ldt_struct *ldt;
+
+	/* lockless_dereference synchronizes with smp_store_release */
+	ldt = lockless_dereference(mm->context.ldt);
+
+	/*
+	 * Any change to mm->context.ldt is followed by an IPI to all
+	 * CPUs with the mm active.  The LDT will not be freed until
+	 * after the IPI is handled by all such CPUs.  This means that,
+	 * if the ldt_struct changes before we return, the values we see
+	 * will be safe, and the new values will be loaded before we run
+	 * any user code.
+	 *
+	 * NB: don't try to convert this to use RCU without extreme care.
+	 * We would still need IRQs off, because we don't want to change
+	 * the local LDT after an IPI loaded a newer value than the one
+	 * that we can see.
+	 */
+
+	if (unlikely(ldt))
+		set_ldt(ldt->entries, ldt->size);
+	else
+		clear_LDT();
+
+	DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +122,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * was called and then modify_ldt changed
 		 * prev->context.ldt but suppressed an IPI to this CPU.
 		 * In this case, prev->context.ldt != NULL, because we
-		 * never free an LDT while the mm still exists.  That
-		 * means that next->context.ldt != prev->context.ldt,
-		 * because mms never share an LDT.
+		 * never set context.ldt to NULL while the mm still
+		 * exists.  That means that next->context.ldt !=
+		 * prev->context.ldt, because mms never share an LDT.
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 	}
 #ifdef CONFIG_SMP
 	  else {
@@ -106,7 +150,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_mm_cr4(next);
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 		}
 	}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 922c5e0cea4c..cb9e5df42dd2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1410,7 +1410,7 @@ void cpu_init(void)
 	load_sp0(t, &current->thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
 	dbg_restore_debug_regs();
@@ -1459,7 +1459,7 @@ void cpu_init(void)
 	load_sp0(t, thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3658de47900f..9469dfa55607 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,21 +2179,25 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		struct ldt_struct *ldt;
+
 		if (idx > LDT_ENTRIES)
 			return 0;
 
-		if (idx > current->active_mm->context.size)
+		/* IRQs are off, so this synchronizes with smp_store_release */
+		ldt = lockless_dereference(current->active_mm->context.ldt);
+		if (!ldt || idx > ldt->size)
 			return 0;
 
-		desc = current->active_mm->context.ldt;
+		desc = &ldt->entries[idx];
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
 
-		desc = raw_cpu_ptr(gdt_page.gdt);
+		desc = raw_cpu_ptr(gdt_page.gdt) + idx;
 	}
 
-	return get_desc_base(desc + idx);
+	return get_desc_base(desc);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c37886d759cc..2bcc0525f1c1 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -12,6 +12,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 
@@ -20,82 +21,82 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
-#ifdef CONFIG_SMP
+/* context.lock is held for us, so we don't need any locking. */
 static void flush_ldt(void *current_mm)
 {
-	if (current->active_mm == current_mm)
-		load_LDT(&current->active_mm->context);
+	mm_context_t *pc;
+
+	if (current->active_mm != current_mm)
+		return;
+
+	pc = &current->active_mm->context;
+	set_ldt(pc->ldt->entries, pc->ldt->size);
 }
-#endif
 
-static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
+/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
+static struct ldt_struct *alloc_ldt_struct(int size)
 {
-	void *oldldt, *newldt;
-	int oldsize;
-
-	if (mincount <= pc->size)
-		return 0;
-	oldsize = pc->size;
-	mincount = (mincount + (PAGE_SIZE / LDT_ENTRY_SIZE - 1)) &
-			(~(PAGE_SIZE / LDT_ENTRY_SIZE - 1));
-	if (mincount * LDT_ENTRY_SIZE > PAGE_SIZE)
-		newldt = vmalloc(mincount * LDT_ENTRY_SIZE);
+	struct ldt_struct *new_ldt;
+	int alloc_size;
+
+	if (size > LDT_ENTRIES)
+		return NULL;
+
+	new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
+	if (!new_ldt)
+		return NULL;
+
+	BUILD_BUG_ON(LDT_ENTRY_SIZE != sizeof(struct desc_struct));
+	alloc_size = size * LDT_ENTRY_SIZE;
+
+	/*
+	 * Xen is very picky: it requires a page-aligned LDT that has no
+	 * trailing nonzero bytes in any page that contains LDT descriptors.
+	 * Keep it simple: zero the whole allocation and never allocate less
+	 * than PAGE_SIZE.
+	 */
+	if (alloc_size > PAGE_SIZE)
+		new_ldt->entries = vzalloc(alloc_size);
 	else
-		newldt = (void *)__get_free_page(GFP_KERNEL);
-
-	if (!newldt)
-		return -ENOMEM;
+		new_ldt->entries = kzalloc(PAGE_SIZE, GFP_KERNEL);
 
-	if (oldsize)
-		memcpy(newldt, pc->ldt, oldsize * LDT_ENTRY_SIZE);
-	oldldt = pc->ldt;
-	memset(newldt + oldsize * LDT_ENTRY_SIZE, 0,
-	       (mincount - oldsize) * LDT_ENTRY_SIZE);
+	if (!new_ldt->entries) {
+		kfree(new_ldt);
+		return NULL;
+	}
 
-	paravirt_alloc_ldt(newldt, mincount);
+	new_ldt->size = size;
+	return new_ldt;
+}
 
-#ifdef CONFIG_X86_64
-	/* CHECKME: Do we really need this ? */
-	wmb();
-#endif
-	pc->ldt = newldt;
-	wmb();
-	pc->size = mincount;
-	wmb();
-
-	if (reload) {
-#ifdef CONFIG_SMP
-		preempt_disable();
-		load_LDT(pc);
-		if (!cpumask_equal(mm_cpumask(current->mm),
-				   cpumask_of(smp_processor_id())))
-			smp_call_function(flush_ldt, current->mm, 1);
-		preempt_enable();
-#else
-		load_LDT(pc);
-#endif
-	}
-	if (oldsize) {
-		paravirt_free_ldt(oldldt, oldsize);
-		if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(oldldt);
-		else
-			put_page(virt_to_page(oldldt));
-	}
-	return 0;
+/* After calling this, the LDT is immutable. */
+static void finalize_ldt_struct(struct ldt_struct *ldt)
+{
+	paravirt_alloc_ldt(ldt->entries, ldt->size);
 }
 
-static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
+/* context.lock is held */
+static void install_ldt(struct mm_struct *current_mm,
+			struct ldt_struct *ldt)
 {
-	int err = alloc_ldt(new, old->size, 0);
-	int i;
+	/* Synchronizes with lockless_dereference in load_mm_ldt. */
+	smp_store_release(&current_mm->context.ldt, ldt);
+
+	/* Activate the LDT for all CPUs using current_mm. */
+	on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+}
 
-	if (err < 0)
-		return err;
+static void free_ldt_struct(struct ldt_struct *ldt)
+{
+	if (likely(!ldt))
+		return;
 
-	for (i = 0; i < old->size; i++)
-		write_ldt_entry(new->ldt, i, old->ldt + i * LDT_ENTRY_SIZE);
-	return 0;
+	paravirt_free_ldt(ldt->entries, ldt->size);
+	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+		vfree(ldt->entries);
+	else
+		kfree(ldt->entries);
+	kfree(ldt);
 }
 
 /*
@@ -104,17 +105,37 @@ static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+	struct ldt_struct *new_ldt;
 	struct mm_struct *old_mm;
 	int retval = 0;
 
 	mutex_init(&mm->context.lock);
-	mm->context.size = 0;
 	old_mm = current->mm;
-	if (old_mm && old_mm->context.size > 0) {
-		mutex_lock(&old_mm->context.lock);
-		retval = copy_ldt(&mm->context, &old_mm->context);
-		mutex_unlock(&old_mm->context.lock);
+	if (!old_mm) {
+		mm->context.ldt = NULL;
+		return 0;
 	}
+
+	mutex_lock(&old_mm->context.lock);
+	if (!old_mm->context.ldt) {
+		mm->context.ldt = NULL;
+		goto out_unlock;
+	}
+
+	new_ldt = alloc_ldt_struct(old_mm->context.ldt->size);
+	if (!new_ldt) {
+		retval = -ENOMEM;
+		goto out_unlock;
+	}
+
+	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
+	       new_ldt->size * LDT_ENTRY_SIZE);
+	finalize_ldt_struct(new_ldt);
+
+	mm->context.ldt = new_ldt;
+
+out_unlock:
+	mutex_unlock(&old_mm->context.lock);
 	return retval;
 }
 
@@ -125,53 +146,47 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
  */
 void destroy_context(struct mm_struct *mm)
 {
-	if (mm->context.size) {
-#ifdef CONFIG_X86_32
-		/* CHECKME: Can this ever happen ? */
-		if (mm == current->active_mm)
-			clear_LDT();
-#endif
-		paravirt_free_ldt(mm->context.ldt, mm->context.size);
-		if (mm->context.size * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(mm->context.ldt);
-		else
-			put_page(virt_to_page(mm->context.ldt));
-		mm->context.size = 0;
-	}
+	free_ldt_struct(mm->context.ldt);
+	mm->context.ldt = NULL;
 }
 
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
-	int err;
+	int retval;
 	unsigned long size;
 	struct mm_struct *mm = current->mm;
 
-	if (!mm->context.size)
-		return 0;
+	mutex_lock(&mm->context.lock);
+
+	if (!mm->context.ldt) {
+		retval = 0;
+		goto out_unlock;
+	}
+
 	if (bytecount > LDT_ENTRY_SIZE * LDT_ENTRIES)
 		bytecount = LDT_ENTRY_SIZE * LDT_ENTRIES;
 
-	mutex_lock(&mm->context.lock);
-	size = mm->context.size * LDT_ENTRY_SIZE;
+	size = mm->context.ldt->size * LDT_ENTRY_SIZE;
 	if (size > bytecount)
 		size = bytecount;
 
-	err = 0;
-	if (copy_to_user(ptr, mm->context.ldt, size))
-		err = -EFAULT;
-	mutex_unlock(&mm->context.lock);
-	if (err < 0)
-		goto error_return;
+	if (copy_to_user(ptr, mm->context.ldt->entries, size)) {
+		retval = -EFAULT;
+		goto out_unlock;
+	}
+
 	if (size != bytecount) {
-		/* zero-fill the rest */
-		if (clear_user(ptr + size, bytecount - size) != 0) {
-			err = -EFAULT;
-			goto error_return;
+		/* Zero-fill the rest and pretend we read bytecount bytes. */
+		if (clear_user(ptr + size, bytecount - size)) {
+			retval = -EFAULT;
+			goto out_unlock;
 		}
 	}
-	return bytecount;
-error_return:
-	return err;
+	retval = bytecount;
+
+out_unlock:
+	mutex_unlock(&mm->context.lock);
+	return retval;
 }
 
 static int read_default_ldt(void __user *ptr, unsigned long bytecount)
@@ -195,6 +210,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	struct desc_struct ldt;
 	int error;
 	struct user_desc ldt_info;
+	int oldsize, newsize;
+	struct ldt_struct *new_ldt, *old_ldt;
 
 	error = -EINVAL;
 	if (bytecount != sizeof(ldt_info))
@@ -213,34 +230,39 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 			goto out;
 	}
 
-	mutex_lock(&mm->context.lock);
-	if (ldt_info.entry_number >= mm->context.size) {
-		error = alloc_ldt(&current->mm->context,
-				  ldt_info.entry_number + 1, 1);
-		if (error < 0)
-			goto out_unlock;
-	}
-
-	/* Allow LDTs to be cleared by the user. */
-	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
-		if (oldmode || LDT_empty(&ldt_info)) {
-			memset(&ldt, 0, sizeof(ldt));
-			goto install;
+	if ((oldmode && !ldt_info.base_addr && !ldt_info.limit) ||
+	    LDT_empty(&ldt_info)) {
+		/* The user wants to clear the entry. */
+		memset(&ldt, 0, sizeof(ldt));
+	} else {
+		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+			error = -EINVAL;
+			goto out;
 		}
+
+		fill_ldt(&ldt, &ldt_info);
+		if (oldmode)
+			ldt.avl = 0;
 	}
 
-	if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
-		error = -EINVAL;
+	mutex_lock(&mm->context.lock);
+
+	old_ldt = mm->context.ldt;
+	oldsize = old_ldt ? old_ldt->size : 0;
+	newsize = max((int)(ldt_info.entry_number + 1), oldsize);
+
+	error = -ENOMEM;
+	new_ldt = alloc_ldt_struct(newsize);
+	if (!new_ldt)
 		goto out_unlock;
-	}
 
-	fill_ldt(&ldt, &ldt_info);
-	if (oldmode)
-		ldt.avl = 0;
+	if (old_ldt)
+		memcpy(new_ldt->entries, old_ldt->entries, oldsize * LDT_ENTRY_SIZE);
+	new_ldt->entries[ldt_info.entry_number] = ldt;
+	finalize_ldt_struct(new_ldt);
 
-	/* Install the new entry ...  */
-install:
-	write_ldt_entry(mm->context.ldt, ldt_info.entry_number, &ldt);
+	install_ldt(mm, new_ldt);
+	free_ldt_struct(old_ldt);
 	error = 0;
 
 out_unlock:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 71d7849a07f7..f6b916387590 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,11 +121,11 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
-		if (dead_task->mm->context.size) {
+		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
 				dead_task->mm->context.ldt,
-				dead_task->mm->context.size);
+				dead_task->mm->context.ldt->size);
 			BUG();
 		}
 	}
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 9b4d51d0c0d0..6273324186ac 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/ptrace.h>
 #include <asm/desc.h>
+#include <asm/mmu_context.h>
 
 unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs)
 {
@@ -30,10 +31,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		seg &= ~7UL;
 
 		mutex_lock(&child->mm->context.lock);
-		if (unlikely((seg >> 3) >= child->mm->context.size))
+		if (unlikely(!child->mm->context.ldt ||
+			     (seg >> 3) >= child->mm->context.ldt->size))
 			addr = -1L; /* bogus selector, access would fault */
 		else {
-			desc = child->mm->context.ldt + seg;
+			desc = &child->mm->context.ldt->entries[seg];
 			base = get_desc_base(desc);
 
 			/* 16-bit code segment? */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 0d7dd1f5ac36..9ab52791fed5 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -22,6 +22,7 @@
 #include <asm/fpu/internal.h>
 #include <asm/debugreg.h>
 #include <asm/cpu.h>
+#include <asm/mmu_context.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -153,7 +154,7 @@ static void fix_processor_context(void)
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif
 	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->active_mm->context);	/* This does lldt */
+	load_mm_ldt(current->active_mm);	/* This does lldt */
 
 	fpu__resume_cpu();
 }
-- 
2.4.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (2 preceding siblings ...)
  2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Andy Lutomirski, Peter Zijlstra, Andrew Cooper,
	Steven Rostedt, xen-devel, stable, Jan Beulich, Sasha Levin,
	Boris Ostrovsky

modify_ldt has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Note to -stable maintainers: by itself, this patch makes a
pre-existing Xen bug much easier to trigger; on a 32-bit Xen guest,
the new ldt_gdt selftest is likely to OOPS.  Even without this
patch, the test can OOPS, but it's much less likely to happen.  The
Xen maintainers should have a fix for that issue shortly.

Cc: stable@vger.kernel.org
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h        |  15 ---
 arch/x86/include/asm/mmu.h         |   3 +-
 arch/x86/include/asm/mmu_context.h |  54 +++++++-
 arch/x86/kernel/cpu/common.c       |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c              | 262 ++++++++++++++++++++-----------------
 arch/x86/kernel/process_64.c       |   4 +-
 arch/x86/kernel/step.c             |   6 +-
 arch/x86/power/cpu.c               |   3 +-
 9 files changed, 210 insertions(+), 153 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89fd2647..4e10d73cf018 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
 	set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-	set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-	preempt_disable();
-	load_LDT_nolock(pc);
-	preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
 	return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620a73b4..364d27481a52 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-	void *ldt;
-	int size;
+	struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 804a3a6030ca..984abfe47edc 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,50 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+	/*
+	 * Xen requires page-aligned LDTs with special permissions.  This is
+	 * needed to prevent us from installing evil descriptors such as
+	 * call gates.  On native, we could merge the ldt_struct and LDT
+	 * allocations, but it's not worth trying to optimize.
+	 */
+	struct desc_struct *entries;
+	int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+	struct ldt_struct *ldt;
+
+	/* lockless_dereference synchronizes with smp_store_release */
+	ldt = lockless_dereference(mm->context.ldt);
+
+	/*
+	 * Any change to mm->context.ldt is followed by an IPI to all
+	 * CPUs with the mm active.  The LDT will not be freed until
+	 * after the IPI is handled by all such CPUs.  This means that,
+	 * if the ldt_struct changes before we return, the values we see
+	 * will be safe, and the new values will be loaded before we run
+	 * any user code.
+	 *
+	 * NB: don't try to convert this to use RCU without extreme care.
+	 * We would still need IRQs off, because we don't want to change
+	 * the local LDT after an IPI loaded a newer value than the one
+	 * that we can see.
+	 */
+
+	if (unlikely(ldt))
+		set_ldt(ldt->entries, ldt->size);
+	else
+		clear_LDT();
+
+	DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +122,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * was called and then modify_ldt changed
 		 * prev->context.ldt but suppressed an IPI to this CPU.
 		 * In this case, prev->context.ldt != NULL, because we
-		 * never free an LDT while the mm still exists.  That
-		 * means that next->context.ldt != prev->context.ldt,
-		 * because mms never share an LDT.
+		 * never set context.ldt to NULL while the mm still
+		 * exists.  That means that next->context.ldt !=
+		 * prev->context.ldt, because mms never share an LDT.
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 	}
 #ifdef CONFIG_SMP
 	  else {
@@ -106,7 +150,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_mm_cr4(next);
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 		}
 	}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 922c5e0cea4c..cb9e5df42dd2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1410,7 +1410,7 @@ void cpu_init(void)
 	load_sp0(t, &current->thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
 	dbg_restore_debug_regs();
@@ -1459,7 +1459,7 @@ void cpu_init(void)
 	load_sp0(t, thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3658de47900f..9469dfa55607 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,21 +2179,25 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		struct ldt_struct *ldt;
+
 		if (idx > LDT_ENTRIES)
 			return 0;
 
-		if (idx > current->active_mm->context.size)
+		/* IRQs are off, so this synchronizes with smp_store_release */
+		ldt = lockless_dereference(current->active_mm->context.ldt);
+		if (!ldt || idx > ldt->size)
 			return 0;
 
-		desc = current->active_mm->context.ldt;
+		desc = &ldt->entries[idx];
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
 
-		desc = raw_cpu_ptr(gdt_page.gdt);
+		desc = raw_cpu_ptr(gdt_page.gdt) + idx;
 	}
 
-	return get_desc_base(desc + idx);
+	return get_desc_base(desc);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c37886d759cc..2bcc0525f1c1 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -12,6 +12,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 
@@ -20,82 +21,82 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
-#ifdef CONFIG_SMP
+/* context.lock is held for us, so we don't need any locking. */
 static void flush_ldt(void *current_mm)
 {
-	if (current->active_mm == current_mm)
-		load_LDT(&current->active_mm->context);
+	mm_context_t *pc;
+
+	if (current->active_mm != current_mm)
+		return;
+
+	pc = &current->active_mm->context;
+	set_ldt(pc->ldt->entries, pc->ldt->size);
 }
-#endif
 
-static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
+/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
+static struct ldt_struct *alloc_ldt_struct(int size)
 {
-	void *oldldt, *newldt;
-	int oldsize;
-
-	if (mincount <= pc->size)
-		return 0;
-	oldsize = pc->size;
-	mincount = (mincount + (PAGE_SIZE / LDT_ENTRY_SIZE - 1)) &
-			(~(PAGE_SIZE / LDT_ENTRY_SIZE - 1));
-	if (mincount * LDT_ENTRY_SIZE > PAGE_SIZE)
-		newldt = vmalloc(mincount * LDT_ENTRY_SIZE);
+	struct ldt_struct *new_ldt;
+	int alloc_size;
+
+	if (size > LDT_ENTRIES)
+		return NULL;
+
+	new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
+	if (!new_ldt)
+		return NULL;
+
+	BUILD_BUG_ON(LDT_ENTRY_SIZE != sizeof(struct desc_struct));
+	alloc_size = size * LDT_ENTRY_SIZE;
+
+	/*
+	 * Xen is very picky: it requires a page-aligned LDT that has no
+	 * trailing nonzero bytes in any page that contains LDT descriptors.
+	 * Keep it simple: zero the whole allocation and never allocate less
+	 * than PAGE_SIZE.
+	 */
+	if (alloc_size > PAGE_SIZE)
+		new_ldt->entries = vzalloc(alloc_size);
 	else
-		newldt = (void *)__get_free_page(GFP_KERNEL);
-
-	if (!newldt)
-		return -ENOMEM;
+		new_ldt->entries = kzalloc(PAGE_SIZE, GFP_KERNEL);
 
-	if (oldsize)
-		memcpy(newldt, pc->ldt, oldsize * LDT_ENTRY_SIZE);
-	oldldt = pc->ldt;
-	memset(newldt + oldsize * LDT_ENTRY_SIZE, 0,
-	       (mincount - oldsize) * LDT_ENTRY_SIZE);
+	if (!new_ldt->entries) {
+		kfree(new_ldt);
+		return NULL;
+	}
 
-	paravirt_alloc_ldt(newldt, mincount);
+	new_ldt->size = size;
+	return new_ldt;
+}
 
-#ifdef CONFIG_X86_64
-	/* CHECKME: Do we really need this ? */
-	wmb();
-#endif
-	pc->ldt = newldt;
-	wmb();
-	pc->size = mincount;
-	wmb();
-
-	if (reload) {
-#ifdef CONFIG_SMP
-		preempt_disable();
-		load_LDT(pc);
-		if (!cpumask_equal(mm_cpumask(current->mm),
-				   cpumask_of(smp_processor_id())))
-			smp_call_function(flush_ldt, current->mm, 1);
-		preempt_enable();
-#else
-		load_LDT(pc);
-#endif
-	}
-	if (oldsize) {
-		paravirt_free_ldt(oldldt, oldsize);
-		if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(oldldt);
-		else
-			put_page(virt_to_page(oldldt));
-	}
-	return 0;
+/* After calling this, the LDT is immutable. */
+static void finalize_ldt_struct(struct ldt_struct *ldt)
+{
+	paravirt_alloc_ldt(ldt->entries, ldt->size);
 }
 
-static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
+/* context.lock is held */
+static void install_ldt(struct mm_struct *current_mm,
+			struct ldt_struct *ldt)
 {
-	int err = alloc_ldt(new, old->size, 0);
-	int i;
+	/* Synchronizes with lockless_dereference in load_mm_ldt. */
+	smp_store_release(&current_mm->context.ldt, ldt);
+
+	/* Activate the LDT for all CPUs using current_mm. */
+	on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+}
 
-	if (err < 0)
-		return err;
+static void free_ldt_struct(struct ldt_struct *ldt)
+{
+	if (likely(!ldt))
+		return;
 
-	for (i = 0; i < old->size; i++)
-		write_ldt_entry(new->ldt, i, old->ldt + i * LDT_ENTRY_SIZE);
-	return 0;
+	paravirt_free_ldt(ldt->entries, ldt->size);
+	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+		vfree(ldt->entries);
+	else
+		kfree(ldt->entries);
+	kfree(ldt);
 }
 
 /*
@@ -104,17 +105,37 @@ static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+	struct ldt_struct *new_ldt;
 	struct mm_struct *old_mm;
 	int retval = 0;
 
 	mutex_init(&mm->context.lock);
-	mm->context.size = 0;
 	old_mm = current->mm;
-	if (old_mm && old_mm->context.size > 0) {
-		mutex_lock(&old_mm->context.lock);
-		retval = copy_ldt(&mm->context, &old_mm->context);
-		mutex_unlock(&old_mm->context.lock);
+	if (!old_mm) {
+		mm->context.ldt = NULL;
+		return 0;
 	}
+
+	mutex_lock(&old_mm->context.lock);
+	if (!old_mm->context.ldt) {
+		mm->context.ldt = NULL;
+		goto out_unlock;
+	}
+
+	new_ldt = alloc_ldt_struct(old_mm->context.ldt->size);
+	if (!new_ldt) {
+		retval = -ENOMEM;
+		goto out_unlock;
+	}
+
+	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
+	       new_ldt->size * LDT_ENTRY_SIZE);
+	finalize_ldt_struct(new_ldt);
+
+	mm->context.ldt = new_ldt;
+
+out_unlock:
+	mutex_unlock(&old_mm->context.lock);
 	return retval;
 }
 
@@ -125,53 +146,47 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
  */
 void destroy_context(struct mm_struct *mm)
 {
-	if (mm->context.size) {
-#ifdef CONFIG_X86_32
-		/* CHECKME: Can this ever happen ? */
-		if (mm == current->active_mm)
-			clear_LDT();
-#endif
-		paravirt_free_ldt(mm->context.ldt, mm->context.size);
-		if (mm->context.size * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(mm->context.ldt);
-		else
-			put_page(virt_to_page(mm->context.ldt));
-		mm->context.size = 0;
-	}
+	free_ldt_struct(mm->context.ldt);
+	mm->context.ldt = NULL;
 }
 
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
-	int err;
+	int retval;
 	unsigned long size;
 	struct mm_struct *mm = current->mm;
 
-	if (!mm->context.size)
-		return 0;
+	mutex_lock(&mm->context.lock);
+
+	if (!mm->context.ldt) {
+		retval = 0;
+		goto out_unlock;
+	}
+
 	if (bytecount > LDT_ENTRY_SIZE * LDT_ENTRIES)
 		bytecount = LDT_ENTRY_SIZE * LDT_ENTRIES;
 
-	mutex_lock(&mm->context.lock);
-	size = mm->context.size * LDT_ENTRY_SIZE;
+	size = mm->context.ldt->size * LDT_ENTRY_SIZE;
 	if (size > bytecount)
 		size = bytecount;
 
-	err = 0;
-	if (copy_to_user(ptr, mm->context.ldt, size))
-		err = -EFAULT;
-	mutex_unlock(&mm->context.lock);
-	if (err < 0)
-		goto error_return;
+	if (copy_to_user(ptr, mm->context.ldt->entries, size)) {
+		retval = -EFAULT;
+		goto out_unlock;
+	}
+
 	if (size != bytecount) {
-		/* zero-fill the rest */
-		if (clear_user(ptr + size, bytecount - size) != 0) {
-			err = -EFAULT;
-			goto error_return;
+		/* Zero-fill the rest and pretend we read bytecount bytes. */
+		if (clear_user(ptr + size, bytecount - size)) {
+			retval = -EFAULT;
+			goto out_unlock;
 		}
 	}
-	return bytecount;
-error_return:
-	return err;
+	retval = bytecount;
+
+out_unlock:
+	mutex_unlock(&mm->context.lock);
+	return retval;
 }
 
 static int read_default_ldt(void __user *ptr, unsigned long bytecount)
@@ -195,6 +210,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	struct desc_struct ldt;
 	int error;
 	struct user_desc ldt_info;
+	int oldsize, newsize;
+	struct ldt_struct *new_ldt, *old_ldt;
 
 	error = -EINVAL;
 	if (bytecount != sizeof(ldt_info))
@@ -213,34 +230,39 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 			goto out;
 	}
 
-	mutex_lock(&mm->context.lock);
-	if (ldt_info.entry_number >= mm->context.size) {
-		error = alloc_ldt(&current->mm->context,
-				  ldt_info.entry_number + 1, 1);
-		if (error < 0)
-			goto out_unlock;
-	}
-
-	/* Allow LDTs to be cleared by the user. */
-	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
-		if (oldmode || LDT_empty(&ldt_info)) {
-			memset(&ldt, 0, sizeof(ldt));
-			goto install;
+	if ((oldmode && !ldt_info.base_addr && !ldt_info.limit) ||
+	    LDT_empty(&ldt_info)) {
+		/* The user wants to clear the entry. */
+		memset(&ldt, 0, sizeof(ldt));
+	} else {
+		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+			error = -EINVAL;
+			goto out;
 		}
+
+		fill_ldt(&ldt, &ldt_info);
+		if (oldmode)
+			ldt.avl = 0;
 	}
 
-	if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
-		error = -EINVAL;
+	mutex_lock(&mm->context.lock);
+
+	old_ldt = mm->context.ldt;
+	oldsize = old_ldt ? old_ldt->size : 0;
+	newsize = max((int)(ldt_info.entry_number + 1), oldsize);
+
+	error = -ENOMEM;
+	new_ldt = alloc_ldt_struct(newsize);
+	if (!new_ldt)
 		goto out_unlock;
-	}
 
-	fill_ldt(&ldt, &ldt_info);
-	if (oldmode)
-		ldt.avl = 0;
+	if (old_ldt)
+		memcpy(new_ldt->entries, old_ldt->entries, oldsize * LDT_ENTRY_SIZE);
+	new_ldt->entries[ldt_info.entry_number] = ldt;
+	finalize_ldt_struct(new_ldt);
 
-	/* Install the new entry ...  */
-install:
-	write_ldt_entry(mm->context.ldt, ldt_info.entry_number, &ldt);
+	install_ldt(mm, new_ldt);
+	free_ldt_struct(old_ldt);
 	error = 0;
 
 out_unlock:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 71d7849a07f7..f6b916387590 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,11 +121,11 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
-		if (dead_task->mm->context.size) {
+		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
 				dead_task->mm->context.ldt,
-				dead_task->mm->context.size);
+				dead_task->mm->context.ldt->size);
 			BUG();
 		}
 	}
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 9b4d51d0c0d0..6273324186ac 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/ptrace.h>
 #include <asm/desc.h>
+#include <asm/mmu_context.h>
 
 unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs)
 {
@@ -30,10 +31,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		seg &= ~7UL;
 
 		mutex_lock(&child->mm->context.lock);
-		if (unlikely((seg >> 3) >= child->mm->context.size))
+		if (unlikely(!child->mm->context.ldt ||
+			     (seg >> 3) >= child->mm->context.ldt->size))
 			addr = -1L; /* bogus selector, access would fault */
 		else {
-			desc = child->mm->context.ldt + seg;
+			desc = &child->mm->context.ldt->entries[seg];
 			base = get_desc_base(desc);
 
 			/* 16-bit code segment? */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 0d7dd1f5ac36..9ab52791fed5 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -22,6 +22,7 @@
 #include <asm/fpu/internal.h>
 #include <asm/debugreg.h>
 #include <asm/cpu.h>
+#include <asm/mmu_context.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -153,7 +154,7 @@ static void fix_processor_context(void)
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif
 	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->active_mm->context);	/* This does lldt */
+	load_mm_ldt(current->active_mm);	/* This does lldt */
 
 	fpu__resume_cpu();
 }
-- 
2.4.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (3 preceding siblings ...)
  2015-07-30 21:31 ` [PATCH v6 2/4] " Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31 13:57   ` [tip:x86/asm] selftests/x86, x86/ldt: Add a selftest for modify_ldt() tip-bot for Andy Lutomirski
  2015-07-31 13:57   ` tip-bot for Andy Lutomirski
  2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper,
	Jan Beulich, xen-devel, Andy Lutomirski

This tests general modify_ldt behavior (only writes, so far) as
well as synchronous updates via IPI.  It fails on old kernels.

I called this ldt_gdt because I'll add set_thread_area tests to
it at some point.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/ldt_gdt.c | 576 ++++++++++++++++++++++++++++++++++
 2 files changed, 577 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/ldt_gdt.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index caa60d56d7d1..4138387b892c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,7 +4,7 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs
+TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs ldt_gdt
 TARGETS_C_32BIT_ONLY := entry_from_vm86
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c
new file mode 100644
index 000000000000..31a3035cd4eb
--- /dev/null
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -0,0 +1,576 @@
+/*
+ * ldt_gdt.c - Test cases for LDT and GDT access
+ * Copyright (c) 2015 Andrew Lutomirski
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <asm/ldt.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <sched.h>
+#include <linux/futex.h>
+
+#define AR_ACCESSED		(1<<8)
+
+#define AR_TYPE_RODATA		(0 * (1<<9))
+#define AR_TYPE_RWDATA		(1 * (1<<9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1<<9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1<<9))
+#define AR_TYPE_XOCODE		(4 * (1<<9))
+#define AR_TYPE_XRCODE		(5 * (1<<9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1<<9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1<<9))
+
+#define AR_DPL3			(3 * (1<<13))
+
+#define AR_S			(1 << 12)
+#define AR_P			(1 << 15)
+#define AR_AVL			(1 << 20)
+#define AR_L			(1 << 21)
+#define AR_DB			(1 << 22)
+#define AR_G			(1 << 23)
+
+static int nerrs;
+
+static void check_invalid_segment(uint16_t index, int ldt)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (has_limit || has_ar) {
+		printf("[FAIL]\t%s entry %hu is valid but should be invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+	} else {
+		printf("[OK]\t%s entry %hu is invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+	}
+}
+
+static void check_valid_segment(uint16_t index, int ldt,
+				uint32_t expected_ar, uint32_t expected_limit,
+				bool verbose)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (!has_limit || !has_ar) {
+		printf("[FAIL]\t%s entry %hu is invalid but should be valid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+		return;
+	}
+
+	if (ar != expected_ar) {
+		printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, expected_ar);
+		nerrs++;
+	} else if (limit != expected_limit) {
+		printf("[FAIL]\t%s entry %hu has limit 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, limit, expected_limit);
+		nerrs++;
+	} else if (verbose) {
+		printf("[OK]\t%s entry %hu has AR 0x%08X and limit 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, limit);
+	}
+}
+
+static bool install_valid_mode(const struct user_desc *desc, uint32_t ar,
+			       bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		uint32_t limit = desc->limit;
+		if (desc->limit_in_pages)
+			limit = (limit << 12) + 4095;
+		check_valid_segment(desc->entry_number, 1, ar, limit, true);
+		return true;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+		return false;
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+			return false;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+			return false;
+		}
+	}
+}
+
+static bool install_valid(const struct user_desc *desc, uint32_t ar)
+{
+	return install_valid_mode(desc, ar, false);
+}
+
+static void install_invalid(const struct user_desc *desc, bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		check_invalid_segment(desc->entry_number, 1);
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+		}
+	}
+}
+
+static int safe_modify_ldt(int func, struct user_desc *ptr,
+			   unsigned long bytecount)
+{
+	int ret = syscall(SYS_modify_ldt, 0x11, ptr, bytecount);
+	if (ret < -1)
+		errno = -ret;
+	return ret;
+}
+
+static void fail_install(struct user_desc *desc)
+{
+	if (safe_modify_ldt(0x11, desc, sizeof(*desc)) == 0) {
+		printf("[FAIL]\tmodify_ldt accepted a bad descriptor\n");
+		nerrs++;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		printf("[OK]\tmodify_ldt failure %d\n", errno);
+	}
+}
+
+static void do_simple_tests(void)
+{
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 10,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	desc.limit_in_pages = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.entry_number = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.base_addr = 0xf0000000;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	desc.useable = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_32bit = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_G | AR_AVL);
+
+	desc.seg_32bit = 1;
+	desc.contents = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.contents = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA_EXPDOWN |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.limit_in_pages = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA_EXPDOWN |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.contents = 3;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.contents = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+
+#ifdef __x86_64__
+	desc.lm = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+		      AR_S | AR_DB | AR_AVL);
+	desc.lm = 0;
+#endif
+
+	bool entry1_okay = install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+					 AR_S | AR_DB | AR_AVL);
+
+	if (entry1_okay) {
+		printf("[RUN]\tTest fork\n");
+		pid_t child = fork();
+		if (child == 0) {
+			nerrs = 0;
+			check_valid_segment(desc.entry_number, 1,
+					    AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, desc.limit,
+					    true);
+			check_invalid_segment(1, 1);
+			exit(nerrs ? 1 : 0);
+		} else {
+			int status;
+			if (waitpid(child, &status, 0) != child ||
+			    !WIFEXITED(status)) {
+				printf("[FAIL]\tChild died\n");
+				nerrs++;
+			} else if (WEXITSTATUS(status) != 0) {
+				printf("[FAIL]\tChild failed\n");
+				nerrs++;
+			} else {
+				printf("[OK]\tChild succeeded\n");
+			}
+		}
+
+		printf("[RUN]\tTest size\n");
+		int i;
+		for (i = 0; i < 8192; i++) {
+			desc.entry_number = i;
+			desc.limit = i;
+			if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+				printf("[FAIL]\tFailed to install entry %d\n", i);
+				nerrs++;
+				break;
+			}
+		}
+		for (int j = 0; j < i; j++) {
+			check_valid_segment(j, 1, AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, j, false);
+		}
+		printf("[DONE]\tSize test\n");
+	} else {
+		printf("[SKIP]\tSkipping fork and size tests because we have no LDT\n");
+	}
+
+	/* Test entry_number too high. */
+	desc.entry_number = 8192;
+	fail_install(&desc);
+
+	/* Test deletion and actions mistakeable for deletion. */
+	memset(&desc, 0, sizeof(desc));
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S | AR_P);
+
+	desc.read_exec_only = 0;
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.read_exec_only = 1;
+	desc.limit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.limit = 0;
+	desc.base_addr = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.base_addr = 0;
+	install_invalid(&desc, false);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 0;
+	desc.seg_32bit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P | AR_DB);
+	install_invalid(&desc, true);
+}
+
+/*
+ * 0: thread is idle
+ * 1: thread armed
+ * 2: thread should clear LDT entry 0
+ * 3: thread should exit
+ */
+static volatile unsigned int ftx;
+
+static void *threadproc(void *ctx)
+{
+	cpu_set_t cpuset;
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 1");	/* should never fail */
+
+	while (1) {
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 0, NULL, NULL, 0);
+		while (ftx != 2) {
+			if (ftx >= 3)
+				return NULL;
+		}
+
+		/* clear LDT entry 0 */
+		const struct user_desc desc = {};
+		if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0)
+			err(1, "modify_ldt");
+
+		/* If ftx == 2, set it to zero.  If ftx == 100, quit. */
+		unsigned int x = -2;
+		asm volatile ("lock xaddl %[x], %[ftx]" :
+			      [x] "+r" (x), [ftx] "+m" (ftx));
+		if (x != 2)
+			return NULL;
+	}
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+
+}
+
+static jmp_buf jmpbuf;
+
+static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static void do_multicpu_tests(void)
+{
+	cpu_set_t cpuset;
+	pthread_t thread;
+	int failures = 0, iters = 5, i;
+	unsigned short orig_ss;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 1\n");
+		return;
+	}
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 0\n");
+		return;
+	}
+
+	sethandler(SIGSEGV, sigsegv, 0);
+#ifdef __i386__
+	/* True 32-bit kernels send SIGILL instead of SIGSEGV on IRET faults. */
+	sethandler(SIGILL, sigsegv, 0);
+#endif
+
+	printf("[RUN]\tCross-CPU LDT invalidation\n");
+
+	if (pthread_create(&thread, 0, threadproc, 0) != 0)
+		err(1, "pthread_create");
+
+	asm volatile ("mov %%ss, %0" : "=rm" (orig_ss));
+
+	for (i = 0; i < 5; i++) {
+		if (sigsetjmp(jmpbuf, 1) != 0)
+			continue;
+
+		/* Make sure the thread is ready after the last test. */
+		while (ftx != 0)
+			;
+
+		struct user_desc desc = {
+			.entry_number    = 0,
+			.base_addr       = 0,
+			.limit           = 0xfffff,
+			.seg_32bit       = 1,
+			.contents        = 0, /* Data */
+			.read_exec_only  = 0,
+			.limit_in_pages  = 1,
+			.seg_not_present = 0,
+			.useable         = 0
+		};
+
+		if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+			if (errno != ENOSYS)
+				err(1, "modify_ldt");
+			printf("[SKIP]\tmodify_ldt unavailable\n");
+			break;
+		}
+
+		/* Arm the thread. */
+		ftx = 1;
+		syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+		asm volatile ("mov %0, %%ss" : : "r" (0x7));
+
+		/* Go! */
+		ftx = 2;
+
+		while (ftx != 0)
+			;
+
+		/*
+		 * On success, modify_ldt will segfault us synchronously,
+		 * and we'll escape via siglongjmp.
+		 */
+
+		failures++;
+		asm volatile ("mov %0, %%ss" : : "rm" (orig_ss));
+	};
+
+	ftx = 100;  /* Kill the thread. */
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+	if (pthread_join(thread, NULL) != 0)
+		err(1, "pthread_join");
+
+	if (failures) {
+		printf("[FAIL]\t%d of %d iterations failed\n", failures, iters);
+		nerrs++;
+	} else {
+		printf("[OK]\tAll %d iterations succeeded\n", iters);
+	}
+}
+
+static int finish_exec_test(void)
+{
+	/*
+	 * In a sensible world, this would be check_invalid_segment(0, 1);
+	 * For better or for worse, though, the LDT is inherited across exec.
+	 * We can probably change this safely, but for now we test it.
+	 */
+	check_valid_segment(0, 1,
+			    AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB,
+			    42, true);
+
+	return nerrs ? 1 : 0;
+}
+
+static void do_exec_test(void)
+{
+	printf("[RUN]\tTest exec\n");
+
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 42,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	pid_t child = fork();
+	if (child == 0) {
+		execl("/proc/self/exe", "ldt_gdt_test_exec", NULL);
+		printf("[FAIL]\tCould not exec self\n");
+		exit(1);	/* exec failed */
+	} else {
+		int status;
+		if (waitpid(child, &status, 0) != child ||
+		    !WIFEXITED(status)) {
+			printf("[FAIL]\tChild died\n");
+			nerrs++;
+		} else if (WEXITSTATUS(status) != 0) {
+			printf("[FAIL]\tChild failed\n");
+			nerrs++;
+		} else {
+			printf("[OK]\tChild succeeded\n");
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	if (argc == 1 && !strcmp(argv[0], "ldt_gdt_test_exec"))
+		return finish_exec_test();
+
+	do_simple_tests();
+
+	do_multicpu_tests();
+
+	do_exec_test();
+
+	return nerrs ? 1 : 0;
+}
-- 
2.4.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (4 preceding siblings ...)
  2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-30 21:31 ` [PATCH v6 4/4] x86/ldt: Make modify_ldt optional Andy Lutomirski
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Andy Lutomirski, Peter Zijlstra, Andrew Cooper,
	Steven Rostedt, xen-devel, Jan Beulich, Sasha Levin,
	Boris Ostrovsky

This tests general modify_ldt behavior (only writes, so far) as
well as synchronous updates via IPI.  It fails on old kernels.

I called this ldt_gdt because I'll add set_thread_area tests to
it at some point.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/ldt_gdt.c | 576 ++++++++++++++++++++++++++++++++++
 2 files changed, 577 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/ldt_gdt.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index caa60d56d7d1..4138387b892c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,7 +4,7 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs
+TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs ldt_gdt
 TARGETS_C_32BIT_ONLY := entry_from_vm86
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c
new file mode 100644
index 000000000000..31a3035cd4eb
--- /dev/null
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -0,0 +1,576 @@
+/*
+ * ldt_gdt.c - Test cases for LDT and GDT access
+ * Copyright (c) 2015 Andrew Lutomirski
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <asm/ldt.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <sched.h>
+#include <linux/futex.h>
+
+#define AR_ACCESSED		(1<<8)
+
+#define AR_TYPE_RODATA		(0 * (1<<9))
+#define AR_TYPE_RWDATA		(1 * (1<<9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1<<9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1<<9))
+#define AR_TYPE_XOCODE		(4 * (1<<9))
+#define AR_TYPE_XRCODE		(5 * (1<<9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1<<9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1<<9))
+
+#define AR_DPL3			(3 * (1<<13))
+
+#define AR_S			(1 << 12)
+#define AR_P			(1 << 15)
+#define AR_AVL			(1 << 20)
+#define AR_L			(1 << 21)
+#define AR_DB			(1 << 22)
+#define AR_G			(1 << 23)
+
+static int nerrs;
+
+static void check_invalid_segment(uint16_t index, int ldt)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (has_limit || has_ar) {
+		printf("[FAIL]\t%s entry %hu is valid but should be invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+	} else {
+		printf("[OK]\t%s entry %hu is invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+	}
+}
+
+static void check_valid_segment(uint16_t index, int ldt,
+				uint32_t expected_ar, uint32_t expected_limit,
+				bool verbose)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (!has_limit || !has_ar) {
+		printf("[FAIL]\t%s entry %hu is invalid but should be valid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+		return;
+	}
+
+	if (ar != expected_ar) {
+		printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, expected_ar);
+		nerrs++;
+	} else if (limit != expected_limit) {
+		printf("[FAIL]\t%s entry %hu has limit 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, limit, expected_limit);
+		nerrs++;
+	} else if (verbose) {
+		printf("[OK]\t%s entry %hu has AR 0x%08X and limit 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, limit);
+	}
+}
+
+static bool install_valid_mode(const struct user_desc *desc, uint32_t ar,
+			       bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		uint32_t limit = desc->limit;
+		if (desc->limit_in_pages)
+			limit = (limit << 12) + 4095;
+		check_valid_segment(desc->entry_number, 1, ar, limit, true);
+		return true;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+		return false;
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+			return false;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+			return false;
+		}
+	}
+}
+
+static bool install_valid(const struct user_desc *desc, uint32_t ar)
+{
+	return install_valid_mode(desc, ar, false);
+}
+
+static void install_invalid(const struct user_desc *desc, bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		check_invalid_segment(desc->entry_number, 1);
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+		}
+	}
+}
+
+static int safe_modify_ldt(int func, struct user_desc *ptr,
+			   unsigned long bytecount)
+{
+	int ret = syscall(SYS_modify_ldt, 0x11, ptr, bytecount);
+	if (ret < -1)
+		errno = -ret;
+	return ret;
+}
+
+static void fail_install(struct user_desc *desc)
+{
+	if (safe_modify_ldt(0x11, desc, sizeof(*desc)) == 0) {
+		printf("[FAIL]\tmodify_ldt accepted a bad descriptor\n");
+		nerrs++;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		printf("[OK]\tmodify_ldt failure %d\n", errno);
+	}
+}
+
+static void do_simple_tests(void)
+{
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 10,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	desc.limit_in_pages = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.entry_number = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.base_addr = 0xf0000000;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	desc.useable = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_32bit = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_G | AR_AVL);
+
+	desc.seg_32bit = 1;
+	desc.contents = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.contents = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA_EXPDOWN |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.limit_in_pages = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA_EXPDOWN |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.contents = 3;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.contents = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+
+#ifdef __x86_64__
+	desc.lm = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+		      AR_S | AR_DB | AR_AVL);
+	desc.lm = 0;
+#endif
+
+	bool entry1_okay = install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+					 AR_S | AR_DB | AR_AVL);
+
+	if (entry1_okay) {
+		printf("[RUN]\tTest fork\n");
+		pid_t child = fork();
+		if (child == 0) {
+			nerrs = 0;
+			check_valid_segment(desc.entry_number, 1,
+					    AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, desc.limit,
+					    true);
+			check_invalid_segment(1, 1);
+			exit(nerrs ? 1 : 0);
+		} else {
+			int status;
+			if (waitpid(child, &status, 0) != child ||
+			    !WIFEXITED(status)) {
+				printf("[FAIL]\tChild died\n");
+				nerrs++;
+			} else if (WEXITSTATUS(status) != 0) {
+				printf("[FAIL]\tChild failed\n");
+				nerrs++;
+			} else {
+				printf("[OK]\tChild succeeded\n");
+			}
+		}
+
+		printf("[RUN]\tTest size\n");
+		int i;
+		for (i = 0; i < 8192; i++) {
+			desc.entry_number = i;
+			desc.limit = i;
+			if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+				printf("[FAIL]\tFailed to install entry %d\n", i);
+				nerrs++;
+				break;
+			}
+		}
+		for (int j = 0; j < i; j++) {
+			check_valid_segment(j, 1, AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, j, false);
+		}
+		printf("[DONE]\tSize test\n");
+	} else {
+		printf("[SKIP]\tSkipping fork and size tests because we have no LDT\n");
+	}
+
+	/* Test entry_number too high. */
+	desc.entry_number = 8192;
+	fail_install(&desc);
+
+	/* Test deletion and actions mistakeable for deletion. */
+	memset(&desc, 0, sizeof(desc));
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S | AR_P);
+
+	desc.read_exec_only = 0;
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.read_exec_only = 1;
+	desc.limit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.limit = 0;
+	desc.base_addr = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.base_addr = 0;
+	install_invalid(&desc, false);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 0;
+	desc.seg_32bit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P | AR_DB);
+	install_invalid(&desc, true);
+}
+
+/*
+ * 0: thread is idle
+ * 1: thread armed
+ * 2: thread should clear LDT entry 0
+ * 3: thread should exit
+ */
+static volatile unsigned int ftx;
+
+static void *threadproc(void *ctx)
+{
+	cpu_set_t cpuset;
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 1");	/* should never fail */
+
+	while (1) {
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 0, NULL, NULL, 0);
+		while (ftx != 2) {
+			if (ftx >= 3)
+				return NULL;
+		}
+
+		/* clear LDT entry 0 */
+		const struct user_desc desc = {};
+		if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0)
+			err(1, "modify_ldt");
+
+		/* If ftx == 2, set it to zero.  If ftx == 100, quit. */
+		unsigned int x = -2;
+		asm volatile ("lock xaddl %[x], %[ftx]" :
+			      [x] "+r" (x), [ftx] "+m" (ftx));
+		if (x != 2)
+			return NULL;
+	}
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+
+}
+
+static jmp_buf jmpbuf;
+
+static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static void do_multicpu_tests(void)
+{
+	cpu_set_t cpuset;
+	pthread_t thread;
+	int failures = 0, iters = 5, i;
+	unsigned short orig_ss;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 1\n");
+		return;
+	}
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 0\n");
+		return;
+	}
+
+	sethandler(SIGSEGV, sigsegv, 0);
+#ifdef __i386__
+	/* True 32-bit kernels send SIGILL instead of SIGSEGV on IRET faults. */
+	sethandler(SIGILL, sigsegv, 0);
+#endif
+
+	printf("[RUN]\tCross-CPU LDT invalidation\n");
+
+	if (pthread_create(&thread, 0, threadproc, 0) != 0)
+		err(1, "pthread_create");
+
+	asm volatile ("mov %%ss, %0" : "=rm" (orig_ss));
+
+	for (i = 0; i < 5; i++) {
+		if (sigsetjmp(jmpbuf, 1) != 0)
+			continue;
+
+		/* Make sure the thread is ready after the last test. */
+		while (ftx != 0)
+			;
+
+		struct user_desc desc = {
+			.entry_number    = 0,
+			.base_addr       = 0,
+			.limit           = 0xfffff,
+			.seg_32bit       = 1,
+			.contents        = 0, /* Data */
+			.read_exec_only  = 0,
+			.limit_in_pages  = 1,
+			.seg_not_present = 0,
+			.useable         = 0
+		};
+
+		if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+			if (errno != ENOSYS)
+				err(1, "modify_ldt");
+			printf("[SKIP]\tmodify_ldt unavailable\n");
+			break;
+		}
+
+		/* Arm the thread. */
+		ftx = 1;
+		syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+		asm volatile ("mov %0, %%ss" : : "r" (0x7));
+
+		/* Go! */
+		ftx = 2;
+
+		while (ftx != 0)
+			;
+
+		/*
+		 * On success, modify_ldt will segfault us synchronously,
+		 * and we'll escape via siglongjmp.
+		 */
+
+		failures++;
+		asm volatile ("mov %0, %%ss" : : "rm" (orig_ss));
+	};
+
+	ftx = 100;  /* Kill the thread. */
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+	if (pthread_join(thread, NULL) != 0)
+		err(1, "pthread_join");
+
+	if (failures) {
+		printf("[FAIL]\t%d of %d iterations failed\n", failures, iters);
+		nerrs++;
+	} else {
+		printf("[OK]\tAll %d iterations succeeded\n", iters);
+	}
+}
+
+static int finish_exec_test(void)
+{
+	/*
+	 * In a sensible world, this would be check_invalid_segment(0, 1);
+	 * For better or for worse, though, the LDT is inherited across exec.
+	 * We can probably change this safely, but for now we test it.
+	 */
+	check_valid_segment(0, 1,
+			    AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB,
+			    42, true);
+
+	return nerrs ? 1 : 0;
+}
+
+static void do_exec_test(void)
+{
+	printf("[RUN]\tTest exec\n");
+
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 42,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	pid_t child = fork();
+	if (child == 0) {
+		execl("/proc/self/exe", "ldt_gdt_test_exec", NULL);
+		printf("[FAIL]\tCould not exec self\n");
+		exit(1);	/* exec failed */
+	} else {
+		int status;
+		if (waitpid(child, &status, 0) != child ||
+		    !WIFEXITED(status)) {
+			printf("[FAIL]\tChild died\n");
+			nerrs++;
+		} else if (WEXITSTATUS(status) != 0) {
+			printf("[FAIL]\tChild failed\n");
+			nerrs++;
+		} else {
+			printf("[OK]\tChild succeeded\n");
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	if (argc == 1 && !strcmp(argv[0], "ldt_gdt_test_exec"))
+		return finish_exec_test();
+
+	do_simple_tests();
+
+	do_multicpu_tests();
+
+	do_exec_test();
+
+	return nerrs ? 1 : 0;
+}
-- 
2.4.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 4/4] x86/ldt: Make modify_ldt optional
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (6 preceding siblings ...)
  2015-07-30 21:31 ` [PATCH v6 4/4] x86/ldt: Make modify_ldt optional Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31 13:49   ` Ingo Molnar
                     ` (3 more replies)
  2015-07-31  9:10 ` [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andrew Cooper
  2015-07-31  9:10 ` Andrew Cooper
  9 siblings, 4 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper,
	Jan Beulich, xen-devel, Andy Lutomirski

The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace.  Make it optional.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/Kconfig                   | 17 +++++++++++++++++
 arch/x86/include/asm/mmu.h         |  2 ++
 arch/x86/include/asm/mmu_context.h | 28 +++++++++++++++++++++-------
 arch/x86/kernel/Makefile           |  3 ++-
 arch/x86/kernel/cpu/perf_event.c   |  4 ++++
 arch/x86/kernel/process_64.c       |  2 ++
 arch/x86/kernel/step.c             |  2 ++
 kernel/sys_ni.c                    |  1 +
 8 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b3a1a5d77d92..beabf30c7b52 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1015,6 +1015,7 @@ config VM86
 config X86_16BIT
 	bool "Enable support for 16-bit segments" if EXPERT
 	default y
+	depends on MODIFY_LDT_SYSCALL
 	---help---
 	  This option is required by programs like Wine to run 16-bit
 	  protected mode legacy code on x86 processors.  Disabling
@@ -2053,6 +2054,22 @@ config CMDLINE_OVERRIDE
 	  This is used to work around broken boot loaders.  This should
 	  be set to 'N' under normal conditions.
 
+config MODIFY_LDT_SYSCALL
+	bool "Enable the LDT (local descriptor table)" if EXPERT
+	default y
+	---help---
+	  Linux can allow user programs to install a per-process x86
+	  Local Descriptor Table (LDT) using the modify_ldt(2) system
+	  call.  This is required to run 16-bit or segmented code such as
+	  DOSEMU or some Wine programs.  It is also used by some very old
+	  threading libraries.
+
+	  Enabling this feature adds a small amount of overhead to
+	  context switches and increases the low-level kernel attack
+	  surface.  Disabling it removes the modify_ldt(2) system call.
+
+	  Saying 'N' here may make sense for embedded or server kernels.
+
 source "kernel/livepatch/Kconfig"
 
 endmenu
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 364d27481a52..55234d5e7160 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,7 +9,9 @@
  * we put the segment information here.
  */
 typedef struct {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
+#endif
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 984abfe47edc..379cd3658799 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -33,6 +33,7 @@ static inline void load_mm_cr4(struct mm_struct *mm)
 static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 /*
  * ldt_structs can be allocated, used, and freed, but they are never
  * modified while live.
@@ -48,8 +49,23 @@ struct ldt_struct {
 	int size;
 };
 
+/*
+ * Used for LDT copy/destruction.
+ */
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+void destroy_context(struct mm_struct *mm);
+#else	/* CONFIG_MODIFY_LDT_SYSCALL */
+static inline int init_new_context(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	return 0;
+}
+static inline void destroy_context(struct mm_struct *mm) {}
+#endif
+
 static inline void load_mm_ldt(struct mm_struct *mm)
 {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
 
 	/* lockless_dereference synchronizes with smp_store_release */
@@ -73,17 +89,13 @@ static inline void load_mm_ldt(struct mm_struct *mm)
 		set_ldt(ldt->entries, ldt->size);
 	else
 		clear_LDT();
+#else
+	clear_LDT();
+#endif
 
 	DEBUG_LOCKS_WARN_ON(preemptible());
 }
 
-/*
- * Used for LDT copy/destruction.
- */
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void destroy_context(struct mm_struct *mm);
-
-
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
@@ -114,6 +126,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		/* Load per-mm CR4 state */
 		load_mm_cr4(next);
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		/*
 		 * Load the LDT, if the LDT is different.
 		 *
@@ -128,6 +141,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
 			load_mm_ldt(next);
+#endif
 	}
 #ifdef CONFIG_SMP
 	  else {
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f15af41bd80..2b507befcd3f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,7 +24,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
 
 obj-y			:= process_$(BITS).o signal.o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
+obj-y			+= time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_MODIFY_LDT_SYSCALL)	+= ldt.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 9469dfa55607..58b872ef2329 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,6 +2179,7 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
 		if (idx > LDT_ENTRIES)
@@ -2190,6 +2191,9 @@ static unsigned long get_segment_base(unsigned int segment)
 			return 0;
 
 		desc = &ldt->entries[idx];
+#else
+		return 0;
+#endif
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index f6b916387590..941295ddf802 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,6 +121,7 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
@@ -128,6 +129,7 @@ void release_thread(struct task_struct *dead_task)
 				dead_task->mm->context.ldt->size);
 			BUG();
 		}
+#endif
 	}
 }
 
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 6273324186ac..fd88e152d584 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -18,6 +18,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		return addr;
 	}
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	/*
 	 * We'll assume that the code segments in the GDT
 	 * are all zero-based. That is largely true: the
@@ -45,6 +46,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		}
 		mutex_unlock(&child->mm->context.lock);
 	}
+#endif
 
 	return addr;
 }
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5868d8..ca7d84f438f1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -140,6 +140,7 @@ cond_syscall(sys_sgetmask);
 cond_syscall(sys_ssetmask);
 cond_syscall(sys_vm86old);
 cond_syscall(sys_vm86);
+cond_syscall(sys_modify_ldt);
 cond_syscall(sys_ipc);
 cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);
-- 
2.4.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 4/4] x86/ldt: Make modify_ldt optional
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (5 preceding siblings ...)
  2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
@ 2015-07-30 21:31 ` Andy Lutomirski
  2015-07-30 21:31 ` Andy Lutomirski
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-30 21:31 UTC (permalink / raw)
  To: X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Andy Lutomirski, Peter Zijlstra, Andrew Cooper,
	Steven Rostedt, xen-devel, Jan Beulich, Sasha Levin,
	Boris Ostrovsky

The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace.  Make it optional.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/Kconfig                   | 17 +++++++++++++++++
 arch/x86/include/asm/mmu.h         |  2 ++
 arch/x86/include/asm/mmu_context.h | 28 +++++++++++++++++++++-------
 arch/x86/kernel/Makefile           |  3 ++-
 arch/x86/kernel/cpu/perf_event.c   |  4 ++++
 arch/x86/kernel/process_64.c       |  2 ++
 arch/x86/kernel/step.c             |  2 ++
 kernel/sys_ni.c                    |  1 +
 8 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b3a1a5d77d92..beabf30c7b52 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1015,6 +1015,7 @@ config VM86
 config X86_16BIT
 	bool "Enable support for 16-bit segments" if EXPERT
 	default y
+	depends on MODIFY_LDT_SYSCALL
 	---help---
 	  This option is required by programs like Wine to run 16-bit
 	  protected mode legacy code on x86 processors.  Disabling
@@ -2053,6 +2054,22 @@ config CMDLINE_OVERRIDE
 	  This is used to work around broken boot loaders.  This should
 	  be set to 'N' under normal conditions.
 
+config MODIFY_LDT_SYSCALL
+	bool "Enable the LDT (local descriptor table)" if EXPERT
+	default y
+	---help---
+	  Linux can allow user programs to install a per-process x86
+	  Local Descriptor Table (LDT) using the modify_ldt(2) system
+	  call.  This is required to run 16-bit or segmented code such as
+	  DOSEMU or some Wine programs.  It is also used by some very old
+	  threading libraries.
+
+	  Enabling this feature adds a small amount of overhead to
+	  context switches and increases the low-level kernel attack
+	  surface.  Disabling it removes the modify_ldt(2) system call.
+
+	  Saying 'N' here may make sense for embedded or server kernels.
+
 source "kernel/livepatch/Kconfig"
 
 endmenu
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 364d27481a52..55234d5e7160 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,7 +9,9 @@
  * we put the segment information here.
  */
 typedef struct {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
+#endif
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 984abfe47edc..379cd3658799 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -33,6 +33,7 @@ static inline void load_mm_cr4(struct mm_struct *mm)
 static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 /*
  * ldt_structs can be allocated, used, and freed, but they are never
  * modified while live.
@@ -48,8 +49,23 @@ struct ldt_struct {
 	int size;
 };
 
+/*
+ * Used for LDT copy/destruction.
+ */
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+void destroy_context(struct mm_struct *mm);
+#else	/* CONFIG_MODIFY_LDT_SYSCALL */
+static inline int init_new_context(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	return 0;
+}
+static inline void destroy_context(struct mm_struct *mm) {}
+#endif
+
 static inline void load_mm_ldt(struct mm_struct *mm)
 {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
 
 	/* lockless_dereference synchronizes with smp_store_release */
@@ -73,17 +89,13 @@ static inline void load_mm_ldt(struct mm_struct *mm)
 		set_ldt(ldt->entries, ldt->size);
 	else
 		clear_LDT();
+#else
+	clear_LDT();
+#endif
 
 	DEBUG_LOCKS_WARN_ON(preemptible());
 }
 
-/*
- * Used for LDT copy/destruction.
- */
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void destroy_context(struct mm_struct *mm);
-
-
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
@@ -114,6 +126,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		/* Load per-mm CR4 state */
 		load_mm_cr4(next);
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		/*
 		 * Load the LDT, if the LDT is different.
 		 *
@@ -128,6 +141,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
 			load_mm_ldt(next);
+#endif
 	}
 #ifdef CONFIG_SMP
 	  else {
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f15af41bd80..2b507befcd3f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,7 +24,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
 
 obj-y			:= process_$(BITS).o signal.o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
+obj-y			+= time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_MODIFY_LDT_SYSCALL)	+= ldt.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 9469dfa55607..58b872ef2329 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,6 +2179,7 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
 		if (idx > LDT_ENTRIES)
@@ -2190,6 +2191,9 @@ static unsigned long get_segment_base(unsigned int segment)
 			return 0;
 
 		desc = &ldt->entries[idx];
+#else
+		return 0;
+#endif
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index f6b916387590..941295ddf802 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,6 +121,7 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
@@ -128,6 +129,7 @@ void release_thread(struct task_struct *dead_task)
 				dead_task->mm->context.ldt->size);
 			BUG();
 		}
+#endif
 	}
 }
 
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 6273324186ac..fd88e152d584 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -18,6 +18,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		return addr;
 	}
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	/*
 	 * We'll assume that the code segments in the GDT
 	 * are all zero-based. That is largely true: the
@@ -45,6 +46,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		}
 		mutex_unlock(&child->mm->context.lock);
 	}
+#endif
 
 	return addr;
 }
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5868d8..ca7d84f438f1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -140,6 +140,7 @@ cond_syscall(sys_sgetmask);
 cond_syscall(sys_ssetmask);
 cond_syscall(sys_vm86old);
 cond_syscall(sys_vm86);
+cond_syscall(sys_modify_ldt);
 cond_syscall(sys_ipc);
 cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);
-- 
2.4.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous
  2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
@ 2015-07-30 22:50   ` Andrew Cooper
  2015-07-31  3:15     ` Andy Lutomirski
  2015-07-31  3:15     ` Andy Lutomirski
  2015-07-30 22:50   ` Andrew Cooper
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 31+ messages in thread
From: Andrew Cooper @ 2015-07-30 22:50 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Jan Beulich, xen-devel,
	stable

On 30/07/2015 22:31, Andy Lutomirski wrote:
> Note to -stable maintainers: by itself, this patch makes a
> pre-existing Xen bug much easier to trigger; on a 32-bit Xen guest,
> the new ldt_gdt selftest is likely to OOPS.  Even without this
> patch, the test can OOPS, but it's much less likely to happen.  The
> Xen maintainers should have a fix for that issue shortly.

All Xen issues should be fixed by patch 1 now.

~Andrew

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous
  2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
  2015-07-30 22:50   ` Andrew Cooper
@ 2015-07-30 22:50   ` Andrew Cooper
  2015-07-31 13:56   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: Andrew Cooper @ 2015-07-30 22:50 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Peter Zijlstra, Steven Rostedt, xen-devel, stable,
	Jan Beulich, Sasha Levin, Boris Ostrovsky

On 30/07/2015 22:31, Andy Lutomirski wrote:
> Note to -stable maintainers: by itself, this patch makes a
> pre-existing Xen bug much easier to trigger; on a 32-bit Xen guest,
> the new ldt_gdt selftest is likely to OOPS.  Even without this
> patch, the test can OOPS, but it's much less likely to happen.  The
> Xen maintainers should have a fix for that issue shortly.

All Xen issues should be fixed by patch 1 now.

~Andrew

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous
  2015-07-30 22:50   ` Andrew Cooper
  2015-07-31  3:15     ` Andy Lutomirski
@ 2015-07-31  3:15     ` Andy Lutomirski
  1 sibling, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-31  3:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Jan Beulich, xen-devel,
	stable

On Thu, Jul 30, 2015 at 3:50 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 30/07/2015 22:31, Andy Lutomirski wrote:
>> Note to -stable maintainers: by itself, this patch makes a
>> pre-existing Xen bug much easier to trigger; on a 32-bit Xen guest,
>> the new ldt_gdt selftest is likely to OOPS.  Even without this
>> patch, the test can OOPS, but it's much less likely to happen.  The
>> Xen maintainers should have a fix for that issue shortly.
>
> All Xen issues should be fixed by patch 1 now.
>

Whoops, right.  That paragraph in the changelog can be deleted.

> ~Andrew



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous
  2015-07-30 22:50   ` Andrew Cooper
@ 2015-07-31  3:15     ` Andy Lutomirski
  2015-07-31  3:15     ` Andy Lutomirski
  1 sibling, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2015-07-31  3:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: security, Jan Beulich, Peter Zijlstra, X86 ML, linux-kernel,
	Steven Rostedt, xen-devel, Borislav Petkov, stable,
	Andy Lutomirski, Sasha Levin, Boris Ostrovsky

On Thu, Jul 30, 2015 at 3:50 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 30/07/2015 22:31, Andy Lutomirski wrote:
>> Note to -stable maintainers: by itself, this patch makes a
>> pre-existing Xen bug much easier to trigger; on a 32-bit Xen guest,
>> the new ldt_gdt selftest is likely to OOPS.  Even without this
>> patch, the test can OOPS, but it's much less likely to happen.  The
>> Xen maintainers should have a fix for that issue shortly.
>
> All Xen issues should be fixed by patch 1 now.
>

Whoops, right.  That paragraph in the changelog can be deleted.

> ~Andrew



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (7 preceding siblings ...)
  2015-07-30 21:31 ` Andy Lutomirski
@ 2015-07-31  9:10 ` Andrew Cooper
  2015-07-31 13:44   ` Boris Ostrovsky
  2015-07-31 13:44   ` Boris Ostrovsky
  2015-07-31  9:10 ` Andrew Cooper
  9 siblings, 2 replies; 31+ messages in thread
From: Andrew Cooper @ 2015-07-31  9:10 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Jan Beulich, xen-devel

On 30/07/15 22:31, Andy Lutomirski wrote:
> This is intended for x86/urgent.  Sorry for taking so long, but it
> seemed nice to avoid breaking Xen.

Very much appreciated.  Thanks!

>
> This fixes the "dazed and confused" issue which was exposed by the
> CVE-2015-5157 fix.  It's also probably a good general attack surface
> reduction, and it replaces some scary code with IMO less scary code.
>
> Also, servers and embedded systems should probably turn off modify_ldt.
> This makes that possible.
>
> Xen people, can you test patch 1?  It works for me on my evil 32-bit
> Xen virtio setup.

So the LDT issue seems to have gone away, which is good.

However, I did get this from my single vcpu guest test

[OK]    LDT entry 0 is invalid
[SKIP]    Cannot set affinity to CPU 1
[RUN]    Test exec
[    3.638967] CPU 0 set the LDT
[OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
[    3.639380] ------------[ cut here ]------------
[    3.639389] WARNING: CPU: 0 PID: 383 at
/local/linux-mainline.git/arch/x86/include/asm/mmu_context.h:96
flush_old_exec+0x7fd/0xb70()
[    3.639397] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[    3.639401] CPU: 0 PID: 383 Comm: ldt_gdt_32 Not tainted 4.2.0-rc4+ #21
[    3.639407]  00000000 00000000 c6fb9e10 c13b63ab 00000000 c6fb9e40
c105f299 c149476c
[    3.639417]  c6fb9e6c 0000017f c148d5ac 00000060 c11463dd c11463dd
c750aa00 c765e280
[    3.639426]  c765fb80 c6fb9e58 c105f333 00000009 c6fb9e50 c149476c
c6fb9e6c c6fb9e9c
[    3.639436] Call Trace:
[    3.639442]  [<c13b63ab>] dump_stack+0x48/0x60
[    3.639447]  [<c105f299>] warn_slowpath_common+0x89/0xc0
[    3.639451]  [<c11463dd>] ? flush_old_exec+0x7fd/0xb70
[    3.639456]  [<c11463dd>] ? flush_old_exec+0x7fd/0xb70
[    3.639460]  [<c105f333>] warn_slowpath_fmt+0x33/0x40
[    3.639464]  [<c11463dd>] flush_old_exec+0x7fd/0xb70
[    3.639470]  [<c118fdc4>] load_elf_binary+0x2b4/0x1060
[    3.639475]  [<c10a360e>] ? lock_release+0x27e/0x4e0
[    3.639480]  [<c10a3931>] ? lock_acquire+0xc1/0x240
[    3.639484]  [<c1146b6e>] search_binary_handler+0x4e/0xd0
[    3.639489]  [<c114721c>] do_execveat_common+0x62c/0x910
[    3.639493]  [<c114716d>] ? do_execveat_common+0x57d/0x910
[    3.639498]  [<c1147524>] do_execve+0x24/0x30
[    3.639502]  [<c1147788>] SyS_execve+0x28/0x40
[    3.639507]  [<c13bd497>] syscall_call+0x7/0x7
[    3.639511] ---[ end trace 95a382309b1f72bd ]---
[OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
[OK]    Child succeeded
[    3.639897] CPU 0 cleared the LDT

And this from a two-vcpu test:

[RUN]    Cross-CPU LDT invalidation
[    5.260315] CPU 1 set the LDT
[    5.260324] CPU 0 set the LDT
[    5.268245] CPU 0 cleared the LDT
[    5.268261] ------------[ cut here ]------------
[    5.268263] CPU 1 cleared the LDT
[    5.268272] WARNING: CPU: 0 PID: 389 at
/local/linux-mainline.git/kernel/locking/lockdep.c:2639
trace_hardirqs_off_caller+0xa9/0xb0()
[    5.268280] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[    5.268285] CPU: 0 PID: 389 Comm: ldt_gdt_32 Not tainted 4.2.0-rc4+ #21
[    5.268291]  00000000 00000000 c6863f38 c13b63ab 00000000 c6863f68
c105f299 c149476c
[    5.268301]  c6863f94 00000185 c1499b9c 00000a4f c109f8b9 c109f8b9
c13be3a7 00002000
[    5.268311]  c101b470 c6863f80 c105f333 00000009 c6863f78 c149476c
c6863f94 c6863f9c
[    5.268320] Call Trace:
[    5.268326]  [<c13b63ab>] dump_stack+0x48/0x60
[    5.268331]  [<c105f299>] warn_slowpath_common+0x89/0xc0
[    5.268336]  [<c109f8b9>] ? trace_hardirqs_off_caller+0xa9/0xb0
[    5.268340]  [<c109f8b9>] ? trace_hardirqs_off_caller+0xa9/0xb0
[    5.268346]  [<c13be3a7>] ? error_code+0x5b/0x64
[    5.268352]  [<c101b470>] ? do_device_not_available+0x50/0x50
[    5.268357]  [<c105f333>] warn_slowpath_fmt+0x33/0x40
[    5.268361]  [<c109f8b9>] trace_hardirqs_off_caller+0xa9/0xb0
[    5.268367]  [<c10029ec>] trace_hardirqs_off_thunk+0xc/0x10
[    5.268371]  [<c13be3a7>] ? error_code+0x5b/0x64
[    5.268376]  [<c13b0000>] ? xen_build_mfn_list_list+0x2a0/0x300
[    5.268381]  [<c101b470>] ? do_device_not_available+0x50/0x50
[    5.268386] ---[ end trace da759e1fe4c971e6 ]---
[    5.268434] CPU 1 set the LDT
[    5.268441] CPU 0 set the LDT
[    5.268537] CPU 0 cleared the LDT
[    5.268543] CPU 1 cleared the LDT
[    5.268629] CPU 1 set the LDT
[    5.268634] CPU 0 set the LDT
[    5.268726] CPU 0 cleared the LDT
[    5.268732] CPU 1 cleared the LDT
[    5.268818] CPU 1 set the LDT
[    5.268823] CPU 0 set the LDT
[    5.268916] CPU 0 cleared the LDT
[    5.268922] CPU 1 cleared the LDT
[    5.341360] CPU 1 set the LDT
[    5.341369] CPU 0 set the LDT
[    5.341528] CPU 0 cleared the LDT
[    5.341538] CPU 1 cleared the LDT
[OK]    All 5 iterations succeeded

I am going to try some 64bit tests now.

~Andrew

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
  2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
                   ` (8 preceding siblings ...)
  2015-07-31  9:10 ` [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andrew Cooper
@ 2015-07-31  9:10 ` Andrew Cooper
  9 siblings, 0 replies; 31+ messages in thread
From: Andrew Cooper @ 2015-07-31  9:10 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Peter Zijlstra, Steven Rostedt, xen-devel, Jan Beulich,
	Sasha Levin, Boris Ostrovsky

On 30/07/15 22:31, Andy Lutomirski wrote:
> This is intended for x86/urgent.  Sorry for taking so long, but it
> seemed nice to avoid breaking Xen.

Very much appreciated.  Thanks!

>
> This fixes the "dazed and confused" issue which was exposed by the
> CVE-2015-5157 fix.  It's also probably a good general attack surface
> reduction, and it replaces some scary code with IMO less scary code.
>
> Also, servers and embedded systems should probably turn off modify_ldt.
> This makes that possible.
>
> Xen people, can you test patch 1?  It works for me on my evil 32-bit
> Xen virtio setup.

So the LDT issue seems to have gone away, which is good.

However, I did get this from my single vcpu guest test

[OK]    LDT entry 0 is invalid
[SKIP]    Cannot set affinity to CPU 1
[RUN]    Test exec
[    3.638967] CPU 0 set the LDT
[OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
[    3.639380] ------------[ cut here ]------------
[    3.639389] WARNING: CPU: 0 PID: 383 at
/local/linux-mainline.git/arch/x86/include/asm/mmu_context.h:96
flush_old_exec+0x7fd/0xb70()
[    3.639397] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[    3.639401] CPU: 0 PID: 383 Comm: ldt_gdt_32 Not tainted 4.2.0-rc4+ #21
[    3.639407]  00000000 00000000 c6fb9e10 c13b63ab 00000000 c6fb9e40
c105f299 c149476c
[    3.639417]  c6fb9e6c 0000017f c148d5ac 00000060 c11463dd c11463dd
c750aa00 c765e280
[    3.639426]  c765fb80 c6fb9e58 c105f333 00000009 c6fb9e50 c149476c
c6fb9e6c c6fb9e9c
[    3.639436] Call Trace:
[    3.639442]  [<c13b63ab>] dump_stack+0x48/0x60
[    3.639447]  [<c105f299>] warn_slowpath_common+0x89/0xc0
[    3.639451]  [<c11463dd>] ? flush_old_exec+0x7fd/0xb70
[    3.639456]  [<c11463dd>] ? flush_old_exec+0x7fd/0xb70
[    3.639460]  [<c105f333>] warn_slowpath_fmt+0x33/0x40
[    3.639464]  [<c11463dd>] flush_old_exec+0x7fd/0xb70
[    3.639470]  [<c118fdc4>] load_elf_binary+0x2b4/0x1060
[    3.639475]  [<c10a360e>] ? lock_release+0x27e/0x4e0
[    3.639480]  [<c10a3931>] ? lock_acquire+0xc1/0x240
[    3.639484]  [<c1146b6e>] search_binary_handler+0x4e/0xd0
[    3.639489]  [<c114721c>] do_execveat_common+0x62c/0x910
[    3.639493]  [<c114716d>] ? do_execveat_common+0x57d/0x910
[    3.639498]  [<c1147524>] do_execve+0x24/0x30
[    3.639502]  [<c1147788>] SyS_execve+0x28/0x40
[    3.639507]  [<c13bd497>] syscall_call+0x7/0x7
[    3.639511] ---[ end trace 95a382309b1f72bd ]---
[OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
[OK]    Child succeeded
[    3.639897] CPU 0 cleared the LDT

And this from a two-vcpu test:

[RUN]    Cross-CPU LDT invalidation
[    5.260315] CPU 1 set the LDT
[    5.260324] CPU 0 set the LDT
[    5.268245] CPU 0 cleared the LDT
[    5.268261] ------------[ cut here ]------------
[    5.268263] CPU 1 cleared the LDT
[    5.268272] WARNING: CPU: 0 PID: 389 at
/local/linux-mainline.git/kernel/locking/lockdep.c:2639
trace_hardirqs_off_caller+0xa9/0xb0()
[    5.268280] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[    5.268285] CPU: 0 PID: 389 Comm: ldt_gdt_32 Not tainted 4.2.0-rc4+ #21
[    5.268291]  00000000 00000000 c6863f38 c13b63ab 00000000 c6863f68
c105f299 c149476c
[    5.268301]  c6863f94 00000185 c1499b9c 00000a4f c109f8b9 c109f8b9
c13be3a7 00002000
[    5.268311]  c101b470 c6863f80 c105f333 00000009 c6863f78 c149476c
c6863f94 c6863f9c
[    5.268320] Call Trace:
[    5.268326]  [<c13b63ab>] dump_stack+0x48/0x60
[    5.268331]  [<c105f299>] warn_slowpath_common+0x89/0xc0
[    5.268336]  [<c109f8b9>] ? trace_hardirqs_off_caller+0xa9/0xb0
[    5.268340]  [<c109f8b9>] ? trace_hardirqs_off_caller+0xa9/0xb0
[    5.268346]  [<c13be3a7>] ? error_code+0x5b/0x64
[    5.268352]  [<c101b470>] ? do_device_not_available+0x50/0x50
[    5.268357]  [<c105f333>] warn_slowpath_fmt+0x33/0x40
[    5.268361]  [<c109f8b9>] trace_hardirqs_off_caller+0xa9/0xb0
[    5.268367]  [<c10029ec>] trace_hardirqs_off_thunk+0xc/0x10
[    5.268371]  [<c13be3a7>] ? error_code+0x5b/0x64
[    5.268376]  [<c13b0000>] ? xen_build_mfn_list_list+0x2a0/0x300
[    5.268381]  [<c101b470>] ? do_device_not_available+0x50/0x50
[    5.268386] ---[ end trace da759e1fe4c971e6 ]---
[    5.268434] CPU 1 set the LDT
[    5.268441] CPU 0 set the LDT
[    5.268537] CPU 0 cleared the LDT
[    5.268543] CPU 1 cleared the LDT
[    5.268629] CPU 1 set the LDT
[    5.268634] CPU 0 set the LDT
[    5.268726] CPU 0 cleared the LDT
[    5.268732] CPU 1 cleared the LDT
[    5.268818] CPU 1 set the LDT
[    5.268823] CPU 0 set the LDT
[    5.268916] CPU 0 cleared the LDT
[    5.268922] CPU 1 cleared the LDT
[    5.341360] CPU 1 set the LDT
[    5.341369] CPU 0 set the LDT
[    5.341528] CPU 0 cleared the LDT
[    5.341538] CPU 1 cleared the LDT
[OK]    All 5 iterations succeeded

I am going to try some 64bit tests now.

~Andrew

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xen-devel] [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall
  2015-07-30 21:31 ` Andy Lutomirski
@ 2015-07-31  9:41   ` David Vrabel
  2015-07-31  9:41   ` David Vrabel
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: David Vrabel @ 2015-07-31  9:41 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Peter Zijlstra, Andrew Cooper, Steven Rostedt,
	xen-devel, stable, Jan Beulich, Sasha Levin, Boris Ostrovsky

On 30/07/15 22:31, Andy Lutomirski wrote:
> The update_va_mapping hypercall can fail if the VA isn't present in
> the guest's page tables.  Under certain loads, this can result in an
> OOPS when the target address is in unpopulated vmap space.
> 
> While we're at it, add comments to help explain what's going on.
> 
> This isn't a great long-term fix.  This code should probably be
> changed to use something like set_memory_ro.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Thanks.

David

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall
  2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31  9:41   ` [Xen-devel] " David Vrabel
@ 2015-07-31  9:41   ` David Vrabel
  2015-07-31 13:56   ` [tip:x86/asm] x86/xen: Probe target addresses in set_aliased_prot () " tip-bot for Andy Lutomirski
  2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: David Vrabel @ 2015-07-31  9:41 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Peter Zijlstra, Andrew Cooper, stable, xen-devel,
	Steven Rostedt, Jan Beulich, Sasha Levin, Boris Ostrovsky

On 30/07/15 22:31, Andy Lutomirski wrote:
> The update_va_mapping hypercall can fail if the VA isn't present in
> the guest's page tables.  Under certain loads, this can result in an
> OOPS when the target address is in unpopulated vmap space.
> 
> While we're at it, add comments to help explain what's going on.
> 
> This isn't a great long-term fix.  This code should probably be
> changed to use something like set_memory_ro.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Thanks.

David

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
  2015-07-31  9:10 ` [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andrew Cooper
  2015-07-31 13:44   ` Boris Ostrovsky
@ 2015-07-31 13:44   ` Boris Ostrovsky
  2015-07-31 14:02       ` Andrew Cooper
  1 sibling, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2015-07-31 13:44 UTC (permalink / raw)
  To: Andrew Cooper, Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Jan Beulich, xen-devel

On 07/31/2015 05:10 AM, Andrew Cooper wrote:
> On 30/07/15 22:31, Andy Lutomirski wrote:
>> This is intended for x86/urgent.  Sorry for taking so long, but it
>> seemed nice to avoid breaking Xen.
> Very much appreciated.  Thanks!
>
>> This fixes the "dazed and confused" issue which was exposed by the
>> CVE-2015-5157 fix.  It's also probably a good general attack surface
>> reduction, and it replaces some scary code with IMO less scary code.
>>
>> Also, servers and embedded systems should probably turn off modify_ldt.
>> This makes that possible.
>>
>> Xen people, can you test patch 1?  It works for me on my evil 32-bit
>> Xen virtio setup.
> So the LDT issue seems to have gone away, which is good.
>
> However, I did get this from my single vcpu guest test
>
> [OK]    LDT entry 0 is invalid
> [SKIP]    Cannot set affinity to CPU 1
> [RUN]    Test exec
> [    3.638967] CPU 0 set the LDT
> [OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
> [    3.639380] ------------[ cut here ]------------
> [    3.639389] WARNING: CPU: 0 PID: 383 at
> /local/linux-mainline.git/arch/x86/include/asm/mmu_context.h:96
> flush_old_exec+0x7fd/0xb70()
> [    3.639397] DEBUG_LOCKS_WARN_ON(!irqs_disabled())

You must be running v5 (or earlier). This is fixed in v6 --- it is now 
'DEBUG_LOCKS_WARN_ON(preemptible());'


-boris

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
  2015-07-31  9:10 ` [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andrew Cooper
@ 2015-07-31 13:44   ` Boris Ostrovsky
  2015-07-31 13:44   ` Boris Ostrovsky
  1 sibling, 0 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2015-07-31 13:44 UTC (permalink / raw)
  To: Andrew Cooper, Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Peter Zijlstra, Steven Rostedt, xen-devel, Jan Beulich,
	Sasha Levin

On 07/31/2015 05:10 AM, Andrew Cooper wrote:
> On 30/07/15 22:31, Andy Lutomirski wrote:
>> This is intended for x86/urgent.  Sorry for taking so long, but it
>> seemed nice to avoid breaking Xen.
> Very much appreciated.  Thanks!
>
>> This fixes the "dazed and confused" issue which was exposed by the
>> CVE-2015-5157 fix.  It's also probably a good general attack surface
>> reduction, and it replaces some scary code with IMO less scary code.
>>
>> Also, servers and embedded systems should probably turn off modify_ldt.
>> This makes that possible.
>>
>> Xen people, can you test patch 1?  It works for me on my evil 32-bit
>> Xen virtio setup.
> So the LDT issue seems to have gone away, which is good.
>
> However, I did get this from my single vcpu guest test
>
> [OK]    LDT entry 0 is invalid
> [SKIP]    Cannot set affinity to CPU 1
> [RUN]    Test exec
> [    3.638967] CPU 0 set the LDT
> [OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
> [    3.639380] ------------[ cut here ]------------
> [    3.639389] WARNING: CPU: 0 PID: 383 at
> /local/linux-mainline.git/arch/x86/include/asm/mmu_context.h:96
> flush_old_exec+0x7fd/0xb70()
> [    3.639397] DEBUG_LOCKS_WARN_ON(!irqs_disabled())

You must be running v5 (or earlier). This is fixed in v6 --- it is now 
'DEBUG_LOCKS_WARN_ON(preemptible());'


-boris

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 4/4] x86/ldt: Make modify_ldt optional
  2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31 13:49   ` Ingo Molnar
@ 2015-07-31 13:49   ` Ingo Molnar
  2015-07-31 14:02   ` [tip:x86/asm] x86/ldt: Make modify_ldt() optional tip-bot for Andy Lutomirski
  2015-07-31 14:02   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2015-07-31 13:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Peter Zijlstra,
	Steven Rostedt, security, Sasha Levin, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel


* Andy Lutomirski <luto@kernel.org> wrote:

> The modify_ldt syscall exposes a large attack surface and is
> unnecessary for modern userspace.  Make it optional.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/Kconfig                   | 17 +++++++++++++++++
>  arch/x86/include/asm/mmu.h         |  2 ++
>  arch/x86/include/asm/mmu_context.h | 28 +++++++++++++++++++++-------
>  arch/x86/kernel/Makefile           |  3 ++-
>  arch/x86/kernel/cpu/perf_event.c   |  4 ++++
>  arch/x86/kernel/process_64.c       |  2 ++
>  arch/x86/kernel/step.c             |  2 ++
>  kernel/sys_ni.c                    |  1 +
>  8 files changed, 51 insertions(+), 8 deletions(-)

btw., I fixed a (rare) build failure on MATH_EMULATION=y && !MODIFY_LDT_SYSCALL:

  arch/x86/math-emu/fpu_system.h:21:71: error: ‘mm_context_t’ has no member named ‘ldt’

I took the easy fix: made MATH_EMULATION depend on MODIFY_LDT_SYSCALL for now.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 4/4] x86/ldt: Make modify_ldt optional
  2015-07-30 21:31 ` Andy Lutomirski
@ 2015-07-31 13:49   ` Ingo Molnar
  2015-07-31 13:49   ` Ingo Molnar
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2015-07-31 13:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: security, Peter Zijlstra, Andrew Cooper, X86 ML, linux-kernel,
	Steven Rostedt, xen-devel, Borislav Petkov, Jan Beulich,
	Sasha Levin, Boris Ostrovsky


* Andy Lutomirski <luto@kernel.org> wrote:

> The modify_ldt syscall exposes a large attack surface and is
> unnecessary for modern userspace.  Make it optional.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/Kconfig                   | 17 +++++++++++++++++
>  arch/x86/include/asm/mmu.h         |  2 ++
>  arch/x86/include/asm/mmu_context.h | 28 +++++++++++++++++++++-------
>  arch/x86/kernel/Makefile           |  3 ++-
>  arch/x86/kernel/cpu/perf_event.c   |  4 ++++
>  arch/x86/kernel/process_64.c       |  2 ++
>  arch/x86/kernel/step.c             |  2 ++
>  kernel/sys_ni.c                    |  1 +
>  8 files changed, 51 insertions(+), 8 deletions(-)

btw., I fixed a (rare) build failure on MATH_EMULATION=y && !MODIFY_LDT_SYSCALL:

  arch/x86/math-emu/fpu_system.h:21:71: error: ‘mm_context_t’ has no member named ‘ldt’

I took the easy fix: made MATH_EMULATION depend on MODIFY_LDT_SYSCALL for now.

Thanks,

	Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] x86/xen: Probe target addresses in set_aliased_prot () before the hypercall
  2015-07-30 21:31 ` Andy Lutomirski
                     ` (2 preceding siblings ...)
  2015-07-31 13:56   ` [tip:x86/asm] x86/xen: Probe target addresses in set_aliased_prot () " tip-bot for Andy Lutomirski
@ 2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 13:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvrabel, luto, bp, mingo, rostedt, dvlasenk, andrew.cooper3,
	stable, tglx, sasha.levin, brgerst, boris.ostrovsky, hpa,
	jbeulich, peterz, xen-devel, luto, linux-kernel, konrad.wilk,
	torvalds, security

Commit-ID:  aa1acff356bbedfd03b544051f5b371746735d89
Gitweb:     http://git.kernel.org/tip/aa1acff356bbedfd03b544051f5b371746735d89
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:31 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 10:23:22 +0200

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0b95c9b..11d6fb4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 	pte_t pte;
 	unsigned long pfn;
 	struct page *page;
+	unsigned char dummy;
 
 	ptep = lookup_address((unsigned long)v, &level);
 	BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
 	pte = pfn_pte(pfn, prot);
 
+	/*
+	 * Careful: update_va_mapping() will fail if the virtual address
+	 * we're poking isn't populated in the page tables.  We don't
+	 * need to worry about the direct map (that's always in the page
+	 * tables), but we need to be careful about vmap space.  In
+	 * particular, the top level page table can lazily propagate
+	 * entries between processes, so if we've switched mms since we
+	 * vmapped the target in the first place, we might not have the
+	 * top-level page table entry populated.
+	 *
+	 * We disable preemption because we want the same mm active when
+	 * we probe the target and when we issue the hypercall.  We'll
+	 * have the same nominal mm, but if we're a kernel thread, lazy
+	 * mm dropping could change our pgd.
+	 *
+	 * Out of an abundance of caution, this uses __get_user() to fault
+	 * in the target address just in case there's some obscure case
+	 * in which the target address isn't readable.
+	 */
+
+	preempt_disable();
+
+	pagefault_disable();	/* Avoid warnings due to being atomic. */
+	__get_user(dummy, (unsigned char __user __force *)v);
+	pagefault_enable();
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
 		BUG();
 
@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 				BUG();
 	} else
 		kmap_flush_unused();
+
+	preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 	int i;
 
+	/*
+	 * We need to mark the all aliases of the LDT pages RO.  We
+	 * don't need to call vm_flush_aliases(), though, since that's
+	 * only responsible for flushing aliases out the TLBs, not the
+	 * page tables, and Xen will flush the TLB for us if needed.
+	 *
+	 * To avoid confusing future readers: none of this is necessary
+	 * to load the LDT.  The hypervisor only checks this when the
+	 * LDT is faulted in due to subsequent descriptor access.
+	 */
+
 	for(i = 0; i < entries; i += entries_per_page)
 		set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] x86/xen: Probe target addresses in set_aliased_prot () before the hypercall
  2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31  9:41   ` [Xen-devel] " David Vrabel
  2015-07-31  9:41   ` David Vrabel
@ 2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 13:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: security, dvlasenk, luto, boris.ostrovsky, andrew.cooper3, hpa,
	linux-kernel, rostedt, luto, dvrabel, peterz, bp, stable,
	jbeulich, brgerst, sasha.levin, tglx, xen-devel, torvalds, mingo

Commit-ID:  aa1acff356bbedfd03b544051f5b371746735d89
Gitweb:     http://git.kernel.org/tip/aa1acff356bbedfd03b544051f5b371746735d89
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:31 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 10:23:22 +0200

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0b95c9b..11d6fb4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 	pte_t pte;
 	unsigned long pfn;
 	struct page *page;
+	unsigned char dummy;
 
 	ptep = lookup_address((unsigned long)v, &level);
 	BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
 	pte = pfn_pte(pfn, prot);
 
+	/*
+	 * Careful: update_va_mapping() will fail if the virtual address
+	 * we're poking isn't populated in the page tables.  We don't
+	 * need to worry about the direct map (that's always in the page
+	 * tables), but we need to be careful about vmap space.  In
+	 * particular, the top level page table can lazily propagate
+	 * entries between processes, so if we've switched mms since we
+	 * vmapped the target in the first place, we might not have the
+	 * top-level page table entry populated.
+	 *
+	 * We disable preemption because we want the same mm active when
+	 * we probe the target and when we issue the hypercall.  We'll
+	 * have the same nominal mm, but if we're a kernel thread, lazy
+	 * mm dropping could change our pgd.
+	 *
+	 * Out of an abundance of caution, this uses __get_user() to fault
+	 * in the target address just in case there's some obscure case
+	 * in which the target address isn't readable.
+	 */
+
+	preempt_disable();
+
+	pagefault_disable();	/* Avoid warnings due to being atomic. */
+	__get_user(dummy, (unsigned char __user __force *)v);
+	pagefault_enable();
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
 		BUG();
 
@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 				BUG();
 	} else
 		kmap_flush_unused();
+
+	preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 	int i;
 
+	/*
+	 * We need to mark the all aliases of the LDT pages RO.  We
+	 * don't need to call vm_flush_aliases(), though, since that's
+	 * only responsible for flushing aliases out the TLBs, not the
+	 * page tables, and Xen will flush the TLB for us if needed.
+	 *
+	 * To avoid confusing future readers: none of this is necessary
+	 * to load the LDT.  The hypervisor only checks this when the
+	 * LDT is faulted in due to subsequent descriptor access.
+	 */
+
 	for(i = 0; i < entries; i += entries_per_page)
 		set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] x86/ldt: Make modify_ldt synchronous
  2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
  2015-07-30 22:50   ` Andrew Cooper
  2015-07-30 22:50   ` Andrew Cooper
@ 2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 13:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jbeulich, dvlasenk, luto, bp, andrew.cooper3, tglx, mingo,
	rostedt, bp, boris.ostrovsky, stable, xen-devel, torvalds, hpa,
	peterz, sasha.levin, security, linux-kernel, konrad.wilk, luto,
	brgerst

Commit-ID:  37868fe113ff2ba814b3b4eb12df214df555f8dc
Gitweb:     http://git.kernel.org/tip/37868fe113ff2ba814b3b4eb12df214df555f8dc
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:32 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 10:23:23 +0200

x86/ldt: Make modify_ldt synchronous

modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/desc.h        |  15 ---
 arch/x86/include/asm/mmu.h         |   3 +-
 arch/x86/include/asm/mmu_context.h |  54 +++++++-
 arch/x86/kernel/cpu/common.c       |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c              | 262 ++++++++++++++++++++-----------------
 arch/x86/kernel/process_64.c       |   4 +-
 arch/x86/kernel/step.c             |   6 +-
 arch/x86/power/cpu.c               |   3 +-
 9 files changed, 210 insertions(+), 153 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89f..4e10d73 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
 	set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-	set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-	preempt_disable();
-	load_LDT_nolock(pc);
-	preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
 	return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620..364d274 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-	void *ldt;
-	int size;
+	struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 804a3a6..984abfe 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,50 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+	/*
+	 * Xen requires page-aligned LDTs with special permissions.  This is
+	 * needed to prevent us from installing evil descriptors such as
+	 * call gates.  On native, we could merge the ldt_struct and LDT
+	 * allocations, but it's not worth trying to optimize.
+	 */
+	struct desc_struct *entries;
+	int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+	struct ldt_struct *ldt;
+
+	/* lockless_dereference synchronizes with smp_store_release */
+	ldt = lockless_dereference(mm->context.ldt);
+
+	/*
+	 * Any change to mm->context.ldt is followed by an IPI to all
+	 * CPUs with the mm active.  The LDT will not be freed until
+	 * after the IPI is handled by all such CPUs.  This means that,
+	 * if the ldt_struct changes before we return, the values we see
+	 * will be safe, and the new values will be loaded before we run
+	 * any user code.
+	 *
+	 * NB: don't try to convert this to use RCU without extreme care.
+	 * We would still need IRQs off, because we don't want to change
+	 * the local LDT after an IPI loaded a newer value than the one
+	 * that we can see.
+	 */
+
+	if (unlikely(ldt))
+		set_ldt(ldt->entries, ldt->size);
+	else
+		clear_LDT();
+
+	DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +122,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * was called and then modify_ldt changed
 		 * prev->context.ldt but suppressed an IPI to this CPU.
 		 * In this case, prev->context.ldt != NULL, because we
-		 * never free an LDT while the mm still exists.  That
-		 * means that next->context.ldt != prev->context.ldt,
-		 * because mms never share an LDT.
+		 * never set context.ldt to NULL while the mm still
+		 * exists.  That means that next->context.ldt !=
+		 * prev->context.ldt, because mms never share an LDT.
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 	}
 #ifdef CONFIG_SMP
 	  else {
@@ -106,7 +150,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_mm_cr4(next);
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 		}
 	}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 922c5e0..cb9e5df 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1410,7 +1410,7 @@ void cpu_init(void)
 	load_sp0(t, &current->thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
 	dbg_restore_debug_regs();
@@ -1459,7 +1459,7 @@ void cpu_init(void)
 	load_sp0(t, thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3658de4..9469dfa 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,21 +2179,25 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		struct ldt_struct *ldt;
+
 		if (idx > LDT_ENTRIES)
 			return 0;
 
-		if (idx > current->active_mm->context.size)
+		/* IRQs are off, so this synchronizes with smp_store_release */
+		ldt = lockless_dereference(current->active_mm->context.ldt);
+		if (!ldt || idx > ldt->size)
 			return 0;
 
-		desc = current->active_mm->context.ldt;
+		desc = &ldt->entries[idx];
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
 
-		desc = raw_cpu_ptr(gdt_page.gdt);
+		desc = raw_cpu_ptr(gdt_page.gdt) + idx;
 	}
 
-	return get_desc_base(desc + idx);
+	return get_desc_base(desc);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c37886d..2bcc052 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -12,6 +12,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 
@@ -20,82 +21,82 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
-#ifdef CONFIG_SMP
+/* context.lock is held for us, so we don't need any locking. */
 static void flush_ldt(void *current_mm)
 {
-	if (current->active_mm == current_mm)
-		load_LDT(&current->active_mm->context);
+	mm_context_t *pc;
+
+	if (current->active_mm != current_mm)
+		return;
+
+	pc = &current->active_mm->context;
+	set_ldt(pc->ldt->entries, pc->ldt->size);
 }
-#endif
 
-static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
+/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
+static struct ldt_struct *alloc_ldt_struct(int size)
 {
-	void *oldldt, *newldt;
-	int oldsize;
-
-	if (mincount <= pc->size)
-		return 0;
-	oldsize = pc->size;
-	mincount = (mincount + (PAGE_SIZE / LDT_ENTRY_SIZE - 1)) &
-			(~(PAGE_SIZE / LDT_ENTRY_SIZE - 1));
-	if (mincount * LDT_ENTRY_SIZE > PAGE_SIZE)
-		newldt = vmalloc(mincount * LDT_ENTRY_SIZE);
+	struct ldt_struct *new_ldt;
+	int alloc_size;
+
+	if (size > LDT_ENTRIES)
+		return NULL;
+
+	new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
+	if (!new_ldt)
+		return NULL;
+
+	BUILD_BUG_ON(LDT_ENTRY_SIZE != sizeof(struct desc_struct));
+	alloc_size = size * LDT_ENTRY_SIZE;
+
+	/*
+	 * Xen is very picky: it requires a page-aligned LDT that has no
+	 * trailing nonzero bytes in any page that contains LDT descriptors.
+	 * Keep it simple: zero the whole allocation and never allocate less
+	 * than PAGE_SIZE.
+	 */
+	if (alloc_size > PAGE_SIZE)
+		new_ldt->entries = vzalloc(alloc_size);
 	else
-		newldt = (void *)__get_free_page(GFP_KERNEL);
-
-	if (!newldt)
-		return -ENOMEM;
+		new_ldt->entries = kzalloc(PAGE_SIZE, GFP_KERNEL);
 
-	if (oldsize)
-		memcpy(newldt, pc->ldt, oldsize * LDT_ENTRY_SIZE);
-	oldldt = pc->ldt;
-	memset(newldt + oldsize * LDT_ENTRY_SIZE, 0,
-	       (mincount - oldsize) * LDT_ENTRY_SIZE);
+	if (!new_ldt->entries) {
+		kfree(new_ldt);
+		return NULL;
+	}
 
-	paravirt_alloc_ldt(newldt, mincount);
+	new_ldt->size = size;
+	return new_ldt;
+}
 
-#ifdef CONFIG_X86_64
-	/* CHECKME: Do we really need this ? */
-	wmb();
-#endif
-	pc->ldt = newldt;
-	wmb();
-	pc->size = mincount;
-	wmb();
-
-	if (reload) {
-#ifdef CONFIG_SMP
-		preempt_disable();
-		load_LDT(pc);
-		if (!cpumask_equal(mm_cpumask(current->mm),
-				   cpumask_of(smp_processor_id())))
-			smp_call_function(flush_ldt, current->mm, 1);
-		preempt_enable();
-#else
-		load_LDT(pc);
-#endif
-	}
-	if (oldsize) {
-		paravirt_free_ldt(oldldt, oldsize);
-		if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(oldldt);
-		else
-			put_page(virt_to_page(oldldt));
-	}
-	return 0;
+/* After calling this, the LDT is immutable. */
+static void finalize_ldt_struct(struct ldt_struct *ldt)
+{
+	paravirt_alloc_ldt(ldt->entries, ldt->size);
 }
 
-static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
+/* context.lock is held */
+static void install_ldt(struct mm_struct *current_mm,
+			struct ldt_struct *ldt)
 {
-	int err = alloc_ldt(new, old->size, 0);
-	int i;
+	/* Synchronizes with lockless_dereference in load_mm_ldt. */
+	smp_store_release(&current_mm->context.ldt, ldt);
+
+	/* Activate the LDT for all CPUs using current_mm. */
+	on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+}
 
-	if (err < 0)
-		return err;
+static void free_ldt_struct(struct ldt_struct *ldt)
+{
+	if (likely(!ldt))
+		return;
 
-	for (i = 0; i < old->size; i++)
-		write_ldt_entry(new->ldt, i, old->ldt + i * LDT_ENTRY_SIZE);
-	return 0;
+	paravirt_free_ldt(ldt->entries, ldt->size);
+	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+		vfree(ldt->entries);
+	else
+		kfree(ldt->entries);
+	kfree(ldt);
 }
 
 /*
@@ -104,17 +105,37 @@ static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+	struct ldt_struct *new_ldt;
 	struct mm_struct *old_mm;
 	int retval = 0;
 
 	mutex_init(&mm->context.lock);
-	mm->context.size = 0;
 	old_mm = current->mm;
-	if (old_mm && old_mm->context.size > 0) {
-		mutex_lock(&old_mm->context.lock);
-		retval = copy_ldt(&mm->context, &old_mm->context);
-		mutex_unlock(&old_mm->context.lock);
+	if (!old_mm) {
+		mm->context.ldt = NULL;
+		return 0;
 	}
+
+	mutex_lock(&old_mm->context.lock);
+	if (!old_mm->context.ldt) {
+		mm->context.ldt = NULL;
+		goto out_unlock;
+	}
+
+	new_ldt = alloc_ldt_struct(old_mm->context.ldt->size);
+	if (!new_ldt) {
+		retval = -ENOMEM;
+		goto out_unlock;
+	}
+
+	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
+	       new_ldt->size * LDT_ENTRY_SIZE);
+	finalize_ldt_struct(new_ldt);
+
+	mm->context.ldt = new_ldt;
+
+out_unlock:
+	mutex_unlock(&old_mm->context.lock);
 	return retval;
 }
 
@@ -125,53 +146,47 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
  */
 void destroy_context(struct mm_struct *mm)
 {
-	if (mm->context.size) {
-#ifdef CONFIG_X86_32
-		/* CHECKME: Can this ever happen ? */
-		if (mm == current->active_mm)
-			clear_LDT();
-#endif
-		paravirt_free_ldt(mm->context.ldt, mm->context.size);
-		if (mm->context.size * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(mm->context.ldt);
-		else
-			put_page(virt_to_page(mm->context.ldt));
-		mm->context.size = 0;
-	}
+	free_ldt_struct(mm->context.ldt);
+	mm->context.ldt = NULL;
 }
 
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
-	int err;
+	int retval;
 	unsigned long size;
 	struct mm_struct *mm = current->mm;
 
-	if (!mm->context.size)
-		return 0;
+	mutex_lock(&mm->context.lock);
+
+	if (!mm->context.ldt) {
+		retval = 0;
+		goto out_unlock;
+	}
+
 	if (bytecount > LDT_ENTRY_SIZE * LDT_ENTRIES)
 		bytecount = LDT_ENTRY_SIZE * LDT_ENTRIES;
 
-	mutex_lock(&mm->context.lock);
-	size = mm->context.size * LDT_ENTRY_SIZE;
+	size = mm->context.ldt->size * LDT_ENTRY_SIZE;
 	if (size > bytecount)
 		size = bytecount;
 
-	err = 0;
-	if (copy_to_user(ptr, mm->context.ldt, size))
-		err = -EFAULT;
-	mutex_unlock(&mm->context.lock);
-	if (err < 0)
-		goto error_return;
+	if (copy_to_user(ptr, mm->context.ldt->entries, size)) {
+		retval = -EFAULT;
+		goto out_unlock;
+	}
+
 	if (size != bytecount) {
-		/* zero-fill the rest */
-		if (clear_user(ptr + size, bytecount - size) != 0) {
-			err = -EFAULT;
-			goto error_return;
+		/* Zero-fill the rest and pretend we read bytecount bytes. */
+		if (clear_user(ptr + size, bytecount - size)) {
+			retval = -EFAULT;
+			goto out_unlock;
 		}
 	}
-	return bytecount;
-error_return:
-	return err;
+	retval = bytecount;
+
+out_unlock:
+	mutex_unlock(&mm->context.lock);
+	return retval;
 }
 
 static int read_default_ldt(void __user *ptr, unsigned long bytecount)
@@ -195,6 +210,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	struct desc_struct ldt;
 	int error;
 	struct user_desc ldt_info;
+	int oldsize, newsize;
+	struct ldt_struct *new_ldt, *old_ldt;
 
 	error = -EINVAL;
 	if (bytecount != sizeof(ldt_info))
@@ -213,34 +230,39 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 			goto out;
 	}
 
-	mutex_lock(&mm->context.lock);
-	if (ldt_info.entry_number >= mm->context.size) {
-		error = alloc_ldt(&current->mm->context,
-				  ldt_info.entry_number + 1, 1);
-		if (error < 0)
-			goto out_unlock;
-	}
-
-	/* Allow LDTs to be cleared by the user. */
-	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
-		if (oldmode || LDT_empty(&ldt_info)) {
-			memset(&ldt, 0, sizeof(ldt));
-			goto install;
+	if ((oldmode && !ldt_info.base_addr && !ldt_info.limit) ||
+	    LDT_empty(&ldt_info)) {
+		/* The user wants to clear the entry. */
+		memset(&ldt, 0, sizeof(ldt));
+	} else {
+		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+			error = -EINVAL;
+			goto out;
 		}
+
+		fill_ldt(&ldt, &ldt_info);
+		if (oldmode)
+			ldt.avl = 0;
 	}
 
-	if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
-		error = -EINVAL;
+	mutex_lock(&mm->context.lock);
+
+	old_ldt = mm->context.ldt;
+	oldsize = old_ldt ? old_ldt->size : 0;
+	newsize = max((int)(ldt_info.entry_number + 1), oldsize);
+
+	error = -ENOMEM;
+	new_ldt = alloc_ldt_struct(newsize);
+	if (!new_ldt)
 		goto out_unlock;
-	}
 
-	fill_ldt(&ldt, &ldt_info);
-	if (oldmode)
-		ldt.avl = 0;
+	if (old_ldt)
+		memcpy(new_ldt->entries, old_ldt->entries, oldsize * LDT_ENTRY_SIZE);
+	new_ldt->entries[ldt_info.entry_number] = ldt;
+	finalize_ldt_struct(new_ldt);
 
-	/* Install the new entry ...  */
-install:
-	write_ldt_entry(mm->context.ldt, ldt_info.entry_number, &ldt);
+	install_ldt(mm, new_ldt);
+	free_ldt_struct(old_ldt);
 	error = 0;
 
 out_unlock:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 71d7849..f6b9163 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,11 +121,11 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
-		if (dead_task->mm->context.size) {
+		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
 				dead_task->mm->context.ldt,
-				dead_task->mm->context.size);
+				dead_task->mm->context.ldt->size);
 			BUG();
 		}
 	}
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 9b4d51d..6273324 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/ptrace.h>
 #include <asm/desc.h>
+#include <asm/mmu_context.h>
 
 unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs)
 {
@@ -30,10 +31,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		seg &= ~7UL;
 
 		mutex_lock(&child->mm->context.lock);
-		if (unlikely((seg >> 3) >= child->mm->context.size))
+		if (unlikely(!child->mm->context.ldt ||
+			     (seg >> 3) >= child->mm->context.ldt->size))
 			addr = -1L; /* bogus selector, access would fault */
 		else {
-			desc = child->mm->context.ldt + seg;
+			desc = &child->mm->context.ldt->entries[seg];
 			base = get_desc_base(desc);
 
 			/* 16-bit code segment? */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 0d7dd1f..9ab5279 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -22,6 +22,7 @@
 #include <asm/fpu/internal.h>
 #include <asm/debugreg.h>
 #include <asm/cpu.h>
+#include <asm/mmu_context.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -153,7 +154,7 @@ static void fix_processor_context(void)
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif
 	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->active_mm->context);	/* This does lldt */
+	load_mm_ldt(current->active_mm);	/* This does lldt */
 
 	fpu__resume_cpu();
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] x86/ldt: Make modify_ldt synchronous
  2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
                     ` (2 preceding siblings ...)
  2015-07-31 13:56   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
@ 2015-07-31 13:56   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 13:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: security, xen-devel, dvlasenk, luto, brgerst, boris.ostrovsky,
	andrew.cooper3, linux-kernel, rostedt, luto, peterz, bp, stable,
	jbeulich, hpa, sasha.levin, tglx, bp, torvalds, mingo

Commit-ID:  37868fe113ff2ba814b3b4eb12df214df555f8dc
Gitweb:     http://git.kernel.org/tip/37868fe113ff2ba814b3b4eb12df214df555f8dc
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:32 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 10:23:23 +0200

x86/ldt: Make modify_ldt synchronous

modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/desc.h        |  15 ---
 arch/x86/include/asm/mmu.h         |   3 +-
 arch/x86/include/asm/mmu_context.h |  54 +++++++-
 arch/x86/kernel/cpu/common.c       |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c              | 262 ++++++++++++++++++++-----------------
 arch/x86/kernel/process_64.c       |   4 +-
 arch/x86/kernel/step.c             |   6 +-
 arch/x86/power/cpu.c               |   3 +-
 9 files changed, 210 insertions(+), 153 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89f..4e10d73 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
 	set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-	set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-	preempt_disable();
-	load_LDT_nolock(pc);
-	preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
 	return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620..364d274 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-	void *ldt;
-	int size;
+	struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 804a3a6..984abfe 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,50 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+	/*
+	 * Xen requires page-aligned LDTs with special permissions.  This is
+	 * needed to prevent us from installing evil descriptors such as
+	 * call gates.  On native, we could merge the ldt_struct and LDT
+	 * allocations, but it's not worth trying to optimize.
+	 */
+	struct desc_struct *entries;
+	int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+	struct ldt_struct *ldt;
+
+	/* lockless_dereference synchronizes with smp_store_release */
+	ldt = lockless_dereference(mm->context.ldt);
+
+	/*
+	 * Any change to mm->context.ldt is followed by an IPI to all
+	 * CPUs with the mm active.  The LDT will not be freed until
+	 * after the IPI is handled by all such CPUs.  This means that,
+	 * if the ldt_struct changes before we return, the values we see
+	 * will be safe, and the new values will be loaded before we run
+	 * any user code.
+	 *
+	 * NB: don't try to convert this to use RCU without extreme care.
+	 * We would still need IRQs off, because we don't want to change
+	 * the local LDT after an IPI loaded a newer value than the one
+	 * that we can see.
+	 */
+
+	if (unlikely(ldt))
+		set_ldt(ldt->entries, ldt->size);
+	else
+		clear_LDT();
+
+	DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +122,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * was called and then modify_ldt changed
 		 * prev->context.ldt but suppressed an IPI to this CPU.
 		 * In this case, prev->context.ldt != NULL, because we
-		 * never free an LDT while the mm still exists.  That
-		 * means that next->context.ldt != prev->context.ldt,
-		 * because mms never share an LDT.
+		 * never set context.ldt to NULL while the mm still
+		 * exists.  That means that next->context.ldt !=
+		 * prev->context.ldt, because mms never share an LDT.
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 	}
 #ifdef CONFIG_SMP
 	  else {
@@ -106,7 +150,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_mm_cr4(next);
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 		}
 	}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 922c5e0..cb9e5df 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1410,7 +1410,7 @@ void cpu_init(void)
 	load_sp0(t, &current->thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
 	dbg_restore_debug_regs();
@@ -1459,7 +1459,7 @@ void cpu_init(void)
 	load_sp0(t, thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3658de4..9469dfa 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,21 +2179,25 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		struct ldt_struct *ldt;
+
 		if (idx > LDT_ENTRIES)
 			return 0;
 
-		if (idx > current->active_mm->context.size)
+		/* IRQs are off, so this synchronizes with smp_store_release */
+		ldt = lockless_dereference(current->active_mm->context.ldt);
+		if (!ldt || idx > ldt->size)
 			return 0;
 
-		desc = current->active_mm->context.ldt;
+		desc = &ldt->entries[idx];
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
 
-		desc = raw_cpu_ptr(gdt_page.gdt);
+		desc = raw_cpu_ptr(gdt_page.gdt) + idx;
 	}
 
-	return get_desc_base(desc + idx);
+	return get_desc_base(desc);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c37886d..2bcc052 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -12,6 +12,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 
@@ -20,82 +21,82 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
-#ifdef CONFIG_SMP
+/* context.lock is held for us, so we don't need any locking. */
 static void flush_ldt(void *current_mm)
 {
-	if (current->active_mm == current_mm)
-		load_LDT(&current->active_mm->context);
+	mm_context_t *pc;
+
+	if (current->active_mm != current_mm)
+		return;
+
+	pc = &current->active_mm->context;
+	set_ldt(pc->ldt->entries, pc->ldt->size);
 }
-#endif
 
-static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
+/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
+static struct ldt_struct *alloc_ldt_struct(int size)
 {
-	void *oldldt, *newldt;
-	int oldsize;
-
-	if (mincount <= pc->size)
-		return 0;
-	oldsize = pc->size;
-	mincount = (mincount + (PAGE_SIZE / LDT_ENTRY_SIZE - 1)) &
-			(~(PAGE_SIZE / LDT_ENTRY_SIZE - 1));
-	if (mincount * LDT_ENTRY_SIZE > PAGE_SIZE)
-		newldt = vmalloc(mincount * LDT_ENTRY_SIZE);
+	struct ldt_struct *new_ldt;
+	int alloc_size;
+
+	if (size > LDT_ENTRIES)
+		return NULL;
+
+	new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
+	if (!new_ldt)
+		return NULL;
+
+	BUILD_BUG_ON(LDT_ENTRY_SIZE != sizeof(struct desc_struct));
+	alloc_size = size * LDT_ENTRY_SIZE;
+
+	/*
+	 * Xen is very picky: it requires a page-aligned LDT that has no
+	 * trailing nonzero bytes in any page that contains LDT descriptors.
+	 * Keep it simple: zero the whole allocation and never allocate less
+	 * than PAGE_SIZE.
+	 */
+	if (alloc_size > PAGE_SIZE)
+		new_ldt->entries = vzalloc(alloc_size);
 	else
-		newldt = (void *)__get_free_page(GFP_KERNEL);
-
-	if (!newldt)
-		return -ENOMEM;
+		new_ldt->entries = kzalloc(PAGE_SIZE, GFP_KERNEL);
 
-	if (oldsize)
-		memcpy(newldt, pc->ldt, oldsize * LDT_ENTRY_SIZE);
-	oldldt = pc->ldt;
-	memset(newldt + oldsize * LDT_ENTRY_SIZE, 0,
-	       (mincount - oldsize) * LDT_ENTRY_SIZE);
+	if (!new_ldt->entries) {
+		kfree(new_ldt);
+		return NULL;
+	}
 
-	paravirt_alloc_ldt(newldt, mincount);
+	new_ldt->size = size;
+	return new_ldt;
+}
 
-#ifdef CONFIG_X86_64
-	/* CHECKME: Do we really need this ? */
-	wmb();
-#endif
-	pc->ldt = newldt;
-	wmb();
-	pc->size = mincount;
-	wmb();
-
-	if (reload) {
-#ifdef CONFIG_SMP
-		preempt_disable();
-		load_LDT(pc);
-		if (!cpumask_equal(mm_cpumask(current->mm),
-				   cpumask_of(smp_processor_id())))
-			smp_call_function(flush_ldt, current->mm, 1);
-		preempt_enable();
-#else
-		load_LDT(pc);
-#endif
-	}
-	if (oldsize) {
-		paravirt_free_ldt(oldldt, oldsize);
-		if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(oldldt);
-		else
-			put_page(virt_to_page(oldldt));
-	}
-	return 0;
+/* After calling this, the LDT is immutable. */
+static void finalize_ldt_struct(struct ldt_struct *ldt)
+{
+	paravirt_alloc_ldt(ldt->entries, ldt->size);
 }
 
-static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
+/* context.lock is held */
+static void install_ldt(struct mm_struct *current_mm,
+			struct ldt_struct *ldt)
 {
-	int err = alloc_ldt(new, old->size, 0);
-	int i;
+	/* Synchronizes with lockless_dereference in load_mm_ldt. */
+	smp_store_release(&current_mm->context.ldt, ldt);
+
+	/* Activate the LDT for all CPUs using current_mm. */
+	on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+}
 
-	if (err < 0)
-		return err;
+static void free_ldt_struct(struct ldt_struct *ldt)
+{
+	if (likely(!ldt))
+		return;
 
-	for (i = 0; i < old->size; i++)
-		write_ldt_entry(new->ldt, i, old->ldt + i * LDT_ENTRY_SIZE);
-	return 0;
+	paravirt_free_ldt(ldt->entries, ldt->size);
+	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+		vfree(ldt->entries);
+	else
+		kfree(ldt->entries);
+	kfree(ldt);
 }
 
 /*
@@ -104,17 +105,37 @@ static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+	struct ldt_struct *new_ldt;
 	struct mm_struct *old_mm;
 	int retval = 0;
 
 	mutex_init(&mm->context.lock);
-	mm->context.size = 0;
 	old_mm = current->mm;
-	if (old_mm && old_mm->context.size > 0) {
-		mutex_lock(&old_mm->context.lock);
-		retval = copy_ldt(&mm->context, &old_mm->context);
-		mutex_unlock(&old_mm->context.lock);
+	if (!old_mm) {
+		mm->context.ldt = NULL;
+		return 0;
 	}
+
+	mutex_lock(&old_mm->context.lock);
+	if (!old_mm->context.ldt) {
+		mm->context.ldt = NULL;
+		goto out_unlock;
+	}
+
+	new_ldt = alloc_ldt_struct(old_mm->context.ldt->size);
+	if (!new_ldt) {
+		retval = -ENOMEM;
+		goto out_unlock;
+	}
+
+	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
+	       new_ldt->size * LDT_ENTRY_SIZE);
+	finalize_ldt_struct(new_ldt);
+
+	mm->context.ldt = new_ldt;
+
+out_unlock:
+	mutex_unlock(&old_mm->context.lock);
 	return retval;
 }
 
@@ -125,53 +146,47 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
  */
 void destroy_context(struct mm_struct *mm)
 {
-	if (mm->context.size) {
-#ifdef CONFIG_X86_32
-		/* CHECKME: Can this ever happen ? */
-		if (mm == current->active_mm)
-			clear_LDT();
-#endif
-		paravirt_free_ldt(mm->context.ldt, mm->context.size);
-		if (mm->context.size * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(mm->context.ldt);
-		else
-			put_page(virt_to_page(mm->context.ldt));
-		mm->context.size = 0;
-	}
+	free_ldt_struct(mm->context.ldt);
+	mm->context.ldt = NULL;
 }
 
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
-	int err;
+	int retval;
 	unsigned long size;
 	struct mm_struct *mm = current->mm;
 
-	if (!mm->context.size)
-		return 0;
+	mutex_lock(&mm->context.lock);
+
+	if (!mm->context.ldt) {
+		retval = 0;
+		goto out_unlock;
+	}
+
 	if (bytecount > LDT_ENTRY_SIZE * LDT_ENTRIES)
 		bytecount = LDT_ENTRY_SIZE * LDT_ENTRIES;
 
-	mutex_lock(&mm->context.lock);
-	size = mm->context.size * LDT_ENTRY_SIZE;
+	size = mm->context.ldt->size * LDT_ENTRY_SIZE;
 	if (size > bytecount)
 		size = bytecount;
 
-	err = 0;
-	if (copy_to_user(ptr, mm->context.ldt, size))
-		err = -EFAULT;
-	mutex_unlock(&mm->context.lock);
-	if (err < 0)
-		goto error_return;
+	if (copy_to_user(ptr, mm->context.ldt->entries, size)) {
+		retval = -EFAULT;
+		goto out_unlock;
+	}
+
 	if (size != bytecount) {
-		/* zero-fill the rest */
-		if (clear_user(ptr + size, bytecount - size) != 0) {
-			err = -EFAULT;
-			goto error_return;
+		/* Zero-fill the rest and pretend we read bytecount bytes. */
+		if (clear_user(ptr + size, bytecount - size)) {
+			retval = -EFAULT;
+			goto out_unlock;
 		}
 	}
-	return bytecount;
-error_return:
-	return err;
+	retval = bytecount;
+
+out_unlock:
+	mutex_unlock(&mm->context.lock);
+	return retval;
 }
 
 static int read_default_ldt(void __user *ptr, unsigned long bytecount)
@@ -195,6 +210,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	struct desc_struct ldt;
 	int error;
 	struct user_desc ldt_info;
+	int oldsize, newsize;
+	struct ldt_struct *new_ldt, *old_ldt;
 
 	error = -EINVAL;
 	if (bytecount != sizeof(ldt_info))
@@ -213,34 +230,39 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 			goto out;
 	}
 
-	mutex_lock(&mm->context.lock);
-	if (ldt_info.entry_number >= mm->context.size) {
-		error = alloc_ldt(&current->mm->context,
-				  ldt_info.entry_number + 1, 1);
-		if (error < 0)
-			goto out_unlock;
-	}
-
-	/* Allow LDTs to be cleared by the user. */
-	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
-		if (oldmode || LDT_empty(&ldt_info)) {
-			memset(&ldt, 0, sizeof(ldt));
-			goto install;
+	if ((oldmode && !ldt_info.base_addr && !ldt_info.limit) ||
+	    LDT_empty(&ldt_info)) {
+		/* The user wants to clear the entry. */
+		memset(&ldt, 0, sizeof(ldt));
+	} else {
+		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+			error = -EINVAL;
+			goto out;
 		}
+
+		fill_ldt(&ldt, &ldt_info);
+		if (oldmode)
+			ldt.avl = 0;
 	}
 
-	if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
-		error = -EINVAL;
+	mutex_lock(&mm->context.lock);
+
+	old_ldt = mm->context.ldt;
+	oldsize = old_ldt ? old_ldt->size : 0;
+	newsize = max((int)(ldt_info.entry_number + 1), oldsize);
+
+	error = -ENOMEM;
+	new_ldt = alloc_ldt_struct(newsize);
+	if (!new_ldt)
 		goto out_unlock;
-	}
 
-	fill_ldt(&ldt, &ldt_info);
-	if (oldmode)
-		ldt.avl = 0;
+	if (old_ldt)
+		memcpy(new_ldt->entries, old_ldt->entries, oldsize * LDT_ENTRY_SIZE);
+	new_ldt->entries[ldt_info.entry_number] = ldt;
+	finalize_ldt_struct(new_ldt);
 
-	/* Install the new entry ...  */
-install:
-	write_ldt_entry(mm->context.ldt, ldt_info.entry_number, &ldt);
+	install_ldt(mm, new_ldt);
+	free_ldt_struct(old_ldt);
 	error = 0;
 
 out_unlock:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 71d7849..f6b9163 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,11 +121,11 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
-		if (dead_task->mm->context.size) {
+		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
 				dead_task->mm->context.ldt,
-				dead_task->mm->context.size);
+				dead_task->mm->context.ldt->size);
 			BUG();
 		}
 	}
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 9b4d51d..6273324 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/ptrace.h>
 #include <asm/desc.h>
+#include <asm/mmu_context.h>
 
 unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs)
 {
@@ -30,10 +31,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		seg &= ~7UL;
 
 		mutex_lock(&child->mm->context.lock);
-		if (unlikely((seg >> 3) >= child->mm->context.size))
+		if (unlikely(!child->mm->context.ldt ||
+			     (seg >> 3) >= child->mm->context.ldt->size))
 			addr = -1L; /* bogus selector, access would fault */
 		else {
-			desc = child->mm->context.ldt + seg;
+			desc = &child->mm->context.ldt->entries[seg];
 			base = get_desc_base(desc);
 
 			/* 16-bit code segment? */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 0d7dd1f..9ab5279 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -22,6 +22,7 @@
 #include <asm/fpu/internal.h>
 #include <asm/debugreg.h>
 #include <asm/cpu.h>
+#include <asm/mmu_context.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -153,7 +154,7 @@ static void fix_processor_context(void)
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif
 	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->active_mm->context);	/* This does lldt */
+	load_mm_ldt(current->active_mm);	/* This does lldt */
 
 	fpu__resume_cpu();
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] selftests/x86, x86/ldt: Add a selftest for modify_ldt()
  2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
@ 2015-07-31 13:57   ` tip-bot for Andy Lutomirski
  2015-07-31 13:57   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 13:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, sasha.levin, luto, bp, xen-devel, konrad.wilk,
	keescook, andrew.cooper3, peterz, boris.ostrovsky, rostedt, luto,
	jbeulich, tglx, brgerst, security, dvlasenk, torvalds,
	linux-kernel

Commit-ID:  014dc90b66c8d0b5f5a9400440727c134ee5e5a3
Gitweb:     http://git.kernel.org/tip/014dc90b66c8d0b5f5a9400440727c134ee5e5a3
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:33 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 10:24:37 +0200

selftests/x86, x86/ldt: Add a selftest for modify_ldt()

This tests general modify_ldt() behavior (only writes, so far) as
well as synchronous updates via IPI.  It fails on old kernels.

I called this ldt_gdt because I'll add set_thread_area() tests to
it at some point.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/dcfda65dad07ff5a3ea97a9172b5963bf8031b2e.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/ldt_gdt.c | 576 ++++++++++++++++++++++++++++++++++
 2 files changed, 577 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index e8df47e..b70da4a 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,7 +4,7 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs
+TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs ldt_gdt
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c
new file mode 100644
index 0000000..31a3035
--- /dev/null
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -0,0 +1,576 @@
+/*
+ * ldt_gdt.c - Test cases for LDT and GDT access
+ * Copyright (c) 2015 Andrew Lutomirski
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <asm/ldt.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <sched.h>
+#include <linux/futex.h>
+
+#define AR_ACCESSED		(1<<8)
+
+#define AR_TYPE_RODATA		(0 * (1<<9))
+#define AR_TYPE_RWDATA		(1 * (1<<9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1<<9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1<<9))
+#define AR_TYPE_XOCODE		(4 * (1<<9))
+#define AR_TYPE_XRCODE		(5 * (1<<9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1<<9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1<<9))
+
+#define AR_DPL3			(3 * (1<<13))
+
+#define AR_S			(1 << 12)
+#define AR_P			(1 << 15)
+#define AR_AVL			(1 << 20)
+#define AR_L			(1 << 21)
+#define AR_DB			(1 << 22)
+#define AR_G			(1 << 23)
+
+static int nerrs;
+
+static void check_invalid_segment(uint16_t index, int ldt)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (has_limit || has_ar) {
+		printf("[FAIL]\t%s entry %hu is valid but should be invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+	} else {
+		printf("[OK]\t%s entry %hu is invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+	}
+}
+
+static void check_valid_segment(uint16_t index, int ldt,
+				uint32_t expected_ar, uint32_t expected_limit,
+				bool verbose)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (!has_limit || !has_ar) {
+		printf("[FAIL]\t%s entry %hu is invalid but should be valid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+		return;
+	}
+
+	if (ar != expected_ar) {
+		printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, expected_ar);
+		nerrs++;
+	} else if (limit != expected_limit) {
+		printf("[FAIL]\t%s entry %hu has limit 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, limit, expected_limit);
+		nerrs++;
+	} else if (verbose) {
+		printf("[OK]\t%s entry %hu has AR 0x%08X and limit 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, limit);
+	}
+}
+
+static bool install_valid_mode(const struct user_desc *desc, uint32_t ar,
+			       bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		uint32_t limit = desc->limit;
+		if (desc->limit_in_pages)
+			limit = (limit << 12) + 4095;
+		check_valid_segment(desc->entry_number, 1, ar, limit, true);
+		return true;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+		return false;
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+			return false;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+			return false;
+		}
+	}
+}
+
+static bool install_valid(const struct user_desc *desc, uint32_t ar)
+{
+	return install_valid_mode(desc, ar, false);
+}
+
+static void install_invalid(const struct user_desc *desc, bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		check_invalid_segment(desc->entry_number, 1);
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+		}
+	}
+}
+
+static int safe_modify_ldt(int func, struct user_desc *ptr,
+			   unsigned long bytecount)
+{
+	int ret = syscall(SYS_modify_ldt, 0x11, ptr, bytecount);
+	if (ret < -1)
+		errno = -ret;
+	return ret;
+}
+
+static void fail_install(struct user_desc *desc)
+{
+	if (safe_modify_ldt(0x11, desc, sizeof(*desc)) == 0) {
+		printf("[FAIL]\tmodify_ldt accepted a bad descriptor\n");
+		nerrs++;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		printf("[OK]\tmodify_ldt failure %d\n", errno);
+	}
+}
+
+static void do_simple_tests(void)
+{
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 10,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	desc.limit_in_pages = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.entry_number = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.base_addr = 0xf0000000;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	desc.useable = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_32bit = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_G | AR_AVL);
+
+	desc.seg_32bit = 1;
+	desc.contents = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.contents = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA_EXPDOWN |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.limit_in_pages = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA_EXPDOWN |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.contents = 3;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.contents = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+
+#ifdef __x86_64__
+	desc.lm = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+		      AR_S | AR_DB | AR_AVL);
+	desc.lm = 0;
+#endif
+
+	bool entry1_okay = install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+					 AR_S | AR_DB | AR_AVL);
+
+	if (entry1_okay) {
+		printf("[RUN]\tTest fork\n");
+		pid_t child = fork();
+		if (child == 0) {
+			nerrs = 0;
+			check_valid_segment(desc.entry_number, 1,
+					    AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, desc.limit,
+					    true);
+			check_invalid_segment(1, 1);
+			exit(nerrs ? 1 : 0);
+		} else {
+			int status;
+			if (waitpid(child, &status, 0) != child ||
+			    !WIFEXITED(status)) {
+				printf("[FAIL]\tChild died\n");
+				nerrs++;
+			} else if (WEXITSTATUS(status) != 0) {
+				printf("[FAIL]\tChild failed\n");
+				nerrs++;
+			} else {
+				printf("[OK]\tChild succeeded\n");
+			}
+		}
+
+		printf("[RUN]\tTest size\n");
+		int i;
+		for (i = 0; i < 8192; i++) {
+			desc.entry_number = i;
+			desc.limit = i;
+			if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+				printf("[FAIL]\tFailed to install entry %d\n", i);
+				nerrs++;
+				break;
+			}
+		}
+		for (int j = 0; j < i; j++) {
+			check_valid_segment(j, 1, AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, j, false);
+		}
+		printf("[DONE]\tSize test\n");
+	} else {
+		printf("[SKIP]\tSkipping fork and size tests because we have no LDT\n");
+	}
+
+	/* Test entry_number too high. */
+	desc.entry_number = 8192;
+	fail_install(&desc);
+
+	/* Test deletion and actions mistakeable for deletion. */
+	memset(&desc, 0, sizeof(desc));
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S | AR_P);
+
+	desc.read_exec_only = 0;
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.read_exec_only = 1;
+	desc.limit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.limit = 0;
+	desc.base_addr = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.base_addr = 0;
+	install_invalid(&desc, false);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 0;
+	desc.seg_32bit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P | AR_DB);
+	install_invalid(&desc, true);
+}
+
+/*
+ * 0: thread is idle
+ * 1: thread armed
+ * 2: thread should clear LDT entry 0
+ * 3: thread should exit
+ */
+static volatile unsigned int ftx;
+
+static void *threadproc(void *ctx)
+{
+	cpu_set_t cpuset;
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 1");	/* should never fail */
+
+	while (1) {
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 0, NULL, NULL, 0);
+		while (ftx != 2) {
+			if (ftx >= 3)
+				return NULL;
+		}
+
+		/* clear LDT entry 0 */
+		const struct user_desc desc = {};
+		if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0)
+			err(1, "modify_ldt");
+
+		/* If ftx == 2, set it to zero.  If ftx == 100, quit. */
+		unsigned int x = -2;
+		asm volatile ("lock xaddl %[x], %[ftx]" :
+			      [x] "+r" (x), [ftx] "+m" (ftx));
+		if (x != 2)
+			return NULL;
+	}
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+
+}
+
+static jmp_buf jmpbuf;
+
+static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static void do_multicpu_tests(void)
+{
+	cpu_set_t cpuset;
+	pthread_t thread;
+	int failures = 0, iters = 5, i;
+	unsigned short orig_ss;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 1\n");
+		return;
+	}
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 0\n");
+		return;
+	}
+
+	sethandler(SIGSEGV, sigsegv, 0);
+#ifdef __i386__
+	/* True 32-bit kernels send SIGILL instead of SIGSEGV on IRET faults. */
+	sethandler(SIGILL, sigsegv, 0);
+#endif
+
+	printf("[RUN]\tCross-CPU LDT invalidation\n");
+
+	if (pthread_create(&thread, 0, threadproc, 0) != 0)
+		err(1, "pthread_create");
+
+	asm volatile ("mov %%ss, %0" : "=rm" (orig_ss));
+
+	for (i = 0; i < 5; i++) {
+		if (sigsetjmp(jmpbuf, 1) != 0)
+			continue;
+
+		/* Make sure the thread is ready after the last test. */
+		while (ftx != 0)
+			;
+
+		struct user_desc desc = {
+			.entry_number    = 0,
+			.base_addr       = 0,
+			.limit           = 0xfffff,
+			.seg_32bit       = 1,
+			.contents        = 0, /* Data */
+			.read_exec_only  = 0,
+			.limit_in_pages  = 1,
+			.seg_not_present = 0,
+			.useable         = 0
+		};
+
+		if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+			if (errno != ENOSYS)
+				err(1, "modify_ldt");
+			printf("[SKIP]\tmodify_ldt unavailable\n");
+			break;
+		}
+
+		/* Arm the thread. */
+		ftx = 1;
+		syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+		asm volatile ("mov %0, %%ss" : : "r" (0x7));
+
+		/* Go! */
+		ftx = 2;
+
+		while (ftx != 0)
+			;
+
+		/*
+		 * On success, modify_ldt will segfault us synchronously,
+		 * and we'll escape via siglongjmp.
+		 */
+
+		failures++;
+		asm volatile ("mov %0, %%ss" : : "rm" (orig_ss));
+	};
+
+	ftx = 100;  /* Kill the thread. */
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+	if (pthread_join(thread, NULL) != 0)
+		err(1, "pthread_join");
+
+	if (failures) {
+		printf("[FAIL]\t%d of %d iterations failed\n", failures, iters);
+		nerrs++;
+	} else {
+		printf("[OK]\tAll %d iterations succeeded\n", iters);
+	}
+}
+
+static int finish_exec_test(void)
+{
+	/*
+	 * In a sensible world, this would be check_invalid_segment(0, 1);
+	 * For better or for worse, though, the LDT is inherited across exec.
+	 * We can probably change this safely, but for now we test it.
+	 */
+	check_valid_segment(0, 1,
+			    AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB,
+			    42, true);
+
+	return nerrs ? 1 : 0;
+}
+
+static void do_exec_test(void)
+{
+	printf("[RUN]\tTest exec\n");
+
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 42,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	pid_t child = fork();
+	if (child == 0) {
+		execl("/proc/self/exe", "ldt_gdt_test_exec", NULL);
+		printf("[FAIL]\tCould not exec self\n");
+		exit(1);	/* exec failed */
+	} else {
+		int status;
+		if (waitpid(child, &status, 0) != child ||
+		    !WIFEXITED(status)) {
+			printf("[FAIL]\tChild died\n");
+			nerrs++;
+		} else if (WEXITSTATUS(status) != 0) {
+			printf("[FAIL]\tChild failed\n");
+			nerrs++;
+		} else {
+			printf("[OK]\tChild succeeded\n");
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	if (argc == 1 && !strcmp(argv[0], "ldt_gdt_test_exec"))
+		return finish_exec_test();
+
+	do_simple_tests();
+
+	do_multicpu_tests();
+
+	do_exec_test();
+
+	return nerrs ? 1 : 0;
+}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] selftests/x86, x86/ldt:  Add a selftest for modify_ldt()
  2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
  2015-07-31 13:57   ` [tip:x86/asm] selftests/x86, x86/ldt: Add a selftest for modify_ldt() tip-bot for Andy Lutomirski
@ 2015-07-31 13:57   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 13:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: security, tglx, keescook, brgerst, peterz, andrew.cooper3,
	linux-kernel, rostedt, xen-devel, jbeulich, bp, dvlasenk, luto,
	hpa, sasha.levin, boris.ostrovsky, luto, torvalds, mingo

Commit-ID:  014dc90b66c8d0b5f5a9400440727c134ee5e5a3
Gitweb:     http://git.kernel.org/tip/014dc90b66c8d0b5f5a9400440727c134ee5e5a3
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:33 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 10:24:37 +0200

selftests/x86, x86/ldt: Add a selftest for modify_ldt()

This tests general modify_ldt() behavior (only writes, so far) as
well as synchronous updates via IPI.  It fails on old kernels.

I called this ldt_gdt because I'll add set_thread_area() tests to
it at some point.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/dcfda65dad07ff5a3ea97a9172b5963bf8031b2e.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/ldt_gdt.c | 576 ++++++++++++++++++++++++++++++++++
 2 files changed, 577 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index e8df47e..b70da4a 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,7 +4,7 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs
+TARGETS_C_BOTHBITS := sigreturn single_step_syscall sysret_ss_attrs ldt_gdt
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c
new file mode 100644
index 0000000..31a3035
--- /dev/null
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -0,0 +1,576 @@
+/*
+ * ldt_gdt.c - Test cases for LDT and GDT access
+ * Copyright (c) 2015 Andrew Lutomirski
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <asm/ldt.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <sched.h>
+#include <linux/futex.h>
+
+#define AR_ACCESSED		(1<<8)
+
+#define AR_TYPE_RODATA		(0 * (1<<9))
+#define AR_TYPE_RWDATA		(1 * (1<<9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1<<9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1<<9))
+#define AR_TYPE_XOCODE		(4 * (1<<9))
+#define AR_TYPE_XRCODE		(5 * (1<<9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1<<9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1<<9))
+
+#define AR_DPL3			(3 * (1<<13))
+
+#define AR_S			(1 << 12)
+#define AR_P			(1 << 15)
+#define AR_AVL			(1 << 20)
+#define AR_L			(1 << 21)
+#define AR_DB			(1 << 22)
+#define AR_G			(1 << 23)
+
+static int nerrs;
+
+static void check_invalid_segment(uint16_t index, int ldt)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (has_limit || has_ar) {
+		printf("[FAIL]\t%s entry %hu is valid but should be invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+	} else {
+		printf("[OK]\t%s entry %hu is invalid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+	}
+}
+
+static void check_valid_segment(uint16_t index, int ldt,
+				uint32_t expected_ar, uint32_t expected_limit,
+				bool verbose)
+{
+	uint32_t has_limit = 0, has_ar = 0, limit, ar;
+	uint32_t selector = (index << 3) | (ldt << 2) | 3;
+
+	asm ("lsl %[selector], %[limit]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_limit]\n\t"
+	     "1:"
+	     : [limit] "=r" (limit), [has_limit] "+rm" (has_limit)
+	     : [selector] "r" (selector));
+	asm ("larl %[selector], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "movl $1, %[has_ar]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [has_ar] "+rm" (has_ar)
+	     : [selector] "r" (selector));
+
+	if (!has_limit || !has_ar) {
+		printf("[FAIL]\t%s entry %hu is invalid but should be valid\n",
+		       (ldt ? "LDT" : "GDT"), index);
+		nerrs++;
+		return;
+	}
+
+	if (ar != expected_ar) {
+		printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, expected_ar);
+		nerrs++;
+	} else if (limit != expected_limit) {
+		printf("[FAIL]\t%s entry %hu has limit 0x%08X but expected 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, limit, expected_limit);
+		nerrs++;
+	} else if (verbose) {
+		printf("[OK]\t%s entry %hu has AR 0x%08X and limit 0x%08X\n",
+		       (ldt ? "LDT" : "GDT"), index, ar, limit);
+	}
+}
+
+static bool install_valid_mode(const struct user_desc *desc, uint32_t ar,
+			       bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		uint32_t limit = desc->limit;
+		if (desc->limit_in_pages)
+			limit = (limit << 12) + 4095;
+		check_valid_segment(desc->entry_number, 1, ar, limit, true);
+		return true;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+		return false;
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+			return false;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+			return false;
+		}
+	}
+}
+
+static bool install_valid(const struct user_desc *desc, uint32_t ar)
+{
+	return install_valid_mode(desc, ar, false);
+}
+
+static void install_invalid(const struct user_desc *desc, bool oldmode)
+{
+	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+			  desc, sizeof(*desc));
+	if (ret < -1)
+		errno = -ret;
+	if (ret == 0) {
+		check_invalid_segment(desc->entry_number, 1);
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		if (desc->seg_32bit) {
+			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
+			       errno);
+			nerrs++;
+		} else {
+			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
+		}
+	}
+}
+
+static int safe_modify_ldt(int func, struct user_desc *ptr,
+			   unsigned long bytecount)
+{
+	int ret = syscall(SYS_modify_ldt, 0x11, ptr, bytecount);
+	if (ret < -1)
+		errno = -ret;
+	return ret;
+}
+
+static void fail_install(struct user_desc *desc)
+{
+	if (safe_modify_ldt(0x11, desc, sizeof(*desc)) == 0) {
+		printf("[FAIL]\tmodify_ldt accepted a bad descriptor\n");
+		nerrs++;
+	} else if (errno == ENOSYS) {
+		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+	} else {
+		printf("[OK]\tmodify_ldt failure %d\n", errno);
+	}
+}
+
+static void do_simple_tests(void)
+{
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 10,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	desc.limit_in_pages = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.entry_number = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	check_invalid_segment(1, 1);
+
+	desc.base_addr = 0xf0000000;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G);
+
+	desc.useable = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_P | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.seg_32bit = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_G | AR_AVL);
+
+	desc.seg_32bit = 1;
+	desc.contents = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.contents = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA_EXPDOWN |
+		      AR_S | AR_DB | AR_G | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.limit_in_pages = 0;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA_EXPDOWN |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.contents = 3;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE_CONF |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 0;
+	desc.contents = 2;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE |
+		      AR_S | AR_DB | AR_AVL);
+
+	desc.read_exec_only = 1;
+
+#ifdef __x86_64__
+	desc.lm = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+		      AR_S | AR_DB | AR_AVL);
+	desc.lm = 0;
+#endif
+
+	bool entry1_okay = install_valid(&desc, AR_DPL3 | AR_TYPE_XOCODE |
+					 AR_S | AR_DB | AR_AVL);
+
+	if (entry1_okay) {
+		printf("[RUN]\tTest fork\n");
+		pid_t child = fork();
+		if (child == 0) {
+			nerrs = 0;
+			check_valid_segment(desc.entry_number, 1,
+					    AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, desc.limit,
+					    true);
+			check_invalid_segment(1, 1);
+			exit(nerrs ? 1 : 0);
+		} else {
+			int status;
+			if (waitpid(child, &status, 0) != child ||
+			    !WIFEXITED(status)) {
+				printf("[FAIL]\tChild died\n");
+				nerrs++;
+			} else if (WEXITSTATUS(status) != 0) {
+				printf("[FAIL]\tChild failed\n");
+				nerrs++;
+			} else {
+				printf("[OK]\tChild succeeded\n");
+			}
+		}
+
+		printf("[RUN]\tTest size\n");
+		int i;
+		for (i = 0; i < 8192; i++) {
+			desc.entry_number = i;
+			desc.limit = i;
+			if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+				printf("[FAIL]\tFailed to install entry %d\n", i);
+				nerrs++;
+				break;
+			}
+		}
+		for (int j = 0; j < i; j++) {
+			check_valid_segment(j, 1, AR_DPL3 | AR_TYPE_XOCODE |
+					    AR_S | AR_DB | AR_AVL, j, false);
+		}
+		printf("[DONE]\tSize test\n");
+	} else {
+		printf("[SKIP]\tSkipping fork and size tests because we have no LDT\n");
+	}
+
+	/* Test entry_number too high. */
+	desc.entry_number = 8192;
+	fail_install(&desc);
+
+	/* Test deletion and actions mistakeable for deletion. */
+	memset(&desc, 0, sizeof(desc));
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P);
+
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S | AR_P);
+
+	desc.read_exec_only = 0;
+	desc.seg_not_present = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S);
+
+	desc.read_exec_only = 1;
+	desc.limit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.limit = 0;
+	desc.base_addr = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S);
+
+	desc.base_addr = 0;
+	install_invalid(&desc, false);
+
+	desc.seg_not_present = 0;
+	desc.read_exec_only = 0;
+	desc.seg_32bit = 1;
+	install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P | AR_DB);
+	install_invalid(&desc, true);
+}
+
+/*
+ * 0: thread is idle
+ * 1: thread armed
+ * 2: thread should clear LDT entry 0
+ * 3: thread should exit
+ */
+static volatile unsigned int ftx;
+
+static void *threadproc(void *ctx)
+{
+	cpu_set_t cpuset;
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 1");	/* should never fail */
+
+	while (1) {
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 0, NULL, NULL, 0);
+		while (ftx != 2) {
+			if (ftx >= 3)
+				return NULL;
+		}
+
+		/* clear LDT entry 0 */
+		const struct user_desc desc = {};
+		if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0)
+			err(1, "modify_ldt");
+
+		/* If ftx == 2, set it to zero.  If ftx == 100, quit. */
+		unsigned int x = -2;
+		asm volatile ("lock xaddl %[x], %[ftx]" :
+			      [x] "+r" (x), [ftx] "+m" (ftx));
+		if (x != 2)
+			return NULL;
+	}
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+
+}
+
+static jmp_buf jmpbuf;
+
+static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static void do_multicpu_tests(void)
+{
+	cpu_set_t cpuset;
+	pthread_t thread;
+	int failures = 0, iters = 5, i;
+	unsigned short orig_ss;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 1\n");
+		return;
+	}
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
+		printf("[SKIP]\tCannot set affinity to CPU 0\n");
+		return;
+	}
+
+	sethandler(SIGSEGV, sigsegv, 0);
+#ifdef __i386__
+	/* True 32-bit kernels send SIGILL instead of SIGSEGV on IRET faults. */
+	sethandler(SIGILL, sigsegv, 0);
+#endif
+
+	printf("[RUN]\tCross-CPU LDT invalidation\n");
+
+	if (pthread_create(&thread, 0, threadproc, 0) != 0)
+		err(1, "pthread_create");
+
+	asm volatile ("mov %%ss, %0" : "=rm" (orig_ss));
+
+	for (i = 0; i < 5; i++) {
+		if (sigsetjmp(jmpbuf, 1) != 0)
+			continue;
+
+		/* Make sure the thread is ready after the last test. */
+		while (ftx != 0)
+			;
+
+		struct user_desc desc = {
+			.entry_number    = 0,
+			.base_addr       = 0,
+			.limit           = 0xfffff,
+			.seg_32bit       = 1,
+			.contents        = 0, /* Data */
+			.read_exec_only  = 0,
+			.limit_in_pages  = 1,
+			.seg_not_present = 0,
+			.useable         = 0
+		};
+
+		if (safe_modify_ldt(0x11, &desc, sizeof(desc)) != 0) {
+			if (errno != ENOSYS)
+				err(1, "modify_ldt");
+			printf("[SKIP]\tmodify_ldt unavailable\n");
+			break;
+		}
+
+		/* Arm the thread. */
+		ftx = 1;
+		syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+		asm volatile ("mov %0, %%ss" : : "r" (0x7));
+
+		/* Go! */
+		ftx = 2;
+
+		while (ftx != 0)
+			;
+
+		/*
+		 * On success, modify_ldt will segfault us synchronously,
+		 * and we'll escape via siglongjmp.
+		 */
+
+		failures++;
+		asm volatile ("mov %0, %%ss" : : "rm" (orig_ss));
+	};
+
+	ftx = 100;  /* Kill the thread. */
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+
+	if (pthread_join(thread, NULL) != 0)
+		err(1, "pthread_join");
+
+	if (failures) {
+		printf("[FAIL]\t%d of %d iterations failed\n", failures, iters);
+		nerrs++;
+	} else {
+		printf("[OK]\tAll %d iterations succeeded\n", iters);
+	}
+}
+
+static int finish_exec_test(void)
+{
+	/*
+	 * In a sensible world, this would be check_invalid_segment(0, 1);
+	 * For better or for worse, though, the LDT is inherited across exec.
+	 * We can probably change this safely, but for now we test it.
+	 */
+	check_valid_segment(0, 1,
+			    AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB,
+			    42, true);
+
+	return nerrs ? 1 : 0;
+}
+
+static void do_exec_test(void)
+{
+	printf("[RUN]\tTest exec\n");
+
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = 0,
+		.limit           = 42,
+		.seg_32bit       = 1,
+		.contents        = 2, /* Code, not conforming */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	install_valid(&desc, AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB);
+
+	pid_t child = fork();
+	if (child == 0) {
+		execl("/proc/self/exe", "ldt_gdt_test_exec", NULL);
+		printf("[FAIL]\tCould not exec self\n");
+		exit(1);	/* exec failed */
+	} else {
+		int status;
+		if (waitpid(child, &status, 0) != child ||
+		    !WIFEXITED(status)) {
+			printf("[FAIL]\tChild died\n");
+			nerrs++;
+		} else if (WEXITSTATUS(status) != 0) {
+			printf("[FAIL]\tChild failed\n");
+			nerrs++;
+		} else {
+			printf("[OK]\tChild succeeded\n");
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	if (argc == 1 && !strcmp(argv[0], "ldt_gdt_test_exec"))
+		return finish_exec_test();
+
+	do_simple_tests();
+
+	do_multicpu_tests();
+
+	do_exec_test();
+
+	return nerrs ? 1 : 0;
+}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
  2015-07-31 13:44   ` Boris Ostrovsky
@ 2015-07-31 14:02       ` Andrew Cooper
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Cooper @ 2015-07-31 14:02 UTC (permalink / raw)
  To: Boris Ostrovsky, Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: Peter Zijlstra, Steven Rostedt, security, Sasha Levin,
	Konrad Rzeszutek Wilk, Jan Beulich, xen-devel

On 31/07/15 14:44, Boris Ostrovsky wrote:
> On 07/31/2015 05:10 AM, Andrew Cooper wrote:
>> On 30/07/15 22:31, Andy Lutomirski wrote:
>>> This is intended for x86/urgent.  Sorry for taking so long, but it
>>> seemed nice to avoid breaking Xen.
>> Very much appreciated.  Thanks!
>>
>>> This fixes the "dazed and confused" issue which was exposed by the
>>> CVE-2015-5157 fix.  It's also probably a good general attack surface
>>> reduction, and it replaces some scary code with IMO less scary code.
>>>
>>> Also, servers and embedded systems should probably turn off modify_ldt.
>>> This makes that possible.
>>>
>>> Xen people, can you test patch 1?  It works for me on my evil 32-bit
>>> Xen virtio setup.
>> So the LDT issue seems to have gone away, which is good.
>>
>> However, I did get this from my single vcpu guest test
>>
>> [OK]    LDT entry 0 is invalid
>> [SKIP]    Cannot set affinity to CPU 1
>> [RUN]    Test exec
>> [    3.638967] CPU 0 set the LDT
>> [OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
>> [    3.639380] ------------[ cut here ]------------
>> [    3.639389] WARNING: CPU: 0 PID: 383 at
>> /local/linux-mainline.git/arch/x86/include/asm/mmu_context.h:96
>> flush_old_exec+0x7fd/0xb70()
>> [    3.639397] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
>
> You must be running v5 (or earlier). This is fixed in v6 --- it is now
> 'DEBUG_LOCKS_WARN_ON(preemptible());'

Hmm - I definitely have the correct code, but did a complete clean and
rebuild, and the issue went away.  I presume I had something stale in
the build.

I am still seeing

[    5.496264] WARNING: CPU: 0 PID: 389 at
/local/linux-mainline.git/kernel/locking/lockdep.c:2639
trace_hardirqs_off_caller+0xa9/0xb0()
[    5.496272] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[    5.496276] CPU: 0 PID: 389 Comm: ldt_gdt_32 Not tainted 4.2.0-rc4+ #21

But that looks incidental, and unrelated to these fixes.

~Andrew

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option
@ 2015-07-31 14:02       ` Andrew Cooper
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Cooper @ 2015-07-31 14:02 UTC (permalink / raw)
  To: Boris Ostrovsky, Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel
  Cc: security, Peter Zijlstra, Steven Rostedt, xen-devel, Jan Beulich,
	Sasha Levin

On 31/07/15 14:44, Boris Ostrovsky wrote:
> On 07/31/2015 05:10 AM, Andrew Cooper wrote:
>> On 30/07/15 22:31, Andy Lutomirski wrote:
>>> This is intended for x86/urgent.  Sorry for taking so long, but it
>>> seemed nice to avoid breaking Xen.
>> Very much appreciated.  Thanks!
>>
>>> This fixes the "dazed and confused" issue which was exposed by the
>>> CVE-2015-5157 fix.  It's also probably a good general attack surface
>>> reduction, and it replaces some scary code with IMO less scary code.
>>>
>>> Also, servers and embedded systems should probably turn off modify_ldt.
>>> This makes that possible.
>>>
>>> Xen people, can you test patch 1?  It works for me on my evil 32-bit
>>> Xen virtio setup.
>> So the LDT issue seems to have gone away, which is good.
>>
>> However, I did get this from my single vcpu guest test
>>
>> [OK]    LDT entry 0 is invalid
>> [SKIP]    Cannot set affinity to CPU 1
>> [RUN]    Test exec
>> [    3.638967] CPU 0 set the LDT
>> [OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000002A
>> [    3.639380] ------------[ cut here ]------------
>> [    3.639389] WARNING: CPU: 0 PID: 383 at
>> /local/linux-mainline.git/arch/x86/include/asm/mmu_context.h:96
>> flush_old_exec+0x7fd/0xb70()
>> [    3.639397] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
>
> You must be running v5 (or earlier). This is fixed in v6 --- it is now
> 'DEBUG_LOCKS_WARN_ON(preemptible());'

Hmm - I definitely have the correct code, but did a complete clean and
rebuild, and the issue went away.  I presume I had something stale in
the build.

I am still seeing

[    5.496264] WARNING: CPU: 0 PID: 389 at
/local/linux-mainline.git/kernel/locking/lockdep.c:2639
trace_hardirqs_off_caller+0xa9/0xb0()
[    5.496272] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[    5.496276] CPU: 0 PID: 389 Comm: ldt_gdt_32 Not tainted 4.2.0-rc4+ #21

But that looks incidental, and unrelated to these fixes.

~Andrew

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] x86/ldt: Make modify_ldt() optional
  2015-07-30 21:31 ` Andy Lutomirski
                     ` (2 preceding siblings ...)
  2015-07-31 14:02   ` [tip:x86/asm] x86/ldt: Make modify_ldt() optional tip-bot for Andy Lutomirski
@ 2015-07-31 14:02   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 14:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, peterz, bp, mingo, hpa, luto, sasha.levin, keescook,
	andrew.cooper3, linux-kernel, security, tglx, torvalds, rostedt,
	xen-devel, dvlasenk, konrad.wilk, brgerst, jbeulich,
	boris.ostrovsky

Commit-ID:  a5b9e5a2f14f25a8dae987494d50ad3aac7366b6
Gitweb:     http://git.kernel.org/tip/a5b9e5a2f14f25a8dae987494d50ad3aac7366b6
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:34 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 13:30:45 +0200

x86/ldt: Make modify_ldt() optional

The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace.  Make it optional.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org
[ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Kconfig                   | 18 ++++++++++++++++++
 arch/x86/include/asm/mmu.h         |  2 ++
 arch/x86/include/asm/mmu_context.h | 28 +++++++++++++++++++++-------
 arch/x86/kernel/Makefile           |  3 ++-
 arch/x86/kernel/cpu/perf_event.c   |  4 ++++
 arch/x86/kernel/process_64.c       |  2 ++
 arch/x86/kernel/step.c             |  2 ++
 kernel/sys_ni.c                    |  1 +
 8 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e26fe7a..7986580 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1036,6 +1036,7 @@ config VM86
 config X86_16BIT
 	bool "Enable support for 16-bit segments" if EXPERT
 	default y
+	depends on MODIFY_LDT_SYSCALL
 	---help---
 	  This option is required by programs like Wine to run 16-bit
 	  protected mode legacy code on x86 processors.  Disabling
@@ -1530,6 +1531,7 @@ config X86_RESERVE_LOW
 
 config MATH_EMULATION
 	bool
+	depends on MODIFY_LDT_SYSCALL
 	prompt "Math emulation" if X86_32
 	---help---
 	  Linux can emulate a math coprocessor (used for floating point
@@ -2074,6 +2076,22 @@ config CMDLINE_OVERRIDE
 	  This is used to work around broken boot loaders.  This should
 	  be set to 'N' under normal conditions.
 
+config MODIFY_LDT_SYSCALL
+	bool "Enable the LDT (local descriptor table)" if EXPERT
+	default y
+	---help---
+	  Linux can allow user programs to install a per-process x86
+	  Local Descriptor Table (LDT) using the modify_ldt(2) system
+	  call.  This is required to run 16-bit or segmented code such as
+	  DOSEMU or some Wine programs.  It is also used by some very old
+	  threading libraries.
+
+	  Enabling this feature adds a small amount of overhead to
+	  context switches and increases the low-level kernel attack
+	  surface.  Disabling it removes the modify_ldt(2) system call.
+
+	  Saying 'N' here may make sense for embedded or server kernels.
+
 source "kernel/livepatch/Kconfig"
 
 endmenu
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 364d274..55234d5 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,7 +9,9 @@
  * we put the segment information here.
  */
 typedef struct {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
+#endif
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 984abfe..379cd36 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -33,6 +33,7 @@ static inline void load_mm_cr4(struct mm_struct *mm)
 static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 /*
  * ldt_structs can be allocated, used, and freed, but they are never
  * modified while live.
@@ -48,8 +49,23 @@ struct ldt_struct {
 	int size;
 };
 
+/*
+ * Used for LDT copy/destruction.
+ */
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+void destroy_context(struct mm_struct *mm);
+#else	/* CONFIG_MODIFY_LDT_SYSCALL */
+static inline int init_new_context(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	return 0;
+}
+static inline void destroy_context(struct mm_struct *mm) {}
+#endif
+
 static inline void load_mm_ldt(struct mm_struct *mm)
 {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
 
 	/* lockless_dereference synchronizes with smp_store_release */
@@ -73,17 +89,13 @@ static inline void load_mm_ldt(struct mm_struct *mm)
 		set_ldt(ldt->entries, ldt->size);
 	else
 		clear_LDT();
+#else
+	clear_LDT();
+#endif
 
 	DEBUG_LOCKS_WARN_ON(preemptible());
 }
 
-/*
- * Used for LDT copy/destruction.
- */
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void destroy_context(struct mm_struct *mm);
-
-
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
@@ -114,6 +126,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		/* Load per-mm CR4 state */
 		load_mm_cr4(next);
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		/*
 		 * Load the LDT, if the LDT is different.
 		 *
@@ -128,6 +141,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
 			load_mm_ldt(next);
+#endif
 	}
 #ifdef CONFIG_SMP
 	  else {
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index dc19730..5140648 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,7 +25,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
 obj-y			:= process_$(BITS).o signal.o
 obj-$(CONFIG_COMPAT)	+= signal_compat.o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
+obj-y			+= time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_MODIFY_LDT_SYSCALL)	+= ldt.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 09f9ff2..cc25c8f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,6 +2179,7 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
 		if (idx > LDT_ENTRIES)
@@ -2190,6 +2191,9 @@ static unsigned long get_segment_base(unsigned int segment)
 			return 0;
 
 		desc = &ldt->entries[idx];
+#else
+		return 0;
+#endif
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 7ff035c..3c1bbcf 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,6 +121,7 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
@@ -128,6 +129,7 @@ void release_thread(struct task_struct *dead_task)
 				dead_task->mm->context.ldt->size);
 			BUG();
 		}
+#endif
 	}
 }
 
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 6273324..fd88e15 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -18,6 +18,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		return addr;
 	}
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	/*
 	 * We'll assume that the code segments in the GDT
 	 * are all zero-based. That is largely true: the
@@ -45,6 +46,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		}
 		mutex_unlock(&child->mm->context.lock);
 	}
+#endif
 
 	return addr;
 }
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..ca7d84f 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -140,6 +140,7 @@ cond_syscall(sys_sgetmask);
 cond_syscall(sys_ssetmask);
 cond_syscall(sys_vm86old);
 cond_syscall(sys_vm86);
+cond_syscall(sys_modify_ldt);
 cond_syscall(sys_ipc);
 cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:x86/asm] x86/ldt: Make modify_ldt() optional
  2015-07-30 21:31 ` Andy Lutomirski
  2015-07-31 13:49   ` Ingo Molnar
  2015-07-31 13:49   ` Ingo Molnar
@ 2015-07-31 14:02   ` tip-bot for Andy Lutomirski
  2015-07-31 14:02   ` tip-bot for Andy Lutomirski
  3 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-07-31 14:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: boris.ostrovsky, security, dvlasenk, keescook, brgerst, peterz,
	andrew.cooper3, linux-kernel, rostedt, luto, jbeulich, bp, luto,
	hpa, sasha.levin, tglx, xen-devel, torvalds, mingo

Commit-ID:  a5b9e5a2f14f25a8dae987494d50ad3aac7366b6
Gitweb:     http://git.kernel.org/tip/a5b9e5a2f14f25a8dae987494d50ad3aac7366b6
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 30 Jul 2015 14:31:34 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 31 Jul 2015 13:30:45 +0200

x86/ldt: Make modify_ldt() optional

The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace.  Make it optional.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org
[ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Kconfig                   | 18 ++++++++++++++++++
 arch/x86/include/asm/mmu.h         |  2 ++
 arch/x86/include/asm/mmu_context.h | 28 +++++++++++++++++++++-------
 arch/x86/kernel/Makefile           |  3 ++-
 arch/x86/kernel/cpu/perf_event.c   |  4 ++++
 arch/x86/kernel/process_64.c       |  2 ++
 arch/x86/kernel/step.c             |  2 ++
 kernel/sys_ni.c                    |  1 +
 8 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e26fe7a..7986580 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1036,6 +1036,7 @@ config VM86
 config X86_16BIT
 	bool "Enable support for 16-bit segments" if EXPERT
 	default y
+	depends on MODIFY_LDT_SYSCALL
 	---help---
 	  This option is required by programs like Wine to run 16-bit
 	  protected mode legacy code on x86 processors.  Disabling
@@ -1530,6 +1531,7 @@ config X86_RESERVE_LOW
 
 config MATH_EMULATION
 	bool
+	depends on MODIFY_LDT_SYSCALL
 	prompt "Math emulation" if X86_32
 	---help---
 	  Linux can emulate a math coprocessor (used for floating point
@@ -2074,6 +2076,22 @@ config CMDLINE_OVERRIDE
 	  This is used to work around broken boot loaders.  This should
 	  be set to 'N' under normal conditions.
 
+config MODIFY_LDT_SYSCALL
+	bool "Enable the LDT (local descriptor table)" if EXPERT
+	default y
+	---help---
+	  Linux can allow user programs to install a per-process x86
+	  Local Descriptor Table (LDT) using the modify_ldt(2) system
+	  call.  This is required to run 16-bit or segmented code such as
+	  DOSEMU or some Wine programs.  It is also used by some very old
+	  threading libraries.
+
+	  Enabling this feature adds a small amount of overhead to
+	  context switches and increases the low-level kernel attack
+	  surface.  Disabling it removes the modify_ldt(2) system call.
+
+	  Saying 'N' here may make sense for embedded or server kernels.
+
 source "kernel/livepatch/Kconfig"
 
 endmenu
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 364d274..55234d5 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,7 +9,9 @@
  * we put the segment information here.
  */
 typedef struct {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
+#endif
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 984abfe..379cd36 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -33,6 +33,7 @@ static inline void load_mm_cr4(struct mm_struct *mm)
 static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 /*
  * ldt_structs can be allocated, used, and freed, but they are never
  * modified while live.
@@ -48,8 +49,23 @@ struct ldt_struct {
 	int size;
 };
 
+/*
+ * Used for LDT copy/destruction.
+ */
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+void destroy_context(struct mm_struct *mm);
+#else	/* CONFIG_MODIFY_LDT_SYSCALL */
+static inline int init_new_context(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	return 0;
+}
+static inline void destroy_context(struct mm_struct *mm) {}
+#endif
+
 static inline void load_mm_ldt(struct mm_struct *mm)
 {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
 
 	/* lockless_dereference synchronizes with smp_store_release */
@@ -73,17 +89,13 @@ static inline void load_mm_ldt(struct mm_struct *mm)
 		set_ldt(ldt->entries, ldt->size);
 	else
 		clear_LDT();
+#else
+	clear_LDT();
+#endif
 
 	DEBUG_LOCKS_WARN_ON(preemptible());
 }
 
-/*
- * Used for LDT copy/destruction.
- */
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void destroy_context(struct mm_struct *mm);
-
-
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
@@ -114,6 +126,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		/* Load per-mm CR4 state */
 		load_mm_cr4(next);
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		/*
 		 * Load the LDT, if the LDT is different.
 		 *
@@ -128,6 +141,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 */
 		if (unlikely(prev->context.ldt != next->context.ldt))
 			load_mm_ldt(next);
+#endif
 	}
 #ifdef CONFIG_SMP
 	  else {
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index dc19730..5140648 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,7 +25,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
 obj-y			:= process_$(BITS).o signal.o
 obj-$(CONFIG_COMPAT)	+= signal_compat.o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
+obj-y			+= time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_MODIFY_LDT_SYSCALL)	+= ldt.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 09f9ff2..cc25c8f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2179,6 +2179,7 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
 		if (idx > LDT_ENTRIES)
@@ -2190,6 +2191,9 @@ static unsigned long get_segment_base(unsigned int segment)
 			return 0;
 
 		desc = &ldt->entries[idx];
+#else
+		return 0;
+#endif
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 7ff035c..3c1bbcf 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,6 +121,7 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
@@ -128,6 +129,7 @@ void release_thread(struct task_struct *dead_task)
 				dead_task->mm->context.ldt->size);
 			BUG();
 		}
+#endif
 	}
 }
 
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 6273324..fd88e15 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -18,6 +18,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		return addr;
 	}
 
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
 	/*
 	 * We'll assume that the code segments in the GDT
 	 * are all zero-based. That is largely true: the
@@ -45,6 +46,7 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		}
 		mutex_unlock(&child->mm->context.lock);
 	}
+#endif
 
 	return addr;
 }
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..ca7d84f 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -140,6 +140,7 @@ cond_syscall(sys_sgetmask);
 cond_syscall(sys_ssetmask);
 cond_syscall(sys_vm86old);
 cond_syscall(sys_vm86);
+cond_syscall(sys_modify_ldt);
 cond_syscall(sys_ipc);
 cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-07-31 14:20 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-30 21:31 [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andy Lutomirski
2015-07-30 21:31 ` [PATCH v6 1/4] x86/xen: Probe target addresses in set_aliased_prot before the hypercall Andy Lutomirski
2015-07-30 21:31 ` Andy Lutomirski
2015-07-31  9:41   ` [Xen-devel] " David Vrabel
2015-07-31  9:41   ` David Vrabel
2015-07-31 13:56   ` [tip:x86/asm] x86/xen: Probe target addresses in set_aliased_prot () " tip-bot for Andy Lutomirski
2015-07-31 13:56   ` tip-bot for Andy Lutomirski
2015-07-30 21:31 ` [PATCH v6 2/4] x86/ldt: Make modify_ldt synchronous Andy Lutomirski
2015-07-30 22:50   ` Andrew Cooper
2015-07-31  3:15     ` Andy Lutomirski
2015-07-31  3:15     ` Andy Lutomirski
2015-07-30 22:50   ` Andrew Cooper
2015-07-31 13:56   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2015-07-31 13:56   ` tip-bot for Andy Lutomirski
2015-07-30 21:31 ` [PATCH v6 2/4] " Andy Lutomirski
2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
2015-07-31 13:57   ` [tip:x86/asm] selftests/x86, x86/ldt: Add a selftest for modify_ldt() tip-bot for Andy Lutomirski
2015-07-31 13:57   ` tip-bot for Andy Lutomirski
2015-07-30 21:31 ` [PATCH v6 3/4] selftests/x86, x86/ldt: Add a selftest for modify_ldt Andy Lutomirski
2015-07-30 21:31 ` [PATCH v6 4/4] x86/ldt: Make modify_ldt optional Andy Lutomirski
2015-07-30 21:31 ` Andy Lutomirski
2015-07-31 13:49   ` Ingo Molnar
2015-07-31 13:49   ` Ingo Molnar
2015-07-31 14:02   ` [tip:x86/asm] x86/ldt: Make modify_ldt() optional tip-bot for Andy Lutomirski
2015-07-31 14:02   ` tip-bot for Andy Lutomirski
2015-07-31  9:10 ` [PATCH v6 0/4] x86: modify_ldt improvement, test, and config option Andrew Cooper
2015-07-31 13:44   ` Boris Ostrovsky
2015-07-31 13:44   ` Boris Ostrovsky
2015-07-31 14:02     ` Andrew Cooper
2015-07-31 14:02       ` Andrew Cooper
2015-07-31  9:10 ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.