linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size
@ 2017-01-20 16:41 Thomas Garnier
  2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Thomas Garnier @ 2017-01-20 16:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Thomas Garnier,
	Kees Cook, Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu
  Cc: x86, linux-kernel, linux-pm, kvm, kernel-hardening

This patch aligns MODULES_END to the beginning of the Fixmap section.
It optimizes the space available for both sections. The address is
pre-computed based on the number of pages required by the Fixmap
section.

It will allow GDT remapping in the Fixmap section. The current
MODULES_END static address does not provide enough space for the kernel
to support a large number of processors.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170119
---
 arch/x86/include/asm/fixmap.h           | 8 ++++++++
 arch/x86/include/asm/pgtable_64_types.h | 3 ---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 8554f960e21b..c46289799b02 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -132,6 +132,14 @@ enum fixed_addresses {
 
 extern void reserve_top_address(unsigned long reserve);
 
+/* On 64bit, the module sections ends with the start of the fixmap */
+#ifdef CONFIG_X86_64
+#define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
+#define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
+#define MODULES_LEN   (MODULES_END - MODULES_VADDR)
+#endif /* CONFIG_X86_64 */
+
+
 #define FIXADDR_SIZE	(__end_of_permanent_fixed_addresses << PAGE_SHIFT)
 #define FIXADDR_START		(FIXADDR_TOP - FIXADDR_SIZE)
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 3a264200c62f..de8bace10200 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -66,9 +66,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define VMEMMAP_START	__VMEMMAP_BASE
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 #define VMALLOC_END	(VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
-#define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
-#define MODULES_END      _AC(0xffffffffff000000, UL)
-#define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 #define ESPFIX_PGD_ENTRY _AC(-2, UL)
 #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << PGDIR_SHIFT)
 #define EFI_VA_START	 ( -4 * (_AC(1, UL) << 30))
-- 
2.11.0.483.g087da7b7c-goog

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section
  2017-01-20 16:41 [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Thomas Garnier
@ 2017-01-20 16:41 ` Thomas Garnier
  2017-01-21  0:57   ` Andy Lutomirski
                     ` (2 more replies)
  2017-01-20 16:41 ` [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit Thomas Garnier
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 13+ messages in thread
From: Thomas Garnier @ 2017-01-20 16:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Thomas Garnier,
	Kees Cook, Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu
  Cc: x86, linux-kernel, linux-pm, kvm, kernel-hardening

Each processor holds a GDT in its per-cpu structure. The sgdt
instruction gives the base address of the current GDT. This address can
be used to bypass KASLR memory randomization. With another bug, an
attacker could target other per-cpu structures or deduce the base of
the main memory section (PAGE_OFFSET).

This patch relocates the GDT table for each processor inside the
Fixmap section. The space is reserved based on number of supported
cpus.

For consistency, the remapping is done by default on 32 and 64 bit.

Each processor switches to its remapped GDT at the end of
initialization. For hibernation, the main processor returns with the
original GDT and switches back to the remapping at completion.

On 32 bit, the maximum number of processors is now 256. The Fixmap
section cannot handle the original 512. Additional asserts ensure that
the Fixmap section cannot grow beyond the space available.

This patch was tested on both architectures. Hibernation and KVM were
both tested specially for their usage of the GDT.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170119
---
 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/fixmap.h    |  4 ++++
 arch/x86/include/asm/processor.h |  1 +
 arch/x86/kernel/cpu/common.c     | 18 +++++++++++++++++-
 arch/x86/mm/init_32.c            |  2 ++
 arch/x86/power/cpu.c             |  3 +++
 6 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1d4e8f2131f..b4ed35db10a8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -912,6 +912,7 @@ config MAXSMP
 config NR_CPUS
 	int "Maximum number of CPUs" if SMP && !MAXSMP
 	range 2 8 if SMP && X86_32 && !X86_BIGSMP
+	range 2 256 if SMP && X86_32 && X86_BIGSMP
 	range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
 	range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
 	default "1" if !SMP
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index c46289799b02..8b913b5e9383 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -100,6 +100,10 @@ enum fixed_addresses {
 #ifdef	CONFIG_X86_INTEL_MID
 	FIX_LNW_VRTC,
 #endif
+	/* Fixmap entries to remap the GDTs, one per processor. */
+	FIX_GDT_REMAP_BEGIN,
+	FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+
 	__end_of_permanent_fixed_addresses,
 
 	/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1be64da0384e..280211ad8be9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -705,6 +705,7 @@ extern struct desc_ptr		early_gdt_descr;
 
 extern void cpu_set_gdt(int);
 extern void switch_to_new_gdt(int);
+extern void load_remapped_gdt(int);
 extern void load_percpu_segment(int);
 extern void cpu_init(void);
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e97ffc8d29f4..7d940b0e805a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -443,6 +443,19 @@ void load_percpu_segment(int cpu)
 	load_stack_canary_segment();
 }
 
+/* Load a fixmap remapping of the per-cpu GDT */
+void load_remapped_gdt(int cpu)
+{
+	struct desc_ptr gdt_descr;
+	unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
+
+	__set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
+
+	gdt_descr.address = (long)__fix_to_virt(idx);
+	gdt_descr.size = GDT_SIZE - 1;
+	load_gdt(&gdt_descr);
+}
+
 /*
  * Current gdt points %fs at the "master" per-cpu area: after this,
  * it's on the real one.
@@ -455,7 +468,6 @@ void switch_to_new_gdt(int cpu)
 	gdt_descr.size = GDT_SIZE - 1;
 	load_gdt(&gdt_descr);
 	/* Reload the per-cpu base */
-
 	load_percpu_segment(cpu);
 }
 
@@ -1508,6 +1520,8 @@ void cpu_init(void)
 
 	if (is_uv_system())
 		uv_cpu_init();
+
+	load_remapped_gdt(cpu);
 }
 
 #else
@@ -1563,6 +1577,8 @@ void cpu_init(void)
 	dbg_restore_debug_regs();
 
 	fpu__init_cpu();
+
+	load_remapped_gdt(cpu);
 }
 #endif
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 928d657de829..cfbcf42099d0 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -798,9 +798,11 @@ void __init mem_init(void)
 #ifdef CONFIG_HIGHMEM
 	BUILD_BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE	> FIXADDR_START);
 	BUILD_BUG_ON(VMALLOC_END			> PKMAP_BASE);
+	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= PKMAP_BASE + LAST_PKMAP*PAGE_SIZE);
 #endif
 #define high_memory (-128UL << 20)
 	BUILD_BUG_ON(VMALLOC_START			>= VMALLOC_END);
+	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= VMALLOC_END);
 #undef high_memory
 #undef __FIXADDR_TOP
 
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 66ade16c7693..7578de6db833 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -183,6 +183,9 @@ static void fix_processor_context(void)
 	load_mm_ldt(current->active_mm);	/* This does lldt */
 
 	fpu__resume_cpu();
+
+	/* Load remapped GDT */
+	load_remapped_gdt(cpu);
 }
 
 /**
-- 
2.11.0.483.g087da7b7c-goog

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit
  2017-01-20 16:41 [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Thomas Garnier
  2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
@ 2017-01-20 16:41 ` Thomas Garnier
  2017-01-21  1:06   ` Andy Lutomirski
  2017-01-20 22:24 ` [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Andy Lutomirski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Thomas Garnier @ 2017-01-20 16:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Thomas Garnier,
	Kees Cook, Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu
  Cc: x86, linux-kernel, linux-pm, kvm, kernel-hardening

This patch makes the GDT remapped pages read-only to prevent corruption.
This change is done only on 64 bit.

The native_load_tr_desc function was adapted to correctly handle a
read-only GDT. The LTR instruction always writes to the GDT TSS entry.
This generates a page fault if the GDT is read-only. This change checks
if the current GDT is a remap and swap GDTs as needed. This function was
tested by booting multiple machines and checking hibernation works
properly.

KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the GDT
description and writeble address were put in two per-cpu variables for
convenience. For testing, VMs were started and restored on multiple
configurations.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170119
---
 arch/x86/include/asm/desc.h      | 44 +++++++++++++++++++++++++++++++++++-----
 arch/x86/include/asm/processor.h |  1 +
 arch/x86/kernel/cpu/common.c     | 36 ++++++++++++++++++++++++++------
 arch/x86/kvm/svm.c               |  5 ++---
 arch/x86/kvm/vmx.c               | 23 +++++++++++++--------
 5 files changed, 86 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 12080d87da3b..6ed68d449c7b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -50,6 +50,13 @@ static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
 	return per_cpu(gdt_page, cpu).gdt;
 }
 
+static inline struct desc_struct *get_current_gdt_table(void)
+{
+	return get_cpu_var(gdt_page).gdt;
+}
+
+struct desc_struct *get_remapped_gdt(int cpu);
+
 #ifdef CONFIG_X86_64
 
 static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func,
@@ -208,11 +215,6 @@ static inline void native_set_ldt(const void *addr, unsigned int entries)
 	}
 }
 
-static inline void native_load_tr_desc(void)
-{
-	asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
-}
-
 static inline void native_load_gdt(const struct desc_ptr *dtr)
 {
 	asm volatile("lgdt %0"::"m" (*dtr));
@@ -233,6 +235,38 @@ static inline void native_store_idt(struct desc_ptr *dtr)
 	asm volatile("sidt %0":"=m" (*dtr));
 }
 
+/*
+ * The LTR instruction marks the TSS GDT entry as busy. In 64bit, the GDT is
+ * a read-only remapping. To prevent a page fault, the GDT is switched to the
+ * original writeable version when needed.
+ */
+#ifdef CONFIG_X86_64
+static inline void native_load_tr_desc(void)
+{
+	struct desc_ptr gdt;
+	int cpu = raw_smp_processor_id();
+	bool restore = false;
+	struct desc_struct *remapped_gdt;
+
+	native_store_gdt(&gdt);
+	remapped_gdt = get_remapped_gdt(cpu);
+
+	/* Swap and restore only if the current GDT is the remap. */
+	if (gdt.address == (unsigned long)remapped_gdt) {
+		load_percpu_gdt(cpu);
+		restore = true;
+	}
+	asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
+	if (restore)
+		load_remapped_gdt(cpu);
+}
+#else
+static inline void native_load_tr_desc(void)
+{
+	asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
+}
+#endif
+
 static inline unsigned long native_store_tr(void)
 {
 	unsigned long tr;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 280211ad8be9..b58033e991ab 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -705,6 +705,7 @@ extern struct desc_ptr		early_gdt_descr;
 
 extern void cpu_set_gdt(int);
 extern void switch_to_new_gdt(int);
+extern void load_percpu_gdt(int);
 extern void load_remapped_gdt(int);
 extern void load_percpu_segment(int);
 extern void cpu_init(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7d940b0e805a..bf1266212a40 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -443,18 +443,45 @@ void load_percpu_segment(int cpu)
 	load_stack_canary_segment();
 }
 
+/* Load the GDT from the per-cpu structure */
+void load_percpu_gdt(int cpu)
+{
+	struct desc_ptr gdt_descr;
+
+	gdt_descr.address = (long)get_cpu_gdt_table(cpu);
+	gdt_descr.size = GDT_SIZE - 1;
+	load_gdt(&gdt_descr);
+}
+EXPORT_SYMBOL(load_percpu_gdt);
+
+/* Provide the fixmap address of the remapped GDT */
+struct desc_struct *get_remapped_gdt(int cpu)
+{
+	return (struct desc_struct *)__fix_to_virt(FIX_GDT_REMAP_BEGIN + cpu);
+
+}
+EXPORT_SYMBOL(get_remapped_gdt);
+
+/* On 64bit the GDT remapping is read-only */
+#ifdef CONFIG_X86_64
+#define GDT_REMAP_PROT PAGE_KERNEL_RO
+#else
+#define GDT_REMAP_PROT PAGE_KERNEL
+#endif
+
 /* Load a fixmap remapping of the per-cpu GDT */
 void load_remapped_gdt(int cpu)
 {
 	struct desc_ptr gdt_descr;
 	unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
 
-	__set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
+	__set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), GDT_REMAP_PROT);
 
 	gdt_descr.address = (long)__fix_to_virt(idx);
 	gdt_descr.size = GDT_SIZE - 1;
 	load_gdt(&gdt_descr);
 }
+EXPORT_SYMBOL(load_remapped_gdt);
 
 /*
  * Current gdt points %fs at the "master" per-cpu area: after this,
@@ -462,11 +489,8 @@ void load_remapped_gdt(int cpu)
  */
 void switch_to_new_gdt(int cpu)
 {
-	struct desc_ptr gdt_descr;
-
-	gdt_descr.address = (long)get_cpu_gdt_table(cpu);
-	gdt_descr.size = GDT_SIZE - 1;
-	load_gdt(&gdt_descr);
+	/* Load the original GDT */
+	load_percpu_gdt(cpu);
 	/* Reload the per-cpu base */
 	load_percpu_segment(cpu);
 }
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d0414f054bdf..da261f231017 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -741,7 +741,6 @@ static int svm_hardware_enable(void)
 
 	struct svm_cpu_data *sd;
 	uint64_t efer;
-	struct desc_ptr gdt_descr;
 	struct desc_struct *gdt;
 	int me = raw_smp_processor_id();
 
@@ -763,8 +762,7 @@ static int svm_hardware_enable(void)
 	sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
 	sd->next_asid = sd->max_asid + 1;
 
-	native_store_gdt(&gdt_descr);
-	gdt = (struct desc_struct *)gdt_descr.address;
+	gdt = get_current_gdt_table();
 	sd->tss_desc = (struct kvm_ldttss_desc *)(gdt + GDT_ENTRY_TSS);
 
 	wrmsrl(MSR_EFER, efer | EFER_SVME);
@@ -4251,6 +4249,7 @@ static void reload_tss(struct kvm_vcpu *vcpu)
 
 	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	sd->tss_desc->type = 9; /* available 32/64-bit TSS */
+
 	load_TR_desc();
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d2fe3a51876c..acbf78c962d0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -935,7 +935,8 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
  * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
  */
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
-static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
+static DEFINE_PER_CPU(struct desc_ptr, host_gdt_desc);
+static DEFINE_PER_CPU(unsigned long, host_gdt);
 
 /*
  * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
@@ -1997,10 +1998,9 @@ static void reload_tss(void)
 	/*
 	 * VT restores TR but not its size.  Useless.
 	 */
-	struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
 	struct desc_struct *descs;
 
-	descs = (void *)gdt->address;
+	descs = (void *)get_cpu_var(host_gdt);
 	descs[GDT_ENTRY_TSS].type = 9; /* available TSS */
 	load_TR_desc();
 }
@@ -2061,7 +2061,6 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 
 static unsigned long segment_base(u16 selector)
 {
-	struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
 	struct desc_struct *d;
 	unsigned long table_base;
 	unsigned long v;
@@ -2069,7 +2068,7 @@ static unsigned long segment_base(u16 selector)
 	if (!(selector & ~3))
 		return 0;
 
-	table_base = gdt->address;
+	table_base = get_cpu_var(host_gdt);
 
 	if (selector & 4) {           /* from ldt */
 		u16 ldt_selector = kvm_read_ldt();
@@ -2185,7 +2184,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 #endif
 	if (vmx->host_state.msr_host_bndcfgs)
 		wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
-	load_gdt(this_cpu_ptr(&host_gdt));
+	load_gdt(this_cpu_ptr(&host_gdt_desc));
 }
 
 static void vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -2287,7 +2286,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	}
 
 	if (!already_loaded) {
-		struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
+		unsigned long gdt = get_cpu_var(host_gdt);
 		unsigned long sysenter_esp;
 
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
@@ -2297,7 +2296,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		 * processors.
 		 */
 		vmcs_writel(HOST_TR_BASE, kvm_read_tr_base()); /* 22.2.4 */
-		vmcs_writel(HOST_GDTR_BASE, gdt->address);   /* 22.2.4 */
+		vmcs_writel(HOST_GDTR_BASE, gdt);   /* 22.2.4 */
 
 		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
@@ -3523,7 +3522,13 @@ static int hardware_enable(void)
 		ept_sync_global();
 	}
 
-	native_store_gdt(this_cpu_ptr(&host_gdt));
+	native_store_gdt(this_cpu_ptr(&host_gdt_desc));
+
+	/*
+	 * The GDT is remapped and can be read-only, use the underlying memory
+	 * that is always writeable.
+	 */
+	per_cpu(host_gdt, cpu) = (unsigned long)get_current_gdt_table();
 
 	return 0;
 }
-- 
2.11.0.483.g087da7b7c-goog

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size
  2017-01-20 16:41 [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Thomas Garnier
  2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
  2017-01-20 16:41 ` [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit Thomas Garnier
@ 2017-01-20 22:24 ` Andy Lutomirski
  2017-01-21  2:43 ` kbuild test robot
  2017-01-21  3:21 ` kbuild test robot
  4 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2017-01-20 22:24 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Kees Cook,
	Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu, X86 ML, linux-kernel, linux-pm, kvm list,
	kernel-hardening

On Fri, Jan 20, 2017 at 8:41 AM, Thomas Garnier <thgarnie@google.com> wrote:
> This patch aligns MODULES_END to the beginning of the Fixmap section.
> It optimizes the space available for both sections. The address is
> pre-computed based on the number of pages required by the Fixmap
> section.
>
> It will allow GDT remapping in the Fixmap section. The current
> MODULES_END static address does not provide enough space for the kernel
> to support a large number of processors.
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>

Looks good to me.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section
  2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
@ 2017-01-21  0:57   ` Andy Lutomirski
  2017-01-21  1:06     ` Thomas Garnier
  2017-01-21  2:23   ` kbuild test robot
  2017-01-21  2:34   ` kbuild test robot
  2 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2017-01-21  0:57 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Kees Cook,
	Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu, X86 ML, linux-kernel, linux-pm, kvm list,
	kernel-hardening

On Fri, Jan 20, 2017 at 8:41 AM, Thomas Garnier <thgarnie@google.com> wrote:
> Each processor holds a GDT in its per-cpu structure. The sgdt
> instruction gives the base address of the current GDT. This address can
> be used to bypass KASLR memory randomization. With another bug, an
> attacker could target other per-cpu structures or deduce the base of
> the main memory section (PAGE_OFFSET).
>
> This patch relocates the GDT table for each processor inside the
> Fixmap section. The space is reserved based on number of supported
> cpus.
>
> For consistency, the remapping is done by default on 32 and 64 bit.
>
> Each processor switches to its remapped GDT at the end of
> initialization. For hibernation, the main processor returns with the
> original GDT and switches back to the remapping at completion.
>
> On 32 bit, the maximum number of processors is now 256. The Fixmap
> section cannot handle the original 512. Additional asserts ensure that
> the Fixmap section cannot grow beyond the space available.
>
> This patch was tested on both architectures. Hibernation and KVM were
> both tested specially for their usage of the GDT.
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20170119
> ---
>  arch/x86/Kconfig                 |  1 +
>  arch/x86/include/asm/fixmap.h    |  4 ++++
>  arch/x86/include/asm/processor.h |  1 +
>  arch/x86/kernel/cpu/common.c     | 18 +++++++++++++++++-
>  arch/x86/mm/init_32.c            |  2 ++
>  arch/x86/power/cpu.c             |  3 +++
>  6 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index f1d4e8f2131f..b4ed35db10a8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -912,6 +912,7 @@ config MAXSMP
>  config NR_CPUS
>         int "Maximum number of CPUs" if SMP && !MAXSMP
>         range 2 8 if SMP && X86_32 && !X86_BIGSMP
> +       range 2 256 if SMP && X86_32 && X86_BIGSMP
>         range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
>         range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
>         default "1" if !SMP
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index c46289799b02..8b913b5e9383 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -100,6 +100,10 @@ enum fixed_addresses {
>  #ifdef CONFIG_X86_INTEL_MID
>         FIX_LNW_VRTC,
>  #endif
> +       /* Fixmap entries to remap the GDTs, one per processor. */
> +       FIX_GDT_REMAP_BEGIN,
> +       FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
> +
>         __end_of_permanent_fixed_addresses,
>
>         /*
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 1be64da0384e..280211ad8be9 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -705,6 +705,7 @@ extern struct desc_ptr              early_gdt_descr;
>
>  extern void cpu_set_gdt(int);
>  extern void switch_to_new_gdt(int);
> +extern void load_remapped_gdt(int);
>  extern void load_percpu_segment(int);
>  extern void cpu_init(void);
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e97ffc8d29f4..7d940b0e805a 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -443,6 +443,19 @@ void load_percpu_segment(int cpu)
>         load_stack_canary_segment();
>  }
>
> +/* Load a fixmap remapping of the per-cpu GDT */
> +void load_remapped_gdt(int cpu)
> +{
> +       struct desc_ptr gdt_descr;
> +       unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
> +
> +       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
> +
> +       gdt_descr.address = (long)__fix_to_virt(idx);
> +       gdt_descr.size = GDT_SIZE - 1;
> +       load_gdt(&gdt_descr);
> +}

IMO this should be split into two functions, one to set up the fixmap
entry and one to load the GDT.

Also, would it be possible to rename the various gdt helpers so that
their functionality is more obvious?  For example, get_cpu_gdt_table()
could be get_cpu_direct_gdt_table() and the function to load the gdt
could be load_fixmap_gdt().

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit
  2017-01-20 16:41 ` [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit Thomas Garnier
@ 2017-01-21  1:06   ` Andy Lutomirski
  2017-01-21  1:14     ` Thomas Garnier
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2017-01-21  1:06 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Kees Cook,
	Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu, X86 ML, linux-kernel, linux-pm, kvm list,
	kernel-hardening

On Fri, Jan 20, 2017 at 8:41 AM, Thomas Garnier <thgarnie@google.com> wrote:
> This patch makes the GDT remapped pages read-only to prevent corruption.
> This change is done only on 64 bit.
>
> The native_load_tr_desc function was adapted to correctly handle a
> read-only GDT. The LTR instruction always writes to the GDT TSS entry.
> This generates a page fault if the GDT is read-only. This change checks
> if the current GDT is a remap and swap GDTs as needed. This function was
> tested by booting multiple machines and checking hibernation works
> properly.
>
> KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the GDT
> description and writeble address were put in two per-cpu variables for
> convenience. For testing, VMs were started and restored on multiple
> configurations.
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20170119
> ---
>  arch/x86/include/asm/desc.h      | 44 +++++++++++++++++++++++++++++++++++-----
>  arch/x86/include/asm/processor.h |  1 +
>  arch/x86/kernel/cpu/common.c     | 36 ++++++++++++++++++++++++++------
>  arch/x86/kvm/svm.c               |  5 ++---
>  arch/x86/kvm/vmx.c               | 23 +++++++++++++--------
>  5 files changed, 86 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index 12080d87da3b..6ed68d449c7b 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -50,6 +50,13 @@ static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
>         return per_cpu(gdt_page, cpu).gdt;
>  }
>
> +static inline struct desc_struct *get_current_gdt_table(void)
> +{
> +       return get_cpu_var(gdt_page).gdt;
> +}

This function is poorly named IMO.  Which GDT address does it return?
Can we call it get_current_direct_gdt()?  Also, IIRC
this_cpu_read(gdt_page.gdt) generates better code.

> +
> +struct desc_struct *get_remapped_gdt(int cpu);

And get_current_fixmap_gdt(void) perhaps?  And why isn't it inline?
It's very simple.

> +/*
> + * The LTR instruction marks the TSS GDT entry as busy. In 64bit, the GDT is
> + * a read-only remapping. To prevent a page fault, the GDT is switched to the
> + * original writeable version when needed.
> + */
> +#ifdef CONFIG_X86_64
> +static inline void native_load_tr_desc(void)
> +{
> +       struct desc_ptr gdt;
> +       int cpu = raw_smp_processor_id();
> +       bool restore = false;
> +       struct desc_struct *remapped_gdt;
> +
> +       native_store_gdt(&gdt);
> +       remapped_gdt = get_remapped_gdt(cpu);
> +
> +       /* Swap and restore only if the current GDT is the remap. */
> +       if (gdt.address == (unsigned long)remapped_gdt) {
> +               load_percpu_gdt(cpu);

This line (load_percpu_gdt(cpu)) is hard to understand because of the
poorly named function.

>  /* Load a fixmap remapping of the per-cpu GDT */
>  void load_remapped_gdt(int cpu)
>  {
>         struct desc_ptr gdt_descr;
>         unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
>
> -       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
> +       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), GDT_REMAP_PROT);

How about PAGE_FIXMAP_GDT instead of GDT_REMAP_PROT?

> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d0414f054bdf..da261f231017 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -741,7 +741,6 @@ static int svm_hardware_enable(void)
>
>         struct svm_cpu_data *sd;
>         uint64_t efer;
> -       struct desc_ptr gdt_descr;
>         struct desc_struct *gdt;
>         int me = raw_smp_processor_id();
>
> @@ -763,8 +762,7 @@ static int svm_hardware_enable(void)
>         sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
>         sd->next_asid = sd->max_asid + 1;
>
> -       native_store_gdt(&gdt_descr);
> -       gdt = (struct desc_struct *)gdt_descr.address;
> +       gdt = get_current_gdt_table();
>         sd->tss_desc = (struct kvm_ldttss_desc *)(gdt + GDT_ENTRY_TSS);
>
>         wrmsrl(MSR_EFER, efer | EFER_SVME);
> @@ -4251,6 +4249,7 @@ static void reload_tss(struct kvm_vcpu *vcpu)
>
>         struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>         sd->tss_desc->type = 9; /* available 32/64-bit TSS */
> +
>         load_TR_desc();
>  }

Paulo, are you okay with the performance hit here?  Is there any easy
way to avoid it?

>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d2fe3a51876c..acbf78c962d0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -935,7 +935,8 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>   * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
>   */
>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> -static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> +static DEFINE_PER_CPU(struct desc_ptr, host_gdt_desc);
> +static DEFINE_PER_CPU(unsigned long, host_gdt);

This pair of variables is inscrutible IMO.  It should at least have a
comment, but better names would help even more.

>
>  /*
>   * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> @@ -1997,10 +1998,9 @@ static void reload_tss(void)
>         /*
>          * VT restores TR but not its size.  Useless.
>          */
> -       struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>         struct desc_struct *descs;
>
> -       descs = (void *)gdt->address;
> +       descs = (void *)get_cpu_var(host_gdt);
>         descs[GDT_ENTRY_TSS].type = 9; /* available TSS */
>         load_TR_desc();
>  }
> @@ -2061,7 +2061,6 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>
>  static unsigned long segment_base(u16 selector)
>  {
> -       struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>         struct desc_struct *d;
>         unsigned long table_base;
>         unsigned long v;
> @@ -2069,7 +2068,7 @@ static unsigned long segment_base(u16 selector)
>         if (!(selector & ~3))
>                 return 0;
>
> -       table_base = gdt->address;
> +       table_base = get_cpu_var(host_gdt);

this_cpu_read() if possible, please.  But can't this just use the
generic accessor instead of a KVM-specific percpu variable?

>
>         if (selector & 4) {           /* from ldt */
>                 u16 ldt_selector = kvm_read_ldt();
> @@ -2185,7 +2184,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
>  #endif
>         if (vmx->host_state.msr_host_bndcfgs)
>                 wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
> -       load_gdt(this_cpu_ptr(&host_gdt));
> +       load_gdt(this_cpu_ptr(&host_gdt_desc));
>  }

I assume the intent is to have the VM exit restore the direct GDT,
then to load the new TR, then to load the fixmap GDT.  Is that indeed
the case?.

>
>  static void vmx_load_host_state(struct vcpu_vmx *vmx)
> @@ -2287,7 +2286,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>         }
>
>         if (!already_loaded) {
> -               struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
> +               unsigned long gdt = get_cpu_var(host_gdt);
>                 unsigned long sysenter_esp;
>
>                 kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> @@ -2297,7 +2296,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>                  * processors.
>                  */
>                 vmcs_writel(HOST_TR_BASE, kvm_read_tr_base()); /* 22.2.4 */
> -               vmcs_writel(HOST_GDTR_BASE, gdt->address);   /* 22.2.4 */
> +               vmcs_writel(HOST_GDTR_BASE, gdt);   /* 22.2.4 */
>
>                 rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>                 vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
> @@ -3523,7 +3522,13 @@ static int hardware_enable(void)
>                 ept_sync_global();
>         }
>
> -       native_store_gdt(this_cpu_ptr(&host_gdt));
> +       native_store_gdt(this_cpu_ptr(&host_gdt_desc));
> +
> +       /*
> +        * The GDT is remapped and can be read-only, use the underlying memory
> +        * that is always writeable.
> +        */
> +       per_cpu(host_gdt, cpu) = (unsigned long)get_current_gdt_table();

this_cpu_write().  Better yet, just get rid of this entirely.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section
  2017-01-21  0:57   ` Andy Lutomirski
@ 2017-01-21  1:06     ` Thomas Garnier
  2017-01-25 20:10       ` Thomas Garnier
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Garnier @ 2017-01-21  1:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Kees Cook,
	Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu, X86 ML, linux-kernel, linux-pm, kvm list,
	kernel-hardening

On Fri, Jan 20, 2017 at 4:57 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jan 20, 2017 at 8:41 AM, Thomas Garnier <thgarnie@google.com> wrote:
>> Each processor holds a GDT in its per-cpu structure. The sgdt
>> instruction gives the base address of the current GDT. This address can
>> be used to bypass KASLR memory randomization. With another bug, an
>> attacker could target other per-cpu structures or deduce the base of
>> the main memory section (PAGE_OFFSET).
>>
>> This patch relocates the GDT table for each processor inside the
>> Fixmap section. The space is reserved based on number of supported
>> cpus.
>>
>> For consistency, the remapping is done by default on 32 and 64 bit.
>>
>> Each processor switches to its remapped GDT at the end of
>> initialization. For hibernation, the main processor returns with the
>> original GDT and switches back to the remapping at completion.
>>
>> On 32 bit, the maximum number of processors is now 256. The Fixmap
>> section cannot handle the original 512. Additional asserts ensure that
>> the Fixmap section cannot grow beyond the space available.
>>
>> This patch was tested on both architectures. Hibernation and KVM were
>> both tested specially for their usage of the GDT.
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20170119
>> ---
>>  arch/x86/Kconfig                 |  1 +
>>  arch/x86/include/asm/fixmap.h    |  4 ++++
>>  arch/x86/include/asm/processor.h |  1 +
>>  arch/x86/kernel/cpu/common.c     | 18 +++++++++++++++++-
>>  arch/x86/mm/init_32.c            |  2 ++
>>  arch/x86/power/cpu.c             |  3 +++
>>  6 files changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index f1d4e8f2131f..b4ed35db10a8 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -912,6 +912,7 @@ config MAXSMP
>>  config NR_CPUS
>>         int "Maximum number of CPUs" if SMP && !MAXSMP
>>         range 2 8 if SMP && X86_32 && !X86_BIGSMP
>> +       range 2 256 if SMP && X86_32 && X86_BIGSMP
>>         range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
>>         range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
>>         default "1" if !SMP
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index c46289799b02..8b913b5e9383 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -100,6 +100,10 @@ enum fixed_addresses {
>>  #ifdef CONFIG_X86_INTEL_MID
>>         FIX_LNW_VRTC,
>>  #endif
>> +       /* Fixmap entries to remap the GDTs, one per processor. */
>> +       FIX_GDT_REMAP_BEGIN,
>> +       FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
>> +
>>         __end_of_permanent_fixed_addresses,
>>
>>         /*
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 1be64da0384e..280211ad8be9 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -705,6 +705,7 @@ extern struct desc_ptr              early_gdt_descr;
>>
>>  extern void cpu_set_gdt(int);
>>  extern void switch_to_new_gdt(int);
>> +extern void load_remapped_gdt(int);
>>  extern void load_percpu_segment(int);
>>  extern void cpu_init(void);
>>
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index e97ffc8d29f4..7d940b0e805a 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -443,6 +443,19 @@ void load_percpu_segment(int cpu)
>>         load_stack_canary_segment();
>>  }
>>
>> +/* Load a fixmap remapping of the per-cpu GDT */
>> +void load_remapped_gdt(int cpu)
>> +{
>> +       struct desc_ptr gdt_descr;
>> +       unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
>> +
>> +       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
>> +
>> +       gdt_descr.address = (long)__fix_to_virt(idx);
>> +       gdt_descr.size = GDT_SIZE - 1;
>> +       load_gdt(&gdt_descr);
>> +}
>
> IMO this should be split into two functions, one to set up the fixmap
> entry and one to load the GDT.
>

That make sense.

> Also, would it be possible to rename the various gdt helpers so that
> their functionality is more obvious?  For example, get_cpu_gdt_table()
> could be get_cpu_direct_gdt_table() and the function to load the gdt
> could be load_fixmap_gdt().
>

Sure no problem.

> --Andy



-- 
Thomas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit
  2017-01-21  1:06   ` Andy Lutomirski
@ 2017-01-21  1:14     ` Thomas Garnier
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Garnier @ 2017-01-21  1:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Kees Cook,
	Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu, X86 ML, linux-kernel, linux-pm, kvm list,
	kernel-hardening

On Fri, Jan 20, 2017 at 5:06 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jan 20, 2017 at 8:41 AM, Thomas Garnier <thgarnie@google.com> wrote:
>> This patch makes the GDT remapped pages read-only to prevent corruption.
>> This change is done only on 64 bit.
>>
>> The native_load_tr_desc function was adapted to correctly handle a
>> read-only GDT. The LTR instruction always writes to the GDT TSS entry.
>> This generates a page fault if the GDT is read-only. This change checks
>> if the current GDT is a remap and swap GDTs as needed. This function was
>> tested by booting multiple machines and checking hibernation works
>> properly.
>>
>> KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the GDT
>> description and writeble address were put in two per-cpu variables for
>> convenience. For testing, VMs were started and restored on multiple
>> configurations.
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20170119
>> ---
>>  arch/x86/include/asm/desc.h      | 44 +++++++++++++++++++++++++++++++++++-----
>>  arch/x86/include/asm/processor.h |  1 +
>>  arch/x86/kernel/cpu/common.c     | 36 ++++++++++++++++++++++++++------
>>  arch/x86/kvm/svm.c               |  5 ++---
>>  arch/x86/kvm/vmx.c               | 23 +++++++++++++--------
>>  5 files changed, 86 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
>> index 12080d87da3b..6ed68d449c7b 100644
>> --- a/arch/x86/include/asm/desc.h
>> +++ b/arch/x86/include/asm/desc.h
>> @@ -50,6 +50,13 @@ static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
>>         return per_cpu(gdt_page, cpu).gdt;
>>  }
>>
>> +static inline struct desc_struct *get_current_gdt_table(void)
>> +{
>> +       return get_cpu_var(gdt_page).gdt;
>> +}
>
> This function is poorly named IMO.  Which GDT address does it return?
> Can we call it get_current_direct_gdt()?  Also, IIRC
> this_cpu_read(gdt_page.gdt) generates better code.
>

I agree. I will use this_cpu_read and rename the function.

>> +
>> +struct desc_struct *get_remapped_gdt(int cpu);
>
> And get_current_fixmap_gdt(void) perhaps?  And why isn't it inline?
> It's very simple.
>

I was not sure about the KVM dependencies on fixmap. It should be
alright, I will add it to desc.h.

>> +/*
>> + * The LTR instruction marks the TSS GDT entry as busy. In 64bit, the GDT is
>> + * a read-only remapping. To prevent a page fault, the GDT is switched to the
>> + * original writeable version when needed.
>> + */
>> +#ifdef CONFIG_X86_64
>> +static inline void native_load_tr_desc(void)
>> +{
>> +       struct desc_ptr gdt;
>> +       int cpu = raw_smp_processor_id();
>> +       bool restore = false;
>> +       struct desc_struct *remapped_gdt;
>> +
>> +       native_store_gdt(&gdt);
>> +       remapped_gdt = get_remapped_gdt(cpu);
>> +
>> +       /* Swap and restore only if the current GDT is the remap. */
>> +       if (gdt.address == (unsigned long)remapped_gdt) {
>> +               load_percpu_gdt(cpu);
>
> This line (load_percpu_gdt(cpu)) is hard to understand because of the
> poorly named function.
>

It should make more sense with direct/fixmap naming.

>>  /* Load a fixmap remapping of the per-cpu GDT */
>>  void load_remapped_gdt(int cpu)
>>  {
>>         struct desc_ptr gdt_descr;
>>         unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
>>
>> -       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
>> +       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), GDT_REMAP_PROT);
>
> How about PAGE_FIXMAP_GDT instead of GDT_REMAP_PROT?
>

Sure.

>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index d0414f054bdf..da261f231017 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -741,7 +741,6 @@ static int svm_hardware_enable(void)
>>
>>         struct svm_cpu_data *sd;
>>         uint64_t efer;
>> -       struct desc_ptr gdt_descr;
>>         struct desc_struct *gdt;
>>         int me = raw_smp_processor_id();
>>
>> @@ -763,8 +762,7 @@ static int svm_hardware_enable(void)
>>         sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
>>         sd->next_asid = sd->max_asid + 1;
>>
>> -       native_store_gdt(&gdt_descr);
>> -       gdt = (struct desc_struct *)gdt_descr.address;
>> +       gdt = get_current_gdt_table();
>>         sd->tss_desc = (struct kvm_ldttss_desc *)(gdt + GDT_ENTRY_TSS);
>>
>>         wrmsrl(MSR_EFER, efer | EFER_SVME);
>> @@ -4251,6 +4249,7 @@ static void reload_tss(struct kvm_vcpu *vcpu)
>>
>>         struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>>         sd->tss_desc->type = 9; /* available 32/64-bit TSS */
>> +
>>         load_TR_desc();
>>  }
>
> Paulo, are you okay with the performance hit here?  Is there any easy
> way to avoid it?
>
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index d2fe3a51876c..acbf78c962d0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -935,7 +935,8 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>>   * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
>>   */
>>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>> -static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
>> +static DEFINE_PER_CPU(struct desc_ptr, host_gdt_desc);
>> +static DEFINE_PER_CPU(unsigned long, host_gdt);
>
> This pair of variables is inscrutible IMO.  It should at least have a
> comment, but better names would help even more.
>

Easy to comments. What name would you suggest for GDT descriptor?

>>
>>  /*
>>   * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
>> @@ -1997,10 +1998,9 @@ static void reload_tss(void)
>>         /*
>>          * VT restores TR but not its size.  Useless.
>>          */
>> -       struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>>         struct desc_struct *descs;
>>
>> -       descs = (void *)gdt->address;
>> +       descs = (void *)get_cpu_var(host_gdt);
>>         descs[GDT_ENTRY_TSS].type = 9; /* available TSS */
>>         load_TR_desc();
>>  }
>> @@ -2061,7 +2061,6 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>>
>>  static unsigned long segment_base(u16 selector)
>>  {
>> -       struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>>         struct desc_struct *d;
>>         unsigned long table_base;
>>         unsigned long v;
>> @@ -2069,7 +2068,7 @@ static unsigned long segment_base(u16 selector)
>>         if (!(selector & ~3))
>>                 return 0;
>>
>> -       table_base = gdt->address;
>> +       table_base = get_cpu_var(host_gdt);
>
> this_cpu_read() if possible, please.  But can't this just use the
> generic accessor instead of a KVM-specific percpu variable?
>

Yes, it could. In that case, we could keep host_gdt for the GDT
descriptor and use the new current direct GDT function.

>>
>>         if (selector & 4) {           /* from ldt */
>>                 u16 ldt_selector = kvm_read_ldt();
>> @@ -2185,7 +2184,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
>>  #endif
>>         if (vmx->host_state.msr_host_bndcfgs)
>>                 wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
>> -       load_gdt(this_cpu_ptr(&host_gdt));
>> +       load_gdt(this_cpu_ptr(&host_gdt_desc));
>>  }
>
> I assume the intent is to have the VM exit restore the direct GDT,
> then to load the new TR, then to load the fixmap GDT.  Is that indeed
> the case?.
>

Yes, that's correct.

>>
>>  static void vmx_load_host_state(struct vcpu_vmx *vmx)
>> @@ -2287,7 +2286,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>         }
>>
>>         if (!already_loaded) {
>> -               struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>> +               unsigned long gdt = get_cpu_var(host_gdt);
>>                 unsigned long sysenter_esp;
>>
>>                 kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>> @@ -2297,7 +2296,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>                  * processors.
>>                  */
>>                 vmcs_writel(HOST_TR_BASE, kvm_read_tr_base()); /* 22.2.4 */
>> -               vmcs_writel(HOST_GDTR_BASE, gdt->address);   /* 22.2.4 */
>> +               vmcs_writel(HOST_GDTR_BASE, gdt);   /* 22.2.4 */
>>
>>                 rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>>                 vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>> @@ -3523,7 +3522,13 @@ static int hardware_enable(void)
>>                 ept_sync_global();
>>         }
>>
>> -       native_store_gdt(this_cpu_ptr(&host_gdt));
>> +       native_store_gdt(this_cpu_ptr(&host_gdt_desc));
>> +
>> +       /*
>> +        * The GDT is remapped and can be read-only, use the underlying memory
>> +        * that is always writeable.
>> +        */
>> +       per_cpu(host_gdt, cpu) = (unsigned long)get_current_gdt_table();
>
> this_cpu_write().  Better yet, just get rid of this entirely.
>

Will do. Thanks for the feedback.

> --Andy



-- 
Thomas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section
  2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
  2017-01-21  0:57   ` Andy Lutomirski
@ 2017-01-21  2:23   ` kbuild test robot
  2017-01-21  2:34   ` kbuild test robot
  2 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-01-21  2:23 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: kbuild-all, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Thomas Garnier, Kees Cook, Rafael J . Wysocki, Pavel Machek,
	Andy Lutomirski, Borislav Petkov, Christian Borntraeger,
	Brian Gerst, He Chen, Dave Hansen, Chen Yucong, Baoquan He,
	Paul Gortmaker, Joerg Roedel, Paolo Bonzini,
	Radim Krčmář,
	Fenghua Yu, x86, linux-kernel, linux-pm, kvm, kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 4773 bytes --]

Hi Thomas,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.10-rc4 next-20170120]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Thomas-Garnier/x86-mm-Adapt-MODULES_END-based-on-Fixmap-section-size/20170121-094756
config: i386-randconfig-x004-201703 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/list.h:4,
                    from include/linux/signal.h:4,
                    from arch/x86/mm/init_32.c:8:
   arch/x86/mm/init_32.c: In function 'mem_init':
>> include/linux/compiler.h:518:38: error: call to '__compiletime_assert_801' declared with attribute error: BUILD_BUG_ON failed: __fix_to_virt(__end_of_fixed_addresses) <= PKMAP_BASE + LAST_PKMAP*PAGE_SIZE
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:501:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:518:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/bug.h:54:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^~~~~~~~~~~~~~~~~~
   include/linux/bug.h:78:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
     ^~~~~~~~~~~~~~~~
   arch/x86/mm/init_32.c:801:2: note: in expansion of macro 'BUILD_BUG_ON'
     BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= PKMAP_BASE + LAST_PKMAP*PAGE_SIZE);
     ^~~~~~~~~~~~
   include/linux/compiler.h:518:38: error: call to '__compiletime_assert_805' declared with attribute error: BUILD_BUG_ON failed: __fix_to_virt(__end_of_fixed_addresses) <= VMALLOC_END
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:501:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:518:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/bug.h:54:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^~~~~~~~~~~~~~~~~~
   include/linux/bug.h:78:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
     ^~~~~~~~~~~~~~~~
   arch/x86/mm/init_32.c:805:2: note: in expansion of macro 'BUILD_BUG_ON'
     BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= VMALLOC_END);
     ^~~~~~~~~~~~

vim +/__compiletime_assert_801 +518 include/linux/compiler.h

9a8ab1c3 Daniel Santos  2013-02-21  512   *
9a8ab1c3 Daniel Santos  2013-02-21  513   * In tradition of POSIX assert, this macro will break the build if the
9a8ab1c3 Daniel Santos  2013-02-21  514   * supplied condition is *false*, emitting the supplied error message if the
9a8ab1c3 Daniel Santos  2013-02-21  515   * compiler has support to do so.
9a8ab1c3 Daniel Santos  2013-02-21  516   */
9a8ab1c3 Daniel Santos  2013-02-21  517  #define compiletime_assert(condition, msg) \
9a8ab1c3 Daniel Santos  2013-02-21 @518  	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
9a8ab1c3 Daniel Santos  2013-02-21  519  
47933ad4 Peter Zijlstra 2013-11-06  520  #define compiletime_assert_atomic_type(t)				\
47933ad4 Peter Zijlstra 2013-11-06  521  	compiletime_assert(__native_word(t),				\

:::::: The code at line 518 was first introduced by commit
:::::: 9a8ab1c39970a4938a72d94e6fd13be88a797590 bug.h, compiler.h: introduce compiletime_assert & BUILD_BUG_ON_MSG

:::::: TO: Daniel Santos <daniel.santos@pobox.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27588 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section
  2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
  2017-01-21  0:57   ` Andy Lutomirski
  2017-01-21  2:23   ` kbuild test robot
@ 2017-01-21  2:34   ` kbuild test robot
  2 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-01-21  2:34 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: kbuild-all, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Thomas Garnier, Kees Cook, Rafael J . Wysocki, Pavel Machek,
	Andy Lutomirski, Borislav Petkov, Christian Borntraeger,
	Brian Gerst, He Chen, Dave Hansen, Chen Yucong, Baoquan He,
	Paul Gortmaker, Joerg Roedel, Paolo Bonzini,
	Radim Krčmář,
	Fenghua Yu, x86, linux-kernel, linux-pm, kvm, kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

Hi Thomas,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.10-rc4 next-20170120]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Thomas-Garnier/x86-mm-Adapt-MODULES_END-based-on-Fixmap-section-size/20170121-094756
config: i386-randconfig-x003-201703 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/list.h:4,
                    from include/linux/signal.h:4,
                    from arch/x86/mm/init_32.c:8:
   arch/x86/mm/init_32.c: In function 'mem_init':
>> include/linux/compiler.h:518:38: error: call to '__compiletime_assert_805' declared with attribute error: BUILD_BUG_ON failed: __fix_to_virt(__end_of_fixed_addresses) <= VMALLOC_END
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:501:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:518:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/bug.h:54:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^~~~~~~~~~~~~~~~~~
   include/linux/bug.h:78:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
     ^~~~~~~~~~~~~~~~
>> arch/x86/mm/init_32.c:805:2: note: in expansion of macro 'BUILD_BUG_ON'
     BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= VMALLOC_END);
     ^~~~~~~~~~~~

vim +/__compiletime_assert_805 +518 include/linux/compiler.h

9a8ab1c3 Daniel Santos  2013-02-21  512   *
9a8ab1c3 Daniel Santos  2013-02-21  513   * In tradition of POSIX assert, this macro will break the build if the
9a8ab1c3 Daniel Santos  2013-02-21  514   * supplied condition is *false*, emitting the supplied error message if the
9a8ab1c3 Daniel Santos  2013-02-21  515   * compiler has support to do so.
9a8ab1c3 Daniel Santos  2013-02-21  516   */
9a8ab1c3 Daniel Santos  2013-02-21  517  #define compiletime_assert(condition, msg) \
9a8ab1c3 Daniel Santos  2013-02-21 @518  	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
9a8ab1c3 Daniel Santos  2013-02-21  519  
47933ad4 Peter Zijlstra 2013-11-06  520  #define compiletime_assert_atomic_type(t)				\
47933ad4 Peter Zijlstra 2013-11-06  521  	compiletime_assert(__native_word(t),				\

:::::: The code at line 518 was first introduced by commit
:::::: 9a8ab1c39970a4938a72d94e6fd13be88a797590 bug.h, compiler.h: introduce compiletime_assert & BUILD_BUG_ON_MSG

:::::: TO: Daniel Santos <daniel.santos@pobox.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22314 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size
  2017-01-20 16:41 [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Thomas Garnier
                   ` (2 preceding siblings ...)
  2017-01-20 22:24 ` [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Andy Lutomirski
@ 2017-01-21  2:43 ` kbuild test robot
  2017-01-21  3:21 ` kbuild test robot
  4 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-01-21  2:43 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: kbuild-all, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Thomas Garnier, Kees Cook, Rafael J . Wysocki, Pavel Machek,
	Andy Lutomirski, Borislav Petkov, Christian Borntraeger,
	Brian Gerst, He Chen, Dave Hansen, Chen Yucong, Baoquan He,
	Paul Gortmaker, Joerg Roedel, Paolo Bonzini,
	Radim Krčmář,
	Fenghua Yu, x86, linux-kernel, linux-pm, kvm, kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 2259 bytes --]

Hi Thomas,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.10-rc4 next-20170120]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Thomas-Garnier/x86-mm-Adapt-MODULES_END-based-on-Fixmap-section-size/20170121-094756
config: x86_64-randconfig-x002-201703 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/mm/kasan_init_64.c: In function 'kasan_init':
>> arch/x86/mm/kasan_init_64.c:120:57: error: 'MODULES_END' undeclared (first use in this function)
     kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
                                                            ^~~~~~~~~~~
   arch/x86/mm/kasan_init_64.c:120:57: note: each undeclared identifier is reported only once for each function it appears in

vim +/MODULES_END +120 arch/x86/mm/kasan_init_64.c

c420f167d Andrey Ryabinin 2015-02-13  114  		kasan_mem_to_shadow((void *)__START_KERNEL_map));
c420f167d Andrey Ryabinin 2015-02-13  115  
c420f167d Andrey Ryabinin 2015-02-13  116  	vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext),
c420f167d Andrey Ryabinin 2015-02-13  117  			(unsigned long)kasan_mem_to_shadow(_end),
c420f167d Andrey Ryabinin 2015-02-13  118  			NUMA_NO_NODE);
c420f167d Andrey Ryabinin 2015-02-13  119  
69786cdb3 Andrey Ryabinin 2015-08-13 @120  	kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
ef7f0d6a6 Andrey Ryabinin 2015-02-13  121  			(void *)KASAN_SHADOW_END);
ef7f0d6a6 Andrey Ryabinin 2015-02-13  122  
ef7f0d6a6 Andrey Ryabinin 2015-02-13  123  	load_cr3(init_level4_pgt);

:::::: The code at line 120 was first introduced by commit
:::::: 69786cdb379bbc6eab14cf2393c1abd879316e85 x86/kasan, mm: Introduce generic kasan_populate_zero_shadow()

:::::: TO: Andrey Ryabinin <ryabinin.a.a@gmail.com>
:::::: CC: Ingo Molnar <mingo@kernel.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26686 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size
  2017-01-20 16:41 [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Thomas Garnier
                   ` (3 preceding siblings ...)
  2017-01-21  2:43 ` kbuild test robot
@ 2017-01-21  3:21 ` kbuild test robot
  4 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-01-21  3:21 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: kbuild-all, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Thomas Garnier, Kees Cook, Rafael J . Wysocki, Pavel Machek,
	Andy Lutomirski, Borislav Petkov, Christian Borntraeger,
	Brian Gerst, He Chen, Dave Hansen, Chen Yucong, Baoquan He,
	Paul Gortmaker, Joerg Roedel, Paolo Bonzini,
	Radim Krčmář,
	Fenghua Yu, x86, linux-kernel, linux-pm, kvm, kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

Hi Thomas,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.10-rc4 next-20170120]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Thomas-Garnier/x86-mm-Adapt-MODULES_END-based-on-Fixmap-section-size/20170121-094756
config: x86_64-randconfig-x008-201703 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> arch/x86/mm/dump_pagetables.c:85:4: error: 'MODULES_VADDR' undeclared here (not in a function)
     { MODULES_VADDR,        "Modules" },
       ^~~~~~~~~~~~~
>> arch/x86/mm/dump_pagetables.c:86:4: error: 'MODULES_END' undeclared here (not in a function)
     { MODULES_END,          "End Modules" },
       ^~~~~~~~~~~

vim +/MODULES_VADDR +85 arch/x86/mm/dump_pagetables.c

3891a04aa H. Peter Anvin 2014-04-29  79  	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
8a5a5d153 Mathias Krause 2014-09-07  80  # endif
8266e31ed Mathias Krause 2014-09-21  81  # ifdef CONFIG_EFI
8266e31ed Mathias Krause 2014-09-21  82  	{ EFI_VA_END,		"EFI Runtime Services" },
8266e31ed Mathias Krause 2014-09-21  83  # endif
fe770bf03 H. Peter Anvin 2008-04-17  84  	{ __START_KERNEL_map,   "High Kernel Mapping" },
9a79cf9c1 Yinghai Lu     2008-03-07 @85  	{ MODULES_VADDR,        "Modules" },
9a79cf9c1 Yinghai Lu     2008-03-07 @86  	{ MODULES_END,          "End Modules" },
fe770bf03 H. Peter Anvin 2008-04-17  87  #else
fe770bf03 H. Peter Anvin 2008-04-17  88  	{ PAGE_OFFSET,          "Kernel Mapping" },
fe770bf03 H. Peter Anvin 2008-04-17  89  	{ 0/* VMALLOC_START */, "vmalloc() Area" },

:::::: The code at line 85 was first introduced by commit
:::::: 9a79cf9c1aa671325fa5ba37576c2cee23823d04 x86: sort address_markers for dump_pagetables

:::::: TO: Yinghai Lu <yhlu.kernel@gmail.com>
:::::: CC: Ingo Molnar <mingo@elte.hu>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25543 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section
  2017-01-21  1:06     ` Thomas Garnier
@ 2017-01-25 20:10       ` Thomas Garnier
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Garnier @ 2017-01-25 20:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Kees Cook,
	Rafael J . Wysocki, Pavel Machek, Andy Lutomirski,
	Borislav Petkov, Christian Borntraeger, Brian Gerst, He Chen,
	Dave Hansen, Chen Yucong, Baoquan He, Paul Gortmaker,
	Joerg Roedel, Paolo Bonzini, Radim Krčmář,
	Fenghua Yu, X86 ML, linux-kernel, linux-pm, kvm list,
	kernel-hardening

The kbuild bot found interesting build errors with the new
BUILD_BUG_ON on 32bit (64G mem support). I think I went a bit too far
with them given the ioremap part is just temporary on early boot.

I removed them and tested different configurations trying to use as
much fixmap as possible (DEBUG_HIGHMEM, 64G/PAE support and more).
Everything looks good and we can still support 512 processors.

Next iteration, I will remove the BUILD_BUG_ON that I added and remove
restriction to 256 back to 512. On top of all changes suggested by
Andy on the patch set.

On Fri, Jan 20, 2017 at 5:06 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Fri, Jan 20, 2017 at 4:57 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Jan 20, 2017 at 8:41 AM, Thomas Garnier <thgarnie@google.com> wrote:
>>> Each processor holds a GDT in its per-cpu structure. The sgdt
>>> instruction gives the base address of the current GDT. This address can
>>> be used to bypass KASLR memory randomization. With another bug, an
>>> attacker could target other per-cpu structures or deduce the base of
>>> the main memory section (PAGE_OFFSET).
>>>
>>> This patch relocates the GDT table for each processor inside the
>>> Fixmap section. The space is reserved based on number of supported
>>> cpus.
>>>
>>> For consistency, the remapping is done by default on 32 and 64 bit.
>>>
>>> Each processor switches to its remapped GDT at the end of
>>> initialization. For hibernation, the main processor returns with the
>>> original GDT and switches back to the remapping at completion.
>>>
>>> On 32 bit, the maximum number of processors is now 256. The Fixmap
>>> section cannot handle the original 512. Additional asserts ensure that
>>> the Fixmap section cannot grow beyond the space available.
>>>
>>> This patch was tested on both architectures. Hibernation and KVM were
>>> both tested specially for their usage of the GDT.
>>>
>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>> ---
>>> Based on next-20170119
>>> ---
>>>  arch/x86/Kconfig                 |  1 +
>>>  arch/x86/include/asm/fixmap.h    |  4 ++++
>>>  arch/x86/include/asm/processor.h |  1 +
>>>  arch/x86/kernel/cpu/common.c     | 18 +++++++++++++++++-
>>>  arch/x86/mm/init_32.c            |  2 ++
>>>  arch/x86/power/cpu.c             |  3 +++
>>>  6 files changed, 28 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index f1d4e8f2131f..b4ed35db10a8 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -912,6 +912,7 @@ config MAXSMP
>>>  config NR_CPUS
>>>         int "Maximum number of CPUs" if SMP && !MAXSMP
>>>         range 2 8 if SMP && X86_32 && !X86_BIGSMP
>>> +       range 2 256 if SMP && X86_32 && X86_BIGSMP
>>>         range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
>>>         range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
>>>         default "1" if !SMP
>>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>>> index c46289799b02..8b913b5e9383 100644
>>> --- a/arch/x86/include/asm/fixmap.h
>>> +++ b/arch/x86/include/asm/fixmap.h
>>> @@ -100,6 +100,10 @@ enum fixed_addresses {
>>>  #ifdef CONFIG_X86_INTEL_MID
>>>         FIX_LNW_VRTC,
>>>  #endif
>>> +       /* Fixmap entries to remap the GDTs, one per processor. */
>>> +       FIX_GDT_REMAP_BEGIN,
>>> +       FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
>>> +
>>>         __end_of_permanent_fixed_addresses,
>>>
>>>         /*
>>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>>> index 1be64da0384e..280211ad8be9 100644
>>> --- a/arch/x86/include/asm/processor.h
>>> +++ b/arch/x86/include/asm/processor.h
>>> @@ -705,6 +705,7 @@ extern struct desc_ptr              early_gdt_descr;
>>>
>>>  extern void cpu_set_gdt(int);
>>>  extern void switch_to_new_gdt(int);
>>> +extern void load_remapped_gdt(int);
>>>  extern void load_percpu_segment(int);
>>>  extern void cpu_init(void);
>>>
>>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>>> index e97ffc8d29f4..7d940b0e805a 100644
>>> --- a/arch/x86/kernel/cpu/common.c
>>> +++ b/arch/x86/kernel/cpu/common.c
>>> @@ -443,6 +443,19 @@ void load_percpu_segment(int cpu)
>>>         load_stack_canary_segment();
>>>  }
>>>
>>> +/* Load a fixmap remapping of the per-cpu GDT */
>>> +void load_remapped_gdt(int cpu)
>>> +{
>>> +       struct desc_ptr gdt_descr;
>>> +       unsigned long idx = FIX_GDT_REMAP_BEGIN + cpu;
>>> +
>>> +       __set_fixmap(idx, __pa(get_cpu_gdt_table(cpu)), PAGE_KERNEL);
>>> +
>>> +       gdt_descr.address = (long)__fix_to_virt(idx);
>>> +       gdt_descr.size = GDT_SIZE - 1;
>>> +       load_gdt(&gdt_descr);
>>> +}
>>
>> IMO this should be split into two functions, one to set up the fixmap
>> entry and one to load the GDT.
>>
>
> That make sense.
>
>> Also, would it be possible to rename the various gdt helpers so that
>> their functionality is more obvious?  For example, get_cpu_gdt_table()
>> could be get_cpu_direct_gdt_table() and the function to load the gdt
>> could be load_fixmap_gdt().
>>
>
> Sure no problem.
>
>> --Andy
>
>
>
> --
> Thomas



-- 
Thomas

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-01-25 20:10 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-20 16:41 [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Thomas Garnier
2017-01-20 16:41 ` [PATCH v1 2/3] x86: Remap GDT tables in the Fixmap section Thomas Garnier
2017-01-21  0:57   ` Andy Lutomirski
2017-01-21  1:06     ` Thomas Garnier
2017-01-25 20:10       ` Thomas Garnier
2017-01-21  2:23   ` kbuild test robot
2017-01-21  2:34   ` kbuild test robot
2017-01-20 16:41 ` [PATCH v1 3/3] x86: Make the GDT remapping read-only on 64 bit Thomas Garnier
2017-01-21  1:06   ` Andy Lutomirski
2017-01-21  1:14     ` Thomas Garnier
2017-01-20 22:24 ` [PATCH v1 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size Andy Lutomirski
2017-01-21  2:43 ` kbuild test robot
2017-01-21  3:21 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).