From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755362AbcDNOOL (ORCPT ); Thu, 14 Apr 2016 10:14:11 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:36576 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752654AbcDNOOJ (ORCPT ); Thu, 14 Apr 2016 10:14:09 -0400 Date: Thu, 14 Apr 2016 16:13:59 +0200 From: Ingo Molnar To: Linus Torvalds Cc: linux-kernel@vger.kernel.org, Thomas Gleixner , "H. Peter Anvin" , Peter Zijlstra Subject: [GIT PULL] x86 fixes Message-ID: <20160414141358.GA14587@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Linus, Please pull the latest x86-urgent-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus # HEAD: a3125494cff084b098c80bb36fbe2061ffed9d52 x86/mce: Avoid using object after free in genpool Misc fixes: a binutils fix, an lguest fix, an mcelog fix and a missing documentation fix. out-of-topic modifications in x86-urgent-for-linus: ----------------------------------------------------- drivers/lguest/interrupts_and_traps.c# f87e0434a3be: lguest, x86/entry/32: Fix ha drivers/lguest/lg.h # f87e0434a3be: lguest, x86/entry/32: Fix ha drivers/lguest/x86/core.c # f87e0434a3be: lguest, x86/entry/32: Fix ha Thanks, Ingo ------------------> Dave Hansen (1): x86/mm/pkeys: Add missing Documentation H.J. Lu (1): x86/build: Build compressed x86 kernels as PIE Rusty Russell (1): lguest, x86/entry/32: Fix handling of guest syscalls using interrupt gates Tony Luck (1): x86/mce: Avoid using object after free in genpool Documentation/x86/protection-keys.txt | 27 +++++++++++++++++++++++++++ arch/x86/boot/compressed/Makefile | 14 +++++++++++++- arch/x86/boot/compressed/head_32.S | 28 ++++++++++++++++++++++++++++ arch/x86/boot/compressed/head_64.S | 8 ++++++++ arch/x86/kernel/cpu/mcheck/mce-genpool.c | 4 ++-- drivers/lguest/interrupts_and_traps.c | 6 +++++- drivers/lguest/lg.h | 1 + drivers/lguest/x86/core.c | 6 +++++- 8 files changed, 89 insertions(+), 5 deletions(-) create mode 100644 Documentation/x86/protection-keys.txt diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt new file mode 100644 index 000000000000..c281ded1ba16 --- /dev/null +++ b/Documentation/x86/protection-keys.txt @@ -0,0 +1,27 @@ +Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature +which will be found on future Intel CPUs. + +Memory Protection Keys provides a mechanism for enforcing page-based +protections, but without requiring modification of the page tables +when an application changes protection domains. It works by +dedicating 4 previously ignored bits in each page table entry to a +"protection key", giving 16 possible keys. + +There is also a new user-accessible register (PKRU) with two separate +bits (Access Disable and Write Disable) for each key. Being a CPU +register, PKRU is inherently thread-local, potentially giving each +thread a different set of protections from every other thread. + +There are two new instructions (RDPKRU/WRPKRU) for reading and writing +to the new register. The feature is only available in 64-bit mode, +even though there is theoretically space in the PAE PTEs. These +permissions are enforced on data access only and have no effect on +instruction fetches. + +=========================== Config Option =========================== + +This config option adds approximately 1.5kb of text. and 50 bytes of +data to the executable. A workload which does large O_DIRECT reads +of holes in XFS files was run to exercise get_user_pages_fast(). No +performance delta was observed with the config option +enabled or disabled. diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 6915ff2bd996..8774cb23064f 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -26,7 +26,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \ vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2 -KBUILD_CFLAGS += -fno-strict-aliasing -fPIC +KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC) KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING cflags-$(CONFIG_X86_32) := -march=i386 cflags-$(CONFIG_X86_64) := -mcmodel=small @@ -40,6 +40,18 @@ GCOV_PROFILE := n UBSAN_SANITIZE :=n LDFLAGS := -m elf_$(UTS_MACHINE) +ifeq ($(CONFIG_RELOCATABLE),y) +# If kernel is relocatable, build compressed kernel as PIE. +ifeq ($(CONFIG_X86_32),y) +LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker) +else +# To build 64-bit compressed kernel as PIE, we disable relocation +# overflow check to avoid relocation overflow error with a new linker +# command-line option, -z noreloc-overflow. +LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \ + && echo "-z noreloc-overflow -pie --no-dynamic-linker") +endif +endif LDFLAGS_vmlinux := -T hostprogs-y := mkpiggy diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index 8ef964ddc18e..0256064da8da 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -31,6 +31,34 @@ #include #include +/* + * The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X + * relocation to get the symbol address in PIC. When the compressed x86 + * kernel isn't built as PIC, the linker optimizes R_386_GOT32X + * relocations to their fixed symbol addresses. However, when the + * compressed x86 kernel is loaded at a different address, it leads + * to the following load failure: + * + * Failed to allocate space for phdrs + * + * during the decompression stage. + * + * If the compressed x86 kernel is relocatable at run-time, it should be + * compiled with -fPIE, instead of -fPIC, if possible and should be built as + * Position Independent Executable (PIE) so that linker won't optimize + * R_386_GOT32X relocation to its fixed symbol address. Older + * linkers generate R_386_32 relocations against locally defined symbols, + * _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less + * optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle + * R_386_32 relocations when relocating the kernel. To generate + * R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as + * hidden: + */ + .hidden _bss + .hidden _ebss + .hidden _got + .hidden _egot + __HEAD ENTRY(startup_32) #ifdef CONFIG_EFI_STUB diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index b0c0d16ef58d..86558a199139 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -33,6 +33,14 @@ #include #include +/* + * Locally defined symbols should be marked hidden: + */ + .hidden _bss + .hidden _ebss + .hidden _got + .hidden _egot + __HEAD .code32 ENTRY(startup_32) diff --git a/arch/x86/kernel/cpu/mcheck/mce-genpool.c b/arch/x86/kernel/cpu/mcheck/mce-genpool.c index 0a850100c594..2658e2af74ec 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-genpool.c +++ b/arch/x86/kernel/cpu/mcheck/mce-genpool.c @@ -29,7 +29,7 @@ static char gen_pool_buf[MCE_POOLSZ]; void mce_gen_pool_process(void) { struct llist_node *head; - struct mce_evt_llist *node; + struct mce_evt_llist *node, *tmp; struct mce *mce; head = llist_del_all(&mce_event_llist); @@ -37,7 +37,7 @@ void mce_gen_pool_process(void) return; head = llist_reverse_order(head); - llist_for_each_entry(node, head, llnode) { + llist_for_each_entry_safe(node, tmp, head, llnode) { mce = &node->mce; atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce); gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node)); diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c index eb934b0242e0..67392b6ab845 100644 --- a/drivers/lguest/interrupts_and_traps.c +++ b/drivers/lguest/interrupts_and_traps.c @@ -331,7 +331,7 @@ void set_interrupt(struct lg_cpu *cpu, unsigned int irq) * Actually now I think of it, it's possible that Ron *is* half the Plan 9 * userbase. Oh well. */ -static bool could_be_syscall(unsigned int num) +bool could_be_syscall(unsigned int num) { /* Normal Linux IA32_SYSCALL_VECTOR or reserved vector? */ return num == IA32_SYSCALL_VECTOR || num == syscall_vector; @@ -416,6 +416,10 @@ bool deliver_trap(struct lg_cpu *cpu, unsigned int num) * * This routine indicates if a particular trap number could be delivered * directly. + * + * Unfortunately, Linux 4.6 started using an interrupt gate instead of a + * trap gate for syscalls, so this trick is ineffective. See Mastery for + * how we could do this anyway... */ static bool direct_trap(unsigned int num) { diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h index ac8ad0461e80..69b3814afd2f 100644 --- a/drivers/lguest/lg.h +++ b/drivers/lguest/lg.h @@ -167,6 +167,7 @@ void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta); bool send_notify_to_eventfd(struct lg_cpu *cpu); void init_clockdev(struct lg_cpu *cpu); bool check_syscall_vector(struct lguest *lg); +bool could_be_syscall(unsigned int num); int init_interrupts(void); void free_interrupts(void); diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c index 6a4cd771a2be..adc162c7040d 100644 --- a/drivers/lguest/x86/core.c +++ b/drivers/lguest/x86/core.c @@ -429,8 +429,12 @@ void lguest_arch_handle_trap(struct lg_cpu *cpu) return; break; case 32 ... 255: + /* This might be a syscall. */ + if (could_be_syscall(cpu->regs->trapnum)) + break; + /* - * These values mean a real interrupt occurred, in which case + * Other values mean a real interrupt occurred, in which case * the Host handler has already been run. We just do a * friendly check if another process should now be run, then * return to run the Guest again.