From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753416AbdKXJRO (ORCPT ); Fri, 24 Nov 2017 04:17:14 -0500
Received: from mail-wr0-f196.google.com ([209.85.128.196]:46312 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753423AbdKXJQC (ORCPT ); Fri, 24 Nov 2017 04:16:02 -0500
X-Google-Smtp-Source: AGs4zMbI7IOtyK1onHOp8Zh/Qd+KFVlJgkTunki0x7evvZwmxp6zfYAJWaeXHxnhArNnN2EsWQ8hvg==
From: Ingo Molnar
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen, Andy Lutomirski, Thomas Gleixner, "H . Peter Anvin", Peter Zijlstra, Borislav Petkov, Linus Torvalds
Subject: [PATCH 42/43] x86/mm/kaiser: Allow KAISER to be enabled/disabled at runtime
Date: Fri, 24 Nov 2017 10:14:47 +0100
Message-Id: <20171124091448.7649-43-mingo@kernel.org>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <20171124091448.7649-1-mingo@kernel.org>
References: <20171124091448.7649-1-mingo@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Dave Hansen

The KAISER CR3 switches are expensive for many reasons.  Not all systems
benefit from the protection provided by KAISER.  Some of them cannot
pay the high performance cost.

This patch adds a debugfs file.  To disable KAISER, you do:

	echo 0 > /sys/kernel/debug/x86/kaiser-enabled

and to re-enable it, you can:

	echo 1 > /sys/kernel/debug/x86/kaiser-enabled

This is a *minimal* implementation.  There are certainly plenty of
optimizations that can be done on top of this by using ALTERNATIVES
among other things.

This does, however, completely remove all the KAISER-based CR3 writes.
This permits a paravirtualized system that cannot tolerate CR3
writes to theoretically survive with CONFIG_KAISER=y, albeit with
/sys/kernel/debug/x86/kaiser-enabled=0.

Signed-off-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Daniel Gruss
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Hugh Dickins
Cc: Josh Poimboeuf
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Michael Schwarz
Cc: Moritz Lipp
Cc: Peter Zijlstra
Cc: Richard Fellner
Cc: Thomas Gleixner
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20171123003523.28FFBAB6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/entry/calling.h | 12 +++++++++
 arch/x86/mm/kaiser.c     | 70 +++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 66af80514197..89ccf7ae0e23 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -209,19 +209,29 @@ For 32-bit we have the following conventions - kernel is built with
 	orq	$(KAISER_SWITCH_MASK), \reg
 .endm
 
+.macro JUMP_IF_KAISER_OFF	label
+	testq	$1, kaiser_asm_do_switch
+	jz	\label
+.endm
+
 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	JUMP_IF_KAISER_OFF	.Lswitch_done_\@
 	mov	%cr3, \scratch_reg
 	ADJUST_KERNEL_CR3 \scratch_reg
 	mov	\scratch_reg, %cr3
+.Lswitch_done_\@:
 .endm
 
 .macro SWITCH_TO_USER_CR3 scratch_reg:req
+	JUMP_IF_KAISER_OFF	.Lswitch_done_\@
 	mov	%cr3, \scratch_reg
 	ADJUST_USER_CR3 \scratch_reg
 	mov	\scratch_reg, %cr3
+.Lswitch_done_\@:
 .endm
 
 .macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+	JUMP_IF_KAISER_OFF	.Ldone_\@
 	movq	%cr3, %r\scratch_reg
 	movq	%r\scratch_reg, \save_reg
 	/*
@@ -244,11 +254,13 @@ For 32-bit we have the following conventions - kernel is built with
 .endm
 
 .macro RESTORE_CR3 save_reg:req
+	JUMP_IF_KAISER_OFF	.Ldone_\@
 	/*
 	 * The CR3 write could be avoided when not changing its value,
 	 * but would require a CR3 read *and* a scratch register.
 	 */
 	movq	\save_reg, %cr3
+.Ldone_\@:
 .endm
 
 #else /* CONFIG_KAISER=n: */
diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
index 06966b111280..1eb27b410556 100644
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -43,6 +43,9 @@
 
 #define KAISER_WALK_ATOMIC  0x1
 
+__aligned(PAGE_SIZE)
+unsigned long kaiser_asm_do_switch[PAGE_SIZE/sizeof(unsigned long)] = { 1 };
+
 /*
  * At runtime, the only things we map are some things for CPU
  * hotplug, and stacks for new processes.  No two CPUs will ever
@@ -395,6 +398,9 @@ void __init kaiser_init(void)
 
 	kaiser_init_all_pgds();
 
+	kaiser_add_user_map_early(&kaiser_asm_do_switch, PAGE_SIZE,
+				  __PAGE_KERNEL | _PAGE_GLOBAL);
+
 	for_each_possible_cpu(cpu) {
 		void *percpu_vaddr = __per_cpu_user_mapped_start +
 				     per_cpu_offset(cpu);
@@ -483,6 +489,56 @@ static ssize_t kaiser_enabled_read_file(struct file *file, char __user *user_buf
 	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
 }
 
+enum poison {
+	KAISER_POISON,
+	KAISER_UNPOISON
+};
+
+void kaiser_poison_pgds(enum poison do_poison);
+
+void kaiser_do_disable(void)
+{
+	/* Make sure the kernel PGDs are usable by userspace: */
+	kaiser_poison_pgds(KAISER_UNPOISON);
+
+	/*
+	 * Make sure all the CPUs have the poison clear in their TLBs.
+	 * This also functions as a barrier to ensure that everyone
+	 * sees the unpoisoned PGDs.
+	 */
+	flush_tlb_all();
+
+	/* Tell the assembly code to stop switching CR3. */
+	kaiser_asm_do_switch[0] = 0;
+
+	/*
+	 * Make sure everybody does an interrupt.  This means that
+	 * they have gone through a SWITCH_TO_KERNEL_CR3 and are no
+	 * longer running on the userspace CR3.  If we did not do
+	 * this, we might have CPUs running on the shadow page tables
+	 * that then enter the kernel and think they do *not* need to
+	 * switch.
+	 */
+	flush_tlb_all();
+}
+
+void kaiser_do_enable(void)
+{
+	/* Tell the assembly code to start switching CR3: */
+	kaiser_asm_do_switch[0] = 1;
+
+	/* Make sure everyone can see the kaiser_asm_do_switch update: */
+	synchronize_rcu();
+
+	/*
+	 * Now that userspace is no longer using the kernel copy of
+	 * the page tables, we can poison it:
+	 */
+	kaiser_poison_pgds(KAISER_POISON);
+
+	/* Make sure all the CPUs see the poison: */
+	flush_tlb_all();
+}
+
 static ssize_t kaiser_enabled_write_file(struct file *file,
 				 const char __user *user_buf, size_t count, loff_t *ppos)
 {
@@ -504,7 +560,17 @@ static ssize_t kaiser_enabled_write_file(struct file *file,
 	if (kaiser_enabled == enable)
 		return count;
 
+	/*
+	 * This tells the page table code to stop poisoning PGDs
+	 */
 	WRITE_ONCE(kaiser_enabled, enable);
+	synchronize_rcu();
+
+	if (enable)
+		kaiser_do_enable();
+	else
+		kaiser_do_disable();
+
 	return count;
 }
 
@@ -522,10 +588,6 @@ static int __init create_kaiser_enabled(void)
 }
 late_initcall(create_kaiser_enabled);
 
-enum poison {
-	KAISER_POISON,
-	KAISER_UNPOISON
-};
 void kaiser_poison_pgd_page(pgd_t *pgd_page, enum poison do_poison)
 {
 	int i = 0;
-- 
2.14.1
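
Illustration only, not part of the patch: a rough user-space model of the
gate that JUMP_IF_KAISER_OFF adds to the CR3 switching macros.  Every name
below (model_cpu, model_do_switch, MODEL_SWITCH_MASK and the helpers) is
hypothetical, and the mask value is an arbitrary stand-in for
KAISER_SWITCH_MASK.  The model also assumes ADJUST_KERNEL_CR3 clears the
mask bits, which this diff does not show.  It deliberately ignores the
hard part handled by kaiser_do_disable(), namely CPUs that are still
running on the user CR3 when the flag is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for KAISER_SWITCH_MASK; the real value differs. */
#define MODEL_SWITCH_MASK 0x1000UL

/* Models kaiser_asm_do_switch[0]: 1 = switch CR3, 0 = skip the switch. */
static unsigned long model_do_switch = 1;

struct model_cpu {
	unsigned long cr3;        /* simulated CR3 value */
	unsigned long cr3_writes; /* number of simulated CR3 writes */
};

/* Models SWITCH_TO_KERNEL_CR3: gated like JUMP_IF_KAISER_OFF. */
static void switch_to_kernel_cr3(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	if (!(model_do_switch & 1))
		return;                       /* gate closed: no CR3 write */
	cpu->cr3 &= ~MODEL_SWITCH_MASK;       /* assumed ADJUST_KERNEL_CR3 */
	cpu->cr3_writes++;
}

/* Models SWITCH_TO_USER_CR3: same gate, sets the mask bits (orq). */
static void switch_to_user_cr3(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	if (!(model_do_switch & 1))
		return;                       /* gate closed: no CR3 write */
	cpu->cr3 |= MODEL_SWITCH_MASK;
	cpu->cr3_writes++;
}

/* Models only the flag store done by kaiser_do_enable()/_disable(). */
static void model_set_enabled(bool enable)
{
	model_do_switch = enable ? 1 : 0;
}
```

With the flag cleared, calls to either helper leave cr3 and cr3_writes
untouched, which is the "completely remove all the KAISER-based CR3
writes" property described in the changelog.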