* [patch V2 0/5] x86/kaiser: Boot time disabling and debug support
@ 2017-11-26 23:14 Thomas Gleixner
2017-11-26 23:14 ` [patch V2 1/5] x86/kaiser: Respect disabled CPU features Thomas Gleixner
` (4 more replies)
0 siblings, 5 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-26 23:14 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
This patch series applies on top of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/mm
It contains the following updates:
- Don't set NX/PAGE_GLOBAL unconditionally
- Get rid of the compile time PAGE_GLOBAL disabling
- Add debug support for WX mappings in the KAISER shadow table
- Provide debug files to dump the kernel and the user page table for the
current task.
- Add a boot time switch to disable KAISER. This does not yet take care of
the 8k PGD allocations, but that can be done on top.
Changes vs. V1:
- Prevent setting PAGE_GLOBAL/NX when not supported or disabled
- Restructured the debug stuff a bit
- Extended the boot time disable to debug stuff
I tried to reenable paravirt by disabling kaiser at boot time when XEN_PV
is detected. XEN_PV is the only one having CR3 access paravirtualized,
which will explode nicely in the enter/exit code.
But enabling KAISER has some weird not yet debugged side effects even on
KVM guests. Will look at that tomorrow morning.
Thanks,
tglx
---
arch/x86/entry/calling.h | 7 +++
arch/x86/include/asm/kaiser.h | 10 ++++
arch/x86/include/asm/pgtable.h | 1
arch/x86/include/asm/pgtable_64.h | 9 +++
arch/x86/include/asm/pgtable_types.h | 16 ------
arch/x86/mm/debug_pagetables.c | 81 ++++++++++++++++++++++++++++++++---
arch/x86/mm/dump_pagetables.c | 32 +++++++++++--
arch/x86/mm/init.c | 14 ++++--
arch/x86/mm/kaiser.c | 42 +++++++++++++++++-
arch/x86/mm/pageattr.c | 16 +++---
security/Kconfig | 2
11 files changed, 190 insertions(+), 40 deletions(-)
^ permalink raw reply [flat|nested] 32+ messages in thread
* [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-26 23:14 [patch V2 0/5] x86/kaiser: Boot time disabling and debug support Thomas Gleixner
@ 2017-11-26 23:14 ` Thomas Gleixner
2017-11-27 9:57 ` Peter Zijlstra
2017-11-27 18:11 ` Dave Hansen
2017-11-26 23:14 ` [patch V2 2/5] x86/kaiser: Simplify disabling of global pages Thomas Gleixner
` (3 subsequent siblings)
4 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-26 23:14 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
[-- Attachment #1: x86-kaiser--Respect-disabled-features.patch --]
[-- Type: text/plain, Size: 1958 bytes --]
PAGE_NX and PAGE_GLOBAL might be not supported or disabled on the command
line, but KAISER sets them unconditionally.
Add proper protection against that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/pgtable_64.h | 3 ++-
arch/x86/mm/kaiser.c | 12 +++++++++++-
2 files changed, 13 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -222,7 +222,8 @@ static inline pgd_t kaiser_set_shadow_pg
* wrong CR3 value, userspace will crash
* instead of running.
*/
- pgd.pgd |= _PAGE_NX;
+ if (__supported_pte_mask & _PAGE_NX)
+ pgd.pgd |= _PAGE_NX;
}
} else if (pgd_userspace_access(*pgdp)) {
/*
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -42,6 +42,8 @@
#define KAISER_WALK_ATOMIC 0x1
+static pteval_t kaiser_pte_mask __ro_after_init = ~(_PAGE_NX | _PAGE_GLOBAL);
+
/*
* At runtime, the only things we map are some things for CPU
* hotplug, and stacks for new processes. No two CPUs will ever
@@ -244,11 +246,14 @@ static pte_t *kaiser_shadow_pagetable_wa
int kaiser_add_user_map(const void *__start_addr, unsigned long size,
unsigned long flags)
{
- pte_t *pte;
unsigned long start_addr = (unsigned long)__start_addr;
unsigned long address = start_addr & PAGE_MASK;
unsigned long end_addr = PAGE_ALIGN(start_addr + size);
unsigned long target_address;
+ pte_t *pte;
+
+ /* Clear not supported bits */
+ flags &= kaiser_pte_mask;
for (; address < end_addr; address += PAGE_SIZE) {
target_address = get_pa_from_kernel_map(address);
@@ -308,6 +313,11 @@ static void __init kaiser_init_all_pgds(
pgd_t *pgd;
int i;
+ if (__supported_pte_mask & _PAGE_NX)
+ kaiser_pte_mask |= _PAGE_NX;
+ if (boot_cpu_has(X86_FEATURE_PGE))
+ kaiser_pte_mask |= _PAGE_GLOBAL;
+
pgd = kernel_to_shadow_pgdp(pgd_offset_k(0UL));
for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
/*
^ permalink raw reply [flat|nested] 32+ messages in thread
* [patch V2 2/5] x86/kaiser: Simplify disabling of global pages
2017-11-26 23:14 [patch V2 0/5] x86/kaiser: Boot time disabling and debug support Thomas Gleixner
2017-11-26 23:14 ` [patch V2 1/5] x86/kaiser: Respect disabled CPU features Thomas Gleixner
@ 2017-11-26 23:14 ` Thomas Gleixner
2017-11-27 11:49 ` Thomas Gleixner
2017-11-27 18:15 ` Dave Hansen
2017-11-26 23:14 ` [patch V2 3/5] x86/dump_pagetables: Check KAISER shadow page table for WX pages Thomas Gleixner
` (2 subsequent siblings)
4 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-26 23:14 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
[-- Attachment #1: x86-kaiser--Make-page-global-disabling-sane.patch --]
[-- Type: text/plain, Size: 3990 bytes --]
The current way of disabling global pages at compile time prevents boot
time disabling of kaiser and creates unnecessary indirections.
Global pages can be supressed by __supported_pte_mask as well. The shadow
mappings set PAGE_GLOBAL for the minimal kernel mappings which are required
for entry/exit. These mappings are setup manually so the filtering does not
take place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/pgtable_types.h | 16 +---------------
arch/x86/mm/init.c | 13 ++++++++++---
arch/x86/mm/pageattr.c | 16 ++++++++--------
3 files changed, 19 insertions(+), 26 deletions(-)
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -191,23 +191,9 @@ enum page_cache_mode {
#define PAGE_READONLY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \
_PAGE_ACCESSED)
-/*
- * Disable global pages for anything using the default
- * __PAGE_KERNEL* macros.
- *
- * PGE will still be enabled and _PAGE_GLOBAL may still be used carefully
- * for a few selected kernel mappings which must be visible to userspace,
- * when KAISER is enabled, like the entry/exit code and data.
- */
-#ifdef CONFIG_KAISER
-#define __PAGE_KERNEL_GLOBAL 0
-#else
-#define __PAGE_KERNEL_GLOBAL _PAGE_GLOBAL
-#endif
-
#define __PAGE_KERNEL_EXEC \
(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | \
- __PAGE_KERNEL_GLOBAL)
+ _PAGE_GLOBAL)
#define __PAGE_KERNEL (__PAGE_KERNEL_EXEC | _PAGE_NX)
#define __PAGE_KERNEL_RO (__PAGE_KERNEL & ~_PAGE_RW)
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,6 +161,13 @@ struct map_range {
static int page_size_mask;
+static void enable_global_pages(void)
+{
+#ifndef CONFIG_KAISER
+ __supported_pte_mask |= _PAGE_GLOBAL;
+#endif
+}
+
static void __init probe_page_size_mask(void)
{
/*
@@ -179,11 +186,11 @@ static void __init probe_page_size_mask(
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
+ __supported_pte_mask |= _PAGE_GLOBAL;
if (boot_cpu_has(X86_FEATURE_PGE)) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
- __supported_pte_mask |= _PAGE_GLOBAL;
- } else
- __supported_pte_mask &= ~_PAGE_GLOBAL;
+ enable_global_pages();
+ }
/* Enable 1 GB linear kernel mappings if available: */
if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -585,9 +585,9 @@ try_preserve_large_page(pte_t *kpte, uns
* for the ancient hardware that doesn't support it.
*/
if (pgprot_val(req_prot) & _PAGE_PRESENT)
- pgprot_val(req_prot) |= _PAGE_PSE | __PAGE_KERNEL_GLOBAL;
+ pgprot_val(req_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
else
- pgprot_val(req_prot) &= ~(_PAGE_PSE | __PAGE_KERNEL_GLOBAL);
+ pgprot_val(req_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
req_prot = canon_pgprot(req_prot);
@@ -705,9 +705,9 @@ static int
* for the ancient hardware that doesn't support it.
*/
if (pgprot_val(ref_prot) & _PAGE_PRESENT)
- pgprot_val(ref_prot) |= __PAGE_KERNEL_GLOBAL;
+ pgprot_val(ref_prot) |= _PAGE_GLOBAL;
else
- pgprot_val(ref_prot) &= ~__PAGE_KERNEL_GLOBAL;
+ pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
/*
* Get the target pfn from the original entry:
@@ -938,9 +938,9 @@ static void populate_pte(struct cpa_data
* support it.
*/
if (pgprot_val(pgprot) & _PAGE_PRESENT)
- pgprot_val(pgprot) |= __PAGE_KERNEL_GLOBAL;
+ pgprot_val(pgprot) |= _PAGE_GLOBAL;
else
- pgprot_val(pgprot) &= ~__PAGE_KERNEL_GLOBAL;
+ pgprot_val(pgprot) &= ~_PAGE_GLOBAL;
pgprot = canon_pgprot(pgprot);
@@ -1242,9 +1242,9 @@ static int __change_page_attr(struct cpa
* support it.
*/
if (pgprot_val(new_prot) & _PAGE_PRESENT)
- pgprot_val(new_prot) |= __PAGE_KERNEL_GLOBAL;
+ pgprot_val(new_prot) |= _PAGE_GLOBAL;
else
- pgprot_val(new_prot) &= ~__PAGE_KERNEL_GLOBAL;
+ pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
/*
* We need to keep the pfn from the existing PTE,
^ permalink raw reply [flat|nested] 32+ messages in thread
* [patch V2 3/5] x86/dump_pagetables: Check KAISER shadow page table for WX pages
2017-11-26 23:14 [patch V2 0/5] x86/kaiser: Boot time disabling and debug support Thomas Gleixner
2017-11-26 23:14 ` [patch V2 1/5] x86/kaiser: Respect disabled CPU features Thomas Gleixner
2017-11-26 23:14 ` [patch V2 2/5] x86/kaiser: Simplify disabling of global pages Thomas Gleixner
@ 2017-11-26 23:14 ` Thomas Gleixner
2017-11-27 18:17 ` Dave Hansen
2017-11-26 23:14 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
2017-11-26 23:14 ` [patch V2 5/5] x86/kaiser: Add boottime disable switch Thomas Gleixner
4 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-26 23:14 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
[-- Attachment #1: x86-kaiser--Check-shadow-pagetable-for-WX-mappings.patch --]
[-- Type: text/plain, Size: 2661 bytes --]
ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages,
but does not check the KAISER shadow page table.
Restructure the code so that dmesg output is selected by an explicit
argument and not implicit via checking the pgd argument for !NULL.
Add the check for the shadow page table.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/pgtable.h | 1 +
arch/x86/mm/debug_pagetables.c | 2 +-
arch/x86/mm/dump_pagetables.c | 27 ++++++++++++++++++++++-----
3 files changed, 24 insertions(+), 6 deletions(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,6 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
#ifdef CONFIG_DEBUG_WX
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -5,7 +5,7 @@
static int ptdump_show(struct seq_file *m, void *v)
{
- ptdump_walk_pgd_level(m, NULL);
+ ptdump_walk_pgd_level_debugfs(m, NULL);
return 0;
}
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -447,7 +447,7 @@ static inline bool is_hypervisor_range(i
}
static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
- bool checkwx)
+ bool checkwx, bool dmesg)
{
#ifdef CONFIG_X86_64
pgd_t *start = (pgd_t *) &init_top_pgt;
@@ -460,7 +460,7 @@ static void ptdump_walk_pgd_level_core(s
if (pgd) {
start = pgd;
- st.to_dmesg = true;
+ st.to_dmesg = dmesg;
}
st.check_wx = checkwx;
@@ -498,13 +498,30 @@ static void ptdump_walk_pgd_level_core(s
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
{
- ptdump_walk_pgd_level_core(m, pgd, false);
+ ptdump_walk_pgd_level_core(m, pgd, false, true);
+}
+
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
+{
+ ptdump_walk_pgd_level_core(m, pgd, false, false);
+}
+EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
+
+void ptdump_walk_shadow_pgd_level_checkwx(void)
+{
+#ifdef CONFIG_KAISER
+ pgd_t *pgd = (pgd_t *) &init_top_pgt;
+
+ pr_info("x86/mm: Checking shadow page tables\n");
+ pgd += PTRS_PER_PGD;
+ ptdump_walk_pgd_level_core(NULL, pgd, true, false);
+#endif
}
-EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level);
void ptdump_walk_pgd_level_checkwx(void)
{
- ptdump_walk_pgd_level_core(NULL, NULL, true);
+ ptdump_walk_pgd_level_core(NULL, NULL, true, false);
+ ptdump_walk_shadow_pgd_level_checkwx();
}
static int __init pt_dump_init(void)
^ permalink raw reply [flat|nested] 32+ messages in thread
* [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables
2017-11-26 23:14 [patch V2 0/5] x86/kaiser: Boot time disabling and debug support Thomas Gleixner
` (2 preceding siblings ...)
2017-11-26 23:14 ` [patch V2 3/5] x86/dump_pagetables: Check KAISER shadow page table for WX pages Thomas Gleixner
@ 2017-11-26 23:14 ` Thomas Gleixner
2017-11-27 9:41 ` Peter Zijlstra
2017-11-27 18:18 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Dave Hansen
2017-11-26 23:14 ` [patch V2 5/5] x86/kaiser: Add boottime disable switch Thomas Gleixner
4 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-26 23:14 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
[-- Attachment #1: x86-mm-dump_pagetables--Add-support-for-KAISER-tables.patch --]
[-- Type: text/plain, Size: 4404 bytes --]
Add two debugfs files which allow to dump the pagetable of the current task.
current_page_tables_knl dumps the regular page table. This is the page
table which is normally shared between kernel and user space. If KAISER is
enabled this is the kernel space mapping.
If KAISER is enabled the second file, current_page_tables_usr, dumps the
user space page table.
These files allow to verify the resulting page tables for KAISER, but even
in the non KAISER case its useful to be able to inspect user space page
tables of current for debugging purposes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/pgtable.h | 2 -
arch/x86/mm/debug_pagetables.c | 81 +++++++++++++++++++++++++++++++++++++----
arch/x86/mm/dump_pagetables.c | 4 +-
3 files changed, 79 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,7 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool shadow);
void ptdump_walk_pgd_level_checkwx(void);
#ifdef CONFIG_DEBUG_WX
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -5,7 +5,7 @@
static int ptdump_show(struct seq_file *m, void *v)
{
- ptdump_walk_pgd_level_debugfs(m, NULL);
+ ptdump_walk_pgd_level_debugfs(m, NULL, false);
return 0;
}
@@ -22,21 +22,90 @@ static const struct file_operations ptdu
.release = single_release,
};
-static struct dentry *pe;
+static int ptdump_show_curknl(struct seq_file *m, void *v)
+{
+ if (current->mm->pgd) {
+ down_read(¤t->mm->mmap_sem);
+ ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
+ up_read(¤t->mm->mmap_sem);
+ }
+ return 0;
+}
+
+static int ptdump_open_curknl(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, ptdump_show_curknl, NULL);
+}
+
+static const struct file_operations ptdump_curknl_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open_curknl,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+#ifdef CONFIG_KAISER
+static int ptdump_show_curusr(struct seq_file *m, void *v)
+{
+ if (current->mm->pgd) {
+ down_read(¤t->mm->mmap_sem);
+ ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
+ up_read(¤t->mm->mmap_sem);
+ }
+ return 0;
+}
+
+static int ptdump_open_curusr(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, ptdump_show_curusr, NULL);
+}
+
+static const struct file_operations ptdump_curusr_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open_curusr,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif
+
+static struct dentry *pe_knl, *pe_curknl, *pe_curusr;
+
+static void pt_dump_debug_remove_files(void)
+{
+ debugfs_remove_recursive(pe_knl);
+ debugfs_remove_recursive(pe_curknl);
+ debugfs_remove_recursive(pe_curusr);
+}
static int __init pt_dump_debug_init(void)
{
- pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
- &ptdump_fops);
- if (!pe)
+ pe_knl = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
+ &ptdump_fops);
+ if (!pe_knl)
return -ENOMEM;
+ pe_curknl = debugfs_create_file("current_page_tables_knl", S_IRUSR,
+ NULL, NULL, &ptdump_curknl_fops);
+ if (!pe_curknl)
+ goto err;
+
+#ifdef CONFIG_KAISER
+ pe_curusr = debugfs_create_file("current_page_tables_usr", S_IRUSR,
+ NULL, NULL, &ptdump_curusr_fops);
+ if (!pe_curusr)
+ goto err;
+#endif
return 0;
+err:
+ pt_dump_debug_remove_files();
+ return -ENOMEM;
}
static void __exit pt_dump_debug_exit(void)
{
- debugfs_remove_recursive(pe);
+ pt_dump_debug_remove_files();
}
module_init(pt_dump_debug_init);
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -501,8 +501,10 @@ void ptdump_walk_pgd_level(struct seq_fi
ptdump_walk_pgd_level_core(m, pgd, false, true);
}
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool shadow)
{
+ if (shadow)
+ pgd += PTRS_PER_PGD;
ptdump_walk_pgd_level_core(m, pgd, false, false);
}
EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
^ permalink raw reply [flat|nested] 32+ messages in thread
* [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-26 23:14 [patch V2 0/5] x86/kaiser: Boot time disabling and debug support Thomas Gleixner
` (3 preceding siblings ...)
2017-11-26 23:14 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
@ 2017-11-26 23:14 ` Thomas Gleixner
2017-11-27 9:48 ` Peter Zijlstra
2017-11-27 18:22 ` Dave Hansen
4 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-26 23:14 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
[-- Attachment #1: x86-kaiser--Add-boottime-disable-switch.patch --]
[-- Type: text/plain, Size: 6942 bytes --]
KAISER comes with overhead. The most expensive part is the CR3 switching in
the entry code.
Add a command line parameter which allows to disable KAISER at boot time.
Most code pathes simply check a variable, but the entry code uses a static
branch. The other code pathes cannot use a static branch because they are
used before jump label patching is possible. Not an issue as the code
pathes are not so performance sensitive as the entry/exit code.
This makes KAISER depend on JUMP_LABEL and on a GCC which supports
it, but that's a resonable requirement.
The PGD allocation is still 8k when CONFIG_KAISER is enabled. This can be
addressed on top of this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/entry/calling.h | 7 +++++++
arch/x86/include/asm/kaiser.h | 10 ++++++++++
arch/x86/include/asm/pgtable_64.h | 6 ++++++
arch/x86/mm/dump_pagetables.c | 5 ++++-
arch/x86/mm/init.c | 7 ++++---
arch/x86/mm/kaiser.c | 30 ++++++++++++++++++++++++++++++
security/Kconfig | 2 +-
7 files changed, 62 insertions(+), 5 deletions(-)
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -210,18 +210,23 @@ For 32-bit we have the following convent
.endm
.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ STATIC_JUMP_IF_FALSE .Lend_\@, kaiser_enabled_key, def=1
mov %cr3, \scratch_reg
ADJUST_KERNEL_CR3 \scratch_reg
mov \scratch_reg, %cr3
+.Lend_\@:
.endm
.macro SWITCH_TO_USER_CR3 scratch_reg:req
+ STATIC_JUMP_IF_FALSE .Lend_\@, kaiser_enabled_key, def=1
mov %cr3, \scratch_reg
ADJUST_USER_CR3 \scratch_reg
mov \scratch_reg, %cr3
+.Lend_\@:
.endm
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+ STATIC_JUMP_IF_FALSE .Ldone_\@, kaiser_enabled_key, def=1
movq %cr3, %r\scratch_reg
movq %r\scratch_reg, \save_reg
/*
@@ -244,11 +249,13 @@ For 32-bit we have the following convent
.endm
.macro RESTORE_CR3 save_reg:req
+ STATIC_JUMP_IF_FALSE .Lend_\@, kaiser_enabled_key, def=1
/*
* The CR3 write could be avoided when not changing its value,
* but would require a CR3 read *and* a scratch register.
*/
movq \save_reg, %cr3
+.Lend_\@:
.endm
#else /* CONFIG_KAISER=n: */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -56,6 +56,16 @@ extern void kaiser_remove_mapping(unsign
*/
extern void kaiser_init(void);
+/* True if kaiser is enabled at boot time */
+extern struct static_key_true kaiser_enabled_key;
+extern bool kaiser_enabled;
+extern void kaiser_check_cmdline(void);
+
+#else /* CONFIG_KAISER */
+
+#define kaiser_enabled (false)
+static inline void kaiser_check_cmdline(void) { }
+
#endif
#endif /* __ASSEMBLY__ */
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -175,6 +175,9 @@ static inline p4d_t *shadow_to_kernel_p4
{
return ptr_clear_bit(p4dp, KAISER_PGTABLE_SWITCH_BIT);
}
+
+extern bool kaiser_enabled;
+
#endif /* CONFIG_KAISER */
/*
@@ -208,6 +211,9 @@ static inline bool pgd_userspace_access(
static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
{
#ifdef CONFIG_KAISER
+ if (!kaiser_enabled)
+ return pgd;
+
if (pgd_userspace_access(pgd)) {
if (pgdp_maps_userspace(pgdp)) {
/*
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -20,6 +20,7 @@
#include <linux/seq_file.h>
#include <asm/pgtable.h>
+#include <asm/kaiser.h>
/*
* The dumper groups pagetable entries of the same type into one, and for
@@ -503,7 +504,7 @@ void ptdump_walk_pgd_level(struct seq_fi
void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool shadow)
{
- if (shadow)
+ if (shadow && kaiser_enabled)
pgd += PTRS_PER_PGD;
ptdump_walk_pgd_level_core(m, pgd, false, false);
}
@@ -514,6 +515,8 @@ void ptdump_walk_shadow_pgd_level_checkw
#ifdef CONFIG_KAISER
pgd_t *pgd = (pgd_t *) &init_top_pgt;
+ if (!kaiser_enabled)
+ return;
pr_info("x86/mm: Checking shadow page tables\n");
pgd += PTRS_PER_PGD;
ptdump_walk_pgd_level_core(NULL, pgd, true, false);
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -20,6 +20,7 @@
#include <asm/kaslr.h>
#include <asm/hypervisor.h>
#include <asm/cpufeature.h>
+#include <asm/kaiser.h>
/*
* We need to define the tracepoints somewhere, and tlb.c
@@ -163,9 +164,8 @@ static int page_size_mask;
static void enable_global_pages(void)
{
-#ifndef CONFIG_KAISER
- __supported_pte_mask |= _PAGE_GLOBAL;
-#endif
+ if (!kaiser_enabled)
+ __supported_pte_mask |= _PAGE_GLOBAL;
}
static void __init probe_page_size_mask(void)
@@ -656,6 +656,7 @@ void __init init_mem_mapping(void)
{
unsigned long end;
+ kaiser_check_cmdline();
probe_page_size_mask();
setup_pcid();
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -34,6 +34,7 @@
#include <linux/mm.h>
#include <linux/uaccess.h>
+#include <asm/cmdline.h>
#include <asm/kaiser.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -44,6 +45,16 @@
static pteval_t kaiser_pte_mask __ro_after_init = ~(_PAGE_NX | _PAGE_GLOBAL);
+/* Global flag for boot time kaiser enable/disable */
+bool kaiser_enabled __ro_after_init = true;
+DEFINE_STATIC_KEY_TRUE(kaiser_enabled_key);
+
+void __init kaiser_check_cmdline(void)
+{
+ if (cmdline_find_option_bool(boot_command_line, "nokaiser"))
+ kaiser_enabled = false;
+}
+
/*
* At runtime, the only things we map are some things for CPU
* hotplug, and stacks for new processes. No two CPUs will ever
@@ -252,6 +263,9 @@ int kaiser_add_user_map(const void *__st
unsigned long target_address;
pte_t *pte;
+ if (!kaiser_enabled)
+ return 0;
+
/* Clear not supported bits */
flags &= kaiser_pte_mask;
@@ -402,6 +416,9 @@ void __init kaiser_init(void)
{
int cpu;
+ if (!kaiser_enabled)
+ return;
+
kaiser_init_all_pgds();
for_each_possible_cpu(cpu) {
@@ -436,6 +453,16 @@ void __init kaiser_init(void)
kaiser_add_mapping_cpu_entry(0);
}
+static int __init kaiser_boottime_control(void)
+{
+ if (!kaiser_enabled) {
+ static_branch_disable(&kaiser_enabled_key);
+ pr_info("kaiser: Disabled on command line\n");
+ }
+ return 0;
+}
+subsys_initcall(kaiser_boottime_control);
+
int kaiser_add_mapping(unsigned long addr, unsigned long size,
unsigned long flags)
{
@@ -446,6 +473,9 @@ void kaiser_remove_mapping(unsigned long
{
unsigned long addr;
+ if (!kaiser_enabled)
+ return;
+
/* The shadow page tables always use small pages: */
for (addr = start; addr < start + size; addr += PAGE_SIZE) {
/*
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -56,7 +56,7 @@ config SECURITY_NETWORK
config KAISER
bool "Remove the kernel mapping in user mode"
- depends on X86_64 && SMP && !PARAVIRT
+ depends on X86_64 && SMP && !PARAVIRT && JUMP_LABEL
help
This feature reduces the number of hardware side channels by
ensuring that the majority of kernel addresses are not mapped
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables
2017-11-26 23:14 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
@ 2017-11-27 9:41 ` Peter Zijlstra
2017-11-27 10:06 ` [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions Ingo Molnar
2017-11-27 18:18 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Dave Hansen
1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-27 9:41 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 12:14:07AM +0100, Thomas Gleixner wrote:
> static int __init pt_dump_debug_init(void)
> {
> + pe_knl = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
> + &ptdump_fops);
> + if (!pe_knl)
> return -ENOMEM;
>
> + pe_curknl = debugfs_create_file("current_page_tables_knl", S_IRUSR,
> + NULL, NULL, &ptdump_curknl_fops);
> + if (!pe_curknl)
> + goto err;
> +
> +#ifdef CONFIG_KAISER
> + pe_curusr = debugfs_create_file("current_page_tables_usr", S_IRUSR,
> + NULL, NULL, &ptdump_curusr_fops);
> + if (!pe_curusr)
> + goto err;
> +#endif
> return 0;
> +err:
> + pt_dump_debug_remove_files();
> + return -ENOMEM;
> }
Could we pretty please use the octal permission thing? I can't read
thise S_crap nonsense.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-26 23:14 ` [patch V2 5/5] x86/kaiser: Add boottime disable switch Thomas Gleixner
@ 2017-11-27 9:48 ` Peter Zijlstra
2017-11-27 10:22 ` Peter Zijlstra
2017-11-27 18:22 ` Dave Hansen
1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-27 9:48 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 12:14:08AM +0100, Thomas Gleixner wrote:
> KAISER comes with overhead. The most expensive part is the CR3 switching in
> the entry code.
>
> Add a command line parameter which allows to disable KAISER at boot time.
>
> Most code pathes simply check a variable, but the entry code uses a static
> branch. The other code pathes cannot use a static branch because they are
> used before jump label patching is possible. Not an issue as the code
> pathes are not so performance sensitive as the entry/exit code.
>
> This makes KAISER depend on JUMP_LABEL and on a GCC which supports
> it, but that's a resonable requirement.
>
> The PGD allocation is still 8k when CONFIG_KAISER is enabled. This can be
> addressed on top of this.
So in patch 15 Andy notes that we should probably also disable the
SYSCALL trampoline when we disable KAISER.
https://lkml.kernel.org/r/20171124172411.19476-16-mingo@kernel.org
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-26 23:14 ` [patch V2 1/5] x86/kaiser: Respect disabled CPU features Thomas Gleixner
@ 2017-11-27 9:57 ` Peter Zijlstra
2017-11-27 11:47 ` Thomas Gleixner
2017-11-27 18:11 ` Dave Hansen
1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-27 9:57 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 12:14:04AM +0100, Thomas Gleixner wrote:
> PAGE_NX and PAGE_GLOBAL might be not supported or disabled on the command
> line, but KAISER sets them unconditionally.
So KAISER is x86_64 only, right? AFAIK there is no x86_64 without NX
support. So would it not make sense to mandate NX for KAISER?, that is
instead of making "noexec" + KAISER work, make "noexec" kill KAISER +
emit a warning.
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions
2017-11-27 9:41 ` Peter Zijlstra
@ 2017-11-27 10:06 ` Ingo Molnar
2017-11-27 19:21 ` Linus Torvalds
0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2017-11-27 10:06 UTC (permalink / raw)
To: Peter Zijlstra, Linus Torvalds
Cc: Thomas Gleixner, LKML, Dave Hansen, Andy Lutomirski,
Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
Josh Poimboeuf, Linus Torvalds, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Nov 27, 2017 at 12:14:07AM +0100, Thomas Gleixner wrote:
> > static int __init pt_dump_debug_init(void)
> > {
> > + pe_knl = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
> > + &ptdump_fops);
> > + if (!pe_knl)
> > return -ENOMEM;
> >
> > + pe_curknl = debugfs_create_file("current_page_tables_knl", S_IRUSR,
> > + NULL, NULL, &ptdump_curknl_fops);
> > + if (!pe_curknl)
> > + goto err;
> > +
> > +#ifdef CONFIG_KAISER
> > + pe_curusr = debugfs_create_file("current_page_tables_usr", S_IRUSR,
> > + NULL, NULL, &ptdump_curusr_fops);
> > + if (!pe_curusr)
> > + goto err;
> > +#endif
> > return 0;
> > +err:
> > + pt_dump_debug_remove_files();
> > + return -ENOMEM;
> > }
>
>
> Could we pretty please use the octal permission thing? I can't read
> thise S_crap nonsense.
They are completely unreadable to me too. So if we added these helpers I sent a
year ago:
https://lwn.net/Articles/696231/
Then the above could be written as:
pe_curknl = debugfs_create_file("current_page_tables_knl", PERM_r________,
NULL, NULL, &ptdump_curknl_fops);
... etc., which is much more readable IMHO. Not only that, it would be trivial to
_write_ permission masks as well. I just wrote this:
PERM_rw_r__r__
Which is so much more readable than "S_IRUGO|S_IWUSR" or even "0644". The former
pattern is used 527 times in the kernel ...
The patch below adds it to the current kernel.
Thanks,
Ingo
---
Signed-off-by: Ingo Molnar <mingo@kernel.org>
include/linux/stat.h | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 22484e44544d..fc389c3a8692 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -6,6 +6,38 @@
#include <asm/stat.h>
#include <uapi/linux/stat.h>
+/*
+ * Human readable symbolic definitions for common
+ * file permissions:
+ */
+#define PERM_r________ 0400
+#define PERM_r__r_____ 0440
+#define PERM_r__r__r__ 0444
+
+#define PERM_rw_______ 0600
+#define PERM_rw_r_____ 0640
+#define PERM_rw_r__r__ 0644
+#define PERM_rw_rw_r__ 0664
+#define PERM_rw_rw_rw_ 0666
+
+#define PERM__w_______ 0200
+#define PERM__w__w____ 0220
+#define PERM__w__w__w_ 0222
+
+#define PERM_r_x______ 0500
+#define PERM_r_xr_x___ 0550
+#define PERM_r_xr_xr_x 0555
+
+#define PERM_rwx______ 0700
+#define PERM_rwxr_x___ 0750
+#define PERM_rwxr_xr_x 0755
+#define PERM_rwxrwxr_x 0775
+#define PERM_rwxrwxrwx 0777
+
+#define PERM__wx______ 0300
+#define PERM__wx_wx___ 0330
+#define PERM__wx_wx_wx 0333
+
#define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
#define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
#define S_IRUGO (S_IRUSR|S_IRGRP|S_IROTH)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 9:48 ` Peter Zijlstra
@ 2017-11-27 10:22 ` Peter Zijlstra
2017-11-27 11:50 ` Thomas Gleixner
2017-11-27 21:43 ` Peter Zijlstra
0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-27 10:22 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 10:48:46AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 27, 2017 at 12:14:08AM +0100, Thomas Gleixner wrote:
> > KAISER comes with overhead. The most expensive part is the CR3 switching in
> > the entry code.
> >
> > Add a command line parameter which allows to disable KAISER at boot time.
> >
> > Most code pathes simply check a variable, but the entry code uses a static
> > branch. The other code pathes cannot use a static branch because they are
> > used before jump label patching is possible. Not an issue as the code
> > pathes are not so performance sensitive as the entry/exit code.
> >
> > This makes KAISER depend on JUMP_LABEL and on a GCC which supports
> > it, but that's a resonable requirement.
> >
> > The PGD allocation is still 8k when CONFIG_KAISER is enabled. This can be
> > addressed on top of this.
>
> So in patch 15 Andy notes that we should probably also disable the
> SYSCALL trampoline when we disable KAISER.
>
> https://lkml.kernel.org/r/20171124172411.19476-16-mingo@kernel.org
Could be a simple as this.. but I've not tested.
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f4f4ab8525bd..1be393a97421 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1442,7 +1442,10 @@ void syscall_init(void)
(entry_SYSCALL_64_trampoline - _entry_trampoline);
wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
- wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ if (kaiser_enabled)
+ wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ else
+ wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
#ifdef CONFIG_IA32_EMULATION
wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-27 9:57 ` Peter Zijlstra
@ 2017-11-27 11:47 ` Thomas Gleixner
2017-11-27 12:31 ` Brian Gerst
0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 11:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, 27 Nov 2017, Peter Zijlstra wrote:
> On Mon, Nov 27, 2017 at 12:14:04AM +0100, Thomas Gleixner wrote:
> > PAGE_NX and PAGE_GLOBAL might be not supported or disabled on the command
> > line, but KAISER sets them unconditionally.
>
> So KAISER is x86_64 only, right? AFAIK there is no x86_64 without NX
> support. So would it not make sense to mandate NX for KAISER?, that is
> instead of making "noexec" + KAISER work, make "noexec" kill KAISER +
> emit a warning.
OTOH, disabling NX is a simple way to verify that DEBUG_WX works correctly
also on the shadow maps.
But surely we can drop the PAGE_GLOBAL thing, as all 64bit systems have it.
Thanks,
tglx
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 2/5] x86/kaiser: Simplify disabling of global pages
2017-11-26 23:14 ` [patch V2 2/5] x86/kaiser: Simplify disabling of global pages Thomas Gleixner
@ 2017-11-27 11:49 ` Thomas Gleixner
2017-11-27 18:15 ` Dave Hansen
1 sibling, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 11:49 UTC (permalink / raw)
To: LKML
Cc: Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Peter Zijlstra, Rik van Riel, daniel.gruss,
hughd, keescook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
On Mon, 27 Nov 2017, Thomas Gleixner wrote:
> /*
> @@ -179,11 +186,11 @@ static void __init probe_page_size_mask(
> cr4_set_bits_and_update_boot(X86_CR4_PSE);
>
> /* Enable PGE if available */
> + __supported_pte_mask |= _PAGE_GLOBAL;
Bah. Late night reject fixup wreckage. That wants to be
__supported_pte_mask &= ~_PAGE_GLOBAL;
Thanks,
tglx
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 10:22 ` Peter Zijlstra
@ 2017-11-27 11:50 ` Thomas Gleixner
2017-11-27 12:49 ` Peter Zijlstra
2017-11-27 21:43 ` Peter Zijlstra
1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 11:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, 27 Nov 2017, Peter Zijlstra wrote:
> On Mon, Nov 27, 2017 at 10:48:46AM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 27, 2017 at 12:14:08AM +0100, Thomas Gleixner wrote:
> > > KAISER comes with overhead. The most expensive part is the CR3 switching in
> > > the entry code.
> > >
> > > Add a command line parameter which allows to disable KAISER at boot time.
> > >
> > > Most code pathes simply check a variable, but the entry code uses a static
> > > branch. The other code pathes cannot use a static branch because they are
> > > used before jump label patching is possible. Not an issue as the code
> > > pathes are not so performance sensitive as the entry/exit code.
> > >
> > > This makes KAISER depend on JUMP_LABEL and on a GCC which supports
> > > it, but that's a resonable requirement.
> > >
> > > The PGD allocation is still 8k when CONFIG_KAISER is enabled. This can be
> > > addressed on top of this.
> >
> > So in patch 15 Andy notes that we should probably also disable the
> > SYSCALL trampoline when we disable KAISER.
> >
> > https://lkml.kernel.org/r/20171124172411.19476-16-mingo@kernel.org
>
> Could be a simple as this.. but I've not tested.
That's only one part of it. I think we need to fiddle with the exit side as
well.
Thanks,
tglx
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f4f4ab8525bd..1be393a97421 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1442,7 +1442,10 @@ void syscall_init(void)
> (entry_SYSCALL_64_trampoline - _entry_trampoline);
>
> wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
> - wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
> + if (kaiser_enabled)
> + wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
> + else
> + wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
>
> #ifdef CONFIG_IA32_EMULATION
> wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-27 11:47 ` Thomas Gleixner
@ 2017-11-27 12:31 ` Brian Gerst
2017-11-27 13:18 ` Thomas Gleixner
0 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2017-11-27 12:31 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Zijlstra, LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar,
Borislav Petkov, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, Daniel Gruss, Hugh Dickins,
Kees Cook, Linux-MM, michael.schwarz, moritz.lipp,
richard.fellner
On Mon, Nov 27, 2017 at 6:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 27 Nov 2017, Peter Zijlstra wrote:
>> On Mon, Nov 27, 2017 at 12:14:04AM +0100, Thomas Gleixner wrote:
>> > PAGE_NX and PAGE_GLOBAL might be not supported or disabled on the command
>> > line, but KAISER sets them unconditionally.
>>
>> So KAISER is x86_64 only, right? AFAIK there is no x86_64 without NX
>> support. So would it not make sense to mandate NX for KAISER?, that is
>> instead of making "noexec" + KAISER work, make "noexec" kill KAISER +
>> emit a warning.
>
> OTOH, disabling NX is a simple way to verify that DEBUG_WX works correctly
> also on the shadow maps.
>
> But surely we can drop the PAGE_GLOBAL thing, as all 64bit systems have it.
I seem to recall that some virtualized environments (maybe Xen?) don't
support global pages.
--
Brian Gerst
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 11:50 ` Thomas Gleixner
@ 2017-11-27 12:49 ` Peter Zijlstra
0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-27 12:49 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 12:50:45PM +0100, Thomas Gleixner wrote:
> On Mon, 27 Nov 2017, Peter Zijlstra wrote:
> > On Mon, Nov 27, 2017 at 10:48:46AM +0100, Peter Zijlstra wrote:
> > > So in patch 15 Andy notes that we should probably also disable the
> > > SYSCALL trampoline when we disable KAISER.
> > >
> > > https://lkml.kernel.org/r/20171124172411.19476-16-mingo@kernel.org
> >
> > Could be a simple as this.. but I've not tested.
>
> That's only one part of it. I think we need to fiddle with the exit side as
> well.
So I assumed that the patches were bisectable. From that I figured the
exit path (patch 14 in that set) would work either way.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-27 12:31 ` Brian Gerst
@ 2017-11-27 13:18 ` Thomas Gleixner
0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 13:18 UTC (permalink / raw)
To: Brian Gerst
Cc: Peter Zijlstra, LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar,
Borislav Petkov, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, Daniel Gruss, Hugh Dickins,
Kees Cook, Linux-MM, michael.schwarz, moritz.lipp,
richard.fellner
On Mon, 27 Nov 2017, Brian Gerst wrote:
> On Mon, Nov 27, 2017 at 6:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 27 Nov 2017, Peter Zijlstra wrote:
> >> On Mon, Nov 27, 2017 at 12:14:04AM +0100, Thomas Gleixner wrote:
> >> > PAGE_NX and PAGE_GLOBAL might be not supported or disabled on the command
> >> > line, but KAISER sets them unconditionally.
> >>
> >> So KAISER is x86_64 only, right? AFAIK there is no x86_64 without NX
> >> support. So would it not make sense to mandate NX for KAISER?, that is
> >> instead of making "noexec" + KAISER work, make "noexec" kill KAISER +
> >> emit a warning.
> >
> > OTOH, disabling NX is a simple way to verify that DEBUG_WX works correctly
> > also on the shadow maps.
> >
> > But surely we can drop the PAGE_GLOBAL thing, as all 64bit systems have it.
>
> I seem to recall that some virtualized environments (maybe Xen?) don't
> support global pages.
Uuurg. Ok, we leave it as is for now. Better safe than sorry. It does no
harm.
Thanks,
tglx
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-26 23:14 ` [patch V2 1/5] x86/kaiser: Respect disabled CPU features Thomas Gleixner
2017-11-27 9:57 ` Peter Zijlstra
@ 2017-11-27 18:11 ` Dave Hansen
2017-11-27 18:37 ` Kees Cook
1 sibling, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2017-11-27 18:11 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -222,7 +222,8 @@ static inline pgd_t kaiser_set_shadow_pg
> * wrong CR3 value, userspace will crash
> * instead of running.
> */
> - pgd.pgd |= _PAGE_NX;
> + if (__supported_pte_mask & _PAGE_NX)
> + pgd.pgd |= _PAGE_NX;
> }
Thanks for catching that. It's definitely a bug. Although,
practically, it's hard to hit, right? I think everything 64-bit
supports NX unless the hypervisor disabled it or something.
> } else if (pgd_userspace_access(*pgdp)) {
> /*
> --- a/arch/x86/mm/kaiser.c
> +++ b/arch/x86/mm/kaiser.c
> @@ -42,6 +42,8 @@
>
> #define KAISER_WALK_ATOMIC 0x1
>
> +static pteval_t kaiser_pte_mask __ro_after_init = ~(_PAGE_NX | _PAGE_GLOBAL);
Do we need a comment on this, like:
/*
* The NX and GLOBAL bits are not supported on all CPUs.
* We will add them back to this mask at runtime in
* kaiser_init_all_pgds() if supported.
*/
> /*
> * At runtime, the only things we map are some things for CPU
> * hotplug, and stacks for new processes. No two CPUs will ever
> @@ -244,11 +246,14 @@ static pte_t *kaiser_shadow_pagetable_wa
> int kaiser_add_user_map(const void *__start_addr, unsigned long size,
> unsigned long flags)
> {
> - pte_t *pte;
> unsigned long start_addr = (unsigned long)__start_addr;
> unsigned long address = start_addr & PAGE_MASK;
> unsigned long end_addr = PAGE_ALIGN(start_addr + size);
> unsigned long target_address;
> + pte_t *pte;
> +
> + /* Clear not supported bits */
> + flags &= kaiser_pte_mask;
Should we be warning on these if we clear them? Seems kinda funky to
silently fix them up.
> for (; address < end_addr; address += PAGE_SIZE) {
> target_address = get_pa_from_kernel_map(address);
> @@ -308,6 +313,11 @@ static void __init kaiser_init_all_pgds(
> pgd_t *pgd;
> int i;
>
> + if (__supported_pte_mask & _PAGE_NX)
> + kaiser_pte_mask |= _PAGE_NX;
> + if (boot_cpu_has(X86_FEATURE_PGE))
> + kaiser_pte_mask |= _PAGE_GLOBAL;
Practically, I guess boot_cpu_has(X86_FEATURE_PGE) == (cr4_read() &
X86_CR4_PGE). But, in a slow path like this, is it perhaps better to
just be checking CR4 directly?
Looks functionally fine to me, though. Feel free to add my Reviewed-by
or Acked-by.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 2/5] x86/kaiser: Simplify disabling of global pages
2017-11-26 23:14 ` [patch V2 2/5] x86/kaiser: Simplify disabling of global pages Thomas Gleixner
2017-11-27 11:49 ` Thomas Gleixner
@ 2017-11-27 18:15 ` Dave Hansen
2017-11-27 20:28 ` Thomas Gleixner
1 sibling, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2017-11-27 18:15 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> +static void enable_global_pages(void)
> +{
> +#ifndef CONFIG_KAISER
> + __supported_pte_mask |= _PAGE_GLOBAL;
> +#endif
> +}
> +
> static void __init probe_page_size_mask(void)
> {
> /*
> @@ -179,11 +186,11 @@ static void __init probe_page_size_mask(
> cr4_set_bits_and_update_boot(X86_CR4_PSE);
>
> /* Enable PGE if available */
> + __supported_pte_mask |= _PAGE_GLOBAL;
> if (boot_cpu_has(X86_FEATURE_PGE)) {
> cr4_set_bits_and_update_boot(X86_CR4_PGE);
> - __supported_pte_mask |= _PAGE_GLOBAL;
> - } else
> - __supported_pte_mask &= ~_PAGE_GLOBAL;
> + enable_global_pages();
> + }
This looks a little funky. Doesn't this or _PAGE_GLOBAL into
__supported_pte_mask twice? Once before the if(), and then a second
time via enable_global_pages() inside the if?
Did you intend for it to be masked *out* in the first one and then or'd
back in via enable_global_pages()?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 3/5] x86/dump_pagetables: Check KAISER shadow page table for WX pages
2017-11-26 23:14 ` [patch V2 3/5] x86/dump_pagetables: Check KAISER shadow page table for WX pages Thomas Gleixner
@ 2017-11-27 18:17 ` Dave Hansen
0 siblings, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2017-11-27 18:17 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> +void ptdump_walk_shadow_pgd_level_checkwx(void)
> +{
> +#ifdef CONFIG_KAISER
> + pgd_t *pgd = (pgd_t *) &init_top_pgt;
> +
> + pr_info("x86/mm: Checking shadow page tables\n");
> + pgd += PTRS_PER_PGD;
> + ptdump_walk_pgd_level_core(NULL, pgd, true, false);
> +#endif
> }
We have the kernel_to_shadow_pgdp() function to use instead of "pgd +=
PTRS_PER_PGD;". Should it be used instead?
Otherwise, looks good to me.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables
2017-11-26 23:14 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
2017-11-27 9:41 ` Peter Zijlstra
@ 2017-11-27 18:18 ` Dave Hansen
1 sibling, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2017-11-27 18:18 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> -void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
> +void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool shadow)
> {
> + if (shadow)
> + pgd += PTRS_PER_PGD;
> ptdump_walk_pgd_level_core(m, pgd, false, false);
> }
Same comment about calculating the shadow pgd.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-26 23:14 ` [patch V2 5/5] x86/kaiser: Add boottime disable switch Thomas Gleixner
2017-11-27 9:48 ` Peter Zijlstra
@ 2017-11-27 18:22 ` Dave Hansen
2017-11-27 19:00 ` Thomas Gleixner
1 sibling, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2017-11-27 18:22 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -56,7 +56,7 @@ config SECURITY_NETWORK
>
> config KAISER
> bool "Remove the kernel mapping in user mode"
> - depends on X86_64 && SMP && !PARAVIRT
> + depends on X86_64 && SMP && !PARAVIRT && JUMP_LABEL
> help
> This feature reduces the number of hardware side channels by
> ensuring that the majority of kernel addresses are not mapped
One of the reasons for doing the runtime-disable was to get rid of the
!PARAVIRT dependency. I can add a follow-on here that will act as if we
did "nokaiser" whenever Xen is in play so we can remove this dependency.
I just hope Xen is detectable early enough to do the static patching.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 1/5] x86/kaiser: Respect disabled CPU features
2017-11-27 18:11 ` Dave Hansen
@ 2017-11-27 18:37 ` Kees Cook
0 siblings, 0 replies; 32+ messages in thread
From: Kees Cook @ 2017-11-27 18:37 UTC (permalink / raw)
To: Dave Hansen
Cc: Thomas Gleixner, LKML, Andy Lutomirski, Ingo Molnar,
Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, Rik van Riel,
Daniel Gruss, Hugh Dickins, Linux-MM, michael.schwarz,
moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 10:11 AM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>> --- a/arch/x86/include/asm/pgtable_64.h
>> +++ b/arch/x86/include/asm/pgtable_64.h
>> @@ -222,7 +222,8 @@ static inline pgd_t kaiser_set_shadow_pg
>> * wrong CR3 value, userspace will crash
>> * instead of running.
>> */
>> - pgd.pgd |= _PAGE_NX;
>> + if (__supported_pte_mask & _PAGE_NX)
>> + pgd.pgd |= _PAGE_NX;
>> }
>
> Thanks for catching that. It's definitely a bug. Although,
> practically, it's hard to hit, right? I think everything 64-bit
> supports NX unless the hypervisor disabled it or something.
There was a very narrow window where x86_64 machines were made without
NX. :( This is reflected in x86_report_nx(), though maybe we should
add a "OMG, why?" when 64-bit but no NX. ;)
-Kees
--
Kees Cook
Pixel Security
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 18:22 ` Dave Hansen
@ 2017-11-27 19:00 ` Thomas Gleixner
2017-11-27 19:18 ` Josh Poimboeuf
0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 19:00 UTC (permalink / raw)
To: Dave Hansen
Cc: LKML, Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, 27 Nov 2017, Dave Hansen wrote:
> On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -56,7 +56,7 @@ config SECURITY_NETWORK
> >
> > config KAISER
> > bool "Remove the kernel mapping in user mode"
> > - depends on X86_64 && SMP && !PARAVIRT
> > + depends on X86_64 && SMP && !PARAVIRT && JUMP_LABEL
> > help
> > This feature reduces the number of hardware side channels by
> > ensuring that the majority of kernel addresses are not mapped
>
> One of the reasons for doing the runtime-disable was to get rid of the
> !PARAVIRT dependency. I can add a follow-on here that will act as if we
> did "nokaiser" whenever Xen is in play so we can remove this dependency.
>
> I just hope Xen is detectable early enough to do the static patching.
Yes, it is. I'm currently trying to figure out why it fails on a KVM guest.
If I boot with 'nokaiser' on the command line it works. If kaiser is
runtime enabled then some early klibc user space in the ramdisk
explodes. Not sure yet whats going on.
Thanks,
tglx
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 19:00 ` Thomas Gleixner
@ 2017-11-27 19:18 ` Josh Poimboeuf
2017-11-27 20:47 ` Thomas Gleixner
0 siblings, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2017-11-27 19:18 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Dave Hansen, LKML, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 08:00:19PM +0100, Thomas Gleixner wrote:
> On Mon, 27 Nov 2017, Dave Hansen wrote:
>
> > On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> > > --- a/security/Kconfig
> > > +++ b/security/Kconfig
> > > @@ -56,7 +56,7 @@ config SECURITY_NETWORK
> > >
> > > config KAISER
> > > bool "Remove the kernel mapping in user mode"
> > > - depends on X86_64 && SMP && !PARAVIRT
> > > + depends on X86_64 && SMP && !PARAVIRT && JUMP_LABEL
> > > help
> > > This feature reduces the number of hardware side channels by
> > > ensuring that the majority of kernel addresses are not mapped
> >
> > One of the reasons for doing the runtime-disable was to get rid of the
> > !PARAVIRT dependency. I can add a follow-on here that will act as if we
> > did "nokaiser" whenever Xen is in play so we can remove this dependency.
> >
> > I just hope Xen is detectable early enough to do the static patching.
>
> Yes, it is. I'm currently trying to figure out why it fails on a KVM guest.
>
> If I boot with 'nokaiser' on the command line it works. If kaiser is
> runtime enabled then some early klibc user space in the ramdisk
> explodes. Not sure yet whats going on.
I'm also seeing weirdness with PARAVIRT+KAISER on kvm. The symptoms
aren't consistent. Sometimes it boots, sometimes it hangs before the
login prompt, sometimes there are user space seg faults.
It almost seems like the interrupt handler is corrupting user space
state somehow.
This is with tip/WIP.x86/mm plus a patch to remove the KAISER dependency
on !PARAVIRT.
--
Josh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions
2017-11-27 10:06 ` [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions Ingo Molnar
@ 2017-11-27 19:21 ` Linus Torvalds
2017-11-28 10:54 ` Ingo Molnar
2017-11-28 11:12 ` Ingo Molnar
0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2017-11-27 19:21 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Thomas Gleixner, LKML, Dave Hansen,
Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Josh Poimboeuf, Rik van Riel, Daniel Gruss,
Hugh Dickins, Kees Cook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
On Mon, Nov 27, 2017 at 2:06 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> +/*
> + * Human readable symbolic definitions for common
> + * file permissions:
> + */
> +#define PERM_r________ 0400
> +#define PERM_r__r_____ 0440
> +#define PERM_r__r__r__ 0444
I'm not a fan. Particularly as you have a very random set of
permissions (rx and wx? Not very common), but also because it's just
not that legible.
I've argued several times that we shouldn't use the defines at all.
The octal format isn't any less legible than any #define I've ever
seen, and is generally _more_ legible.
What's wrong with just using 0400 for "read by user"?
Linus
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 2/5] x86/kaiser: Simplify disabling of global pages
2017-11-27 18:15 ` Dave Hansen
@ 2017-11-27 20:28 ` Thomas Gleixner
0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 20:28 UTC (permalink / raw)
To: Dave Hansen
Cc: LKML, Andy Lutomirski, Ingo Molnar, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, 27 Nov 2017, Dave Hansen wrote:
> On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> > +static void enable_global_pages(void)
> > +{
> > +#ifndef CONFIG_KAISER
> > + __supported_pte_mask |= _PAGE_GLOBAL;
> > +#endif
> > +}
> > +
> > static void __init probe_page_size_mask(void)
> > {
> > /*
> > @@ -179,11 +186,11 @@ static void __init probe_page_size_mask(
> > cr4_set_bits_and_update_boot(X86_CR4_PSE);
> >
> > /* Enable PGE if available */
> > + __supported_pte_mask |= _PAGE_GLOBAL;
> > if (boot_cpu_has(X86_FEATURE_PGE)) {
> > cr4_set_bits_and_update_boot(X86_CR4_PGE);
> > - __supported_pte_mask |= _PAGE_GLOBAL;
> > - } else
> > - __supported_pte_mask &= ~_PAGE_GLOBAL;
> > + enable_global_pages();
> > + }
>
> This looks a little funky. Doesn't this or _PAGE_GLOBAL into
> __supported_pte_mask twice? Once before the if(), and then a second
> time via enable_global_pages() inside the if?
>
> Did you intend for it to be masked *out* in the first one and then or'd
> back in via enable_global_pages()?
It's fixed already ...
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 19:18 ` Josh Poimboeuf
@ 2017-11-27 20:47 ` Thomas Gleixner
0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-11-27 20:47 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Dave Hansen, LKML, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Linus Torvalds,
Peter Zijlstra, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, 27 Nov 2017, Josh Poimboeuf wrote:
> On Mon, Nov 27, 2017 at 08:00:19PM +0100, Thomas Gleixner wrote:
> > On Mon, 27 Nov 2017, Dave Hansen wrote:
> >
> > > On 11/26/2017 03:14 PM, Thomas Gleixner wrote:
> > > > --- a/security/Kconfig
> > > > +++ b/security/Kconfig
> > > > @@ -56,7 +56,7 @@ config SECURITY_NETWORK
> > > >
> > > > config KAISER
> > > > bool "Remove the kernel mapping in user mode"
> > > > - depends on X86_64 && SMP && !PARAVIRT
> > > > + depends on X86_64 && SMP && !PARAVIRT && JUMP_LABEL
> > > > help
> > > > This feature reduces the number of hardware side channels by
> > > > ensuring that the majority of kernel addresses are not mapped
> > >
> > > One of the reasons for doing the runtime-disable was to get rid of the
> > > !PARAVIRT dependency. I can add a follow-on here that will act as if we
> > > did "nokaiser" whenever Xen is in play so we can remove this dependency.
> > >
> > > I just hope Xen is detectable early enough to do the static patching.
> >
> > Yes, it is. I'm currently trying to figure out why it fails on a KVM guest.
> >
> > If I boot with 'nokaiser' on the command line it works. If kaiser is
> > runtime enabled then some early klibc user space in the ramdisk
> > explodes. Not sure yet whats going on.
>
> I'm also seeing weirdness with PARAVIRT+KAISER on kvm. The symptoms
> aren't consistent. Sometimes it boots, sometimes it hangs before the
> login prompt, sometimes there are user space seg faults.
>
> It almost seems like the interrupt handler is corrupting user space
> state somehow.
>
> This is with tip/WIP.x86/mm plus a patch to remove the KAISER dependency
> on !PARAVIRT.
See the patches I posted. Its the PV patching of flush_tlb_single() ...
Thanks,
tglx
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [patch V2 5/5] x86/kaiser: Add boottime disable switch
2017-11-27 10:22 ` Peter Zijlstra
2017-11-27 11:50 ` Thomas Gleixner
@ 2017-11-27 21:43 ` Peter Zijlstra
1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-27 21:43 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Dave Hansen, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
Linus Torvalds, Rik van Riel, daniel.gruss, hughd, keescook,
linux-mm, michael.schwarz, moritz.lipp, richard.fellner
On Mon, Nov 27, 2017 at 11:22:41AM +0100, Peter Zijlstra wrote:
> Could be a simple as this.. but I've not tested.
>
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f4f4ab8525bd..1be393a97421 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1442,7 +1442,10 @@ void syscall_init(void)
> (entry_SYSCALL_64_trampoline - _entry_trampoline);
>
> wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
> - wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
> + if (kaiser_enabled)
> + wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
> + else
> + wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
>
> #ifdef CONFIG_IA32_EMULATION
> wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
Seems to work:
root@ivb-ep:~# rdmsr --all 0xc0000082 | uniq
ffffffff81beb780
root@ivb-ep:~# grep ffffffff81beb780 /proc/kallsyms
ffffffff81beb780 T entry_SYSCALL_64
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions
2017-11-27 19:21 ` Linus Torvalds
@ 2017-11-28 10:54 ` Ingo Molnar
2017-11-28 11:12 ` Ingo Molnar
1 sibling, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2017-11-28 10:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Thomas Gleixner, LKML, Dave Hansen,
Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Josh Poimboeuf, Rik van Riel, Daniel Gruss,
Hugh Dickins, Kees Cook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 2:06 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > +/*
> > + * Human readable symbolic definitions for common
> > + * file permissions:
> > + */
> > +#define PERM_r________ 0400
> > +#define PERM_r__r_____ 0440
> > +#define PERM_r__r__r__ 0444
>
> I'm not a fan. Particularly as you have a very random set of
> permissions (rx and wx? Not very common), but also because it's just
> not that legible.
>
> I've argued several times that we shouldn't use the defines at all.
> The octal format isn't any less legible than any #define I've ever
> seen, and is generally _more_ legible.
>
> What's wrong with just using 0400 for "read by user"?
Yeah, the octal format isn't all that bad - at least relative to the symbolic
obfuscation defines.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions
2017-11-27 19:21 ` Linus Torvalds
2017-11-28 10:54 ` Ingo Molnar
@ 2017-11-28 11:12 ` Ingo Molnar
2017-11-29 8:52 ` Michael Ellerman
1 sibling, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2017-11-28 11:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Thomas Gleixner, LKML, Dave Hansen,
Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Josh Poimboeuf, Rik van Riel, Daniel Gruss,
Hugh Dickins, Kees Cook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 2:06 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > +/*
> > + * Human readable symbolic definitions for common
> > + * file permissions:
> > + */
> > +#define PERM_r________ 0400
> > +#define PERM_r__r_____ 0440
> > +#define PERM_r__r__r__ 0444
>
> I'm not a fan. Particularly as you have a very random set of
> permissions (rx and wx? Not very common),
So I originally created those defines based on a grep of patterns used in the
kernel, and added the 'wx' variants for completeness.
We would only need a small subset. Here's a git grep based histogram of octal file
permission masks used in the kernel source:
# mode
21 0200
8 0220
14 0222
33 0400
11 0440
219 0444
91 0555
39 0600
906 0644
12 0660
12 0664
18 0666
14 0755
31 0777
So there's literally only 14 variants used, and 0644 and 0444 make up 95% of the
cases. We get the patch below if we extend these existing patterns using their
natural (looking) generators to a complete group - 19 patterns that should cover
all the sensible combinations.
> but also because it's just not that legible.
Fair enough.
Thanks,
Ingo
---
include/linux/stat.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
Index: tip/include/linux/stat.h
===================================================================
--- tip.orig/include/linux/stat.h
+++ tip/include/linux/stat.h
@@ -6,6 +6,34 @@
#include <asm/stat.h>
#include <uapi/linux/stat.h>
+/*
+ * Human readable symbolic definitions for common
+ * file permissions:
+ */
+#define PERM_r________ 0400
+#define PERM_r__r_____ 0440
+#define PERM_r__r__r__ 0444
+
+#define PERM_rw_______ 0600
+#define PERM_rw_r_____ 0640
+#define PERM_rw_r__r__ 0644
+#define PERM_rw_rw_r__ 0664
+#define PERM_rw_rw_rw_ 0666
+
+#define PERM__w_______ 0200
+#define PERM__w__w____ 0220
+#define PERM__w__w__w_ 0222
+
+#define PERM_r_x______ 0500
+#define PERM_r_xr_x___ 0550
+#define PERM_r_xr_xr_x 0555
+
+#define PERM_rwx______ 0700
+#define PERM_rwxr_x___ 0750
+#define PERM_rwxr_xr_x 0755
+#define PERM_rwxrwxr_x 0775
+#define PERM_rwxrwxrwx 0777
+
#define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
#define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
#define S_IRUGO (S_IRUSR|S_IRGRP|S_IROTH)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions
2017-11-28 11:12 ` Ingo Molnar
@ 2017-11-29 8:52 ` Michael Ellerman
0 siblings, 0 replies; 32+ messages in thread
From: Michael Ellerman @ 2017-11-29 8:52 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds
Cc: Peter Zijlstra, Thomas Gleixner, LKML, Dave Hansen,
Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Josh Poimboeuf, Rik van Riel, Daniel Gruss,
Hugh Dickins, Kees Cook, linux-mm, michael.schwarz, moritz.lipp,
richard.fellner
Ingo Molnar <mingo@kernel.org> writes:
...
> Index: tip/include/linux/stat.h
> ===================================================================
> --- tip.orig/include/linux/stat.h
> +++ tip/include/linux/stat.h
> @@ -6,6 +6,34 @@
> #include <asm/stat.h>
> #include <uapi/linux/stat.h>
>
> +/*
> + * Human readable symbolic definitions for common
> + * file permissions:
> + */
> +#define PERM_r________ 0400
> +#define PERM_r__r_____ 0440
> +#define PERM_r__r__r__ 0444
> +
> +#define PERM_rw_______ 0600
> +#define PERM_rw_r_____ 0640
> +#define PERM_rw_r__r__ 0644
> +#define PERM_rw_rw_r__ 0664
> +#define PERM_rw_rw_rw_ 0666
> +
> +#define PERM__w_______ 0200
> +#define PERM__w__w____ 0220
> +#define PERM__w__w__w_ 0222
> +
> +#define PERM_r_x______ 0500
> +#define PERM_r_xr_x___ 0550
> +#define PERM_r_xr_xr_x 0555
> +
> +#define PERM_rwx______ 0700
> +#define PERM_rwxr_x___ 0750
> +#define PERM_rwxr_xr_x 0755
> +#define PERM_rwxrwxr_x 0775
> +#define PERM_rwxrwxrwx 0777
I see what you're trying to do with all the explicit underscores, but it
does make them look kinda ugly.
What if you just used underscores to separate the user/group/other, and
the unset permission bits are just omitted.
Then the two most common cases would be:
PERM_rw_r_r
PERM_r_r_r
Both of those read nicely I think. ie. the first is "perm read write,
read, read".
Full set would be:
#define PERM_r 0400
#define PERM_r_r 0440
#define PERM_r_r_r 0444
#define PERM_rw 0600
#define PERM_rw_r 0640
#define PERM_rw_r_r 0644
#define PERM_rw_rw_r 0664
#define PERM_rw_rw_rw 0666
#define PERM_w 0200
#define PERM_w_w 0220
#define PERM_w_w_w 0222
#define PERM_rx 0500
#define PERM_rx_rx 0550
#define PERM_rx_rx_rx 0555
#define PERM_rwx 0700
#define PERM_rwx_rx 0750
#define PERM_rwx_rx_rx 0755
#define PERM_rwx_rwx_rx 0775
#define PERM_rwx_rwx_rwx 0777
cheers
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2017-11-29 8:52 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-26 23:14 [patch V2 0/5] x86/kaiser: Boot time disabling and debug support Thomas Gleixner
2017-11-26 23:14 ` [patch V2 1/5] x86/kaiser: Respect disabled CPU features Thomas Gleixner
2017-11-27 9:57 ` Peter Zijlstra
2017-11-27 11:47 ` Thomas Gleixner
2017-11-27 12:31 ` Brian Gerst
2017-11-27 13:18 ` Thomas Gleixner
2017-11-27 18:11 ` Dave Hansen
2017-11-27 18:37 ` Kees Cook
2017-11-26 23:14 ` [patch V2 2/5] x86/kaiser: Simplify disabling of global pages Thomas Gleixner
2017-11-27 11:49 ` Thomas Gleixner
2017-11-27 18:15 ` Dave Hansen
2017-11-27 20:28 ` Thomas Gleixner
2017-11-26 23:14 ` [patch V2 3/5] x86/dump_pagetables: Check KAISER shadow page table for WX pages Thomas Gleixner
2017-11-27 18:17 ` Dave Hansen
2017-11-26 23:14 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
2017-11-27 9:41 ` Peter Zijlstra
2017-11-27 10:06 ` [PATCH] vfs: Add PERM_* symbolic helpers for common file mode/permissions Ingo Molnar
2017-11-27 19:21 ` Linus Torvalds
2017-11-28 10:54 ` Ingo Molnar
2017-11-28 11:12 ` Ingo Molnar
2017-11-29 8:52 ` Michael Ellerman
2017-11-27 18:18 ` [patch V2 4/5] x86/mm/debug_pagetables: Allow dumping current pagetables Dave Hansen
2017-11-26 23:14 ` [patch V2 5/5] x86/kaiser: Add boottime disable switch Thomas Gleixner
2017-11-27 9:48 ` Peter Zijlstra
2017-11-27 10:22 ` Peter Zijlstra
2017-11-27 11:50 ` Thomas Gleixner
2017-11-27 12:49 ` Peter Zijlstra
2017-11-27 21:43 ` Peter Zijlstra
2017-11-27 18:22 ` Dave Hansen
2017-11-27 19:00 ` Thomas Gleixner
2017-11-27 19:18 ` Josh Poimboeuf
2017-11-27 20:47 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).