linux-mm.kvack.org archive mirror
* [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
@ 2017-05-25 20:33 Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 1/8] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
                   ` (8 more replies)
  0 siblings, 9 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

Here's my first attempt to bring boot-time switching between 4- and 5-level
paging. It doesn't look too terrible to me. I expected it to be worse.

The basic idea is to implement the same logic as pgtable-nop4d.h provides,
but at runtime.

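To make that concrete, here is a simplified sketch (illustration only, not
the actual helpers from this series). pgtable-nop4d.h folds the level at
compile time:

	/* compile-time folding: the p4d level aliases the pgd entry */
	static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
	{
		return (p4d_t *)pgd;
	}

With runtime folding, the same aliasing is decided by a boot-time flag:

	/* runtime folding, as done in patch 5/8 */
	static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
	{
		if (p4d_folded)
			return (p4d_t *)pgd;
		return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);
	}
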
Runtime folding is only implemented for the CONFIG_X86_5LEVEL=y case. With
the option disabled, we do compile-time folding.

Initially, I tried to fold the pgd instead. I got to a shell, but it
required a lot of hacks, as the kernel treats the pgd in a special way.

A few things are broken (see patch 7/8) and many things are not yet tested,
so more work is required.

I also haven't evaluated performance impact. We can look into some form of
boot-time code patching later if required.

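One candidate for such patching, as a suggestion rather than anything in
this series, is the existing cpufeature alternatives machinery. A
hypothetical sketch, where use_4level()/use_5level() are placeholders:

	/* static_cpu_has() compiles to a jump that is patched once CPU
	 * features are known, so the check costs no runtime branch. */
	static void choose_paging_path(void)
	{
		if (static_cpu_has(X86_FEATURE_LA57))
			use_5level();
		else
			use_4level();
	}
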
Please review. Any feedback is welcome.

Kirill A. Shutemov (8):
  x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  x86/mm: Make virtual memory layout movable for CONFIG_X86_5LEVEL
  x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable
  x86/mm: Handle boot-time paging mode switching at early boot
  x86/mm: Fold p4d page table layer at runtime
  x86/mm: Replace compile-time checks for 5-level with runtime checks
  x86/mm: Hacks for boot-time switching between 4- and 5-level paging
  x86/mm: Allow booting without la57 if CONFIG_X86_5LEVEL=y

 arch/x86/Kconfig                         |  4 +-
 arch/x86/boot/compressed/head_64.S       | 37 ++++++++++++++++++
 arch/x86/entry/entry_64.S                |  5 +++
 arch/x86/include/asm/kaslr.h             |  4 --
 arch/x86/include/asm/page_64.h           |  4 ++
 arch/x86/include/asm/page_64_types.h     | 15 +++-----
 arch/x86/include/asm/paravirt.h          |  3 +-
 arch/x86/include/asm/pgalloc.h           |  5 ++-
 arch/x86/include/asm/pgtable.h           | 10 ++++-
 arch/x86/include/asm/pgtable_32.h        |  2 +
 arch/x86/include/asm/pgtable_64_types.h  | 46 ++++++++++++++--------
 arch/x86/include/asm/processor.h         |  2 +-
 arch/x86/include/asm/required-features.h |  8 +---
 arch/x86/kernel/head64.c                 | 66 ++++++++++++++++++++++++++++----
 arch/x86/kernel/head_64.S                | 22 +++++++----
 arch/x86/mm/dump_pagetables.c            | 11 ++----
 arch/x86/mm/ident_map.c                  |  2 +-
 arch/x86/mm/init_64.c                    | 30 +++++++++------
 arch/x86/mm/kaslr.c                      | 16 ++------
 arch/x86/platform/efi/efi_64.c           |  4 +-
 arch/x86/power/hibernate_64.c            |  4 +-
 arch/x86/xen/Kconfig                     |  2 +-
 arch/x86/xen/mmu_pv.c                    |  2 +-
 23 files changed, 208 insertions(+), 96 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 1/8] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 2/8] x86/mm: Make virtual memory layout movable for CONFIG_X86_5LEVEL Kirill A. Shutemov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

This patch prepares the decompression code for boot-time switching between
4- and 5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

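For reference, the la57 check the assembly below performs, written as a
standalone userspace sketch (assumes a compiler providing <cpuid.h> with
__get_cpuid_count(); not part of the patch):

	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID.(EAX=07H, ECX=0):ECX[bit 16] enumerates la57.
		 * __get_cpuid_count() returns 0 if the maximum leaf is
		 * below 7, mirroring the "cmpl $7, %eax; jb" in the asm. */
		if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
		    (ecx & (1U << 16)))
			printf("la57 supported\n");
		else
			printf("la57 not supported\n");
		return 0;
	}
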
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 3ed26769810b..89d886c95afc 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -109,6 +109,31 @@ ENTRY(startup_32)
 	movl	$LOAD_PHYSICAL_ADDR, %ebx
 1:
 
+#ifdef CONFIG_X86_5LEVEL
+	pushl	%ebx
+
+	/* Check if leaf 7 is supported */
+	movl	$0, %eax
+	cpuid
+	cmpl	$7, %eax
+	jb	1f
+
+	/*
+	 * Check if la57 is supported.
+	 * The feature is enumerated with CPUID.(EAX=07H, ECX=0):ECX[bit 16]
+	 */
+	movl	$7, %eax
+	movl	$0, %ecx
+	cpuid
+	andl	$(1 << 16), %ecx
+	jz	1f
+
+	/* p4d page table is not folded if la57 is present */
+	movl	$0, p4d_folded(%ebp)
+1:
+	popl %ebx
+#endif
+
 	/* Target address to relocate to for decompression */
 	movl	BP_init_size(%esi), %eax
 	subl	$_end, %eax
@@ -125,9 +150,14 @@ ENTRY(startup_32)
 	/* Enable PAE and LA57 mode */
 	movl	%cr4, %eax
 	orl	$X86_CR4_PAE, %eax
+
 #ifdef CONFIG_X86_5LEVEL
+	testl	$1, p4d_folded(%ebp)
+	jnz	1f
 	orl	$X86_CR4_LA57, %eax
+1:
 #endif
+
 	movl	%eax, %cr4
 
  /*
@@ -147,11 +177,15 @@ ENTRY(startup_32)
 	movl	%eax, 0(%edi)
 
 #ifdef CONFIG_X86_5LEVEL
+	testl	$1, p4d_folded(%ebp)
+	jnz	1f
+
 	/* Build Level 4 */
 	addl	$0x1000, %edx
 	leal	pgtable(%ebx,%edx), %edi
 	leal	0x1007 (%edi), %eax
 	movl	%eax, 0(%edi)
+1:
 #endif
 
 	/* Build Level 3 */
@@ -464,6 +498,9 @@ gdt:
 	.quad   0x0000000000000000	/* TS continued */
 gdt_end:
 
+p4d_folded:
+	.word	1
+
 #ifdef CONFIG_EFI_STUB
 efi_config:
 	.quad	0
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 2/8] x86/mm: Make virtual memory layout movable for CONFIG_X86_5LEVEL
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 1/8] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 3/8] x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable Kirill A. Shutemov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

To be able to switch between 4- and 5-level paging at boot time, we need
to be able to adjust the virtual memory layout at runtime.

KASLR already has movable __VMALLOC_BASE, __VMEMMAP_BASE and __PAGE_OFFSET.
Let's re-use that machinery.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/kaslr.h            | 4 ----
 arch/x86/include/asm/page_64.h          | 4 ++++
 arch/x86/include/asm/page_64_types.h    | 2 +-
 arch/x86/include/asm/pgtable_64_types.h | 2 +-
 arch/x86/kernel/head64.c                | 9 +++++++++
 arch/x86/mm/kaslr.c                     | 8 --------
 6 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 1052a797d71d..683c9d736314 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -4,10 +4,6 @@
 unsigned long kaslr_get_random_long(const char *purpose);
 
 #ifdef CONFIG_RANDOMIZE_MEMORY
-extern unsigned long page_offset_base;
-extern unsigned long vmalloc_base;
-extern unsigned long vmemmap_base;
-
 void kernel_randomize_memory(void);
 #else
 static inline void kernel_randomize_memory(void) { }
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index b4a0d43248cf..a12fb4dcdd15 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -10,6 +10,10 @@
 extern unsigned long max_pfn;
 extern unsigned long phys_base;
 
+extern unsigned long page_offset_base;
+extern unsigned long vmalloc_base;
+extern unsigned long vmemmap_base;
+
 static inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
 	unsigned long y = x - __START_KERNEL_map;
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3f5f08b010d0..0126d6bc2eb1 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -42,7 +42,7 @@
 #define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
 #endif
 
-#ifdef CONFIG_RANDOMIZE_MEMORY
+#if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
 #define __PAGE_OFFSET           page_offset_base
 #else
 #define __PAGE_OFFSET           __PAGE_OFFSET_BASE
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 06470da156ba..a9f77ead7088 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -85,7 +85,7 @@ typedef struct { pteval_t pte; } pte_t;
 #define __VMALLOC_BASE	_AC(0xffffc90000000000, UL)
 #define __VMEMMAP_BASE	_AC(0xffffea0000000000, UL)
 #endif
-#ifdef CONFIG_RANDOMIZE_MEMORY
+#if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
 #define VMALLOC_START	vmalloc_base
 #define VMEMMAP_START	vmemmap_base
 #else
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 9403633f4c7c..408ed402db1a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -38,6 +38,15 @@ extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
+#if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
+unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+EXPORT_SYMBOL(page_offset_base);
+unsigned long vmalloc_base = __VMALLOC_BASE;
+EXPORT_SYMBOL(vmalloc_base);
+unsigned long vmemmap_base = __VMEMMAP_BASE;
+EXPORT_SYMBOL(vmemmap_base);
+#endif
+
 static void __init *fixup_pointer(void *ptr, unsigned long physaddr)
 {
 	return ptr - (void *)_text + (void *)physaddr;
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index af599167fe3c..e6420b18f6e0 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -53,14 +53,6 @@ static const unsigned long vaddr_end = EFI_VA_END;
 static const unsigned long vaddr_end = __START_KERNEL_map;
 #endif
 
-/* Default values */
-unsigned long page_offset_base = __PAGE_OFFSET_BASE;
-EXPORT_SYMBOL(page_offset_base);
-unsigned long vmalloc_base = __VMALLOC_BASE;
-EXPORT_SYMBOL(vmalloc_base);
-unsigned long vmemmap_base = __VMEMMAP_BASE;
-EXPORT_SYMBOL(vmemmap_base);
-
 /*
  * Memory regions randomized by KASLR (except modules that use a separate logic
  * earlier during boot). The list is ordered based on virtual addresses. This
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 3/8] x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 1/8] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 2/8] x86/mm: Make virtual memory layout movable for CONFIG_X86_5LEVEL Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 4/8] x86/mm: Handle boot-time paging mode switching at early boot Kirill A. Shutemov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

For boot-time switching between 4- and 5-level paging, we need to be able
to fold the p4d page table level at runtime. That requires making
PGDIR_SHIFT and PTRS_PER_P4D variable.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable_32.h       |  2 ++
 arch/x86/include/asm/pgtable_64_types.h |  7 +++++--
 arch/x86/kernel/head64.c                |  9 ++++++++-
 arch/x86/mm/dump_pagetables.c           | 11 +++--------
 arch/x86/mm/init_64.c                   |  2 +-
 arch/x86/platform/efi/efi_64.c          |  4 ++--
 6 files changed, 21 insertions(+), 14 deletions(-)

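Note: the hunks below use MAYBE_BUILD_BUG_ON(), which is not defined
anywhere in this posting. A plausible definition (my assumption, not taken
from the series) keeps the compile-time check when the condition is a
compile-time constant and falls back to a runtime check otherwise:

	#define MAYBE_BUILD_BUG_ON(cond)			\
		do {						\
			if (__builtin_constant_p(cond))		\
				BUILD_BUG_ON(cond);		\
			else					\
				BUG_ON(cond);			\
		} while (0)
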
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index bfab55675c16..9c3c811347b0 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -32,6 +32,8 @@ static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
 void paging_init(void);
 
+static inline int pgd_large(pgd_t pgd) { return 0; }
+
 /*
  * Define this if things work differently on an i386 and an i486:
  * it will (on an i486) warn about kernel memory accesses that are
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index a9f77ead7088..a09f2fa91e09 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -19,6 +19,9 @@ typedef unsigned long	pgprotval_t;
 
 typedef struct { pteval_t pte; } pte_t;
 
+extern unsigned int pgdir_shift;
+extern unsigned int ptrs_per_p4d;
+
 #endif	/* !__ASSEMBLY__ */
 
 #define SHARED_KERNEL_PMD	0
@@ -28,14 +31,14 @@ typedef struct { pteval_t pte; } pte_t;
 /*
  * PGDIR_SHIFT determines what a top-level page table entry can map
  */
-#define PGDIR_SHIFT	48
+#define PGDIR_SHIFT	pgdir_shift
 #define PTRS_PER_PGD	512
 
 /*
  * 4th level page in 5-level paging case
  */
 #define P4D_SHIFT	39
-#define PTRS_PER_P4D	512
+#define PTRS_PER_P4D	ptrs_per_p4d
 #define P4D_SIZE	(_AC(1, UL) << P4D_SHIFT)
 #define P4D_MASK	(~(P4D_SIZE - 1))
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 408ed402db1a..d4e8d4beeb62 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -38,6 +38,13 @@ extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
+#ifdef CONFIG_X86_5LEVEL
+unsigned int pgdir_shift = 48;
+EXPORT_SYMBOL(pgdir_shift);
+unsigned int ptrs_per_p4d = 512;
+EXPORT_SYMBOL(ptrs_per_p4d);
+#endif
+
 #if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
 unsigned long page_offset_base = __PAGE_OFFSET_BASE;
 EXPORT_SYMBOL(page_offset_base);
@@ -273,7 +280,7 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	BUILD_BUG_ON((__START_KERNEL_map & ~PMD_MASK) != 0);
 	BUILD_BUG_ON((MODULES_VADDR & ~PMD_MASK) != 0);
 	BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
-	BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
+	MAYBE_BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 0470826d2bdc..d7b3cf2320fd 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -380,14 +380,15 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
 #define p4d_none(a)  pud_none(__pud(p4d_val(a)))
 #endif
 
-#if PTRS_PER_P4D > 1
-
 static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr, unsigned long P)
 {
 	int i;
 	p4d_t *start;
 	pgprotval_t prot;
 
+	if (PTRS_PER_P4D == 1)
+		return walk_pud_level(m, st, __p4d(pgd_val(addr)), P);
+
 	start = (p4d_t *)pgd_page_vaddr(addr);
 
 	for (i = 0; i < PTRS_PER_P4D; i++) {
@@ -407,12 +408,6 @@ static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
 	}
 }
 
-#else
-#define walk_p4d_level(m,s,a,p) walk_pud_level(m,s,__p4d(pgd_val(a)),p)
-#define pgd_large(a) p4d_large(__p4d(pgd_val(a)))
-#define pgd_none(a)  p4d_none(__p4d(pgd_val(a)))
-#endif
-
 static inline bool is_hypervisor_range(int idx)
 {
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 124f1a77c181..d135c613bf7b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -143,7 +143,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 		 * With folded p4d, pgd_none() is always false, we need to
 		 * handle synchonization on p4d level.
 		 */
-		BUILD_BUG_ON(pgd_none(*pgd_ref));
+		MAYBE_BUILD_BUG_ON(pgd_none(*pgd_ref));
 		p4d_ref = p4d_offset(pgd_ref, addr);
 
 		if (p4d_none(*p4d_ref))
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c488625c9712..d6cfba3e164f 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -186,8 +186,8 @@ void efi_sync_low_kernel_mappings(void)
 	 * only span a single PGD entry and that the entry also maps
 	 * other important kernel regions.
 	 */
-	BUILD_BUG_ON(pgd_index(EFI_VA_END) != pgd_index(MODULES_END));
-	BUILD_BUG_ON((EFI_VA_START & PGDIR_MASK) !=
+	MAYBE_BUILD_BUG_ON(pgd_index(EFI_VA_END) != pgd_index(MODULES_END));
+	MAYBE_BUILD_BUG_ON((EFI_VA_START & PGDIR_MASK) !=
 			(EFI_VA_END & PGDIR_MASK));
 
 	pgd_efi = efi_pgd + pgd_index(PAGE_OFFSET);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 4/8] x86/mm: Handle boot-time paging mode switching at early boot
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2017-05-25 20:33 ` [PATCHv1, RFC 3/8] x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime Kirill A. Shutemov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

This patch adds boot-time detection of 5-level paging, adjusts the virtual
memory layout accordingly, and folds the p4d page table level if needed.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/page_64_types.h    | 13 +++----
 arch/x86/include/asm/pgtable_64_types.h | 37 +++++++++++++-------
 arch/x86/include/asm/processor.h        |  2 +-
 arch/x86/kernel/head64.c                | 62 +++++++++++++++++++++++++--------
 arch/x86/kernel/head_64.S               | 16 +++++----
 arch/x86/mm/kaslr.c                     |  2 +-
 6 files changed, 90 insertions(+), 42 deletions(-)

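(Background on the fixup_*() helpers added below: __startup_64() runs at
the physical address the kernel was loaded at, before the final kernel
page tables are set up, so the link-time addresses of global variables
cannot be used directly. "ptr - _text + physaddr" rebases a global's
link-time address onto the physical location the image actually occupies.)
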
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 0126d6bc2eb1..26056ef366b8 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -36,24 +36,21 @@
  * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
  * what Xen requires.
  */
-#ifdef CONFIG_X86_5LEVEL
-#define __PAGE_OFFSET_BASE      _AC(0xff10000000000000, UL)
-#else
-#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
-#endif
+#define __PAGE_OFFSET_BASE57	_AC(0xff10000000000000, UL)
+#define __PAGE_OFFSET_BASE48	_AC(0xffff880000000000, UL)
 
 #if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
 #define __PAGE_OFFSET           page_offset_base
 #else
-#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
+#define __PAGE_OFFSET           __PAGE_OFFSET_BASE48
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #ifdef CONFIG_X86_5LEVEL
-#define __PHYSICAL_MASK_SHIFT	52
-#define __VIRTUAL_MASK_SHIFT	56
+#define __PHYSICAL_MASK_SHIFT	(p4d_folded ? 46 : 52)
+#define __VIRTUAL_MASK_SHIFT	(p4d_folded ? 47 : 56)
 #else
 #define __PHYSICAL_MASK_SHIFT	46
 #define __VIRTUAL_MASK_SHIFT	47
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index a09f2fa91e09..46f52da75e16 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -19,6 +19,12 @@ typedef unsigned long	pgprotval_t;
 
 typedef struct { pteval_t pte; } pte_t;
 
+#ifdef CONFIG_X86_5LEVEL
+extern unsigned int p4d_folded;
+#else
+#define p4d_folded 1
+#endif
+
 extern unsigned int pgdir_shift;
 extern unsigned int ptrs_per_p4d;
 
@@ -79,23 +85,30 @@ extern unsigned int ptrs_per_p4d;
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #define MAXMEM		_AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#ifdef CONFIG_X86_5LEVEL
-#define VMALLOC_SIZE_TB _AC(16384, UL)
-#define __VMALLOC_BASE	_AC(0xff92000000000000, UL)
-#define __VMEMMAP_BASE	_AC(0xffd4000000000000, UL)
-#else
-#define VMALLOC_SIZE_TB	_AC(32, UL)
-#define __VMALLOC_BASE	_AC(0xffffc90000000000, UL)
-#define __VMEMMAP_BASE	_AC(0xffffea0000000000, UL)
-#endif
+
+#ifndef __ASSEMBLY__
+#define __VMALLOC_BASE48	0xffffc90000000000
+#define __VMALLOC_BASE57	0xff92000000000000
+
+#define VMALLOC_SIZE_TB48	32UL
+#define VMALLOC_SIZE_TB57	16384UL
+
+#define __VMEMMAP_BASE48	0xffffea0000000000
+#define __VMEMMAP_BASE57	0xffd4000000000000
+
 #if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
 #define VMALLOC_START	vmalloc_base
+#define VMALLOC_SIZE_TB	(!p4d_folded ? VMALLOC_SIZE_TB57 : VMALLOC_SIZE_TB48)
 #define VMEMMAP_START	vmemmap_base
 #else
-#define VMALLOC_START	__VMALLOC_BASE
-#define VMEMMAP_START	__VMEMMAP_BASE
+#define VMALLOC_START	__VMALLOC_BASE48
+#define VMALLOC_SIZE_TB	VMALLOC_SIZE_TB48
+#define VMEMMAP_START	__VMEMMAP_BASE48
 #endif /* CONFIG_RANDOMIZE_MEMORY */
-#define VMALLOC_END	(VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
+
+#define VMALLOC_END	(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+#endif
+
 #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 /* The module sections ends with the start of the fixmap */
 #define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 65663de9287b..92c3f33f7682 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -854,7 +854,7 @@ static inline void spin_lock_prefetch(const void *x)
 					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
 
 #define STACK_TOP		TASK_SIZE_LOW
-#define STACK_TOP_MAX		TASK_SIZE_MAX
+#define STACK_TOP_MAX		(!p4d_folded ? TASK_SIZE_MAX : DEFAULT_MAP_WINDOW)
 
 #define INIT_THREAD  {						\
 	.sp0			= TOP_OF_INIT_STACK,		\
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index d4e8d4beeb62..47629f3e32aa 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -39,26 +39,54 @@ static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #ifdef CONFIG_X86_5LEVEL
-unsigned int pgdir_shift = 48;
+unsigned int pgdir_shift = 39;
 EXPORT_SYMBOL(pgdir_shift);
-unsigned int ptrs_per_p4d = 512;
+unsigned int ptrs_per_p4d = 1;
 EXPORT_SYMBOL(ptrs_per_p4d);
 #endif
 
-#if defined(CONFIG_RANDOMIZE_MEMORY) || defined(CONFIG_X86_5LEVEL)
-unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+unsigned long page_offset_base = __PAGE_OFFSET_BASE48;
 EXPORT_SYMBOL(page_offset_base);
-unsigned long vmalloc_base = __VMALLOC_BASE;
+unsigned long vmalloc_base = __VMALLOC_BASE48;
 EXPORT_SYMBOL(vmalloc_base);
-unsigned long vmemmap_base = __VMEMMAP_BASE;
+unsigned long vmemmap_base = __VMEMMAP_BASE48;
 EXPORT_SYMBOL(vmemmap_base);
-#endif
 
 static void __init *fixup_pointer(void *ptr, unsigned long physaddr)
 {
 	return ptr - (void *)_text + (void *)physaddr;
 }
 
+static unsigned long __init *fixup_long(void *ptr, unsigned long physaddr)
+{
+	return fixup_pointer(ptr, physaddr);
+}
+
+#ifdef CONFIG_X86_5LEVEL
+static unsigned int __init *fixup_int(void *ptr, unsigned long physaddr)
+{
+	return fixup_pointer(ptr, physaddr);
+}
+
+static void __init check_la57_support(unsigned long physaddr)
+{
+	if (native_cpuid_eax(0) < 7)
+		return;
+
+	if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+		return;
+
+	*fixup_int(&p4d_folded, physaddr) = 0;
+	*fixup_int(&pgdir_shift, physaddr) = 48;
+	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
+	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE57;
+	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE57;
+	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE57;
+}
+#else
+static void __init check_la57_support(unsigned long physaddr) {}
+#endif
+
 void __init __startup_64(unsigned long physaddr)
 {
 	unsigned long load_delta, *p;
@@ -68,6 +96,8 @@ void __init __startup_64(unsigned long physaddr)
 	pmdval_t *pmd, pmd_entry;
 	int i;
 
+	check_la57_support(physaddr);
+
 	/* Is the address too large? */
 	if (physaddr >> MAX_PHYSMEM_BITS)
 		for (;;);
@@ -85,9 +115,14 @@ void __init __startup_64(unsigned long physaddr)
 	/* Fixup the physical addresses in the page table */
 
 	pgd = fixup_pointer(&early_top_pgt, physaddr);
-	pgd[pgd_index(__START_KERNEL_map)] += load_delta;
-
-	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+	p = pgd + pgd_index(__START_KERNEL_map);
+	if (p4d_folded)
+		*p = (unsigned long)level3_kernel_pgt;
+	else
+		*p = (unsigned long)level4_kernel_pgt;
+	*p += _PAGE_TABLE - __START_KERNEL_map + load_delta;
+
+	if (!p4d_folded) {
 		p4d = fixup_pointer(&level4_kernel_pgt, physaddr);
 		p4d[511] += load_delta;
 	}
@@ -109,7 +144,7 @@ void __init __startup_64(unsigned long physaddr)
 	pud = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
 	pmd = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
 
-	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+	if (!p4d_folded) {
 		p4d = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
 
 		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
@@ -151,8 +186,7 @@ void __init __startup_64(unsigned long physaddr)
 	}
 
 	/* Fixup phys_base */
-	p = fixup_pointer(&phys_base, physaddr);
-	*p += load_delta;
+	*fixup_long(&phys_base, physaddr) += load_delta;
 }
 
 /* Wipe all early page tables except for the kernel symbol map */
@@ -185,7 +219,7 @@ int __init early_make_pgtable(unsigned long address)
 	 * critical -- __PAGE_OFFSET would point us back into the dynamic
 	 * range and we might end up looping forever...
 	 */
-	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+	if (p4d_folded)
 		p4d_p = pgd_p;
 	else if (pgd)
 		p4d_p = (p4dval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 6225550883df..2009d9849e98 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -104,7 +104,10 @@ ENTRY(secondary_startup_64)
 	/* Enable PAE mode, PGE and LA57 */
 	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
 #ifdef CONFIG_X86_5LEVEL
+	testl	$1, p4d_folded(%rip)
+	jnz	1f
 	orl	$X86_CR4_LA57, %ecx
+1:
 #endif
 	movq	%rcx, %cr4
 
@@ -333,12 +336,7 @@ GLOBAL(name)
 
 	__INITDATA
 NEXT_PAGE(early_top_pgt)
-	.fill	511,8,0
-#ifdef CONFIG_X86_5LEVEL
-	.quad	level4_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
-#else
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
-#endif
+	.fill	512,8,0
 
 NEXT_PAGE(early_dynamic_pgts)
 	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -417,6 +415,12 @@ ENTRY(phys_base)
 	.quad   0x0000000000000000
 EXPORT_SYMBOL(phys_base)
 
+#ifdef CONFIG_X86_5LEVEL
+ENTRY(p4d_folded)
+	.word	1
+EXPORT_SYMBOL(p4d_folded)
+#endif
+
 #include "../../x86/xen/xen-head.S"
 	
 	__PAGE_ALIGNED_BSS
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index e6420b18f6e0..55433f2d1957 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -43,7 +43,7 @@
  * before. You also need to add a BUILD_BUG_ON() in kernel_randomize_memory() to
  * ensure that this order is correct and won't be changed.
  */
-static const unsigned long vaddr_start = __PAGE_OFFSET_BASE;
+static const unsigned long vaddr_start = __PAGE_OFFSET_BASE48;
 
 #if defined(CONFIG_X86_ESPFIX64)
 static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2017-05-25 20:33 ` [PATCHv1, RFC 4/8] x86/mm: Handle boot-time paging mode switching at early boot Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-27 15:09   ` Brian Gerst
  2017-05-25 20:33 ` [PATCHv1, RFC 6/8] x86/mm: Replace compile-time checks for 5-level with runtime checks Kirill A. Shutemov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

This patch changes the page table helpers to fold the p4d level at runtime.
The logic is the same as in <asm-generic/pgtable-nop4d.h>.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/paravirt.h |  3 ++-
 arch/x86/include/asm/pgalloc.h  |  5 ++++-
 arch/x86/include/asm/pgtable.h  | 10 +++++++++-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 55fa56fe4e45..e934ed6dc036 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -615,7 +615,8 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 
 static inline void pgd_clear(pgd_t *pgdp)
 {
-	set_pgd(pgdp, __pgd(0));
+	if (!p4d_folded)
+		set_pgd(pgdp, __pgd(0));
 }
 
 #endif  /* CONFIG_PGTABLE_LEVELS == 5 */
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index b2d0cd8288aa..5c42262169d0 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -155,6 +155,8 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 #if CONFIG_PGTABLE_LEVELS > 4
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
 {
+	if (p4d_folded)
+		return;
 	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
 	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
 }
@@ -179,7 +181,8 @@ extern void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d);
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 				  unsigned long address)
 {
-	___p4d_free_tlb(tlb, p4d);
+	if (!p4d_folded)
+		___p4d_free_tlb(tlb, p4d);
 }
 
 #endif	/* CONFIG_PGTABLE_LEVELS > 4 */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 77037b6f1caa..4516a1bdcc31 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -53,7 +53,7 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
 
 #ifndef __PAGETABLE_P4D_FOLDED
 #define set_pgd(pgdp, pgd)		native_set_pgd(pgdp, pgd)
-#define pgd_clear(pgd)			native_pgd_clear(pgd)
+#define pgd_clear(pgd)			(!p4d_folded ? native_pgd_clear(pgd) : 0)
 #endif
 
 #ifndef set_p4d
@@ -847,6 +847,8 @@ static inline unsigned long p4d_index(unsigned long address)
 #if CONFIG_PGTABLE_LEVELS > 4
 static inline int pgd_present(pgd_t pgd)
 {
+	if (p4d_folded)
+		return 1;
 	return pgd_flags(pgd) & _PAGE_PRESENT;
 }
 
@@ -864,16 +866,22 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
 /* to find an entry in a page-table-directory. */
 static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
 {
+	if (p4d_folded)
+		return (p4d_t *)pgd;
 	return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);
 }
 
 static inline int pgd_bad(pgd_t pgd)
 {
+	if (p4d_folded)
+		return 0;
 	return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
 }
 
 static inline int pgd_none(pgd_t pgd)
 {
+	if (p4d_folded)
+		return 0;
 	/*
 	 * There is no need to do a workaround for the KNL stray
 	 * A/D bit erratum here.  PGDs only point to page tables
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 6/8] x86/mm: Replace compile-time checks for 5-level with runtime checks
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2017-05-25 20:33 ` [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 7/8] x86/mm: Hacks for boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

This patch converts the remaining compile-time CONFIG_X86_5LEVEL checks to
runtime checks for p4d folding.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/ident_map.c       |  2 +-
 arch/x86/mm/init_64.c         | 28 +++++++++++++++++-----------
 arch/x86/mm/kaslr.c           |  6 +++---
 arch/x86/power/hibernate_64.c |  4 ++--
 arch/x86/xen/mmu_pv.c         |  2 +-
 5 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index adab1595f4bd..d2df33a2cbfb 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -115,7 +115,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		result = ident_p4d_init(info, p4d, addr, next);
 		if (result)
 			return result;
-		if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+		if (!p4d_folded) {
 			set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
 		} else {
 			/*
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d135c613bf7b..b1b70a79fa14 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -88,12 +88,7 @@ static int __init nonx32_setup(char *str)
 }
 __setup("noexec32=", nonx32_setup);
 
-/*
- * When memory was added make sure all the processes MM have
- * suitable PGD entries in the local PGD level page.
- */
-#ifdef CONFIG_X86_5LEVEL
-void sync_global_pgds(unsigned long start, unsigned long end)
+static void sync_global_pgds_57(unsigned long start, unsigned long end)
 {
 	unsigned long addr;
 
@@ -129,8 +124,8 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 		spin_unlock(&pgd_lock);
 	}
 }
-#else
-void sync_global_pgds(unsigned long start, unsigned long end)
+
+static void sync_global_pgds_48(unsigned long start, unsigned long end)
 {
 	unsigned long addr;
 
@@ -173,7 +168,18 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 		spin_unlock(&pgd_lock);
 	}
 }
-#endif
+
+/*
+ * When memory was added make sure all the processes MM have
+ * suitable PGD entries in the local PGD level page.
+ */
+void sync_global_pgds(unsigned long start, unsigned long end)
+{
+	if (!p4d_folded)
+		sync_global_pgds_57(start, end);
+	else
+		sync_global_pgds_48(start, end);
+}
 
 /*
  * NOTE: This function is marked __ref because it calls __init function
@@ -632,7 +638,7 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 	unsigned long vaddr = (unsigned long)__va(paddr);
 	int i = p4d_index(vaddr);
 
-	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+	if (p4d_folded)
 		return phys_pud_init((pud_t *) p4d_page, paddr, paddr_end, page_size_mask);
 
 	for (; i < PTRS_PER_P4D; i++, paddr = paddr_next) {
@@ -712,7 +718,7 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 					   page_size_mask);
 
 		spin_lock(&init_mm.page_table_lock);
-		if (IS_ENABLED(CONFIG_X86_5LEVEL))
+		if (!p4d_folded)
 			pgd_populate(&init_mm, pgd, p4d);
 		else
 			p4d_populate(&init_mm, p4d_offset(pgd, vaddr), (pud_t *) p4d);
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 55433f2d1957..a691ff07d825 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -134,7 +134,7 @@ void __init kernel_randomize_memory(void)
 		 */
 		entropy = remain_entropy / (ARRAY_SIZE(kaslr_regions) - i);
 		prandom_bytes_state(&rand_state, &rand, sizeof(rand));
-		if (IS_ENABLED(CONFIG_X86_5LEVEL))
+		if (!p4d_folded)
 			entropy = (rand % (entropy + 1)) & P4D_MASK;
 		else
 			entropy = (rand % (entropy + 1)) & PUD_MASK;
@@ -146,7 +146,7 @@ void __init kernel_randomize_memory(void)
 		 * randomization alignment.
 		 */
 		vaddr += get_padding(&kaslr_regions[i]);
-		if (IS_ENABLED(CONFIG_X86_5LEVEL))
+		if (!p4d_folded)
 			vaddr = round_up(vaddr + 1, P4D_SIZE);
 		else
 			vaddr = round_up(vaddr + 1, PUD_SIZE);
@@ -222,7 +222,7 @@ void __meminit init_trampoline(void)
 		return;
 	}
 
-	if (IS_ENABLED(CONFIG_X86_5LEVEL))
+	if (!p4d_folded)
 		init_trampoline_p4d();
 	else
 		init_trampoline_pud();
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index a6e21fee22ea..86696ff275b9 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -66,7 +66,7 @@ static int set_up_temporary_text_mapping(pgd_t *pgd)
 	 * tables used by the image kernel.
 	 */
 
-	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+	if (!p4d_folded) {
 		p4d = (p4d_t *)get_safe_page(GFP_ATOMIC);
 		if (!p4d)
 			return -ENOMEM;
@@ -84,7 +84,7 @@ static int set_up_temporary_text_mapping(pgd_t *pgd)
 		__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
 	set_pud(pud + pud_index(restore_jump_address),
 		__pud(__pa(pmd) | _KERNPG_TABLE));
-	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+	if (!p4d_folded) {
 		set_p4d(p4d + p4d_index(restore_jump_address), __p4d(__pa(pud) | _KERNPG_TABLE));
 		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(p4d) | _KERNPG_TABLE));
 	} else {
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index d9ee946559c9..e39054fca812 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1214,7 +1214,7 @@ static void __init xen_cleanmfnmap(unsigned long vaddr)
 			continue;
 		xen_cleanmfnmap_p4d(p4d + i, unpin);
 	}
-	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+	if (!p4d_folded) {
 		set_pgd(pgd, __pgd(0));
 		xen_cleanmfnmap_free_pgtbl(p4d, unpin);
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 7/8] x86/mm: Hacks for boot-time switching between 4- and 5-level paging
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2017-05-25 20:33 ` [PATCHv1, RFC 6/8] x86/mm: Replace compile-time checks for 5-level with runtime checks Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-26 22:10   ` KASAN vs. " Kirill A. Shutemov
  2017-05-25 20:33 ` [PATCHv1, RFC 8/8] x86/mm: Allow booting without la57 if CONFIG_X86_5LEVEL=y Kirill A. Shutemov
  2017-05-25 23:24 ` [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Linus Torvalds
  8 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

There are a bunch of workarounds here to make boot-time switching between
4- and 5-level paging compile.

All of them need to be addressed properly before upstreaming.

Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig          | 4 ++--
 arch/x86/entry/entry_64.S | 5 +++++
 arch/x86/kernel/head_64.S | 6 ++++--
 arch/x86/xen/Kconfig      | 2 +-
 4 files changed, 12 insertions(+), 5 deletions(-)

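An aside on the entry_64.S hunk below: the shl/sar pair sign-extends the
top implemented virtual-address bit (bit 47 or 56, depending on paging
mode) into the upper bits, yielding a canonical address. A C sketch of the
same operation, for illustration only (not part of the patch):

	static inline unsigned long canonical(unsigned long addr, int vshift)
	{
		/* vshift plays the role of __VIRTUAL_MASK_SHIFT (47/56). */
		int shift = 64 - (vshift + 1);

		/* Copy the top implemented bit into bits 63..vshift+1. */
		return (unsigned long)(((long)addr << shift) >> shift);
	}
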
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0bf81e837cbf..c795207d8a3c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,7 +100,7 @@ config X86
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMAP		if X86_64 || X86_PAE
 	select HAVE_ARCH_JUMP_LABEL
-	select HAVE_ARCH_KASAN			if X86_64 && SPARSEMEM_VMEMMAP
+	select HAVE_ARCH_KASAN			if X86_64 && SPARSEMEM_VMEMMAP && !X86_5LEVEL
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
@@ -1980,7 +1980,7 @@ config RELOCATABLE
 
 config RANDOMIZE_BASE
 	bool "Randomize the address of the kernel image (KASLR)"
-	depends on RELOCATABLE
+	depends on RELOCATABLE && !X86_5LEVEL
 	default y
 	---help---
 	  In support of Kernel Address Space Layout Randomization (KASLR),
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index edec30584eb8..9e868fd6d792 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -269,6 +269,11 @@ return_from_SYSCALL_64:
 	 * Change top bits to match most significant bit (47th or 56th bit
 	 * depending on paging mode) in the address.
 	 */
+#ifdef CONFIG_X86_5LEVEL
+#warning FIXME
+#undef __VIRTUAL_MASK_SHIFT
+#define __VIRTUAL_MASK_SHIFT 56
+#endif
 	shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
 	sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 2009d9849e98..9dcf7a4d8612 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -37,11 +37,13 @@
  *
  */
 
-#define p4d_index(x)	(((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
 
-PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
+#ifdef CONFIG_XEN
+/* FIXME */
+PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE48)
 PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
+#endif
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
 	.text
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1be9667bd476..c1714cac7595 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -4,7 +4,7 @@
 
 config XEN
 	bool "Xen guest support"
-	depends on PARAVIRT
+	depends on PARAVIRT && !X86_5LEVEL
 	select PARAVIRT_CLOCK
 	depends on X86_64 || (X86_32 && X86_PAE)
 	depends on X86_LOCAL_APIC && X86_TSC
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCHv1, RFC 8/8] x86/mm: Allow booting without la57 if CONFIG_X86_5LEVEL=y
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2017-05-25 20:33 ` [PATCHv1, RFC 7/8] x86/mm: Hacks for boot-time switching between 4- and 5-level paging Kirill A. Shutemov
@ 2017-05-25 20:33 ` Kirill A. Shutemov
  2017-05-25 23:24 ` [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Linus Torvalds
  8 siblings, 0 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-25 20:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

All pieces of the puzzle are in place, and we can now allow booting with
CONFIG_X86_5LEVEL=y on a machine without la57 support.

The kernel will detect that la57 is missing and fold the p4d level at
runtime.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/required-features.h | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index d91ba04dd007..fac9a5c0abe9 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -53,12 +53,6 @@
 # define NEED_MOVBE	0
 #endif
 
-#ifdef CONFIG_X86_5LEVEL
-# define NEED_LA57	(1<<(X86_FEATURE_LA57 & 31))
-#else
-# define NEED_LA57	0
-#endif
-
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT
 /* Paravirtualized systems may not have PSE or PGE available */
@@ -104,7 +98,7 @@
 #define REQUIRED_MASK13	0
 #define REQUIRED_MASK14	0
 #define REQUIRED_MASK15	0
-#define REQUIRED_MASK16	(NEED_LA57)
+#define REQUIRED_MASK16	0
 #define REQUIRED_MASK17	0
 #define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2017-05-25 20:33 ` [PATCHv1, RFC 8/8] x86/mm: Allow booting without la57 if CONFIG_X86_5LEVEL=y Kirill A. Shutemov
@ 2017-05-25 23:24 ` Linus Torvalds
  2017-05-26  0:40   ` Andy Lutomirski
  2017-05-26 13:00   ` Kirill A. Shutemov
  8 siblings, 2 replies; 54+ messages in thread
From: Linus Torvalds @ 2017-05-25 23:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, the arch/x86 maintainers, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andi Kleen, Dave Hansen,
	Andy Lutomirski, linux-arch, linux-mm, Linux Kernel Mailing List

On Thu, May 25, 2017 at 1:33 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> Here's my first attempt to bring boot-time switching between 4- and 5-level
> paging. It doesn't look too terrible to me. I expected it to be worse.

If I read this right, you just made it a global on/off thing.

May I suggest possibly a different model entirely? Can you make it a
per-mm flag instead?

And then we

 (a) make all kthreads use the 4-level page tables

 (b) which means that all the init code uses the 4-level page tables

 (c) which means that all those checks for "start_secondary" etc can
just go away, because those all run with 4-level page tables.

Or is it just much too expensive to switch between 4-level and 5-level
paging at run-time?

              Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-25 23:24 ` [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Linus Torvalds
@ 2017-05-26  0:40   ` Andy Lutomirski
  2017-05-26  4:18     ` Kevin Easton
  2017-05-26 13:00   ` Kirill A. Shutemov
  1 sibling, 1 reply; 54+ messages in thread
From: Andy Lutomirski @ 2017-05-26  0:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, Linux Kernel Mailing List

On Thu, May 25, 2017 at 4:24 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, May 25, 2017 at 1:33 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
>> Here's my first attempt to bring boot-time switching between 4- and 5-level
>> paging. It doesn't look too terrible to me. I expected it to be worse.
>
> If I read this right, you just made it a global on/off thing.
>
> May I suggest possibly a different model entirely? Can you make it a
> per-mm flag instead?
>
> And then we
>
>  (a) make all kthreads use the 4-level page tables
>
>  (b) which means that all the init code uses the 4-level page tables
>
>  (c) which means that all those checks for "start_secondary" etc can
> just go away, because those all run with 4-level page tables.
>
> Or is it just much too expensive to switch between 4-level and 5-level
> paging at run-time?
>

Even ignoring the expense, I'm not convinced it's practical.  AFAICT
you can't atomically switch the paging mode and CR3, so either you
need some magic page table with a trampoline that works in both modes
(which is presumably doable with some trickery) or you need to flip
paging off.  Good luck if an NMI hits in the meantime.  There was
code like that once upon a time for EFI mixed mode, but it got deleted
due to triple faults.

Doing this in switch_mm() sounds painful.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26  0:40   ` Andy Lutomirski
@ 2017-05-26  4:18     ` Kevin Easton
  2017-05-26  7:21       ` Andy Lutomirski
  0 siblings, 1 reply; 54+ messages in thread
From: Kevin Easton @ 2017-05-26  4:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Thu, May 25, 2017 at 05:40:16PM -0700, Andy Lutomirski wrote:
> On Thu, May 25, 2017 at 4:24 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, May 25, 2017 at 1:33 PM, Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> >> Here's my first attempt to bring boot-time switching between 4- and 5-level
> >> paging. It doesn't look too terrible to me. I expected it to be worse.
> >
> > If I read this right, you just made it a global on/off thing.
> >
> > May I suggest possibly a different model entirely? Can you make it a
> > per-mm flag instead?
> >
> > And then we
> >
> >  (a) make all kthreads use the 4-level page tables
> >
> >  (b) which means that all the init code uses the 4-level page tables
> >
> >  (c) which means that all those checks for "start_secondary" etc can
> > just go away, because those all run with 4-level page tables.
> >
> > Or is it just much too expensive to switch between 4-level and 5-level
> > paging at run-time?
> >
> 
> Even ignoring the expense, I'm not convinced it's practical.  AFAICT
> you can't atomically switch the paging mode and CR3, so either you
> need some magic page table with a trampoline that works in both modes
> (which is presumably doable with some trickery) or you need to flip
> paging off.  Good luck if an NMI hits in the meantime.  There was
> code like that once upon a time for EFI mixed mode, but it got deleted
> due to triple faults.

According to Intel's documentation you pretty much have to disable
paging anyway:

"The processor allows software to modify CR4.LA57 only outside of IA-32e
mode. In IA-32e mode, an attempt to modify CR4.LA57 using the MOV CR
instruction causes a general-protection exception (#GP)."

(If it weren't for that, maybe you could point the last entry in the PML4
at the PML4 itself, so it also works as a PML5 for accessing kernel
addresses? And of course make sure nothing gets loaded above 
0xffffff8000000000).

    - Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26  4:18     ` Kevin Easton
@ 2017-05-26  7:21       ` Andy Lutomirski
  0 siblings, 0 replies; 54+ messages in thread
From: Andy Lutomirski @ 2017-05-26  7:21 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Andy Lutomirski, Linus Torvalds, Kirill A. Shutemov,
	Andrew Morton, the arch/x86 maintainers, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andi Kleen, Dave Hansen, linux-arch,
	linux-mm, Linux Kernel Mailing List

On Thu, May 25, 2017 at 9:18 PM, Kevin Easton <kevin@guarana.org> wrote:
> (If it weren't for that, maybe you could point the last entry in the PML4
> at the PML4 itself, so it also works as a PML5 for accessing kernel
> addresses? And of course make sure nothing gets loaded above
> 0xffffff8000000000).

This was an old trick done for a very different reason: it lets you
find your page tables at virtual addresses that depend only on the VA
whose page table you're looking for and the top-level slot that points
back to itself.  IIRC Windows used to do this for its own memory
management purposes.  A major downside is that an arbitrary write
vulnerability lets you write your own PTEs without any guesswork.

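For concreteness, here is a minimal sketch of the address math behind that
trick (my illustration, not from the original mail; assumes 4-level paging
and a hypothetical self-referencing PML4 slot 511):

	#include <stdint.h>

	#define SELF_SLOT 511ULL	/* hypothetical self-referencing slot */

	/* With PML4[SELF_SLOT] pointing at the PML4 itself, the PTE that
	 * maps va becomes visible at an address computed purely from va
	 * and SELF_SLOT, which is exactly the property described above. */
	static inline uint64_t pte_vaddr(uint64_t va)
	{
		/* va[47:12] land in bits [38:3]: the walk goes one level
		 * "too deep", so a page table page is read as if it were
		 * an ordinary data page. */
		uint64_t v = (SELF_SLOT << 39) |
			     ((va >> 9) & 0x7ffffffff8ULL);

		/* Sign-extend bit 47 to make the address canonical. */
		return (uint64_t)(((int64_t)v << 16) >> 16);
	}
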
--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-25 23:24 ` [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Linus Torvalds
  2017-05-26  0:40   ` Andy Lutomirski
@ 2017-05-26 13:00   ` Kirill A. Shutemov
  2017-05-26 13:35     ` Andi Kleen
                       ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-26 13:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Thu, May 25, 2017 at 04:24:24PM -0700, Linus Torvalds wrote:
> On Thu, May 25, 2017 at 1:33 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > Here's my first attempt to bring boot-time switching between 4- and 5-level
> > paging. It doesn't look too terrible to me. I expected it to be worse.
> 
> If I read this right, you just made it a global on/off thing.
> 
> May I suggest possibly a different model entirely? Can you make it a
> per-mm flag instead?
> 
> And then we
> 
>  (a) make all kthreads use the 4-level page tables
> 
>  (b) which means that all the init code uses the 4-level page tables
> 
>  (c) which means that all those checks for "start_secondary" etc can
> just go away, because those all run with 4-level page tables.
> 
> Or is it just much too expensive to switch between 4-level and 5-level
> paging at run-time?

Hm..

I don't see how kernel threads can use 4-level paging. It doesn't work
from a virtual memory layout POV. The kernel claims half of the full
virtual address space for itself -- 256 PGD entries, not the one we would
effectively have after switching to 4-level paging. For instance, the
addresses where vmalloc and vmemmap are mapped are not canonical with
4-level paging.

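(To make the canonicality point concrete -- my numbers, not in the original
mail: the 4-level vmalloc base 0xffffc90000000000 has bits 63:47 all set,
so it is canonical in both paging modes, while the 5-level vmalloc base
0xff92000000000000 has bits 63:48 = 0xff92, which are not copies of bit 47,
so any access to it with CR4.LA57 clear faults as non-canonical.)
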
And you cannot see the whole direct mapping of physical memory. Back to
highmem? (Please, no, please).

We could possibly reduce the number of PGD entries required by the
kernel. Currently, the layout for 5-level paging allows up to 55 bits
of physical memory. That's redundant, as the SDM claims we will never
get more than 52. So we could shrink the kernel part of the layout by a
few bits, but definitely not down to one entry.

I don't see how it can possibly work.

Besides the difficulty of getting switching between paging modes
correct, which Andy mentioned, it would also hurt performance. You
cannot switch between paging modes directly; it requires disabling
paging completely. That means we lose the benefit of global page table
entries on every such switch. More page walks.

Even ignoring all of the above, I don't see much benefit in per-mm
switching. It adds complexity without much gain -- saving a few lines
of logic during early boot doesn't look like a huge win to me.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 13:00   ` Kirill A. Shutemov
@ 2017-05-26 13:35     ` Andi Kleen
  2017-05-26 15:51     ` Linus Torvalds
  2017-05-26 19:40     ` hpa
  2 siblings, 0 replies; 54+ messages in thread
From: Andi Kleen @ 2017-05-26 13:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, linux-arch,
	linux-mm, Linux Kernel Mailing List

> Even ignoring all of above, I don't see much benefit of having per-mm
> switching. It adds complexity without much benefit -- saving few lines of
> logic during early boot doesn't look as huge win to me.

Also giving kthreads a different VM would prevent lazy VM switching
when switching from/to idle, which can be quite important for performance
when doing fast IO.
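
Roughly, the path in question (paraphrasing the mm handling in
context_switch() in kernel/sched/core.c; simplified):

        if (!next->mm) {                            /* kernel thread */
                next->active_mm = prev->active_mm;  /* borrow old mm */
                enter_lazy_tlb(prev->active_mm, next);  /* no CR3 write */
        } else {
                switch_mm(prev->active_mm, next->mm, next);
        }

Give kthreads their own 4-level mm and the first branch goes away:
every hop through idle pays for a real switch_mm().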

-Andi


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 13:00   ` Kirill A. Shutemov
  2017-05-26 13:35     ` Andi Kleen
@ 2017-05-26 15:51     ` Linus Torvalds
  2017-05-26 15:58       ` Kirill A. Shutemov
  2017-05-26 18:24       ` hpa
  2017-05-26 19:40     ` hpa
  2 siblings, 2 replies; 54+ messages in thread
From: Linus Torvalds @ 2017-05-26 15:51 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Fri, May 26, 2017 at 6:00 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> I don't see how kernel threads can use 4-level paging. It doesn't work
> from virtual memory layout POV. Kernel claims half of full virtual address
> space for itself -- 256 PGD entries, not one as we would effectively have
> in case of switching to 4-level paging. For instance, addresses, where
> vmalloc and vmemmap are mapped, are not canonical with 4-level paging.

I would have just assumed we'd map the kernel in the shared part that
fits in the top 47 bits.

But it sounds like you can't switch back and forth anyway, so I guess it's moot.

Where *is* the LA57 documentation, btw? I had an old x86 architecture
manual, so I updated it, but LA57 isn't mentioned in the new one
either.

                       Linus


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 15:51     ` Linus Torvalds
@ 2017-05-26 15:58       ` Kirill A. Shutemov
  2017-05-26 16:13         ` Linus Torvalds
  2017-05-26 18:24       ` hpa
  1 sibling, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-26 15:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Fri, May 26, 2017 at 08:51:48AM -0700, Linus Torvalds wrote:
> On Fri, May 26, 2017 at 6:00 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > I don't see how kernel threads can use 4-level paging. It doesn't work
> > from virtual memory layout POV. Kernel claims half of full virtual address
> > space for itself -- 256 PGD entries, not one as we would effectively have
> > in case of switching to 4-level paging. For instance, addresses, where
> > vmalloc and vmemmap are mapped, are not canonical with 4-level paging.
> 
> I would have just assumed we'd map the kernel in the shared part that
> fits in the top 47 bits.
> 
> But it sounds like you can't switch back and forth anyway, so I guess it's moot.
> 
> Where *is* the LA57 documentation, btw? I had an old x86 architecture
> manual, so I updated it, but LA57 isn't mentioned in the new one
> either.

It's in a separate white paper for now:

https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 15:58       ` Kirill A. Shutemov
@ 2017-05-26 16:13         ` Linus Torvalds
  0 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2017-05-26 16:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Fri, May 26, 2017 at 8:58 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> It's in a separate white paper for now:
>
> https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

Thanks. It didn't show up with "LA57 site:intel.com" with google,
which is how I tried to find it ;)

                 Linus


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 15:51     ` Linus Torvalds
  2017-05-26 15:58       ` Kirill A. Shutemov
@ 2017-05-26 18:24       ` hpa
  2017-05-26 19:23         ` Dave Hansen
  1 sibling, 1 reply; 54+ messages in thread
From: hpa @ 2017-05-26 18:24 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Dave Hansen,
	Andy Lutomirski, linux-arch, linux-mm, Linux Kernel Mailing List

On May 26, 2017 8:51:48 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Fri, May 26, 2017 at 6:00 AM, Kirill A. Shutemov
><kirill@shutemov.name> wrote:
>>
>> I don't see how kernel threads can use 4-level paging. It doesn't
>work
>> from virtual memory layout POV. Kernel claims half of full virtual
>address
>> space for itself -- 256 PGD entries, not one as we would effectively
>have
>> in case of switching to 4-level paging. For instance, addresses,
>where
>> vmalloc and vmemmap are mapped, are not canonical with 4-level
>paging.
>
>I would have just assumed we'd map the kernel in the shared part that
>fits in the top 47 bits.
>
>But it sounds like you can't switch back and forth anyway, so I guess
>it's moot.
>
>Where *is* the LA57 documentation, btw? I had an old x86 architecture
>manual, so I updated it, but LA57 isn't mentioned in the new one
>either.
>
>                       Linus

As one of the major motivations for LA57 is that we expect machines with more than 2^46 bytes of memory in the near future, it isn't feasible in most cases to do per-VM LA57.

The only case where that even has any utility is for an application to want more than 128 TiB address space on a machine with no more than 64 TiB of RAM.  It is kind of a narrow use case, I think.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 18:24       ` hpa
@ 2017-05-26 19:23         ` Dave Hansen
  2017-05-26 19:36           ` hpa
  0 siblings, 1 reply; 54+ messages in thread
From: Dave Hansen @ 2017-05-26 19:23 UTC (permalink / raw)
  To: hpa, Linus Torvalds, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Andy Lutomirski,
	linux-arch, linux-mm, Linux Kernel Mailing List

On 05/26/2017 11:24 AM, hpa@zytor.com wrote:
> The only case where that even has any utility is for an application
> to want more than 128 TiB address space on a machine with no more
> than 64 TiB of RAM.  It is kind of a narrow use case, I think.

Doesn't more address space increase the effectiveness of ASLR?  I
thought KASLR, especially, was limited in its effectiveness because of a
lack of address space.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 19:23         ` Dave Hansen
@ 2017-05-26 19:36           ` hpa
  0 siblings, 0 replies; 54+ messages in thread
From: hpa @ 2017-05-26 19:36 UTC (permalink / raw)
  To: Dave Hansen, Linus Torvalds, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Andy Lutomirski,
	linux-arch, linux-mm, Linux Kernel Mailing List

On May 26, 2017 12:23:18 PM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 05/26/2017 11:24 AM, hpa@zytor.com wrote:
>> The only case where that even has any utility is for an application
>> to want more than 128 TiB address space on a machine with no more
>> than 64 TiB of RAM.  It is kind of a narrow use case, I think.
>
>Doesn't more address space increase the effectiveness of ASLR?  I
>thought KASLR, especially, was limited in its effectiveness because of
>a
>lack of address space.

The shortage of address space for KASLR is not addressable by LA57; rather, it would have to be addressed by compiling the kernel using a different (less efficient) memory model, presumably the "medium" memory model.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
  2017-05-26 13:00   ` Kirill A. Shutemov
  2017-05-26 13:35     ` Andi Kleen
  2017-05-26 15:51     ` Linus Torvalds
@ 2017-05-26 19:40     ` hpa
  2 siblings, 0 replies; 54+ messages in thread
From: hpa @ 2017-05-26 19:40 UTC (permalink / raw)
  To: Kirill A. Shutemov, Linus Torvalds
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Dave Hansen,
	Andy Lutomirski, linux-arch, linux-mm, Linux Kernel Mailing List

On May 26, 2017 6:00:57 AM PDT, "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>On Thu, May 25, 2017 at 04:24:24PM -0700, Linus Torvalds wrote:
>> On Thu, May 25, 2017 at 1:33 PM, Kirill A. Shutemov
>> <kirill.shutemov@linux.intel.com> wrote:
>> > Here' my first attempt to bring boot-time between 4- and 5-level
>paging.
>> > It looks not too terrible to me. I've expected it to be worse.
>> 
>> If I read this right, you just made it a global on/off thing.
>> 
>> May I suggest possibly a different model entirely? Can you make it a
>> per-mm flag instead?
>> 
>> And then we
>> 
>>  (a) make all kthreads use the 4-level page tables
>> 
>>  (b) which means that all the init code uses the 4-level page tables
>> 
>>  (c) which means that all those checks for "start_secondary" etc can
>> just go away, because those all run with 4-level page tables.
>> 
>> Or is it just much too expensive to switch between 4-level and
>5-level
>> paging at run-time?
>
>Hm..
>
>I don't see how kernel threads can use 4-level paging. It doesn't work
>from virtual memory layout POV. Kernel claims half of full virtual
>address
>space for itself -- 256 PGD entries, not one as we would effectively
>have
>in case of switching to 4-level paging. For instance, addresses, where
>vmalloc and vmemmap are mapped, are not canonical with 4-level paging.
>
>And you cannot see whole direct mapping of physical memory. Back to
>highmem? (Please, no, please).
>
>We could possible reduce number of PGD required by kernel. Currently,
>layout for 5-level paging allows up-to 55-bit physical memory. It's
>redundant as SDM claim that we never will get more than 52. So we could
>reduce size of kernel part of layout by few bits, but not definitely to
>1.
>
>I don't see how it can possibly work.
>
>Besides difficulties of getting switching between paging modes correct,
>that Andy mentioned, it will also hurt performance. You cannot switch
>between paging modes directly. It would require disabling paging
>completely. It means we loose benefit from global page table entries on
>such switching. More page-walks.
>
>Even ignoring all of above, I don't see much benefit of having per-mm
>switching. It adds complexity without much benefit -- saving few lines
>of
>logic during early boot doesn't look as huge win to me.

It also makes no sense -- the kernel threads only need one common page table anyway.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-25 20:33 ` [PATCHv1, RFC 7/8] x86/mm: Hacks for boot-time switching between 4- and 5-level paging Kirill A. Shutemov
@ 2017-05-26 22:10   ` Kirill A. Shutemov
  2017-05-29 10:02     ` Dmitry Vyukov
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-26 22:10 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov
  Cc: Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm, linux-kernel

On Thu, May 25, 2017 at 11:33:33PM +0300, Kirill A. Shutemov wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0bf81e837cbf..c795207d8a3c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -100,7 +100,7 @@ config X86
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_HUGE_VMAP		if X86_64 || X86_PAE
>  	select HAVE_ARCH_JUMP_LABEL
> -	select HAVE_ARCH_KASAN			if X86_64 && SPARSEMEM_VMEMMAP
> +	select HAVE_ARCH_KASAN			if X86_64 && SPARSEMEM_VMEMMAP && !X86_5LEVEL
>  	select HAVE_ARCH_KGDB
>  	select HAVE_ARCH_KMEMCHECK
>  	select HAVE_ARCH_MMAP_RND_BITS		if MMU

Looks like KASAN will be a problem for boot-time paging mode switching.
It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
gcc -fasan-shadow-offset=. But this value varies between paging modes...

I don't see how to solve it. Folks, any ideas?
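
For context, the shadow mapping is a fixed linear transform and the
offset is baked into every compiler-generated check -- this is
kasan_mem_to_shadow() from include/linux/kasan.h:

        static inline void *kasan_mem_to_shadow(const void *addr)
        {
                /* KASAN_SHADOW_SCALE_SHIFT == 3: 1 shadow byte per 8 bytes */
                return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                        + KASAN_SHADOW_OFFSET;
        }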

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime
  2017-05-25 20:33 ` [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime Kirill A. Shutemov
@ 2017-05-27 15:09   ` Brian Gerst
  2017-05-27 22:46     ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Gerst @ 2017-05-27 15:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, Linux-MM,
	Linux Kernel Mailing List

On Thu, May 25, 2017 at 4:33 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> This patch changes page table helpers to fold p4d at runtime.
> The logic is the same as in <asm-generic/pgtable-nop4d.h>.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/paravirt.h |  3 ++-
>  arch/x86/include/asm/pgalloc.h  |  5 ++++-
>  arch/x86/include/asm/pgtable.h  | 10 +++++++++-
>  3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 55fa56fe4e45..e934ed6dc036 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -615,7 +615,8 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
>
>  static inline void pgd_clear(pgd_t *pgdp)
>  {
> -       set_pgd(pgdp, __pgd(0));
> +       if (!p4d_folded)
> +               set_pgd(pgdp, __pgd(0));
>  }
>
>  #endif  /* CONFIG_PGTABLE_LEVELS == 5 */
> diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
> index b2d0cd8288aa..5c42262169d0 100644
> --- a/arch/x86/include/asm/pgalloc.h
> +++ b/arch/x86/include/asm/pgalloc.h
> @@ -155,6 +155,8 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
>  #if CONFIG_PGTABLE_LEVELS > 4
>  static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
>  {
> +       if (p4d_folded)
> +               return;
>         paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
>         set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
>  }
> @@ -179,7 +181,8 @@ extern void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d);
>  static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
>                                   unsigned long address)
>  {
> -       ___p4d_free_tlb(tlb, p4d);
> +       if (!p4d_folded)
> +               ___p4d_free_tlb(tlb, p4d);
>  }
>
>  #endif /* CONFIG_PGTABLE_LEVELS > 4 */
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 77037b6f1caa..4516a1bdcc31 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -53,7 +53,7 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
>
>  #ifndef __PAGETABLE_P4D_FOLDED
>  #define set_pgd(pgdp, pgd)             native_set_pgd(pgdp, pgd)
> -#define pgd_clear(pgd)                 native_pgd_clear(pgd)
> +#define pgd_clear(pgd)                 (!p4d_folded ? native_pgd_clear(pgd) : 0)
>  #endif
>
>  #ifndef set_p4d
> @@ -847,6 +847,8 @@ static inline unsigned long p4d_index(unsigned long address)
>  #if CONFIG_PGTABLE_LEVELS > 4
>  static inline int pgd_present(pgd_t pgd)
>  {
> +       if (p4d_folded)
> +               return 1;
>         return pgd_flags(pgd) & _PAGE_PRESENT;
>  }
>
> @@ -864,16 +866,22 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
>  /* to find an entry in a page-table-directory. */
>  static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
>  {
> +       if (p4d_folded)
> +               return (p4d_t *)pgd;
>         return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);
>  }
>
>  static inline int pgd_bad(pgd_t pgd)
>  {
> +       if (p4d_folded)
> +               return 0;
>         return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
>  }
>
>  static inline int pgd_none(pgd_t pgd)
>  {
> +       if (p4d_folded)
> +               return 0;
>         /*
>          * There is no need to do a workaround for the KNL stray
>          * A/D bit erratum here.  PGDs only point to page tables

These should use static_cpu_has(X86_FEATURE_LA57), so that it gets
patched by alternatives.
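
E.g. a sketch of what that would look like (assuming p4d_folded is
simply "LA57 is off"; the function body here is abbreviated from the
patch):

        static inline int pgd_none(pgd_t pgd)
        {
                if (!static_cpu_has(X86_FEATURE_LA57))
                        return 0;  /* p4d folded: pgd level doesn't exist */
                return !native_pgd_val(pgd);
        }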

--
Brian Gerst


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime
  2017-05-27 15:09   ` Brian Gerst
@ 2017-05-27 22:46     ` Kirill A. Shutemov
  2017-05-27 22:56       ` Brian Gerst
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-05-27 22:46 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Kirill A. Shutemov, Linus Torvalds, Andrew Morton,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, Linux-MM, Linux Kernel Mailing List

On Sat, May 27, 2017 at 11:09:54AM -0400, Brian Gerst wrote:
> >  static inline int pgd_none(pgd_t pgd)
> >  {
> > +       if (p4d_folded)
> > +               return 0;
> >         /*
> >          * There is no need to do a workaround for the KNL stray
> >          * A/D bit erratum here.  PGDs only point to page tables
> 
> These should use static_cpu_has(X86_FEATURE_LA57), so that it gets
> patched by alternatives.

Right, eventually we would likely need something like this. But at this
point I'm more worried about correctness than performance. Performance
will be the next step.

And I haven't tried it yet, but I would expect that direct use of
alternatives isn't possible. If I read the code correctly, we enable
paging well before we apply alternatives, but we need something
functional in between.

I guess it will be fun :)

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime
  2017-05-27 22:46     ` Kirill A. Shutemov
@ 2017-05-27 22:56       ` Brian Gerst
  0 siblings, 0 replies; 54+ messages in thread
From: Brian Gerst @ 2017-05-27 22:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Linus Torvalds, Andrew Morton,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, Linux-MM, Linux Kernel Mailing List

On Sat, May 27, 2017 at 6:46 PM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Sat, May 27, 2017 at 11:09:54AM -0400, Brian Gerst wrote:
>> >  static inline int pgd_none(pgd_t pgd)
>> >  {
>> > +       if (p4d_folded)
>> > +               return 0;
>> >         /*
>> >          * There is no need to do a workaround for the KNL stray
>> >          * A/D bit erratum here.  PGDs only point to page tables
>>
>> These should use static_cpu_has(X86_FEATURE_LA57), so that it gets
>> patched by alternatives.
>
> Right, eventually we would likely need something like this. But at this
> point I'm more worried about correctness than performance. Performance
> will be the next step.
>
> And I haven't tried it yet, but I would expect direct use of alternatives
> wouldn't be possible. If I read code correctly, we enable paging way
> before we apply alternatives. But we need to have something functional in
> between.

static_cpu_has() does the check dynamically before alternatives are
applied, so using it early isn't a problem.

--
Brian Gerst


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-26 22:10   ` KASAN vs. " Kirill A. Shutemov
@ 2017-05-29 10:02     ` Dmitry Vyukov
  2017-05-29 11:18       ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Dmitry Vyukov @ 2017-05-29 10:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrey Ryabinin, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On Sat, May 27, 2017 at 12:10 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Thu, May 25, 2017 at 11:33:33PM +0300, Kirill A. Shutemov wrote:
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 0bf81e837cbf..c795207d8a3c 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -100,7 +100,7 @@ config X86
>>       select HAVE_ARCH_AUDITSYSCALL
>>       select HAVE_ARCH_HUGE_VMAP              if X86_64 || X86_PAE
>>       select HAVE_ARCH_JUMP_LABEL
>> -     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP
>> +     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP && !X86_5LEVEL
>>       select HAVE_ARCH_KGDB
>>       select HAVE_ARCH_KMEMCHECK
>>       select HAVE_ARCH_MMAP_RND_BITS          if MMU
>
> Looks like KASAN will be a problem for boot-time paging mode switching.
> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>
> I don't see how to solve it. Folks, any ideas?

+kasan-dev

I wonder if we can use the same offset for both modes. If we use
0xFFDFFC0000000000 as the start of the shadow for 5 levels, then the
same offset that we use for 4 levels (0xdffffc0000000000) will also
work for 5 levels. Namely, the end of the 5-level shadow will overlap
with the 4-level mapping (both end at 0xfffffbffffffffff), but the
5-level mapping extends towards lower addresses. The current 5-level
start of the shadow is actually close -- 0xffd8000000000000 -- and it
seems that the required space after it is unused at the moment (at
least looking at mm.txt). So just try to move it to 0xFFDFFC0000000000?
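
Checking the arithmetic (shadow = (addr >> 3) + offset, with
offset = 0xdffffc0000000000):

        4-level kernel half starts at 0xffff800000000000:
                (0xffff800000000000 >> 3) + offset = 0xffffec0000000000
        5-level kernel half starts at 0xff00000000000000:
                (0xff00000000000000 >> 3) + offset = 0xffdffc0000000000
        top of the address space in either mode:
                (0xffffffffffffffff >> 3) + offset = 0xfffffbffffffffff

So both shadows end at the same place; the 5-level one just starts
lower.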


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-29 10:02     ` Dmitry Vyukov
@ 2017-05-29 11:18       ` Andrey Ryabinin
  2017-05-29 11:19         ` Dmitry Vyukov
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-05-29 11:18 UTC (permalink / raw)
  To: Dmitry Vyukov, Kirill A. Shutemov
  Cc: Alexander Potapenko, Kirill A. Shutemov, Linus Torvalds,
	Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	LKML, kasan-dev



On 05/29/2017 01:02 PM, Dmitry Vyukov wrote:
> On Sat, May 27, 2017 at 12:10 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
>> On Thu, May 25, 2017 at 11:33:33PM +0300, Kirill A. Shutemov wrote:
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index 0bf81e837cbf..c795207d8a3c 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -100,7 +100,7 @@ config X86
>>>       select HAVE_ARCH_AUDITSYSCALL
>>>       select HAVE_ARCH_HUGE_VMAP              if X86_64 || X86_PAE
>>>       select HAVE_ARCH_JUMP_LABEL
>>> -     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP
>>> +     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP && !X86_5LEVEL
>>>       select HAVE_ARCH_KGDB
>>>       select HAVE_ARCH_KMEMCHECK
>>>       select HAVE_ARCH_MMAP_RND_BITS          if MMU
>>
>> Looks like KASAN will be a problem for boot-time paging mode switching.
>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>
>> I don't see how to solve it. Folks, any ideas?
> 
> +kasan-dev
> 
> I wonder if we can use the same offset for both modes. If we use
> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
> offset that we use for 4 levels (0xdffffc0000000000) will also work
> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
> extends towards lower addresses. The current 5 level start of shadow
> is actually close -- 0xffd8000000000000 and it seems that the required
> space after it is unused at the moment (at least looking at mm.txt).
> So just try to move it to 0xFFDFFC0000000000?
> 

Yeah, this should work, but note that 0xFFDFFC0000000000 is not a
PGDIR-aligned address. Our init code assumes that the kasan shadow
starts and ends on PGDIR-aligned addresses. Fortunately this is
fixable: we'd need two more pages of page tables to map the unaligned
start/end of the shadow.
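
(With 5-level paging PGDIR_SHIFT is 48, so PGDIR_SIZE is
0x1000000000000; 0xffdffc0000000000 & (PGDIR_SIZE - 1) =
0xfc0000000000, i.e. the proposed start sits well inside the PGD entry
that begins at 0xffdf000000000000.)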


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-29 11:18       ` Andrey Ryabinin
@ 2017-05-29 11:19         ` Dmitry Vyukov
  2017-05-29 11:45           ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Dmitry Vyukov @ 2017-05-29 11:19 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kirill A. Shutemov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On Mon, May 29, 2017 at 1:18 PM, Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
>
> On 05/29/2017 01:02 PM, Dmitry Vyukov wrote:
>> On Sat, May 27, 2017 at 12:10 AM, Kirill A. Shutemov
>> <kirill@shutemov.name> wrote:
>>> On Thu, May 25, 2017 at 11:33:33PM +0300, Kirill A. Shutemov wrote:
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 0bf81e837cbf..c795207d8a3c 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -100,7 +100,7 @@ config X86
>>>>       select HAVE_ARCH_AUDITSYSCALL
>>>>       select HAVE_ARCH_HUGE_VMAP              if X86_64 || X86_PAE
>>>>       select HAVE_ARCH_JUMP_LABEL
>>>> -     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP
>>>> +     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP && !X86_5LEVEL
>>>>       select HAVE_ARCH_KGDB
>>>>       select HAVE_ARCH_KMEMCHECK
>>>>       select HAVE_ARCH_MMAP_RND_BITS          if MMU
>>>
>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>>
>>> I don't see how to solve it. Folks, any ideas?
>>
>> +kasan-dev
>>
>> I wonder if we can use the same offset for both modes. If we use
>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>> extends towards lower addresses. The current 5 level start of shadow
>> is actually close -- 0xffd8000000000000 and it seems that the required
>> space after it is unused at the moment (at least looking at mm.txt).
>> So just try to move it to 0xFFDFFC0000000000?
>>
>
> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
> assumes that kasan shadow stars and ends on the PGDIR aligned address.
> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
> of the shadow.

I think we can extend the shadow backwards (to the current address),
provided that it does not affect the shadow offset that we pass to the
compiler.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-29 11:19         ` Dmitry Vyukov
@ 2017-05-29 11:45           ` Andrey Ryabinin
  2017-05-29 12:46             ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-05-29 11:45 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Kirill A. Shutemov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev



On 05/29/2017 02:19 PM, Dmitry Vyukov wrote:
> On Mon, May 29, 2017 at 1:18 PM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>>
>> On 05/29/2017 01:02 PM, Dmitry Vyukov wrote:
>>> On Sat, May 27, 2017 at 12:10 AM, Kirill A. Shutemov
>>> <kirill@shutemov.name> wrote:
>>>> On Thu, May 25, 2017 at 11:33:33PM +0300, Kirill A. Shutemov wrote:
>>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>>> index 0bf81e837cbf..c795207d8a3c 100644
>>>>> --- a/arch/x86/Kconfig
>>>>> +++ b/arch/x86/Kconfig
>>>>> @@ -100,7 +100,7 @@ config X86
>>>>>       select HAVE_ARCH_AUDITSYSCALL
>>>>>       select HAVE_ARCH_HUGE_VMAP              if X86_64 || X86_PAE
>>>>>       select HAVE_ARCH_JUMP_LABEL
>>>>> -     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP
>>>>> +     select HAVE_ARCH_KASAN                  if X86_64 && SPARSEMEM_VMEMMAP && !X86_5LEVEL
>>>>>       select HAVE_ARCH_KGDB
>>>>>       select HAVE_ARCH_KMEMCHECK
>>>>>       select HAVE_ARCH_MMAP_RND_BITS          if MMU
>>>>
>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>>>
>>>> I don't see how to solve it. Folks, any ideas?
>>>
>>> +kasan-dev
>>>
>>> I wonder if we can use the same offset for both modes. If we use
>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>>> extends towards lower addresses. The current 5 level start of shadow
>>> is actually close -- 0xffd8000000000000 and it seems that the required
>>> space after it is unused at the moment (at least looking at mm.txt).
>>> So just try to move it to 0xFFDFFC0000000000?
>>>
>>
>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>> of the shadow.
> 
> I think we can extend the shadow backwards (to the current address),
> provided that it does not affect shadow offset that we pass to
> compiler.

I thought about this. We can round the shadow start down to
0xffdf000000000000, but we can't round the shadow end up, because in
that case the shadow would end at 0xffffffffffffffff. So we still need
at least one more page to cover the unaligned end.
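
(The end, 0xfffffbffffffffff, sits in the PGD entry that begins at
0xffff000000000000; the next PGDIR boundary up is 2^64, which wraps to
zero -- hence no way to round the end up.)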


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-29 11:45           ` Andrey Ryabinin
@ 2017-05-29 12:46             ` Andrey Ryabinin
  2017-06-01 14:56               ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-05-29 12:46 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Kirill A. Shutemov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>>>>
>>>>> I don't see how to solve it. Folks, any ideas?
>>>>
>>>> +kasan-dev
>>>>
>>>> I wonder if we can use the same offset for both modes. If we use
>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>>>> extends towards lower addresses. The current 5 level start of shadow
>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>>>> space after it is unused at the moment (at least looking at mm.txt).
>>>> So just try to move it to 0xFFDFFC0000000000?
>>>>
>>>
>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>>> of the shadow.
>>
>> I think we can extend the shadow backwards (to the current address),
>> provided that it does not affect shadow offset that we pass to
>> compiler.
> 
> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
> So we still need at least one more page to cover unaligned end.

Actually, I'm wrong here. I assumed that we would need an additional
page to store p4d entries, but in fact we don't: such a page should
already exist. It's the same last pgd where the kernel image is mapped.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-05-29 12:46             ` Andrey Ryabinin
@ 2017-06-01 14:56               ` Andrey Ryabinin
  2017-07-10 12:33                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-06-01 14:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dmitry Vyukov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>>>>>>
>>>>>> I don't see how to solve it. Folks, any ideas?
>>>>>
>>>>> +kasan-dev
>>>>>
>>>>> I wonder if we can use the same offset for both modes. If we use
>>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>>>>> extends towards lower addresses. The current 5 level start of shadow
>>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>>>>> space after it is unused at the moment (at least looking at mm.txt).
>>>>> So just try to move it to 0xFFDFFC0000000000?
>>>>>
>>>>
>>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>>>> of the shadow.
>>>
>>> I think we can extend the shadow backwards (to the current address),
>>> provided that it does not affect shadow offset that we pass to
>>> compiler.
>>
>> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
>> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
>> So we still need at least one more page to cover unaligned end.
> 
> Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
> is mapped.
> 


Something like below might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
And it's only build-tested.

Based on top of: git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git la57/integration


---
 arch/x86/Kconfig            |  1 -
 arch/x86/mm/kasan_init_64.c | 74 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 53 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 11bd0498f64c..3456f2fdda52 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -291,7 +291,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN
-	default 0xdff8000000000000 if X86_5LEVEL
 	default 0xdffffc0000000000
 
 config HAVE_INTEL_TXT
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 88215ac16b24..d79a7ea83d05 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -15,6 +15,10 @@
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 extern struct range pfn_mapped[E820_MAX_ENTRIES];
 
+#if CONFIG_PGTABLE_LEVELS == 5
+p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
+#endif
+
 static int __init map_range(struct range *range)
 {
 	unsigned long start;
@@ -35,8 +39,9 @@ static void __init clear_pgds(unsigned long start,
 			unsigned long end)
 {
 	pgd_t *pgd;
+	unsigned long pgd_end = end & PGDIR_MASK;
 
-	for (; start < end; start += PGDIR_SIZE) {
+	for (; start < pgd_end; start += PGDIR_SIZE) {
 		pgd = pgd_offset_k(start);
 		/*
 		 * With folded p4d, pgd_clear() is nop, use p4d_clear()
@@ -47,29 +52,50 @@ static void __init clear_pgds(unsigned long start,
 		else
 			pgd_clear(pgd);
 	}
+
+	pgd = pgd_offset_k(start);
+	for (; start < end; start += P4D_SIZE)
+		p4d_clear(p4d_offset(pgd, start));
+}
+
+static void __init kasan_early_p4d_populate(pgd_t *pgd,
+					unsigned long addr,
+					unsigned long end)
+{
+	p4d_t *p4d;
+	unsigned long next;
+
+	if (pgd_none(*pgd))
+		set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa_nodebug(kasan_zero_p4d)));
+
+	/* early p4d_offset()
+	 * TODO: we need helpers for this shit
+	 */
+	if (CONFIG_PGTABLE_LEVELS == 5)
+		p4d = ((p4d_t*)((__pa_nodebug(pgd->pgd) & PTE_PFN_MASK) + __START_KERNEL_map))
+			+ p4d_index(addr);
+	else
+		p4d = (p4d_t*)pgd;
+	do {
+		next = p4d_addr_end(addr, end);
+
+		if (p4d_none(*p4d))
+			set_p4d(p4d, __p4d(_KERNPG_TABLE |
+					__pa_nodebug(kasan_zero_pud)));
+	} while (p4d++, addr = next, addr != end && p4d_none(*p4d));
 }
 
 static void __init kasan_map_early_shadow(pgd_t *pgd)
 {
-	int i;
-	unsigned long start = KASAN_SHADOW_START;
+	unsigned long addr = KASAN_SHADOW_START & PGDIR_MASK;
 	unsigned long end = KASAN_SHADOW_END;
+	unsigned long next;
 
-	for (i = pgd_index(start); start < end; i++) {
-		switch (CONFIG_PGTABLE_LEVELS) {
-		case 4:
-			pgd[i] = __pgd(__pa_nodebug(kasan_zero_pud) |
-					_KERNPG_TABLE);
-			break;
-		case 5:
-			pgd[i] = __pgd(__pa_nodebug(kasan_zero_p4d) |
-					_KERNPG_TABLE);
-			break;
-		default:
-			BUILD_BUG();
-		}
-		start += PGDIR_SIZE;
-	}
+	pgd = pgd + pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		kasan_early_p4d_populate(pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
 }
 
 #ifdef CONFIG_KASAN_INLINE
@@ -120,14 +146,20 @@ void __init kasan_init(void)
 #ifdef CONFIG_KASAN_INLINE
 	register_die_notifier(&kasan_die_notifier);
 #endif
-
 	memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
+#if CONFIG_PGTABLE_LEVELS == 5
+	memcpy(tmp_p4d_table, (void*)pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_END)),
+		sizeof(tmp_p4d_table));
+	set_pgd(&early_top_pgt[pgd_index(KASAN_SHADOW_END)],
+		__pgd(__pa(tmp_p4d_table) | _KERNPG_TABLE));
+#endif
+
 	load_cr3(early_top_pgt);
 	__flush_tlb_all();
 
-	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
+	clear_pgds(KASAN_SHADOW_START & PGDIR_MASK, KASAN_SHADOW_END);
 
-	kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
+	kasan_populate_zero_shadow((void *)(KASAN_SHADOW_START & PGDIR_MASK),
 			kasan_mem_to_shadow((void *)PAGE_OFFSET));
 
 	for (i = 0; i < E820_MAX_ENTRIES; i++) {
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-06-01 14:56               ` Andrey Ryabinin
@ 2017-07-10 12:33                 ` Kirill A. Shutemov
  2017-07-10 12:43                   ` Dmitry Vyukov
  2017-07-10 16:57                   ` Andrey Ryabinin
  0 siblings, 2 replies; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-10 12:33 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Dmitry Vyukov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> > On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
> >>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
> >>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
> >>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
> >>>>>>
> >>>>>> I don't see how to solve it. Folks, any ideas?
> >>>>>
> >>>>> +kasan-dev
> >>>>>
> >>>>> I wonder if we can use the same offset for both modes. If we use
> >>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
> >>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
> >>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
> >>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
> >>>>> extends towards lower addresses. The current 5 level start of shadow
> >>>>> is actually close -- 0xffd8000000000000 and it seems that the required
> >>>>> space after it is unused at the moment (at least looking at mm.txt).
> >>>>> So just try to move it to 0xFFDFFC0000000000?
> >>>>>
> >>>>
> >>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
> >>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
> >>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
> >>>> of the shadow.
> >>>
> >>> I think we can extend the shadow backwards (to the current address),
> >>> provided that it does not affect shadow offset that we pass to
> >>> compiler.
> >>
> >> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
> >> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
> >> So we still need at least one more page to cover unaligned end.
> > 
> > Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> > but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
> > is mapped.
> > 
> 
> 
> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
> And it's only build-tested.

[Sorry for loong delay.]

The patch works for me for legacy boot. But it breaks EFI boot with
5-level paging. And I struggle to understand why.

What I see is many page faults at mm/kasan/kasan.c:758 --
"DEFINE_ASAN_LOAD_STORE(4)". While handling one of them I get a double
fault at arch/x86/kernel/head_64.S:298 ("pushq %r14"), which ends up as
a triple fault.

Any ideas?

If you want to play with this by yourself, qemu supports la57 -- use
-cpu "qemu64,+la57".

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 12:33                 ` Kirill A. Shutemov
@ 2017-07-10 12:43                   ` Dmitry Vyukov
  2017-07-10 14:17                     ` Kirill A. Shutemov
  2017-07-10 16:57                   ` Andrey Ryabinin
  1 sibling, 1 reply; 54+ messages in thread
From: Dmitry Vyukov @ 2017-07-10 12:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrey Ryabinin, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
>> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
>> > On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
>> >>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
>> >>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
>> >>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
>> >>>>>>
>> >>>>>> I don't see how to solve it. Folks, any ideas?
>> >>>>>
>> >>>>> +kasan-dev
>> >>>>>
>> >>>>> I wonder if we can use the same offset for both modes. If we use
>> >>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
>> >>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
>> >>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
>> >>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
>> >>>>> extends towards lower addresses. The current 5 level start of shadow
>> >>>>> is actually close -- 0xffd8000000000000 and it seems that the required
>> >>>>> space after it is unused at the moment (at least looking at mm.txt).
>> >>>>> So just try to move it to 0xFFDFFC0000000000?
>> >>>>>
>> >>>>
>> >>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
>> >>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
>> >>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
>> >>>> of the shadow.
>> >>>
>> >>> I think we can extend the shadow backwards (to the current address),
>> >>> provided that it does not affect shadow offset that we pass to
>> >>> compiler.
>> >>
>> >> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
>> >> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
>> >> So we still need at least one more page to cover unaligned end.
>> >
>> > Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
>> > but in fact we don't need it, as such page should already exist. It's the same last pgd where kernel image
>> > is mapped.
>> >
>>
>>
>> Something like bellow might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
>> And it's only build-tested.
>
> [Sorry for loong delay.]
>
> The patch works for me for legacy boot. But it breaks EFI boot with
> 5-level paging. And I struggle to understand why.
>
> What I see is many page faults at mm/kasan/kasan.c:758 --
> "DEFINE_ASAN_LOAD_STORE(4)". Handling one of them I get double-fault at
> arch/x86/kernel/head_64.S:298 -- "pushq %r14", which ends up with triple
> fault.
>
> Any ideas?


Just playing the role of the rubber duck:
 - what is the fault address?
 - is it within the shadow range?
 - was the shadow mapped already?


> If you want to play with this by yourself, qemu supports la57 -- use
> -cpu "qemu64,+la57".
>
> --
>  Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 12:43                   ` Dmitry Vyukov
@ 2017-07-10 14:17                     ` Kirill A. Shutemov
  2017-07-10 15:56                       ` Andy Lutomirski
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-10 14:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 02:43:17PM +0200, Dmitry Vyukov wrote:
> On Mon, Jul 10, 2017 at 2:33 PM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> > On Thu, Jun 01, 2017 at 05:56:30PM +0300, Andrey Ryabinin wrote:
> >> On 05/29/2017 03:46 PM, Andrey Ryabinin wrote:
> >> > On 05/29/2017 02:45 PM, Andrey Ryabinin wrote:
> >> >>>>>> Looks like KASAN will be a problem for boot-time paging mode switching.
> >> >>>>>> It wants to know CONFIG_KASAN_SHADOW_OFFSET at compile-time to pass to
> >> >>>>>> gcc -fasan-shadow-offset=. But this value varies between paging modes...
> >> >>>>>>
> >> >>>>>> I don't see how to solve it. Folks, any ideas?
> >> >>>>>
> >> >>>>> +kasan-dev
> >> >>>>>
> >> >>>>> I wonder if we can use the same offset for both modes. If we use
> >> >>>>> 0xFFDFFC0000000000 as start of shadow for 5 levels, then the same
> >> >>>>> offset that we use for 4 levels (0xdffffc0000000000) will also work
> >> >>>>> for 5 levels. Namely, ending of 5 level shadow will overlap with 4
> >> >>>>> level mapping (both end at 0xfffffbffffffffff), but 5 level mapping
> >> >>>>> extends towards lower addresses. The current 5 level start of shadow
> >> >>>>> is actually close -- 0xffd8000000000000 and it seems that the required
> >> >>>>> space after it is unused at the moment (at least looking at mm.txt).
> >> >>>>> So just try to move it to 0xFFDFFC0000000000?
> >> >>>>>
> >> >>>>
> >> >>>> Yeah, this should work, but note that 0xFFDFFC0000000000 is not PGDIR aligned address. Our init code
> >> >>>> assumes that kasan shadow stars and ends on the PGDIR aligned address.
> >> >>>> Fortunately this is fixable, we'd need two more pages for page tables to map unaligned start/end
> >> >>>> of the shadow.
> >> >>>
> >> >>> I think we can extend the shadow backwards (to the current address),
> >> >>> provided that it does not affect shadow offset that we pass to
> >> >>> compiler.
> >> >>
> >> >> I thought about this. We can round down shadow start to 0xffdf000000000000, but we can't
> >> >> round up shadow end, because in that case shadow would end at 0xffffffffffffffff.
> >> >> So we still need at least one more page to cover unaligned end.
> >> >
> >> >>>>> Actually, I'm wrong here. I assumed that we would need an additional page to store p4d entries,
> >> >>>>> but in fact we don't need it, as such a page should already exist. It's the same last pgd where the
> >> >>>>> kernel image is mapped.
> >> >
> >>
> >>
> >> Something like below might work. It's just a proposal to demonstrate the idea, so some code might look ugly.
> >> And it's only build-tested.
> >
> > [Sorry for loong delay.]
> >
> > The patch works for me for legacy boot. But it breaks EFI boot with
> > 5-level paging. And I struggle to understand why.
> >
> > What I see is many page faults at mm/kasan/kasan.c:758 --
> > "DEFINE_ASAN_LOAD_STORE(4)". While handling one of them, I get a
> > double fault at arch/x86/kernel/head_64.S:298 -- "pushq %r14", which
> > ends up in a triple fault.
> >
> > Any ideas?
> 
> 
> Just playing the role of the rubber duck:
>  - what is the fault address?
>  - is it within the shadow range?
>  - was the shadow mapped already?

I misread the trace. The initial fault is at arch/x86/kernel/head_64.S:270,
which is the ".endr" in the definition of early_idt_handler_array.

The fault address for all three faults is 0xffffffff7ffffff8, which is
outside the shadow range. It's just before the kernel text mapping.

Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
before kasan_init().

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 14:17                     ` Kirill A. Shutemov
@ 2017-07-10 15:56                       ` Andy Lutomirski
  2017-07-10 18:47                         ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andy Lutomirski @ 2017-07-10 15:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dmitry Vyukov, Andrey Ryabinin, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev



> On Jul 10, 2017, at 7:17 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> [...]
> 
> I misread the trace. The initial fault is at arch/x86/kernel/head_64.S:270,
> which is the ".endr" in the definition of early_idt_handler_array.
> 
> The fault address for all three faults is 0xffffffff7ffffff8, which is
> outside the shadow range. It's just before the kernel text mapping.
> 
> Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
> before kasan_init().

My theory is that, in 5-level mode, the early IDT code isn't all mapped
in the page tables.  This could sometimes be papered over by lazy page
table setup, but lazy setup can't handle faults in the page fault code
or data structures.

EFI sometimes uses separate page tables, which could contribute.
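
To make that concrete, here is a minimal sketch of the lazy-fill
pattern, loosely modelled on early_make_pgtable() -- early_fixup(),
build_kernel_entry() and the single-level walk are simplified
stand-ins, not the actual kernel code:

/*
 * Hypothetical early #PF handler: lazily install a missing page-table
 * entry for a faulting kernel address, then retry the instruction.
 */
static bool early_fixup(unsigned long cr2)
{
	pgd_t *pgd = early_top_pgt + pgd_index(cr2);

	/* Only the kernel mapping is fixed up lazily. */
	if (cr2 < __START_KERNEL_map)
		return false;

	if (pgd_none(*pgd))
		set_pgd(pgd, build_kernel_entry(cr2));	/* lazy fill */

	return true;
}

/*
 * The trap: if the fault hits the handler's own code, stack or page
 * tables, we never get this far -- the #PF re-faults, escalates to a
 * double fault and then to the triple fault observed here.
 */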

> 
> -- 
> Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 12:33                 ` Kirill A. Shutemov
  2017-07-10 12:43                   ` Dmitry Vyukov
@ 2017-07-10 16:57                   ` Andrey Ryabinin
  1 sibling, 0 replies; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-10 16:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dmitry Vyukov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, LKML, kasan-dev



On 07/10/2017 03:33 PM, Kirill A. Shutemov wrote:

> 
> [Sorry for loong delay.]
> 
> The patch works for me for legacy boot. But it breaks EFI boot with
> 5-level paging. And I struggle to understand why.
> 
> What I see is many page faults at mm/kasan/kasan.c:758 --
> "DEFINE_ASAN_LOAD_STORE(4)". While handling one of them, I get a
> double fault at arch/x86/kernel/head_64.S:298 -- "pushq %r14", which
> ends up in a triple fault.
> 
> Any ideas?
> 
> If you want to play with this by yourself, qemu supports la57 -- use
> -cpu "qemu64,+la57".
> 

I'll have a look.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 15:56                       ` Andy Lutomirski
@ 2017-07-10 18:47                         ` Kirill A. Shutemov
  2017-07-10 20:07                           ` Andy Lutomirski
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-10 18:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Vyukov, Andrey Ryabinin, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 08:56:37AM -0700, Andy Lutomirski wrote:
> [...]
> 
> My theory is that, in 5-level mode, the early IDT code isn't all mapped
> in the page tables.  This could sometimes be papered over by lazy page
> table setup, but lazy setup can't handle faults in the page fault code
> or data structures.
> 
> EFI sometimes uses separate page tables, which could contribute.

As far as I can see, all the involved code is within the same page:

(gdb) p/x &x86_64_start_kernel
$1 = 0xffffffff84bad2ae
(gdb) p/x &early_idt_handler_array
$2 = 0xffffffff84bad000
(gdb) p/x &early_idt_handler_common
$3 = 0xffffffff84bad120
(gdb) p/x &early_make_pgtable
$4 = 0xffffffff84bad3b4

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 18:47                         ` Kirill A. Shutemov
@ 2017-07-10 20:07                           ` Andy Lutomirski
  2017-07-10 21:24                             ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andy Lutomirski @ 2017-07-10 20:07 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dmitry Vyukov, Andrey Ryabinin, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 11:47 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Mon, Jul 10, 2017 at 08:56:37AM -0700, Andy Lutomirski wrote:
>>
>>
>> > On Jul 10, 2017, at 7:17 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> > [...]
>> >
>> > I misread the trace. The initial fault is at arch/x86/kernel/head_64.S:270,
>> > which is the ".endr" in the definition of early_idt_handler_array.
>> >
>> > The fault address for all three faults is 0xffffffff7ffffff8, which is
>> > outside the shadow range. It's just before the kernel text mapping.
>> >
>> > Codewise, it happens in load_ucode_bsp() -- after kasan_early_init(), but
>> > before kasan_init().
>>
>> My theory is that, in 5-level mode, the early IDT code isn't all mapped
>> in the page tables.  This could sometimes be papered over by lazy page
>> table setup, but lazy setup can't handle faults in the page fault code
>> or data structures.
>>
>> EFI sometimes uses separate page tables, which could contribute.
>
> As far as I can see, all the involved code is within the same page:
>
> (gdb) p/x &x86_64_start_kernel
> $1 = 0xffffffff84bad2ae
> (gdb) p/x &early_idt_handler_array
> $2 = 0xffffffff84bad000
> (gdb) p/x &early_idt_handler_common
> $3 = 0xffffffff84bad120
> (gdb) p/x &early_make_pgtable
> $4 = 0xffffffff84bad3b4
>

Can you give the disassembly of the backtrace lines?  Blaming the
.endr doesn't make much sense to me.

Or maybe Andrey will figure it out quickly.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 20:07                           ` Andy Lutomirski
@ 2017-07-10 21:24                             ` Kirill A. Shutemov
  2017-07-11  0:30                               ` Andy Lutomirski
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-10 21:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Vyukov, Andrey Ryabinin, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
> Can you give the disassembly of the backtrace lines?  Blaming the
> .endr doesn't make much sense to me.

I don't have a backtrace. This is before printk() is functional. I only
see the triple fault and reboot.

I had to rely on qemu tracing and gdb.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-10 21:24                             ` Kirill A. Shutemov
@ 2017-07-11  0:30                               ` Andy Lutomirski
  2017-07-11 10:35                                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andy Lutomirski @ 2017-07-11  0:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Dmitry Vyukov, Andrey Ryabinin,
	Alexander Potapenko, Kirill A. Shutemov, Linus Torvalds,
	Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
>> Can you give the disassembly of the backtrace lines?  Blaming the
>> .endr doesn't make much sense to me.
>
> I don't have a backtrace. This is before printk() is functional. I only
> see the triple fault and reboot.
>
> I had to rely on qemu tracing and gdb.

Can you ask GDB or objtool to disassemble around those addresses?  Can
you also attach the big dump that QEMU throws out that shows register
state?  In particular, CR2, CR3, and CR4 could be useful.

--Andy


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11  0:30                               ` Andy Lutomirski
@ 2017-07-11 10:35                                 ` Kirill A. Shutemov
  2017-07-11 15:06                                   ` Andy Lutomirski
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-11 10:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Vyukov, Andrey Ryabinin, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> > On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
> >> Can you give the disassembly of the backtrace lines?  Blaming the
> >> .endr doesn't make much sense to me.
> >
> > I don't have a backtrace. This is before printk() is functional. I only
> > see the triple fault and reboot.
> >
> > I had to rely on qemu tracing and gdb.
> 
> Can you ask GDB or objtool to disassemble around those addresses?  Can
> you also attach the big dump that QEMU throws out that shows register
> state?  In particular, CR2, CR3, and CR4 could be useful.

The last three exceptions:

check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
R8 =6d756e2032616476 R9 =2f7665642f3d746f R10=6f72203053797474 R11=3d656c6f736e6f63
R12=0000000000000006 R13=000000003fffb000 R14=ffffffff82a07ed8 R15=000000000140008e
RIP=ffffffff84bb3036 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff84b8f000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     ffffffff84ba1000 0000007f
IDT=     ffffffff84d92000 00000fff
CR0=80050033 CR2=ffffffff7ffffff8 CR3=0000000009c58000 CR4=000010a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01

check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
R8 =6d756e2032616476 R9 =2f7665642f3d746f R10=6f72203053797474 R11=3d656c6f736e6f63
R12=0000000000000006 R13=000000003fffb000 R14=ffffffff82a07ed8 R15=000000000140008e
RIP=ffffffff84bb3141 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff84b8f000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     ffffffff84ba1000 0000007f
IDT=     ffffffff84d92000 00000fff
CR0=80050033 CR2=ffffffff7ffffff8 CR3=0000000009c58000 CR4=000010a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01

check_exception old: 0x8 new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
R8 =6d756e2032616476 R9 =2f7665642f3d746f R10=6f72203053797474 R11=3d656c6f736e6f63
R12=0000000000000006 R13=000000003fffb000 R14=ffffffff82a07ed8 R15=000000000140008e
RIP=ffffffff84bb3141 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff84b8f000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     ffffffff84ba1000 0000007f
IDT=     ffffffff84d92000 00000fff
CR0=80050033 CR2=ffffffff7ffffff8 CR3=0000000009c58000 CR4=000010a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Triple fault

Dump of assembler code for function early_idt_handler_array:
   0xffffffff84bb3000 <+0>:     pushq  $0x0
   0xffffffff84bb3002 <+2>:     pushq  $0x0
   0xffffffff84bb3004 <+4>:     jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb3009 <+9>:     pushq  $0x0
   0xffffffff84bb300b <+11>:    pushq  $0x1
   0xffffffff84bb300d <+13>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb3012 <+18>:    pushq  $0x0
   0xffffffff84bb3014 <+20>:    pushq  $0x2
   0xffffffff84bb3016 <+22>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb301b <+27>:    pushq  $0x0
   0xffffffff84bb301d <+29>:    pushq  $0x3
   0xffffffff84bb301f <+31>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb3024 <+36>:    pushq  $0x0
   0xffffffff84bb3026 <+38>:    pushq  $0x4
   0xffffffff84bb3028 <+40>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb302d <+45>:    pushq  $0x0
   0xffffffff84bb302f <+47>:    pushq  $0x5
   0xffffffff84bb3031 <+49>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
=> 0xffffffff84bb3036 <+54>:    pushq  $0x0
   0xffffffff84bb3038 <+56>:    pushq  $0x6
   0xffffffff84bb303a <+58>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb303f <+63>:    pushq  $0x0
   0xffffffff84bb3041 <+65>:    pushq  $0x7
   0xffffffff84bb3043 <+67>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb3048 <+72>:    pushq  $0x8
   0xffffffff84bb304a <+74>:    jmpq   0xffffffff84bb3120 <early_idt_handler_common>
   0xffffffff84bb304f <+79>:    int3
   0xffffffff84bb3050 <+80>:    int3
...

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 10:35                                 ` Kirill A. Shutemov
@ 2017-07-11 15:06                                   ` Andy Lutomirski
  2017-07-11 15:15                                     ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Andy Lutomirski @ 2017-07-11 15:06 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Dmitry Vyukov, Andrey Ryabinin,
	Alexander Potapenko, Kirill A. Shutemov, Linus Torvalds,
	Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Tue, Jul 11, 2017 at 3:35 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
>> [...]
>>
>> Can you ask GDB or objtool to disassemble around those addresses?  Can
>> you also attach the big dump that QEMU throws out that shows register
>> state?  In particular, CR2, CR3, and CR4 could be useful.
>
> The last three exceptions:
>
> check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000

So RSP was 0xffffffff80000000, a push happened, and we tried to write
to 0xffffffff7ffffff8, which failed.

> check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000

And #PF doesn't use IST, so it double-faulted.

Either the stack isn't mapped in the page tables, RSP is corrupt, or
there's a genuine stack overflow here.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 15:06                                   ` Andy Lutomirski
@ 2017-07-11 15:15                                     ` Andrey Ryabinin
  2017-07-11 16:45                                       ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-11 15:15 UTC (permalink / raw)
  To: Andy Lutomirski, Kirill A. Shutemov
  Cc: Dmitry Vyukov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, linux-arch, linux-mm,
	LKML, kasan-dev



On 07/11/2017 06:06 PM, Andy Lutomirski wrote:
> On Tue, Jul 11, 2017 at 3:35 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
>> On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
>>> [...]
>>
>> The last three exceptions:
>>
>> check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
>> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
>> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
> 
> So RSP was 0xffffffff80000000, a push happened, and we tried to write
> to 0xffffffff7ffffff8, which failed.
> 
>> check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
>> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
>> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000
> 
> And #PF doesn't use IST, so it double-faulted.
> 
> Either the stack isn't mapped in the page tables, RSP is corrupt, or
> there's a genuine stack overflow here.
> 

I reproduced this, and this is a KASAN bug:

   0xffffffff84864897 <x86_early_init_platform_quirks+5>    mov    $0xffffffff83f1d0b8,%rdi
   0xffffffff8486489e <x86_early_init_platform_quirks+12>   movabs $0xdffffc0000000000,%rax
   0xffffffff848648a8 <x86_early_init_platform_quirks+22>   push   %rbp
   0xffffffff848648a9 <x86_early_init_platform_quirks+23>   mov    %rdi,%rdx
   0xffffffff848648ac <x86_early_init_platform_quirks+26>   shr    $0x3,%rdx
   0xffffffff848648b0 <x86_early_init_platform_quirks+30>   mov    %rsp,%rbp
=> 0xffffffff848648b3 <x86_early_init_platform_quirks+33>   mov    (%rdx,%rax,1),%al

We crash on the last mov, which is a read from the shadow.

(gdb) p/x $rdx 
$1 = 0x1ffffffff07e3a17
(gdb) p/x $rax
$2 = 0xdffffc0000000000

(gdb) p/x 0xdffffc0000000000 + 0x1ffffffff07e3a17
$4 = 0xfffffbfff07e3a17
(gdb) p/x *0xfffffbfff07e3a17
Cannot access memory at address 0xfffffbfff07e3a17
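
For reference, a minimal sketch of the arithmetic the compiler
instruments here, assuming the 4-level KASAN_SHADOW_OFFSET of
0xdffffc0000000000 visible in %rax above (simplified -- the kernel's
kasan_mem_to_shadow() does the same on pointers):

#define KASAN_SHADOW_OFFSET		0xdffffc0000000000UL
#define KASAN_SHADOW_SCALE_SHIFT	3	/* 1 shadow byte per 8 bytes */

static inline unsigned long mem_to_shadow(unsigned long addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * For the access above:
 *	0xffffffff83f1d0b8 >> 3	== 0x1ffffffff07e3a17
 *	+ 0xdffffc0000000000	== 0xfffffbfff07e3a17
 * which is exactly the shadow address gdb cannot read.
 */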


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 15:15                                     ` Andrey Ryabinin
@ 2017-07-11 16:45                                       ` Andrey Ryabinin
  2017-07-11 17:03                                         ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-11 16:45 UTC (permalink / raw)
  To: Andy Lutomirski, Kirill A. Shutemov
  Cc: Dmitry Vyukov, Alexander Potapenko, Kirill A. Shutemov,
	Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, linux-arch, linux-mm,
	LKML, kasan-dev

On 07/11/2017 06:15 PM, Andrey Ryabinin wrote:
> 
> I reproduced this, and this is a KASAN bug:
> 
>    0xffffffff84864897 <x86_early_init_platform_quirks+5>    mov    $0xffffffff83f1d0b8,%rdi
>    0xffffffff8486489e <x86_early_init_platform_quirks+12>   movabs $0xdffffc0000000000,%rax
>    0xffffffff848648a8 <x86_early_init_platform_quirks+22>   push   %rbp
>    0xffffffff848648a9 <x86_early_init_platform_quirks+23>   mov    %rdi,%rdx
>    0xffffffff848648ac <x86_early_init_platform_quirks+26>   shr    $0x3,%rdx
>    0xffffffff848648b0 <x86_early_init_platform_quirks+30>   mov    %rsp,%rbp
> => 0xffffffff848648b3 <x86_early_init_platform_quirks+33>   mov    (%rdx,%rax,1),%al
> 
> We crash on the last mov, which is a read from the shadow.


Ughh, I forgot about phys_base.
Plus, I added KASAN_SANITIZE_paravirt.o := n, because with PARAVIRT=y
set_pgd() calls native_set_pgd() from the paravirt.c translation unit.
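
(In other words -- assuming the usual x86_64 relation, where phys_base
is the delta between the kernel's compiled-in and actual physical load
address:

	virt = phys + __START_KERNEL_map - phys_base

translating a physical address back to a kernel virtual address without
subtracting phys_base only works when the kernel sits at its default
physical load address, which would explain why legacy boot was fine
while EFI boot was not.)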



---
 arch/x86/kernel/Makefile    | 1 +
 arch/x86/mm/kasan_init_64.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4b994232cb57..5a1f18b87fb2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,6 +24,7 @@ KASAN_SANITIZE_head$(BITS).o				:= n
 KASAN_SANITIZE_dumpstack.o				:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o := n
+KASAN_SANITIZE_paravirt.o				:= n
 
 OBJECT_FILES_NON_STANDARD_head_$(BITS).o		:= y
 OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o	:= y
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index d79a7ea83d05..d5743fd37df9 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -72,7 +72,8 @@ static void __init kasan_early_p4d_populate(pgd_t *pgd,
 	 * TODO: we need helpers for this shit
 	 */
 	if (CONFIG_PGTABLE_LEVELS == 5)
-		p4d = ((p4d_t*)((__pa_nodebug(pgd->pgd) & PTE_PFN_MASK) + __START_KERNEL_map))
+		p4d = ((p4d_t*)((__pa_nodebug(pgd->pgd) & PTE_PFN_MASK)
+					+ __START_KERNEL_map - phys_base))
 			+ p4d_index(addr);
 	else
 		p4d = (p4d_t*)pgd;
-- 
2.13.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 16:45                                       ` Andrey Ryabinin
@ 2017-07-11 17:03                                         ` Kirill A. Shutemov
  2017-07-11 17:29                                           ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-11 17:03 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Andy Lutomirski, Dmitry Vyukov, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Tue, Jul 11, 2017 at 07:45:48PM +0300, Andrey Ryabinin wrote:
> On 07/11/2017 06:15 PM, Andrey Ryabinin wrote:
> > [...]
>
> Ughh, I forgot about phys_base.

Thanks! Works for me.

Can I use your Signed-off-by for a [cleaned up version of your] patch?


-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 17:03                                         ` Kirill A. Shutemov
@ 2017-07-11 17:29                                           ` Andrey Ryabinin
  2017-07-11 19:05                                             ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-11 17:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Dmitry Vyukov, Alexander Potapenko,
	Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev



On 07/11/2017 08:03 PM, Kirill A. Shutemov wrote:
> [...]
>
> Thanks! Works for me.
> 
> Can I use your Signed-off-by for a [cleaned up version of your] patch?

Sure.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 17:29                                           ` Andrey Ryabinin
@ 2017-07-11 19:05                                             ` Kirill A. Shutemov
  2017-07-13 12:58                                               ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-11 19:05 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

> > Can I use your Signed-off-by for a [cleaned up version of your] patch?
> 
> Sure.

Another KASAN-related issue: dumping page tables for the KASAN shadow
memory region takes an unreasonable time due to the kasan_zero_p?? page
tables mapped there.

The patch below helps. Any objections?

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index b371ab68f2d4..8601153c34e7 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -17,8 +17,8 @@
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
+#include <linux/kasan.h>
 
-#include <asm/kasan.h>
 #include <asm/pgtable.h>
 
 /*
@@ -291,10 +291,15 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr, unsigned long P)
 {
 	int i;
+	unsigned long pte_addr;
 	pte_t *start;
 	pgprotval_t prot;
 
-	start = (pte_t *)pmd_page_vaddr(addr);
+	pte_addr = pmd_page_vaddr(addr);
+	if (__pa(pte_addr) == __pa(kasan_zero_pte))
+		return;
+
+	start = (pte_t *)pte_addr;
 	for (i = 0; i < PTRS_PER_PTE; i++) {
 		prot = pte_flags(*start);
 		st->current_address = normalize_addr(P + i * PTE_LEVEL_MULT);
@@ -308,10 +313,15 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
 static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr, unsigned long P)
 {
 	int i;
+	unsigned long pmd_addr;
 	pmd_t *start;
 	pgprotval_t prot;
 
-	start = (pmd_t *)pud_page_vaddr(addr);
+	pmd_addr = pud_page_vaddr(addr);
+	if (__pa(pmd_addr) == __pa(kasan_zero_pmd))
+		return;
+
+	start = (pmd_t *)pmd_addr;
 	for (i = 0; i < PTRS_PER_PMD; i++) {
 		st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
 		if (!pmd_none(*start)) {
@@ -350,12 +360,16 @@ static bool pud_already_checked(pud_t *prev_pud, pud_t *pud, bool checkwx)
 static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr, unsigned long P)
 {
 	int i;
+	unsigned long pud_addr;
 	pud_t *start;
 	pgprotval_t prot;
 	pud_t *prev_pud = NULL;
 
-	start = (pud_t *)p4d_page_vaddr(addr);
+	pud_addr = p4d_page_vaddr(addr);
+	if (__pa(pud_addr) == __pa(kasan_zero_pud))
+		return;
 
+	start = (pud_t *)pud_addr;
 	for (i = 0; i < PTRS_PER_PUD; i++) {
 		st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
 		if (!pud_none(*start) &&
@@ -386,11 +400,15 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
 static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr, unsigned long P)
 {
 	int i;
+	unsigned long p4d_addr;
 	p4d_t *start;
 	pgprotval_t prot;
 
-	start = (p4d_t *)pgd_page_vaddr(addr);
+	p4d_addr = pgd_page_vaddr(addr);
+	if (__pa(p4d_addr) == __pa(kasan_zero_p4d))
+		return;
 
+	start = (p4d_t *)p4d_addr;
 	for (i = 0; i < PTRS_PER_P4D; i++) {
 		st->current_address = normalize_addr(P + i * P4D_LEVEL_MULT);
 		if (!p4d_none(*start)) {
-- 
 Kirill A. Shutemov


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-11 19:05                                             ` Kirill A. Shutemov
@ 2017-07-13 12:58                                               ` Andrey Ryabinin
  2017-07-13 13:52                                                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-13 12:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On 07/11/2017 10:05 PM, Kirill A. Shutemov wrote:
>>> Can I use your Signed-off-by for a [cleaned up version of your] patch?
>>
>> Sure.
> 
> Another KASAN-related issue: dumping page tables for the KASAN shadow
> memory region takes an unreasonable time due to the kasan_zero_p?? page
> tables mapped there.
> 
> The patch below helps. Any objections?
> 

Well, the page table dump doesn't work at all on 5-level paging.
E.g. I've got this nonsense:

....
---[ Kernel Space ]---
0xffff800000000000-0xffff808000000000         512G                               pud
---[ Low Kernel Mapping ]---
0xffff808000000000-0xffff810000000000         512G                               pud
---[ vmalloc() Area ]---
0xffff810000000000-0xffff818000000000         512G                               pud
---[ Vmemmap ]---
0xffff818000000000-0xffffff0000000000      128512G                               pud
---[ ESPfix Area ]---
0xffffff0000000000-0x0000000000000000           1T                               pud
0x0000000000000000-0x0000000000000000           0E                               pgd
0x0000000000000000-0x0000000000001000           4K     RW     PCD         GLB NX pte
0x0000000000001000-0x0000000000002000           4K                               pte
0x0000000000002000-0x0000000000003000           4K     ro                 GLB NX pte
0x0000000000003000-0x0000000000004000           4K                               pte
0x0000000000004000-0x0000000000007000          12K     RW                 GLB NX pte
0x0000000000007000-0x0000000000008000           4K                               pte
0x0000000000008000-0x0000000000108000           1M     RW                 GLB NX pte
0x0000000000108000-0x0000000000109000           4K                               pte
0x0000000000109000-0x0000000000189000         512K     RW                 GLB NX pte
0x0000000000189000-0x000000000018a000           4K                               pte
0x000000000018a000-0x000000000018e000          16K     RW                 GLB NX pte
0x000000000018e000-0x000000000018f000           4K                               pte
0x000000000018f000-0x0000000000193000          16K     RW                 GLB NX pte
0x0000000000193000-0x0000000000194000           4K                               pte
... 304 entries skipped ... 
---[ EFI Runtime Services ]---
0xffffffef00000000-0xffffffff80000000          66G                               pud
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffffc0000000           1G                               pud
...



As for KASAN, I think it would be better just to make it work faster; the patch below demonstrates the idea.



---
 arch/x86/mm/dump_pagetables.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 0470826d2bdc..36515fba86b0 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/debugfs.h>
+#include <linux/kasan.h>
 #include <linux/mm.h>
 #include <linux/init.h>
 #include <linux/sched.h>
@@ -307,16 +308,19 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
 static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr, unsigned long P)
 {
 	int i;
-	pmd_t *start;
+	pmd_t *start, *pmd_addr;
 	pgprotval_t prot;
 
-	start = (pmd_t *)pud_page_vaddr(addr);
+	pmd_addr = start = (pmd_t *)pud_page_vaddr(addr);
 	for (i = 0; i < PTRS_PER_PMD; i++) {
 		st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
 		if (!pmd_none(*start)) {
 			if (pmd_large(*start) || !pmd_present(*start)) {
 				prot = pmd_flags(*start);
 				note_page(m, st, __pgprot(prot), 3);
+			} else if (__pa(pmd_addr) == __pa(kasan_zero_pmd)) {
+				prot = pte_flags(kasan_zero_pte[0]);
+				note_page(m, st, __pgprot(prot), 4);
 			} else {
 				walk_pte_level(m, st, *start,
 					       P + i * PMD_LEVEL_MULT);
@@ -349,11 +353,11 @@ static bool pud_already_checked(pud_t *prev_pud, pud_t *pud, bool checkwx)
 static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr, unsigned long P)
 {
 	int i;
-	pud_t *start;
+	pud_t *start, *pud_addr;
 	pgprotval_t prot;
 	pud_t *prev_pud = NULL;
 
-	start = (pud_t *)p4d_page_vaddr(addr);
+	pud_addr = start = (pud_t *)p4d_page_vaddr(addr);
 
 	for (i = 0; i < PTRS_PER_PUD; i++) {
 		st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
@@ -362,6 +366,9 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
 			if (pud_large(*start) || !pud_present(*start)) {
 				prot = pud_flags(*start);
 				note_page(m, st, __pgprot(prot), 2);
+			} else if (__pa(pud_addr) == __pa(kasan_zero_pud)) {
+				prot = pte_flags(kasan_zero_pte[0]);
+				note_page(m, st, __pgprot(prot), 4);
 			} else {
 				walk_pmd_level(m, st, *start,
 					       P + i * PUD_LEVEL_MULT);
@@ -385,10 +392,10 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
 static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr, unsigned long P)
 {
 	int i;
-	p4d_t *start;
+	p4d_t *start, *p4d_addr;
 	pgprotval_t prot;
 
-	start = (p4d_t *)pgd_page_vaddr(addr);
+	p4d_addr = start = (p4d_t *)pgd_page_vaddr(addr);
 
 	for (i = 0; i < PTRS_PER_P4D; i++) {
 		st->current_address = normalize_addr(P + i * P4D_LEVEL_MULT);
@@ -396,6 +403,9 @@ static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
 			if (p4d_large(*start) || !p4d_present(*start)) {
 				prot = p4d_flags(*start);
 				note_page(m, st, __pgprot(prot), 2);
+			} else if (__pa(p4d_addr) == __pa(kasan_zero_p4d)) {
+				prot = pte_flags(kasan_zero_pte[0]);
+				note_page(m, st, __pgprot(prot), 4);
 			} else {
 				walk_pud_level(m, st, *start,
 					       P + i * P4D_LEVEL_MULT);
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-13 12:58                                               ` Andrey Ryabinin
@ 2017-07-13 13:52                                                 ` Kirill A. Shutemov
  2017-07-13 14:15                                                   ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-13 13:52 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Thu, Jul 13, 2017 at 03:58:29PM +0300, Andrey Ryabinin wrote:
> On 07/11/2017 10:05 PM, Kirill A. Shutemov wrote:
> >>> Can I use your Signed-off-by for a [cleaned up version of your] patch?
> >>
> >> Sure.
> > 
> > Another KASAN-related issue: dumping page tables for the KASAN shadow
> > memory region takes an unreasonable time due to the kasan_zero_p?? page
> > tables mapped there.
> > 
> > The patch below helps. Any objections?
> > 
> 
> Well, the page table dump doesn't work at all on 5-level paging.
> E.g. I've got this nonsense:
> 
> ....
> ---[ Kernel Space ]---
> 0xffff800000000000-0xffff808000000000         512G                               pud
> ---[ Low Kernel Mapping ]---
> 0xffff808000000000-0xffff810000000000         512G                               pud
> ---[ vmalloc() Area ]---
> 0xffff810000000000-0xffff818000000000         512G                               pud
> ---[ Vmemmap ]---
> 0xffff818000000000-0xffffff0000000000      128512G                               pud
> ---[ ESPfix Area ]---
> 0xffffff0000000000-0x0000000000000000           1T                               pud
> 0x0000000000000000-0x0000000000000000           0E                               pgd
> 0x0000000000000000-0x0000000000001000           4K     RW     PCD         GLB NX pte
> 0x0000000000001000-0x0000000000002000           4K                               pte
> 0x0000000000002000-0x0000000000003000           4K     ro                 GLB NX pte
> 0x0000000000003000-0x0000000000004000           4K                               pte
> 0x0000000000004000-0x0000000000007000          12K     RW                 GLB NX pte
> 0x0000000000007000-0x0000000000008000           4K                               pte
> 0x0000000000008000-0x0000000000108000           1M     RW                 GLB NX pte
> 0x0000000000108000-0x0000000000109000           4K                               pte
> 0x0000000000109000-0x0000000000189000         512K     RW                 GLB NX pte
> 0x0000000000189000-0x000000000018a000           4K                               pte
> 0x000000000018a000-0x000000000018e000          16K     RW                 GLB NX pte
> 0x000000000018e000-0x000000000018f000           4K                               pte
> 0x000000000018f000-0x0000000000193000          16K     RW                 GLB NX pte
> 0x0000000000193000-0x0000000000194000           4K                               pte
> ... 304 entries skipped ... 
> ---[ EFI Runtime Services ]---
> 0xffffffef00000000-0xffffffff80000000          66G                               pud
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffffc0000000           1G                               pud
> ...

Hm. I don't see this:

...
[    0.247532] 0xff9e938000000000-0xff9f000000000000      111104G                               p4d
[    0.247733] 0xff9f000000000000-0xffff000000000000          24P                               pgd
[    0.248066] 0xffff000000000000-0xffffff0000000000         255T                               p4d
[    0.248290] ---[ ESPfix Area ]---
[    0.248393] 0xffffff0000000000-0xffffff8000000000         512G                               p4d
[    0.248663] 0xffffff8000000000-0xffffffef00000000         444G                               pud
[    0.248892] ---[ EFI Runtime Services ]---
[    0.248996] 0xffffffef00000000-0xfffffffec0000000          63G                               pud
[    0.249308] 0xfffffffec0000000-0xfffffffefe400000         996M                               pmd
...

Do you have commit "x86/dump_pagetables: Generalize address normalization"
in your tree?

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
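
The gist of that commit, if it helps (a rough sketch from memory rather
than the exact code): the dumper walks a flat index over the whole
virtual address space, so the index has to be sign-extended by the
number of implemented VA bits before it is compared or printed. Without
that the walk never re-enters the canonical upper half, which is where
the wrapped ranges in your dump come from. Something like:

static unsigned long normalize_addr(unsigned long u)
{
	/*
	 * Sign-extend by the implemented virtual address width:
	 * 48 bits for 4-level paging, 57 bits with la57.
	 */
	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);

	return (signed long)(u << shift) >> shift;
}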

> As for KASAN, I think it would be better just to make it work faster;
> the patch below demonstrates the idea.

Okay, let me test this.

> ---
>  arch/x86/mm/dump_pagetables.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 0470826d2bdc..36515fba86b0 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -13,6 +13,7 @@
>   */
>  
>  #include <linux/debugfs.h>
> +#include <linux/kasan.h>
>  #include <linux/mm.h>
>  #include <linux/init.h>
>  #include <linux/sched.h>

The <asm/kasan.h> include can be dropped. And I don't think this compiles
with KASAN disabled.
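
Something like the below ought to handle both configs (a sketch only --
the helper name and the !KASAN stub are my guesses, not taken from your
patch):

#ifdef CONFIG_KASAN
/*
 * The KASAN shadow is almost entirely backed by the shared zero page
 * tables (kasan_zero_pud/pmd/pte), so once the walker recognizes one of
 * them it can emit a single note_page() for the whole range instead of
 * descending into 512 identical entries at every level.
 */
static bool kasan_page_table(struct seq_file *m, struct pg_state *st,
			     void *pt)
{
	if (__pa(pt) == __pa(kasan_zero_pmd) ||
	    __pa(pt) == __pa(kasan_zero_pud)) {
		pgprotval_t prot = pte_flags(kasan_zero_pte[0]);

		note_page(m, st, __pgprot(prot), 5);
		return true;
	}
	return false;
}
#else
static inline bool kasan_page_table(struct seq_file *m, struct pg_state *st,
				    void *pt)
{
	return false;
}
#endif

The walk_p??_level() helpers would then return early whenever
kasan_page_table() says the table was one of the shared ones.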

For reference, the patch I use now:

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=c4b1439f719b1689a1cfca9c0df17b9f8b8462b9

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-13 13:52                                                 ` Kirill A. Shutemov
@ 2017-07-13 14:15                                                   ` Kirill A. Shutemov
  2017-07-13 14:19                                                     ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-13 14:15 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Thu, Jul 13, 2017 at 04:52:28PM +0300, Kirill A. Shutemov wrote:
> On Thu, Jul 13, 2017 at 03:58:29PM +0300, Andrey Ryabinin wrote:
> > On 07/11/2017 10:05 PM, Kirill A. Shutemov wrote:
> > >>> Can I use your Signed-off-by for a [cleaned up version of your] patch?
> > >>
> > >> Sure.
> > > 
> > > Another KASAN-related issue: dumping page tables for the KASAN shadow memory
> > > region takes unreasonably long due to the kasan_zero_p?? tables mapped there.
> > > 
> > > The patch below helps. Any objections?
> > > 
> > 
> > Well, the page tables dump doesn't work at all on 5-level paging.
> > E.g. I've got this nonsense:
> > 
> > ....
> > ---[ Kernel Space ]---
> > 0xffff800000000000-0xffff808000000000         512G                               pud
> > ---[ Low Kernel Mapping ]---
> > 0xffff808000000000-0xffff810000000000         512G                               pud
> > ---[ vmalloc() Area ]---
> > 0xffff810000000000-0xffff818000000000         512G                               pud
> > ---[ Vmemmap ]---
> > 0xffff818000000000-0xffffff0000000000      128512G                               pud
> > ---[ ESPfix Area ]---
> > 0xffffff0000000000-0x0000000000000000           1T                               pud
> > 0x0000000000000000-0x0000000000000000           0E                               pgd
> > 0x0000000000000000-0x0000000000001000           4K     RW     PCD         GLB NX pte
> > 0x0000000000001000-0x0000000000002000           4K                               pte
> > 0x0000000000002000-0x0000000000003000           4K     ro                 GLB NX pte
> > 0x0000000000003000-0x0000000000004000           4K                               pte
> > 0x0000000000004000-0x0000000000007000          12K     RW                 GLB NX pte
> > 0x0000000000007000-0x0000000000008000           4K                               pte
> > 0x0000000000008000-0x0000000000108000           1M     RW                 GLB NX pte
> > 0x0000000000108000-0x0000000000109000           4K                               pte
> > 0x0000000000109000-0x0000000000189000         512K     RW                 GLB NX pte
> > 0x0000000000189000-0x000000000018a000           4K                               pte
> > 0x000000000018a000-0x000000000018e000          16K     RW                 GLB NX pte
> > 0x000000000018e000-0x000000000018f000           4K                               pte
> > 0x000000000018f000-0x0000000000193000          16K     RW                 GLB NX pte
> > 0x0000000000193000-0x0000000000194000           4K                               pte
> > ... 304 entries skipped ... 
> > ---[ EFI Runtime Services ]---
> > 0xffffffef00000000-0xffffffff80000000          66G                               pud
> > ---[ High Kernel Mapping ]---
> > 0xffffffff80000000-0xffffffffc0000000           1G                               pud
> > ...
> 
> Hm. I don't see this:
> 
> ...
> [    0.247532] 0xff9e938000000000-0xff9f000000000000      111104G                               p4d
> [    0.247733] 0xff9f000000000000-0xffff000000000000          24P                               pgd
> [    0.248066] 0xffff000000000000-0xffffff0000000000         255T                               p4d
> [    0.248290] ---[ ESPfix Area ]---
> [    0.248393] 0xffffff0000000000-0xffffff8000000000         512G                               p4d
> [    0.248663] 0xffffff8000000000-0xffffffef00000000         444G                               pud
> [    0.248892] ---[ EFI Runtime Services ]---
> [    0.248996] 0xffffffef00000000-0xfffffffec0000000          63G                               pud
> [    0.249308] 0xfffffffec0000000-0xfffffffefe400000         996M                               pmd
> ...
> 
> Do you have commit "x86/dump_pagetables: Generalize address normalization"
> in your tree?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
> 
> > As for KASAN, I think it would be better just to make it work faster;
> > the patch below demonstrates the idea.
> 
> Okay, let me test this.

The patch works for me.

The problem is not exclusive to 5-level paging, so could you prepare and
push a proper patch upstream?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-13 14:15                                                   ` Kirill A. Shutemov
@ 2017-07-13 14:19                                                     ` Andrey Ryabinin
  2017-07-24 12:13                                                       ` Kirill A. Shutemov
  0 siblings, 1 reply; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-13 14:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev



On 07/13/2017 05:15 PM, Kirill A. Shutemov wrote:

>>
>> Hm. I don't see this:
>>
>> ...
>> [    0.247532] 0xff9e938000000000-0xff9f000000000000      111104G                               p4d
>> [    0.247733] 0xff9f000000000000-0xffff000000000000          24P                               pgd
>> [    0.248066] 0xffff000000000000-0xffffff0000000000         255T                               p4d
>> [    0.248290] ---[ ESPfix Area ]---
>> [    0.248393] 0xffffff0000000000-0xffffff8000000000         512G                               p4d
>> [    0.248663] 0xffffff8000000000-0xffffffef00000000         444G                               pud
>> [    0.248892] ---[ EFI Runtime Services ]---
>> [    0.248996] 0xffffffef00000000-0xfffffffec0000000          63G                               pud
>> [    0.249308] 0xfffffffec0000000-0xfffffffefe400000         996M                               pmd
>> ...
>>
>> Do you have commit "x86/dump_pagetables: Generalize address normalization"
>> in your tree?
>>

Nope. Applied now, it helped.

>> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
>>
>>> As for KASAN, I think it would be better just to make it work faster;
>>> the patch below demonstrates the idea.
>>
>> Okay, let me test this.
> 
> The patch works for me.
> 
> The problem is not exclusive to 5-level paging, so could you prepare and
> push a proper patch upstream?
> 

Sure, will do


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-13 14:19                                                     ` Andrey Ryabinin
@ 2017-07-24 12:13                                                       ` Kirill A. Shutemov
  2017-07-24 14:07                                                         ` Andrey Ryabinin
  0 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2017-07-24 12:13 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev

On Thu, Jul 13, 2017 at 05:19:22PM +0300, Andrey Ryabinin wrote:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
> >>
> >>> As for KASAN, I think it would be better just to make it work faster;
> >>> the patch below demonstrates the idea.
> >>
> >> Okay, let me test this.
> > 
> > The patch works for me.
> > 
> > The problem is not exclusive to 5-level paging, so could you prepare and
> > push a proper patch upstream?
> > 
> 
> Sure, will do

Andrey, any follow-up on this?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: KASAN vs. boot-time switching between 4- and 5-level paging
  2017-07-24 12:13                                                       ` Kirill A. Shutemov
@ 2017-07-24 14:07                                                         ` Andrey Ryabinin
  0 siblings, 0 replies; 54+ messages in thread
From: Andrey Ryabinin @ 2017-07-24 14:07 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andy Lutomirski, Dmitry Vyukov,
	Alexander Potapenko, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, linux-arch, linux-mm, LKML, kasan-dev



On 07/24/2017 03:13 PM, Kirill A. Shutemov wrote:
> On Thu, Jul 13, 2017 at 05:19:22PM +0300, Andrey Ryabinin wrote:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/v2&id=13327fec85ffe95d9c8a3f57ba174bf5d5c1fb01
>>>>
>>>>> As for KASAN, I think it would be better just to make it work faster;
>>>>> the patch below demonstrates the idea.
>>>>
>>>> Okay, let me test this.
>>>
>>> The patch works for me.
>>>
>>> The problem is not exclusive to 5-level paging, so could you prepare and
>>> push a proper patch upstream?
>>>
>>
>> Sure, will do
> 
> Andrey, any follow up on this?
> 

Sorry, I've been a bit busy. Will send the patch shortly.


^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2017-07-24 14:04 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-25 20:33 [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Kirill A. Shutemov
2017-05-25 20:33 ` [PATCHv1, RFC 1/8] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
2017-05-25 20:33 ` [PATCHv1, RFC 2/8] x86/mm: Make virtual memory layout movable for CONFIG_X86_5LEVEL Kirill A. Shutemov
2017-05-25 20:33 ` [PATCHv1, RFC 3/8] x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable Kirill A. Shutemov
2017-05-25 20:33 ` [PATCHv1, RFC 4/8] x86/mm: Handle boot-time paging mode switching at early boot Kirill A. Shutemov
2017-05-25 20:33 ` [PATCHv1, RFC 5/8] x86/mm: Fold p4d page table layer at runtime Kirill A. Shutemov
2017-05-27 15:09   ` Brian Gerst
2017-05-27 22:46     ` Kirill A. Shutemov
2017-05-27 22:56       ` Brian Gerst
2017-05-25 20:33 ` [PATCHv1, RFC 6/8] x86/mm: Replace compile-time checks for 5-level with runtime-time Kirill A. Shutemov
2017-05-25 20:33 ` [PATCHv1, RFC 7/8] x86/mm: Hacks for boot-time switching between 4- and 5-level paging Kirill A. Shutemov
2017-05-26 22:10   ` KASAN vs. " Kirill A. Shutemov
2017-05-29 10:02     ` Dmitry Vyukov
2017-05-29 11:18       ` Andrey Ryabinin
2017-05-29 11:19         ` Dmitry Vyukov
2017-05-29 11:45           ` Andrey Ryabinin
2017-05-29 12:46             ` Andrey Ryabinin
2017-06-01 14:56               ` Andrey Ryabinin
2017-07-10 12:33                 ` Kirill A. Shutemov
2017-07-10 12:43                   ` Dmitry Vyukov
2017-07-10 14:17                     ` Kirill A. Shutemov
2017-07-10 15:56                       ` Andy Lutomirski
2017-07-10 18:47                         ` Kirill A. Shutemov
2017-07-10 20:07                           ` Andy Lutomirski
2017-07-10 21:24                             ` Kirill A. Shutemov
2017-07-11  0:30                               ` Andy Lutomirski
2017-07-11 10:35                                 ` Kirill A. Shutemov
2017-07-11 15:06                                   ` Andy Lutomirski
2017-07-11 15:15                                     ` Andrey Ryabinin
2017-07-11 16:45                                       ` Andrey Ryabinin
2017-07-11 17:03                                         ` Kirill A. Shutemov
2017-07-11 17:29                                           ` Andrey Ryabinin
2017-07-11 19:05                                             ` Kirill A. Shutemov
2017-07-13 12:58                                               ` Andrey Ryabinin
2017-07-13 13:52                                                 ` Kirill A. Shutemov
2017-07-13 14:15                                                   ` Kirill A. Shutemov
2017-07-13 14:19                                                     ` Andrey Ryabinin
2017-07-24 12:13                                                       ` Kirill A. Shutemov
2017-07-24 14:07                                                         ` Andrey Ryabinin
2017-07-10 16:57                   ` Andrey Ryabinin
2017-05-25 20:33 ` [PATCHv1, RFC 8/8] x86/mm: Allow to boot without la57 if CONFIG_X86_5LEVEL=y Kirill A. Shutemov
2017-05-25 23:24 ` [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging Linus Torvalds
2017-05-26  0:40   ` Andy Lutomirski
2017-05-26  4:18     ` Kevin Easton
2017-05-26  7:21       ` Andy Lutomirski
2017-05-26 13:00   ` Kirill A. Shutemov
2017-05-26 13:35     ` Andi Kleen
2017-05-26 15:51     ` Linus Torvalds
2017-05-26 15:58       ` Kirill A. Shutemov
2017-05-26 16:13         ` Linus Torvalds
2017-05-26 18:24       ` hpa
2017-05-26 19:23         ` Dave Hansen
2017-05-26 19:36           ` hpa
2017-05-26 19:40     ` hpa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).