linux-mm.kvack.org archive mirror
* [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
@ 2017-03-13 14:33 Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 1/6] x86/mm: Extend headers with basic definitions to support 5-level paging Kirill A. Shutemov
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

Here's the first batch of patches from the 5-level paging patchset. Let's see
if I'm on the right track in addressing Ingo's feedback. :)

These patches prepare the x86 code to be switched from
<asm-generic/5level-fixup.h> to <asm-generic/pgtable-nop4d.h>. It's a stepping
stone towards adding 5-level paging support.

Please review and consider applying.

Kirill A. Shutemov (6):
  x86/mm: Extend headers with basic definitions to support 5-level
    paging
  x86/mm: Convert trivial cases of page table walk to 5-level paging
  x86/gup: Add 5-level paging support
  x86/ident_map: Add 5-level paging support
  x86/vmalloc: Add 5-level paging support
  x86/power: Add 5-level paging support

 arch/x86/include/asm/pgtable-2level_types.h |  1 +
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable.h              | 26 +++++++++---
 arch/x86/include/asm/pgtable_64_types.h     |  1 +
 arch/x86/include/asm/pgtable_types.h        | 30 ++++++++++++-
 arch/x86/kernel/tboot.c                     |  6 ++-
 arch/x86/kernel/vm86_32.c                   |  6 ++-
 arch/x86/mm/fault.c                         | 66 +++++++++++++++++++++++++----
 arch/x86/mm/gup.c                           | 33 ++++++++++++---
 arch/x86/mm/ident_map.c                     | 51 +++++++++++++++++++---
 arch/x86/mm/init_32.c                       | 22 +++++++---
 arch/x86/mm/ioremap.c                       |  3 +-
 arch/x86/mm/pgtable.c                       |  4 +-
 arch/x86/mm/pgtable_32.c                    |  8 +++-
 arch/x86/platform/efi/efi_64.c              | 13 ++++--
 arch/x86/power/hibernate_32.c               |  7 ++-
 arch/x86/power/hibernate_64.c               | 50 ++++++++++++++++------
 17 files changed, 269 insertions(+), 59 deletions(-)

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] x86/mm: Extend headers with basic definitions to support 5-level paging
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
@ 2017-03-13 14:33 ` Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 2/6] x86/mm: Convert trivial cases of page table walk to " Kirill A. Shutemov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

This patch extends x86 headers to enable 5-level paging support.

It's still based on <asm-generic/5level-fixup.h>. We will get to the
point where we can have <asm-generic/pgtable-nop4d.h> later.
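
For illustration only, the index helpers this patch adds can be modelled in
userspace as below. The shift values and the 512-entry table size are
assumptions (the usual x86-64 5-level layout with 9 index bits per level), not
something this patch defines, and the model_* names are hypothetical.

```c
#include <stdint.h>

/*
 * Userspace model, not kernel code. The constants are assumed values for
 * x86-64 with 5-level paging (9 index bits per level); this patch does
 * not define them.
 */
#define MODEL_PUD_SHIFT 30
#define MODEL_P4D_SHIFT 39

/* Generic form of pud_index()/p4d_index(): shift, then mask to table size. */
static inline uint64_t model_index(uint64_t address, unsigned int shift,
				   uint64_t ptrs_per_table)
{
	return (address >> shift) & (ptrs_per_table - 1);
}

static inline uint64_t model_pud_index(uint64_t address)
{
	return model_index(address, MODEL_PUD_SHIFT, 512);
}

static inline uint64_t model_p4d_index(uint64_t address)
{
	return model_index(address, MODEL_P4D_SHIFT, 512);
}
```

If a folded level is modelled as a table with a single entry,
model_index(address, shift, 1) is 0 for every address, so such a level adds no
real lookup.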

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable-2level_types.h |  1 +
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable.h              | 26 ++++++++++++++++++++-----
 arch/x86/include/asm/pgtable_64_types.h     |  1 +
 arch/x86/include/asm/pgtable_types.h        | 30 ++++++++++++++++++++++++++++-
 5 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level_types.h b/arch/x86/include/asm/pgtable-2level_types.h
index 392576433e77..373ab1de909f 100644
--- a/arch/x86/include/asm/pgtable-2level_types.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -7,6 +7,7 @@
 typedef unsigned long	pteval_t;
 typedef unsigned long	pmdval_t;
 typedef unsigned long	pudval_t;
+typedef unsigned long	p4dval_t;
 typedef unsigned long	pgdval_t;
 typedef unsigned long	pgprotval_t;
 
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index bcc89625ebe5..b8a4341faafa 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -7,6 +7,7 @@
 typedef u64	pteval_t;
 typedef u64	pmdval_t;
 typedef u64	pudval_t;
+typedef u64	p4dval_t;
 typedef u64	pgdval_t;
 typedef u64	pgprotval_t;
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1cfb36b8c024..6f6f351e0a81 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -179,6 +179,17 @@ static inline unsigned long pud_pfn(pud_t pud)
 	return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
 
+static inline unsigned long p4d_pfn(p4d_t p4d)
+{
+	return (p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT;
+}
+
+static inline int p4d_large(p4d_t p4d)
+{
+	/* No 512 GiB pages yet */
+	return 0;
+}
+
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 
 static inline int pmd_large(pmd_t pte)
@@ -770,6 +781,16 @@ static inline int pud_large(pud_t pud)
 }
 #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
 
+static inline unsigned long pud_index(unsigned long address)
+{
+	return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+}
+
+static inline unsigned long p4d_index(unsigned long address)
+{
+	return (address >> P4D_SHIFT) & (PTRS_PER_P4D - 1);
+}
+
 #if CONFIG_PGTABLE_LEVELS > 3
 static inline int pgd_present(pgd_t pgd)
 {
@@ -788,11 +809,6 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
 #define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
 
 /* to find an entry in a page-table-directory. */
-static inline unsigned long pud_index(unsigned long address)
-{
-	return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
-}
-
 static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
 {
 	return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(address);
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 3a264200c62f..0b2797e5083c 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -13,6 +13,7 @@
 typedef unsigned long	pteval_t;
 typedef unsigned long	pmdval_t;
 typedef unsigned long	pudval_t;
+typedef unsigned long	p4dval_t;
 typedef unsigned long	pgdval_t;
 typedef unsigned long	pgprotval_t;
 
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 62484333673d..df08535f774a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -272,9 +272,20 @@ static inline pgdval_t pgd_flags(pgd_t pgd)
 	return native_pgd_val(pgd) & PTE_FLAGS_MASK;
 }
 
-#if CONFIG_PGTABLE_LEVELS > 3
+#if CONFIG_PGTABLE_LEVELS > 4
+
+#error FIXME
+
+#else
 #include <asm-generic/5level-fixup.h>
 
+static inline p4dval_t native_p4d_val(p4d_t p4d)
+{
+	return native_pgd_val(p4d);
+}
+#endif
+
+#if CONFIG_PGTABLE_LEVELS > 3
 typedef struct { pudval_t pud; } pud_t;
 
 static inline pud_t native_make_pud(pmdval_t val)
@@ -318,6 +329,22 @@ static inline pmdval_t native_pmd_val(pmd_t pmd)
 }
 #endif
 
+static inline p4dval_t p4d_pfn_mask(p4d_t p4d)
+{
+	/* No 512 GiB huge pages yet */
+	return PTE_PFN_MASK;
+}
+
+static inline p4dval_t p4d_flags_mask(p4d_t p4d)
+{
+	return ~p4d_pfn_mask(p4d);
+}
+
+static inline p4dval_t p4d_flags(p4d_t p4d)
+{
+	return native_p4d_val(p4d) & p4d_flags_mask(p4d);
+}
+
 static inline pudval_t pud_pfn_mask(pud_t pud)
 {
 	if (native_pud_val(pud) & _PAGE_PSE)
@@ -461,6 +488,7 @@ enum pg_level {
 	PG_LEVEL_4K,
 	PG_LEVEL_2M,
 	PG_LEVEL_1G,
+	PG_LEVEL_512G,
 	PG_LEVEL_NUM
 };
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6] x86/mm: Convert trivial cases of page table walk to 5-level paging
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 1/6] x86/mm: Extend headers with basic definitions to support 5-level paging Kirill A. Shutemov
@ 2017-03-13 14:33 ` Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 3/6] x86/gup: Add 5-level paging support Kirill A. Shutemov
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

This patch only covers simple cases. Less trivial cases will be
converted with separate patches.
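
The conversion pattern (insert a p4d step between the pgd and pud lookups, and
bail out if that level is missing) can be sketched as a toy userspace walk.
This is only a model under stated assumptions: tables are plain pointer
arrays, a NULL slot stands for "not present", the shifts are the usual x86-64
5-level values, and all model_* names are hypothetical.

```c
#include <stddef.h>

#define MODEL_ENTRIES 512

/* Userspace model: one table per level, a NULL slot means "not present". */
struct model_table {
	void *slot[MODEL_ENTRIES];
};

/* Index of 'addr' at a level whose entries each cover 2^shift bytes. */
static size_t model_idx(unsigned long long addr, unsigned int shift)
{
	return (size_t)((addr >> shift) & (MODEL_ENTRIES - 1));
}

/*
 * Walk pgd -> p4d -> pud -> pmd -> pte and return the leaf slot's value,
 * or NULL as soon as any level is missing.
 */
static void *model_walk(struct model_table *pgd, unsigned long long addr)
{
	static const unsigned int shift[] = { 48, 39, 30, 21, 12 };
	struct model_table *table = pgd;
	void *entry = NULL;
	size_t level;

	for (level = 0; level < sizeof(shift) / sizeof(shift[0]); level++) {
		entry = table->slot[model_idx(addr, shift[level])];
		if (!entry)
			return NULL;	/* like the !p4d_present() checks */
		table = entry;	/* descend; unused after the last level */
	}
	return entry;
}
```

The real code cannot use a loop like this because each level has its own type
and accessors, which is why every call site needs the explicit p4d step.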

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/tboot.c        |  6 +++++-
 arch/x86/kernel/vm86_32.c      |  6 +++++-
 arch/x86/mm/fault.c            | 39 +++++++++++++++++++++++++++++++++------
 arch/x86/mm/init_32.c          | 22 ++++++++++++++++------
 arch/x86/mm/ioremap.c          |  3 ++-
 arch/x86/mm/pgtable.c          |  4 +++-
 arch/x86/mm/pgtable_32.c       |  8 +++++++-
 arch/x86/platform/efi/efi_64.c | 13 +++++++++----
 arch/x86/power/hibernate_32.c  |  7 +++++--
 9 files changed, 85 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index b868fa1b812b..5db0f33cbf2c 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -118,12 +118,16 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
 			  pgprot_t prot)
 {
 	pgd_t *pgd;
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
 	pgd = pgd_offset(&tboot_mm, vaddr);
-	pud = pud_alloc(&tboot_mm, pgd, vaddr);
+	p4d = p4d_alloc(&tboot_mm, pgd, vaddr);
+	if (!p4d)
+		return -1;
+	pud = pud_alloc(&tboot_mm, p4d, vaddr);
 	if (!pud)
 		return -1;
 	pmd = pmd_alloc(&tboot_mm, pud, vaddr);
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 23ee89ce59a9..62597c300d94 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -164,6 +164,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	spinlock_t *ptl;
 	pgd_t *pgd;
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -173,7 +174,10 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	pgd = pgd_offset(mm, 0xA0000);
 	if (pgd_none_or_clear_bad(pgd))
 		goto out;
-	pud = pud_offset(pgd, 0xA0000);
+	p4d = p4d_offset(pgd, 0xA0000);
+	if (p4d_none_or_clear_bad(p4d))
+		goto out;
+	pud = pud_offset(p4d, 0xA0000);
 	if (pud_none_or_clear_bad(pud))
 		goto out;
 	pmd = pmd_offset(pud, 0xA0000);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 428e31763cb9..605fd5e8e048 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -253,6 +253,7 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 {
 	unsigned index = pgd_index(address);
 	pgd_t *pgd_k;
+	p4d_t *p4d, *p4d_k;
 	pud_t *pud, *pud_k;
 	pmd_t *pmd, *pmd_k;
 
@@ -265,10 +266,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	/*
 	 * set_pgd(pgd, *pgd_k); here would be useless on PAE
 	 * and redundant with the set_pmd() on non-PAE. As would
-	 * set_pud.
+	 * set_p4d/set_pud.
 	 */
-	pud = pud_offset(pgd, address);
-	pud_k = pud_offset(pgd_k, address);
+	p4d = p4d_offset(pgd, address);
+	p4d_k = p4d_offset(pgd_k, address);
+	if (!p4d_present(*p4d_k))
+		return NULL;
+
+	pud = pud_offset(p4d, address);
+	pud_k = pud_offset(p4d_k, address);
 	if (!pud_present(*pud_k))
 		return NULL;
 
@@ -384,6 +390,8 @@ static void dump_pagetable(unsigned long address)
 {
 	pgd_t *base = __va(read_cr3());
 	pgd_t *pgd = &base[pgd_index(address)];
+	p4d_t *p4d;
+	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
@@ -392,7 +400,9 @@ static void dump_pagetable(unsigned long address)
 	if (!low_pfn(pgd_val(*pgd) >> PAGE_SHIFT) || !pgd_present(*pgd))
 		goto out;
 #endif
-	pmd = pmd_offset(pud_offset(pgd, address), address);
+	p4d = p4d_offset(pgd, address);
+	pud = pud_offset(p4d, address);
+	pmd = pmd_offset(pud, address);
 	printk(KERN_CONT "*pde = %0*Lx ", sizeof(*pmd) * 2, (u64)pmd_val(*pmd));
 
 	/*
@@ -526,6 +536,7 @@ static void dump_pagetable(unsigned long address)
 {
 	pgd_t *base = __va(read_cr3() & PHYSICAL_PAGE_MASK);
 	pgd_t *pgd = base + pgd_index(address);
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -538,7 +549,15 @@ static void dump_pagetable(unsigned long address)
 	if (!pgd_present(*pgd))
 		goto out;
 
-	pud = pud_offset(pgd, address);
+	p4d = p4d_offset(pgd, address);
+	if (bad_address(p4d))
+		goto bad;
+
+	printk("P4D %lx ", p4d_val(*p4d));
+	if (!p4d_present(*p4d) || p4d_large(*p4d))
+		goto out;
+
+	pud = pud_offset(p4d, address);
 	if (bad_address(pud))
 		goto bad;
 
@@ -1082,6 +1101,7 @@ static noinline int
 spurious_fault(unsigned long error_code, unsigned long address)
 {
 	pgd_t *pgd;
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -1104,7 +1124,14 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	if (!pgd_present(*pgd))
 		return 0;
 
-	pud = pud_offset(pgd, address);
+	p4d = p4d_offset(pgd, address);
+	if (!p4d_present(*p4d))
+		return 0;
+
+	if (p4d_large(*p4d))
+		return spurious_fault_check(error_code, (pte_t *) p4d);
+
+	pud = pud_offset(p4d, address);
 	if (!pud_present(*pud))
 		return 0;
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2b4b53e6793f..5ed3c141bbd5 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -67,6 +67,7 @@ bool __read_mostly __vmalloc_start_set = false;
  */
 static pmd_t * __init one_md_table_init(pgd_t *pgd)
 {
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd_table;
 
@@ -75,13 +76,15 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd)
 		pmd_table = (pmd_t *)alloc_low_page();
 		paravirt_alloc_pmd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT);
 		set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
-		pud = pud_offset(pgd, 0);
+		p4d = p4d_offset(pgd, 0);
+		pud = pud_offset(p4d, 0);
 		BUG_ON(pmd_table != pmd_offset(pud, 0));
 
 		return pmd_table;
 	}
 #endif
-	pud = pud_offset(pgd, 0);
+	p4d = p4d_offset(pgd, 0);
+	pud = pud_offset(p4d, 0);
 	pmd_table = pmd_offset(pud, 0);
 
 	return pmd_table;
@@ -390,8 +393,11 @@ pte_t *kmap_pte;
 
 static inline pte_t *kmap_get_fixmap_pte(unsigned long vaddr)
 {
-	return pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(vaddr),
-			vaddr), vaddr), vaddr);
+	pgd_t *pgd = pgd_offset_k(vaddr);
+	p4d_t *p4d = p4d_offset(pgd, vaddr);
+	pud_t *pud = pud_offset(p4d, vaddr);
+	pmd_t *pmd = pmd_offset(pud, vaddr);
+	return pte_offset_kernel(pmd, vaddr);
 }
 
 static void __init kmap_init(void)
@@ -410,6 +416,7 @@ static void __init permanent_kmaps_init(pgd_t *pgd_base)
 {
 	unsigned long vaddr;
 	pgd_t *pgd;
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -418,7 +425,8 @@ static void __init permanent_kmaps_init(pgd_t *pgd_base)
 	page_table_range_init(vaddr, vaddr + PAGE_SIZE*LAST_PKMAP, pgd_base);
 
 	pgd = swapper_pg_dir + pgd_index(vaddr);
-	pud = pud_offset(pgd, vaddr);
+	p4d = p4d_offset(pgd, vaddr);
+	pud = pud_offset(p4d, vaddr);
 	pmd = pmd_offset(pud, vaddr);
 	pte = pte_offset_kernel(pmd, vaddr);
 	pkmap_page_table = pte;
@@ -450,6 +458,7 @@ void __init native_pagetable_init(void)
 {
 	unsigned long pfn, va;
 	pgd_t *pgd, *base = swapper_pg_dir;
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -469,7 +478,8 @@ void __init native_pagetable_init(void)
 		if (!pgd_present(*pgd))
 			break;
 
-		pud = pud_offset(pgd, va);
+		p4d = p4d_offset(pgd, va);
+		pud = pud_offset(p4d, va);
 		pmd = pmd_offset(pud, va);
 		if (!pmd_present(*pmd))
 			break;
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 7aaa2635862d..a5e1cda85974 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -425,7 +425,8 @@ static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
 	/* Don't assume we're using swapper_pg_dir at this point */
 	pgd_t *base = __va(read_cr3());
 	pgd_t *pgd = &base[pgd_index(addr)];
-	pud_t *pud = pud_offset(pgd, addr);
+	p4d_t *p4d = p4d_offset(pgd, addr);
+	pud_t *pud = pud_offset(p4d, addr);
 	pmd_t *pmd = pmd_offset(pud, addr);
 
 	return pmd;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 6cbdff26bb96..38b6daf72deb 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -261,13 +261,15 @@ static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
 
 static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
 {
+	p4d_t *p4d;
 	pud_t *pud;
 	int i;
 
 	if (PREALLOCATED_PMDS == 0) /* Work around gcc-3.4.x bug */
 		return;
 
-	pud = pud_offset(pgd, 0);
+	p4d = p4d_offset(pgd, 0);
+	pud = pud_offset(p4d, 0);
 
 	for (i = 0; i < PREALLOCATED_PMDS; i++, pud++) {
 		pmd_t *pmd = pmds[i];
diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
index 9adce776852b..3d275a791c76 100644
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -26,6 +26,7 @@ unsigned int __VMALLOC_RESERVE = 128 << 20;
 void set_pte_vaddr(unsigned long vaddr, pte_t pteval)
 {
 	pgd_t *pgd;
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -35,7 +36,12 @@ void set_pte_vaddr(unsigned long vaddr, pte_t pteval)
 		BUG();
 		return;
 	}
-	pud = pud_offset(pgd, vaddr);
+	p4d = p4d_offset(pgd, vaddr);
+	if (p4d_none(*p4d)) {
+		BUG();
+		return;
+	}
+	pud = pud_offset(p4d, vaddr);
 	if (pud_none(*pud)) {
 		BUG();
 		return;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index a4695da42d77..8544dae3d1b4 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -166,6 +166,7 @@ void efi_sync_low_kernel_mappings(void)
 {
 	unsigned num_entries;
 	pgd_t *pgd_k, *pgd_efi;
+	p4d_t *p4d_k, *p4d_efi;
 	pud_t *pud_k, *pud_efi;
 
 	if (efi_enabled(EFI_OLD_MEMMAP))
@@ -197,16 +198,20 @@ void efi_sync_low_kernel_mappings(void)
 	BUILD_BUG_ON((EFI_VA_END & ~PUD_MASK) != 0);
 
 	pgd_efi = efi_pgd + pgd_index(EFI_VA_END);
-	pud_efi = pud_offset(pgd_efi, 0);
+	p4d_efi = p4d_offset(pgd_efi, 0);
+	pud_efi = pud_offset(p4d_efi, 0);
 
 	pgd_k = pgd_offset_k(EFI_VA_END);
-	pud_k = pud_offset(pgd_k, 0);
+	p4d_k = p4d_offset(pgd_k, 0);
+	pud_k = pud_offset(p4d_k, 0);
 
 	num_entries = pud_index(EFI_VA_END);
 	memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 
-	pud_efi = pud_offset(pgd_efi, EFI_VA_START);
-	pud_k = pud_offset(pgd_k, EFI_VA_START);
+	p4d_efi = p4d_offset(pgd_efi, EFI_VA_START);
+	pud_efi = pud_offset(p4d_efi, EFI_VA_START);
+	p4d_k = p4d_offset(pgd_k, EFI_VA_START);
+	pud_k = pud_offset(p4d_k, EFI_VA_START);
 
 	num_entries = PTRS_PER_PUD - pud_index(EFI_VA_START);
 	memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index 9f14bd34581d..c35fdb585c68 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -32,6 +32,7 @@ pgd_t *resume_pg_dir;
  */
 static pmd_t *resume_one_md_table_init(pgd_t *pgd)
 {
+	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd_table;
 
@@ -41,11 +42,13 @@ static pmd_t *resume_one_md_table_init(pgd_t *pgd)
 		return NULL;
 
 	set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
-	pud = pud_offset(pgd, 0);
+	p4d = p4d_offset(pgd, 0);
+	pud = pud_offset(p4d, 0);
 
 	BUG_ON(pmd_table != pmd_offset(pud, 0));
 #else
-	pud = pud_offset(pgd, 0);
+	p4d = p4d_offset(pgd, 0);
+	pud = pud_offset(p4d, 0);
 	pmd_table = pmd_offset(pud, 0);
 #endif
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] x86/gup: Add 5-level paging support
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 1/6] x86/mm: Extend headers with basic definitions to support 5-level paging Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 2/6] x86/mm: Convert trivial cases of page table walk to " Kirill A. Shutemov
@ 2017-03-13 14:33 ` Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 4/6] x86/ident_map: " Kirill A. Shutemov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

get_user_pages_fast() has to handle an additional page table level.
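
As a rough userspace sketch of the range walk a new level needs: each step is
clamped to the end of the current p4d entry, following the same idiom as the
generic *_addr_end() helpers. The 2^39-byte entry size is an assumed x86-64
5-level value and the model_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed x86-64 5-level value: each p4d entry covers 2^39 bytes. */
#define MODEL_P4D_SHIFT 39
#define MODEL_P4D_SIZE  (UINT64_C(1) << MODEL_P4D_SHIFT)
#define MODEL_P4D_MASK  (~(MODEL_P4D_SIZE - 1))

/* End of the p4d entry containing 'addr', clamped to 'end'. */
static uint64_t model_p4d_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + MODEL_P4D_SIZE) & MODEL_P4D_MASK;

	/* The "- 1" keeps the comparison correct if boundary wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Count the p4d entries a walk over [addr, end) visits; needs addr < end. */
static unsigned int model_count_p4d_steps(uint64_t addr, uint64_t end)
{
	unsigned int steps = 0;
	uint64_t next;

	do {
		next = model_p4d_addr_end(addr, end);
		steps++;	/* a real walker would descend to the pud level here */
	} while (addr = next, addr != end);

	return steps;
}
```

A range that straddles one p4d boundary is visited in two steps, one per
entry, which is the shape of the loop in gup_p4d_range().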

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/gup.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 1f3b6ef105cd..456dfdfd2249 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -76,9 +76,9 @@ static void undo_dev_pagemap(int *nr, int nr_start, struct page **pages)
 }
 
 /*
- * 'pteval' can come from a pte, pmd or pud.  We only check
+ * 'pteval' can come from a pte, pmd, pud or p4d.  We only check
  * _PAGE_PRESENT, _PAGE_USER, and _PAGE_RW in here which are the
- * same value on all 3 types.
+ * same value on all 4 types.
  */
 static inline int pte_allows_gup(unsigned long pteval, int write)
 {
@@ -295,13 +295,13 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 	return 1;
 }
 
-static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
+static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 			int write, struct page **pages, int *nr)
 {
 	unsigned long next;
 	pud_t *pudp;
 
-	pudp = pud_offset(&pgd, addr);
+	pudp = pud_offset(&p4d, addr);
 	do {
 		pud_t pud = *pudp;
 
@@ -320,6 +320,27 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 	return 1;
 }
 
+static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
+			int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	p4d_t *p4dp;
+
+	p4dp = p4d_offset(&pgd, addr);
+	do {
+		p4d_t p4d = *p4dp;
+
+		next = p4d_addr_end(addr, end);
+		if (p4d_none(p4d))
+			return 0;
+		BUILD_BUG_ON(p4d_large(p4d));
+		if (!gup_pud_range(p4d, addr, next, write, pages, nr))
+			return 0;
+	} while (p4dp++, addr = next, addr != end);
+
+	return 1;
+}
+
 /*
  * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
  * back to the regular GUP.
@@ -368,7 +389,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			break;
-		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+		if (!gup_p4d_range(pgd, addr, next, write, pages, &nr))
 			break;
 	} while (pgdp++, addr = next, addr != end);
 	local_irq_restore(flags);
@@ -440,7 +461,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			goto slow;
-		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+		if (!gup_p4d_range(pgd, addr, next, write, pages, &nr))
 			goto slow;
 	} while (pgdp++, addr = next, addr != end);
 	local_irq_enable();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/6] x86/ident_map: Add 5-level paging support
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2017-03-13 14:33 ` [PATCH 3/6] x86/gup: Add 5-level paging support Kirill A. Shutemov
@ 2017-03-13 14:33 ` Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 5/6] x86/vmalloc: " Kirill A. Shutemov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

Add handling for the additional page table level. It's mostly mechanical.

The only quirk is that with p4d folded, 'pgd' is equal to 'p4d' in
kernel_ident_mapping_init(). The pgd entry has to point to the pud page table
in this case.
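
The quirk can be pictured with a small userspace model. It is a simplified
sketch, not the kernel code: entries hold plain pointers with no flag bits,
there is no physical/virtual translation, and the model_* names are
hypothetical.

```c
#include <stdint.h>

/* Userspace model of a page table entry; flag bits are left out. */
typedef struct { uintptr_t val; } model_entry_t;

/*
 * Fill the top-level slot once the lower tables are set up.
 * 'p4d_page' is the freshly allocated table that the p4d-level init wrote to.
 */
static void model_set_top(model_entry_t *pgd_slot, model_entry_t *p4d_page,
			  int five_level)
{
	if (five_level) {
		/* Real p4d level: pgd points to the p4d table itself. */
		pgd_slot->val = (uintptr_t)p4d_page;
	} else {
		/*
		 * Folded p4d: the table has one meaningful entry, which
		 * already refers to the pud table, so pgd takes that value.
		 */
		pgd_slot->val = p4d_page[0].val;
	}
}
```

In the folded case the allocated page only serves as a temporary holder for
the pud pointer; the installed pgd entry skips it.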

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/ident_map.c | 51 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 4473cb4f8b90..1c3f166bd8c3 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -45,6 +45,34 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 	return 0;
 }
 
+static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		p4d_t *p4d = p4d_page + p4d_index(addr);
+		pud_t *pud;
+
+		next = (addr & P4D_MASK) + P4D_SIZE;
+		if (next > end)
+			next = end;
+
+		if (p4d_present(*p4d)) {
+			pud = pud_offset(p4d, 0);
+			ident_pud_init(info, pud, addr, next);
+			continue;
+		}
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		ident_pud_init(info, pud, addr, next);
+		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 			      unsigned long pstart, unsigned long pend)
 {
@@ -55,27 +83,36 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 
 	for (; addr < end; addr = next) {
 		pgd_t *pgd = pgd_page + pgd_index(addr);
-		pud_t *pud;
+		p4d_t *p4d;
 
 		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
 		if (next > end)
 			next = end;
 
 		if (pgd_present(*pgd)) {
-			pud = pud_offset(pgd, 0);
-			result = ident_pud_init(info, pud, addr, next);
+			p4d = p4d_offset(pgd, 0);
+			result = ident_p4d_init(info, p4d, addr, next);
 			if (result)
 				return result;
 			continue;
 		}
 
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
-		if (!pud)
+		p4d = (p4d_t *)info->alloc_pgt_page(info->context);
+		if (!p4d)
 			return -ENOMEM;
-		result = ident_pud_init(info, pud, addr, next);
+		result = ident_p4d_init(info, p4d, addr, next);
 		if (result)
 			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+		if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+			set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
+		} else {
+			/*
+			 * With p4d folded, pgd is equal to p4d.
+			 * pgd entry has to point pud page table in this case.
+			 */
+			pud_t *pud = pud_offset(p4d, 0);
+			set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+		}
 	}
 
 	return 0;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] x86/vmalloc: Add 5-level paging support
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2017-03-13 14:33 ` [PATCH 4/6] x86/ident_map: " Kirill A. Shutemov
@ 2017-03-13 14:33 ` Kirill A. Shutemov
  2017-03-13 14:33 ` [PATCH 6/6] x86/power: " Kirill A. Shutemov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

Modify vmalloc_fault() to handle the additional page table level.

With 4-level paging, copying happens at the p4d level, because pgd_none() is
always false when p4d_t is folded.
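
A minimal userspace sketch of the per-level check follows. In the patch this
check runs at the pgd level and again at the p4d level; with p4d folded, only
the p4d-level check can find an empty entry. The sketch compares whole entries
for simplicity, whereas the patch compares p4d_pfn(), and it returns an error
code where the patch uses BUG_ON(). All model_* names are hypothetical.

```c
#include <stdint.h>

enum model_sync_result {
	MODEL_SYNC_OK = 0,
	MODEL_SYNC_NO_REF = -1,		/* reference entry is empty */
	MODEL_SYNC_MISMATCH = -2	/* the patch uses BUG_ON() for this case */
};

/* Bring one entry of a process table in line with the reference table. */
static enum model_sync_result model_sync_entry(uint64_t *entry,
					       const uint64_t *ref)
{
	if (*ref == 0)
		return MODEL_SYNC_NO_REF;

	if (*entry == 0) {
		*entry = *ref;	/* lazily copy the kernel mapping */
		return MODEL_SYNC_OK;
	}

	return (*entry == *ref) ? MODEL_SYNC_OK : MODEL_SYNC_MISMATCH;
}
```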

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/fault.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 605fd5e8e048..1928ea02e182 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -435,6 +435,7 @@ void vmalloc_sync_all(void)
 static noinline int vmalloc_fault(unsigned long address)
 {
 	pgd_t *pgd, *pgd_ref;
+	p4d_t *p4d, *p4d_ref;
 	pud_t *pud, *pud_ref;
 	pmd_t *pmd, *pmd_ref;
 	pte_t *pte, *pte_ref;
@@ -458,17 +459,37 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
 		arch_flush_lazy_mmu_mode();
-	} else {
+	} else if (CONFIG_PGTABLE_LEVELS > 4) {
+		/*
+		 * With folded p4d, pgd_none() is always false. So pgd may
+		 * point to empty page table entry and pgd_page_vaddr()
+		 * will return garbage.
+		 *
+		 * We will do the correct sanity check on p4d level.
+		 */
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
 	}
 
+	/* With 4-level paging, copying happens on p4d level. */
+	p4d = p4d_offset(pgd, address);
+	p4d_ref = p4d_offset(pgd_ref, address);
+	if (p4d_none(*p4d_ref))
+		return -1;
+
+	if (p4d_none(*p4d)) {
+		set_p4d(p4d, *p4d_ref);
+		arch_flush_lazy_mmu_mode();
+	} else {
+		BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_ref));
+	}
+
 	/*
 	 * Below here mismatches are bugs because these lower tables
 	 * are shared:
 	 */
 
-	pud = pud_offset(pgd, address);
-	pud_ref = pud_offset(pgd_ref, address);
+	pud = pud_offset(p4d, address);
+	pud_ref = pud_offset(p4d_ref, address);
 	if (pud_none(*pud_ref))
 		return -1;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] x86/power: Add 5-level paging support
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2017-03-13 14:33 ` [PATCH 5/6] x86/vmalloc: " Kirill A. Shutemov
@ 2017-03-13 14:33 ` Kirill A. Shutemov
  2017-03-13 19:46 ` [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Linus Torvalds
  2017-03-14  7:47 ` Ingo Molnar
  7 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-13 14:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

set_up_temporary_text_mapping() and relocate_restore_code() require
adjustments to handle the additional page table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/power/hibernate_64.c | 50 +++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index ded2e8272382..aa054feb1860 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -49,6 +49,7 @@ static int set_up_temporary_text_mapping(pgd_t *pgd)
 {
 	pmd_t *pmd;
 	pud_t *pud;
+	p4d_t *p4d;
 
 	/*
 	 * The new mapping only has to cover the page containing the image
@@ -63,6 +64,13 @@ static int set_up_temporary_text_mapping(pgd_t *pgd)
 	 * the virtual address space after switching over to the original page
 	 * tables used by the image kernel.
 	 */
+
+	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+		p4d = (p4d_t *)get_safe_page(GFP_ATOMIC);
+		if (!p4d)
+			return -ENOMEM;
+	}
+
 	pud = (pud_t *)get_safe_page(GFP_ATOMIC);
 	if (!pud)
 		return -ENOMEM;
@@ -75,8 +83,16 @@ static int set_up_temporary_text_mapping(pgd_t *pgd)
 		__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
 	set_pud(pud + pud_index(restore_jump_address),
 		__pud(__pa(pmd) | _KERNPG_TABLE));
-	set_pgd(pgd + pgd_index(restore_jump_address),
-		__pgd(__pa(pud) | _KERNPG_TABLE));
+	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+		set_p4d(p4d + p4d_index(restore_jump_address),
+				__p4d(__pa(pud) | _KERNPG_TABLE));
+		set_pgd(pgd + pgd_index(restore_jump_address),
+				__pgd(__pa(p4d) | _KERNPG_TABLE));
+	} else {
+		/* No p4d for 4-level paging: point pgd to pud page table */
+		set_pgd(pgd + pgd_index(restore_jump_address),
+				__pgd(__pa(pud) | _KERNPG_TABLE));
+	}
 
 	return 0;
 }
@@ -124,7 +140,10 @@ static int set_up_temporary_mappings(void)
 static int relocate_restore_code(void)
 {
 	pgd_t *pgd;
+	p4d_t *p4d;
 	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
 
 	relocated_restore_code = get_safe_page(GFP_ATOMIC);
 	if (!relocated_restore_code)
@@ -134,22 +153,25 @@ static int relocate_restore_code(void)
 
 	/* Make the page containing the relocated code executable */
 	pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
-	pud = pud_offset(pgd, relocated_restore_code);
+	p4d = p4d_offset(pgd, relocated_restore_code);
+	if (p4d_large(*p4d)) {
+		set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
+		goto out;
+	}
+	pud = pud_offset(p4d, relocated_restore_code);
 	if (pud_large(*pud)) {
 		set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
-	} else {
-		pmd_t *pmd = pmd_offset(pud, relocated_restore_code);
-
-		if (pmd_large(*pmd)) {
-			set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
-		} else {
-			pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code);
-
-			set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
-		}
+		goto out;
+	}
+	pmd = pmd_offset(pud, relocated_restore_code);
+	if (pmd_large(*pmd)) {
+		set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
+		goto out;
 	}
+	pte = pte_offset_kernel(pmd, relocated_restore_code);
+	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
+out:
 	__flush_tlb_all();
-
 	return 0;
 }
 
-- 
2.11.0


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2017-03-13 14:33 ` [PATCH 6/6] x86/power: " Kirill A. Shutemov
@ 2017-03-13 19:46 ` Linus Torvalds
  2017-03-14  7:47 ` Ingo Molnar
  7 siblings, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2017-03-13 19:46 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, the arch/x86 maintainers, Thomas Gleixner,
	Ingo Molnar, Arnd Bergmann, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, Michal Hocko, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Mon, Mar 13, 2017 at 7:33 AM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> Here's the first bunch of patches of 5-level patchset. Let's see if I'm on
> right track addressing Ingo's feedback. :)

Considering the bug we just had with the HAVE_GENERIC_RCU_GUP code,
I'm wondering if people would be willing to look at what it would take
to make x86 use the generic version?

The x86 version of __get_user_pages_fast() seems to be quite similar
to the generic one. And it would be lovely if all the main
architectures shared the same core gup code.

                   Linus


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2017-03-13 19:46 ` [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Linus Torvalds
@ 2017-03-14  7:47 ` Ingo Molnar
  2017-03-14  8:24   ` Kirill A. Shutemov
                     ` (2 more replies)
  7 siblings, 3 replies; 15+ messages in thread
From: Ingo Molnar @ 2017-03-14  7:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin, Andi Kleen, Dave Hansen,
	Andy Lutomirski, Michal Hocko, linux-arch, linux-mm,
	linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> Here's the first bunch of patches of 5-level patchset. Let's see if I'm on
> right track addressing Ingo's feedback. :)
> 
> These patches prepare x86 code to be switched from <asm-generic/5level-fixup>
> to <asm-generic/pgtable-nop4d.h>. It's a stepping stone for adding 5-level
> paging support.
> 
> Please review and consider applying.
> 
> Kirill A. Shutemov (6):
>   x86/mm: Extend headers with basic definitions to support 5-level
>     paging
>   x86/mm: Convert trivial cases of page table walk to 5-level paging
>   x86/gup: Add 5-level paging support
>   x86/ident_map: Add 5-level paging support
>   x86/vmalloc: Add 5-level paging support
>   x86/power: Add 5-level paging support
> 
>  arch/x86/include/asm/pgtable-2level_types.h |  1 +
>  arch/x86/include/asm/pgtable-3level_types.h |  1 +
>  arch/x86/include/asm/pgtable.h              | 26 +++++++++---
>  arch/x86/include/asm/pgtable_64_types.h     |  1 +
>  arch/x86/include/asm/pgtable_types.h        | 30 ++++++++++++-
>  arch/x86/kernel/tboot.c                     |  6 ++-
>  arch/x86/kernel/vm86_32.c                   |  6 ++-
>  arch/x86/mm/fault.c                         | 66 +++++++++++++++++++++++++----
>  arch/x86/mm/gup.c                           | 33 ++++++++++++---
>  arch/x86/mm/ident_map.c                     | 51 +++++++++++++++++++---
>  arch/x86/mm/init_32.c                       | 22 +++++++---
>  arch/x86/mm/ioremap.c                       |  3 +-
>  arch/x86/mm/pgtable.c                       |  4 +-
>  arch/x86/mm/pgtable_32.c                    |  8 +++-
>  arch/x86/platform/efi/efi_64.c              | 13 ++++--
>  arch/x86/power/hibernate_32.c               |  7 ++-
>  arch/x86/power/hibernate_64.c               | 50 ++++++++++++++++------
>  17 files changed, 269 insertions(+), 59 deletions(-)

Much better!

I've applied them, with (very) minor readability edits here and there, and will 
push them out into tip:x86/mm and tip:master after some testing - you can use that 
as a base for the remaining submissions.

I've also applied the GUP patch, with the assumption that you'll address Linus's 
request to switch x86 over to the generic version.

Thanks,

	Ingo


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-14  7:47 ` Ingo Molnar
@ 2017-03-14  8:24   ` Kirill A. Shutemov
  2017-03-14  8:33     ` Thomas Gleixner
  2017-03-14 17:48   ` Linus Torvalds
  2017-03-15  9:23   ` Michal Hocko
  2 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-14  8:24 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, Arnd Bergmann, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, Michal Hocko, linux-arch, linux-mm,
	linux-kernel

On Tue, Mar 14, 2017 at 08:47:29AM +0100, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> 
> > Here's the first bunch of patches of 5-level patchset. Let's see if I'm on
> > right track addressing Ingo's feedback. :)
> > 
> > These patches prepare x86 code to be switched from <asm-generic/5level-fixup>
> > to <asm-generic/pgtable-nop4d.h>. It's a stepping stone for adding 5-level
> > paging support.
> > 
> > Please review and consider applying.
> > 
> > Kirill A. Shutemov (6):
> >   x86/mm: Extend headers with basic definitions to support 5-level
> >     paging
> >   x86/mm: Convert trivial cases of page table walk to 5-level paging
> >   x86/gup: Add 5-level paging support
> >   x86/ident_map: Add 5-level paging support
> >   x86/vmalloc: Add 5-level paging support
> >   x86/power: Add 5-level paging support
> > 
> >  arch/x86/include/asm/pgtable-2level_types.h |  1 +
> >  arch/x86/include/asm/pgtable-3level_types.h |  1 +
> >  arch/x86/include/asm/pgtable.h              | 26 +++++++++---
> >  arch/x86/include/asm/pgtable_64_types.h     |  1 +
> >  arch/x86/include/asm/pgtable_types.h        | 30 ++++++++++++-
> >  arch/x86/kernel/tboot.c                     |  6 ++-
> >  arch/x86/kernel/vm86_32.c                   |  6 ++-
> >  arch/x86/mm/fault.c                         | 66 +++++++++++++++++++++++++----
> >  arch/x86/mm/gup.c                           | 33 ++++++++++++---
> >  arch/x86/mm/ident_map.c                     | 51 +++++++++++++++++++---
> >  arch/x86/mm/init_32.c                       | 22 +++++++---
> >  arch/x86/mm/ioremap.c                       |  3 +-
> >  arch/x86/mm/pgtable.c                       |  4 +-
> >  arch/x86/mm/pgtable_32.c                    |  8 +++-
> >  arch/x86/platform/efi/efi_64.c              | 13 ++++--
> >  arch/x86/power/hibernate_32.c               |  7 ++-
> >  arch/x86/power/hibernate_64.c               | 50 ++++++++++++++++------
> >  17 files changed, 269 insertions(+), 59 deletions(-)
> 
> Much better!
> 
> I've applied them, with (very) minor readability edits here and there, and will 
> push them out into tip:x86/mm and tip:master after some testing - you can use that 
> as a base for the remaining submissions.

Thanks.

> I've also applied the GUP patch, with the assumption that you'll address Linus's 
> request to switch x86 over to the generic version.

Okay, I'll do this.

I just want to make priorities clear here: is it okay to finish with the
rest of 5-level paging patches first before moving to GUP_fast switch?

-- 
 Kirill A. Shutemov


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-14  8:24   ` Kirill A. Shutemov
@ 2017-03-14  8:33     ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2017-03-14  8:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, Kirill A. Shutemov, Andrew Morton,
	x86, Ingo Molnar, Arnd Bergmann, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, Michal Hocko, linux-arch, linux-mm,
	linux-kernel

On Tue, 14 Mar 2017, Kirill A. Shutemov wrote:
> On Tue, Mar 14, 2017 at 08:47:29AM +0100, Ingo Molnar wrote:
> > I've also applied the GUP patch, with the assumption that you'll address Linus's 
> > request to switch x86 over to the generic version.
> 
> Okay, I'll do this.
> 
> I just want to make priorities clear here: is it okay to finish with the
> rest of 5-level paging patches first before moving to GUP_fast switch?

I think moving it first is the preferred way to do it.

Thanks,

	tglx


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-14  7:47 ` Ingo Molnar
  2017-03-14  8:24   ` Kirill A. Shutemov
@ 2017-03-14 17:48   ` Linus Torvalds
  2017-03-15 14:51     ` Kirill A. Shutemov
  2017-03-15 15:42     ` Kirill A. Shutemov
  2017-03-15  9:23   ` Michal Hocko
  2 siblings, 2 replies; 15+ messages in thread
From: Linus Torvalds @ 2017-03-14 17:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Andrew Morton, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Arnd Bergmann, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Andy Lutomirski, Michal Hocko,
	linux-arch, linux-mm, Linux Kernel Mailing List

On Tue, Mar 14, 2017 at 12:47 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> I've also applied the GUP patch, with the assumption that you'll address Linus's
> request to switch x86 over to the generic version.

Note that switching over to the generic version is somewhat fraught
with subtle issues:

 (a) we need to make sure that x86 actually matches the required
semantics for the generic GUP.

 (b) we need to make sure the atomicity of the page table reads is ok.

 (c) need to verify the maximum VM address properly

I _think_ (a) is ok. The code (and the config option name) talks about
freeing page tables using RCU, but in fact I don't think it relies on
it, and it's sufficient that it disables interrupts and that that will
block any IPI's.

In contrast, I think (b) needs real work to make sure it's ok on
32-bit PAE with 64-bit pte entries. The generic code currently just
does READ_ONCE(), while the x86 code does gup_get_pte().
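
To make the concern in (b) concrete, here is a minimal user-space sketch of the low/high/low re-read pattern that gup_get_pte() uses on PAE. The struct pae_pte type and the pae_pte_read() name are hypothetical, C11 atomics stand in for the kernel's barriers, and the loop is only meaningful if the writer follows a matching update protocol, which this sketch does not model.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a PAE pte: a 64-bit entry that a 32-bit
 * CPU can only load as two separate 32-bit halves.
 */
struct pae_pte {
	_Atomic uint32_t low;	/* holds the present bit */
	_Atomic uint32_t high;
};

/*
 * Read both halves, then re-read the low half. If the low half
 * changed in between, the high half may belong to a different
 * entry, so retry. Acquire loads keep the three reads in order.
 */
static uint64_t pae_pte_read(struct pae_pte *p)
{
	uint32_t low, high;

	do {
		low = atomic_load_explicit(&p->low, memory_order_acquire);
		high = atomic_load_explicit(&p->high, memory_order_acquire);
	} while (low != atomic_load_explicit(&p->low, memory_order_acquire));

	return ((uint64_t)high << 32) | low;
}
```

A plain READ_ONCE() of a 64-bit entry gives no such guarantee on a 32-bit CPU, since it can be split into two independent loads, which is why the generic code would need an equivalent helper.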

And (c) means that we need to really replace that generic code that
does "access_ok()" with a proper check against the maximum user address
(ie independent of set_fs(KERNEL_DS)).
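
As a rough illustration of (c), the check could look something like the sketch below: a range test against a fixed user address limit that does not consult the current address-space override. USER_ADDR_MAX and user_range_ok() are hypothetical names, and the limit value is an assumption (the top of the 47-bit user half minus one page, as on 4-level x86-64).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit: top of the 47-bit user half minus one 4k page. */
#define USER_ADDR_MAX ((uint64_t)0x00007ffffffff000ULL)

/*
 * Return true if [start, start + len) lies entirely below the
 * fixed user limit. The wrap-around test comes first so that a
 * huge len cannot make the end look small.
 */
static bool user_range_ok(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	if (end < start)	/* start + len overflowed */
		return false;

	return end <= USER_ADDR_MAX;
}
```

Because the limit is a constant rather than a per-thread value, the result is the same regardless of set_fs().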

But it would be good to aim for unifying this part of the VM,
considering how many bugs we've had in GUP. The latest 5-level typo
has not been the only one. It's clearly more subtle than you'd think.

So it's not quite as simple as just "switching over". I think we need
to introduce that gup_get_pte() to all the generic users, and we need
to introduce a "user address limit" for those architectures too.

                Linus


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-14  7:47 ` Ingo Molnar
  2017-03-14  8:24   ` Kirill A. Shutemov
  2017-03-14 17:48   ` Linus Torvalds
@ 2017-03-15  9:23   ` Michal Hocko
  2 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2017-03-15  9:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, Arnd Bergmann, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel

On Tue 14-03-17 08:47:29, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> 
> > Here's the first bunch of patches of 5-level patchset. Let's see if I'm on
> > right track addressing Ingo's feedback. :)
> > 
> > These patches prepare x86 code to be switched from <asm-generic/5level-fixup>
> > to <asm-generic/pgtable-nop4d.h>. It's a stepping stone for adding 5-level
> > paging support.
> > 
> > Please review and consider applying.
> > 
> > Kirill A. Shutemov (6):
> >   x86/mm: Extend headers with basic definitions to support 5-level
> >     paging
> >   x86/mm: Convert trivial cases of page table walk to 5-level paging
> >   x86/gup: Add 5-level paging support
> >   x86/ident_map: Add 5-level paging support
> >   x86/vmalloc: Add 5-level paging support
> >   x86/power: Add 5-level paging support
> > 
> >  arch/x86/include/asm/pgtable-2level_types.h |  1 +
> >  arch/x86/include/asm/pgtable-3level_types.h |  1 +
> >  arch/x86/include/asm/pgtable.h              | 26 +++++++++---
> >  arch/x86/include/asm/pgtable_64_types.h     |  1 +
> >  arch/x86/include/asm/pgtable_types.h        | 30 ++++++++++++-
> >  arch/x86/kernel/tboot.c                     |  6 ++-
> >  arch/x86/kernel/vm86_32.c                   |  6 ++-
> >  arch/x86/mm/fault.c                         | 66 +++++++++++++++++++++++++----
> >  arch/x86/mm/gup.c                           | 33 ++++++++++++---
> >  arch/x86/mm/ident_map.c                     | 51 +++++++++++++++++++---
> >  arch/x86/mm/init_32.c                       | 22 +++++++---
> >  arch/x86/mm/ioremap.c                       |  3 +-
> >  arch/x86/mm/pgtable.c                       |  4 +-
> >  arch/x86/mm/pgtable_32.c                    |  8 +++-
> >  arch/x86/platform/efi/efi_64.c              | 13 ++++--
> >  arch/x86/power/hibernate_32.c               |  7 ++-
> >  arch/x86/power/hibernate_64.c               | 50 ++++++++++++++++------
> >  17 files changed, 269 insertions(+), 59 deletions(-)
> 
> Much better!
> 
> I've applied them, with (very) minor readability edits here and there, and will 
> push them out into tip:x86/mm and tip:master after some testing - you can use that 
> as a base for the remaining submissions.

JFYI, I have cherry-picked these, along with those merged via Linus' tree,
into the mmotm git tree [1] (tag mmotm-2017-03-14-15-41)

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-14 17:48   ` Linus Torvalds
@ 2017-03-15 14:51     ` Kirill A. Shutemov
  2017-03-15 15:42     ` Kirill A. Shutemov
  1 sibling, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-15 14:51 UTC (permalink / raw)
  To: Linus Torvalds, Andrea Arcangeli
  Cc: Ingo Molnar, Kirill A. Shutemov, Andrew Morton,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin, Andi Kleen, Dave Hansen,
	Andy Lutomirski, Michal Hocko, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Tue, Mar 14, 2017 at 10:48:51AM -0700, Linus Torvalds wrote:
> On Tue, Mar 14, 2017 at 12:47 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > I've also applied the GUP patch, with the assumption that you'll address Linus's
> > request to switch x86 over to the generic version.
> 
> Note that switching over to the generic version is somewhat fraught
> with subtle issues:
> 
>  (a) we need to make sure that x86 actually matches the required
> semantics for the generic GUP.
> 
>  (b) we need to make sure the atomicity of the page table reads is ok.
> 
>  (c) need to verify the maximum VM address properly
> 
> I _think_ (a) is ok. The code (and the config option name) talks about
> freeing page tables using RCU, but in fact I don't think it relies on
> it, and it's sufficient that it disables interrupts and that that will
> block any IPI's.
> 
> In contrast, I think (b) needs real work to make sure it's ok on
> 32-bit PAE with 64-bit pte entries. The generic code currently just
> does READ_ONCE(), while the x86 code does gup_get_pte().

+ Andrea.

Looking at gup_get_pte() makes me think: why don't we need the same
approach for the pmd level (pud is not relevant for PAE)?

Looks like a bug to me.

We have pmd_read_atomic() to address the issue in other places. The helper
doesn't match the semantics required for GUP_fast(), but we clearly need to
address the issue.

The pgd dereference doesn't look good on PAE either. Or am I missing something?

Heck, we don't even have READ_ONCE() on x86 for page table entry
dereference. Looks like a bug waiting to explode. And not only on PAE.

-- 
 Kirill A. Shutemov


* Re: [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1
  2017-03-14 17:48   ` Linus Torvalds
  2017-03-15 14:51     ` Kirill A. Shutemov
@ 2017-03-15 15:42     ` Kirill A. Shutemov
  1 sibling, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2017-03-15 15:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Kirill A. Shutemov, Andrew Morton,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin, Andi Kleen, Dave Hansen,
	Andy Lutomirski, Michal Hocko, linux-arch, linux-mm,
	Linux Kernel Mailing List

On Tue, Mar 14, 2017 at 10:48:51AM -0700, Linus Torvalds wrote:
> On Tue, Mar 14, 2017 at 12:47 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > I've also applied the GUP patch, with the assumption that you'll address Linus's
> > request to switch x86 over to the generic version.
> 
> Note that switching over to the generic version is somewhat fraught
> with subtle issues:
> 
>  (a) we need to make sure that x86 actually matches the required
> semantics for the generic GUP.
> 
>  (b) we need to make sure the atomicity of the page table reads is ok.
> 
>  (c) need to verify the maximum VM address properly

There's another difference with the generic version: it uses
page_cache_get_speculative(), whereas x86 uses plain get_page().
That's somewhat more expensive, but probably fine.
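
The cost difference can be sketched in user space, assuming the speculative variant boils down to "take a reference unless the count is already zero". ref_get() and ref_get_unless_zero() are hypothetical names using C11 atomics, not the kernel helpers themselves.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Plain get: the caller already knows the object is live. */
static void ref_get(atomic_int *ref)
{
	atomic_fetch_add(ref, 1);
}

/*
 * Speculative get: the object may be on its way to being freed,
 * so only take a reference if the count is still non-zero. This
 * needs a compare-and-swap loop instead of a single increment.
 */
static bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

The extra work is the compare-and-swap retry loop plus a failure path the caller has to handle.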

-- 
 Kirill A. Shutemov


Thread overview: 15+ messages
2017-03-13 14:33 [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Kirill A. Shutemov
2017-03-13 14:33 ` [PATCH 1/6] x86/mm: Extend headers with basic definitions to support 5-level paging Kirill A. Shutemov
2017-03-13 14:33 ` [PATCH 2/6] x86/mm: Convert trivial cases of page table walk to " Kirill A. Shutemov
2017-03-13 14:33 ` [PATCH 3/6] x86/gup: Add 5-level paging support Kirill A. Shutemov
2017-03-13 14:33 ` [PATCH 4/6] x86/ident_map: " Kirill A. Shutemov
2017-03-13 14:33 ` [PATCH 5/6] x86/vmalloc: " Kirill A. Shutemov
2017-03-13 14:33 ` [PATCH 6/6] x86/power: " Kirill A. Shutemov
2017-03-13 19:46 ` [PATCH 0/6] x86: 5-level paging enabling for v4.12, Part 1 Linus Torvalds
2017-03-14  7:47 ` Ingo Molnar
2017-03-14  8:24   ` Kirill A. Shutemov
2017-03-14  8:33     ` Thomas Gleixner
2017-03-14 17:48   ` Linus Torvalds
2017-03-15 14:51     ` Kirill A. Shutemov
2017-03-15 15:42     ` Kirill A. Shutemov
2017-03-15  9:23   ` Michal Hocko
