[PATCH 0/6] powerpc/8xx: implementation of huge pages

linuxppc-dev.lists.ozlabs.org archive mirror

* [PATCH 0/6] powerpc/8xx: implementation of huge pages
@ 2016-08-12 16:55 Christophe Leroy
  2016-08-12 16:55 ` [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits Christophe Leroy
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

This series implements huge pages on the 8xx.

Christophe Leroy (6):
  powerpc: port 64 bits pgtable_cache to 32 bits
  powerpc: fix usage of _PAGE_RO in hugepage
  powerpc/8xx: use r3 to scratch CR in ITLBmiss
  powerpc/8xx: Move additional DTLBMiss handlers out of exception area
  powerpc/8xx: make user addr DTLB miss the short path
  powerpc/8xx: implementation of huge pages

 arch/powerpc/include/asm/book3s/32/pgalloc.h |  44 ++++-
 arch/powerpc/include/asm/book3s/32/pgtable.h |  43 ++---
 arch/powerpc/include/asm/book3s/64/pgtable.h |   5 +-
 arch/powerpc/include/asm/hugetlb.h           |  20 ++-
 arch/powerpc/include/asm/mmu-8xx.h           |  35 ++++
 arch/powerpc/include/asm/mmu.h               |  25 +--
 arch/powerpc/include/asm/nohash/32/pgalloc.h |  44 ++++-
 arch/powerpc/include/asm/nohash/32/pgtable.h |  45 ++---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |   1 +
 arch/powerpc/include/asm/nohash/64/pgtable.h |   2 -
 arch/powerpc/include/asm/nohash/pgtable.h    |   4 +
 arch/powerpc/include/asm/pgtable.h           |   2 +
 arch/powerpc/include/asm/reg_8xx.h           |   2 +-
 arch/powerpc/kernel/head_8xx.S               | 235 +++++++++++++++++++--------
 arch/powerpc/mm/Makefile                     |   2 +-
 arch/powerpc/mm/hugetlbpage.c                | 184 ++++++++-------------
 arch/powerpc/mm/init-common.c                | 152 +++++++++++++++++
 arch/powerpc/mm/init_32.c                    |   5 -
 arch/powerpc/mm/init_64.c                    |  82 ----------
 arch/powerpc/mm/pgtable_32.c                 |  37 -----
 arch/powerpc/mm/tlb_nohash.c                 |  21 ++-
 arch/powerpc/platforms/8xx/Kconfig           |   1 +
 arch/powerpc/platforms/Kconfig.cputype       |   1 +
 23 files changed, 603 insertions(+), 389 deletions(-)
 create mode 100644 arch/powerpc/mm/init-common.c

-- 
2.1.0


* [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
@ 2016-08-12 16:55 ` Christophe Leroy
  2016-08-14 14:17   ` Aneesh Kumar K.V
  2016-08-12 16:55 ` [PATCH 2/6] powerpc: fix usage of _PAGE_RO in hugepage Christophe Leroy
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

Today, powerpc64 uses a set of pgtable_caches, while powerpc32 uses
standard pages when the page size is 4k and a single pgtable_cache
for other page sizes. In addition, powerpc32 uses another cache
(hugepte_cache) when handling huge pages.

In preparation for implementing huge pages on the 8xx, this patch
replaces the powerpc32-specific handling with the 64-bit approach.
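
For the 4k-page case mentioned above, the arithmetic of the 64-bit
scheme can be illustrated as follows. This is a minimal userland
sketch, not kernel code: it assumes a 32-bit target with 4-byte
pointers and PTEs (so PTE_INDEX_SIZE is 10, consistent with the
"1024-entry" comment in pgtable.h), and the names PTR_SIZE,
pgt_table_size() and pgt_cache_slot() are hypothetical.

```c
#include <assert.h>

/*
 * Userland model of PGT_CACHE sizing on a 32-bit target with 4k pages.
 * Assumption: 4-byte pointers and PTEs, hence 1024-entry tables.
 * PTR_SIZE, pgt_table_size() and pgt_cache_slot() are hypothetical.
 */
#define PTR_SIZE                4u
#define PAGE_SHIFT              12
#define PTE_INDEX_SIZE          10
#define PGDIR_SHIFT             (PAGE_SHIFT + PTE_INDEX_SIZE)
#define PGD_INDEX_SIZE          (32 - PGDIR_SHIFT)
#define MAX_PGTABLE_INDEX_SIZE  0xf

/* Bytes in a table of 2^shift pointers, as pgtable_cache_add() computes. */
static unsigned long pgt_table_size(unsigned int shift)
{
	return (unsigned long)PTR_SIZE << shift;
}

/* Slot read by PGT_CACHE(shift): pgtable_cache[shift - 1]; 0 is a bug. */
static unsigned int pgt_cache_slot(unsigned int shift)
{
	assert(shift >= 1 && shift <= MAX_PGTABLE_INDEX_SIZE);
	return shift - 1;
}
```

Under these assumptions PGD_INDEX_SIZE is 32 - 22 = 10, so the pgd
would come from pgtable_cache[9], whose objects are 4 << 10 = 4096
bytes: still one 4k page per pgdir, but allocated through the
"pgtable-2^10" kmem_cache rather than __get_free_pages().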

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h |  44 ++++++--
 arch/powerpc/include/asm/book3s/32/pgtable.h |  43 ++++----
 arch/powerpc/include/asm/book3s/64/pgtable.h |   3 -
 arch/powerpc/include/asm/hugetlb.h           |   2 -
 arch/powerpc/include/asm/nohash/32/pgalloc.h |  44 ++++++--
 arch/powerpc/include/asm/nohash/32/pgtable.h |  45 ++++----
 arch/powerpc/include/asm/nohash/64/pgtable.h |   2 -
 arch/powerpc/include/asm/pgtable.h           |   2 +
 arch/powerpc/mm/Makefile                     |   2 +-
 arch/powerpc/mm/hugetlbpage.c                |  12 +--
 arch/powerpc/mm/init-common.c                | 152 +++++++++++++++++++++++++++
 arch/powerpc/mm/init_32.c                    |   5 -
 arch/powerpc/mm/init_64.c                    |  82 ---------------
 arch/powerpc/mm/pgtable_32.c                 |  37 -------
 14 files changed, 282 insertions(+), 193 deletions(-)
 create mode 100644 arch/powerpc/mm/init-common.c

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 8e21bb4..ab215fd 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -2,14 +2,42 @@
 #define _ASM_POWERPC_BOOK3S_32_PGALLOC_H
 
 #include <linux/threads.h>
+#include <linux/slab.h>
 
-/* For 32-bit, all levels of page tables are just drawn from get_free_page() */
-#define MAX_PGTABLE_INDEX_SIZE	0
+/*
+ * Functions that deal with pagetables that could be at any level of
+ * the table need to be passed an "index_size" so they know how to
+ * handle allocation.  For PTE pages (which are linked to a struct
+ * page for now, and drawn from the main get_free_pages() pool), the
+ * allocation size will be (2^index_size * sizeof(pointer)) and
+ * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
+ *
+ * The maximum index size needs to be big enough to allow any
+ * pagetable sizes we need, but small enough to fit in the low bits of
+ * any page table pointer.  In other words all pagetables, even tiny
+ * ones, must be aligned to allow at least enough low 0 bits to
+ * contain this value.  This value is also used as a mask, so it must
+ * be one less than a power of two.
+ */
+#define MAX_PGTABLE_INDEX_SIZE	0xf
 
 extern void __bad_pte(pmd_t *pmd);
 
-extern pgd_t *pgd_alloc(struct mm_struct *mm);
-extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
+extern struct kmem_cache *pgtable_cache[];
+#define PGT_CACHE(shift) ({				\
+			BUG_ON(!(shift));		\
+			pgtable_cache[(shift) - 1];	\
+		})
+
+static inline pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+	return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), GFP_KERNEL);
+}
+
+static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
+{
+	kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
+}
 
 /*
  * We don't have any real pmd's, and this code never triggers because
@@ -68,8 +96,12 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
-	BUG_ON(index_size); /* 32-bit doesn't use this */
-	free_page((unsigned long)table);
+	if (!index_size)
+		free_page((unsigned long)table);
+	else {
+		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
+		kmem_cache_free(PGT_CACHE(index_size), table);
+	}
 }
 
 #define check_pgt_cache()	do { } while (0)
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 38b33dc..83a2159 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -8,6 +8,26 @@
 /* And here we include common definitions */
 #include <asm/pte-common.h>
 
+#define PTE_INDEX_SIZE	PTE_SHIFT
+#define PMD_INDEX_SIZE	0
+#define PUD_INDEX_SIZE	0
+#define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)
+
+#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
+
+#ifndef __ASSEMBLY__
+#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_INDEX_SIZE)
+#define PMD_TABLE_SIZE	(sizeof(pmd_t) << PTE_INDEX_SIZE)
+#define PUD_TABLE_SIZE	(sizeof(pud_t) << PTE_INDEX_SIZE)
+#define PGD_TABLE_SIZE	(sizeof(pgd_t) << PGD_INDEX_SIZE)
+#endif	/* __ASSEMBLY__ */
+
+#define PTRS_PER_PTE	(1 << PTE_INDEX_SIZE)
+#define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
+
+/* With 4k base page size, hugepage PTEs go at the PMD level */
+#define MIN_HUGEPTE_SHIFT	PMD_SHIFT
+
 /*
  * The normal case is that PTEs are 32-bits and we have a 1-page
  * 1024-entry pgdir pointing to 1-page 1024-entry PTE pages.  -- paulus
@@ -19,14 +39,10 @@
  * -Matt
  */
 /* PGDIR_SHIFT determines what a top-level page table entry can map */
-#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_SHIFT)
+#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
 #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
-#define PTRS_PER_PTE	(1 << PTE_SHIFT)
-#define PTRS_PER_PMD	1
-#define PTRS_PER_PGD	(1 << (32 - PGDIR_SHIFT))
-
 #define USER_PTRS_PER_PGD	(TASK_SIZE / PGDIR_SIZE)
 /*
  * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
@@ -82,12 +98,8 @@
 
 extern unsigned long ioremap_bot;
 
-/*
- * entries per page directory level: our page-table tree is two-level, so
- * we don't really have any PMD directory.
- */
-#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_SHIFT)
-#define PGD_TABLE_SIZE	(sizeof(pgd_t) << (32 - PGDIR_SHIFT))
+/* Bits to mask out from a PGD to get to the PUD page */
+#define PGD_MASKED_BITS		0
 
 #define pte_ERROR(e) \
 	pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
@@ -282,15 +294,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val(pte) >> 3 })
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val << 3 })
 
-#ifndef CONFIG_PPC_4K_PAGES
-void pgtable_cache_init(void);
-#else
-/*
- * No page table caches to initialise
- */
-#define pgtable_cache_init()	do { } while (0)
-#endif
-
 extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
 		      pmd_t **pmdp);
 
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 263bf39..3f85d43 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -786,9 +786,6 @@ extern struct page *pgd_page(pgd_t pgd);
 #define pgd_ERROR(e) \
 	pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
-void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
-void pgtable_cache_init(void);
-
 static inline int map_kernel_page(unsigned long ea, unsigned long pa,
 				  unsigned long flags)
 {
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index c5517f4..c201cd6 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -5,8 +5,6 @@
 #include <asm/page.h>
 #include <asm-generic/hugetlb.h>
 
-extern struct kmem_cache *hugepte_cache;
-
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #include <asm/book3s/64/hugetlb-radix.h>
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 76d6b9e..c2fe85c 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -2,14 +2,42 @@
 #define _ASM_POWERPC_PGALLOC_32_H
 
 #include <linux/threads.h>
+#include <linux/slab.h>
 
-/* For 32-bit, all levels of page tables are just drawn from get_free_page() */
-#define MAX_PGTABLE_INDEX_SIZE	0
+/*
+ * Functions that deal with pagetables that could be at any level of
+ * the table need to be passed an "index_size" so they know how to
+ * handle allocation.  For PTE pages (which are linked to a struct
+ * page for now, and drawn from the main get_free_pages() pool), the
+ * allocation size will be (2^index_size * sizeof(pointer)) and
+ * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
+ *
+ * The maximum index size needs to be big enough to allow any
+ * pagetable sizes we need, but small enough to fit in the low bits of
+ * any page table pointer.  In other words all pagetables, even tiny
+ * ones, must be aligned to allow at least enough low 0 bits to
+ * contain this value.  This value is also used as a mask, so it must
+ * be one less than a power of two.
+ */
+#define MAX_PGTABLE_INDEX_SIZE	0xf
 
 extern void __bad_pte(pmd_t *pmd);
 
-extern pgd_t *pgd_alloc(struct mm_struct *mm);
-extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
+extern struct kmem_cache *pgtable_cache[];
+#define PGT_CACHE(shift) ({				\
+			BUG_ON(!(shift));		\
+			pgtable_cache[(shift) - 1];	\
+		})
+
+static inline pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+	return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), GFP_KERNEL);
+}
+
+static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
+{
+	kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
+}
 
 /*
  * We don't have any real pmd's, and this code never triggers because
@@ -68,8 +96,12 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
-	BUG_ON(index_size); /* 32-bit doesn't use this */
-	free_page((unsigned long)table);
+	if (!index_size)
+		free_page((unsigned long)table);
+	else {
+		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
+		kmem_cache_free(PGT_CACHE(index_size), table);
+	}
 }
 
 #define check_pgt_cache()	do { } while (0)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 7808475..8a2937d 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -16,6 +16,26 @@ extern int icache_44x_need_flush;
 
 #endif /* __ASSEMBLY__ */
 
+#define PTE_INDEX_SIZE	PTE_SHIFT
+#define PMD_INDEX_SIZE	0
+#define PUD_INDEX_SIZE	0
+#define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)
+
+#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
+
+#ifndef __ASSEMBLY__
+#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_INDEX_SIZE)
+#define PMD_TABLE_SIZE	(sizeof(pmd_t) << PTE_INDEX_SIZE)
+#define PUD_TABLE_SIZE	(sizeof(pud_t) << PTE_INDEX_SIZE)
+#define PGD_TABLE_SIZE	(sizeof(pgd_t) << PGD_INDEX_SIZE)
+#endif	/* __ASSEMBLY__ */
+
+#define PTRS_PER_PTE	(1 << PTE_INDEX_SIZE)
+#define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
+
+/* With 4k base page size, hugepage PTEs go at the PMD level */
+#define MIN_HUGEPTE_SHIFT	PMD_SHIFT
+
 /*
  * The normal case is that PTEs are 32-bits and we have a 1-page
  * 1024-entry pgdir pointing to 1-page 1024-entry PTE pages.  -- paulus
@@ -27,22 +47,12 @@ extern int icache_44x_need_flush;
  * -Matt
  */
 /* PGDIR_SHIFT determines what a top-level page table entry can map */
-#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_SHIFT)
+#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
 #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
-/*
- * entries per page directory level: our page-table tree is two-level, so
- * we don't really have any PMD directory.
- */
-#ifndef __ASSEMBLY__
-#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_SHIFT)
-#define PGD_TABLE_SIZE	(sizeof(pgd_t) << (32 - PGDIR_SHIFT))
-#endif	/* __ASSEMBLY__ */
-
-#define PTRS_PER_PTE	(1 << PTE_SHIFT)
-#define PTRS_PER_PMD	1
-#define PTRS_PER_PGD	(1 << (32 - PGDIR_SHIFT))
+/* Bits to mask out from a PGD to get to the PUD page */
+#define PGD_MASKED_BITS		0
 
 #define USER_PTRS_PER_PGD	(TASK_SIZE / PGDIR_SIZE)
 #define FIRST_USER_ADDRESS	0UL
@@ -327,15 +337,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val(pte) >> 3 })
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val << 3 })
 
-#ifndef CONFIG_PPC_4K_PAGES
-void pgtable_cache_init(void);
-#else
-/*
- * No page table caches to initialise
- */
-#define pgtable_cache_init()	do { } while (0)
-#endif
-
 extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
 		      pmd_t **pmdp);
 
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index d4d808c..b0fc9e4 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -357,8 +357,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
 #define __swp_entry_to_pte(x)		__pte((x).val)
 
-void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
-void pgtable_cache_init(void);
 extern int map_kernel_page(unsigned long ea, unsigned long pa,
 			   unsigned long flags);
 extern int __meminit vmemmap_create_mapping(unsigned long start,
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9bd87f2..dd01212 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -78,6 +78,8 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 
 unsigned long vmalloc_to_phys(void *vmalloc_addr);
 
+void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
+void pgtable_cache_init(void);
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index f2cea6d..08bb010 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -7,7 +7,7 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
 
 obj-y				:= fault.o mem.o pgtable.o mmap.o \
-				   init_$(CONFIG_WORD_SIZE).o \
+				   init_$(CONFIG_WORD_SIZE).o init-common.o \
 				   pgtable_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o \
 				   tlb_nohash_low.o
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7372ee1..9164a77 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -68,7 +68,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 #ifdef CONFIG_PPC_FSL_BOOK3E
 	int i;
 	int num_hugepd = 1 << (pshift - pdshift);
-	cachep = hugepte_cache;
+	cachep = PGT_CACHE(1);
 #else
 	cachep = PGT_CACHE(pdshift - pshift);
 #endif
@@ -411,7 +411,7 @@ static void hugepd_free_rcu_callback(struct rcu_head *head)
 	unsigned int i;
 
 	for (i = 0; i < batch->index; i++)
-		kmem_cache_free(hugepte_cache, batch->ptes[i]);
+		kmem_cache_free(PGT_CACHE(1), batch->ptes[i]);
 
 	free_page((unsigned long)batch);
 }
@@ -425,7 +425,7 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
 	if (atomic_read(&tlb->mm->mm_users) < 2 ||
 	    cpumask_equal(mm_cpumask(tlb->mm),
 			  cpumask_of(smp_processor_id()))) {
-		kmem_cache_free(hugepte_cache, hugepte);
+		kmem_cache_free(PGT_CACHE(1), hugepte);
 		put_cpu_var(hugepd_freelist_cur);
 		return;
 	}
@@ -792,7 +792,6 @@ static int __init hugepage_setup_sz(char *str)
 __setup("hugepagesz=", hugepage_setup_sz);
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
-struct kmem_cache *hugepte_cache;
 static int __init hugetlbpage_init(void)
 {
 	int psize;
@@ -815,9 +814,8 @@ static int __init hugetlbpage_init(void)
 	 * Create a kmem cache for hugeptes.  The bottom bits in the pte have
 	 * size information encoded in them, so align them to allow this
 	 */
-	hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
-					   HUGEPD_SHIFT_MASK + 1, 0, NULL);
-	if (hugepte_cache == NULL)
+	pgtable_cache_add(1, NULL);
+	if (!PGT_CACHE(1))
 		panic("%s: Unable to create kmem cache for hugeptes\n",
 		      __func__);
 
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
new file mode 100644
index 0000000..2632eab
--- /dev/null
+++ b/arch/powerpc/mm/init-common.c
@@ -0,0 +1,152 @@
+/*
+ *  PowerPC version
+ *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
+ *
+ *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
+ *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
+ *    Copyright (C) 1996 Paul Mackerras
+ *
+ *  Derived from "arch/i386/mm/init.c"
+ *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
+ *
+ *  Dave Engebretsen <engebret@us.ibm.com>
+ *      Rework for PPC64 port.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#undef DEBUG
+
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/mman.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+#include <linux/stddef.h>
+#include <linux/vmalloc.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/highmem.h>
+#include <linux/idr.h>
+#include <linux/nodemask.h>
+#include <linux/module.h>
+#include <linux/poison.h>
+#include <linux/memblock.h>
+#include <linux/hugetlb.h>
+#include <linux/slab.h>
+
+#include <asm/pgalloc.h>
+#include <asm/page.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+#include <asm/io.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
+#include <asm/mmu.h>
+#include <asm/uaccess.h>
+#include <asm/smp.h>
+#include <asm/machdep.h>
+#include <asm/tlb.h>
+#include <asm/eeh.h>
+#include <asm/processor.h>
+#include <asm/mmzone.h>
+#include <asm/cputable.h>
+#include <asm/sections.h>
+#include <asm/iommu.h>
+#include <asm/vdso.h>
+
+#include "mmu_decl.h"
+
+phys_addr_t memstart_addr = (phys_addr_t)~0ull;
+EXPORT_SYMBOL_GPL(memstart_addr);
+phys_addr_t kernstart_addr;
+EXPORT_SYMBOL_GPL(kernstart_addr);
+
+static void pgd_ctor(void *addr)
+{
+	memset(addr, 0, PGD_TABLE_SIZE);
+}
+
+static void pud_ctor(void *addr)
+{
+	memset(addr, 0, PUD_TABLE_SIZE);
+}
+
+static void pmd_ctor(void *addr)
+{
+	memset(addr, 0, PMD_TABLE_SIZE);
+}
+
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
+
+/*
+ * Create a kmem_cache() for pagetables.  This is not used for PTE
+ * pages - they're linked to struct page, come from the normal free
+ * pages pool and have a different entry size (see real_pte_t) to
+ * everything else.  Caches created by this function are used for all
+ * the higher level pagetables, and for hugepage pagetables.
+ */
+void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
+{
+	char *name;
+	unsigned long table_size = sizeof(void *) << shift;
+	unsigned long align = table_size;
+
+	/* When batching pgtable pointers for RCU freeing, we store
+	 * the index size in the low bits.  Table alignment must be
+	 * big enough to fit it.
+	 *
+	 * Likewise, hugepage pagetable pointers contain a (different)
+	 * shift value in the low bits.  All tables must be aligned so
+	 * as to leave enough 0 bits in the address to contain it. */
+	unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
+				     HUGEPD_SHIFT_MASK + 1);
+	struct kmem_cache *new;
+
+	/* It would be nice if this was a BUILD_BUG_ON(), but at the
+	 * moment, gcc doesn't seem to recognize is_power_of_2 as a
+	 * constant expression, so so much for that. */
+	BUG_ON(!is_power_of_2(minalign));
+	BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
+
+	if (PGT_CACHE(shift))
+		return; /* Already have a cache of this size */
+
+	align = max_t(unsigned long, align, minalign);
+	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
+	new = kmem_cache_create(name, table_size, align, 0, ctor);
+	kfree(name);
+	pgtable_cache[shift - 1] = new;
+	pr_debug("Allocated pgtable cache for order %d\n", shift);
+}
+
+
+void pgtable_cache_init(void)
+{
+	pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
+
+	if (PMD_INDEX_SIZE && !PGT_CACHE(PMD_INDEX_SIZE))
+		pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
+	/*
+	 * In all current configs, when the PUD index exists it's the
+	 * same size as either the pgd or pmd index except with THP enabled
+	 * on book3s 64
+	 */
+	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
+		pgtable_cache_add(PUD_INDEX_SIZE, pud_ctor);
+
+	if (!PGT_CACHE(PGD_INDEX_SIZE))
+		panic("Couldn't allocate pgd cache");
+	if (PMD_INDEX_SIZE && !PGT_CACHE(PMD_INDEX_SIZE))
+		panic("Couldn't allocate pmd pgtable caches");
+	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
+		panic("Couldn't allocate pud pgtable caches");
+}
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 448685f..79c24d4 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -59,11 +59,6 @@
 phys_addr_t total_memory;
 phys_addr_t total_lowmem;
 
-phys_addr_t memstart_addr = (phys_addr_t)~0ull;
-EXPORT_SYMBOL(memstart_addr);
-phys_addr_t kernstart_addr;
-EXPORT_SYMBOL(kernstart_addr);
-
 #ifdef CONFIG_RELOCATABLE
 /* Used in __va()/__pa() */
 long long virt_phys_offset;
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 16ada1e..4acd546 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -75,88 +75,6 @@
 #endif
 #endif /* CONFIG_PPC_STD_MMU_64 */
 
-phys_addr_t memstart_addr = ~0;
-EXPORT_SYMBOL_GPL(memstart_addr);
-phys_addr_t kernstart_addr;
-EXPORT_SYMBOL_GPL(kernstart_addr);
-
-static void pgd_ctor(void *addr)
-{
-	memset(addr, 0, PGD_TABLE_SIZE);
-}
-
-static void pud_ctor(void *addr)
-{
-	memset(addr, 0, PUD_TABLE_SIZE);
-}
-
-static void pmd_ctor(void *addr)
-{
-	memset(addr, 0, PMD_TABLE_SIZE);
-}
-
-struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
-
-/*
- * Create a kmem_cache() for pagetables.  This is not used for PTE
- * pages - they're linked to struct page, come from the normal free
- * pages pool and have a different entry size (see real_pte_t) to
- * everything else.  Caches created by this function are used for all
- * the higher level pagetables, and for hugepage pagetables.
- */
-void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
-{
-	char *name;
-	unsigned long table_size = sizeof(void *) << shift;
-	unsigned long align = table_size;
-
-	/* When batching pgtable pointers for RCU freeing, we store
-	 * the index size in the low bits.  Table alignment must be
-	 * big enough to fit it.
-	 *
-	 * Likewise, hugeapge pagetable pointers contain a (different)
-	 * shift value in the low bits.  All tables must be aligned so
-	 * as to leave enough 0 bits in the address to contain it. */
-	unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
-				     HUGEPD_SHIFT_MASK + 1);
-	struct kmem_cache *new;
-
-	/* It would be nice if this was a BUILD_BUG_ON(), but at the
-	 * moment, gcc doesn't seem to recognize is_power_of_2 as a
-	 * constant expression, so so much for that. */
-	BUG_ON(!is_power_of_2(minalign));
-	BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
-
-	if (PGT_CACHE(shift))
-		return; /* Already have a cache of this size */
-
-	align = max_t(unsigned long, align, minalign);
-	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
-	new = kmem_cache_create(name, table_size, align, 0, ctor);
-	kfree(name);
-	pgtable_cache[shift - 1] = new;
-	pr_debug("Allocated pgtable cache for order %d\n", shift);
-}
-
-
-void pgtable_cache_init(void)
-{
-	pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
-	pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
-	/*
-	 * In all current configs, when the PUD index exists it's the
-	 * same size as either the pgd or pmd index except with THP enabled
-	 * on book3s 64
-	 */
-	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
-		pgtable_cache_add(PUD_INDEX_SIZE, pud_ctor);
-
-	if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_CACHE_INDEX))
-		panic("Couldn't allocate pgtable caches");
-	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
-		panic("Couldn't allocate pud pgtable caches");
-}
-
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Given an address within the vmemmap, determine the pfn of the page that
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 0ae0572..a65c0b4 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -42,43 +42,6 @@ EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
 
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
-#define PGDIR_ORDER	(32 + PGD_T_LOG2 - PGDIR_SHIFT)
-
-#ifndef CONFIG_PPC_4K_PAGES
-static struct kmem_cache *pgtable_cache;
-
-void pgtable_cache_init(void)
-{
-	pgtable_cache = kmem_cache_create("PGDIR cache", 1 << PGDIR_ORDER,
-					  1 << PGDIR_ORDER, 0, NULL);
-	if (pgtable_cache == NULL)
-		panic("Couldn't allocate pgtable caches");
-}
-#endif
-
-pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-	pgd_t *ret;
-
-	/* pgdir take page or two with 4K pages and a page fraction otherwise */
-#ifndef CONFIG_PPC_4K_PAGES
-	ret = kmem_cache_alloc(pgtable_cache, GFP_KERNEL | __GFP_ZERO);
-#else
-	ret = (pgd_t *)__get_free_pages(GFP_KERNEL|__GFP_ZERO,
-			PGDIR_ORDER - PAGE_SHIFT);
-#endif
-	return ret;
-}
-
-void pgd_free(struct mm_struct *mm, pgd_t *pgd)
-{
-#ifndef CONFIG_PPC_4K_PAGES
-	kmem_cache_free(pgtable_cache, (void *)pgd);
-#else
-	free_pages((unsigned long)pgd, PGDIR_ORDER - PAGE_SHIFT);
-#endif
-}
-
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
 	pte_t *pte;
-- 
2.1.0
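
The alignment rule in pgtable_cache_add() exists because a page-table
pointer queued for RCU freeing carries its index size in its low bits,
with MAX_PGTABLE_INDEX_SIZE (one less than a power of two) doubling as
the mask. A minimal userland sketch of that tagging, assuming a
16-byte minimum alignment (MAX_PGTABLE_INDEX_SIZE + 1) and ignoring
the HUGEPD_SHIFT_MASK term that the real minalign also considers; the
helpers pgt_tag(), pgt_untag_table(), pgt_untag_shift() and
pgt_alloc_table() are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of low-bit tagging of page-table pointers. */
#define MAX_PGTABLE_INDEX_SIZE 0xf

/* OR the index size into the low bits of a suitably aligned table. */
static uintptr_t pgt_tag(void *table, unsigned int shift)
{
	uintptr_t p = (uintptr_t)table;

	assert(shift <= MAX_PGTABLE_INDEX_SIZE);
	assert((p & MAX_PGTABLE_INDEX_SIZE) == 0);	/* low bits must be free */
	return p | shift;
}

/* Recover the table address by masking the tag off. */
static void *pgt_untag_table(uintptr_t tagged)
{
	return (void *)(tagged & ~(uintptr_t)MAX_PGTABLE_INDEX_SIZE);
}

/* Recover the index size stored in the low bits. */
static unsigned int pgt_untag_shift(uintptr_t tagged)
{
	return (unsigned int)(tagged & MAX_PGTABLE_INDEX_SIZE);
}

/* Allocate a table aligned to MAX_PGTABLE_INDEX_SIZE + 1 (16) bytes. */
static void *pgt_alloc_table(size_t size)
{
	size_t align = MAX_PGTABLE_INDEX_SIZE + 1;

	/* C11 aligned_alloc() wants size to be a multiple of align. */
	return aligned_alloc(align, (size + align - 1) / align * align);
}
```

Because MAX_PGTABLE_INDEX_SIZE is now 0xf on 32-bit (it was 0), any
index_size from 1 to 15 fits in the tag, which is what lets
pgtable_free() accept a non-zero index_size instead of hitting BUG_ON().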


* [PATCH 2/6] powerpc: fix usage of _PAGE_RO in hugepage
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
  2016-08-12 16:55 ` [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits Christophe Leroy
@ 2016-08-12 16:55 ` Christophe Leroy
  2016-08-12 16:55 ` [PATCH 3/6] powerpc/8xx: use r3 to scratch CR in ITLBmiss Christophe Leroy
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

Fixes: a7b9f671f2d14 ("powerpc32: adds handling of _PAGE_RO")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 2 ++
 arch/powerpc/mm/hugetlbpage.c                | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 3f85d43..7873b09 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -6,6 +6,8 @@
  */
 #define _PAGE_BIT_SWAP_TYPE	0
 
+#define _PAGE_RO		0
+
 #define _PAGE_EXEC		0x00001 /* execute permission */
 #define _PAGE_WRITE		0x00002 /* write access allowed */
 #define _PAGE_READ		0x00004	/* read access allowed */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 9164a77..03fcb7e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -1019,6 +1019,8 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	mask = _PAGE_PRESENT | _PAGE_READ;
 	if (write)
 		mask |= _PAGE_WRITE;
+	else
+		mask |= _PAGE_RO;
 
 	if ((pte_val(pte) & mask) != mask)
 		return 0;
-- 
2.1.0


* [PATCH 3/6] powerpc/8xx: use r3 to scratch CR in ITLBmiss
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
  2016-08-12 16:55 ` [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits Christophe Leroy
  2016-08-12 16:55 ` [PATCH 2/6] powerpc: fix usage of _PAGE_RO in hugepage Christophe Leroy
@ 2016-08-12 16:55 ` Christophe Leroy
  2016-08-12 16:55 ` [PATCH 4/6] powerpc/8xx: Move additional DTLBMiss handlers out of exception area Christophe Leroy
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/head_8xx.S | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 43ddaae..708fd43 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -322,7 +322,7 @@ SystemCall:
 #endif
 
 InstructionTLBMiss:
-#ifdef CONFIG_8xx_CPU6
+#if defined(CONFIG_8xx_CPU6) || defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
 	mtspr	SPRN_SPRG_SCRATCH2, r3
 #endif
 	EXCEPTION_PROLOG_0
@@ -330,23 +330,20 @@ InstructionTLBMiss:
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
+	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
+	INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
 #if defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
 	/* Only modules will cause ITLB Misses as we always
 	 * pin the first 8MB of kernel memory */
-	mfspr	r11, SPRN_SRR0	/* Get effective address of fault */
-	INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
-	mfcr	r10
-	IS_KERNEL(r11, r11)
+	mfcr	r3
+	IS_KERNEL(r11, r10)
+#endif
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
+#if defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r10
-	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
-#else
-	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
-	INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
-	mfspr	r11, SPRN_M_TW	/* Get level 1 table base address */
+	mtcr	r3
 #endif
 	/* Insert level 1 index */
 	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
@@ -378,7 +375,7 @@ InstructionTLBMiss:
 	MTSPR_CPU6(SPRN_MI_RPN, r10, r3)	/* Update TLB entry */
 
 	/* Restore registers */
-#ifdef CONFIG_8xx_CPU6
+#if defined(CONFIG_8xx_CPU6) || defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
 	mfspr	r3, SPRN_SPRG_SCRATCH2
 #endif
 	EXCEPTION_EPILOG_0
-- 
2.1.0

* [PATCH 4/6] powerpc/8xx: Move additional DTLBMiss handlers out of exception area
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
                   ` (2 preceding siblings ...)
  2016-08-12 16:55 ` [PATCH 3/6] powerpc/8xx: use r3 to scratch CR in ITLBmiss Christophe Leroy
@ 2016-08-12 16:55 ` Christophe Leroy
  2016-08-12 16:55 ` [PATCH 5/6] powerpc/8xx: make user addr DTLB miss the short path Christophe Leroy
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

When all options are activated, there is not enough space in the
exception area for the DTLBMiss handlers that handle the IMMR area
and linear RAM pages. So let's move them after 0x2000.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/head_8xx.S | 84 +++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 708fd43..5f122e6 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -381,26 +381,6 @@ InstructionTLBMiss:
 	EXCEPTION_EPILOG_0
 	rfi
 
-/*
- * Bottom part of DataStoreTLBMiss handler for IMMR area
- * not enough space in the DataStoreTLBMiss area
- */
-DTLBMissIMMR:
-	mtcr	r10
-	/* Set 512k byte guarded page and mark it valid */
-	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
-	MTSPR_CPU6(SPRN_MD_TWC, r10, r11)
-	mfspr	r10, SPRN_IMMR			/* Get current IMMR */
-	rlwinm	r10, r10, 0, 0xfff80000		/* Get 512 kbytes boundary */
-	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
-			  _PAGE_PRESENT | _PAGE_NO_CACHE
-	MTSPR_CPU6(SPRN_MD_RPN, r10, r11)	/* Update TLB entry */
-
-	li	r11, RPN_PATTERN
-	mtspr	SPRN_DAR, r11	/* Tag DAR */
-	EXCEPTION_EPILOG_0
-	rfi
-
 	. = 0x1200
 DataStoreTLBMiss:
 	EXCEPTION_PROLOG_0
@@ -419,7 +399,7 @@ DataStoreTLBMiss:
 _ENTRY(DTLBMiss_jmp)
 	beq-	DTLBMissIMMR
 #endif
-	bge-	cr7, 4f
+	bge-	cr7, DTLBMissLinear
 
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
 3:
@@ -486,27 +466,6 @@ _ENTRY(DTLBMiss_jmp)
 	EXCEPTION_EPILOG_0
 	rfi
 
-4:
-_ENTRY(DTLBMiss_cmp)
-	cmpli	cr0, r11, (PAGE_OFFSET + 0x1800000)@h
-	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
-	bge-	3b
-
-	mtcr	r10
-	/* Set 8M byte page and mark it valid */
-	li	r10, MD_PS8MEG | MD_SVALID
-	MTSPR_CPU6(SPRN_MD_TWC, r10, r11)
-	mfspr	r10, SPRN_MD_EPN
-	rlwinm	r10, r10, 0, 0x0f800000		/* 8xx supports max 256Mb RAM */
-	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
-			  _PAGE_PRESENT
-	MTSPR_CPU6(SPRN_MD_RPN, r10, r11)	/* Update TLB entry */
-
-	li	r11, RPN_PATTERN
-	mtspr	SPRN_DAR, r11	/* Tag DAR */
-	EXCEPTION_EPILOG_0
-	rfi
-
 
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
@@ -568,6 +527,47 @@ DARFixed:/* Return from dcbx instruction bug workaround */
 
 	. = 0x2000
 
+/*
+ * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM.
+ * not enough space in the DataStoreTLBMiss area.
+ */
+DTLBMissIMMR:
+	mtcr	r10
+	/* Set 512k byte guarded page and mark it valid */
+	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
+	MTSPR_CPU6(SPRN_MD_TWC, r10, r11)
+	mfspr	r10, SPRN_IMMR			/* Get current IMMR */
+	rlwinm	r10, r10, 0, 0xfff80000		/* Get 512 kbytes boundary */
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
+			  _PAGE_PRESENT | _PAGE_NO_CACHE
+	MTSPR_CPU6(SPRN_MD_RPN, r10, r11)	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	EXCEPTION_EPILOG_0
+	rfi
+
+DTLBMissLinear:
+_ENTRY(DTLBMiss_cmp)
+	cmpli	cr0, r11, (PAGE_OFFSET + 0x1800000)@h
+	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+	bge-	3b
+
+	mtcr	r10
+	/* Set 8M byte page and mark it valid */
+	li	r10, MD_PS8MEG | MD_SVALID
+	MTSPR_CPU6(SPRN_MD_TWC, r10, r11)
+	mfspr	r10, SPRN_MD_EPN
+	rlwinm	r10, r10, 0, 0x0f800000		/* 8xx supports max 256Mb RAM */
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
+			  _PAGE_PRESENT
+	MTSPR_CPU6(SPRN_MD_RPN, r10, r11)	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	EXCEPTION_EPILOG_0
+	rfi
+
 /* This is the procedure to calculate the data EA for buggy dcbx,dcbi instructions
  * by decoding the registers used by the dcbx instruction and adding them.
  * DAR is set to the calculated address.
-- 
2.1.0

* [PATCH 5/6] powerpc/8xx: make user addr DTLB miss the short path
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
                   ` (3 preceding siblings ...)
  2016-08-12 16:55 ` [PATCH 4/6] powerpc/8xx: Move additional DTLBMiss handlers out of exception area Christophe Leroy
@ 2016-08-12 16:55 ` Christophe Leroy
  2016-08-12 16:55 ` [PATCH 6/6] powerpc/8xx: implementation of huge pages Christophe Leroy
  2016-08-14 14:27 ` [PATCH 0/6] " Aneesh Kumar K.V
  6 siblings, 0 replies; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

User space DTLB misses represent approximately 90% of TLB misses,
so make them the shortest path.

Also remove an unnecessary double jump in FixupDAR.

Before this patch, we spend 3.3 timebase (TB) ticks in the handler
for each user address miss and 3.4 TB ticks for each kernel address
miss. After this patch, we spend 3.0 TB ticks in the handler for each
user address miss and 3.9 TB ticks for each kernel address miss.
Taking into account that user misses represent 90% of the total,
this patch provides an improvement of approx. 9%.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/head_8xx.S | 53 ++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 5f122e6..5ce67f2 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -383,30 +383,31 @@ InstructionTLBMiss:
 
 	. = 0x1200
 DataStoreTLBMiss:
+	mtspr	SPRN_SPRG_SCRATCH2, r3
 	EXCEPTION_PROLOG_0
-	mfcr	r10
+	mfcr	r3
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
-	mfspr	r11, SPRN_MD_EPN
-	rlwinm	r11, r11, 16, 0xfff8
+	mfspr	r10, SPRN_MD_EPN
+	rlwinm	r10, r10, 16, 0xfff8
+	cmpli	cr0, r10, PAGE_OFFSET@h
+	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
+	blt+	3f
 #ifndef CONFIG_PIN_TLB_IMMR
-	cmpli	cr0, r11, VIRT_IMMR_BASE@h
+	cmpli	cr0, r10, VIRT_IMMR_BASE@h
 #endif
-	cmpli	cr7, r11, PAGE_OFFSET@h
+_ENTRY(DTLBMiss_cmp)
+	cmpli	cr7, r10, (PAGE_OFFSET + 0x1800000)@h
+	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 #ifndef CONFIG_PIN_TLB_IMMR
 _ENTRY(DTLBMiss_jmp)
 	beq-	DTLBMissIMMR
 #endif
-	bge-	cr7, DTLBMissLinear
-
-	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
+	blt	cr7, DTLBMissLinear
 3:
-	mtcr	r10
-#ifdef CONFIG_8xx_CPU6
-	mtspr	SPRN_SPRG_SCRATCH2, r3
-#endif
+	mtcr	r3
 	mfspr	r10, SPRN_MD_EPN
 
 	/* Insert level 1 index */
@@ -459,9 +460,7 @@ _ENTRY(DTLBMiss_jmp)
 	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
 
 	/* Restore registers */
-#ifdef CONFIG_8xx_CPU6
 	mfspr	r3, SPRN_SPRG_SCRATCH2
-#endif
 	mtspr	SPRN_DAR, r11	/* Tag DAR */
 	EXCEPTION_EPILOG_0
 	rfi
@@ -532,7 +531,7 @@ DARFixed:/* Return from dcbx instruction bug workaround */
  * not enough space in the DataStoreTLBMiss area.
  */
 DTLBMissIMMR:
-	mtcr	r10
+	mtcr	r3
 	/* Set 512k byte guarded page and mark it valid */
 	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
 	MTSPR_CPU6(SPRN_MD_TWC, r10, r11)
@@ -544,27 +543,23 @@ DTLBMissIMMR:
 
 	li	r11, RPN_PATTERN
 	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r3, SPRN_SPRG_SCRATCH2
 	EXCEPTION_EPILOG_0
 	rfi
 
 DTLBMissLinear:
-_ENTRY(DTLBMiss_cmp)
-	cmpli	cr0, r11, (PAGE_OFFSET + 0x1800000)@h
-	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
-	bge-	3b
-
-	mtcr	r10
+	mtcr	r3
 	/* Set 8M byte page and mark it valid */
-	li	r10, MD_PS8MEG | MD_SVALID
-	MTSPR_CPU6(SPRN_MD_TWC, r10, r11)
-	mfspr	r10, SPRN_MD_EPN
-	rlwinm	r10, r10, 0, 0x0f800000		/* 8xx supports max 256Mb RAM */
+	li	r11, MD_PS8MEG | MD_SVALID
+	MTSPR_CPU6(SPRN_MD_TWC, r11, r3)
+	rlwinm	r10, r10, 16, 0x0f800000	/* 8xx supports max 256Mb RAM */
 	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
 			  _PAGE_PRESENT
 	MTSPR_CPU6(SPRN_MD_RPN, r10, r11)	/* Update TLB entry */
 
 	li	r11, RPN_PATTERN
 	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r3, SPRN_SPRG_SCRATCH2
 	EXCEPTION_EPILOG_0
 	rfi
 
@@ -584,7 +579,9 @@ FixupDAR:/* Entry point for dcbx workaround. */
 	rlwinm	r11, r10, 16, 0xfff8
 _ENTRY(FixupDAR_cmp)
 	cmpli	cr7, r11, (PAGE_OFFSET + 0x1800000)@h
-	blt-	cr7, 200f
+	/* create physical page address from effective address */
+	tophys(r11, r10)
+	blt-	cr7, 201f
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 	/* Insert level 1 index */
 3:	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
@@ -614,10 +611,6 @@ _ENTRY(FixupDAR_cmp)
 141:	mfspr	r10,SPRN_SPRG_SCRATCH2
 	b	DARFixed	/* Nope, go back to normal TLB processing */
 
-	/* create physical page address from effective address */
-200:	tophys(r11, r10)
-	b	201b
-
 144:	mfspr	r10, SPRN_DSISR
 	rlwinm	r10, r10,0,7,5	/* Clear store bit for buggy dcbst insn */
 	mtspr	SPRN_DSISR, r10
-- 
2.1.0

* [PATCH 6/6] powerpc/8xx: implementation of huge pages
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
                   ` (4 preceding siblings ...)
  2016-08-12 16:55 ` [PATCH 5/6] powerpc/8xx: make user addr DTLB miss the short path Christophe Leroy
@ 2016-08-12 16:55 ` Christophe Leroy
  2016-08-14 14:25   ` Aneesh Kumar K.V
  2016-08-14 14:27 ` [PATCH 0/6] " Aneesh Kumar K.V
  6 siblings, 1 reply; 16+ messages in thread
From: Christophe Leroy @ 2016-08-12 16:55 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

The 8xx has 512k and 8M pages. This patch implements hugepages using
those sizes.

On the 8xx, the page size is encoded in the PGD entry,
in the PS field (bits 28-29):
00 : Small pages (4k or 16k)
01 : 512k pages
10 : reserved
11 : 8M pages

The implementation uses a mix of what is done on BOOK3S and BOOK3E:
512k pages are in HUGEPTE tables, while for 8M pages several PGD
entries point to the same leaf HUGEPTE entry.

For the time being, the CPU15 erratum workaround is not supported
when HUGETLB is selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/hugetlb.h           |  18 ++-
 arch/powerpc/include/asm/mmu-8xx.h           |  35 ++++++
 arch/powerpc/include/asm/mmu.h               |  25 ++--
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |   1 +
 arch/powerpc/include/asm/nohash/pgtable.h    |   4 +
 arch/powerpc/include/asm/reg_8xx.h           |   2 +-
 arch/powerpc/kernel/head_8xx.S               | 119 +++++++++++++++++-
 arch/powerpc/mm/hugetlbpage.c                | 176 ++++++++++-----------------
 arch/powerpc/mm/tlb_nohash.c                 |  21 +++-
 arch/powerpc/platforms/8xx/Kconfig           |   1 +
 arch/powerpc/platforms/Kconfig.cputype       |   1 +
 11 files changed, 267 insertions(+), 136 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index c201cd6..96b3219 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -49,12 +49,20 @@ static inline void __local_flush_hugetlb_page(struct vm_area_struct *vma,
 static inline pte_t *hugepd_page(hugepd_t hpd)
 {
 	BUG_ON(!hugepd_ok(hpd));
+#ifdef CONFIG_PPC_8xx
+	return (pte_t *)__va(hpd.pd & ~(_PMD_PAGE_MASK | _PMD_PRESENT_MASK));
+#else
 	return (pte_t *)((hpd.pd & ~HUGEPD_SHIFT_MASK) | PD_HUGE);
+#endif
 }
 
 static inline unsigned int hugepd_shift(hugepd_t hpd)
 {
+#ifdef CONFIG_PPC_8xx
+	return ((hpd.pd & _PMD_PAGE_MASK) >> 1) + 17;
+#else
 	return hpd.pd & HUGEPD_SHIFT_MASK;
+#endif
 }
 
 #endif /* CONFIG_PPC_BOOK3S_64 */
@@ -97,7 +105,14 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 
 void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 			    pte_t pte);
+#ifdef CONFIG_PPC_8xx
+static inline void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+	flush_tlb_page(vma, vmaddr);
+}
+#else
 void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
+#endif
 
 void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 			    unsigned long end, unsigned long floor,
@@ -203,7 +218,8 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
  * are reserved early in the boot process by memblock instead of via
  * the .dts as on IBM platforms.
  */
-#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_PPC_FSL_BOOK3E)
+#if defined(CONFIG_HUGETLB_PAGE) && (defined(CONFIG_PPC_FSL_BOOK3E) || \
+    defined(CONFIG_PPC_8xx))
 extern void __init reserve_hugetlb_gpages(void);
 #else
 static inline void reserve_hugetlb_gpages(void)
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 3e0e492..3179688 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -172,6 +172,41 @@ typedef struct {
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff80000)
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
+
+/* Page size definitions, common between 32 and 64-bit
+ *
+ *    shift : is the "PAGE_SHIFT" value for that page size
+ *    penc  : is the pte encoding mask
+ *
+ */
+struct mmu_psize_def
+{
+	unsigned int	shift;	/* number of bits */
+	unsigned int	enc;	/* PTE encoding */
+	unsigned int    ind;    /* Corresponding indirect page size shift */
+	unsigned int	flags;
+#define MMU_PAGE_SIZE_DIRECT	0x1	/* Supported as a direct size */
+#define MMU_PAGE_SIZE_INDIRECT	0x2	/* Supported as an indirect size */
+};
+extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
+
+static inline int shift_to_mmu_psize(unsigned int shift)
+{
+	int psize;
+
+	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize)
+		if (mmu_psize_defs[psize].shift == shift)
+			return psize;
+	return -1;
+}
+
+static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
+{
+	if (mmu_psize_defs[mmu_psize].shift)
+		return mmu_psize_defs[mmu_psize].shift;
+	BUG();
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #if defined(CONFIG_PPC_4K_PAGES)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e2fb408..beccfbe 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -260,18 +260,19 @@ static inline bool early_radix_enabled(void)
 #define MMU_PAGE_64K	2
 #define MMU_PAGE_64K_AP	3	/* "Admixed pages" (hash64 only) */
 #define MMU_PAGE_256K	4
-#define MMU_PAGE_1M	5
-#define MMU_PAGE_2M	6
-#define MMU_PAGE_4M	7
-#define MMU_PAGE_8M	8
-#define MMU_PAGE_16M	9
-#define MMU_PAGE_64M	10
-#define MMU_PAGE_256M	11
-#define MMU_PAGE_1G	12
-#define MMU_PAGE_16G	13
-#define MMU_PAGE_64G	14
-
-#define MMU_PAGE_COUNT	15
+#define MMU_PAGE_512K	5
+#define MMU_PAGE_1M	6
+#define MMU_PAGE_2M	7
+#define MMU_PAGE_4M	8
+#define MMU_PAGE_8M	9
+#define MMU_PAGE_16M	10
+#define MMU_PAGE_64M	11
+#define MMU_PAGE_256M	12
+#define MMU_PAGE_1G	13
+#define MMU_PAGE_16G	14
+#define MMU_PAGE_64G	15
+
+#define MMU_PAGE_COUNT	16
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include <asm/book3s/64/mmu.h>
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 3742b19..b4df273 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -49,6 +49,7 @@
 #define _PMD_BAD	0x0ff0
 #define _PMD_PAGE_MASK	0x000c
 #define _PMD_PAGE_8M	0x000c
+#define _PMD_PAGE_512K	0x0004
 
 /* Until my rework is finished, 8xx still needs atomic PTE updates */
 #define PTE_ATOMIC_UPDATES	1
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 1263c22..1728497 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -226,7 +226,11 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 #ifdef CONFIG_HUGETLB_PAGE
 static inline int hugepd_ok(hugepd_t hpd)
 {
+#ifdef CONFIG_PPC_8xx
+	return ((hpd.pd & 0x4) != 0);
+#else
 	return (hpd.pd > 0);
+#endif
 }
 
 static inline int pmd_huge(pmd_t pmd)
diff --git a/arch/powerpc/include/asm/reg_8xx.h b/arch/powerpc/include/asm/reg_8xx.h
index 94d01f8..feaf641 100644
--- a/arch/powerpc/include/asm/reg_8xx.h
+++ b/arch/powerpc/include/asm/reg_8xx.h
@@ -4,7 +4,7 @@
 #ifndef _ASM_POWERPC_REG_8xx_H
 #define _ASM_POWERPC_REG_8xx_H
 
-#include <asm/mmu-8xx.h>
+#include <asm/mmu.h>
 
 /* Cache control on the MPC8xx is provided through some additional
  * special purpose registers.
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 5ce67f2..c77e0c6 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -72,6 +72,9 @@
 #define RPN_PATTERN	0x00f0
 #endif
 
+#define PAGE_SHIFT_512K		19
+#define PAGE_SHIFT_8M		23
+
 	__HEAD
 _ENTRY(_stext);
 _ENTRY(_start);
@@ -322,7 +325,7 @@ SystemCall:
 #endif
 
 InstructionTLBMiss:
-#if defined(CONFIG_8xx_CPU6) || defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
+#if defined(CONFIG_8xx_CPU6) || defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC) || defined (CONFIG_HUGETLB_PAGE)
 	mtspr	SPRN_SPRG_SCRATCH2, r3
 #endif
 	EXCEPTION_PROLOG_0
@@ -332,10 +335,12 @@ InstructionTLBMiss:
 	 */
 	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
 	INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
-#if defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
 	/* Only modules will cause ITLB Misses as we always
 	 * pin the first 8MB of kernel memory */
+#if defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC) || defined (CONFIG_HUGETLB_PAGE)
 	mfcr	r3
+#endif
+#if defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
 	IS_KERNEL(r11, r10)
 #endif
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
@@ -343,7 +348,6 @@ InstructionTLBMiss:
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r3
 #endif
 	/* Insert level 1 index */
 	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
@@ -351,14 +355,25 @@ InstructionTLBMiss:
 
 	/* Extract level 2 index */
 	rlwinm	r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
+#ifdef CONFIG_HUGETLB_PAGE
+	mtcr	r11
+	bt-	28, 10f		/* bit 28 = Large page (8M) */
+	bt-	29, 20f		/* bit 29 = Large page (8M or 512k) */
+#endif
 	rlwimi	r10, r11, 0, 0, 32 - PAGE_SHIFT - 1	/* Add level 2 base */
 	lwz	r10, 0(r10)	/* Get the pte */
-
+4:
+#if defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC) || defined (CONFIG_HUGETLB_PAGE)
+	mtcr	r3
+#endif
 	/* Insert the APG into the TWC from the Linux PTE. */
 	rlwimi	r11, r10, 0, 25, 26
 	/* Load the MI_TWC with the attributes for this "segment." */
 	MTSPR_CPU6(SPRN_MI_TWC, r11, r3)	/* Set segment attributes */
 
+#if defined (CONFIG_HUGETLB_PAGE) && defined (CONFIG_PPC_4K_PAGES)
+	rlwimi	r10, r11, 1, MI_SPS16K
+#endif
 #ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
 	and	r11, r11, r10
@@ -371,16 +386,45 @@ InstructionTLBMiss:
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
+#if defined (CONFIG_HUGETLB_PAGE) && defined (CONFIG_PPC_4K_PAGES)
+	rlwimi	r10, r11, 0, 0x0ff0	/* Set 24-27, clear 20-23 */
+#else
 	rlwimi	r10, r11, 0, 0x0ff8	/* Set 24-27, clear 20-23,28 */
+#endif
 	MTSPR_CPU6(SPRN_MI_RPN, r10, r3)	/* Update TLB entry */
 
 	/* Restore registers */
-#if defined(CONFIG_8xx_CPU6) || defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC)
+#if defined(CONFIG_8xx_CPU6) || defined(CONFIG_MODULES) || defined (CONFIG_DEBUG_PAGEALLOC) || defined (CONFIG_HUGETLB_PAGE)
 	mfspr	r3, SPRN_SPRG_SCRATCH2
 #endif
 	EXCEPTION_EPILOG_0
 	rfi
 
+#ifdef CONFIG_HUGETLB_PAGE
+10:	/* 8M pages */
+#ifdef CONFIG_PPC_16K_PAGES
+	/* Extract level 2 index */
+	rlwinm	r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
+	/* Add level 2 base */
+	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
+#else
+	/* Level 2 base */
+	rlwinm	r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1 - 1
+#endif
+	lwz	r10, 0(r10)	/* Get the pte */
+	rlwinm	r11, r11, 0, 0xf
+	b	4b
+
+20:	/* 512k pages */
+	/* Extract level 2 index */
+	rlwinm	r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
+	/* Add level 2 base */
+	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
+	lwz	r10, 0(r10)	/* Get the pte */
+	rlwinm	r11, r11, 0, 0xf
+	b	4b
+#endif
+
 	. = 0x1200
 DataStoreTLBMiss:
 	mtspr	SPRN_SPRG_SCRATCH2, r3
@@ -407,7 +451,6 @@ _ENTRY(DTLBMiss_jmp)
 #endif
 	blt	cr7, DTLBMissLinear
 3:
-	mtcr	r3
 	mfspr	r10, SPRN_MD_EPN
 
 	/* Insert level 1 index */
@@ -418,8 +461,15 @@ _ENTRY(DTLBMiss_jmp)
 	 */
 	/* Extract level 2 index */
 	rlwinm	r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
+#ifdef CONFIG_HUGETLB_PAGE
+	mtcr	r11
+	bt-	28, 10f		/* bit 28 = Large page (8M) */
+	bt-	29, 20f		/* bit 29 = Large page (8M or 512k) */
+#endif
 	rlwimi	r10, r11, 0, 0, 32 - PAGE_SHIFT - 1	/* Add level 2 base */
 	lwz	r10, 0(r10)	/* Get the pte */
+4:
+	mtcr	r3
 
 	/* Insert the Guarded flag and APG into the TWC from the Linux PTE.
 	 * It is bit 26-27 of both the Linux PTE and the TWC (at least
@@ -434,6 +484,11 @@ _ENTRY(DTLBMiss_jmp)
 	rlwimi	r11, r10, 32-5, 30, 30
 	MTSPR_CPU6(SPRN_MD_TWC, r11, r3)
 
+	/* In 4k pages mode, SPS (bit 28) in RPN must match PS[1] (bit 29)
+	 * In 16k pages mode, SPS is always 1 */
+#if defined (CONFIG_HUGETLB_PAGE) && defined (CONFIG_PPC_4K_PAGES)
+	rlwimi	r10, r11, 1, MD_SPS16K
+#endif
 	/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
 	 * We also need to know if the insn is a load/store, so:
 	 * Clear _PAGE_PRESENT and load that which will
@@ -455,7 +510,11 @@ _ENTRY(DTLBMiss_jmp)
 	 * of the MMU.
 	 */
 	li	r11, RPN_PATTERN
+#if defined (CONFIG_HUGETLB_PAGE) && defined (CONFIG_PPC_4K_PAGES)
+	rlwimi	r10, r11, 0, 24, 27	/* Set 24-27 */
+#else
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
+#endif
 	rlwimi	r10, r11, 0, 20, 20	/* clear 20 */
 	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
 
@@ -465,6 +524,30 @@ _ENTRY(DTLBMiss_jmp)
 	EXCEPTION_EPILOG_0
 	rfi
 
+#ifdef CONFIG_HUGETLB_PAGE
+10:	/* 8M pages */
+	/* Extract level 2 index */
+#ifdef CONFIG_PPC_16K_PAGES
+	rlwinm	r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
+	/* Add level 2 base */
+	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
+#else
+	/* Level 2 base */
+	rlwinm	r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1 - 1
+#endif
+	lwz	r10, 0(r10)	/* Get the pte */
+	rlwinm	r11, r11, 0, 0xf
+	b	4b
+
+20:	/* 512k pages */
+	/* Extract level 2 index */
+	rlwinm	r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
+	/* Add level 2 base */
+	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
+	lwz	r10, 0(r10)	/* Get the pte */
+	rlwinm	r11, r11, 0, 0xf
+	b	4b
+#endif
 
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
@@ -586,6 +669,9 @@ _ENTRY(FixupDAR_cmp)
 	/* Insert level 1 index */
 3:	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	mtcr	r11
+	bt	28,200f		/* bit 28 = Large page (8M) */
+	bt	29,202f		/* bit 29 = Large page (8M or 512K) */
 	rlwinm	r11, r11,0,0,19	/* Extract page descriptor page address */
 	/* Insert level 2 index */
 	rlwimi	r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
@@ -611,6 +697,27 @@ _ENTRY(FixupDAR_cmp)
 141:	mfspr	r10,SPRN_SPRG_SCRATCH2
 	b	DARFixed	/* Nope, go back to normal TLB processing */
 
+	/* concat physical page address(r11) and page offset(r10) */
+200:
+#ifdef CONFIG_PPC_16K_PAGES
+	rlwinm	r11, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
+	rlwimi	r11, r10, 32 - (PAGE_SHIFT_8M - 2), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
+#else
+	rlwinm	r11, r10, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1 - 1
+#endif
+	lwz	r11, 0(r11)	/* Get the pte */
+	/* concat physical page address(r11) and page offset(r10) */
+	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT_8M, 31
+	b	201b
+
+202:
+	rlwinm	r11, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
+	rlwimi	r11, r10, 32 - (PAGE_SHIFT_512K - 2), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
+	lwz	r11, 0(r11)	/* Get the pte */
+	/* concat physical page address(r11) and page offset(r10) */
+	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT_512K, 31
+	b	201b
+
 144:	mfspr	r10, SPRN_DSISR
 	rlwinm	r10, r10,0,7,5	/* Clear store bit for buggy dcbst insn */
 	mtspr	SPRN_DSISR, r10
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 03fcb7e..20934db 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -26,6 +26,8 @@
 #ifdef CONFIG_HUGETLB_PAGE
 
 #define PAGE_SHIFT_64K	16
+#define PAGE_SHIFT_512K	19
+#define PAGE_SHIFT_8M	23
 #define PAGE_SHIFT_16M	24
 #define PAGE_SHIFT_16G	34
 
@@ -38,7 +40,7 @@ unsigned int HPAGE_SHIFT;
  * implementations may have more than one gpage size, so we need multiple
  * arrays
  */
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 #define MAX_NUMBER_GPAGES	128
 struct psize_gpages {
 	u64 gpage_list[MAX_NUMBER_GPAGES];
@@ -64,14 +66,10 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 {
 	struct kmem_cache *cachep;
 	pte_t *new;
-
-#ifdef CONFIG_PPC_FSL_BOOK3E
 	int i;
-	int num_hugepd = 1 << (pshift - pdshift);
-	cachep = PGT_CACHE(1);
-#else
-	cachep = PGT_CACHE(pdshift - pshift);
-#endif
+	int num_hugepd = pshift >= pdshift ? 1 << (pshift - pdshift) : 1;
+
+	cachep = PGT_CACHE(pdshift > pshift ? pdshift - pshift : 1);
 
 	new = kmem_cache_zalloc(cachep, GFP_KERNEL);
 
@@ -89,7 +87,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 	smp_wmb();
 
 	spin_lock(&mm->page_table_lock);
-#ifdef CONFIG_PPC_FSL_BOOK3E
 	/*
 	 * We have multiple higher-level entries that point to the same
 	 * actual pte location.  Fill in each as we go and backtrack on error.
@@ -100,8 +97,18 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 		if (unlikely(!hugepd_none(*hpdp)))
 			break;
 		else
+#ifdef CONFIG_PPC_BOOK3S_64
+			hpdp->pd = __pa(new) |
+				   (shift_to_mmu_psize(pshift) << 2);
+#elif defined(CONFIG_PPC_8xx)
+			hpdp->pd = ((unsigned long)__pa(new)) |
+				   (pshift == PAGE_SHIFT_8M ? _PMD_PAGE_8M :
+							      _PMD_PAGE_512K) |
+				   _PMD_PRESENT;
+#else
 			/* We use the old format for PPC_FSL_BOOK3E */
 			hpdp->pd = ((unsigned long)new & ~PD_HUGE) | pshift;
+#endif
 	}
 	/* If we bailed from the for loop early, an error occurred, clean up */
 	if (i < num_hugepd) {
@@ -109,17 +116,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 			hpdp->pd = 0;
 		kmem_cache_free(cachep, new);
 	}
-#else
-	if (!hugepd_none(*hpdp))
-		kmem_cache_free(cachep, new);
-	else {
-#ifdef CONFIG_PPC_BOOK3S_64
-		hpdp->pd = __pa(new) | (shift_to_mmu_psize(pshift) << 2);
-#else
-		hpdp->pd = ((unsigned long)new & ~PD_HUGE) | pshift;
-#endif
-	}
-#endif
 	spin_unlock(&mm->page_table_lock);
 	return 0;
 }
@@ -128,7 +124,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
  * These macros define how to determine which level of the page table holds
  * the hpdp.
  */
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 #define HUGEPD_PGD_SHIFT PGDIR_SHIFT
 #define HUGEPD_PUD_SHIFT PUD_SHIFT
 #else
@@ -136,7 +132,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 #define HUGEPD_PUD_SHIFT PMD_SHIFT
 #endif
 
-#ifdef CONFIG_PPC_BOOK3S_64
 /*
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
@@ -153,6 +148,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	addr &= ~(sz-1);
 	pg = pgd_offset(mm, addr);
 
+#ifdef CONFIG_PPC_BOOK3S_64
 	if (pshift == PGDIR_SHIFT)
 		/* 16GB huge page */
 		return (pte_t *) pg;
@@ -178,32 +174,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 				hpdp = (hugepd_t *)pm;
 		}
 	}
-	if (!hpdp)
-		return NULL;
-
-	BUG_ON(!hugepd_none(*hpdp) && !hugepd_ok(*hpdp));
-
-	if (hugepd_none(*hpdp) && __hugepte_alloc(mm, hpdp, addr, pdshift, pshift))
-		return NULL;
-
-	return hugepte_offset(*hpdp, addr, pdshift);
-}
-
 #else
-
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-	hugepd_t *hpdp = NULL;
-	unsigned pshift = __ffs(sz);
-	unsigned pdshift = PGDIR_SHIFT;
-
-	addr &= ~(sz-1);
-
-	pg = pgd_offset(mm, addr);
-
 	if (pshift >= HUGEPD_PGD_SHIFT) {
 		hpdp = (hugepd_t *)pg;
 	} else {
@@ -217,7 +188,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 			hpdp = (hugepd_t *)pm;
 		}
 	}
-
+#endif
 	if (!hpdp)
 		return NULL;
 
@@ -228,9 +199,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 
 	return hugepte_offset(*hpdp, addr, pdshift);
 }
-#endif
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 /* Build list of addresses of gigantic pages.  This function is used in early
  * boot before the buddy allocator is setup.
  */
@@ -310,7 +280,11 @@ static int __init do_gpage_early_setup(char *param, char *val,
 				npages = 0;
 			if (npages > MAX_NUMBER_GPAGES) {
 				pr_warn("MMU: %lu pages requested for page "
+#ifdef CONFIG_PPC64
 					"size %llu KB, limiting to "
+#else
+					"size %u KB, limiting to "
+#endif
 					__stringify(MAX_NUMBER_GPAGES) "\n",
 					npages, size / 1024);
 				npages = MAX_NUMBER_GPAGES;
@@ -392,7 +366,7 @@ int alloc_bootmem_huge_page(struct hstate *hstate)
 }
 #endif
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 #define HUGEPD_FREELIST_SIZE \
 	((PAGE_SIZE - sizeof(struct hugepd_freelist)) / sizeof(pte_t))
 
@@ -442,6 +416,12 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
 	}
 	put_cpu_var(hugepd_freelist_cur);
 }
+#else
+static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
+{
+	BUG();
+}
+
 #endif
 
 static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift,
@@ -452,14 +432,9 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 	int i;
 
 	unsigned long pdmask = ~((1UL << pdshift) - 1);
-	unsigned int num_hugepd = 1;
-
-#ifdef CONFIG_PPC_FSL_BOOK3E
-	/* Note: On fsl the hpdp may be the first of several */
-	num_hugepd = (1 << (hugepd_shift(*hpdp) - pdshift));
-#else
 	unsigned int shift = hugepd_shift(*hpdp);
-#endif
+	/* Note: On fsl the hpdp may be the first of several */
+	unsigned int num_hugepd = (1 << (shift - pdshift)) ? : 1;
 
 	start &= pdmask;
 	if (start < floor)
@@ -475,11 +450,10 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 	for (i = 0; i < num_hugepd; i++, hpdp++)
 		hpdp->pd = 0;
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
-	hugepd_free(tlb, hugepte);
-#else
-	pgtable_free_tlb(tlb, hugepte, pdshift - shift);
-#endif
+	if (pdshift <= shift)
+		hugepd_free(tlb, hugepte);
+	else
+		pgtable_free_tlb(tlb, hugepte, pdshift - shift);
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -502,7 +476,7 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 			WARN_ON(!pmd_none_or_clear_bad(pmd));
 			continue;
 		}
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 		/*
 		 * Increment next by the size of the huge mapping since
 		 * there may be more than one entry at this level for a
@@ -550,7 +524,7 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
 			hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
 					       ceiling);
 		} else {
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 			/*
 			 * Increment next by the size of the huge mapping since
 			 * there may be more than one entry at this level for a
@@ -615,7 +589,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 				continue;
 			hugetlb_free_pud_range(tlb, pgd, addr, next, floor, ceiling);
 		} else {
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 			/*
 			 * Increment next by the size of the huge mapping since
 			 * there may be more than one entry at the pgd level
@@ -753,12 +727,13 @@ static int __init add_huge_page_size(unsigned long long size)
 
 	/* Check that it is a page size supported by the hardware and
 	 * that it fits within pagetable and slice limits. */
-#ifdef CONFIG_PPC_FSL_BOOK3E
-	if ((size < PAGE_SIZE) || !is_power_of_4(size))
+	if ((size <= PAGE_SIZE))
 		return -EINVAL;
-#else
-	if (!is_power_of_2(size)
-	    || (shift > SLICE_HIGH_SHIFT) || (shift <= PAGE_SHIFT))
+#if defined(CONFIG_PPC_FSL_BOOK3E)
+	if (!is_power_of_4(size))
+		return -EINVAL;
+#elif !defined(CONFIG_PPC_8xx)
+	if (!is_power_of_2(size) || (shift > SLICE_HIGH_SHIFT))
 		return -EINVAL;
 #endif
 
@@ -791,51 +766,14 @@ static int __init hugepage_setup_sz(char *str)
 }
 __setup("hugepagesz=", hugepage_setup_sz);
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
-static int __init hugetlbpage_init(void)
-{
-	int psize;
-
-	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
-		unsigned shift;
-
-		if (!mmu_psize_defs[psize].shift)
-			continue;
-
-		shift = mmu_psize_to_shift(psize);
-
-		/* Don't treat normal page sizes as huge... */
-		if (shift != PAGE_SHIFT)
-			if (add_huge_page_size(1ULL << shift) < 0)
-				continue;
-	}
-
-	/*
-	 * Create a kmem cache for hugeptes.  The bottom bits in the pte have
-	 * size information encoded in them, so align them to allow this
-	 */
-	pgtable_cache_add(1, NULL);
-	if (!PGT_CACHE(1))
-		panic("%s: Unable to create kmem cache for hugeptes\n",
-		      __func__);
-
-	/* Default hpage size = 4M */
-	if (mmu_psize_defs[MMU_PAGE_4M].shift)
-		HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift;
-	else
-		panic("%s: Unable to set default huge page size\n", __func__);
-
-
-	return 0;
-}
-#else
 static int __init hugetlbpage_init(void)
 {
 	int psize;
 
+#if !defined(CONFIG_PPC_FSL_BOOK3E) && !defined(CONFIG_PPC_8xx)
 	if (!radix_enabled() && !mmu_has_feature(MMU_FTR_16M_PAGE))
 		return -ENODEV;
-
+#endif
 	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
 		unsigned shift;
 		unsigned pdshift;
@@ -859,15 +797,18 @@ static int __init hugetlbpage_init(void)
 		 * use pgt cache for hugepd.
 		 */
 		if (pdshift != shift) {
-			pgtable_cache_add(pdshift - shift, NULL);
-			if (!PGT_CACHE(pdshift - shift))
+			int size_hugepd = pdshift > shift ? pdshift - shift : 1;
+
+			pgtable_cache_add(size_hugepd, NULL);
+			if (!PGT_CACHE(size_hugepd))
 				panic("hugetlbpage_init(): could not create "
 				      "pgtable cache for %d bit pagesize\n", shift);
 		}
 	}
 
 	/* Set default large page size. Currently, we pick 16M or 1M
-	 * depending on what is available
+	 * depending on what is available. On PPC_8xx we select 512K.
+	 * We select 4M on other ones.
 	 */
 	if (mmu_psize_defs[MMU_PAGE_16M].shift)
 		HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_16M].shift;
@@ -875,11 +816,16 @@ static int __init hugetlbpage_init(void)
 		HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_1M].shift;
 	else if (mmu_psize_defs[MMU_PAGE_2M].shift)
 		HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_2M].shift;
-
+	else if (mmu_psize_defs[MMU_PAGE_4M].shift)
+		HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift;
+	else if (mmu_psize_defs[MMU_PAGE_512K].shift)
+		HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_512K].shift;
+	else
+		panic("%s: Unable to set default huge page size\n", __func__);
 
 	return 0;
 }
-#endif
+
 arch_initcall(hugetlbpage_init);
 
 void flush_dcache_icache_hugepage(struct page *page)
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 050badc..a33522b 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -53,7 +53,7 @@
  * other sizes not listed here.   The .ind field is only used on MMUs that have
  * indirect page table entries.
  */
-#ifdef CONFIG_PPC_BOOK3E_MMU
+#if defined(CONFIG_PPC_BOOK3E_MMU) || defined (CONFIG_PPC_8xx)
 #ifdef CONFIG_PPC_FSL_BOOK3E
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
 	[MMU_PAGE_4K] = {
@@ -85,6 +85,25 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
 		.enc	= BOOK3E_PAGESZ_1GB,
 	},
 };
+#elif defined (CONFIG_PPC_8xx)
+struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
+	/* we only manage 4k and 16k pages as normal pages */
+#ifdef CONFIG_PPC_4K_PAGES
+	[MMU_PAGE_4K] = {
+		.shift	= 12,
+	},
+#else
+	[MMU_PAGE_16K] = {
+		.shift	= 14,
+	},
+#endif
+	[MMU_PAGE_512K] = {
+		.shift	= 19,
+	},
+	[MMU_PAGE_8M] = {
+		.shift	= 23,
+	},
+};
 #else
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
 	[MMU_PAGE_4K] = {
diff --git a/arch/powerpc/platforms/8xx/Kconfig b/arch/powerpc/platforms/8xx/Kconfig
index 564d99b..80cbcb0 100644
--- a/arch/powerpc/platforms/8xx/Kconfig
+++ b/arch/powerpc/platforms/8xx/Kconfig
@@ -130,6 +130,7 @@ config 8xx_CPU6
 
 config 8xx_CPU15
 	bool "CPU15 Silicon Errata"
+	depends on !HUGETLB_PAGE
 	default y
 	help
 	  This enables a workaround for erratum CPU15 on MPC8xx chips.
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index f32edec..59887ad 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -34,6 +34,7 @@ config PPC_8xx
 	select FSL_SOC
 	select 8xx
 	select PPC_LIB_RHEAP
+	select SYS_SUPPORTS_HUGETLBFS
 
 config 40x
 	bool "AMCC 40x"
-- 
2.1.0


* Re: [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits
  2016-08-12 16:55 ` [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits Christophe Leroy
@ 2016-08-14 14:17   ` Aneesh Kumar K.V
  2016-08-14 18:51     ` christophe leroy
  0 siblings, 1 reply; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-14 14:17 UTC
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linuxppc-dev, linux-kernel

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> Today powerpc64 uses a set of pgtable_caches, while powerpc32 uses
> standard pages with 4k pages and a single pgtable_cache with other
> page sizes. In addition, powerpc32 uses another cache when handling
> huge pages.
>
> In preparation for implementing huge pages on the 8xx, this patch
> replaces the powerpc32-specific handling with the 64-bit approach.

Why is this needed? Can you also summarize the page sizes used and the
hugepage format you are planning to use? What page sizes are supported
by the 8xx? Also, is the new code a copy of the existing powerpc64 4k
page size code?

>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/include/asm/book3s/32/pgalloc.h |  44 ++++++--
>  arch/powerpc/include/asm/book3s/32/pgtable.h |  43 ++++----
>  arch/powerpc/include/asm/book3s/64/pgtable.h |   3 -
>  arch/powerpc/include/asm/hugetlb.h           |   2 -
>  arch/powerpc/include/asm/nohash/32/pgalloc.h |  44 ++++++--
>  arch/powerpc/include/asm/nohash/32/pgtable.h |  45 ++++----
>  arch/powerpc/include/asm/nohash/64/pgtable.h |   2 -
>  arch/powerpc/include/asm/pgtable.h           |   2 +
>  arch/powerpc/mm/Makefile                     |   2 +-
>  arch/powerpc/mm/hugetlbpage.c                |  12 +--
>  arch/powerpc/mm/init-common.c                | 152 +++++++++++++++++++++++++++
>  arch/powerpc/mm/init_32.c                    |   5 -
>  arch/powerpc/mm/init_64.c                    |  82 ---------------
>  arch/powerpc/mm/pgtable_32.c                 |  37 -------
>  14 files changed, 282 insertions(+), 193 deletions(-)
>  create mode 100644 arch/powerpc/mm/init-common.c
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> index 8e21bb4..ab215fd 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> @@ -2,14 +2,42 @@
>  #define _ASM_POWERPC_BOOK3S_32_PGALLOC_H
>  
>  #include <linux/threads.h>
> +#include <linux/slab.h>
>  
> -/* For 32-bit, all levels of page tables are just drawn from get_free_page() */
> -#define MAX_PGTABLE_INDEX_SIZE	0
> +/*
> + * Functions that deal with pagetables that could be at any level of
> + * the table need to be passed an "index_size" so they know how to
> + * handle allocation.  For PTE pages (which are linked to a struct
> + * page for now, and drawn from the main get_free_pages() pool), the
> + * allocation size will be (2^index_size * sizeof(pointer)) and
> + * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
> + *
> + * The maximum index size needs to be big enough to allow any
> + * pagetable sizes we need, but small enough to fit in the low bits of
> + * any page table pointer.  In other words all pagetables, even tiny
> + * ones, must be aligned to allow at least enough low 0 bits to
> + * contain this value.  This value is also used as a mask, so it must
> + * be one less than a power of two.
> + */
> +#define MAX_PGTABLE_INDEX_SIZE	0xf
>  
>  extern void __bad_pte(pmd_t *pmd);
>  
> -extern pgd_t *pgd_alloc(struct mm_struct *mm);
> -extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
> +extern struct kmem_cache *pgtable_cache[];
> +#define PGT_CACHE(shift) ({				\
> +			BUG_ON(!(shift));		\
> +			pgtable_cache[(shift) - 1];	\
> +		})
> +
> +static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> +{
> +	return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), GFP_KERNEL);
> +}
> +
> +static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
> +}
>  
>  /*
>   * We don't have any real pmd's, and this code never triggers because
> @@ -68,8 +96,12 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
>  
>  static inline void pgtable_free(void *table, unsigned index_size)
>  {
> -	BUG_ON(index_size); /* 32-bit doesn't use this */
> -	free_page((unsigned long)table);
> +	if (!index_size)
> +		free_page((unsigned long)table);
> +	else {
> +		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
> +		kmem_cache_free(PGT_CACHE(index_size), table);
> +	}
>  }
>  
>  #define check_pgt_cache()	do { } while (0)
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 38b33dc..83a2159 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -8,6 +8,26 @@
>  /* And here we include common definitions */
>  #include <asm/pte-common.h>
>  
> +#define PTE_INDEX_SIZE	PTE_SHIFT
> +#define PMD_INDEX_SIZE	0
> +#define PUD_INDEX_SIZE	0
> +#define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)
> +
> +#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
> +
> +#ifndef __ASSEMBLY__
> +#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_INDEX_SIZE)
> +#define PMD_TABLE_SIZE	(sizeof(pmd_t) << PTE_INDEX_SIZE)
> +#define PUD_TABLE_SIZE	(sizeof(pud_t) << PTE_INDEX_SIZE)
> +#define PGD_TABLE_SIZE	(sizeof(pgd_t) << PGD_INDEX_SIZE)
> +#endif	/* __ASSEMBLY__ */
> +
> +#define PTRS_PER_PTE	(1 << PTE_INDEX_SIZE)
> +#define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
> +
> +/* With 4k base page size, hugepage PTEs go at the PMD level */
> +#define MIN_HUGEPTE_SHIFT	PMD_SHIFT
> +
>  /*
>   * The normal case is that PTEs are 32-bits and we have a 1-page
>   * 1024-entry pgdir pointing to 1-page 1024-entry PTE pages.  -- paulus
> @@ -19,14 +39,10 @@
>   * -Matt
>   */
>  /* PGDIR_SHIFT determines what a top-level page table entry can map */
> -#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_SHIFT)
> +#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
>  #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
>  #define PGDIR_MASK	(~(PGDIR_SIZE-1))
>  
> -#define PTRS_PER_PTE	(1 << PTE_SHIFT)
> -#define PTRS_PER_PMD	1
> -#define PTRS_PER_PGD	(1 << (32 - PGDIR_SHIFT))
> -
>  #define USER_PTRS_PER_PGD	(TASK_SIZE / PGDIR_SIZE)
>  /*
>   * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
> @@ -82,12 +98,8 @@
>  
>  extern unsigned long ioremap_bot;
>  
> -/*
> - * entries per page directory level: our page-table tree is two-level, so
> - * we don't really have any PMD directory.
> - */
> -#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_SHIFT)
> -#define PGD_TABLE_SIZE	(sizeof(pgd_t) << (32 - PGDIR_SHIFT))
> +/* Bits to mask out from a PGD to get to the PUD page */
> +#define PGD_MASKED_BITS		0
>  
>  #define pte_ERROR(e) \
>  	pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
> @@ -282,15 +294,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
>  #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val(pte) >> 3 })
>  #define __swp_entry_to_pte(x)		((pte_t) { (x).val << 3 })
>  
> -#ifndef CONFIG_PPC_4K_PAGES
> -void pgtable_cache_init(void);
> -#else
> -/*
> - * No page table caches to initialise
> - */
> -#define pgtable_cache_init()	do { } while (0)
> -#endif
> -
>  extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
>  		      pmd_t **pmdp);
>  
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 263bf39..3f85d43 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -786,9 +786,6 @@ extern struct page *pgd_page(pgd_t pgd);
>  #define pgd_ERROR(e) \
>  	pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
>  
> -void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
> -void pgtable_cache_init(void);
> -
>  static inline int map_kernel_page(unsigned long ea, unsigned long pa,
>  				  unsigned long flags)
>  {
> diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> index c5517f4..c201cd6 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -5,8 +5,6 @@
>  #include <asm/page.h>
>  #include <asm-generic/hugetlb.h>
>  
> -extern struct kmem_cache *hugepte_cache;
> -
>  #ifdef CONFIG_PPC_BOOK3S_64
>  
>  #include <asm/book3s/64/hugetlb-radix.h>
> diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> index 76d6b9e..c2fe85c 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> @@ -2,14 +2,42 @@
>  #define _ASM_POWERPC_PGALLOC_32_H
>  
>  #include <linux/threads.h>
> +#include <linux/slab.h>
>  
> -/* For 32-bit, all levels of page tables are just drawn from get_free_page() */
> -#define MAX_PGTABLE_INDEX_SIZE	0
> +/*
> + * Functions that deal with pagetables that could be at any level of
> + * the table need to be passed an "index_size" so they know how to
> + * handle allocation.  For PTE pages (which are linked to a struct
> + * page for now, and drawn from the main get_free_pages() pool), the
> + * allocation size will be (2^index_size * sizeof(pointer)) and
> + * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
> + *
> + * The maximum index size needs to be big enough to allow any
> + * pagetable sizes we need, but small enough to fit in the low bits of
> + * any page table pointer.  In other words all pagetables, even tiny
> + * ones, must be aligned to allow at least enough low 0 bits to
> + * contain this value.  This value is also used as a mask, so it must
> + * be one less than a power of two.
> + */
> +#define MAX_PGTABLE_INDEX_SIZE	0xf
>  
>  extern void __bad_pte(pmd_t *pmd);
>  
> -extern pgd_t *pgd_alloc(struct mm_struct *mm);
> -extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
> +extern struct kmem_cache *pgtable_cache[];
> +#define PGT_CACHE(shift) ({				\
> +			BUG_ON(!(shift));		\
> +			pgtable_cache[(shift) - 1];	\
> +		})
> +
> +static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> +{
> +	return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), GFP_KERNEL);
> +}
> +
> +static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
> +}
>  
>  /*
>   * We don't have any real pmd's, and this code never triggers because
> @@ -68,8 +96,12 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
>  
>  static inline void pgtable_free(void *table, unsigned index_size)
>  {
> -	BUG_ON(index_size); /* 32-bit doesn't use this */
> -	free_page((unsigned long)table);
> +	if (!index_size)
> +		free_page((unsigned long)table);
> +	else {
> +		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
> +		kmem_cache_free(PGT_CACHE(index_size), table);
> +	}
>  }
>  
>  #define check_pgt_cache()	do { } while (0)
> diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
> index 7808475..8a2937d 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@ -16,6 +16,26 @@ extern int icache_44x_need_flush;
>  
>  #endif /* __ASSEMBLY__ */
>  
> +#define PTE_INDEX_SIZE	PTE_SHIFT
> +#define PMD_INDEX_SIZE	0
> +#define PUD_INDEX_SIZE	0
> +#define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)
> +
> +#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
> +
> +#ifndef __ASSEMBLY__
> +#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_INDEX_SIZE)
> +#define PMD_TABLE_SIZE	(sizeof(pmd_t) << PTE_INDEX_SIZE)
> +#define PUD_TABLE_SIZE	(sizeof(pud_t) << PTE_INDEX_SIZE)
> +#define PGD_TABLE_SIZE	(sizeof(pgd_t) << PGD_INDEX_SIZE)
> +#endif	/* __ASSEMBLY__ */
> +
> +#define PTRS_PER_PTE	(1 << PTE_INDEX_SIZE)
> +#define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
> +
> +/* With 4k base page size, hugepage PTEs go at the PMD level */
> +#define MIN_HUGEPTE_SHIFT	PMD_SHIFT
> +
>  /*
>   * The normal case is that PTEs are 32-bits and we have a 1-page
>   * 1024-entry pgdir pointing to 1-page 1024-entry PTE pages.  -- paulus
> @@ -27,22 +47,12 @@ extern int icache_44x_need_flush;
>   * -Matt
>   */
>  /* PGDIR_SHIFT determines what a top-level page table entry can map */
> -#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_SHIFT)
> +#define PGDIR_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
>  #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
>  #define PGDIR_MASK	(~(PGDIR_SIZE-1))
>  
> -/*
> - * entries per page directory level: our page-table tree is two-level, so
> - * we don't really have any PMD directory.
> - */
> -#ifndef __ASSEMBLY__
> -#define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_SHIFT)
> -#define PGD_TABLE_SIZE	(sizeof(pgd_t) << (32 - PGDIR_SHIFT))
> -#endif	/* __ASSEMBLY__ */
> -
> -#define PTRS_PER_PTE	(1 << PTE_SHIFT)
> -#define PTRS_PER_PMD	1
> -#define PTRS_PER_PGD	(1 << (32 - PGDIR_SHIFT))
> +/* Bits to mask out from a PGD to get to the PUD page */
> +#define PGD_MASKED_BITS		0
>  
>  #define USER_PTRS_PER_PGD	(TASK_SIZE / PGDIR_SIZE)
>  #define FIRST_USER_ADDRESS	0UL
> @@ -327,15 +337,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
>  #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val(pte) >> 3 })
>  #define __swp_entry_to_pte(x)		((pte_t) { (x).val << 3 })
>  
> -#ifndef CONFIG_PPC_4K_PAGES
> -void pgtable_cache_init(void);
> -#else
> -/*
> - * No page table caches to initialise
> - */
> -#define pgtable_cache_init()	do { } while (0)
> -#endif
> -
>  extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
>  		      pmd_t **pmdp);
>  
> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
> index d4d808c..b0fc9e4 100644
> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
> @@ -357,8 +357,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
>  #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
>  #define __swp_entry_to_pte(x)		__pte((x).val)
>  
> -void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
> -void pgtable_cache_init(void);
>  extern int map_kernel_page(unsigned long ea, unsigned long pa,
>  			   unsigned long flags);
>  extern int __meminit vmemmap_create_mapping(unsigned long start,
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 9bd87f2..dd01212 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -78,6 +78,8 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
>  
>  unsigned long vmalloc_to_phys(void *vmalloc_addr);
>  
> +void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
> +void pgtable_cache_init(void);
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_PGTABLE_H */
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index f2cea6d..08bb010 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -7,7 +7,7 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
>  ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
>  
>  obj-y				:= fault.o mem.o pgtable.o mmap.o \
> -				   init_$(CONFIG_WORD_SIZE).o \
> +				   init_$(CONFIG_WORD_SIZE).o init-common.o \
>  				   pgtable_$(CONFIG_WORD_SIZE).o
>  obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o \
>  				   tlb_nohash_low.o
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 7372ee1..9164a77 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -68,7 +68,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>  #ifdef CONFIG_PPC_FSL_BOOK3E
>  	int i;
>  	int num_hugepd = 1 << (pshift - pdshift);
> -	cachep = hugepte_cache;
> +	cachep = PGT_CACHE(1);
>  #else
>  	cachep = PGT_CACHE(pdshift - pshift);
>  #endif

Can you explain the use of PGT_CACHE(1)?

> @@ -411,7 +411,7 @@ static void hugepd_free_rcu_callback(struct rcu_head *head)
>  	unsigned int i;
>  
>  	for (i = 0; i < batch->index; i++)
> -		kmem_cache_free(hugepte_cache, batch->ptes[i]);
> +		kmem_cache_free(PGT_CACHE(1), batch->ptes[i]);
>  
>  	free_page((unsigned long)batch);
>  }
> @@ -425,7 +425,7 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
>  	if (atomic_read(&tlb->mm->mm_users) < 2 ||
>  	    cpumask_equal(mm_cpumask(tlb->mm),
>  			  cpumask_of(smp_processor_id()))) {
> -		kmem_cache_free(hugepte_cache, hugepte);
> +		kmem_cache_free(PGT_CACHE(1), hugepte);
>  		put_cpu_var(hugepd_freelist_cur);
>  		return;
>  	}
> @@ -792,7 +792,6 @@ static int __init hugepage_setup_sz(char *str)
>  __setup("hugepagesz=", hugepage_setup_sz);
>  
>  #ifdef CONFIG_PPC_FSL_BOOK3E
> -struct kmem_cache *hugepte_cache;
>  static int __init hugetlbpage_init(void)
>  {
>  	int psize;
> @@ -815,9 +814,8 @@ static int __init hugetlbpage_init(void)
>  	 * Create a kmem cache for hugeptes.  The bottom bits in the pte have
>  	 * size information encoded in them, so align them to allow this
>  	 */
> -	hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
> -					   HUGEPD_SHIFT_MASK + 1, 0, NULL);
> -	if (hugepte_cache == NULL)
> +	pgtable_cache_add(1, NULL);
> +	if (!PGT_CACHE(1))
>  		panic("%s: Unable to create kmem cache for hugeptes\n",
>  		      __func__);
>  
> diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
> new file mode 100644
> index 0000000..2632eab
> --- /dev/null
> +++ b/arch/powerpc/mm/init-common.c
> @@ -0,0 +1,152 @@
> +/*
> + *  PowerPC version
> + *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
> + *
> + *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
> + *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
> + *    Copyright (C) 1996 Paul Mackerras
> + *
> + *  Derived from "arch/i386/mm/init.c"
> + *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
> + *
> + *  Dave Engebretsen <engebret@us.ibm.com>
> + *      Rework for PPC64 port.
> + *
> + *  This program is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU General Public License
> + *  as published by the Free Software Foundation; either version
> + *  2 of the License, or (at your option) any later version.
> + *
> + */
> +
> +#undef DEBUG
> +
> +#include <linux/signal.h>
> +#include <linux/sched.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/string.h>
> +#include <linux/types.h>
> +#include <linux/mman.h>
> +#include <linux/mm.h>
> +#include <linux/swap.h>
> +#include <linux/stddef.h>
> +#include <linux/vmalloc.h>
> +#include <linux/init.h>
> +#include <linux/delay.h>
> +#include <linux/highmem.h>
> +#include <linux/idr.h>
> +#include <linux/nodemask.h>
> +#include <linux/module.h>
> +#include <linux/poison.h>
> +#include <linux/memblock.h>
> +#include <linux/hugetlb.h>
> +#include <linux/slab.h>
> +
> +#include <asm/pgalloc.h>
> +#include <asm/page.h>
> +#include <asm/prom.h>
> +#include <asm/rtas.h>
> +#include <asm/io.h>
> +#include <asm/mmu_context.h>
> +#include <asm/pgtable.h>
> +#include <asm/mmu.h>
> +#include <asm/uaccess.h>
> +#include <asm/smp.h>
> +#include <asm/machdep.h>
> +#include <asm/tlb.h>
> +#include <asm/eeh.h>
> +#include <asm/processor.h>
> +#include <asm/mmzone.h>
> +#include <asm/cputable.h>
> +#include <asm/sections.h>
> +#include <asm/iommu.h>
> +#include <asm/vdso.h>
> +
> +#include "mmu_decl.h"
> +
> +phys_addr_t memstart_addr = (phys_addr_t)~0ull;
> +EXPORT_SYMBOL_GPL(memstart_addr);
> +phys_addr_t kernstart_addr;
> +EXPORT_SYMBOL_GPL(kernstart_addr);
> +
> +static void pgd_ctor(void *addr)
> +{
> +	memset(addr, 0, PGD_TABLE_SIZE);
> +}
> +
> +static void pud_ctor(void *addr)
> +{
> +	memset(addr, 0, PUD_TABLE_SIZE);
> +}
> +
> +static void pmd_ctor(void *addr)
> +{
> +	memset(addr, 0, PMD_TABLE_SIZE);
> +}
> +
> +struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
> +
> +/*
> + * Create a kmem_cache() for pagetables.  This is not used for PTE
> + * pages - they're linked to struct page, come from the normal free
> + * pages pool and have a different entry size (see real_pte_t) to
> + * everything else.  Caches created by this function are used for all
> + * the higher level pagetables, and for hugepage pagetables.
> + */
> +void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
> +{
> +	char *name;
> +	unsigned long table_size = sizeof(void *) << shift;
> +	unsigned long align = table_size;
> +
> +	/* When batching pgtable pointers for RCU freeing, we store
> +	 * the index size in the low bits.  Table alignment must be
> +	 * big enough to fit it.
> +	 *
> +	 * Likewise, hugeapge pagetable pointers contain a (different)
> +	 * shift value in the low bits.  All tables must be aligned so
> +	 * as to leave enough 0 bits in the address to contain it. */
> +	unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
> +				     HUGEPD_SHIFT_MASK + 1);
> +	struct kmem_cache *new;
> +
> +	/* It would be nice if this was a BUILD_BUG_ON(), but at the
> +	 * moment, gcc doesn't seem to recognize is_power_of_2 as a
> +	 * constant expression, so so much for that. */
> +	BUG_ON(!is_power_of_2(minalign));
> +	BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
> +
> +	if (PGT_CACHE(shift))
> +		return; /* Already have a cache of this size */
> +
> +	align = max_t(unsigned long, align, minalign);
> +	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
> +	new = kmem_cache_create(name, table_size, align, 0, ctor);
> +	kfree(name);
> +	pgtable_cache[shift - 1] = new;
> +	pr_debug("Allocated pgtable cache for order %d\n", shift);
> +}
> +
> +
> +void pgtable_cache_init(void)
> +{
> +	pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
> +
> +	if (PMD_INDEX_SIZE && !PGT_CACHE(PMD_INDEX_SIZE))
> +		pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
> +	/*
> +	 * In all current configs, when the PUD index exists it's the
> +	 * same size as either the pgd or pmd index except with THP enabled
> +	 * on book3s 64
> +	 */
> +	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
> +		pgtable_cache_add(PUD_INDEX_SIZE, pud_ctor);
> +
> +	if (!PGT_CACHE(PGD_INDEX_SIZE))
> +		panic("Couldn't allocate pgd cache");
> +	if (PMD_INDEX_SIZE && !PGT_CACHE(PMD_INDEX_SIZE))
> +		panic("Couldn't allocate pmd pgtable caches");
> +	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
> +		panic("Couldn't allocate pud pgtable caches");
> +}
> diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
> index 448685f..79c24d4 100644
> --- a/arch/powerpc/mm/init_32.c
> +++ b/arch/powerpc/mm/init_32.c
> @@ -59,11 +59,6 @@
>  phys_addr_t total_memory;
>  phys_addr_t total_lowmem;
>  
> -phys_addr_t memstart_addr = (phys_addr_t)~0ull;
> -EXPORT_SYMBOL(memstart_addr);
> -phys_addr_t kernstart_addr;
> -EXPORT_SYMBOL(kernstart_addr);
> -
>  #ifdef CONFIG_RELOCATABLE
>  /* Used in __va()/__pa() */
>  long long virt_phys_offset;
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 16ada1e..4acd546 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -75,88 +75,6 @@
>  #endif
>  #endif /* CONFIG_PPC_STD_MMU_64 */
>  
> -phys_addr_t memstart_addr = ~0;
> -EXPORT_SYMBOL_GPL(memstart_addr);
> -phys_addr_t kernstart_addr;
> -EXPORT_SYMBOL_GPL(kernstart_addr);
> -
> -static void pgd_ctor(void *addr)
> -{
> -	memset(addr, 0, PGD_TABLE_SIZE);
> -}
> -
> -static void pud_ctor(void *addr)
> -{
> -	memset(addr, 0, PUD_TABLE_SIZE);
> -}
> -
> -static void pmd_ctor(void *addr)
> -{
> -	memset(addr, 0, PMD_TABLE_SIZE);
> -}
> -
> -struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
> -
> -/*
> - * Create a kmem_cache() for pagetables.  This is not used for PTE
> - * pages - they're linked to struct page, come from the normal free
> - * pages pool and have a different entry size (see real_pte_t) to
> - * everything else.  Caches created by this function are used for all
> - * the higher level pagetables, and for hugepage pagetables.
> - */
> -void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
> -{
> -	char *name;
> -	unsigned long table_size = sizeof(void *) << shift;
> -	unsigned long align = table_size;
> -
> -	/* When batching pgtable pointers for RCU freeing, we store
> -	 * the index size in the low bits.  Table alignment must be
> -	 * big enough to fit it.
> -	 *
> -	 * Likewise, hugeapge pagetable pointers contain a (different)
> -	 * shift value in the low bits.  All tables must be aligned so
> -	 * as to leave enough 0 bits in the address to contain it. */
> -	unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
> -				     HUGEPD_SHIFT_MASK + 1);
> -	struct kmem_cache *new;
> -
> -	/* It would be nice if this was a BUILD_BUG_ON(), but at the
> -	 * moment, gcc doesn't seem to recognize is_power_of_2 as a
> -	 * constant expression, so so much for that. */
> -	BUG_ON(!is_power_of_2(minalign));
> -	BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
> -
> -	if (PGT_CACHE(shift))
> -		return; /* Already have a cache of this size */
> -
> -	align = max_t(unsigned long, align, minalign);
> -	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
> -	new = kmem_cache_create(name, table_size, align, 0, ctor);
> -	kfree(name);
> -	pgtable_cache[shift - 1] = new;
> -	pr_debug("Allocated pgtable cache for order %d\n", shift);
> -}
> -
> -
> -void pgtable_cache_init(void)
> -{
> -	pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
> -	pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
> -	/*
> -	 * In all current configs, when the PUD index exists it's the
> -	 * same size as either the pgd or pmd index except with THP enabled
> -	 * on book3s 64
> -	 */
> -	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
> -		pgtable_cache_add(PUD_INDEX_SIZE, pud_ctor);
> -
> -	if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_CACHE_INDEX))
> -		panic("Couldn't allocate pgtable caches");
> -	if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
> -		panic("Couldn't allocate pud pgtable caches");
> -}
> -
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  /*
>   * Given an address within the vmemmap, determine the pfn of the page that
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 0ae0572..a65c0b4 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -42,43 +42,6 @@ EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
>  
>  extern char etext[], _stext[], _sinittext[], _einittext[];
>  
> -#define PGDIR_ORDER	(32 + PGD_T_LOG2 - PGDIR_SHIFT)
> -
> -#ifndef CONFIG_PPC_4K_PAGES
> -static struct kmem_cache *pgtable_cache;
> -
> -void pgtable_cache_init(void)
> -{
> -	pgtable_cache = kmem_cache_create("PGDIR cache", 1 << PGDIR_ORDER,
> -					  1 << PGDIR_ORDER, 0, NULL);
> -	if (pgtable_cache == NULL)
> -		panic("Couldn't allocate pgtable caches");
> -}
> -#endif
> -
> -pgd_t *pgd_alloc(struct mm_struct *mm)
> -{
> -	pgd_t *ret;
> -
> -	/* pgdir take page or two with 4K pages and a page fraction otherwise */
> -#ifndef CONFIG_PPC_4K_PAGES
> -	ret = kmem_cache_alloc(pgtable_cache, GFP_KERNEL | __GFP_ZERO);
> -#else
> -	ret = (pgd_t *)__get_free_pages(GFP_KERNEL|__GFP_ZERO,
> -			PGDIR_ORDER - PAGE_SHIFT);
> -#endif
> -	return ret;
> -}
> -
> -void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> -{
> -#ifndef CONFIG_PPC_4K_PAGES
> -	kmem_cache_free(pgtable_cache, (void *)pgd);
> -#else
> -	free_pages((unsigned long)pgd, PGDIR_ORDER - PAGE_SHIFT);
> -#endif
> -}
> -
>  __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
>  {
>  	pte_t *pte;
> -- 
> 2.1.0

I still don't quite follow why we are replacing

 -	hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
 -					   HUGEPD_SHIFT_MASK + 1, 0, NULL);
 +	pgtable_cache_add(1, NULL);

-aneesh


* Re: [PATCH 6/6] powerpc/8xx: implementation of huge pages
  2016-08-12 16:55 ` [PATCH 6/6] powerpc/8xx: implementation of huge pages Christophe Leroy
@ 2016-08-14 14:25   ` Aneesh Kumar K.V
  2016-08-14 17:38     ` christophe leroy
  0 siblings, 1 reply; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-14 14:25 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> The 8xx has 512k and 8M pages. This patch implements hugepages using
> those sizes.
>
> On the 8xx, the page size is encoded in the PGD entry,
> using the PS field (bits 28-29):
> 00 : Small pages (4k or 16k)
> 01 : 512k pages
> 10 : reserved
> 11 : 8M pages
>
> The implementation uses a mix of what is used on BOOK3S and BOOK3E,
> as 512k pages are in HUGEPTE tables while for 8M pages we have
> several PGD entries pointing to a single leaf HUGEPTE entry.
>
> For the time being, we do not support the CPU15 erratum workaround if
> HUGETLB is selected.

Can you also document here the Linux page table format for the
different huge page sizes?

>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>


* Re: [PATCH 0/6] powerpc/8xx: implementation of huge pages
  2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
                   ` (5 preceding siblings ...)
  2016-08-12 16:55 ` [PATCH 6/6] powerpc/8xx: implementation of huge pages Christophe Leroy
@ 2016-08-14 14:27 ` Aneesh Kumar K.V
  2016-08-14 17:33   ` christophe leroy
  6 siblings, 1 reply; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-14 14:27 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linuxppc-dev, linux-kernel

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> This set provides an implementation of huge pages on the 8xx.
>
> Christophe Leroy (6):
>   powerpc: port 64 bits pgtable_cache to 32 bits
>   powerpc: fix usage of _PAGE_RO in hugepage
>   powerpc/8xx: use r3 to scratch CR in ITLBmiss
>   powerpc/8xx: Move additional DTLBMiss handlers out of exception area
>   powerpc/8xx: make user addr DTLB miss the short path
>   powerpc/8xx: implementation of huge pages

Patches 2, 3, 4 and 5 are not really related to the hugepage implementation,
right? Maybe they can be sent as a separate series?

-aneesh


* Re: [PATCH 0/6] powerpc/8xx: implementation of huge pages
  2016-08-14 14:27 ` [PATCH 0/6] " Aneesh Kumar K.V
@ 2016-08-14 17:33   ` christophe leroy
  2016-08-15 10:31     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 16+ messages in thread
From: christophe leroy @ 2016-08-14 17:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linuxppc-dev, linux-kernel

On 14/08/2016 at 16:27, Aneesh Kumar K.V wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>
>> This set provides an implementation of huge pages on the 8xx.
>>
>> Christophe Leroy (6):
>>   powerpc: port 64 bits pgtable_cache to 32 bits
>>   powerpc: fix usage of _PAGE_RO in hugepage
>>   powerpc/8xx: use r3 to scratch CR in ITLBmiss
>>   powerpc/8xx: Move additional DTLBMiss handlers out of exception area
>>   powerpc/8xx: make user addr DTLB miss the short path
>>   powerpc/8xx: implementation of huge pages
>
> Patches 2, 3, 4 and 5 are not really related to the hugepage implementation,
> right? Maybe they can be sent as a separate series?
>

Patch 2 fixes a gap in gup_hugepte(): on the 8xx, _PAGE_RW (hence _PAGE_WRITE)
is defined as 0, and _PAGE_RO must be set when the page is not writable.
So it is a prerequisite for the huge page implementation.
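To illustrate why a "required bits" write check is not enough when there is no RW bit, here is a minimal C sketch. The flag values and helper names are hypothetical, not the kernel's 8xx definitions, and the mask-building pattern is an assumption rather than a quote of gup_hugepte(); the only property taken from the text above is that _PAGE_RW is 0 and read-only pages carry a separate RO bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical flag values for illustration only (not the real 8xx
 * definitions). As on the 8xx, there is no RW bit: SKETCH_PAGE_RW is 0
 * and read-only pages are marked with SKETCH_PAGE_RO instead.
 */
#define SKETCH_PAGE_PRESENT 0x001u
#define SKETCH_PAGE_USER    0x002u
#define SKETCH_PAGE_RW      0x000u
#define SKETCH_PAGE_RO      0x400u

/* Required-bits check: only correct if "writable" is a bit that gets set. */
static bool write_ok_mask_only(uint32_t pte)
{
	uint32_t mask = SKETCH_PAGE_PRESENT | SKETCH_PAGE_USER | SKETCH_PAGE_RW;

	return (pte & mask) == mask;
}

/* Same check, but also rejecting pages that carry the RO bit. */
static bool write_ok(uint32_t pte)
{
	return write_ok_mask_only(pte) && !(pte & SKETCH_PAGE_RO);
}
```

With SKETCH_PAGE_RW equal to 0, write_ok_mask_only() accepts a present, user, read-only PTE, which is the hole being described; write_ok() rejects it by also testing the RO bit.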

Patches 3, 4 and 5 are prerequisites for the huge page handling in the
TLB miss handlers:
* 3: for the huge page implementation, we need to branch based on the value
of bits 28 and 29 of the PGD entry. For this we need to preserve CR.
* 4: with the hugepage-related instructions, there is no longer enough
space in the TLB exception areas, so part of the handler code has to be
moved out.
* 5: that one might be seen as not directly related to hugepages, but it
requires patch 3.

Maybe I should reorder them as 3, 4, 5, 2, 1, 6?

Christophe


* Re: [PATCH 6/6] powerpc/8xx: implementation of huge pages
  2016-08-14 14:25   ` Aneesh Kumar K.V
@ 2016-08-14 17:38     ` christophe leroy
  2016-08-15 10:30       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 16+ messages in thread
From: christophe leroy @ 2016-08-14 17:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev



On 14/08/2016 at 16:25, Aneesh Kumar K.V wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>
>> The 8xx has 512k and 8M pages. This patch implements hugepages using
>> those sizes.
>>
>> On the 8xx, the page size is encoded in the PGD entry,
>> using the PS field (bits 28-29):
>> 00 : Small pages (4k or 16k)
>> 01 : 512k pages
>> 10 : reserved
>> 11 : 8M pages
>>
>> The implementation uses a mix of what is used on BOOK3S and BOOK3E,
>> as 512k pages are in HUGEPTE tables while for 8M pages we have
>> several PGD entries pointing to a single leaf HUGEPTE entry.
>>
>> For the time being, we do not support the CPU15 erratum workaround if
>> HUGETLB is selected.
>
> Can you also document here the Linux page table format for the
> different huge page sizes?

Er... isn't that what I do when explaining the use of the PS field in
the PGD entry? That's the point: that's how the 8xx knows it is a
huge page, and that's how Linux will know it is one. On the 8xx, the
Linux PGD entry (almost) matches the L1 MMU entry and the Linux PTE almost
matches the L2 MMU entry (some bits are copied from the PTE to the L1
entry and then removed from the value written to the L2 MMU entry).
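The PS encoding from the commit message can be sketched in C as below. This assumes the usual PowerPC MSB-0 bit numbering (bit n of a 32-bit word has value 1 << (31 - n)), which puts bits 28-29 at mask 0xc; the macro, enum and function names are hypothetical and not the kernel's own.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: decode the PS field of an 8xx PGD (L1) entry.
 * MSB-0 numbering: bit n of a 32-bit word is 1u << (31 - n), so
 * bits 28-29 give the mask 0xc.
 */
#define PS_BIT(n)    (1u << (31 - (n)))
#define PGD_PS_MASK  (PS_BIT(28) | PS_BIT(29))   /* 0x0000000c */
#define PGD_PS_SHIFT 2

enum pgd_ps {
	PS_SMALL = 0,   /* 00: small pages (4k or 16k) */
	PS_512K  = 1,   /* 01: 512k pages */
	PS_RESV  = 2,   /* 10: reserved */
	PS_8M    = 3,   /* 11: 8M pages */
};

static enum pgd_ps pgd_page_size(uint32_t pgd_entry)
{
	return (enum pgd_ps)((pgd_entry & PGD_PS_MASK) >> PGD_PS_SHIFT);
}

/* Bytes per page mapped through this entry; 0 for the reserved code. */
static uint32_t pgd_page_bytes(uint32_t pgd_entry, uint32_t small_page)
{
	switch (pgd_page_size(pgd_entry)) {
	case PS_SMALL: return small_page;   /* 4k or 16k, config dependent */
	case PS_512K:  return 512u * 1024u;
	case PS_8M:    return 8u * 1024u * 1024u;
	default:       return 0;
	}
}
```

The small page size is passed in because, per the table above, code 00 covers both 4k and 16k configurations.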

Christophe

>
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>


* Re: [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits
  2016-08-14 14:17   ` Aneesh Kumar K.V
@ 2016-08-14 18:51     ` christophe leroy
  2016-08-15 10:23       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 16+ messages in thread
From: christophe leroy @ 2016-08-14 18:51 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linuxppc-dev, linux-kernel

On 14/08/2016 at 16:17, Aneesh Kumar K.V wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>
>> Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
>> standard pages when using 4k pages and a single pgtable_cache
>> if using other page sizes. In addition, powerpc32 uses another cache
>> when handling huge pages.
>>
>> In preparation for implementing huge pages on the 8xx, this patch
>> replaces the specific powerpc32 handling with the 64-bit approach.
>
> Why is this needed? Can you also summarize the page sizes used and the
> hugepage format you are planning to use? What page sizes are
> supported by the 8xx? Also, is the new code a copy of the existing
> powerpc64 4k page size code?

The 8xx supports two huge page sizes: 8M and 512k.
As PGD entries point to 4M page tables, we are in a
heterogeneous situation:
1/ when using 8M huge pages, we are in the same situation as what is
done for BOOK3S (which supports 16M, 256M and 1G), that is, several
PGD entries pointing to a single PTE entry.
2/ when using 512k huge pages, we are in the same situation as what is
done for BOOK3E: a PGD entry points to a hugepage table that
handles several huge pages (in our case 8 huge pages).

The code from init_64.c has been moved to a new file named init-common.c
so that it can be used by init_32.c too.
The code from the 64-bit .h has been copied into the 32-bit .h (in fact
it has been copied twice, as the .h files are now duplicated into nohash
and book3s versions).
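The two cases above reduce to shift arithmetic. In this sketch a PGD entry is assumed to map 4M (shift 22, as stated above), and the struct and function names are hypothetical; the two branches correspond to the shift differences `pshift - pdshift` and `pdshift - pshift` that appear in the `__hugepte_alloc()` hunk quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the text: each PGD entry maps 4M. */
#define SKETCH_PGD_SHIFT 22u

struct huge_layout {
	unsigned int pgd_entries_per_page; /* PGD entries sharing one hugepte */
	unsigned int hugeptes_per_table;   /* entries in the hugepte table */
};

/* Hypothetical helper: layout for a huge page of size 1 << page_shift. */
static struct huge_layout huge_layout(unsigned int page_shift)
{
	struct huge_layout l;

	if (page_shift >= SKETCH_PGD_SHIFT) {
		/* 8M case: several PGD entries point to one leaf hugepte */
		l.pgd_entries_per_page = 1u << (page_shift - SKETCH_PGD_SHIFT);
		l.hugeptes_per_table = 1;
	} else {
		/* 512k case: one PGD entry points to a table of hugeptes */
		l.pgd_entries_per_page = 1;
		l.hugeptes_per_table = 1u << (SKETCH_PGD_SHIFT - page_shift);
	}
	return l;
}
```

For 8M pages (shift 23) this gives two PGD entries per huge page, and for 512k pages (shift 19) a table of 8 hugeptes per PGD entry, matching the "8 huge pages" figure above.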

[...]

>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index 7372ee1..9164a77 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -68,7 +68,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>>  #ifdef CONFIG_PPC_FSL_BOOK3E
>>  	int i;
>>  	int num_hugepd = 1 << (pshift - pdshift);
>> -	cachep = hugepte_cache;
>> +	cachep = PGT_CACHE(1);
>>  #else
>>  	cachep = PGT_CACHE(pdshift - pshift);
>>  #endif
>
> Can you explain the usage of PGT_CACHE(1)?

[...]

>
> I still don't quite follow why we are replacing
>
>  -	hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
>  -					   HUGEPD_SHIFT_MASK + 1, 0, NULL);
>  +	pgtable_cache_add(1, NULL);
>

Er... Indeed, I wanted something to replace hugepte_cache. But it looks
like it should be something like PGT_CACHE(0) for 32-bit targets with
32-bit PTEs and PGT_CACHE(1) for 32-bit targets with 64-bit PTEs.
But PGT_CACHE(0) doesn't exist (yet).

Looking at it once more, I think that might not really be needed. I'll
rework it and see what I can achieve.
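The PGT_CACHE(0) versus PGT_CACHE(1) point follows from the arithmetic in pgtable_cache_add() quoted earlier in this thread: a cache for `shift` holds tables of `sizeof(void *) << shift` bytes, is aligned to at least `minalign`, and is stored in slot `shift - 1`. The sketch below mirrors only that arithmetic in standalone C; the function names are hypothetical, `ptr_size` stands in for sizeof(void *), and `minalign` is a parameter because its real value is configuration dependent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the size computation in pgtable_cache_add().
 * shift == 0 has no cache slot, since the array is indexed by shift - 1
 * (the original BUG_ON()s on shift < 1); return 0 in that case.
 */
static size_t pgt_table_size(unsigned int shift, size_t ptr_size)
{
	if (shift < 1)
		return 0;
	return ptr_size << shift;
}

/* Alignment chosen for the cache: the larger of table size and minalign. */
static size_t pgt_table_align(unsigned int shift, size_t ptr_size,
			      size_t minalign)
{
	size_t size = pgt_table_size(shift, ptr_size);

	return size > minalign ? size : minalign;
}
```

On a 32-bit target (ptr_size 4), shift 1 gives 8-byte tables, i.e. room for one 64-bit PTE, while a 4-byte table for a single 32-bit PTE would need shift 0, which has no slot.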

Christophe


* Re: [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits
  2016-08-14 18:51     ` christophe leroy
@ 2016-08-15 10:23       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-15 10:23 UTC (permalink / raw)
  To: christophe leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linuxppc-dev, linux-kernel

christophe leroy <christophe.leroy@c-s.fr> writes:

> On 14/08/2016 at 16:17, Aneesh Kumar K.V wrote:
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>
>>> Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
>>> standard pages when using 4k pages and a single pgtable_cache
>>> if using other page sizes. In addition, powerpc32 uses another cache
>>> when handling huge pages.
>>>
>>> In preparation for implementing huge pages on the 8xx, this patch
>>> replaces the specific powerpc32 handling with the 64-bit approach.
>>
>> Why is this needed? Can you also summarize the page sizes used and the
>> hugepage format you are planning to use? What page sizes are
>> supported by the 8xx? Also, is the new code a copy of the existing
>> powerpc64 4k page size code?
>
> The 8xx supports two huge page sizes: 8M and 512k.
> As PGD entries point to 4M page tables, we are in a
> heterogeneous situation:
> 1/ when using 8M huge pages, we are in the same situation as what is
> done for BOOK3S (which supports 16M, 256M and 1G), that is, several
> PGD entries pointing to a single PTE entry.

What is done for FSL BOOK3E?


> 2/ when using 512k huge pages, we are in the same situation as what is
> done for BOOK3E: a PGD entry points to a hugepage table that
> handles several huge pages (in our case 8 huge pages).
>

What is done for Book3S with a 4K Linux page size?


So the idea here is to allocate a different hugepte table depending on
the requested hugepage size, hence the need to switch from hugepte-cache
to the more generic PGT_CACHE?

> The code from init_64.c has been moved to a new file named init-common.c
> so that it can be used by init_32.c too.
> The code from the 64-bit .h has been copied into the 32-bit .h (in fact
> it has been copied twice, as the .h files are now duplicated into nohash
> and book3s versions).


That explanation made the patch a lot easier to follow. Can we capture
it in the commit message too? Also, do we support hugepages with both 4k
and 16k Linux page sizes? I guess we do, because the 8xx only uses a
two-level Linux page table?

>
> [...]
>
>>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>>> index 7372ee1..9164a77 100644
>>> --- a/arch/powerpc/mm/hugetlbpage.c
>>> +++ b/arch/powerpc/mm/hugetlbpage.c
>>> @@ -68,7 +68,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>>>  #ifdef CONFIG_PPC_FSL_BOOK3E
>>>  	int i;
>>>  	int num_hugepd = 1 << (pshift - pdshift);
>>> -	cachep = hugepte_cache;
>>> +	cachep = PGT_CACHE(1);
>>>  #else
>>>  	cachep = PGT_CACHE(pdshift - pshift);
>>>  #endif
>>
>> Can you explain the usage of PGT_CACHE(1)?
>
> [...]
>
>>
>> I still don't quite follow why we are replacing
>>
>>  -	hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
>>  -					   HUGEPD_SHIFT_MASK + 1, 0, NULL);
>>  +	pgtable_cache_add(1, NULL);
>>
>
> Er... Indeed, I wanted something to replace hugepte_cache. But it looks
> like it should be something like PGT_CACHE(0) for 32-bit targets with
> 32-bit PTEs and PGT_CACHE(1) for 32-bit targets with 64-bit PTEs.
> But PGT_CACHE(0) doesn't exist (yet).
>
> Looking at it once more, I think that might not really be needed. I'll
> rework it and see what I can achieve.
>

Thanks
-aneesh


* Re: [PATCH 6/6] powerpc/8xx: implementation of huge pages
  2016-08-14 17:38     ` christophe leroy
@ 2016-08-15 10:30       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-15 10:30 UTC (permalink / raw)
  To: christophe leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev

christophe leroy <christophe.leroy@c-s.fr> writes:

> On 14/08/2016 at 16:25, Aneesh Kumar K.V wrote:
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>
>>> The 8xx has 512k and 8M pages. This patch implements hugepages using
>>> those sizes.
>>>
>>> On the 8xx, the page size is encoded in the PGD entry,
>>> using the PS field (bits 28-29):
>>> 00 : Small pages (4k or 16k)
>>> 01 : 512k pages
>>> 10 : reserved
>>> 11 : 8M pages
>>>
>>> The implementation uses a mix of what is used on BOOK3S and BOOK3E,
>>> as 512k pages are in HUGEPTE tables while for 8M pages we have
>>> several PGD entries pointing to a single leaf HUGEPTE entry.
>>>
>>> For the time being, we do not support the CPU15 erratum workaround if
>>> HUGETLB is selected.
>>
>> Can you also document here the Linux page table format for the
>> different huge page sizes?
>
> Er... isn't that what I do when explaining the use of the PS field in
> the PGD entry? That's the point: that's how the 8xx knows it is a
> huge page, and that's how Linux will know it is one. On the 8xx, the
> Linux PGD entry (almost) matches the L1 MMU entry and the Linux PTE almost
> matches the L2 MMU entry (some bits are copied from the PTE to the L1
> entry and then removed from the value written to the L2 MMU entry).
>

Sorry if the answer was obvious from the commit message. I haven't looked
closely enough at the 8xx page table format to understand the details. Now,
with your reply to the earlier email, I looked at the changes again and
wonder whether we can document details like the following:

The 8xx uses a two-level page table and supports two Linux page sizes
(4k and 16k). The 8xx also supports two hugepage sizes, 512k and 8M.
In order to support them in Linux, we define two different page table
layouts.

For the 512K hugepage size, a pgd entry has the format below:
[<hugepte address>0100]. The allocated hugepte table will contain <x>
entries, each pointing to a 512K huge pte.

For 8M, multiple pgd entries point to the same hugepte address, and each
pgd entry has the format below:
[<hugepte address>1100]. The allocated hugepte table will only have one
entry.

I agree that these are the same details you explained in the commit
message, but spelling them all out will help anybody reading the code
later.
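The proposed description can be made concrete with a small C sketch that builds the two PGD layouts. It is a hedged illustration only: just the PS bits (0100 for 512K, 1100 for 8M) are modelled, the valid bit and other L1 flags are ignored, each PGD slot is assumed to map 4M, the hugepte table is assumed aligned so its low 4 bits are free, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PS encodings in the low nibble of a PGD entry. */
#define PGD_PS_512K 0x4u   /* ...0100 */
#define PGD_PS_8M   0xcu   /* ...1100 */

/* Combine a hugepte table address with a PS code. */
static uint32_t make_huge_pgd(uint32_t hugepte_addr, uint32_t ps)
{
	assert((hugepte_addr & 0xfu) == 0);  /* low bits must be free for PS */
	return hugepte_addr | ps;
}

/*
 * Fill the PGD slots covering one 8M page. Assuming 4M per slot and an
 * 8M-aligned page, the page starts on an even slot and both slots of
 * the pair hold the same value, i.e. they share one hugepte.
 */
static void set_8m_pgd(uint32_t *pgd, unsigned int index, uint32_t hugepte_addr)
{
	uint32_t val = make_huge_pgd(hugepte_addr, PGD_PS_8M);

	assert((index & 1u) == 0);
	pgd[index] = val;
	pgd[index + 1] = val;
}
```

For 512K pages, a single slot gets make_huge_pgd(table, PGD_PS_512K) and the table behind it holds one hugepte per 512K page in that 4M range.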

-aneesh


* Re: [PATCH 0/6] powerpc/8xx: implementation of huge pages
  2016-08-14 17:33   ` christophe leroy
@ 2016-08-15 10:31     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-15 10:31 UTC (permalink / raw)
  To: christophe leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Scott Wood
  Cc: linuxppc-dev, linux-kernel

christophe leroy <christophe.leroy@c-s.fr> writes:

> On 14/08/2016 at 16:27, Aneesh Kumar K.V wrote:
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>
>>> This set provides an implementation of huge pages on the 8xx.
>>>
>>> Christophe Leroy (6):
>>>   powerpc: port 64 bits pgtable_cache to 32 bits
>>>   powerpc: fix usage of _PAGE_RO in hugepage
>>>   powerpc/8xx: use r3 to scratch CR in ITLBmiss
>>>   powerpc/8xx: Move additional DTLBMiss handlers out of exception area
>>>   powerpc/8xx: make user addr DTLB miss the short path
>>>   powerpc/8xx: implementation of huge pages
>>
>> Patches 2, 3, 4 and 5 are not really related to the hugepage implementation,
>> right? Maybe they can be sent as a separate series?
>>
>
> Patch 2 fixes a gap in gup_hugepte(): on the 8xx, _PAGE_RW (hence _PAGE_WRITE)
> is defined as 0, and _PAGE_RO must be set when the page is not writable.
> So it is a prerequisite for the huge page implementation.
>
> Patches 3, 4 and 5 are prerequisites for the huge page handling in the
> TLB miss handlers:
> * 3: for the huge page implementation, we need to branch based on the value
> of bits 28 and 29 of the PGD entry. For this we need to preserve CR.
> * 4: with the hugepage-related instructions, there is no longer enough
> space in the TLB exception areas, so part of the handler code has to be
> moved out.
> * 5: that one might be seen as not directly related to hugepages, but it
> requires patch 3.
>
> Maybe I should reorder them as 3, 4, 5, 2, 1, 6?
>

Can't 3, 4, 5 and 2 be merged independently of the series? Reading them
suggests they are either fixes or performance improvements.

-aneesh


end of thread, other threads:[~2016-08-15 10:31 UTC | newest]

Thread overview: 16+ messages
2016-08-12 16:55 [PATCH 0/6] powerpc/8xx: implementation of huge pages Christophe Leroy
2016-08-12 16:55 ` [PATCH 1/6] powerpc: port 64 bits pgtable_cache to 32 bits Christophe Leroy
2016-08-14 14:17   ` Aneesh Kumar K.V
2016-08-14 18:51     ` christophe leroy
2016-08-15 10:23       ` Aneesh Kumar K.V
2016-08-12 16:55 ` [PATCH 2/6] powerpc: fix usage of _PAGE_RO in hugepage Christophe Leroy
2016-08-12 16:55 ` [PATCH 3/6] powerpc/8xx: use r3 to scratch CR in ITLBmiss Christophe Leroy
2016-08-12 16:55 ` [PATCH 4/6] powerpc/8xx: Move additional DTLBMiss handlers out of exception area Christophe Leroy
2016-08-12 16:55 ` [PATCH 5/6] powerpc/8xx: make user addr DTLB miss the short path Christophe Leroy
2016-08-12 16:55 ` [PATCH 6/6] powerpc/8xx: implementation of huge pages Christophe Leroy
2016-08-14 14:25   ` Aneesh Kumar K.V
2016-08-14 17:38     ` christophe leroy
2016-08-15 10:30       ` Aneesh Kumar K.V
2016-08-14 14:27 ` [PATCH 0/6] " Aneesh Kumar K.V
2016-08-14 17:33   ` christophe leroy
2016-08-15 10:31     ` Aneesh Kumar K.V
