From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v4 26/45] powerpc/8xx: Manage 512k huge pages as standard pages.
Date: Tue, 19 May 2020 05:49:09 +0000 (UTC)
Message-ID: <002819e8e166bf81d24b24782d98de7c40905d8f.1589866984.git.christophe.leroy@csgroup.eu>
In-Reply-To: <cover.1589866984.git.christophe.leroy@csgroup.eu>

At present, 512k huge pages are handled through hugepd page
tables: the PMD entry is flagged as a hugepd pointer, which means
that only 512k huge pages can be managed in that 4M block.
However, the hugepd table has the same size as a normal page
table, so 512k page entries can coexist with normal page entries
in the same table.
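
As a back-of-the-envelope sketch (illustrative only, not part of the
patch; assuming 4k base pages), the arithmetic behind that observation:

/* A PMD entry spans 4M, so a standard page table holds 4M / 4k = 1024
 * PTEs, and one 512k huge page covers 512k / 4k = 128 of those slots,
 * leaving the remaining slots free for normal page mappings. */
#define SZ_4K		0x1000UL
#define SZ_512K		0x80000UL
#define SZ_4M		0x400000UL

#define PTES_PER_TABLE	(SZ_4M / SZ_4K)		/* 1024 */
#define PTES_PER_512K	(SZ_512K / SZ_4K)	/* 128 */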

On the 8xx, TLB loading is performed by software and, although the
page tables are organised to match the L1 and L2 levels defined by
the hardware, each TLB entry carries independent L1 and L2 parts.
This means that even if two TLB entries are associated with the same
PMD entry, they can be loaded with different values in the L1 part.

The L1 entry contains the page size (PS field), with the following
encodings (written out as constants below):
- 00 for 4k and 16k pages
- 01 for 512k pages
- 11 for 8M pages
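
As an illustrative sketch (the names are hypothetical; the values are
the raw 2-bit encodings listed above, not shifted into their field
position):

#define PS_4K_16K	0x0	/* 0b00: 4k and 16k pages */
#define PS_512K		0x1	/* 0b01: 512k pages */
#define PS_8M		0x3	/* 0b11: 8M pages */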

By adding a huge page flag in the PTE (_PAGE_HUGE) and copying it
into the lower bit of PS, we can then manage 512k pages with normal
page tables (see the sketch after this list):
- PMD entry has PS=11 for 8M pages
- PMD entry has PS=00 for other pages.
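
A minimal C sketch of that bit copy (illustrative only; it mirrors
the rlwimi instruction added to the TLB miss handlers below, and
assumes _PMD_PAGE_512K is the low bit of the PS field, 0x0004):

/* Rotating the PTE left by 32 - 9 = 23 bits moves _PAGE_HUGE (0x0800)
 * onto 0x0004; inserting under the _PMD_PAGE_512K mask leaves the other
 * TWC bits untouched, i.e. rlwimi r11, r10, 32 - 9, _PMD_PAGE_512K. */
#define _PAGE_HUGE	0x0800	/* PTE flag added by this patch */
#define _PMD_PAGE_512K	0x0004	/* assumed: low bit of the PS field */

static inline unsigned int twc_set_huge(unsigned int twc, unsigned int pte)
{
	unsigned int rot = (pte << 23) | (pte >> 9);	/* rotlwi 23 */

	return (twc & ~_PMD_PAGE_512K) | (rot & _PMD_PAGE_512K);
}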

As a PMD entry covers a 4M area, a PMD entry either points to a
hugepd table holding a single entry to an 8M page, or points to a
standard page table holding entries for 4k, 16k or 512k pages. For
512k pages, as the L1 entry does not know the page size before the
PTE is read, the page table holds 128 PTEs per 512k page, as if they
were 4k pages; when the TLB is loaded, the entry is flagged as a
512k page.
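
The resulting fill pattern, as a rough sketch (the helper name is
hypothetical; it mirrors the pte_update() loop in the diff below):

#include <linux/sizes.h>

/* Populate the 128 PTE slots backing one 512k huge page: every slot
 * carries the same flags (including _PAGE_HUGE) while the physical
 * address advances by 4k per slot, matching "new += SZ_4K" below. */
static void fill_512k_slots(unsigned long *ptep, unsigned long val)
{
	int i, num = SZ_512K / SZ_4K;	/* 128 */

	for (i = 0; i < num; i++, ptep++, val += SZ_4K)
		*ptep = val;
}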

Note that we can't use pmd_ptr() in asm/nohash/32/pgtable.h because
it is not defined yet.

In the ITLB miss handler, we keep the possibility to opt this out:
when kernel text is pinned and no user huge pages are used, we can
save several instructions by not using r11.

In the DTLB miss handler, it costs just one instruction, so it is
not worth bothering with an opt-out.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++++++---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |  4 +++-
 arch/powerpc/include/asm/nohash/pgtable.h    |  2 +-
 arch/powerpc/kernel/head_8xx.S               | 12 +++++------
 arch/powerpc/mm/hugetlbpage.c                | 22 +++++++++++++++++---
 arch/powerpc/mm/pgtable.c                    | 10 ++++++++-
 6 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 9a287a95acad..717f995d21b8 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -229,8 +229,9 @@ static inline void pmd_clear(pmd_t *pmdp)
  * those implementations.
  *
  * On the 8xx, the page tables are a bit special. For 16k pages, we have
- * 4 identical entries. For other page sizes, we have a single entry in the
- * table.
+ * 4 identical entries. For 512k pages, we have 128 entries as if it was
+ * 4k pages, but they are flagged as 512k pages for the hardware.
+ * For other page sizes, we have a single entry in the table.
  */
 #ifdef CONFIG_PPC_8xx
 static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
@@ -240,13 +241,16 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
 	pte_basic_t old = pte_val(*p);
 	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
 	int num, i;
+	pmd_t *pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
 
 	if (!huge)
 		num = PAGE_SIZE / SZ_4K;
+	else if ((pmd_val(*pmd) & _PMD_PAGE_MASK) != _PMD_PAGE_8M)
+		num = SZ_512K / SZ_4K;
 	else
 		num = 1;
 
-	for (i = 0; i < num; i++, entry++)
+	for (i = 0; i < num; i++, entry++, new += SZ_4K)
 		*entry = new;
 
 	return old;
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index c9e4b2d90f65..66f403a7da44 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -46,6 +46,8 @@
 #define _PAGE_NA	0x0200	/* Supervisor NA, User no access */
 #define _PAGE_RO	0x0600	/* Supervisor RO, User no access */
 
+#define _PAGE_HUGE	0x0800	/* Copied to L1 PS bit 29 */
+
 /* cache related flags non existing on 8xx */
 #define _PAGE_COHERENT	0
 #define _PAGE_WRITETHRU	0
@@ -128,7 +130,7 @@ static inline pte_t pte_mkuser(pte_t pte)
 
 static inline pte_t pte_mkhuge(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_SPS);
+	return __pte(pte_val(pte) | _PAGE_SPS | _PAGE_HUGE);
 }
 
 #define pte_mkhuge pte_mkhuge
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 7fed9dc0f147..f27c967d9269 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -267,7 +267,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 static inline int hugepd_ok(hugepd_t hpd)
 {
 #ifdef CONFIG_PPC_8xx
-	return ((hpd_val(hpd) & 0x4) != 0);
+	return ((hpd_val(hpd) & _PMD_PAGE_MASK) == _PMD_PAGE_8M);
 #else
 	/* We clear the top bit to indicate hugepd */
 	return (hpd_val(hpd) && (hpd_val(hpd) & PD_HUGE) == 0);
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index adad8baadcf5..423465b10c82 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -239,7 +239,6 @@ InstructionTLBMiss:
 #endif
 #ifdef CONFIG_HUGETLBFS
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10)	/* Get level 1 entry */
-	mtspr	SPRN_MI_TWC, r11	/* Set segment attributes */
 	mtspr	SPRN_MD_TWC, r11
 #else
 	lwz	r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)	/* Get level 1 entry */
@@ -248,6 +247,10 @@ InstructionTLBMiss:
 #endif
 	mfspr	r10, SPRN_MD_TWC
 	lwz	r10, 0(r10)	/* Get the pte */
+#ifdef CONFIG_HUGETLBFS
+	rlwimi	r11, r10, 32 - 9, _PMD_PAGE_512K
+	mtspr	SPRN_MI_TWC, r11
+#endif
 #ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
 	and	r11, r11, r10
@@ -353,6 +356,7 @@ DataStoreTLBMiss:
 	 * above.
 	 */
 	rlwimi	r11, r10, 0, _PAGE_GUARDED
+	rlwimi	r11, r10, 32 - 9, _PMD_PAGE_512K
 	mtspr	SPRN_MD_TWC, r11
 
 	/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
@@ -584,7 +588,6 @@ FixupDAR:/* Entry point for dcbx workaround. */
 	mfspr	r11, SPRN_MD_TWC
 	lwz	r11, 0(r11)	/* Get the pte */
 	bt	28,200f		/* bit 28 = Large page (8M) */
-	bt	29,202f		/* bit 29 = Large page (8M or 512K) */
 	/* concat physical page address(r11) and page offset(r10) */
 	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT, 31
 201:	lwz	r11,0(r11)
@@ -611,11 +614,6 @@ FixupDAR:/* Entry point for dcbx workaround. */
 	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT_8M, 31
 	b	201b
 
-202:
-	/* concat physical page address(r11) and page offset(r10) */
-	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT_512K, 31
-	b	201b
-
 144:	mfspr	r10, SPRN_DSISR
 	rlwinm	r10, r10,0,7,5	/* Clear store bit for buggy dcbst insn */
 	mtspr	SPRN_DSISR, r10
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 521929a371af..38bad839e608 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -189,6 +189,9 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	if (!hpdp)
 		return NULL;
 
+	if (IS_ENABLED(CONFIG_PPC_8xx) && sz == SZ_512K)
+		return pte_alloc_map(mm, (pmd_t *)hpdp, addr);
+
 	BUG_ON(!hugepd_none(*hpdp) && !hugepd_ok(*hpdp));
 
 	if (hugepd_none(*hpdp) && __hugepte_alloc(mm, hpdp, addr,
@@ -331,13 +334,20 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 
 	if (shift >= pdshift)
 		hugepd_free(tlb, hugepte);
-	else if (IS_ENABLED(CONFIG_PPC_8xx))
-		pgtable_free_tlb(tlb, hugepte, 0);
 	else
 		pgtable_free_tlb(tlb, hugepte,
 				 get_hugepd_cache_index(pdshift - shift));
 }
 
+static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr)
+{
+	pgtable_t token = pmd_pgtable(*pmd);
+
+	pmd_clear(pmd);
+	pte_free_tlb(tlb, token, addr);
+	mm_dec_nr_ptes(tlb->mm);
+}
+
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 				   unsigned long addr, unsigned long end,
 				   unsigned long floor, unsigned long ceiling)
@@ -353,11 +363,17 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 		pmd = pmd_offset(pud, addr);
 		next = pmd_addr_end(addr, end);
 		if (!is_hugepd(__hugepd(pmd_val(*pmd)))) {
+			if (pmd_none_or_clear_bad(pmd))
+				continue;
+
 			/*
 			 * if it is not hugepd pointer, we should already find
 			 * it cleared.
 			 */
-			WARN_ON(!pmd_none_or_clear_bad(pmd));
+			WARN_ON(!IS_ENABLED(CONFIG_PPC_8xx));
+
+			hugetlb_free_pte_range(tlb, pmd, addr);
+
 			continue;
 		}
 		/*
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 214a5f4beb6c..60c4b8ff046c 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -264,6 +264,12 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 #if defined(CONFIG_PPC_8xx)
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
 {
+	pmd_t *pmd = pmd_ptr(mm, addr);
+	pte_basic_t val;
+	pte_basic_t *entry = &ptep->pte;
+	int num = is_hugepd(*((hugepd_t *)pmd)) ? 1 : SZ_512K / SZ_4K;
+	int i;
+
 	/*
 	 * Make sure hardware valid bit is not set. We don't do
 	 * tlb flush for this update.
@@ -274,7 +280,9 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
 
 	pte = set_pte_filter(pte);
 
-	ptep->pte = pte_val(pte);
+	val = pte_val(pte);
+	for (i = 0; i < num; i++, entry++, val += SZ_4K)
+		*entry = val;
 }
 #endif
 #endif /* CONFIG_HUGETLB_PAGE */
-- 
2.25.0

