* [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx
@ 2018-09-18 16:57 Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 01/20] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
` (19 more replies)
0 siblings, 20 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
The purpose of this series is to implement hardware assistance for TLB table
walks on the 8xx.
The first part switches to patch_site instead of patch_instruction,
as it makes the code clearer and avoids pollution with global symbols.
It also optimises access to the perf counters (hence reducing the number of
registers used).
The second part implements HW assistance in the TLB routines.
The last part makes L1 entries and L2 entries independent. For that,
we need to alter the ioremap functions in order to handle the GUARD attribute
at the PGD/PMD level.
Tested successfully on 8xx.
This series applies after the two following series:
- [v2 00/24] ban the use of _PAGE_XXX flags outside platform specific code (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=65376)
- [v2,1/4] powerpc/mm: enable the use of page table cache of order 0 (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=60777)
Successful compilation on kisskb (v4)
http://kisskb.ellerman.id.au/kisskb/branch/chleroy/head/cfdf3349e3877df4cbfa9193ad1f4f4e4ada52de/
Successful compilation on the following defconfigs (v3):
ppc64_defconfig
ppc64e_defconfig
Successful compilation on the following defconfigs (v2):
ppc64_defconfig
ppc64e_defconfig
pseries_defconfig
pmac32_defconfig
linkstation_defconfig
corenet32_smp_defconfig
ppc40x_defconfig
storcenter_defconfig
ppc44x_defconfig
Changes in v4:
- Reordered the series to put the modifications which make
L1 and L2 entries independent at the end.
- No modifications to ppc64 ioremap (we still have an opportunity to
merge them, for a future patch series)
- 8xx code modified to use patch_site instead of patch_instruction
to get clearer code and avoid object pollution with global symbols
- Moved perf counters in first 32kb of memory to optimise access
- Split the big-bang switch to HW assistance into several steps:
1. Temporarily remove support of 16k pages and 512k hugepages
2. Change TLB routines to use HW assistance for 4k pages and 8M hugepages
3. Add back support for 512k hugepages
4. Add back support for 16k pages (using pte_fragment as page tables are still 4k)
Changes in v3:
- Fixed an issue in the 09/14 when CONFIG_PIN_TLB_TEXT was not enabled
- Added performance measurement in the 09/14 commit log
- Rebased on latest 'powerpc/merge' tree, which conflicted with 13/14
Changes in v2:
- Removed the first 3 patches, which have already been applied
- Fixed compilation errors reported by Michael
- Squashed the commonalisation of ioremap functions into a single patch
- Fixed the use of pte_fragment
- Added a patch optimising perf counting of TLB misses and instructions
Christophe Leroy (20):
Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for
CONFIG_SWAP"
powerpc/code-patching: add a helper to get the address of a patch_site
powerpc/8xx: Use patch_site for memory setup patching
powerpc/8xx: Use patch_site for perf counters setup
powerpc/8xx: Move SW perf counters in first 32kb of memory
powerpc/8xx: Temporarily disable 16k pages and 512k hugepages
powerpc/mm: Use hardware assistance in TLB handlers on the 8xx
powerpc/mm: Enable 512k hugepage support with HW assistance on the 8xx
powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
powerpc/8xx: regroup TLB handler routines
powerpc/mm: don't use pte_alloc_one_kernel() before slab is available
powerpc/mm: inline pte_alloc_one() and pte_alloc_one_kernel() in PPC32
powerpc/book3s32: Remove CONFIG_BOOKE dependent code
powerpc/mm: Move pte_fragment_alloc() to a common location
powerpc/mm: Avoid useless lock with single page fragments
powerpc/mm: Extend pte_fragment functionality to nohash/32
powerpc/8xx: Remove PTE_ATOMIC_UPDATES
powerpc/mm: reintroduce 16K pages with HW assistance on 8xx
powerpc/nohash32: allow setting GUARDED attribute in the PMD directly
powerpc/8xx: set GUARDED attribute in the PMD directly
arch/powerpc/include/asm/book3s/32/pgalloc.h | 28 +-
arch/powerpc/include/asm/book3s/32/pgtable.h | 16 +-
arch/powerpc/include/asm/code-patching.h | 5 +
arch/powerpc/include/asm/hugetlb.h | 4 +-
arch/powerpc/include/asm/mmu-40x.h | 1 +
arch/powerpc/include/asm/mmu-44x.h | 1 +
arch/powerpc/include/asm/mmu-8xx.h | 44 +--
arch/powerpc/include/asm/mmu-book3e.h | 1 +
arch/powerpc/include/asm/mmu_context.h | 2 +-
arch/powerpc/include/asm/nohash/32/pgalloc.h | 43 ++-
arch/powerpc/include/asm/nohash/32/pgtable.h | 45 ++-
arch/powerpc/include/asm/nohash/32/pte-8xx.h | 6 +-
arch/powerpc/include/asm/nohash/pgtable.h | 4 +
arch/powerpc/include/asm/page.h | 6 +-
arch/powerpc/include/asm/pgtable-types.h | 4 +
arch/powerpc/include/asm/pgtable.h | 8 +
arch/powerpc/kernel/head_8xx.S | 425 +++++++++++----------------
arch/powerpc/mm/8xx_mmu.c | 29 +-
arch/powerpc/mm/Makefile | 7 +-
arch/powerpc/mm/dump_linuxpagetables.c | 21 +-
arch/powerpc/mm/hugetlbpage.c | 13 +
arch/powerpc/mm/mem.c | 7 +
arch/powerpc/mm/mmu_context.c | 1 -
arch/powerpc/mm/mmu_context_book3s64.c | 67 -----
arch/powerpc/mm/mmu_context_nohash.c | 1 +
arch/powerpc/mm/pgtable-book3s64.c | 85 ------
arch/powerpc/mm/pgtable-frag.c | 176 +++++++++++
arch/powerpc/mm/pgtable_32.c | 103 ++++---
arch/powerpc/perf/8xx-pmu.c | 27 +-
arch/powerpc/platforms/Kconfig.cputype | 3 +
30 files changed, 620 insertions(+), 563 deletions(-)
create mode 100644 arch/powerpc/mm/pgtable-frag.c
--
2.13.3
* [PATCH v4 01/20] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 02/20] powerpc/code-patching: add a helper to get the address of a patch_site Christophe Leroy
` (18 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
This reverts commit 4f94b2c7462d9720b2afa7e8e8d4c19446bb31ce.
That commit was buggy, as it used rlwinm instead of rlwimi.
Instead of fixing that bug, we revert the previous commit in order to
reduce the dependency between L1 entries and L2 entries.
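To illustrate the rlwinm/rlwimi distinction mentioned above, here is a minimal user-space C model of the two instructions and of the CONFIG_SWAP sequence that this revert reinstates. It is only a sketch, not kernel code: the helper names (rotl32, rlwinm, rlwimi, swap_fixup_present) are invented for the example, and the bit values PRESENT=0x1 and ACCESSED=0x20 are the ones quoted in the comment added by the diff.

```c
#include <stdint.h>

#define PAGE_PRESENT  0x001u	/* value quoted in the patch comment */
#define PAGE_ACCESSED 0x020u	/* value quoted in the patch comment */

/* Rotate a 32-bit word left by sh bits (0..31). */
static uint32_t rotl32(uint32_t v, unsigned int sh)
{
	sh &= 31;
	return sh ? (v << sh) | (v >> (32 - sh)) : v;
}

/* rlwinm: rotate, then AND with mask; the old target value is lost. */
static uint32_t rlwinm(uint32_t rs, unsigned int sh, uint32_t mask)
{
	return rotl32(rs, sh) & mask;
}

/* rlwimi: rotate, then insert under mask; target bits outside the mask are kept. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned int sh, uint32_t mask)
{
	return (rotl32(rs, sh) & mask) | (ra & ~mask);
}

/* Hypothetical model: PRESENT survives only if ACCESSED is set as well. */
static uint32_t swap_fixup_present(uint32_t pte)
{
	uint32_t r10 = pte;
	uint32_t r11;

	r11 = rlwinm(r10, 32 - 5, PAGE_PRESENT);	/* ACCESSED moved to the PRESENT position */
	r11 &= r10;					/* and r11, r11, r10 */
	r10 = rlwimi(r10, r11, 0, PAGE_PRESENT);	/* replace only the PRESENT bit */
	return r10;
}
```

In this model a PTE with PRESENT set but ACCESSED clear (0x01) comes back as 0x00, while 0x21 is returned unchanged. Swapping the final rlwimi for rlwinm would discard every other PTE bit, which is the same kind of mistake the commit message describes.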
Fixes: 4f94b2c7462d9 ("powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/mmu-8xx.h | 34 +++++-----------------------
arch/powerpc/kernel/head_8xx.S | 45 +++++++++++++++++++++++---------------
arch/powerpc/mm/8xx_mmu.c | 2 +-
3 files changed, 34 insertions(+), 47 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 4f547752ae79..193f53116c7a 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -34,20 +34,12 @@
* respectively NA for All or X for Supervisor and no access for User.
* Then we use the APG to say whether accesses are according to Page rules or
* "all Supervisor" rules (Access to all)
- * We also use the 2nd APG bit for _PAGE_ACCESSED when having SWAP:
- * When that bit is not set access is done iaw "all user"
- * which means no access iaw page rules.
- * Therefore, we define 4 APG groups. lsb is _PMD_USER, 2nd is _PAGE_ACCESSED
- * 0x => No access => 11 (all accesses performed as user iaw page definition)
- * 10 => No user => 01 (all accesses performed according to page definition)
- * 11 => User => 00 (all accesses performed as supervisor iaw page definition)
+ * Therefore, we define 2 APG groups. lsb is _PMD_USER
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
* We define all 16 groups so that all other bits of APG can take any value
*/
-#ifdef CONFIG_SWAP
-#define MI_APG_INIT 0xf4f4f4f4
-#else
#define MI_APG_INIT 0x44444444
-#endif
/* The effective page number register. When read, contains the information
* about the last instruction TLB miss. When MI_RPN is written, bits in
@@ -115,20 +107,12 @@
* Supervisor and no access for user and NA for ALL.
* Then we use the APG to say whether accesses are according to Page rules or
* "all Supervisor" rules (Access to all)
- * We also use the 2nd APG bit for _PAGE_ACCESSED when having SWAP:
- * When that bit is not set access is done iaw "all user"
- * which means no access iaw page rules.
- * Therefore, we define 4 APG groups. lsb is _PMD_USER, 2nd is _PAGE_ACCESSED
- * 0x => No access => 11 (all accesses performed as user iaw page definition)
- * 10 => No user => 01 (all accesses performed according to page definition)
- * 11 => User => 00 (all accesses performed as supervisor iaw page definition)
+ * Therefore, we define 2 APG groups. lsb is _PMD_USER
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
* We define all 16 groups so that all other bits of APG can take any value
*/
-#ifdef CONFIG_SWAP
-#define MD_APG_INIT 0xf4f4f4f4
-#else
#define MD_APG_INIT 0x44444444
-#endif
/* The effective page number register. When read, contains the information
* about the last instruction TLB miss. When MD_RPN is written, bits in
@@ -180,12 +164,6 @@
*/
#define SPRN_M_TW 799
-/* APGs */
-#define M_APG0 0x00000000
-#define M_APG1 0x00000020
-#define M_APG2 0x00000040
-#define M_APG3 0x00000060
-
#ifdef CONFIG_PPC_MM_SLICES
#include <asm/nohash/32/slice.h>
#define SLICE_ARRAY_SIZE (1 << (32 - SLICE_LOW_SHIFT - 1))
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 134a573a9f2d..12c92a483fb1 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -353,13 +353,14 @@ _ENTRY(ITLBMiss_cmp)
#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
mtcr r12
#endif
-
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 31, _PAGE_ACCESSED >> 1
-#endif
/* Load the MI_TWC with the attributes for this "segment." */
mtspr SPRN_MI_TWC, r11 /* Set segment attributes */
+#ifdef CONFIG_SWAP
+ rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ and r11, r11, r10
+ rlwimi r10, r11, 0, _PAGE_PRESENT
+#endif
li r11, RPN_PATTERN | 0x200
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 20 and 23 must be clear.
@@ -470,14 +471,22 @@ _ENTRY(DTLBMiss_jmp)
* above.
*/
rlwimi r11, r10, 0, _PAGE_GUARDED
-#ifdef CONFIG_SWAP
- /* _PAGE_ACCESSED has to be set. We use second APG bit for that, 0
- * on that bit will represent a Non Access group
- */
- rlwinm r11, r10, 31, _PAGE_ACCESSED >> 1
-#endif
mtspr SPRN_MD_TWC, r11
+ /* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
+ * We also need to know if the insn is a load/store, so:
+ * Clear _PAGE_PRESENT and load that which will
+ * trap into DTLB Error with store bit set accordinly.
+ */
+ /* PRESENT=0x1, ACCESSED=0x20
+ * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
+ * r10 = (r10 & ~PRESENT) | r11;
+ */
+#ifdef CONFIG_SWAP
+ rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ and r11, r11, r10
+ rlwimi r10, r11, 0, _PAGE_PRESENT
+#endif
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
@@ -637,8 +646,8 @@ InstructionBreakpoint:
*/
DTLBMissIMMR:
mtcr r12
- /* Set 512k byte guarded page and mark it valid and accessed */
- li r10, MD_PS512K | MD_GUARDED | MD_SVALID | M_APG2
+ /* Set 512k byte guarded page and mark it valid */
+ li r10, MD_PS512K | MD_GUARDED | MD_SVALID
mtspr SPRN_MD_TWC, r10
mfspr r10, SPRN_IMMR /* Get current IMMR */
rlwinm r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
@@ -656,8 +665,8 @@ _ENTRY(dtlb_miss_exit_2)
DTLBMissLinear:
mtcr r12
- /* Set 8M byte page and mark it valid and accessed */
- li r11, MD_PS8MEG | MD_SVALID | M_APG2
+ /* Set 8M byte page and mark it valid */
+ li r11, MD_PS8MEG | MD_SVALID
mtspr SPRN_MD_TWC, r11
rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */
ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
@@ -675,8 +684,8 @@ _ENTRY(dtlb_miss_exit_3)
#ifndef CONFIG_PIN_TLB_TEXT
ITLBMissLinear:
mtcr r12
- /* Set 8M byte page and mark it valid,accessed */
- li r11, MI_PS8MEG | MI_SVALID | M_APG2
+ /* Set 8M byte page and mark it valid */
+ li r11, MI_PS8MEG | MI_SVALID
mtspr SPRN_MI_TWC, r11
rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */
ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
@@ -960,7 +969,7 @@ initial_mmu:
ori r8, r8, MI_EVALID /* Mark it valid */
mtspr SPRN_MI_EPN, r8
li r8, MI_PS8MEG /* Set 8M byte page */
- ori r8, r8, MI_SVALID | M_APG2 /* Make it valid, APG 2 */
+ ori r8, r8, MI_SVALID /* Make it valid */
mtspr SPRN_MI_TWC, r8
li r8, MI_BOOTINIT /* Create RPN for address 0 */
mtspr SPRN_MI_RPN, r8 /* Store TLB entry */
@@ -987,7 +996,7 @@ initial_mmu:
ori r8, r8, MD_EVALID /* Mark it valid */
mtspr SPRN_MD_EPN, r8
li r8, MD_PS512K | MD_GUARDED /* Set 512k byte page */
- ori r8, r8, MD_SVALID | M_APG2 /* Make it valid and accessed */
+ ori r8, r8, MD_SVALID /* Make it valid */
mtspr SPRN_MD_TWC, r8
mr r8, r9 /* Create paddr for TLB */
ori r8, r8, MI_BOOTINIT|0x2 /* Inhibit cache -- Cort */
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 36484a2ef915..fee599cf3bc3 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -79,7 +79,7 @@ void __init MMU_init_hw(void)
for (; i < 32 && mem >= LARGE_PAGE_SIZE_8M; i++) {
mtspr(SPRN_MD_CTR, ctr | (i << 8));
mtspr(SPRN_MD_EPN, (unsigned long)__va(addr) | MD_EVALID);
- mtspr(SPRN_MD_TWC, MD_PS8MEG | MD_SVALID | M_APG2);
+ mtspr(SPRN_MD_TWC, MD_PS8MEG | MD_SVALID);
mtspr(SPRN_MD_RPN, addr | flags | _PAGE_PRESENT);
addr += LARGE_PAGE_SIZE_8M;
mem -= LARGE_PAGE_SIZE_8M;
--
2.13.3
* [PATCH v4 02/20] powerpc/code-patching: add a helper to get the address of a patch_site
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 01/20] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 03/20] powerpc/8xx: Use patch_site for memory setup patching Christophe Leroy
` (17 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
This patch adds a helper, site_addr(), to get the address of the
instruction referenced by a patch_site.
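As a rough illustration of the arithmetic, the sketch below reproduces the helper in user space. It assumes that a patch site is a 32-bit signed offset from the site entry itself to the instruction to be patched, which is what the helper's body implies. The struct demo_image layout and the record_site() function are invented for the demo (record_site plays the role of the assembler macro), and uintptr_t is used where the kernel helper uses unsigned long.

```c
#include <stdint.h>

typedef int32_t s32;

/* Layout invented for the demo: fake "text" followed by its patch-site entry. */
struct demo_image {
	uint32_t text[4];	/* stand-in for instructions */
	s32 site;		/* offset from &site to the instruction to patch */
};

/* Hypothetical helper: store the distance from the site entry to the instruction. */
static void record_site(s32 *site, const void *insn)
{
	*site = (s32)((intptr_t)insn - (intptr_t)site);
}

/* Same arithmetic as the helper added by the patch: entry address plus stored offset. */
static uintptr_t site_addr(s32 *site)
{
	return (uintptr_t)site + *site;
}
```

Because only a relative offset is stored, the patched instruction itself needs no global label, which is presumably what keeps the symbol table and the 'objdump -d' output clean.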
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/code-patching.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 31733a95bbd0..bca48cc1b6ad 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -36,6 +36,11 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr);
int patch_instruction_site(s32 *addr, unsigned int instr);
int patch_branch_site(s32 *site, unsigned long target, int flags);
+static inline unsigned long site_addr(s32 *site)
+{
+ return (unsigned long)site + *site;
+}
+
int instr_is_relative_branch(unsigned int instr);
int instr_is_relative_link_branch(unsigned int instr);
int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
--
2.13.3
* [PATCH v4 03/20] powerpc/8xx: Use patch_site for memory setup patching
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 01/20] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 02/20] powerpc/code-patching: add a helper to get the address of a patch_site Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 04/20] powerpc/8xx: Use patch_site for perf counters setup Christophe Leroy
` (16 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
The 8xx TLB miss routines are patched at startup in several places.
This patch uses the new patch_site functionality in order
to improve code readability and avoid a mess of labels when
dumping the code with 'objdump -d'.
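The startup patching done by mmu_patch_cmp_limit() amounts to rewriting the 16-bit immediate of a compare instruction. A minimal sketch of that bit manipulation follows. DEMO_PAGE_OFFSET (0xc0000000) and demo_va() are assumptions standing in for the kernel's PAGE_OFFSET and __va(), and the instruction word used in the usage note is only illustrative.

```c
#include <stdint.h>

#define DEMO_PAGE_OFFSET 0xc0000000u	/* assumption: a typical 32-bit PAGE_OFFSET */

/* Stand-in for __va(): physical address -> kernel virtual address (linear mapping). */
static uint32_t demo_va(uint32_t phys)
{
	return phys + DEMO_PAGE_OFFSET;
}

/*
 * Keep the upper half of the instruction word (opcode and register fields)
 * and replace the 16-bit immediate with the upper half of the virtual
 * address of the end of the mapped area.
 */
static uint32_t patch_cmp_limit(uint32_t instr, uint32_t mapped)
{
	instr &= 0xffff0000u;
	instr |= demo_va(mapped) >> 16;
	return instr;
}
```

For example, patch_cmp_limit(0x2b8bc180, 0x02000000) yields 0x2b8bc200 under these assumptions: the comparison limit moves from PAGE_OFFSET + 24M to PAGE_OFFSET + 32M while the rest of the instruction is untouched.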
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/mmu-8xx.h | 5 +++++
arch/powerpc/kernel/head_8xx.S | 19 +++++++++++--------
arch/powerpc/mm/8xx_mmu.c | 23 +++++++----------------
3 files changed, 23 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 193f53116c7a..3a15d6647d47 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -229,6 +229,11 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
BUG();
}
+/* patch sites */
+extern s32 patch__itlbmiss_linmem_top;
+extern s32 patch__dtlbmiss_linmem_top, patch__dtlbmiss_immr_jmp;
+extern s32 patch__fixupdar_linmem_top;
+
#endif /* !__ASSEMBLY__ */
#if defined(CONFIG_PPC_4K_PAGES)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 12c92a483fb1..0425571a533d 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -31,6 +31,7 @@
#include <asm/asm-offsets.h>
#include <asm/ptrace.h>
#include <asm/export.h>
+#include <asm/code-patching-asm.h>
#if CONFIG_TASK_SIZE <= 0x80000000 && CONFIG_PAGE_OFFSET >= 0x80000000
/* By simply checking Address >= 0x80000000, we know if its a kernel address */
@@ -318,8 +319,8 @@ InstructionTLBMiss:
cmpli cr0, r11, PAGE_OFFSET@h
#ifndef CONFIG_PIN_TLB_TEXT
/* It is assumed that kernel code fits into the first 8M page */
-_ENTRY(ITLBMiss_cmp)
- cmpli cr7, r11, (PAGE_OFFSET + 0x0800000)@h
+0: cmpli cr7, r11, (PAGE_OFFSET + 0x0800000)@h
+ patch_site 0b, patch__itlbmiss_linmem_top
#endif
#endif
#endif
@@ -436,11 +437,11 @@ DataStoreTLBMiss:
#ifndef CONFIG_PIN_TLB_IMMR
cmpli cr0, r11, VIRT_IMMR_BASE@h
#endif
-_ENTRY(DTLBMiss_cmp)
- cmpli cr7, r11, (PAGE_OFFSET + 0x1800000)@h
+0: cmpli cr7, r11, (PAGE_OFFSET + 0x1800000)@h
+ patch_site 0b, patch__dtlbmiss_linmem_top
#ifndef CONFIG_PIN_TLB_IMMR
-_ENTRY(DTLBMiss_jmp)
- beq- DTLBMissIMMR
+0: beq- DTLBMissIMMR
+ patch_site 0b, patch__dtlbmiss_immr_jmp
#endif
blt cr7, DTLBMissLinear
lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
@@ -714,8 +715,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
mfspr r11, SPRN_M_TW /* Get level 1 table */
blt+ 3f
rlwinm r11, r10, 16, 0xfff8
-_ENTRY(FixupDAR_cmp)
- cmpli cr7, r11, (PAGE_OFFSET + 0x1800000)@h
+
+0: cmpli cr7, r11, (PAGE_OFFSET + 0x1800000)@h
+ patch_site 0b, patch__fixupdar_linmem_top
+
/* create physical page address from effective address */
tophys(r11, r10)
blt- cr7, 201f
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index fee599cf3bc3..d39f3af03221 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -97,22 +97,13 @@ static void __init mmu_mapin_immr(void)
map_kernel_page(v + offset, p + offset, PAGE_KERNEL_NCG);
}
-/* Address of instructions to patch */
-#ifndef CONFIG_PIN_TLB_IMMR
-extern unsigned int DTLBMiss_jmp;
-#endif
-extern unsigned int DTLBMiss_cmp, FixupDAR_cmp;
-#ifndef CONFIG_PIN_TLB_TEXT
-extern unsigned int ITLBMiss_cmp;
-#endif
-
-static void __init mmu_patch_cmp_limit(unsigned int *addr, unsigned long mapped)
+static void __init mmu_patch_cmp_limit(s32 *site, unsigned long mapped)
{
- unsigned int instr = *addr;
+ unsigned int instr = *(unsigned int *)site_addr(site);
instr &= 0xffff0000;
instr |= (unsigned long)__va(mapped) >> 16;
- patch_instruction(addr, instr);
+ patch_instruction_site(site, instr);
}
unsigned long __init mmu_mapin_ram(unsigned long top)
@@ -123,17 +114,17 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
mapped = 0;
mmu_mapin_immr();
#ifndef CONFIG_PIN_TLB_IMMR
- patch_instruction(&DTLBMiss_jmp, PPC_INST_NOP);
+ patch_instruction_site(&patch__dtlbmiss_immr_jmp, PPC_INST_NOP);
#endif
#ifndef CONFIG_PIN_TLB_TEXT
- mmu_patch_cmp_limit(&ITLBMiss_cmp, 0);
+ mmu_patch_cmp_limit(&patch__itlbmiss_linmem_top, 0);
#endif
} else {
mapped = top & ~(LARGE_PAGE_SIZE_8M - 1);
}
- mmu_patch_cmp_limit(&DTLBMiss_cmp, mapped);
- mmu_patch_cmp_limit(&FixupDAR_cmp, mapped);
+ mmu_patch_cmp_limit(&patch__dtlbmiss_linmem_top, mapped);
+ mmu_patch_cmp_limit(&patch__fixupdar_linmem_top, mapped);
/* If the size of RAM is not an exact power of two, we may not
* have covered RAM in its entirety with 8 MiB
--
2.13.3
* [PATCH v4 04/20] powerpc/8xx: Use patch_site for perf counters setup
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (2 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 03/20] powerpc/8xx: Use patch_site for memory setup patching Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 05/20] powerpc/8xx: Move SW perf counters in first 32kb of memory Christophe Leroy
` (15 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
The 8xx TLB miss routines are patched when (de)activating
perf counters.
This patch uses the new patch_site functionality in order
to improve code readability and avoid a mess of labels when
dumping the code with 'objdump -d'.
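The (de)activation logic in 8xx-pmu.c is reference counted: the first event to be added patches the exit paths to branch to the counting code, and the last event to be removed restores the original instruction. The sketch below models only that pattern in user space with C11 atomics. The structure, the function names and the two DEMO_INSN_* values are placeholders invented for the example, not real PowerPC encodings or kernel APIs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Placeholder encodings, not real PowerPC instruction words. */
#define DEMO_INSN_EXIT   0x11111111u	/* original first instruction of an exit path */
#define DEMO_INSN_BRANCH 0x22222222u	/* branch to the counting code */

struct miss_counter {
	atomic_int ref;		/* number of active perf events */
	uint32_t exits[3];	/* simulated patchable exit sites */
};

static void miss_counter_init(struct miss_counter *c)
{
	atomic_init(&c->ref, 0);
	for (int i = 0; i < 3; i++)
		c->exits[i] = DEMO_INSN_EXIT;
}

static void miss_counter_add(struct miss_counter *c)
{
	/* atomic_fetch_add() returns the old value: 0 means first user. */
	if (atomic_fetch_add(&c->ref, 1) == 0)
		for (int i = 0; i < 3; i++)
			c->exits[i] = DEMO_INSN_BRANCH;
}

static void miss_counter_del(struct miss_counter *c)
{
	/* Old value 1 means the count has just dropped to zero. */
	if (atomic_fetch_sub(&c->ref, 1) == 1)
		for (int i = 0; i < 3; i++)
			c->exits[i] = DEMO_INSN_EXIT;
}
```

The kernel code tests atomic_inc_return() == 1 and atomic_dec_return() == 0, which is equivalent to the old-value checks used here.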
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/mmu-8xx.h | 4 ++++
arch/powerpc/kernel/head_8xx.S | 33 +++++++++++++++++++--------------
arch/powerpc/perf/8xx-pmu.c | 27 ++++++++++++---------------
3 files changed, 35 insertions(+), 29 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 3a15d6647d47..fa05aa566ece 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -234,6 +234,10 @@ extern s32 patch__itlbmiss_linmem_top;
extern s32 patch__dtlbmiss_linmem_top, patch__dtlbmiss_immr_jmp;
extern s32 patch__fixupdar_linmem_top;
+extern s32 patch__itlbmiss_exit_1, patch__itlbmiss_exit_2;
+extern s32 patch__dtlbmiss_exit_1, patch__dtlbmiss_exit_2, patch__dtlbmiss_exit_3;
+extern s32 patch__itlbmiss_perf, patch__dtlbmiss_perf;
+
#endif /* !__ASSEMBLY__ */
#if defined(CONFIG_PPC_4K_PAGES)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 0425571a533d..3b67b9533c82 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -374,16 +374,17 @@ InstructionTLBMiss:
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
/* Restore registers */
-_ENTRY(itlb_miss_exit_1)
- mfspr r10, SPRN_SPRG_SCRATCH0
+0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
mfspr r12, SPRN_SPRG_SCRATCH2
#endif
rfi
+ patch_site 0b, patch__itlbmiss_exit_1
+
#ifdef CONFIG_PERF_EVENTS
-_ENTRY(itlb_miss_perf)
- lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha
+ patch_site 0f, patch__itlbmiss_perf
+0: lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha
lwz r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
addi r11, r11, 1
stw r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
@@ -499,14 +500,16 @@ DataStoreTLBMiss:
/* Restore registers */
mtspr SPRN_DAR, r11 /* Tag DAR */
-_ENTRY(dtlb_miss_exit_1)
- mfspr r10, SPRN_SPRG_SCRATCH0
+
+0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
mfspr r12, SPRN_SPRG_SCRATCH2
rfi
+ patch_site 0b, patch__dtlbmiss_exit_1
+
#ifdef CONFIG_PERF_EVENTS
-_ENTRY(dtlb_miss_perf)
- lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha
+ patch_site 0f, patch__dtlbmiss_perf
+0: lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha
lwz r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
addi r11, r11, 1
stw r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
@@ -658,11 +661,12 @@ DTLBMissIMMR:
li r11, RPN_PATTERN
mtspr SPRN_DAR, r11 /* Tag DAR */
-_ENTRY(dtlb_miss_exit_2)
- mfspr r10, SPRN_SPRG_SCRATCH0
+
+0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
mfspr r12, SPRN_SPRG_SCRATCH2
rfi
+ patch_site 0b, patch__dtlbmiss_exit_2
DTLBMissLinear:
mtcr r12
@@ -676,11 +680,12 @@ DTLBMissLinear:
li r11, RPN_PATTERN
mtspr SPRN_DAR, r11 /* Tag DAR */
-_ENTRY(dtlb_miss_exit_3)
- mfspr r10, SPRN_SPRG_SCRATCH0
+
+0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
mfspr r12, SPRN_SPRG_SCRATCH2
rfi
+ patch_site 0b, patch__dtlbmiss_exit_3
#ifndef CONFIG_PIN_TLB_TEXT
ITLBMissLinear:
@@ -693,11 +698,11 @@ ITLBMissLinear:
_PAGE_PRESENT
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
-_ENTRY(itlb_miss_exit_2)
- mfspr r10, SPRN_SPRG_SCRATCH0
+0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
mfspr r12, SPRN_SPRG_SCRATCH2
rfi
+ patch_site 0b, patch__itlbmiss_exit_2
#endif
/* This is the procedure to calculate the data EA for buggy dcbx,dcbi instructions
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 6c0020d1c561..808f1873de61 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -31,9 +31,6 @@
extern unsigned long itlb_miss_counter, dtlb_miss_counter;
extern atomic_t instruction_counter;
-extern unsigned int itlb_miss_perf, dtlb_miss_perf;
-extern unsigned int itlb_miss_exit_1, itlb_miss_exit_2;
-extern unsigned int dtlb_miss_exit_1, dtlb_miss_exit_2, dtlb_miss_exit_3;
static atomic_t insn_ctr_ref;
static atomic_t itlb_miss_ref;
@@ -103,22 +100,22 @@ static int mpc8xx_pmu_add(struct perf_event *event, int flags)
break;
case PERF_8xx_ID_ITLB_LOAD_MISS:
if (atomic_inc_return(&itlb_miss_ref) == 1) {
- unsigned long target = (unsigned long)&itlb_miss_perf;
+ unsigned long target = site_addr(&patch__itlbmiss_perf);
- patch_branch(&itlb_miss_exit_1, target, 0);
+ patch_branch_site(&patch__itlbmiss_exit_1, target, 0);
#ifndef CONFIG_PIN_TLB_TEXT
- patch_branch(&itlb_miss_exit_2, target, 0);
+ patch_branch_site(&patch__itlbmiss_exit_2, target, 0);
#endif
}
val = itlb_miss_counter;
break;
case PERF_8xx_ID_DTLB_LOAD_MISS:
if (atomic_inc_return(&dtlb_miss_ref) == 1) {
- unsigned long target = (unsigned long)&dtlb_miss_perf;
+ unsigned long target = site_addr(&patch__dtlbmiss_perf);
- patch_branch(&dtlb_miss_exit_1, target, 0);
- patch_branch(&dtlb_miss_exit_2, target, 0);
- patch_branch(&dtlb_miss_exit_3, target, 0);
+ patch_branch_site(&patch__dtlbmiss_exit_1, target, 0);
+ patch_branch_site(&patch__dtlbmiss_exit_2, target, 0);
+ patch_branch_site(&patch__dtlbmiss_exit_3, target, 0);
}
val = dtlb_miss_counter;
break;
@@ -180,17 +177,17 @@ static void mpc8xx_pmu_del(struct perf_event *event, int flags)
break;
case PERF_8xx_ID_ITLB_LOAD_MISS:
if (atomic_dec_return(&itlb_miss_ref) == 0) {
- patch_instruction(&itlb_miss_exit_1, insn);
+ patch_instruction_site(&patch__itlbmiss_exit_1, insn);
#ifndef CONFIG_PIN_TLB_TEXT
- patch_instruction(&itlb_miss_exit_2, insn);
+ patch_instruction_site(&patch__itlbmiss_exit_2, insn);
#endif
}
break;
case PERF_8xx_ID_DTLB_LOAD_MISS:
if (atomic_dec_return(&dtlb_miss_ref) == 0) {
- patch_instruction(&dtlb_miss_exit_1, insn);
- patch_instruction(&dtlb_miss_exit_2, insn);
- patch_instruction(&dtlb_miss_exit_3, insn);
+ patch_instruction_site(&patch__dtlbmiss_exit_1, insn);
+ patch_instruction_site(&patch__dtlbmiss_exit_2, insn);
+ patch_instruction_site(&patch__dtlbmiss_exit_3, insn);
}
break;
}
--
2.13.3
* [PATCH v4 05/20] powerpc/8xx: Move SW perf counters in first 32kb of memory
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (3 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 04/20] powerpc/8xx: Use patch_site for perf counters setup Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 06/20] powerpc/8xx: Temporarily disable 16k pages and 512k hugepages Christophe Leroy
` (14 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
In order to simplify the time-critical exception handlers that update
the 8xx-specific SW perf counters, this patch moves the counters to
the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.
By doing this, we avoid having to set a second register with
the upper part of the address of the counters.
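The 32kb limit follows from the load/store displacement being a signed 16-bit value and from a base operand of 0 meaning a literal zero rather than r0. The sketch below models that address arithmetic in C; the function names are invented for the illustration, and the addresses are taken to be physical (counter - PAGE_OFFSET), as in the handlers.

```c
#include <stdint.h>

/* Low 16 bits, sign-extended, as used for the @l part of a displacement. */
static int32_t lo16_sext(uint32_t addr)
{
	int32_t lo = (int32_t)(addr & 0xffffu);

	return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* High-adjusted upper 16 bits (@ha), compensating for the sign extension of @l. */
static uint32_t ha16(uint32_t addr)
{
	return (addr + 0x8000u) >> 16;
}

/* Effective address of "lwz rX, addr@l(0)": a base operand of 0 is a literal zero. */
static uint32_t ea_zero_base(uint32_t addr)
{
	return (uint32_t)lo16_sext(addr);
}

/* True when the address can be reached without a base register, i.e. below 32 KB. */
static int in_first_32k(uint32_t addr)
{
	return addr < 0x8000u && ea_zero_base(addr) == addr;
}
```

For any address in the first 32kb, ha16() is 0, so the lis instruction that loaded the upper half becomes unnecessary; at 0x8000 and above the sign extension would produce a different effective address, so a base register is needed again.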
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/kernel/head_8xx.S | 58 ++++++++++++++++++++----------------------
1 file changed, 28 insertions(+), 30 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3b67b9533c82..c203defe49a4 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -106,6 +106,23 @@ turn_on_mmu:
mtspr SPRN_SRR0,r0
rfi /* enables MMU */
+
+#ifdef CONFIG_PERF_EVENTS
+ .align 4
+
+ .globl itlb_miss_counter
+itlb_miss_counter:
+ .space 4
+
+ .globl dtlb_miss_counter
+dtlb_miss_counter:
+ .space 4
+
+ .globl instruction_counter
+instruction_counter:
+ .space 4
+#endif
+
/*
* Exception entry code. This code runs with address translation
* turned off, i.e. using physical addresses.
@@ -384,17 +401,16 @@ InstructionTLBMiss:
#ifdef CONFIG_PERF_EVENTS
patch_site 0f, patch__itlbmiss_perf
-0: lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha
- lwz r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
- addi r11, r11, 1
- stw r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
+ addi r10, r10, 1
+ stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
mfspr r12, SPRN_SPRG_SCRATCH2
#endif
rfi
+#endif
#ifdef CONFIG_HUGETLB_PAGE
10: /* 8M pages */
@@ -509,15 +525,14 @@ DataStoreTLBMiss:
#ifdef CONFIG_PERF_EVENTS
patch_site 0f, patch__dtlbmiss_perf
-0: lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha
- lwz r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
- addi r11, r11, 1
- stw r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+ addi r10, r10, 1
+ stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
mfspr r12, SPRN_SPRG_SCRATCH2
rfi
+#endif
#ifdef CONFIG_HUGETLB_PAGE
10: /* 8M pages */
@@ -625,16 +640,13 @@ DataBreakpoint:
. = 0x1d00
InstructionBreakpoint:
mtspr SPRN_SPRG_SCRATCH0, r10
- mtspr SPRN_SPRG_SCRATCH1, r11
- lis r10, (instruction_counter - PAGE_OFFSET)@ha
- lwz r11, (instruction_counter - PAGE_OFFSET)@l(r10)
- addi r11, r11, -1
- stw r11, (instruction_counter - PAGE_OFFSET)@l(r10)
+ lwz r10, (instruction_counter - PAGE_OFFSET)@l(0)
+ addi r10, r10, -1
+ stw r10, (instruction_counter - PAGE_OFFSET)@l(0)
lis r10, 0xffff
ori r10, r10, 0x01
mtspr SPRN_COUNTA, r10
mfspr r10, SPRN_SPRG_SCRATCH0
- mfspr r11, SPRN_SPRG_SCRATCH1
rfi
#else
EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE)
@@ -1065,17 +1077,3 @@ swapper_pg_dir:
*/
abatron_pteptrs:
.space 8
-
-#ifdef CONFIG_PERF_EVENTS
- .globl itlb_miss_counter
-itlb_miss_counter:
- .space 4
-
- .globl dtlb_miss_counter
-dtlb_miss_counter:
- .space 4
-
- .globl instruction_counter
-instruction_counter:
- .space 4
-#endif
--
2.13.3
* [PATCH v4 06/20] powerpc/8xx: Temporarily disable 16k pages and 512k hugepages
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (4 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 05/20] powerpc/8xx: Move SW perf counters in first 32kb of memory Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 07/20] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx Christophe Leroy
` (13 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
In preparation for making use of hardware assistance in TLB handlers,
this patch temporarily disables 16K pages and 512K hugepages. The reason
is that when using HW assistance in 4K pages mode, the Linux model
fits the HW model for 4K pages and 8M pages.
However, for 16K pages and 512K hugepages some additional work is needed
to make the Linux model fit the HW model.
Therefore the 4K pages mode will be implemented first, without
support for 512K hugepages. Then the 512K hugepages will be brought
back, and the 16K pages will be implemented in further steps.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/Kconfig | 2 +-
arch/powerpc/kernel/head_8xx.S | 36 ------------------------------------
arch/powerpc/mm/tlb_nohash.c | 3 ---
3 files changed, 1 insertion(+), 40 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a80669209155..33931804c46f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -698,7 +698,7 @@ config PPC_4K_PAGES
config PPC_16K_PAGES
bool "16k page size"
- depends on 44x || PPC_8xx
+ depends on 44x
config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c203defe49a4..9b31721b522c 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -363,7 +363,6 @@ InstructionTLBMiss:
#ifdef CONFIG_HUGETLB_PAGE
mtcr r11
bt- 28, 10f /* bit 28 = Large page (8M) */
- bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
#endif
rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
lwz r10, 0(r10) /* Get the pte */
@@ -414,23 +413,8 @@ InstructionTLBMiss:
#ifdef CONFIG_HUGETLB_PAGE
10: /* 8M pages */
-#ifdef CONFIG_PPC_16K_PAGES
- /* Extract level 2 index */
- rlwinm r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
- /* Add level 2 base */
- rlwimi r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
/* Level 2 base */
rlwinm r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
- lwz r10, 0(r10) /* Get the pte */
- b 4b
-
-20: /* 512k pages */
- /* Extract level 2 index */
- rlwinm r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
- /* Add level 2 base */
- rlwimi r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
lwz r10, 0(r10) /* Get the pte */
b 4b
#endif
@@ -475,7 +459,6 @@ DataStoreTLBMiss:
#ifdef CONFIG_HUGETLB_PAGE
mtcr r11
bt- 28, 10f /* bit 28 = Large page (8M) */
- bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
#endif
rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
lwz r10, 0(r10) /* Get the pte */
@@ -537,22 +520,8 @@ DataStoreTLBMiss:
#ifdef CONFIG_HUGETLB_PAGE
10: /* 8M pages */
/* Extract level 2 index */
-#ifdef CONFIG_PPC_16K_PAGES
- rlwinm r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
- /* Add level 2 base */
- rlwimi r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
/* Level 2 base */
rlwinm r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
- lwz r10, 0(r10) /* Get the pte */
- b 4b
-
-20: /* 512k pages */
- /* Extract level 2 index */
- rlwinm r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
- /* Add level 2 base */
- rlwimi r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
lwz r10, 0(r10) /* Get the pte */
b 4b
#endif
@@ -773,12 +742,7 @@ FixupDAR:/* Entry point for dcbx workaround. */
/* concat physical page address(r11) and page offset(r10) */
200:
-#ifdef CONFIG_PPC_16K_PAGES
- rlwinm r11, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
- rlwimi r11, r10, 32 - (PAGE_SHIFT_8M - 2), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
-#else
rlwinm r11, r10, 0, ~HUGEPD_SHIFT_MASK
-#endif
lwz r11, 0(r11) /* Get the pte */
/* concat physical page address(r11) and page offset(r10) */
rlwimi r11, r10, 0, 32 - PAGE_SHIFT_8M, 31
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 15fe5f0c8665..49441963d285 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,9 +97,6 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift = 14,
},
#endif
- [MMU_PAGE_512K] = {
- .shift = 19,
- },
[MMU_PAGE_8M] = {
.shift = 23,
},
--
2.13.3
* [PATCH v4 07/20] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (5 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 06/20] powerpc/8xx: Temporarily disable 16k pages and 512k hugepages Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 08/20] powerpc/mm: Enable 512k hugepage support with HW assistance " Christophe Leroy
` (12 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
Today, on the 8xx the TLB handlers do a SW tablewalk by doing all
the calculations in ASM, in order to match the Linux page
table structure.
The 8xx offers hardware assistance which allows a significant size
reduction of the TLB handlers, and hence also reduces the time spent
in the handlers.
However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4 Mbyte area which is managed by a level 2 table (PTE) also
containing 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table.
- 512k page PTEs have to be spread every 128 bytes in the L2 table.
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.
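As an illustration only, the layout constraints listed above can be
sketched in C. This is not code from the patch: the function and macro
names are hypothetical, and the sketch assumes a 32-bit effective
address split as 10 bits of L1 index, 10 bits of L2 index and a 12-bit
page offset, which is what 1024 entries per table and 4k per L2 entry
imply.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants for the layout described in the list above. */
#define L2_ENTRIES     1024u  /* entries in each level 2 table */
#define L2_PAGE_SHIFT  12u    /* each L2 entry describes a 4k page */
#define L1_SHIFT       22u    /* each L1 (PGD) entry covers 4 Mbytes */

/* Index of the L1 (PGD) entry covering effective address ea. */
uint32_t l1_index(uint32_t ea)
{
	return ea >> L1_SHIFT;
}

/* Index of the L2 entry for ea inside its 4 Mbyte area. */
uint32_t l2_index(uint32_t ea)
{
	return (ea >> L2_PAGE_SHIFT) & (L2_ENTRIES - 1);
}

/* Number of 4k L2 slots spanned by one page smaller than 4 Mbytes. */
uint32_t l2_slots_per_page(uint32_t page_shift)
{
	assert(page_shift >= L2_PAGE_SHIFT && page_shift < L1_SHIFT);
	return 1u << (page_shift - L2_PAGE_SHIFT);
}

/* Number of L1 entries spanned by one page of 4 Mbytes or more. */
uint32_t l1_entries_per_page(uint32_t page_shift)
{
	assert(page_shift >= L1_SHIFT && page_shift < 32u);
	return 1u << (page_shift - L1_SHIFT);
}
```

With these helpers, a 16k page (shift 14) spans 4 L2 slots and an 8M
page (shift 23) spans 2 L1 entries, matching the counts given in the
list.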
This patch modifies the TLB handlers to use HW assistance for 4K PAGES.
Before this patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After this patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB misses and 13% for DTLB misses.
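For reference, the percentages follow from the tick counts as
(before - after) / before. A small C helper, with a hypothetical name
and rounding to the nearest integer as an assumption, reproduces them:

```c
#include <assert.h>

/* Relative reduction from before to after, in percent, rounded to nearest. */
unsigned int reduction_percent(unsigned int before, unsigned int after)
{
	assert(before > 0 && after <= before);
	return ((before - after) * 100u + before / 2u) / before;
}
```

Here reduction_percent(80, 72) gives 10 and reduction_percent(62, 54)
gives 13 (the unrounded value is about 12.9).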
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/kernel/head_8xx.S | 97 +++++++++++++-----------------------------
arch/powerpc/mm/8xx_mmu.c | 4 +-
2 files changed, 32 insertions(+), 69 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9b31721b522c..50e97027b507 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -292,7 +292,7 @@ SystemCall:
. = 0x1100
/*
* For the MPC8xx, this is a software tablewalk to load the instruction
- * TLB. The task switch loads the M_TW register with the pointer to the first
+ * TLB. The task switch loads the M_TWB register with the pointer to the first
* level table.
* If we discover there is no second level table (value is zero) or if there
* is an invalid pte, we load that into the TLB, which causes another fault
@@ -314,7 +314,7 @@ SystemCall:
InstructionTLBMiss:
mtspr SPRN_SPRG_SCRATCH0, r10
mtspr SPRN_SPRG_SCRATCH1, r11
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtspr SPRN_SPRG_SCRATCH2, r12
#endif
@@ -323,12 +323,11 @@ InstructionTLBMiss:
*/
mfspr r10, SPRN_SRR0 /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+ mtspr SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
* pin the first 8MB of kernel memory */
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
- mfcr r12
-#endif
#ifdef ITLB_MISS_KERNEL
+ mfcr r12
#if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
andis. r11, r10, 0x8000 /* Address >= 0x80000000 */
#else
@@ -341,7 +340,7 @@ InstructionTLBMiss:
#endif
#endif
#endif
- mfspr r11, SPRN_M_TW /* Get level 1 table */
+ mfspr r11, SPRN_M_TWB /* Get level 1 table */
#ifdef ITLB_MISS_KERNEL
#if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
beq+ 3f
@@ -351,23 +350,17 @@ InstructionTLBMiss:
#ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
#endif
- lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+ rlwinm r11, r11, 0, 20, 31
+ oris r11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
3:
#endif
- /* Insert level 1 index */
- rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */
- /* Extract level 2 index */
- rlwinm r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
- mtcr r11
- bt- 28, 10f /* bit 28 = Large page (8M) */
-#endif
- rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+ mtspr SPRN_MD_TWC, r11
+ mfspr r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-4:
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+
+#ifdef ITLB_MISS_KERNEL
mtcr r12
#endif
/* Load the MI_TWC with the attributes for this "segment." */
@@ -392,7 +385,7 @@ InstructionTLBMiss:
/* Restore registers */
0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr r12, SPRN_SPRG_SCRATCH2
#endif
rfi
@@ -405,20 +398,12 @@ InstructionTLBMiss:
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr r12, SPRN_SPRG_SCRATCH2
#endif
rfi
#endif
-#ifdef CONFIG_HUGETLB_PAGE
-10: /* 8M pages */
- /* Level 2 base */
- rlwinm r10, r11, 0, ~HUGEPD_SHIFT_MASK
- lwz r10, 0(r10) /* Get the pte */
- b 4b
-#endif
-
. = 0x1200
DataStoreTLBMiss:
mtspr SPRN_SPRG_SCRATCH0, r10
@@ -432,7 +417,7 @@ DataStoreTLBMiss:
mfspr r10, SPRN_MD_EPN
rlwinm r11, r10, 16, 0xfff8
cmpli cr0, r11, PAGE_OFFSET@h
- mfspr r11, SPRN_M_TW /* Get level 1 table */
+ mfspr r11, SPRN_M_TWB /* Get level 1 table */
blt+ 3f
rlwinm r11, r10, 16, 0xfff8
#ifndef CONFIG_PIN_TLB_IMMR
@@ -445,24 +430,16 @@ DataStoreTLBMiss:
patch_site 0b, patch__dtlbmiss_immr_jmp
#endif
blt cr7, DTLBMissLinear
- lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+ mfspr r11, SPRN_M_TWB /* Get level 1 table */
+ rlwinm r11, r11, 0, 20, 31
+ oris r11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
3:
-
- /* Insert level 1 index */
- rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */
- /* We have a pte table, so load fetch the pte from the table.
- */
- /* Extract level 2 index */
- rlwinm r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
- mtcr r11
- bt- 28, 10f /* bit 28 = Large page (8M) */
-#endif
- rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+ mtspr SPRN_MD_TWC, r11
+ mfspr r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-4:
+
mtcr r12
/* Insert the Guarded flag into the TWC from the Linux PTE.
@@ -517,15 +494,6 @@ DataStoreTLBMiss:
rfi
#endif
-#ifdef CONFIG_HUGETLB_PAGE
-10: /* 8M pages */
- /* Extract level 2 index */
- /* Level 2 base */
- rlwinm r10, r11, 0, ~HUGEPD_SHIFT_MASK
- lwz r10, 0(r10) /* Get the pte */
- b 4b
-#endif
-
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
@@ -696,9 +664,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
mtspr SPRN_SPRG_SCRATCH2, r10
/* fetch instruction from memory. */
mfspr r10, SPRN_SRR0
+ mtspr SPRN_MD_EPN, r10
rlwinm r11, r10, 16, 0xfff8
cmpli cr0, r11, PAGE_OFFSET@h
- mfspr r11, SPRN_M_TW /* Get level 1 table */
+ mfspr r11, SPRN_M_TWB /* Get level 1 table */
blt+ 3f
rlwinm r11, r10, 16, 0xfff8
@@ -708,17 +677,17 @@ FixupDAR:/* Entry point for dcbx workaround. */
/* create physical page address from effective address */
tophys(r11, r10)
blt- cr7, 201f
- lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
- /* Insert level 1 index */
-3: rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
+ mfspr r11, SPRN_M_TWB /* Get level 1 table */
+ rlwinm r11, r11, 0, 20, 31
+ oris r11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
+3:
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */
+ mtspr SPRN_MD_TWC, r11
mtcr r11
+ mfspr r11, SPRN_MD_TWC
+ lwz r11, 0(r11) /* Get the pte */
bt 28,200f /* bit 28 = Large page (8M) */
bt 29,202f /* bit 29 = Large page (8M or 512K) */
- rlwinm r11, r11,0,0,19 /* Extract page descriptor page address */
- /* Insert level 2 index */
- rlwimi r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
- lwz r11, 0(r11) /* Get the pte */
/* concat physical page address(r11) and page offset(r10) */
rlwimi r11, r10, 0, 32 - PAGE_SHIFT, 31
201: lwz r11,0(r11)
@@ -740,18 +709,12 @@ FixupDAR:/* Entry point for dcbx workaround. */
141: mfspr r10,SPRN_SPRG_SCRATCH2
b DARFixed /* Nope, go back to normal TLB processing */
- /* concat physical page address(r11) and page offset(r10) */
200:
- rlwinm r11, r10, 0, ~HUGEPD_SHIFT_MASK
- lwz r11, 0(r11) /* Get the pte */
/* concat physical page address(r11) and page offset(r10) */
rlwimi r11, r10, 0, 32 - PAGE_SHIFT_8M, 31
b 201b
202:
- rlwinm r11, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
- rlwimi r11, r10, 32 - (PAGE_SHIFT_512K - 2), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
- lwz r11, 0(r11) /* Get the pte */
/* concat physical page address(r11) and page offset(r10) */
rlwimi r11, r10, 0, 32 - PAGE_SHIFT_512K, 31
b 201b
@@ -867,7 +830,7 @@ start_here:
lis r6, swapper_pg_dir@ha
tophys(r6,r6)
- mtspr SPRN_M_TW, r6
+ mtspr SPRN_M_TWB, r6
bl early_init /* We have to do this with MMU on */
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index d39f3af03221..896e710e5697 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -174,12 +174,12 @@ void set_context(unsigned long id, pgd_t *pgd)
*(ptr + 1) = pgd;
#endif
- /* Register M_TW will contain base address of level 1 table minus the
+ /* Register M_TWB will contain base address of level 1 table minus the
* lower part of the kernel PGDIR base address, so that all accesses to
* level 1 table are done relative to lower part of kernel PGDIR base
* address.
*/
- mtspr(SPRN_M_TW, __pa(pgd) - offset);
+ mtspr(SPRN_M_TWB, __pa(pgd) - offset);
/* Update context */
mtspr(SPRN_M_CASID, id - 1);
--
2.13.3
* [PATCH v4 08/20] powerpc/mm: Enable 512k hugepage support with HW assistance on the 8xx
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (6 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 07/20] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 09/20] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers Christophe Leroy
` (11 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
To use 512k pages with hardware assistance, the PTEs have to be spread
every 128 bytes in the L2 table.
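The index computation that the diff adds to hugepte_offset() for the
8xx can be tried in isolation with the sketch below. It is only an
illustration: the function names are hypothetical, and a 4k base page
(shift 12) is assumed. The sketch counts table entries, not bytes.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4k base pages */

/* 8xx variant: index by base page, so huge page PTEs are spread out. */
unsigned long hugepte_index_8xx(unsigned long addr, unsigned int pdshift)
{
	assert(pdshift >= SKETCH_PAGE_SHIFT && pdshift < 32u);
	return (addr & ((1UL << pdshift) - 1)) >> SKETCH_PAGE_SHIFT;
}

/* Generic variant: index by huge page, so PTEs are packed. */
unsigned long hugepte_index_generic(unsigned long addr, unsigned int pdshift,
				    unsigned int hugepd_shift)
{
	assert(hugepd_shift <= pdshift && pdshift < 32u);
	return (addr & ((1UL << pdshift) - 1)) >> hugepd_shift;
}
```

With pdshift = 22, the second 512k page of a 4 Mbyte area (offset
0x80000) lands at index 128 in the 8xx variant and at index 1 in the
generic variant.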
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/hugetlb.h | 4 +++-
arch/powerpc/mm/hugetlbpage.c | 13 +++++++++++++
arch/powerpc/mm/tlb_nohash.c | 3 +++
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index e13843556414..b22f164216ad 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -75,7 +75,9 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
unsigned long idx = 0;
pte_t *dir = hugepd_page(hpd);
-#ifndef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_8xx
+ idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+#elif !defined(CONFIG_PPC_FSL_BOOK3E)
idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
#endif
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 16846649499b..527ea2451cc2 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -66,7 +66,11 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
} else {
+#ifdef CONFIG_PPC_8xx
+ cachep = PGT_CACHE(PTE_SHIFT);
+#else
cachep = PGT_CACHE(pdshift - pshift);
+#endif
num_hugepd = 1;
}
@@ -330,8 +334,13 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
if (shift >= pdshift)
hugepd_free(tlb, hugepte);
else
+#ifdef CONFIG_PPC_8xx
+ pgtable_free_tlb(tlb, hugepte,
+ get_hugepd_cache_index(PTE_SHIFT));
+#else
pgtable_free_tlb(tlb, hugepte,
get_hugepd_cache_index(pdshift - shift));
+#endif
}
static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -699,7 +708,11 @@ static int __init hugetlbpage_init(void)
* use pgt cache for hugepd.
*/
if (pdshift > shift)
+#ifdef CONFIG_PPC_8xx
+ pgtable_cache_add(PTE_SHIFT);
+#else
pgtable_cache_add(pdshift - shift);
+#endif
#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
else
pgtable_cache_add(PTE_T_ORDER);
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 49441963d285..15fe5f0c8665 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift = 14,
},
#endif
+ [MMU_PAGE_512K] = {
+ .shift = 19,
+ },
[MMU_PAGE_8M] = {
.shift = 23,
},
--
2.13.3
* [PATCH v4 09/20] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (7 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 08/20] powerpc/mm: Enable 512k hugepage support with HW assistance " Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 10/20] powerpc/8xx: regroup TLB handler routines Christophe Leroy
` (10 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
This patch reworks the TLB Miss handlers so that they no longer use
the r12 register, hence avoiding having to save it into SPRN_SPRG_SCRATCH2.
In the DAR Fixup code we can now use SPRN_M_TW, freeing
SPRN_SPRG_SCRATCH2.
SPRN_SPRG_SCRATCH2 may then be used for something else in the future.
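One of the ways the diff saves a register is the reworked
INVALIDATE_ADJACENT_PAGES_CPU15 macro, which adjusts the address in
place instead of using a temporary. The C sketch below only models
that arithmetic: fake_tlbie() is a hypothetical stand-in for the tlbie
instruction, and the 4k page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4k pages */

/* Hypothetical stand-in for tlbie: records the addresses it was given. */
uint32_t invalidated[2];
unsigned int n_invalidated;

void fake_tlbie(uint32_t addr)
{
	assert(n_invalidated < 2u);
	invalidated[n_invalidated++] = addr;
}

/* Invalidate the pages on both sides of addr without a scratch variable. */
uint32_t invalidate_adjacent_pages(uint32_t addr)
{
	n_invalidated = 0;
	addr += SKETCH_PAGE_SIZE;       /* next page */
	fake_tlbie(addr);
	addr -= SKETCH_PAGE_SIZE << 1;  /* previous page */
	fake_tlbie(addr);
	addr += SKETCH_PAGE_SIZE;       /* back to the original value */
	return addr;
}
```

The cost is one extra add compared with the two-register version; the
returned value equals the input, so the caller can keep using it.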
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/kernel/head_8xx.S | 110 ++++++++++++++++++-----------------------
1 file changed, 49 insertions(+), 61 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 50e97027b507..d69c6e3d5cc1 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -302,91 +302,88 @@ SystemCall:
*/
#ifdef CONFIG_8xx_CPU15
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr) \
- addi tmp, addr, PAGE_SIZE; \
- tlbie tmp; \
- addi tmp, addr, -PAGE_SIZE; \
- tlbie tmp
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr) \
+ addi addr, addr, PAGE_SIZE; \
+ tlbie addr; \
+ addi addr, addr, -(PAGE_SIZE << 1); \
+ tlbie addr; \
+ addi addr, addr, PAGE_SIZE
#else
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)
#endif
InstructionTLBMiss:
mtspr SPRN_SPRG_SCRATCH0, r10
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mtspr SPRN_SPRG_SCRATCH1, r11
-#ifdef ITLB_MISS_KERNEL
- mtspr SPRN_SPRG_SCRATCH2, r12
#endif
/* If we are faulting a kernel address, we have to use the
* kernel page tables.
*/
mfspr r10, SPRN_SRR0 /* Get effective address of fault */
- INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+ INVALIDATE_ADJACENT_PAGES_CPU15(r10)
mtspr SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
* pin the first 8MB of kernel memory */
#ifdef ITLB_MISS_KERNEL
- mfcr r12
+ mfcr r11
#if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
- andis. r11, r10, 0x8000 /* Address >= 0x80000000 */
+ cmpi cr0, r10, 0 /* Address >= 0x80000000 */
#else
- rlwinm r11, r10, 16, 0xfff8
- cmpli cr0, r11, PAGE_OFFSET@h
+ rlwinm r10, r10, 16, 0xfff8
+ cmpli cr0, r10, PAGE_OFFSET@h
#ifndef CONFIG_PIN_TLB_TEXT
/* It is assumed that kernel code fits into the first 8M page */
-0: cmpli cr7, r11, (PAGE_OFFSET + 0x0800000)@h
+0: cmpli cr7, r10, (PAGE_OFFSET + 0x0800000)@h
patch_site 0b, patch__itlbmiss_linmem_top
#endif
#endif
#endif
- mfspr r11, SPRN_M_TWB /* Get level 1 table */
+ mfspr r10, SPRN_M_TWB /* Get level 1 table */
#ifdef ITLB_MISS_KERNEL
#if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
- beq+ 3f
+ bge+ 3f
#else
blt+ 3f
#endif
#ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
#endif
- rlwinm r11, r11, 0, 20, 31
- oris r11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
+ rlwinm r10, r10, 0, 20, 31
+ oris r10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
3:
#endif
- lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */
+ lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10) /* Get level 1 entry */
+ mtspr SPRN_MI_TWC, r10 /* Set segment attributes */
- mtspr SPRN_MD_TWC, r11
+ mtspr SPRN_MD_TWC, r10
mfspr r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
#ifdef ITLB_MISS_KERNEL
- mtcr r12
+ mtcr r11
#endif
- /* Load the MI_TWC with the attributes for this "segment." */
- mtspr SPRN_MI_TWC, r11 /* Set segment attributes */
-
#ifdef CONFIG_SWAP
rlwinm r11, r10, 32-5, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
#endif
- li r11, RPN_PATTERN | 0x200
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 20 and 23 must be clear.
* Software indicator bits 22, 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
- rlwimi r11, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
- rlwimi r10, r11, 0, 0x0ff0 /* Set 22, 24-27, clear 20,23 */
+ rlwimi r10, r10, 0, 0x0f00 /* Clear bits 20-23 */
+ rlwimi r10, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
+ ori r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
/* Restore registers */
0: mfspr r10, SPRN_SPRG_SCRATCH0
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mfspr r11, SPRN_SPRG_SCRATCH1
-#ifdef ITLB_MISS_KERNEL
- mfspr r12, SPRN_SPRG_SCRATCH2
#endif
rfi
patch_site 0b, patch__itlbmiss_exit_1
@@ -397,9 +394,8 @@ InstructionTLBMiss:
addi r10, r10, 1
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr r10, SPRN_SPRG_SCRATCH0
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mfspr r11, SPRN_SPRG_SCRATCH1
-#ifdef ITLB_MISS_KERNEL
- mfspr r12, SPRN_SPRG_SCRATCH2
#endif
rfi
#endif
@@ -408,40 +404,37 @@ InstructionTLBMiss:
DataStoreTLBMiss:
mtspr SPRN_SPRG_SCRATCH0, r10
mtspr SPRN_SPRG_SCRATCH1, r11
- mtspr SPRN_SPRG_SCRATCH2, r12
- mfcr r12
+ mfcr r11
/* If we are faulting a kernel address, we have to use the
* kernel page tables.
*/
mfspr r10, SPRN_MD_EPN
- rlwinm r11, r10, 16, 0xfff8
- cmpli cr0, r11, PAGE_OFFSET@h
- mfspr r11, SPRN_M_TWB /* Get level 1 table */
- blt+ 3f
- rlwinm r11, r10, 16, 0xfff8
+ rlwinm r10, r10, 16, 0xfff8
+ cmpli cr0, r10, PAGE_OFFSET@h
#ifndef CONFIG_PIN_TLB_IMMR
- cmpli cr0, r11, VIRT_IMMR_BASE@h
+ cmpli cr6, r10, VIRT_IMMR_BASE@h
#endif
-0: cmpli cr7, r11, (PAGE_OFFSET + 0x1800000)@h
+0: cmpli cr7, r10, (PAGE_OFFSET + 0x1800000)@h
patch_site 0b, patch__dtlbmiss_linmem_top
+
+ mfspr r10, SPRN_M_TWB /* Get level 1 table */
+ blt+ 3f
#ifndef CONFIG_PIN_TLB_IMMR
-0: beq- DTLBMissIMMR
+0: beq- cr6, DTLBMissIMMR
patch_site 0b, patch__dtlbmiss_immr_jmp
#endif
blt cr7, DTLBMissLinear
- mfspr r11, SPRN_M_TWB /* Get level 1 table */
- rlwinm r11, r11, 0, 20, 31
- oris r11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
+ rlwinm r10, r10, 0, 20, 31
+ oris r10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
3:
- lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */
+ mtcr r11
+ lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10) /* Get level 1 entry */
mtspr SPRN_MD_TWC, r11
mfspr r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
- mtcr r12
-
/* Insert the Guarded flag into the TWC from the Linux PTE.
* It is bit 27 of both the Linux PTE and the TWC (at least
* I got that right :-). It will be better when we can put
@@ -479,7 +472,6 @@ DataStoreTLBMiss:
0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
- mfspr r12, SPRN_SPRG_SCRATCH2
rfi
patch_site 0b, patch__dtlbmiss_exit_1
@@ -490,7 +482,6 @@ DataStoreTLBMiss:
stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
- mfspr r12, SPRN_SPRG_SCRATCH2
rfi
#endif
@@ -598,7 +589,7 @@ InstructionBreakpoint:
* not enough space in the DataStoreTLBMiss area.
*/
DTLBMissIMMR:
- mtcr r12
+ mtcr r11
/* Set 512k byte guarded page and mark it valid */
li r10, MD_PS512K | MD_GUARDED | MD_SVALID
mtspr SPRN_MD_TWC, r10
@@ -613,16 +604,15 @@ DTLBMissIMMR:
0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
- mfspr r12, SPRN_SPRG_SCRATCH2
rfi
patch_site 0b, patch__dtlbmiss_exit_2
DTLBMissLinear:
- mtcr r12
+ mtcr r11
/* Set 8M byte page and mark it valid */
li r11, MD_PS8MEG | MD_SVALID
mtspr SPRN_MD_TWC, r11
- rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */
+ rlwinm r10, r10, 20, 0x0f800000 /* 8xx supports max 256Mb RAM */
ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
_PAGE_PRESENT
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
@@ -632,24 +622,22 @@ DTLBMissLinear:
0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
- mfspr r12, SPRN_SPRG_SCRATCH2
rfi
patch_site 0b, patch__dtlbmiss_exit_3
#ifndef CONFIG_PIN_TLB_TEXT
ITLBMissLinear:
- mtcr r12
+ mtcr r11
/* Set 8M byte page and mark it valid */
li r11, MI_PS8MEG | MI_SVALID
mtspr SPRN_MI_TWC, r11
- rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */
+ rlwinm r10, r10, 20, 0x0f800000 /* 8xx supports max 256Mb RAM */
ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
_PAGE_PRESENT
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
0: mfspr r10, SPRN_SPRG_SCRATCH0
mfspr r11, SPRN_SPRG_SCRATCH1
- mfspr r12, SPRN_SPRG_SCRATCH2
rfi
patch_site 0b, patch__itlbmiss_exit_2
#endif
@@ -661,7 +649,7 @@ ITLBMissLinear:
/* define if you don't want to use self modifying code */
#define NO_SELF_MODIFYING_CODE
FixupDAR:/* Entry point for dcbx workaround. */
- mtspr SPRN_SPRG_SCRATCH2, r10
+ mtspr SPRN_M_TW, r10
/* fetch instruction from memory. */
mfspr r10, SPRN_SRR0
mtspr SPRN_MD_EPN, r10
@@ -706,7 +694,7 @@ FixupDAR:/* Entry point for dcbx workaround. */
beq+ 142f
cmpwi cr0, r10, 1964 /* Is icbi? */
beq+ 142f
-141: mfspr r10,SPRN_SPRG_SCRATCH2
+141: mfspr r10,SPRN_M_TW
b DARFixed /* Nope, go back to normal TLB processing */
200:
@@ -741,7 +729,7 @@ modified_instr:
bne+ 143f
subf r10,r0,r10 /* r10=r10-r0, only if reg RA is r0 */
143: mtdar r10 /* store faulting EA in DAR */
- mfspr r10,SPRN_SPRG_SCRATCH2
+ mfspr r10,SPRN_M_TW
b DARFixed /* Go back to normal TLB handling */
#else
mfctr r10
@@ -795,7 +783,7 @@ modified_instr:
mfdar r11
mtctr r11 /* restore ctr reg from DAR */
mtdar r10 /* save fault EA to DAR */
- mfspr r10,SPRN_SPRG_SCRATCH2
+ mfspr r10,SPRN_M_TW
b DARFixed /* Go back to normal TLB handling */
/* special handling for r10,r11 since these are modified already */
--
2.13.3
* [PATCH v4 10/20] powerpc/8xx: regroup TLB handler routines
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (8 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 09/20] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 11/20] powerpc/mm: don't use pte_alloc_one_kernel() before slab is available Christophe Leroy
` (9 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
As this code runs with the MMU off, the CPU only does speculative
fetch for code in the same page.
Following the significant size reduction of the TLB handler routines,
the side handlers can be brought back close to the main part,
i.e. into the same page.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/kernel/head_8xx.S | 112 ++++++++++++++++++++---------------------
1 file changed, 54 insertions(+), 58 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index d69c6e3d5cc1..3e38af7489a9 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -400,6 +400,23 @@ InstructionTLBMiss:
rfi
#endif
+#ifndef CONFIG_PIN_TLB_TEXT
+ITLBMissLinear:
+ mtcr r11
+ /* Set 8M byte page and mark it valid */
+ li r11, MI_PS8MEG | MI_SVALID
+ mtspr SPRN_MI_TWC, r11
+ rlwinm r10, r10, 20, 0x0f800000 /* 8xx supports max 256Mb RAM */
+ ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+ mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
+
+0: mfspr r10, SPRN_SPRG_SCRATCH0
+ mfspr r11, SPRN_SPRG_SCRATCH1
+ rfi
+ patch_site 0b, patch__itlbmiss_exit_2
+#endif
+
. = 0x1200
DataStoreTLBMiss:
mtspr SPRN_SPRG_SCRATCH0, r10
@@ -485,6 +502,43 @@ DataStoreTLBMiss:
rfi
#endif
+DTLBMissIMMR:
+ mtcr r11
+ /* Set 512k byte guarded page and mark it valid */
+ li r10, MD_PS512K | MD_GUARDED | MD_SVALID
+ mtspr SPRN_MD_TWC, r10
+ mfspr r10, SPRN_IMMR /* Get current IMMR */
+ rlwinm r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
+ ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT | _PAGE_NO_CACHE
+ mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+
+ li r11, RPN_PATTERN
+ mtspr SPRN_DAR, r11 /* Tag DAR */
+
+0: mfspr r10, SPRN_SPRG_SCRATCH0
+ mfspr r11, SPRN_SPRG_SCRATCH1
+ rfi
+ patch_site 0b, patch__dtlbmiss_exit_2
+
+DTLBMissLinear:
+ mtcr r11
+ /* Set 8M byte page and mark it valid */
+ li r11, MD_PS8MEG | MD_SVALID
+ mtspr SPRN_MD_TWC, r11
+ rlwinm r10, r10, 20, 0x0f800000 /* 8xx supports max 256Mb RAM */
+ ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+ mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+
+ li r11, RPN_PATTERN
+ mtspr SPRN_DAR, r11 /* Tag DAR */
+
+0: mfspr r10, SPRN_SPRG_SCRATCH0
+ mfspr r11, SPRN_SPRG_SCRATCH1
+ rfi
+ patch_site 0b, patch__dtlbmiss_exit_3
+
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
@@ -584,64 +638,6 @@ InstructionBreakpoint:
. = 0x2000
-/*
- * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM.
- * not enough space in the DataStoreTLBMiss area.
- */
-DTLBMissIMMR:
- mtcr r11
- /* Set 512k byte guarded page and mark it valid */
- li r10, MD_PS512K | MD_GUARDED | MD_SVALID
- mtspr SPRN_MD_TWC, r10
- mfspr r10, SPRN_IMMR /* Get current IMMR */
- rlwinm r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
- ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT | _PAGE_NO_CACHE
- mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
-
- li r11, RPN_PATTERN
- mtspr SPRN_DAR, r11 /* Tag DAR */
-
-0: mfspr r10, SPRN_SPRG_SCRATCH0
- mfspr r11, SPRN_SPRG_SCRATCH1
- rfi
- patch_site 0b, patch__dtlbmiss_exit_2
-
-DTLBMissLinear:
- mtcr r11
- /* Set 8M byte page and mark it valid */
- li r11, MD_PS8MEG | MD_SVALID
- mtspr SPRN_MD_TWC, r11
- rlwinm r10, r10, 20, 0x0f800000 /* 8xx supports max 256Mb RAM */
- ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
- mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
-
- li r11, RPN_PATTERN
- mtspr SPRN_DAR, r11 /* Tag DAR */
-
-0: mfspr r10, SPRN_SPRG_SCRATCH0
- mfspr r11, SPRN_SPRG_SCRATCH1
- rfi
- patch_site 0b, patch__dtlbmiss_exit_3
-
-#ifndef CONFIG_PIN_TLB_TEXT
-ITLBMissLinear:
- mtcr r11
- /* Set 8M byte page and mark it valid */
- li r11, MI_PS8MEG | MI_SVALID
- mtspr SPRN_MI_TWC, r11
- rlwinm r10, r10, 20, 0x0f800000 /* 8xx supports max 256Mb RAM */
- ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
- mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
-
-0: mfspr r10, SPRN_SPRG_SCRATCH0
- mfspr r11, SPRN_SPRG_SCRATCH1
- rfi
- patch_site 0b, patch__itlbmiss_exit_2
-#endif
-
/* This is the procedure to calculate the data EA for buggy dcbx,dcbi instructions
* by decoding the registers used by the dcbx instruction and adding them.
* DAR is set to the calculated address.
--
2.13.3
* [PATCH v4 11/20] powerpc/mm: don't use pte_alloc_one_kernel() before slab is available
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (9 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 10/20] powerpc/8xx: regroup TLB handler routines Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 12/20] powerpc/mm: inline pte_alloc_one() and pte_alloc_one_kernel() in PPC32 Christophe Leroy
` (8 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
In the same way as PPC64, let's handle pte allocation directly
in map_kernel_page() when slab is not available.
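To illustrate the split (a userspace sketch only, not part of the patch): the
allocation source depends on whether slab is up. All model_* names are
hypothetical; malloc()/calloc() merely stand in for memblock_alloc() and
__get_free_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for slab_is_available(). */
static bool model_slab_ready;

enum model_alloc_source { MODEL_FROM_MEMBLOCK, MODEL_FROM_PAGE_ALLOCATOR };

struct model_pte_page {
	enum model_alloc_source source;
	unsigned char *mem;
};

/* Pick the allocator the same way map_kernel_page() does after this patch. */
static struct model_pte_page model_alloc_pte_page(void)
{
	struct model_pte_page p;

	if (model_slab_ready) {
		/* Late boot: models __get_free_page(GFP_KERNEL | __GFP_ZERO). */
		p.source = MODEL_FROM_PAGE_ALLOCATOR;
		p.mem = calloc(1, MODEL_PAGE_SIZE);
	} else {
		/* Early boot: models memblock_alloc() followed by clear_page(). */
		p.source = MODEL_FROM_MEMBLOCK;
		p.mem = malloc(MODEL_PAGE_SIZE);
		if (p.mem)
			memset(p.mem, 0, MODEL_PAGE_SIZE);
	}
	return p;
}
```

Either way the caller gets a zeroed page; only the early path has to clear it
explicitly.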
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/mm/pgtable_32.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5877f5aa8f5d..6c8a07624773 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -43,18 +43,9 @@ EXPORT_SYMBOL(ioremap_bot); /* aka VMALLOC_END */
extern char etext[], _stext[], _sinittext[], _einittext[];
-__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- pte_t *pte;
-
- if (slab_is_available()) {
- pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
- } else {
- pte = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
- if (pte)
- clear_page(pte);
- }
- return pte;
+ return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
}
pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
@@ -222,7 +213,21 @@ void iounmap(volatile void __iomem *addr)
}
EXPORT_SYMBOL(iounmap);
-int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
+static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
+{
+ if (!pmd_present(*pmdp)) {
+ pte_t *ptep = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
+
+ if (!ptep)
+ return NULL;
+
+ clear_page(ptep);
+ pmd_populate_kernel(&init_mm, pmdp, ptep);
+ }
+ return pte_offset_kernel(pmdp, va);
+}
+
+__ref int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
{
pmd_t *pd;
pte_t *pg;
@@ -231,7 +236,10 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
/* Use upper 10 bits of VA to index the first level map */
pd = pmd_offset(pud_offset(pgd_offset_k(va), va), va);
/* Use middle 10 bits of VA to index the second-level map */
- pg = pte_alloc_kernel(pd, va);
+ if (slab_is_available())
+ pg = pte_alloc_kernel(pd, va);
+ else
+ pg = early_pte_alloc_kernel(pd, va);
if (pg != 0) {
err = 0;
/* The PTE should never be already set nor present in the
--
2.13.3
* [PATCH v4 12/20] powerpc/mm: inline pte_alloc_one() and pte_alloc_one_kernel() in PPC32
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (10 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 11/20] powerpc/mm: don't use pte_alloc_one_kernel() before slab is available Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 13/20] powerpc/book3s32: Remove CONFIG_BOOKE dependent code Christophe Leroy
` (7 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
As in PPC64, inline pte_alloc_one() and pte_alloc_one_kernel()
in PPC32. This will allow switching nohash/32 to pte_fragment
without impacting hash/32.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/book3s/32/pgalloc.h | 22 ++++++++++++++++++++--
arch/powerpc/include/asm/nohash/32/pgalloc.h | 22 ++++++++++++++++++++--
arch/powerpc/mm/pgtable_32.c | 21 ---------------------
3 files changed, 40 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 96138ab3ddd6..701748132442 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -79,8 +79,26 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
#define pmd_pgtable(pmd) pmd_page(pmd)
#endif
-extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
-extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+ return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+ struct page *ptepage;
+
+ gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT;
+
+ ptepage = alloc_pages(flags, 0);
+ if (!ptepage)
+ return NULL;
+ if (!pgtable_page_ctor(ptepage)) {
+ __free_page(ptepage);
+ return NULL;
+ }
+ return ptepage;
+}
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 6fbbb90043c0..f3fec9052f31 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -80,8 +80,26 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
#define pmd_pgtable(pmd) pmd_page(pmd)
#endif
-extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
-extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+ return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+ struct page *ptepage;
+
+ gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT;
+
+ ptepage = alloc_pages(flags, 0);
+ if (!ptepage)
+ return NULL;
+ if (!pgtable_page_ctor(ptepage)) {
+ __free_page(ptepage);
+ return NULL;
+ }
+ return ptepage;
+}
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 6c8a07624773..7900b613e6e5 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -43,27 +43,6 @@ EXPORT_SYMBOL(ioremap_bot); /* aka VMALLOC_END */
extern char etext[], _stext[], _sinittext[], _einittext[];
-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
-{
- return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-}
-
-pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
-{
- struct page *ptepage;
-
- gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT;
-
- ptepage = alloc_pages(flags, 0);
- if (!ptepage)
- return NULL;
- if (!pgtable_page_ctor(ptepage)) {
- __free_page(ptepage);
- return NULL;
- }
- return ptepage;
-}
-
void __iomem *
ioremap(phys_addr_t addr, unsigned long size)
{
--
2.13.3
* [PATCH v4 13/20] powerpc/book3s32: Remove CONFIG_BOOKE dependent code
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (11 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 12/20] powerpc/mm: inline pte_alloc_one() and pte_alloc_one_kernel() in PPC32 Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 14/20] powerpc/mm: Move pte_fragment_alloc() to a common location Christophe Leroy
` (6 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
BOOK3S/32 cannot be BOOKE, so remove the useless code.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/book3s/32/pgalloc.h | 18 ------------------
arch/powerpc/include/asm/book3s/32/pgtable.h | 14 --------------
2 files changed, 32 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 701748132442..2639b4b7d67c 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -47,8 +47,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
#define __pmd_free_tlb(tlb,x,a) do { } while (0)
/* #define pgd_populate(mm, pmd, pte) BUG() */
-#ifndef CONFIG_BOOKE
-
static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
pte_t *pte)
{
@@ -62,22 +60,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
}
#define pmd_pgtable(pmd) pmd_page(pmd)
-#else
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
- pte_t *pte)
-{
- *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
- pgtable_t pte_page)
-{
- *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | _PMD_PRESENT);
-}
-
-#define pmd_pgtable(pmd) pmd_page(pmd)
-#endif
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 5ffb7e3b211f..7a8a590f6b4c 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -334,24 +334,10 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
#define __HAVE_ARCH_PTE_SAME
#define pte_same(A,B) (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
-/*
- * Note that on Book E processors, the pmd contains the kernel virtual
- * (lowmem) address of the pte page. The physical address is less useful
- * because everything runs with translation enabled (even the TLB miss
- * handler). On everything else the pmd contains the physical address
- * of the pte page. -- paulus
- */
-#ifndef CONFIG_BOOKE
#define pmd_page_vaddr(pmd) \
((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
#define pmd_page(pmd) \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
-#else
-#define pmd_page_vaddr(pmd) \
- ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
-#define pmd_page(pmd) \
- pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
-#endif
/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
--
2.13.3
* [PATCH v4 14/20] powerpc/mm: Move pte_fragment_alloc() to a common location
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (12 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 13/20] powerpc/book3s32: Remove CONFIG_BOOKE dependent code Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments Christophe Leroy
` (5 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
In preparation for the next patch, which generalises the use of
pte_fragment_alloc() to all subarches, this patch moves the related
functions to a place that is common to all subarches.
The 8xx will need that for supporting 16k pages, as in that mode
page tables still have a size of 4k.
Since pte_fragment with only one fragment is not different
from what is done in the general case, we can easily migrate all
subarches to pte fragments.
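As a rough userspace model of the fragment cache (a sketch assuming the 16k
page / 4k table geometry mentioned above; all model_* names are hypothetical
and locking is left out), the cached pointer walks through one page and drops
back to NULL once the page is used up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed example geometry: 16k pages holding 4k page tables. */
#define MODEL_PAGE_SIZE	16384u
#define MODEL_FRAG_SIZE	4096u
#define MODEL_FRAG_NR	(MODEL_PAGE_SIZE / MODEL_FRAG_SIZE)

/* Models mm->context.pte_frag: address of the next free fragment, or 0. */
struct model_mm {
	uintptr_t pte_frag;
};

/* Mirrors the logic of get_pte_from_cache(), without the spinlock. */
static uintptr_t model_get_from_cache(struct model_mm *mm)
{
	uintptr_t ret = mm->pte_frag;

	if (ret) {
		uintptr_t next = ret + MODEL_FRAG_SIZE;

		/* Reaching the next page boundary means the page is used up. */
		if ((next & (MODEL_PAGE_SIZE - 1)) == 0)
			next = 0;
		mm->pte_frag = next;
	}
	return ret;
}

/* Mirrors __alloc_for_ptecache() for a fresh, page-aligned page. */
static uintptr_t model_new_page(struct model_mm *mm, uintptr_t page)
{
	assert((page & (MODEL_PAGE_SIZE - 1)) == 0);

	/* The first fragment goes to the caller, the rest are cached. */
	if (MODEL_FRAG_NR > 1 && !mm->pte_frag)
		mm->pte_frag = page + MODEL_FRAG_SIZE;
	return page;
}
```

With this geometry one page serves four page tables: one from
model_new_page() and three from model_get_from_cache().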
For the time being, it is only a code move. We enclose it inside
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/mm/Makefile | 4 +-
arch/powerpc/mm/mmu_context.c | 1 -
arch/powerpc/mm/mmu_context_book3s64.c | 67 -------------
arch/powerpc/mm/pgtable-book3s64.c | 85 -----------------
arch/powerpc/mm/pgtable-frag.c | 167 +++++++++++++++++++++++++++++++++
5 files changed, 170 insertions(+), 154 deletions(-)
create mode 100644 arch/powerpc/mm/pgtable-frag.c
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 3c844bdd16c4..bd43b3ee52cb 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -15,7 +15,9 @@ obj-$(CONFIG_PPC_MMU_NOHASH) += mmu_context_nohash.o tlb_nohash.o \
obj-$(CONFIG_PPC_BOOK3E) += tlb_low_$(BITS)e.o
hash64-$(CONFIG_PPC_NATIVE) := hash_native_64.o
obj-$(CONFIG_PPC_BOOK3E_64) += pgtable-book3e.o
-obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o slb_low.o slb.o $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o
+obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o slb_low.o slb.o \
+ $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o \
+ pgtable-frag.o
obj-$(CONFIG_PPC_RADIX_MMU) += pgtable-radix.o tlb-radix.o
obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o hash_low_32.o mmu_context_hash32.o
obj-$(CONFIG_PPC_STD_MMU) += tlb_hash$(BITS).o
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index f84e14f23e50..b89e7dcc14cc 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -96,4 +96,3 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
*/
switch_mmu_context(prev, next, tsk);
}
-
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index dbd8f762140b..417b0cb67584 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -155,50 +155,6 @@ static void destroy_contexts(mm_context_t *ctx)
}
}
-static void pte_frag_destroy(void *pte_frag)
-{
- int count;
- struct page *page;
-
- page = virt_to_page(pte_frag);
- /* drop all the pending references */
- count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
- /* We allow PTE_FRAG_NR fragments from a PTE page */
- if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
- pgtable_page_dtor(page);
- __free_page(page);
- }
-}
-
-static void pmd_frag_destroy(void *pmd_frag)
-{
- int count;
- struct page *page;
-
- page = virt_to_page(pmd_frag);
- /* drop all the pending references */
- count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
- /* We allow PTE_FRAG_NR fragments from a PTE page */
- if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
- pgtable_pmd_page_dtor(page);
- __free_page(page);
- }
-}
-
-static void destroy_pagetable_cache(struct mm_struct *mm)
-{
- void *frag;
-
- frag = mm->context.pte_frag;
- if (frag)
- pte_frag_destroy(frag);
-
- frag = mm->context.pmd_frag;
- if (frag)
- pmd_frag_destroy(frag);
- return;
-}
-
void destroy_context(struct mm_struct *mm)
{
#ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -212,29 +168,6 @@ void destroy_context(struct mm_struct *mm)
mm->context.id = MMU_NO_CONTEXT;
}
-void arch_exit_mmap(struct mm_struct *mm)
-{
- destroy_pagetable_cache(mm);
-
- if (radix_enabled()) {
- /*
- * Radix doesn't have a valid bit in the process table
- * entries. However we know that at least P9 implementation
- * will avoid caching an entry with an invalid RTS field,
- * and 0 is invalid. So this will do.
- *
- * This runs before the "fullmm" tlb flush in exit_mmap,
- * which does a RIC=2 tlbie to clear the process table
- * entry. See the "fullmm" comments in tlb-radix.c.
- *
- * No barrier required here after the store because
- * this process will do the invalidate, which starts with
- * ptesync.
- */
- process_tb[mm->context.id].prtb0 = 0;
- }
-}
-
#ifdef CONFIG_PPC_RADIX_MMU
void radix__switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
{
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 01d7c0f7c4f0..723cd324fa34 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -317,91 +317,6 @@ void pmd_fragment_free(unsigned long *pmd)
}
}
-static pte_t *get_pte_from_cache(struct mm_struct *mm)
-{
- void *pte_frag, *ret;
-
- spin_lock(&mm->page_table_lock);
- ret = mm->context.pte_frag;
- if (ret) {
- pte_frag = ret + PTE_FRAG_SIZE;
- /*
- * If we have taken up all the fragments mark PTE page NULL
- */
- if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
- pte_frag = NULL;
- mm->context.pte_frag = pte_frag;
- }
- spin_unlock(&mm->page_table_lock);
- return (pte_t *)ret;
-}
-
-static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
-{
- void *ret = NULL;
- struct page *page;
-
- if (!kernel) {
- page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
- if (!page)
- return NULL;
- if (!pgtable_page_ctor(page)) {
- __free_page(page);
- return NULL;
- }
- } else {
- page = alloc_page(PGALLOC_GFP);
- if (!page)
- return NULL;
- }
-
- atomic_set(&page->pt_frag_refcount, 1);
-
- ret = page_address(page);
- /*
- * if we support only one fragment just return the
- * allocated page.
- */
- if (PTE_FRAG_NR == 1)
- return ret;
- spin_lock(&mm->page_table_lock);
- /*
- * If we find pgtable_page set, we return
- * the allocated page with single fragement
- * count.
- */
- if (likely(!mm->context.pte_frag)) {
- atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
- mm->context.pte_frag = ret + PTE_FRAG_SIZE;
- }
- spin_unlock(&mm->page_table_lock);
-
- return (pte_t *)ret;
-}
-
-pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
-{
- pte_t *pte;
-
- pte = get_pte_from_cache(mm);
- if (pte)
- return pte;
-
- return __alloc_for_ptecache(mm, kernel);
-}
-
-void pte_fragment_free(unsigned long *table, int kernel)
-{
- struct page *page = virt_to_page(table);
-
- BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
- if (atomic_dec_and_test(&page->pt_frag_refcount)) {
- if (!kernel)
- pgtable_page_dtor(page);
- __free_page(page);
- }
-}
-
static inline void pgtable_free(void *table, int index)
{
switch (index) {
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
new file mode 100644
index 000000000000..bc924822dcd6
--- /dev/null
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Handling Page Tables through page fragments
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/percpu.h>
+#include <linux/hardirq.h>
+#include <linux/hugetlb.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm/tlb.h>
+
+static void pte_frag_destroy(void *pte_frag)
+{
+ int count;
+ struct page *page;
+
+ page = virt_to_page(pte_frag);
+ /* drop all the pending references */
+ count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
+ /* We allow PTE_FRAG_NR fragments from a PTE page */
+ if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
+ pgtable_page_dtor(page);
+ __free_page(page);
+ }
+}
+
+static void pmd_frag_destroy(void *pmd_frag)
+{
+ int count;
+ struct page *page;
+
+ page = virt_to_page(pmd_frag);
+ /* drop all the pending references */
+ count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
+ /* We allow PTE_FRAG_NR fragments from a PTE page */
+ if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
+ pgtable_pmd_page_dtor(page);
+ __free_page(page);
+ }
+}
+
+static void destroy_pagetable_cache(struct mm_struct *mm)
+{
+ void *frag;
+
+ frag = mm->context.pte_frag;
+ if (frag)
+ pte_frag_destroy(frag);
+
+ frag = mm->context.pmd_frag;
+ if (frag)
+ pmd_frag_destroy(frag);
+}
+
+void arch_exit_mmap(struct mm_struct *mm)
+{
+ destroy_pagetable_cache(mm);
+
+ if (radix_enabled()) {
+ /*
+ * Radix doesn't have a valid bit in the process table
+ * entries. However we know that at least P9 implementation
+ * will avoid caching an entry with an invalid RTS field,
+ * and 0 is invalid. So this will do.
+ *
+ * This runs before the "fullmm" tlb flush in exit_mmap,
+ * which does a RIC=2 tlbie to clear the process table
+ * entry. See the "fullmm" comments in tlb-radix.c.
+ *
+ * No barrier required here after the store because
+ * this process will do the invalidate, which starts with
+ * ptesync.
+ */
+ process_tb[mm->context.id].prtb0 = 0;
+ }
+}
+
+static pte_t *get_pte_from_cache(struct mm_struct *mm)
+{
+ void *pte_frag, *ret;
+
+ spin_lock(&mm->page_table_lock);
+ ret = mm->context.pte_frag;
+ if (ret) {
+ pte_frag = ret + PTE_FRAG_SIZE;
+ /*
+ * If we have taken up all the fragments mark PTE page NULL
+ */
+ if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
+ pte_frag = NULL;
+ mm->context.pte_frag = pte_frag;
+ }
+ spin_unlock(&mm->page_table_lock);
+ return (pte_t *)ret;
+}
+
+static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
+{
+ void *ret = NULL;
+ struct page *page;
+
+ if (!kernel) {
+ page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
+ if (!page)
+ return NULL;
+ if (!pgtable_page_ctor(page)) {
+ __free_page(page);
+ return NULL;
+ }
+ } else {
+ page = alloc_page(PGALLOC_GFP);
+ if (!page)
+ return NULL;
+ }
+
+ atomic_set(&page->pt_frag_refcount, 1);
+
+ ret = page_address(page);
+ /*
+ * if we support only one fragment just return the
+ * allocated page.
+ */
+ if (PTE_FRAG_NR == 1)
+ return ret;
+ spin_lock(&mm->page_table_lock);
+ /*
+ * If we find pgtable_page set, we return
+ * the allocated page with single fragement
+ * count.
+ */
+ if (likely(!mm->context.pte_frag)) {
+ atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
+ mm->context.pte_frag = ret + PTE_FRAG_SIZE;
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ return (pte_t *)ret;
+}
+
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
+{
+ pte_t *pte;
+
+ pte = get_pte_from_cache(mm);
+ if (pte)
+ return pte;
+
+ return __alloc_for_ptecache(mm, kernel);
+}
+
+void pte_fragment_free(unsigned long *table, int kernel)
+{
+ struct page *page = virt_to_page(table);
+
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
+ if (!kernel)
+ pgtable_page_dtor(page);
+ __free_page(page);
+ }
+}
--
2.13.3
* [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (13 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 14/20] powerpc/mm: Move pte_fragment_alloc() to a common location Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-19 2:56 ` Aneesh Kumar K.V
2018-09-18 16:57 ` [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32 Christophe Leroy
` (4 subsequent siblings)
19 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
There is no point in taking the page table lock as
pte_frag is always NULL when we have only one fragment.
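A small userspace check of that claim (a sketch with a hypothetical helper
name, assuming power-of-two sizes): when the fragment size equals the page
size, the pointer to the "next" fragment always lands on a page boundary, so
nothing is ever left in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if handing out the first fragment of 'page' would leave
 * a further fragment to cache. page_size must be a power of two.
 */
static bool model_page_leaves_cached_frag(uintptr_t page, uintptr_t page_size,
					   uintptr_t frag_size)
{
	assert(frag_size != 0 && page_size % frag_size == 0);
	assert(page % page_size == 0);

	uintptr_t next = page + frag_size;

	/* A page-aligned 'next' means the page has no fragment left. */
	return (next & (page_size - 1)) != 0;
}
```

For a 4k page with a single 4k fragment this returns false, while a 16k page
split into 4k fragments returns true.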
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/mm/pgtable-frag.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index bc924822dcd6..ab4910e92aaf 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -85,6 +85,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
{
void *pte_frag, *ret;
+ if (PTE_FRAG_NR == 1)
+ return NULL;
+
spin_lock(&mm->page_table_lock);
ret = mm->context.pte_frag;
if (ret) {
--
2.13.3
* [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (14 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-19 3:03 ` Aneesh Kumar K.V
2018-09-18 16:57 ` [PATCH v4 17/20] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
` (3 subsequent siblings)
19 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to nohash/32 platforms.
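To show why pmd_page_vaddr() has to mask with PTE_TABLE_SIZE rather than
PAGE_MASK once fragments are in use, here is a userspace sketch. It assumes
16k pages with 4k page tables, ignores the __va() translation, and uses
hypothetical model_* names:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		16384u	/* assumed: 16k pages */
#define MODEL_PTE_TABLE_SIZE	4096u	/* assumed: 4k page tables */

/* Old behaviour: mask with PAGE_MASK, which drops the fragment offset. */
static uintptr_t model_table_addr_old(uintptr_t pmd)
{
	return pmd & ~(uintptr_t)(MODEL_PAGE_SIZE - 1);
}

/* New behaviour: mask with ~(PTE_TABLE_SIZE - 1), only flag bits are dropped. */
static uintptr_t model_table_addr_new(uintptr_t pmd)
{
	return pmd & ~(uintptr_t)(MODEL_PTE_TABLE_SIZE - 1);
}

/* Offset of the table inside its page, as added back in pte_offset_map(). */
static uintptr_t model_frag_offset(uintptr_t pmd)
{
	return model_table_addr_new(pmd) & (MODEL_PAGE_SIZE - 1);
}
```

For a table sitting in the third fragment of a page, the old mask would point
at the first fragment instead.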
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/mmu-40x.h | 1 +
arch/powerpc/include/asm/mmu-44x.h | 1 +
arch/powerpc/include/asm/mmu-8xx.h | 1 +
arch/powerpc/include/asm/mmu-book3e.h | 1 +
arch/powerpc/include/asm/mmu_context.h | 2 +-
arch/powerpc/include/asm/nohash/32/pgalloc.h | 43 +++++++++++-----------------
arch/powerpc/include/asm/nohash/32/pgtable.h | 7 +++--
arch/powerpc/include/asm/page.h | 6 +---
arch/powerpc/include/asm/pgtable.h | 8 ++++++
arch/powerpc/mm/Makefile | 3 ++
arch/powerpc/mm/mmu_context_nohash.c | 1 +
arch/powerpc/mm/pgtable-frag.c | 6 ++++
arch/powerpc/mm/pgtable_32.c | 8 ++++--
13 files changed, 51 insertions(+), 37 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-40x.h b/arch/powerpc/include/asm/mmu-40x.h
index 74f4edb5916e..7c77ceed71d6 100644
--- a/arch/powerpc/include/asm/mmu-40x.h
+++ b/arch/powerpc/include/asm/mmu-40x.h
@@ -58,6 +58,7 @@ typedef struct {
unsigned int id;
unsigned int active;
unsigned long vdso_base;
+ void *pte_frag;
} mm_context_t;
#endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index 295b3dbb2698..3d72e889ae7b 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -109,6 +109,7 @@ typedef struct {
unsigned int id;
unsigned int active;
unsigned long vdso_base;
+ void *pte_frag;
} mm_context_t;
#endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index fa05aa566ece..750cef6f65e3 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -179,6 +179,7 @@ typedef struct {
unsigned int id;
unsigned int active;
unsigned long vdso_base;
+ void *pte_frag;
#ifdef CONFIG_PPC_MM_SLICES
u16 user_psize; /* page size index */
unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index e20072972e35..8e8aad5172ab 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -230,6 +230,7 @@ typedef struct {
unsigned int id;
unsigned int active;
unsigned long vdso_base;
+ void *pte_frag;
} mm_context_t;
/* Page size definitions, common between 32 and 64-bit
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index b2f89b621b15..7f2c37a3f99d 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -222,7 +222,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
return 0;
}
-#ifndef CONFIG_PPC_BOOK3S_64
+#if defined(CONFIG_PPC_BOOK3E_64) || defined(CONFIG_PPC_BOOK3S_32)
static inline void arch_exit_mmap(struct mm_struct *mm)
{
}
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index f3fec9052f31..e69423ad8e2e 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -27,6 +27,9 @@ extern void __bad_pte(pmd_t *pmd);
extern struct kmem_cache *pgtable_cache[];
#define PGT_CACHE(shift) pgtable_cache[shift]
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel);
+void pte_fragment_free(unsigned long *table, int kernel);
+
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
@@ -58,11 +61,10 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
{
- *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_USER |
- _PMD_PRESENT);
+ *pmdp = __pmd(__pa(pte_page) | _PMD_USER | _PMD_PRESENT);
}
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
#else
static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
@@ -74,49 +76,38 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
{
- *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | _PMD_PRESENT);
+ *pmdp = __pmd((unsigned long)pte_page | _PMD_PRESENT);
}
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
#endif
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+ unsigned long address)
{
- return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ return (pte_t *)pte_fragment_alloc(mm, address, 1);
}
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
+ unsigned long address)
{
- struct page *ptepage;
-
- gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT;
-
- ptepage = alloc_pages(flags, 0);
- if (!ptepage)
- return NULL;
- if (!pgtable_page_ctor(ptepage)) {
- __free_page(ptepage);
- return NULL;
- }
- return ptepage;
+ return (pgtable_t)pte_fragment_alloc(mm, address, 0);
}
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
- free_page((unsigned long)pte);
+ pte_fragment_free((unsigned long *)pte, 1);
}
static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
{
- pgtable_page_dtor(ptepage);
- __free_page(ptepage);
+ pte_fragment_free((unsigned long *)ptepage, 0);
}
static inline void pgtable_free(void *table, unsigned index_size)
{
if (!index_size) {
- pgtable_page_dtor(virt_to_page(table));
- free_page((unsigned long)table);
+ pte_fragment_free((unsigned long *)table, 0);
} else {
BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
kmem_cache_free(PGT_CACHE(index_size), table);
@@ -155,6 +146,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
unsigned long address)
{
tlb_flush_pgtable(tlb, address);
- pgtable_free_tlb(tlb, page_address(table), 0);
+ pgtable_free_tlb(tlb, table, 0);
}
#endif /* _ASM_POWERPC_PGALLOC_32_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index d2908a8038e8..73e2b1fbdb36 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -336,12 +336,12 @@ static inline int pte_young(pte_t pte)
*/
#ifndef CONFIG_BOOKE
#define pmd_page_vaddr(pmd) \
- ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+ ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
#define pmd_page(pmd) \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
#else
#define pmd_page_vaddr(pmd) \
- ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
+ ((unsigned long)(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
#define pmd_page(pmd) \
pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
#endif
@@ -360,7 +360,8 @@ static inline int pte_young(pte_t pte)
(pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
pte_index(addr))
#define pte_offset_map(dir, addr) \
- ((pte_t *) kmap_atomic(pmd_page(*(dir))) + pte_index(addr))
+ ((pte_t *)(kmap_atomic(pmd_page(*(dir))) + \
+ (pmd_page_vaddr(*(dir)) & ~PAGE_MASK)) + pte_index(addr))
#define pte_unmap(pte) kunmap_atomic(pte)
/*
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f6a1265face2..27d1c16601ee 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -335,7 +335,7 @@ void arch_free_page(struct page *page, int order);
#endif
struct vm_area_struct;
-#ifdef CONFIG_PPC_BOOK3S_64
+#if !defined(CONFIG_PPC_BOOK3E_64) && !defined(CONFIG_PPC_BOOK3S_32)
/*
* For BOOK3s 64 with 4k and 64K linux page size
* we want to use pointers, because the page table
@@ -343,12 +343,8 @@ struct vm_area_struct;
*/
typedef pte_t *pgtable_t;
#else
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64)
-typedef pte_t *pgtable_t;
-#else
typedef struct page *pgtable_t;
#endif
-#endif
#include <asm-generic/memory_model.h>
#endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 8b38f7730211..1865a3e4ab8c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -94,12 +94,20 @@ unsigned long vmalloc_to_phys(void *vmalloc_addr);
void pgtable_cache_add(unsigned int shift);
void pgtable_cache_init(void);
+pte_t *early_alloc_pte(void);
+
#if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
void mark_initmem_nx(void);
#else
static inline void mark_initmem_nx(void) { }
#endif
+#ifndef PTE_FRAG_NR
+#define PTE_FRAG_NR 1
+#define PTE_FRAG_SIZE_SHIFT PAGE_SHIFT
+#define PTE_FRAG_SIZE PAGE_SIZE
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index bd43b3ee52cb..e1deb15fe85e 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -18,6 +18,9 @@ obj-$(CONFIG_PPC_BOOK3E_64) += pgtable-book3e.o
obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o slb_low.o slb.o \
$(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o \
pgtable-frag.o
+ifndef CONFIG_PPC_BOOK3S_32
+obj-$(CONFIG_PPC32) += pgtable-frag.o
+endif
obj-$(CONFIG_PPC_RADIX_MMU) += pgtable-radix.o tlb-radix.o
obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o hash_low_32.o mmu_context_hash32.o
obj-$(CONFIG_PPC_STD_MMU) += tlb_hash$(BITS).o
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index 4d80239ef83c..98f0ef463dc8 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -385,6 +385,7 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
#endif
mm->context.id = MMU_NO_CONTEXT;
mm->context.active = 0;
+ mm->context.pte_frag = NULL;
return 0;
}
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index ab4910e92aaf..d554a1cbc56d 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -30,6 +30,7 @@ static void pte_frag_destroy(void *pte_frag)
}
}
+#ifdef CONFIG_PPC_BOOK3S_64
static void pmd_frag_destroy(void *pmd_frag)
{
int count;
@@ -44,6 +45,7 @@ static void pmd_frag_destroy(void *pmd_frag)
__free_page(page);
}
}
+#endif
static void destroy_pagetable_cache(struct mm_struct *mm)
{
@@ -53,15 +55,18 @@ static void destroy_pagetable_cache(struct mm_struct *mm)
if (frag)
pte_frag_destroy(frag);
+#ifdef CONFIG_PPC_BOOK3S_64
frag = mm->context.pmd_frag;
if (frag)
pmd_frag_destroy(frag);
+#endif
}
void arch_exit_mmap(struct mm_struct *mm)
{
destroy_pagetable_cache(mm);
+#ifdef CONFIG_PPC_BOOK3S_64
if (radix_enabled()) {
/*
* Radix doesn't have a valid bit in the process table
@@ -79,6 +84,7 @@ void arch_exit_mmap(struct mm_struct *mm)
*/
process_tb[mm->context.id].prtb0 = 0;
}
+#endif
}
static pte_t *get_pte_from_cache(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 7900b613e6e5..81e6b18d1955 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -195,12 +195,16 @@ EXPORT_SYMBOL(iounmap);
static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
{
if (!pmd_present(*pmdp)) {
- pte_t *ptep = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
+ pte_t *ptep = __va(memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE));
if (!ptep)
return NULL;
- clear_page(ptep);
+ if (PTE_FRAG_SIZE == PAGE_SIZE)
+ clear_page(ptep);
+ else
+ memset(ptep, 0, PTE_FRAG_SIZE);
+
pmd_populate_kernel(&init_mm, pmdp, ptep);
}
return pte_offset_kernel(pmdp, va);
--
2.13.3
* [PATCH v4 17/20] powerpc/8xx: Remove PTE_ATOMIC_UPDATES
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (15 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32 Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 18/20] powerpc/mm: reintroduce 16K pages with HW assistance on 8xx Christophe Leroy
` (2 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
Commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non-atomic PTE updates and started the work of removing
PTE updates from TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */
Commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as
linux mm expects") removed all PTE updates done in TLB miss handlers.
Therefore, atomic PTE updates are no longer needed for the 8xx.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 1c57efac089d..8c9872d93257 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -70,9 +70,6 @@
#define _PTE_NONE_MASK 0
-/* Until my rework is finished, 8xx still needs atomic PTE updates */
-#define PTE_ATOMIC_UPDATES 1
-
#ifdef CONFIG_PPC_16K_PAGES
#define _PAGE_PSIZE _PAGE_SPS
#else
--
2.13.3
* [PATCH v4 18/20] powerpc/mm: reintroduce 16K pages with HW assistance on 8xx
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (16 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 17/20] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 19/20] powerpc/nohash32: allow setting GUARDED attribute in the PMD directly Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 20/20] powerpc/8xx: set " Christophe Leroy
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
Using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4 Mbyte area which is managed by a level 2 table (PTE) also
containing 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table.
- 512k page PTEs have to be spread every 128 bytes in the L2 table.
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.
In order to use hardware assistance with 16K pages, this patch makes
the following modifications:
- Make PGD size independent of the main page size.
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update().
- Adapt the size of the hugepage tables.
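For illustration only, the layout constraints listed above can be
sketched in plain user-space C. This is not the kernel code: the names
HW_PAGE_SHIFT, L1_SHIFT, L2_ENTRIES, SLOTS_PER_16K, l1_index(),
l2_index() and set_pte_16k() are hypothetical and only restate the
1024-entry L1/L2 layout and the "4 identical entries per 16k page" rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this sketch, derived from the text above. */
#define HW_PAGE_SHIFT	12u	/* each L2 entry describes a 4k page */
#define L1_SHIFT	22u	/* each L1 (PGD) entry covers 4 Mbytes */
#define L2_ENTRIES	(1u << (L1_SHIFT - HW_PAGE_SHIFT))	/* 1024 */
#define SLOTS_PER_16K	4u	/* a 16k page spans four 4k slots */

/* Index of the L1 (PGD) entry covering a virtual address. */
static unsigned int l1_index(uint32_t va)
{
	return va >> L1_SHIFT;
}

/* Index of the 4k L2 slot covering a virtual address. */
static unsigned int l2_index(uint32_t va)
{
	return (va >> HW_PAGE_SHIFT) & (L2_ENTRIES - 1);
}

/*
 * Store one 16k mapping: write the same value into the 4 consecutive
 * 4k slots of the L2 table that the 16k page spans.
 */
static void set_pte_16k(uint32_t *l2_table, uint32_t va, uint32_t pte)
{
	unsigned int first = l2_index(va) & ~(SLOTS_PER_16K - 1);
	unsigned int i;

	assert(first + SLOTS_PER_16K <= L2_ENTRIES);
	for (i = 0; i < SLOTS_PER_16K; i++)
		l2_table[first + i] = pte;
}
```

Any address inside the 16k page resolves to one of the four slots, so
the hardware walk finds the same entry whichever 4k slot it indexes.
The patch itself gets the same effect by making pte_t a 4-element
struct.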
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/nohash/32/pgtable.h | 19 ++++++++++++++++++-
arch/powerpc/include/asm/nohash/pgtable.h | 4 ++++
arch/powerpc/include/asm/pgtable-types.h | 4 ++++
4 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 33931804c46f..a80669209155 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -698,7 +698,7 @@ config PPC_4K_PAGES
config PPC_16K_PAGES
bool "16k page size"
- depends on 44x
+ depends on 44x || PPC_8xx
config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 73e2b1fbdb36..6f2b35af7a28 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -19,7 +19,14 @@ extern int icache_44x_need_flush;
#endif /* __ASSEMBLY__ */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+#define PTE_INDEX_SIZE (PTE_SHIFT - 2)
+#define PTE_FRAG_NR 4
+#define PTE_FRAG_SIZE_SHIFT 12
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+#else
#define PTE_INDEX_SIZE PTE_SHIFT
+#endif
#define PMD_INDEX_SIZE 0
#define PUD_INDEX_SIZE 0
#define PGD_INDEX_SIZE (32 - PGDIR_SHIFT)
@@ -48,7 +55,11 @@ extern int icache_44x_need_flush;
* -Matt
*/
/* PGDIR_SHIFT determines what a top-level page table entry can map */
+#ifdef CONFIG_PPC_8xx
+#define PGDIR_SHIFT 22
+#else
#define PGDIR_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
+#endif
#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE-1))
@@ -229,7 +240,13 @@ static inline unsigned long pte_update(pte_t *p,
: "cc" );
#else /* PTE_ATOMIC_UPDATES */
unsigned long old = pte_val(*p);
- *p = __pte((old & ~clr) | set);
+ unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+ p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+ *p = __pte(new);
+#endif
#endif /* !PTE_ATOMIC_UPDATES */
#ifdef CONFIG_44x
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index aa968d87337b..883f69e6cdf7 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -204,7 +204,11 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
/* Anything else just stores the PTE normally. That covers all 64-bit
* cases, and 32-bit non-hash with 32-bit PTEs.
*/
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+ ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
*ptep = pte;
+#endif
/*
* With hardware tablewalk, a sync is needed to ensure that
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index eccb30b38b47..3b0edf041b2e 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -3,7 +3,11 @@
#define _ASM_POWERPC_PGTABLE_TYPES_H
/* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
typedef struct { pte_basic_t pte; } pte_t;
+#endif
#define __pte(x) ((pte_t) { (x) })
static inline pte_basic_t pte_val(pte_t x)
{
--
2.13.3
* [PATCH v4 19/20] powerpc/nohash32: allow setting GUARDED attribute in the PMD directly
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (17 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 18/20] powerpc/mm: reintroduce 16K pages with HW assistance on 8xx Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 20/20] powerpc/8xx: set " Christophe Leroy
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
On the 8xx, the GUARDED attribute of the pages is managed in the
L1 entry. Therefore, to avoid having to copy it into the L1 entry
at each TLB miss, we have to set it in the PMD.
In order to allow this, this patch splits the vmalloc space in two
parts, one for vmalloc and non-guarded IO, and one for guarded IO.
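As a rough user-space sketch of the split (not the kernel code), the
boundary between the two parts can be computed as the midpoint of the
range rounded up to an L1 entry boundary, mirroring the IOREMAP_BASE
definition in the diff below. The names PGDIR_SIZE_SKETCH, align_up()
and guarded_io_base() are hypothetical, and the 4 Mbyte L1 coverage is
the 8xx value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: one L1 (PGD/PMD) entry covers 4 Mbytes. */
#define PGDIR_SIZE_SKETCH	(1u << 22)

/* Round x up to the next multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Start of the guarded IO part: the midpoint of the range
 * [vmalloc_start, ioremap_top), rounded up so that no L1 entry is
 * shared between guarded and non-guarded mappings.
 */
static uint32_t guarded_io_base(uint32_t vmalloc_start, uint32_t ioremap_top)
{
	uint32_t mid = vmalloc_start + (ioremap_top - vmalloc_start) / 2;

	return align_up(mid, PGDIR_SIZE_SKETCH);
}
```

The alignment matters because the attribute is held per L1 entry: a
4 Mbyte slot straddling the boundary would have to be guarded and
non-guarded at the same time.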
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +
arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 ++++
arch/powerpc/include/asm/nohash/32/pgtable.h | 19 ++++++++-
arch/powerpc/mm/dump_linuxpagetables.c | 21 +++++++++-
arch/powerpc/mm/mem.c | 7 ++++
arch/powerpc/mm/pgtable_32.c | 60 ++++++++++++++++++++++++----
arch/powerpc/platforms/Kconfig.cputype | 2 +
7 files changed, 108 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 7a8a590f6b4c..28001d5eaa89 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -156,6 +156,8 @@ static inline bool pte_user(pte_t pte)
#define IOREMAP_TOP KVIRT_TOP
#endif
+#define IOREMAP_BASE VMALLOC_START
+
/*
* Just any arbitrary offset to the start of the vmalloc VM area: the
* current 16MB value just means that there will be a 64MB "hole" after the
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index e69423ad8e2e..7d8de0b73aad 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -58,6 +58,14 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
*pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
}
+#ifdef CONFIG_PPC_PMD_GUARDED
+static inline void pmd_populate_kernel_g(struct mm_struct *mm, pmd_t *pmdp,
+ pte_t *pte)
+{
+ *pmdp = __pmd(__pa(pte) | _PMD_PRESENT | _PMD_GUARDED);
+}
+#endif
+
static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
{
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 6f2b35af7a28..9a328eda89a5 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -81,9 +81,14 @@ extern int icache_44x_need_flush;
* virtual space that goes below PKMAP and FIXMAP
*/
#ifdef CONFIG_HIGHMEM
-#define KVIRT_TOP PKMAP_BASE
+#define _KVIRT_TOP PKMAP_BASE
#else
-#define KVIRT_TOP (0xfe000000UL) /* for now, could be FIXMAP_BASE ? */
+#define _KVIRT_TOP (0xfe000000UL) /* for now, could be FIXMAP_BASE ? */
+#endif
+#ifdef CONFIG_PPC_PMD_GUARDED
+#define KVIRT_TOP _ALIGN_DOWN(_KVIRT_TOP, PGDIR_SIZE)
+#else
+#define KVIRT_TOP _KVIRT_TOP
#endif
/*
@@ -96,6 +101,12 @@ extern int icache_44x_need_flush;
#else
#define IOREMAP_TOP KVIRT_TOP
#endif
+#ifdef CONFIG_PPC_PMD_GUARDED
+#define IOREMAP_BASE _ALIGN_UP(VMALLOC_START + (IOREMAP_TOP - VMALLOC_START) / 2, \
+ PGDIR_SIZE)
+#else
+#define IOREMAP_BASE VMALLOC_START
+#endif
/*
* Just any arbitrary offset to the start of the vmalloc VM area: the
@@ -120,7 +131,11 @@ extern int icache_44x_need_flush;
#else
#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
#endif
+#ifdef CONFIG_PPC_PMD_GUARDED
+#define VMALLOC_END IOREMAP_BASE
+#else
#define VMALLOC_END ioremap_bot
+#endif
/*
* Bits in a linux-style PTE. These match the bits in the
diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c
index e60aa6d7456d..105d0118f735 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -76,9 +76,9 @@ struct addr_marker {
static struct addr_marker address_markers[] = {
{ 0, "Start of kernel VM" },
+#ifdef CONFIG_PPC64
{ 0, "vmalloc() Area" },
{ 0, "vmalloc() End" },
-#ifdef CONFIG_PPC64
{ 0, "isa I/O start" },
{ 0, "isa I/O end" },
{ 0, "phb I/O start" },
@@ -87,8 +87,19 @@ static struct addr_marker address_markers[] = {
{ 0, "I/O remap end" },
{ 0, "vmemmap start" },
#else
+#ifdef CONFIG_PPC_PMD_GUARDED
+ { 0, "vmalloc() Area" },
+ { 0, "vmalloc() End" },
+ { 0, "Early I/O remap start" },
+ { 0, "Early I/O remap end" },
+ { 0, "I/O remap start" },
+ { 0, "I/O remap end" },
+#else
{ 0, "Early I/O remap start" },
{ 0, "Early I/O remap end" },
+ { 0, "vmalloc() I/O remap start" },
+ { 0, "vmalloc() I/O remap end" },
+#endif
#ifdef CONFIG_NOT_COHERENT_CACHE
{ 0, "Consistent mem start" },
{ 0, "Consistent mem end" },
@@ -286,9 +297,9 @@ static void populate_markers(void)
int i = 0;
address_markers[i++].start_address = PAGE_OFFSET;
+#ifdef CONFIG_PPC64
address_markers[i++].start_address = VMALLOC_START;
address_markers[i++].start_address = VMALLOC_END;
-#ifdef CONFIG_PPC64
address_markers[i++].start_address = ISA_IO_BASE;
address_markers[i++].start_address = ISA_IO_END;
address_markers[i++].start_address = PHB_IO_BASE;
@@ -301,6 +312,12 @@ static void populate_markers(void)
address_markers[i++].start_address = VMEMMAP_BASE;
#endif
#else /* !CONFIG_PPC64 */
+#ifdef CONFIG_PPC_PMD_GUARDED
+ address_markers[i++].start_address = VMALLOC_START;
+ address_markers[i++].start_address = VMALLOC_END;
+#endif
+ address_markers[i++].start_address = IOREMAP_BASE;
+ address_markers[i++].start_address = ioremap_bot;
address_markers[i++].start_address = ioremap_bot;
address_markers[i++].start_address = IOREMAP_TOP;
#ifdef CONFIG_NOT_COHERENT_CACHE
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 0ba0cdb3f759..d710996f356a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -387,8 +387,15 @@ void __init mem_init(void)
#endif /* CONFIG_NOT_COHERENT_CACHE */
pr_info(" * 0x%08lx..0x%08lx : early ioremap\n",
ioremap_bot, IOREMAP_TOP);
+#ifdef CONFIG_PPC_PMD_GUARDED
+ pr_info(" * 0x%08lx..0x%08lx : ioremap\n",
+ IOREMAP_BASE, ioremap_bot);
+ pr_info(" * 0x%08lx..0x%08lx : vmalloc\n",
+ VMALLOC_START, VMALLOC_END);
+#else
pr_info(" * 0x%08lx..0x%08lx : vmalloc & ioremap\n",
VMALLOC_START, VMALLOC_END);
+#endif
#endif /* CONFIG_PPC32 */
}
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 81e6b18d1955..d6173ac120d6 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -151,7 +151,14 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void *call
if (slab_is_available()) {
struct vm_struct *area;
- area = get_vm_area_caller(size, VM_IOREMAP, caller);
+ bool is_g = pgprot_val(prot) & _PAGE_GUARDED;
+
+ if (IS_ENABLED(CONFIG_PPC_PMD_GUARDED) && is_g)
+ area = __get_vm_area_caller(size, VM_IOREMAP, IOREMAP_BASE,
+ ioremap_bot, caller);
+ else
+ area = get_vm_area_caller(size, VM_IOREMAP, caller);
+
if (area == 0)
return NULL;
area->phys_addr = p;
@@ -192,7 +199,38 @@ void iounmap(volatile void __iomem *addr)
}
EXPORT_SYMBOL(iounmap);
-static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
+#ifdef CONFIG_PPC_PMD_GUARDED
+static int __pte_alloc_kernel_g(pmd_t *pmd, unsigned long address)
+{
+ pte_t *new = pte_alloc_one_kernel(&init_mm, address);
+ if (!new)
+ return -ENOMEM;
+
+ smp_wmb(); /* See comment in __pte_alloc */
+
+ spin_lock(&init_mm.page_table_lock);
+ if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
+ pmd_populate_kernel_g(&init_mm, pmd, new);
+ new = NULL;
+ }
+ spin_unlock(&init_mm.page_table_lock);
+ if (new)
+ pte_free_kernel(&init_mm, new);
+ return 0;
+}
+
+static pte_t *pte_alloc_kernel_g(pmd_t *pmd, unsigned long address)
+{
+ if (unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel_g(pmd, address))
+ return NULL;
+ return pte_offset_kernel(pmd, address);
+}
+#else
+#define pte_alloc_kernel_g(pmd, address) pte_alloc_kernel(pmd, address)
+#define pmd_populate_kernel_g pmd_populate_kernel
+#endif
+
+static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va, bool is_g)
{
if (!pmd_present(*pmdp)) {
pte_t *ptep = __va(memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE));
@@ -205,7 +243,10 @@ static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
else
memset(ptep, 0, PTE_FRAG_SIZE);
- pmd_populate_kernel(&init_mm, pmdp, ptep);
+ if (is_g)
+ pmd_populate_kernel_g(&init_mm, pmdp, ptep);
+ else
+ pmd_populate_kernel(&init_mm, pmdp, ptep);
}
return pte_offset_kernel(pmdp, va);
}
@@ -215,14 +256,19 @@ __ref int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
pmd_t *pd;
pte_t *pg;
int err = -ENOMEM;
+ bool is_g = pgprot_val(prot) & _PAGE_GUARDED;
/* Use upper 10 bits of VA to index the first level map */
pd = pmd_offset(pud_offset(pgd_offset_k(va), va), va);
/* Use middle 10 bits of VA to index the second-level map */
- if (slab_is_available())
- pg = pte_alloc_kernel(pd, va);
- else
- pg = early_pte_alloc_kernel(pd, va);
+ if (slab_is_available()) {
+ if (is_g)
+ pg = pte_alloc_kernel_g(pd, va);
+ else
+ pg = pte_alloc_kernel(pd, va);
+ } else {
+ pg = early_pte_alloc_kernel(pd, va, is_g);
+ }
if (pg != 0) {
err = 0;
/* The PTE should never be already set nor present in the
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 6c6a7c72cae4..d0984546fbec 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -355,6 +355,8 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
+config PPC_PMD_GUARDED
+ bool
config PPC_MMU_NOHASH
def_bool y
--
2.13.3
* [PATCH v4 20/20] powerpc/8xx: set GUARDED attribute in the PMD directly
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
` (18 preceding siblings ...)
2018-09-18 16:57 ` [PATCH v4 19/20] powerpc/nohash32: allow setting GUARDED attribute in the PMD directly Christophe Leroy
@ 2018-09-18 16:57 ` Christophe Leroy
19 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2018-09-18 16:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
On the 8xx, the GUARDED attribute of the pages is managed in the
L1 entry. Therefore, to avoid having to copy it into the L1 entry
at each TLB miss, we set it in the PMD.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 ++-
arch/powerpc/kernel/head_8xx.S | 9 ---------
arch/powerpc/platforms/Kconfig.cputype | 1 +
3 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 8c9872d93257..20d4c1c04726 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -62,10 +62,11 @@
#define _PMD_PRESENT 0x0001
#define _PMD_PRESENT_MASK _PMD_PRESENT
-#define _PMD_BAD 0x0fd0
+#define _PMD_BAD 0x0fc0
#define _PMD_PAGE_MASK 0x000c
#define _PMD_PAGE_8M 0x000c
#define _PMD_PAGE_512K 0x0004
+#define _PMD_GUARDED 0x0010
#define _PMD_USER 0x0020 /* APG 1 */
#define _PTE_NONE_MASK 0
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3e38af7489a9..89974c938617 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -452,15 +452,6 @@ DataStoreTLBMiss:
mfspr r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
- /* Insert the Guarded flag into the TWC from the Linux PTE.
- * It is bit 27 of both the Linux PTE and the TWC (at least
- * I got that right :-). It will be better when we can put
- * this into the Linux pgd/pmd and load it in the operation
- * above.
- */
- rlwimi r11, r10, 0, _PAGE_GUARDED
- mtspr SPRN_MD_TWC, r11
-
/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
* We also need to know if the insn is a load/store, so:
* Clear _PAGE_PRESENT and load that which will
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index d0984546fbec..c92d084a5a23 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -34,6 +34,7 @@ config PPC_8xx
bool "Freescale 8xx"
select FSL_SOC
select SYS_SUPPORTS_HUGETLBFS
+ select PPC_PMD_GUARDED
config 40x
bool "AMCC 40x"
--
2.13.3
* Re: [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments
2018-09-18 16:57 ` [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments Christophe Leroy
@ 2018-09-19 2:56 ` Aneesh Kumar K.V
2018-09-25 16:49 ` Christophe LEROY
0 siblings, 1 reply; 25+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-19 2:56 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
On 9/18/18 10:27 PM, Christophe Leroy wrote:
> There is no point in taking the page table lock as
> pte_frag is always NULL when we have only one fragment.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> arch/powerpc/mm/pgtable-frag.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index bc924822dcd6..ab4910e92aaf 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -85,6 +85,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
> {
> void *pte_frag, *ret;
>
> + if (PTE_FRAG_NR == 1)
> + return NULL;
> +
> spin_lock(&mm->page_table_lock);
> ret = mm->context.pte_frag;
> if (ret) {
>
Maybe update get_pmd_from_cache() too?
-aneesh
* Re: [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32
2018-09-18 16:57 ` [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32 Christophe Leroy
@ 2018-09-19 3:03 ` Aneesh Kumar K.V
2018-09-25 16:48 ` Christophe LEROY
0 siblings, 1 reply; 25+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-19 3:03 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
On 9/18/18 10:27 PM, Christophe Leroy wrote:
> In order to allow the 8xx to handle pte_fragments, this patch
> extends the use of pte_fragments to nohash/32 platforms.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> arch/powerpc/include/asm/mmu-40x.h | 1 +
> arch/powerpc/include/asm/mmu-44x.h | 1 +
> arch/powerpc/include/asm/mmu-8xx.h | 1 +
> arch/powerpc/include/asm/mmu-book3e.h | 1 +
> arch/powerpc/include/asm/mmu_context.h | 2 +-
> arch/powerpc/include/asm/nohash/32/pgalloc.h | 43 +++++++++++-----------------
> arch/powerpc/include/asm/nohash/32/pgtable.h | 7 +++--
> arch/powerpc/include/asm/page.h | 6 +---
> arch/powerpc/include/asm/pgtable.h | 8 ++++++
> arch/powerpc/mm/Makefile | 3 ++
> arch/powerpc/mm/mmu_context_nohash.c | 1 +
> arch/powerpc/mm/pgtable-frag.c | 6 ++++
> arch/powerpc/mm/pgtable_32.c | 8 ++++--
> 13 files changed, 51 insertions(+), 37 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-40x.h b/arch/powerpc/include/asm/mmu-40x.h
> index 74f4edb5916e..7c77ceed71d6 100644
> --- a/arch/powerpc/include/asm/mmu-40x.h
> +++ b/arch/powerpc/include/asm/mmu-40x.h
> @@ -58,6 +58,7 @@ typedef struct {
> unsigned int id;
> unsigned int active;
> unsigned long vdso_base;
> + void *pte_frag;
> } mm_context_t;
>
> #endif /* !__ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
> index 295b3dbb2698..3d72e889ae7b 100644
> --- a/arch/powerpc/include/asm/mmu-44x.h
> +++ b/arch/powerpc/include/asm/mmu-44x.h
> @@ -109,6 +109,7 @@ typedef struct {
> unsigned int id;
> unsigned int active;
> unsigned long vdso_base;
> + void *pte_frag;
> } mm_context_t;
>
> #endif /* !__ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
> index fa05aa566ece..750cef6f65e3 100644
> --- a/arch/powerpc/include/asm/mmu-8xx.h
> +++ b/arch/powerpc/include/asm/mmu-8xx.h
> @@ -179,6 +179,7 @@ typedef struct {
> unsigned int id;
> unsigned int active;
> unsigned long vdso_base;
> + void *pte_frag;
> #ifdef CONFIG_PPC_MM_SLICES
> u16 user_psize; /* page size index */
> unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
> diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
> index e20072972e35..8e8aad5172ab 100644
> --- a/arch/powerpc/include/asm/mmu-book3e.h
> +++ b/arch/powerpc/include/asm/mmu-book3e.h
> @@ -230,6 +230,7 @@ typedef struct {
> unsigned int id;
> unsigned int active;
> unsigned long vdso_base;
> + void *pte_frag;
> } mm_context_t;
>
> /* Page size definitions, common between 32 and 64-bit
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index b2f89b621b15..7f2c37a3f99d 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -222,7 +222,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
> return 0;
> }
>
> -#ifndef CONFIG_PPC_BOOK3S_64
> +#if defined(CONFIG_PPC_BOOK3E_64) || defined(CONFIG_PPC_BOOK3S_32)
> static inline void arch_exit_mmap(struct mm_struct *mm)
> {
> }
> diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> index f3fec9052f31..e69423ad8e2e 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> @@ -27,6 +27,9 @@ extern void __bad_pte(pmd_t *pmd);
> extern struct kmem_cache *pgtable_cache[];
> #define PGT_CACHE(shift) pgtable_cache[shift]
>
> +pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel);
> +void pte_fragment_free(unsigned long *table, int kernel);
> +
> static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> {
> return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
> @@ -58,11 +61,10 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
> static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
> pgtable_t pte_page)
> {
> - *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_USER |
> - _PMD_PRESENT);
> + *pmdp = __pmd(__pa(pte_page) | _PMD_USER | _PMD_PRESENT);
> }
>
> -#define pmd_pgtable(pmd) pmd_page(pmd)
> +#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
> #else
>
> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
> @@ -74,49 +76,38 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
> static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
> pgtable_t pte_page)
> {
> - *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | _PMD_PRESENT);
> + *pmdp = __pmd((unsigned long)pte_page | _PMD_PRESENT);
> }
>
> -#define pmd_pgtable(pmd) pmd_page(pmd)
> +#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
> #endif
>
> -static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
> + unsigned long address)
> {
> - return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> + return (pte_t *)pte_fragment_alloc(mm, address, 1);
> }
>
> -static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
> +static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
> + unsigned long address)
> {
> - struct page *ptepage;
> -
> - gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT;
> -
> - ptepage = alloc_pages(flags, 0);
> - if (!ptepage)
> - return NULL;
> - if (!pgtable_page_ctor(ptepage)) {
> - __free_page(ptepage);
> - return NULL;
> - }
> - return ptepage;
> + return (pgtable_t)pte_fragment_alloc(mm, address, 0);
> }
>
> static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
> {
> - free_page((unsigned long)pte);
> + pte_fragment_free((unsigned long *)pte, 1);
> }
>
> static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
> {
> - pgtable_page_dtor(ptepage);
> - __free_page(ptepage);
> + pte_fragment_free((unsigned long *)ptepage, 0);
> }
>
> static inline void pgtable_free(void *table, unsigned index_size)
> {
> if (!index_size) {
> - pgtable_page_dtor(virt_to_page(table));
> - free_page((unsigned long)table);
> + pte_fragment_free((unsigned long *)table, 0);
> } else {
> BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
> kmem_cache_free(PGT_CACHE(index_size), table);
> @@ -155,6 +146,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
> unsigned long address)
> {
> tlb_flush_pgtable(tlb, address);
> - pgtable_free_tlb(tlb, page_address(table), 0);
> + pgtable_free_tlb(tlb, table, 0);
> }
> #endif /* _ASM_POWERPC_PGALLOC_32_H */
> diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
> index d2908a8038e8..73e2b1fbdb36 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@ -336,12 +336,12 @@ static inline int pte_young(pte_t pte)
> */
> #ifndef CONFIG_BOOKE
> #define pmd_page_vaddr(pmd) \
> - ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
> + ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
> #define pmd_page(pmd) \
> pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
> #else
> #define pmd_page_vaddr(pmd) \
> - ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
> + ((unsigned long)(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
> #define pmd_page(pmd) \
> pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
> #endif
> @@ -360,7 +360,8 @@ static inline int pte_young(pte_t pte)
> (pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
> pte_index(addr))
> #define pte_offset_map(dir, addr) \
> - ((pte_t *) kmap_atomic(pmd_page(*(dir))) + pte_index(addr))
> + ((pte_t *)(kmap_atomic(pmd_page(*(dir))) + \
> + (pmd_page_vaddr(*(dir)) & ~PAGE_MASK)) + pte_index(addr))
> #define pte_unmap(pte) kunmap_atomic(pte)
>
> /*
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index f6a1265face2..27d1c16601ee 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -335,7 +335,7 @@ void arch_free_page(struct page *page, int order);
> #endif
>
> struct vm_area_struct;
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#if !defined(CONFIG_PPC_BOOK3E_64) && !defined(CONFIG_PPC_BOOK3S_32)
> /*
> * For BOOK3s 64 with 4k and 64K linux page size
> * we want to use pointers, because the page table
> @@ -343,12 +343,8 @@ struct vm_area_struct;
> */
> typedef pte_t *pgtable_t;
> #else
> -#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64)
> -typedef pte_t *pgtable_t;
> -#else
> typedef struct page *pgtable_t;
> #endif
> -#endif
>
Now that is getting complicated. Is there a way to move that to a platform
header instead of using that complicated #if?
> #include <asm-generic/memory_model.h>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 8b38f7730211..1865a3e4ab8c 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -94,12 +94,20 @@ unsigned long vmalloc_to_phys(void *vmalloc_addr);
> void pgtable_cache_add(unsigned int shift);
> void pgtable_cache_init(void);
>
> +pte_t *early_alloc_pte(void);
> +
> #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
> void mark_initmem_nx(void);
> #else
> static inline void mark_initmem_nx(void) { }
> #endif
>
> +#ifndef PTE_FRAG_NR
> +#define PTE_FRAG_NR 1
> +#define PTE_FRAG_SIZE_SHIFT PAGE_SHIFT
> +#define PTE_FRAG_SIZE PAGE_SIZE
> +#endif
> +
IMHO we should avoid that. The challenge with #ifndef is that we must
always make sure the header inclusion order is correct, so that platform
headers get included first. Why not move it to the platforms that want
to use pte fragmentation?
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_POWERPC_PGTABLE_H */
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index bd43b3ee52cb..e1deb15fe85e 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -18,6 +18,9 @@ obj-$(CONFIG_PPC_BOOK3E_64) += pgtable-book3e.o
> obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o slb_low.o slb.o \
> $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o \
> pgtable-frag.o
> +ifndef CONFIG_PPC_BOOK3S_32
> +obj-$(CONFIG_PPC32) += pgtable-frag.o
> +endif
> obj-$(CONFIG_PPC_RADIX_MMU) += pgtable-radix.o tlb-radix.o
> obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o hash_low_32.o mmu_context_hash32.o
> obj-$(CONFIG_PPC_STD_MMU) += tlb_hash$(BITS).o
> diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
> index 4d80239ef83c..98f0ef463dc8 100644
> --- a/arch/powerpc/mm/mmu_context_nohash.c
> +++ b/arch/powerpc/mm/mmu_context_nohash.c
> @@ -385,6 +385,7 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
> #endif
> mm->context.id = MMU_NO_CONTEXT;
> mm->context.active = 0;
> + mm->context.pte_frag = NULL;
> return 0;
> }
>
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index ab4910e92aaf..d554a1cbc56d 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -30,6 +30,7 @@ static void pte_frag_destroy(void *pte_frag)
> }
> }
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> static void pmd_frag_destroy(void *pmd_frag)
> {
> int count;
> @@ -44,6 +45,7 @@ static void pmd_frag_destroy(void *pmd_frag)
> __free_page(page);
> }
> }
> +#endif
>
> static void destroy_pagetable_cache(struct mm_struct *mm)
> {
> @@ -53,15 +55,18 @@ static void destroy_pagetable_cache(struct mm_struct *mm)
> if (frag)
> pte_frag_destroy(frag);
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> frag = mm->context.pmd_frag;
> if (frag)
> pmd_frag_destroy(frag);
> +#endif
> }
>
> void arch_exit_mmap(struct mm_struct *mm)
> {
> destroy_pagetable_cache(mm);
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> if (radix_enabled()) {
> /*
> * Radix doesn't have a valid bit in the process table
> @@ -79,6 +84,7 @@ void arch_exit_mmap(struct mm_struct *mm)
> */
> process_tb[mm->context.id].prtb0 = 0;
> }
> +#endif
> }
>
Is there a way to avoid all those #ifdefs? Maybe rework the frag code so
that we have a few platform-independent helpers?
> static pte_t *get_pte_from_cache(struct mm_struct *mm)
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 7900b613e6e5..81e6b18d1955 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -195,12 +195,16 @@ EXPORT_SYMBOL(iounmap);
> static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
> {
> if (!pmd_present(*pmdp)) {
> - pte_t *ptep = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
> + pte_t *ptep = __va(memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE));
>
> if (!ptep)
> return NULL;
>
> - clear_page(ptep);
> + if (PTE_FRAG_SIZE == PAGE_SIZE)
> + clear_page(ptep);
> + else
> + memset(ptep, 0, PTE_FRAG_SIZE);
> +
> pmd_populate_kernel(&init_mm, pmdp, ptep);
> }
> return pte_offset_kernel(pmdp, va);
>
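As a rough illustration of the pmd_page_vaddr() and pte_offset_map() hunks quoted above: once a PTE table can be smaller than a page, masking the PMD value with PAGE_MASK would drop the table's offset inside its page, which is why the patch masks with PTE_TABLE_SIZE and adds the in-page offset back after kmap_atomic(). The standalone C below only works through that arithmetic. The values (16K pages, 4K PTE tables, a flag bit of 0x1) and all function names are assumptions chosen for the example, not kernel definitions.

```c
/* Illustrative values only: 16K pages holding 4K PTE tables. */
#define PAGE_SHIFT      14
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE - 1))
#define PTE_TABLE_SIZE  4096UL

/* Pre-patch masking: round the PMD value down to a page boundary. */
static unsigned long table_base_page_mask(unsigned long pmd_val)
{
	return pmd_val & PAGE_MASK;
}

/* Patched masking: round down to a PTE-table boundary instead. */
static unsigned long table_base_frag_mask(unsigned long pmd_val)
{
	return pmd_val & ~(PTE_TABLE_SIZE - 1);
}

/* Offset of the table inside its page, the term added in pte_offset_map(). */
static unsigned long table_offset_in_page(unsigned long table_base)
{
	return table_base & ~PAGE_MASK;
}
```

With a PMD value of 0x00ABD001 (table at 0x00ABD000 plus the hypothetical flag bit), the page mask gives 0x00ABC000 and loses the fragment, while the table-size mask gives 0x00ABD000, whose offset within its 16K page is 0x1000.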
* Re: [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32
2018-09-19 3:03 ` Aneesh Kumar K.V
@ 2018-09-25 16:48 ` Christophe LEROY
0 siblings, 0 replies; 25+ messages in thread
From: Christophe LEROY @ 2018-09-25 16:48 UTC (permalink / raw)
To: Aneesh Kumar K.V, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
On 19/09/2018 at 05:03, Aneesh Kumar K.V wrote:
> On 9/18/18 10:27 PM, Christophe Leroy wrote:
>> In order to allow the 8xx to handle pte_fragments, this patch
>> extends the use of pte_fragments to nohash/32 platforms.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> arch/powerpc/include/asm/mmu-40x.h | 1 +
>> arch/powerpc/include/asm/mmu-44x.h | 1 +
>> arch/powerpc/include/asm/mmu-8xx.h | 1 +
>> arch/powerpc/include/asm/mmu-book3e.h | 1 +
>> arch/powerpc/include/asm/mmu_context.h | 2 +-
>> arch/powerpc/include/asm/nohash/32/pgalloc.h | 43
>> +++++++++++-----------------
>> arch/powerpc/include/asm/nohash/32/pgtable.h | 7 +++--
>> arch/powerpc/include/asm/page.h | 6 +---
>> arch/powerpc/include/asm/pgtable.h | 8 ++++++
>> arch/powerpc/mm/Makefile | 3 ++
>> arch/powerpc/mm/mmu_context_nohash.c | 1 +
>> arch/powerpc/mm/pgtable-frag.c | 6 ++++
>> arch/powerpc/mm/pgtable_32.c | 8 ++++--
>> 13 files changed, 51 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu-40x.h
>> b/arch/powerpc/include/asm/mmu-40x.h
>> index 74f4edb5916e..7c77ceed71d6 100644
>> --- a/arch/powerpc/include/asm/mmu-40x.h
>> +++ b/arch/powerpc/include/asm/mmu-40x.h
>> @@ -58,6 +58,7 @@ typedef struct {
>> unsigned int id;
>> unsigned int active;
>> unsigned long vdso_base;
>> + void *pte_frag;
>> } mm_context_t;
>>
>> #endif /* !__ASSEMBLY__ */
>> diff --git a/arch/powerpc/include/asm/mmu-44x.h
>> b/arch/powerpc/include/asm/mmu-44x.h
>> index 295b3dbb2698..3d72e889ae7b 100644
>> --- a/arch/powerpc/include/asm/mmu-44x.h
>> +++ b/arch/powerpc/include/asm/mmu-44x.h
>> @@ -109,6 +109,7 @@ typedef struct {
>> unsigned int id;
>> unsigned int active;
>> unsigned long vdso_base;
>> + void *pte_frag;
>> } mm_context_t;
>>
>> #endif /* !__ASSEMBLY__ */
>> diff --git a/arch/powerpc/include/asm/mmu-8xx.h
>> b/arch/powerpc/include/asm/mmu-8xx.h
>> index fa05aa566ece..750cef6f65e3 100644
>> --- a/arch/powerpc/include/asm/mmu-8xx.h
>> +++ b/arch/powerpc/include/asm/mmu-8xx.h
>> @@ -179,6 +179,7 @@ typedef struct {
>> unsigned int id;
>> unsigned int active;
>> unsigned long vdso_base;
>> + void *pte_frag;
>> #ifdef CONFIG_PPC_MM_SLICES
>> u16 user_psize; /* page size index */
>> unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
>> diff --git a/arch/powerpc/include/asm/mmu-book3e.h
>> b/arch/powerpc/include/asm/mmu-book3e.h
>> index e20072972e35..8e8aad5172ab 100644
>> --- a/arch/powerpc/include/asm/mmu-book3e.h
>> +++ b/arch/powerpc/include/asm/mmu-book3e.h
>> @@ -230,6 +230,7 @@ typedef struct {
>> unsigned int id;
>> unsigned int active;
>> unsigned long vdso_base;
>> + void *pte_frag;
>> } mm_context_t;
>>
>> /* Page size definitions, common between 32 and 64-bit
>> diff --git a/arch/powerpc/include/asm/mmu_context.h
>> b/arch/powerpc/include/asm/mmu_context.h
>> index b2f89b621b15..7f2c37a3f99d 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -222,7 +222,7 @@ static inline int arch_dup_mmap(struct mm_struct
>> *oldmm,
>> return 0;
>> }
>>
>> -#ifndef CONFIG_PPC_BOOK3S_64
>> +#if defined(CONFIG_PPC_BOOK3E_64) || defined(CONFIG_PPC_BOOK3S_32)
>> static inline void arch_exit_mmap(struct mm_struct *mm)
>> {
>> }
>> diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h
>> b/arch/powerpc/include/asm/nohash/32/pgalloc.h
>> index f3fec9052f31..e69423ad8e2e 100644
>> --- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
>> +++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
>> @@ -27,6 +27,9 @@ extern void __bad_pte(pmd_t *pmd);
>> extern struct kmem_cache *pgtable_cache[];
>> #define PGT_CACHE(shift) pgtable_cache[shift]
>>
>> +pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr,
>> int kernel);
>> +void pte_fragment_free(unsigned long *table, int kernel);
>> +
>> static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>> {
>> return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
>> @@ -58,11 +61,10 @@ static inline void pmd_populate_kernel(struct
>> mm_struct *mm, pmd_t *pmdp,
>> static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
>> pgtable_t pte_page)
>> {
>> - *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_USER |
>> - _PMD_PRESENT);
>> + *pmdp = __pmd(__pa(pte_page) | _PMD_USER | _PMD_PRESENT);
>> }
>>
>> -#define pmd_pgtable(pmd) pmd_page(pmd)
>> +#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
>> #else
>>
>> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t
>> *pmdp,
>> @@ -74,49 +76,38 @@ static inline void pmd_populate_kernel(struct
>> mm_struct *mm, pmd_t *pmdp,
>> static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
>> pgtable_t pte_page)
>> {
>> - *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) |
>> _PMD_PRESENT);
>> + *pmdp = __pmd((unsigned long)pte_page | _PMD_PRESENT);
>> }
>>
>> -#define pmd_pgtable(pmd) pmd_page(pmd)
>> +#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
>> #endif
>>
>> -static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
>> unsigned long address)
>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
>> + unsigned long address)
>> {
>> - return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>> + return (pte_t *)pte_fragment_alloc(mm, address, 1);
>> }
>>
>> -static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned
>> long address)
>> +static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
>> + unsigned long address)
>> {
>> - struct page *ptepage;
>> -
>> - gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT;
>> -
>> - ptepage = alloc_pages(flags, 0);
>> - if (!ptepage)
>> - return NULL;
>> - if (!pgtable_page_ctor(ptepage)) {
>> - __free_page(ptepage);
>> - return NULL;
>> - }
>> - return ptepage;
>> + return (pgtable_t)pte_fragment_alloc(mm, address, 0);
>> }
>>
>> static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>> {
>> - free_page((unsigned long)pte);
>> + pte_fragment_free((unsigned long *)pte, 1);
>> }
>>
>> static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
>> {
>> - pgtable_page_dtor(ptepage);
>> - __free_page(ptepage);
>> + pte_fragment_free((unsigned long *)ptepage, 0);
>> }
>>
>> static inline void pgtable_free(void *table, unsigned index_size)
>> {
>> if (!index_size) {
>> - pgtable_page_dtor(virt_to_page(table));
>> - free_page((unsigned long)table);
>> + pte_fragment_free((unsigned long *)table, 0);
>> } else {
>> BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
>> kmem_cache_free(PGT_CACHE(index_size), table);
>> @@ -155,6 +146,6 @@ static inline void __pte_free_tlb(struct
>> mmu_gather *tlb, pgtable_t table,
>> unsigned long address)
>> {
>> tlb_flush_pgtable(tlb, address);
>> - pgtable_free_tlb(tlb, page_address(table), 0);
>> + pgtable_free_tlb(tlb, table, 0);
>> }
>> #endif /* _ASM_POWERPC_PGALLOC_32_H */
>> diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h
>> b/arch/powerpc/include/asm/nohash/32/pgtable.h
>> index d2908a8038e8..73e2b1fbdb36 100644
>> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
>> @@ -336,12 +336,12 @@ static inline int pte_young(pte_t pte)
>> */
>> #ifndef CONFIG_BOOKE
>> #define pmd_page_vaddr(pmd) \
>> - ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
>> + ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
>> #define pmd_page(pmd) \
>> pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
>> #else
>> #define pmd_page_vaddr(pmd) \
>> - ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
>> + ((unsigned long)(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
>> #define pmd_page(pmd) \
>> pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
>> #endif
>> @@ -360,7 +360,8 @@ static inline int pte_young(pte_t pte)
>> (pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
>> pte_index(addr))
>> #define pte_offset_map(dir, addr) \
>> - ((pte_t *) kmap_atomic(pmd_page(*(dir))) + pte_index(addr))
>> + ((pte_t *)(kmap_atomic(pmd_page(*(dir))) + \
>> + (pmd_page_vaddr(*(dir)) & ~PAGE_MASK)) + pte_index(addr))
>> #define pte_unmap(pte) kunmap_atomic(pte)
>>
>> /*
>> diff --git a/arch/powerpc/include/asm/page.h
>> b/arch/powerpc/include/asm/page.h
>> index f6a1265face2..27d1c16601ee 100644
>> --- a/arch/powerpc/include/asm/page.h
>> +++ b/arch/powerpc/include/asm/page.h
>> @@ -335,7 +335,7 @@ void arch_free_page(struct page *page, int order);
>> #endif
>>
>> struct vm_area_struct;
>> -#ifdef CONFIG_PPC_BOOK3S_64
>> +#if !defined(CONFIG_PPC_BOOK3E_64) && !defined(CONFIG_PPC_BOOK3S_32)
>> /*
>> * For BOOK3s 64 with 4k and 64K linux page size
>> * we want to use pointers, because the page table
>> @@ -343,12 +343,8 @@ struct vm_area_struct;
>> */
>> typedef pte_t *pgtable_t;
>> #else
>> -#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64)
>> -typedef pte_t *pgtable_t;
>> -#else
>> typedef struct page *pgtable_t;
>> #endif
>> -#endif
>>
>
>
> Now that is getting complicated. Is there a way to move that to a platform
> header instead of using that complicated #if?
OK, I have added two new patches for that in v5: one distributes the
mmu-xxx.h headers into the platform directories, the other moves the
pgtable_t typedefs into the relevant files.
>
>> #include <asm-generic/memory_model.h>
>> #endif /* __ASSEMBLY__ */
>> diff --git a/arch/powerpc/include/asm/pgtable.h
>> b/arch/powerpc/include/asm/pgtable.h
>> index 8b38f7730211..1865a3e4ab8c 100644
>> --- a/arch/powerpc/include/asm/pgtable.h
>> +++ b/arch/powerpc/include/asm/pgtable.h
>> @@ -94,12 +94,20 @@ unsigned long vmalloc_to_phys(void *vmalloc_addr);
>> void pgtable_cache_add(unsigned int shift);
>> void pgtable_cache_init(void);
>>
>> +pte_t *early_alloc_pte(void);
>> +
>> #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
>> void mark_initmem_nx(void);
>> #else
>> static inline void mark_initmem_nx(void) { }
>> #endif
>>
>> +#ifndef PTE_FRAG_NR
>> +#define PTE_FRAG_NR 1
>> +#define PTE_FRAG_SIZE_SHIFT PAGE_SHIFT
>> +#define PTE_FRAG_SIZE PAGE_SIZE
>> +#endif
>> +
>
> IMHO we should avoid that. The challenge with #ifndef is that we must
> always make sure the header inclusion order is correct, so that platform
> headers get included first. Why not move it to the platforms that want
> to use pte fragmentation?
OK, in v5 the functions using it are now defined as static inline in the
platform headers, so I moved these definitions there as well.
>
>
>> #endif /* __ASSEMBLY__ */
>>
>> #endif /* _ASM_POWERPC_PGTABLE_H */
>> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
>> index bd43b3ee52cb..e1deb15fe85e 100644
>> --- a/arch/powerpc/mm/Makefile
>> +++ b/arch/powerpc/mm/Makefile
>> @@ -18,6 +18,9 @@ obj-$(CONFIG_PPC_BOOK3E_64) += pgtable-book3e.o
>> obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o
>> slb_low.o slb.o \
>> $(hash64-y) mmu_context_book3s64.o
>> pgtable-book3s64.o \
>> pgtable-frag.o
>> +ifndef CONFIG_PPC_BOOK3S_32
>> +obj-$(CONFIG_PPC32) += pgtable-frag.o
>> +endif
>> obj-$(CONFIG_PPC_RADIX_MMU) += pgtable-radix.o tlb-radix.o
>> obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o hash_low_32.o
>> mmu_context_hash32.o
>> obj-$(CONFIG_PPC_STD_MMU) += tlb_hash$(BITS).o
>> diff --git a/arch/powerpc/mm/mmu_context_nohash.c
>> b/arch/powerpc/mm/mmu_context_nohash.c
>> index 4d80239ef83c..98f0ef463dc8 100644
>> --- a/arch/powerpc/mm/mmu_context_nohash.c
>> +++ b/arch/powerpc/mm/mmu_context_nohash.c
>> @@ -385,6 +385,7 @@ int init_new_context(struct task_struct *t, struct
>> mm_struct *mm)
>> #endif
>> mm->context.id = MMU_NO_CONTEXT;
>> mm->context.active = 0;
>> + mm->context.pte_frag = NULL;
>> return 0;
>> }
>>
>> diff --git a/arch/powerpc/mm/pgtable-frag.c
>> b/arch/powerpc/mm/pgtable-frag.c
>> index ab4910e92aaf..d554a1cbc56d 100644
>> --- a/arch/powerpc/mm/pgtable-frag.c
>> +++ b/arch/powerpc/mm/pgtable-frag.c
>> @@ -30,6 +30,7 @@ static void pte_frag_destroy(void *pte_frag)
>> }
>> }
>>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> static void pmd_frag_destroy(void *pmd_frag)
>> {
>> int count;
>> @@ -44,6 +45,7 @@ static void pmd_frag_destroy(void *pmd_frag)
>> __free_page(page);
>> }
>> }
>> +#endif
>>
>> static void destroy_pagetable_cache(struct mm_struct *mm)
>> {
>> @@ -53,15 +55,18 @@ static void destroy_pagetable_cache(struct
>> mm_struct *mm)
>> if (frag)
>> pte_frag_destroy(frag);
>>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> frag = mm->context.pmd_frag;
>> if (frag)
>> pmd_frag_destroy(frag);
>> +#endif
>> }
>>
>> void arch_exit_mmap(struct mm_struct *mm)
>> {
>> destroy_pagetable_cache(mm);
>>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> if (radix_enabled()) {
>> /*
>> * Radix doesn't have a valid bit in the process table
>> @@ -79,6 +84,7 @@ void arch_exit_mmap(struct mm_struct *mm)
>> */
>> process_tb[mm->context.id].prtb0 = 0;
>> }
>> +#endif
>> }
>>
>
> Is there a way to avoid all those #ifdefs? Maybe rework the frag code so
> that we have a few platform-independent helpers?
Yes, in v5 I reworked this to keep arch_exit_mmap() and
destroy_pagetable_cache() platform specific.
Christophe
>
>> static pte_t *get_pte_from_cache(struct mm_struct *mm)
>> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
>> index 7900b613e6e5..81e6b18d1955 100644
>> --- a/arch/powerpc/mm/pgtable_32.c
>> +++ b/arch/powerpc/mm/pgtable_32.c
>> @@ -195,12 +195,16 @@ EXPORT_SYMBOL(iounmap);
>> static __init pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned
>> long va)
>> {
>> if (!pmd_present(*pmdp)) {
>> - pte_t *ptep = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
>> + pte_t *ptep = __va(memblock_alloc(PTE_FRAG_SIZE,
>> PTE_FRAG_SIZE));
>>
>> if (!ptep)
>> return NULL;
>>
>> - clear_page(ptep);
>> + if (PTE_FRAG_SIZE == PAGE_SIZE)
>> + clear_page(ptep);
>> + else
>> + memset(ptep, 0, PTE_FRAG_SIZE);
>> +
>> pmd_populate_kernel(&init_mm, pmdp, ptep);
>> }
>> return pte_offset_kernel(pmdp, va);
>>
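The review point about the `#ifndef PTE_FRAG_NR` fallback in the message above can be sketched in a single file. The fallback only does the right thing if the platform's definitions are seen first. If the generic header were processed before the platform one, the one-fragment defaults would be picked up instead, followed at best by a macro redefinition diagnostic. In the sketch below the platform values (4 fragments of 4K in a 16K page) and the helper name are hypothetical, and both "headers" are inlined in the order that works.

```c
/* Assumed page size for the illustration: 16K pages. */
#define PAGE_SHIFT 14
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Stand-in for a platform header (hypothetical values): must come first. */
#define PTE_FRAG_NR         4
#define PTE_FRAG_SIZE_SHIFT 12
#define PTE_FRAG_SIZE       (1UL << PTE_FRAG_SIZE_SHIFT)

/* Generic fallback as in the quoted hunk: skipped because of the above. */
#ifndef PTE_FRAG_NR
#define PTE_FRAG_NR         1
#define PTE_FRAG_SIZE_SHIFT PAGE_SHIFT
#define PTE_FRAG_SIZE       PAGE_SIZE
#endif

/* Hypothetical helper: bytes of a page covered by its PTE fragments. */
static unsigned long pte_frag_bytes_per_page(void)
{
	return PTE_FRAG_NR * PTE_FRAG_SIZE;
}
```

Defining the constants only in the platforms that use fragments, as agreed for v5, removes the dependency on inclusion order altogether.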
* Re: [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments
2018-09-19 2:56 ` Aneesh Kumar K.V
@ 2018-09-25 16:49 ` Christophe LEROY
0 siblings, 0 replies; 25+ messages in thread
From: Christophe LEROY @ 2018-09-25 16:49 UTC (permalink / raw)
To: Aneesh Kumar K.V, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, aneesh.kumar
Cc: linux-kernel, linuxppc-dev
On 19/09/2018 at 04:56, Aneesh Kumar K.V wrote:
> On 9/18/18 10:27 PM, Christophe Leroy wrote:
>> There is no point in taking the page table lock as
>> pte_frag is always NULL when we have only one fragment.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> arch/powerpc/mm/pgtable-frag.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/pgtable-frag.c
>> b/arch/powerpc/mm/pgtable-frag.c
>> index bc924822dcd6..ab4910e92aaf 100644
>> --- a/arch/powerpc/mm/pgtable-frag.c
>> +++ b/arch/powerpc/mm/pgtable-frag.c
>> @@ -85,6 +85,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
>> {
>> void *pte_frag, *ret;
>>
>> + if (PTE_FRAG_NR == 1)
>> + return NULL;
>> +
>> spin_lock(&mm->page_table_lock);
>> ret = mm->context.pte_frag;
>> if (ret) {
>>
>
> Maybe update get_pmd_from_cache too?
>
Ok, done in v5
Christophe
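The effect of the early return in patch 15 can be shown with a small userspace sketch. It assumes a simplified mm structure and replaces the page table spinlock with a counter so that lock acquisitions can be observed. The struct, the lock stand-ins and the function name are all hypothetical, and the fragment bookkeeping is reduced to handing out a single cached pointer.

```c
#include <stddef.h>

#define PTE_FRAG_NR 1	/* assumption: one fragment per page */

/* Hypothetical stand-in for the parts of mm_struct used here. */
struct mm_sketch {
	void *pte_frag;		/* cached fragment, or NULL */
	int lock_taken;		/* counts simulated lock acquisitions */
};

/* Counter-based stand-ins for spin_lock()/spin_unlock(). */
static void sketch_lock(struct mm_sketch *mm)   { mm->lock_taken++; }
static void sketch_unlock(struct mm_sketch *mm) { (void)mm; }

static void *get_pte_from_cache_sketch(struct mm_sketch *mm)
{
	void *ret;

	/* With a single fragment the cache is always empty: skip the lock. */
	if (PTE_FRAG_NR == 1)
		return NULL;

	sketch_lock(mm);
	ret = mm->pte_frag;
	if (ret)
		mm->pte_frag = NULL;	/* simplified: hand out the only entry */
	sketch_unlock(mm);

	return ret;
}
```

Because PTE_FRAG_NR is a compile-time constant, the compiler can drop the locked path entirely when it equals 1, so the single-fragment configuration pays nothing for the cache lookup.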
Thread overview: 25+ messages (newest: 2018-09-25 16:49 UTC)
2018-09-18 16:57 [PATCH v4 00/20] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 01/20] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 02/20] powerpc/code-patching: add a helper to get the address of a patch_site Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 03/20] powerpc/8xx: Use patch_site for memory setup patching Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 04/20] powerpc/8xx: Use patch_site for perf counters setup Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 05/20] powerpc/8xx: Move SW perf counters in first 32kb of memory Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 06/20] powerpc/8xx: Temporarily disable 16k pages and 512k hugepages Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 07/20] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 08/20] powerpc/mm: Enable 512k hugepage support with HW assistance " Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 09/20] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 10/20] powerpc/8xx: regroup TLB handler routines Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 11/20] powerpc/mm: don't use pte_alloc_one_kernel() before slab is available Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 12/20] powerpc/mm: inline pte_alloc_one() and pte_alloc_one_kernel() in PPC32 Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 13/20] powerpc/book3s32: Remove CONFIG_BOOKE dependent code Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 14/20] powerpc/mm: Move pte_fragment_alloc() to a common location Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 15/20] powerpc/mm: Avoid useless lock with single page fragments Christophe Leroy
2018-09-19 2:56 ` Aneesh Kumar K.V
2018-09-25 16:49 ` Christophe LEROY
2018-09-18 16:57 ` [PATCH v4 16/20] powerpc/mm: Extend pte_fragment functionality to nohash/32 Christophe Leroy
2018-09-19 3:03 ` Aneesh Kumar K.V
2018-09-25 16:48 ` Christophe LEROY
2018-09-18 16:57 ` [PATCH v4 17/20] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 18/20] powerpc/mm: reintroduce 16K pages with HW assistance on 8xx Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 19/20] powerpc/nohash32: allow setting GUARDED attribute in the PMD directly Christophe Leroy
2018-09-18 16:57 ` [PATCH v4 20/20] powerpc/8xx: set " Christophe Leroy