linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Scott Wood <oss@buserror.net>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v7 02/23] powerpc/8xx: Map linear kernel RAM with 8M pages
Date: Tue,  9 Feb 2016 11:23:07 +0100 (CET)	[thread overview]
Message-ID: <b7313dd127230ddf17b0fed53151f0ad89283ac7.1454932181.git.christophe.leroy@c-s.fr> (raw)
In-Reply-To: <cover.1454932181.git.christophe.leroy@c-s.fr>

On a live running system (VoIP gateway for Air Trafic Control), over
a 10 minutes period (with 277s idle), we get 87 millions DTLB misses
and approximatly 35 secondes are spent in DTLB handler.
This represents 5.8% of the overall time and even 10.8% of the
non-idle time.
Among those 87 millions DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.

MPC8xx has no BATs but it has 8Mb page size. This patch implements
mapping of kernel RAM using 8Mb pages, on the same model as what is
done on the 40x.

In 4k pages mode, each PGD entry maps a 4Mb area: we map every two
entries to the same 8Mb physical page. In each second entry, we add
4Mb to the page physical address to ease life of the FixupDAR
routine. This is just ignored by HW.

In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry
will point to the first page of the area. The DTLB handler adds
the 3 bits from EPN to map the correct page.

With this patch applied, we now get only 13 millions TLB misses
during the 10 minutes period. The idle time has increased to 313s
and the overall time spent in DTLB miss handler is 6.3s, which
represents 1% of the overall time and 2.2% of non-idle time.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v2: using bt instead of bgt and named the label explicitly
v3: no change
v4: no change
v5: removed use of pmd_val() as L-value
v6: no change
v7: no change

 arch/powerpc/kernel/head_8xx.S | 35 +++++++++++++++++-
 arch/powerpc/mm/8xx_mmu.c      | 83 ++++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/Makefile       |  1 +
 arch/powerpc/mm/mmu_decl.h     | 15 ++------
 4 files changed, 120 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/mm/8xx_mmu.c

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index a89492e..87d1f5f 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -398,11 +398,13 @@ DataStoreTLBMiss:
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r3
 
 	/* Insert level 1 index */
 	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	mtcr	r11
+	bt-	28,DTLBMiss8M		/* bit 28 = Large page (8M) */
+	mtcr	r3
 
 	/* We have a pte table, so load fetch the pte from the table.
 	 */
@@ -455,6 +457,29 @@ DataStoreTLBMiss:
 	EXCEPTION_EPILOG_0
 	rfi
 
+DTLBMiss8M:
+	mtcr	r3
+	ori	r11, r11, MD_SVALID
+	MTSPR_CPU6(SPRN_MD_TWC, r11, r3)
+#ifdef CONFIG_PPC_16K_PAGES
+	/*
+	 * In 16k pages mode, each PGD entry defines a 64M block.
+	 * Here we select the 8M page within the block.
+	 */
+	rlwimi	r11, r10, 0, 0x03800000
+#endif
+	rlwinm	r10, r11, 0, 0xff800000
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
+			  _PAGE_PRESENT
+	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mfspr	r3, SPRN_SPRG_SCRATCH2
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	EXCEPTION_EPILOG_0
+	rfi
+
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -532,13 +557,15 @@ FixupDAR:/* Entry point for dcbx workaround. */
 	/* Insert level 1 index */
 3:	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	mtcr	r11
+	bt	28,200f		/* bit 28 = Large page (8M) */
 	rlwinm	r11, r11,0,0,19	/* Extract page descriptor page address */
 	/* Insert level 2 index */
 	rlwimi	r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
 	lwz	r11, 0(r11)	/* Get the pte */
 	/* concat physical page address(r11) and page offset(r10) */
 	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT, 31
-	lwz	r11,0(r11)
+201:	lwz	r11,0(r11)
 /* Check if it really is a dcbx instruction. */
 /* dcbt and dcbtst does not generate DTLB Misses/Errors,
  * no need to include them here */
@@ -557,6 +584,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
 141:	mfspr	r10,SPRN_SPRG_SCRATCH2
 	b	DARFixed	/* Nope, go back to normal TLB processing */
 
+	/* concat physical page address(r11) and page offset(r10) */
+200:	rlwimi	r11, r10, 0, 32 - (PAGE_SHIFT << 1), 31
+	b	201b
+
 144:	mfspr	r10, SPRN_DSISR
 	rlwinm	r10, r10,0,7,5	/* Clear store bit for buggy dcbst insn */
 	mtspr	SPRN_DSISR, r10
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
new file mode 100644
index 0000000..2d42745
--- /dev/null
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -0,0 +1,83 @@
+/*
+ * This file contains the routines for initializing the MMU
+ * on the 8xx series of chips.
+ *  -- christophe
+ *
+ *  Derived from arch/powerpc/mm/40x_mmu.c:
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/memblock.h>
+
+#include "mmu_decl.h"
+
+extern int __map_without_ltlbs;
+/*
+ * MMU_init_hw does the chip-specific initialization of the MMU hardware.
+ */
+void __init MMU_init_hw(void)
+{
+	/* Nothing to do for the time being but keep it similar to other PPC */
+}
+
+#define LARGE_PAGE_SIZE_4M	(1<<22)
+#define LARGE_PAGE_SIZE_8M	(1<<23)
+#define LARGE_PAGE_SIZE_64M	(1<<26)
+
+unsigned long __init mmu_mapin_ram(unsigned long top)
+{
+	unsigned long v, s, mapped;
+	phys_addr_t p;
+
+	v = KERNELBASE;
+	p = 0;
+	s = top;
+
+	if (__map_without_ltlbs)
+		return 0;
+
+#ifdef CONFIG_PPC_4K_PAGES
+	while (s >= LARGE_PAGE_SIZE_8M) {
+		pmd_t *pmdp;
+		unsigned long val = p | MD_PS8MEG;
+
+		pmdp = pmd_offset(pud_offset(pgd_offset_k(v), v), v);
+		*pmdp++ = __pmd(val);
+		*pmdp++ = __pmd(val + LARGE_PAGE_SIZE_4M);
+
+		v += LARGE_PAGE_SIZE_8M;
+		p += LARGE_PAGE_SIZE_8M;
+		s -= LARGE_PAGE_SIZE_8M;
+	}
+#else /* CONFIG_PPC_16K_PAGES */
+	while (s >= LARGE_PAGE_SIZE_64M) {
+		pmd_t *pmdp;
+		unsigned long val = p | MD_PS8MEG;
+
+		pmdp = pmd_offset(pud_offset(pgd_offset_k(v), v), v);
+		*pmdp++ = __pmd(val);
+
+		v += LARGE_PAGE_SIZE_64M;
+		p += LARGE_PAGE_SIZE_64M;
+		s -= LARGE_PAGE_SIZE_64M;
+	}
+#endif
+
+	mapped = top - s;
+
+	/* If the size of RAM is not an exact power of two, we may not
+	 * have covered RAM in its entirety with 8 MiB
+	 * pages. Consequently, restrict the top end of RAM currently
+	 * allocable so that calls to the MEMBLOCK to allocate PTEs for "tail"
+	 * coverage with normal-sized pages (or other reasons) do not
+	 * attempt to allocate outside the allowed range.
+	 */
+	memblock_set_current_limit(mapped);
+
+	return mapped;
+}
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 1ffeda8..adfee3f 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_PPC_ICSWX)		+= icswx.o
 obj-$(CONFIG_PPC_ICSWX_PID)	+= icswx_pid.o
 obj-$(CONFIG_40x)		+= 40x_mmu.o
 obj-$(CONFIG_44x)		+= 44x_mmu.o
+obj-$(CONFIG_PPC_8xx)		+= 8xx_mmu.o
 obj-$(CONFIG_PPC_FSL_BOOK3E)	+= fsl_booke_mmu.o
 obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
 obj-$(CONFIG_PPC_SPLPAR)	+= vphn.o
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 9f58ff4..7faeb9f 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -132,22 +132,17 @@ extern void wii_memory_fixups(void);
 /* ...and now those things that may be slightly different between processor
  * architectures.  -- Dan
  */
-#if defined(CONFIG_8xx)
-#define MMU_init_hw()		do { } while(0)
-#define mmu_mapin_ram(top)	(0UL)
-
-#elif defined(CONFIG_4xx)
+#ifdef CONFIG_PPC32
 extern void MMU_init_hw(void);
 extern unsigned long mmu_mapin_ram(unsigned long top);
+#endif
 
-#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#ifdef CONFIG_PPC_FSL_BOOK3E
 extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx,
 				     bool dryrun);
 extern unsigned long calc_cam_sz(unsigned long ram, unsigned long virt,
 				 phys_addr_t phys);
 #ifdef CONFIG_PPC32
-extern void MMU_init_hw(void);
-extern unsigned long mmu_mapin_ram(unsigned long top);
 extern void adjust_total_lowmem(void);
 extern int switch_to_as1(void);
 extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu);
@@ -162,8 +157,4 @@ struct tlbcam {
 	u32	MAS3;
 	u32	MAS7;
 };
-#elif defined(CONFIG_PPC32)
-/* anything 32-bit except 4xx or 8xx */
-extern void MMU_init_hw(void);
-extern unsigned long mmu_mapin_ram(unsigned long top);
 #endif
-- 
2.1.0

  parent reply	other threads:[~2016-02-09 10:23 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-09 10:23 [PATCH v7 00/23] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 01/23] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
2016-02-09 10:23 ` Christophe Leroy [this message]
2016-02-09 10:23 ` [PATCH v7 03/23] powerpc: Update documentation for noltlbs kernel parameter Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 04/23] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 05/23] powerpc32: Fix pte_offset_kernel() to return NULL for bad pages Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 06/23] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 07/23] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 08/23] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 09/23] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 10/23] powerpc/8xx: map more RAM at startup when needed Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 11/23] powerpc32: Remove useless/wrong MMU:setio progress message Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 12/23] powerpc32: remove ioremap_base Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 13/23] powerpc/8xx: Add missing SPRN defines into reg_8xx.h Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 14/23] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 15/23] powerpc/8xx: remove special handling of CPU6 errata in set_dec() Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 16/23] powerpc/8xx: rewrite set_context() in C Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 17/23] powerpc/8xx: rewrite flush_instruction_cache() " Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 18/23] powerpc: add inline functions for cache related instructions Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 19/23] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 20/23] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 21/23] powerpc: Simplify test in __dma_sync() Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 22/23] powerpc32: small optimisation in flush_icache_range() Christophe Leroy
2016-02-09 10:23 ` [PATCH v7 23/23] powerpc32: Remove one insn in mulhdu Christophe Leroy
2016-02-09 15:49 ` [PATCH v7 00/23] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b7313dd127230ddf17b0fed53151f0ad89283ac7.1454932181.git.christophe.leroy@c-s.fr \
    --to=christophe.leroy@c-s.fr \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=oss@buserror.net \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).