* [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements
@ 2015-09-22 16:50 Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
                   ` (24 more replies)
  0 siblings, 25 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

The main purpose of this patchset is to dramatically reduce the time
spent in the DTLB miss handler. This is achieved by:
1/ Mapping RAM with 8M pages
2/ Mapping IMMR with a fixed 512K page

On a live running system (a VoIP gateway for Air Traffic Control), over
a 10-minute period (with 277 s idle), we get 87 million DTLB misses,
and approximately 35 seconds are spent in the DTLB handler.
This represents 5.8% of the overall time and as much as 10.8% of the
non-idle time.
Among those 87 million DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.
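
(For the record, the arithmetic behind those figures: 35 s out of
the 600 s window is about 5.8%; the non-idle time is 600 - 277 =
323 s, of which 35 s is about 10.8%.)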

Once the full patchset is applied, the number of DTLB misses during
the period is reduced to 11.8 million for a duration of 5.8 s, which
represents 2% of the non-idle time.
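
(Assuming the idle time is similar to the 313 s measured with the 8M
mapping alone in patch 2, the non-idle time is roughly 290 s, and
5.8 s out of 290 s is about 2%.)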

This patchset also includes other miscellaneous improvements:
1/ Handling of the CPU6 ERRATA directly in the mtspr() C macro, to
reduce the amount of PPC8xx-specific code
2/ Rewrite of a few non-critical ASM functions in C
3/ Removal of some unused items

See the related patches for details.

Christophe Leroy (25):
  powerpc/8xx: Save r3 all the time in DTLB miss handler
  powerpc/8xx: Map linear kernel RAM with 8M pages
  powerpc: Update documentation for noltlbs kernel parameter
  powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c
  powerpc/8xx: Fix vaddr for IMMR early remap
  powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either
  powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam()
    together
  powerpc/8xx: Map IMMR area with 512k page at a fixed address
  powerpc/8xx: show IMMR area in startup memory layout
  powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM
  powerpc/8xx: map 16M RAM at startup
  powerpc32: Remove useless/wrong MMU:setio progress message
  powerpc/8xx: also use r3 in the ITLB miss in all situations
  powerpc32: remove ioremap_base
  powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing
    ones
  powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro
  powerpc/8xx: remove special handling of CPU6 errata in set_dec()
  powerpc/8xx: rewrite set_context() in C
  powerpc/8xx: rewrite flush_instruction_cache() in C
  powerpc32: Remove clear_pages() and define clear_page() inline
  powerpc: add inline functions for cache related instructions
  powerpc32: move xxxxx_dcache_range() functions inline
  powerpc: Simplify test in __dma_sync()
  powerpc32: small optimisation in flush_icache_range()
  powerpc32: Remove one insn in mulhdu

 Documentation/kernel-parameters.txt         |   2 +-
 arch/powerpc/Kconfig.debug                  |   2 -
 arch/powerpc/include/asm/cache.h            |  19 +++
 arch/powerpc/include/asm/cacheflush.h       |  55 +++++++-
 arch/powerpc/include/asm/mmu-8xx.h          |  26 ++--
 arch/powerpc/include/asm/page_32.h          |  17 ++-
 arch/powerpc/include/asm/pgtable-ppc32.h    |   5 +
 arch/powerpc/include/asm/reg.h              |   2 +
 arch/powerpc/include/asm/reg_8xx.h          | 106 +++++++++++++++
 arch/powerpc/include/asm/time.h             |   6 +-
 arch/powerpc/kernel/head_8xx.S              | 167 ++++++++++++------------
 arch/powerpc/kernel/misc_32.S               | 107 ++--------------
 arch/powerpc/kernel/ppc_ksyms.c             |   2 +
 arch/powerpc/kernel/ppc_ksyms_32.c          |   1 -
 arch/powerpc/mm/8xx_mmu.c                   | 191 ++++++++++++++++++++++++++++
 arch/powerpc/mm/Makefile                    |   1 +
 arch/powerpc/mm/dma-noncoherent.c           |   2 +-
 arch/powerpc/mm/init_32.c                   |  23 ----
 arch/powerpc/mm/mem.c                       |   4 +
 arch/powerpc/mm/mmu_decl.h                  |  16 +--
 arch/powerpc/mm/pgtable_32.c                |  54 +++++++-
 arch/powerpc/platforms/embedded6xx/mpc10x.h |   8 --
 22 files changed, 553 insertions(+), 263 deletions(-)
 create mode 100644 arch/powerpc/mm/8xx_mmu.c

-- 
2.1.0



* [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-28 22:07   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 02/25] powerpc/8xx: Map linear kernel RAM with 8M pages Christophe Leroy
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

We spend between 40 and 160 cycles, with a mean of 65 cycles, in
the TLB handling routines (measured with mftbl), so make the handler
simpler even though it adds one instruction.
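
As an aside (not part of the patch): figures like these are typically
gathered by reading the timebase around the code of interest. A
minimal sketch of such a helper, assuming it is added to a test
kernel for instrumentation:

	/* Read the lower half of the timebase (mftbl). */
	static inline unsigned long tb_read(void)
	{
		unsigned long tbl;

		asm volatile("mftbl %0" : "=r"(tbl));
		return tbl;
	}

Accumulating the deltas between two tb_read() calls over many misses
gives distributions like the 40-160 cycle range quoted above.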

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/kernel/head_8xx.S | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 78c1eba..1557926 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -385,23 +385,20 @@ InstructionTLBMiss:
 
 	. = 0x1200
 DataStoreTLBMiss:
-#ifdef CONFIG_8xx_CPU6
 	mtspr	SPRN_SPRG_SCRATCH2, r3
-#endif
 	EXCEPTION_PROLOG_0
-	mfcr	r10
+	mfcr	r3
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
-	mfspr	r11, SPRN_MD_EPN
-	IS_KERNEL(r11, r11)
+	mfspr	r10, SPRN_MD_EPN
+	IS_KERNEL(r11, r10)
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r10
-	mfspr	r10, SPRN_MD_EPN
+	mtcr	r3
 
 	/* Insert level 1 index */
 	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
@@ -453,9 +450,7 @@ DataStoreTLBMiss:
 	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
 
 	/* Restore registers */
-#ifdef CONFIG_8xx_CPU6
 	mfspr	r3, SPRN_SPRG_SCRATCH2
-#endif
 	mtspr	SPRN_DAR, r11	/* Tag DAR */
 	EXCEPTION_EPILOG_0
 	rfi
-- 
2.1.0



* [PATCH v2 02/25] powerpc/8xx: Map linear kernel RAM with 8M pages
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 03/25] powerpc: Update documentation for noltlbs kernel parameter Christophe Leroy
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

On a live running system (a VoIP gateway for Air Traffic Control), over
a 10-minute period (with 277 s idle), we get 87 million DTLB misses,
and approximately 35 seconds are spent in the DTLB handler.
This represents 5.8% of the overall time and as much as 10.8% of the
non-idle time.
Among those 87 million DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.

The MPC8xx has no BATs, but it has an 8M page size. This patch
implements the mapping of kernel RAM using 8M pages, on the same
model as what is done on the 40x.

In 4k pages mode, each PGD entry maps a 4M area: we map every two
entries to the same 8M physical page. In each second entry, we add
4M to the page physical address to ease the life of the FixupDAR
routine. This is simply ignored by the hardware.

In 16k pages mode, each PGD entry maps a 64M area: each PGD entry
points to the first page of the area. The DTLB handler gets the
3 bits from the EPN to map the correct page.
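
(The coverage figures follow from the page table geometry: with 4k
pages, a PTE page holds 4096 / 4 = 1024 entries, so one PGD entry
covers 1024 x 4k = 4M; with 16k pages it holds 16384 / 4 = 4096
entries, covering 4096 x 16k = 64M. A 64M area contains eight 8M
pages, hence the 3 EPN bits, which is what the 0x03800000 mask in
DTLBMiss8M below selects.)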

With this patch applied, we now get only 13 million TLB misses
during the 10-minute period. The idle time has increased to 313 s,
and the overall time spent in the DTLB miss handler is 6.3 s, which
represents 1% of the overall time and 2.2% of the non-idle time.
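
(Arithmetic check: 6.3 s out of the 600 s period is about 1%; the
non-idle time is 600 - 313 = 287 s, of which 6.3 s is about 2.2%.)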

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v2: use bt instead of bgt and name the label explicitly

 arch/powerpc/kernel/head_8xx.S | 35 +++++++++++++++++-
 arch/powerpc/mm/8xx_mmu.c      | 83 ++++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/Makefile       |  1 +
 arch/powerpc/mm/mmu_decl.h     | 15 ++------
 4 files changed, 120 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/mm/8xx_mmu.c

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 1557926..bcba51b 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -398,11 +398,13 @@ DataStoreTLBMiss:
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r3
 
 	/* Insert level 1 index */
 	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	mtcr	r11
+	bt-	28,DTLBMiss8M		/* bit 28 = Large page (8M) */
+	mtcr	r3
 
 	/* We have a pte table, so load fetch the pte from the table.
 	 */
@@ -455,6 +457,29 @@ DataStoreTLBMiss:
 	EXCEPTION_EPILOG_0
 	rfi
 
+DTLBMiss8M:
+	mtcr	r3
+	ori	r11, r11, MD_SVALID
+	MTSPR_CPU6(SPRN_MD_TWC, r11, r3)
+#ifdef CONFIG_PPC_16K_PAGES
+	/*
+	 * In 16k pages mode, each PGD entry defines a 64M block.
+	 * Here we select the 8M page within the block.
+	 */
+	rlwimi	r11, r10, 0, 0x03800000
+#endif
+	rlwinm	r10, r11, 0, 0xff800000
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
+			  _PAGE_PRESENT
+	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mfspr	r3, SPRN_SPRG_SCRATCH2
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	EXCEPTION_EPILOG_0
+	rfi
+
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -532,13 +557,15 @@ FixupDAR:/* Entry point for dcbx workaround. */
 	/* Insert level 1 index */
 3:	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	mtcr	r11
+	bt	28,200f		/* bit 28 = Large page (8M) */
 	rlwinm	r11, r11,0,0,19	/* Extract page descriptor page address */
 	/* Insert level 2 index */
 	rlwimi	r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
 	lwz	r11, 0(r11)	/* Get the pte */
 	/* concat physical page address(r11) and page offset(r10) */
 	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT, 31
-	lwz	r11,0(r11)
+201:	lwz	r11,0(r11)
 /* Check if it really is a dcbx instruction. */
 /* dcbt and dcbtst does not generate DTLB Misses/Errors,
  * no need to include them here */
@@ -557,6 +584,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
 141:	mfspr	r10,SPRN_SPRG_SCRATCH2
 	b	DARFixed	/* Nope, go back to normal TLB processing */
 
+	/* concat physical page address(r11) and page offset(r10) */
+200:	rlwimi	r11, r10, 0, 32 - (PAGE_SHIFT << 1), 31
+	b	201b
+
 144:	mfspr	r10, SPRN_DSISR
 	rlwinm	r10, r10,0,7,5	/* Clear store bit for buggy dcbst insn */
 	mtspr	SPRN_DSISR, r10
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
new file mode 100644
index 0000000..be6f041
--- /dev/null
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -0,0 +1,83 @@
+/*
+ * This file contains the routines for initializing the MMU
+ * on the 8xx series of chips.
+ *  -- christophe
+ *
+ *  Derived from arch/powerpc/mm/40x_mmu.c:
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/memblock.h>
+
+#include "mmu_decl.h"
+
+extern int __map_without_ltlbs;
+/*
+ * MMU_init_hw does the chip-specific initialization of the MMU hardware.
+ */
+void __init MMU_init_hw(void)
+{
+	/* Nothing to do for the time being but keep it similar to other PPC */
+}
+
+#define LARGE_PAGE_SIZE_4M	(1<<22)
+#define LARGE_PAGE_SIZE_8M	(1<<23)
+#define LARGE_PAGE_SIZE_64M	(1<<26)
+
+unsigned long __init mmu_mapin_ram(unsigned long top)
+{
+	unsigned long v, s, mapped;
+	phys_addr_t p;
+
+	v = KERNELBASE;
+	p = 0;
+	s = top;
+
+	if (__map_without_ltlbs)
+		return 0;
+
+#ifdef CONFIG_PPC_4K_PAGES
+	while (s >= LARGE_PAGE_SIZE_8M) {
+		pmd_t *pmdp;
+		unsigned long val = p | MD_PS8MEG;
+
+		pmdp = pmd_offset(pud_offset(pgd_offset_k(v), v), v);
+		pmd_val(*pmdp++) = val;
+		pmd_val(*pmdp++) = val + LARGE_PAGE_SIZE_4M;
+
+		v += LARGE_PAGE_SIZE_8M;
+		p += LARGE_PAGE_SIZE_8M;
+		s -= LARGE_PAGE_SIZE_8M;
+	}
+#else /* CONFIG_PPC_16K_PAGES */
+	while (s >= LARGE_PAGE_SIZE_64M) {
+		pmd_t *pmdp;
+		unsigned long val = p | MD_PS8MEG;
+
+		pmdp = pmd_offset(pud_offset(pgd_offset_k(v), v), v);
+		pmd_val(*pmdp++) = val;
+
+		v += LARGE_PAGE_SIZE_64M;
+		p += LARGE_PAGE_SIZE_64M;
+		s -= LARGE_PAGE_SIZE_64M;
+	}
+#endif
+
+	mapped = top - s;
+
+	/* If the size of RAM is not an exact power of two, we may not
+	 * have covered RAM in its entirety with 8 MiB
+	 * pages. Consequently, restrict the top end of RAM currently
+	 * allocable so that calls to the MEMBLOCK to allocate PTEs for "tail"
+	 * coverage with normal-sized pages (or other reasons) do not
+	 * attempt to allocate outside the allowed range.
+	 */
+	memblock_set_current_limit(mapped);
+
+	return mapped;
+}
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 3eb73a3..ca03ac0 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_PPC_ICSWX)		+= icswx.o
 obj-$(CONFIG_PPC_ICSWX_PID)	+= icswx_pid.o
 obj-$(CONFIG_40x)		+= 40x_mmu.o
 obj-$(CONFIG_44x)		+= 44x_mmu.o
+obj-$(CONFIG_PPC_8xx)		+= 8xx_mmu.o
 obj-$(CONFIG_PPC_FSL_BOOK3E)	+= fsl_booke_mmu.o
 obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
 obj-$(CONFIG_PPC_SPLPAR)	+= vphn.o
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 085b66b..99e88bc 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -132,21 +132,16 @@ extern void wii_memory_fixups(void);
 /* ...and now those things that may be slightly different between processor
  * architectures.  -- Dan
  */
-#if defined(CONFIG_8xx)
-#define MMU_init_hw()		do { } while(0)
-#define mmu_mapin_ram(top)	(0UL)
-
-#elif defined(CONFIG_4xx)
+#ifdef CONFIG_PPC32
 extern void MMU_init_hw(void);
 extern unsigned long mmu_mapin_ram(unsigned long top);
+#endif
 
-#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#ifdef CONFIG_PPC_FSL_BOOK3E
 extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx);
 extern unsigned long calc_cam_sz(unsigned long ram, unsigned long virt,
 				 phys_addr_t phys);
 #ifdef CONFIG_PPC32
-extern void MMU_init_hw(void);
-extern unsigned long mmu_mapin_ram(unsigned long top);
 extern void adjust_total_lowmem(void);
 extern int switch_to_as1(void);
 extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu);
@@ -160,8 +155,4 @@ struct tlbcam {
 	u32	MAS3;
 	u32	MAS7;
 };
-#elif defined(CONFIG_PPC32)
-/* anything 32-bit except 4xx or 8xx */
-extern void MMU_init_hw(void);
-extern unsigned long mmu_mapin_ram(unsigned long top);
 #endif
-- 
2.1.0



* [PATCH v2 03/25] powerpc: Update documentation for noltlbs kernel parameter
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 02/25] powerpc/8xx: Map linear kernel RAM with 8M pages Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 04/25] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c Christophe Leroy
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Now the noltlbs kernel parameter is also applicable to the PPC8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 Documentation/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c6dd5f3..cf28d9e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2488,7 +2488,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	nolapic_timer	[X86-32,APIC] Do not use the local APIC timer.
 
 	noltlbs		[PPC] Do not use large page/tlb entries for kernel
-			lowmem mapping on PPC40x.
+			lowmem mapping on PPC40x and PPC8xx
 
 	nomca		[IA-64] Disable machine check abort handling
 
-- 
2.1.0



* [PATCH v2 04/25] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (2 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 03/25] powerpc: Update documentation for noltlbs kernel parameter Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Now that we have an 8xx-specific .c file for that, put it in there,
as other powerpc variants do.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/mm/8xx_mmu.c | 17 +++++++++++++++++
 arch/powerpc/mm/init_32.c | 19 -------------------
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index be6f041..0ddcb37 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -81,3 +81,20 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
 
 	return mapped;
 }
+
+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+				phys_addr_t first_memblock_size)
+{
+	/* We don't currently support the first MEMBLOCK not mapping 0
+	 * physical on those processors
+	 */
+	BUG_ON(first_memblock_base != 0);
+
+#ifdef CONFIG_PIN_TLB
+	/* 8xx can only access 24MB at the moment */
+	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01800000));
+#else
+	/* 8xx can only access 8MB at the moment */
+	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
+#endif
+}
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index a10be66..1a18e4b 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -193,22 +193,3 @@ void __init MMU_init(void)
 	/* Shortly after that, the entire linear mapping will be available */
 	memblock_set_current_limit(lowmem_end_addr);
 }
-
-#ifdef CONFIG_8xx /* No 8xx specific .c file to put that in ... */
-void setup_initial_memory_limit(phys_addr_t first_memblock_base,
-				phys_addr_t first_memblock_size)
-{
-	/* We don't currently support the first MEMBLOCK not mapping 0
-	 * physical on those processors
-	 */
-	BUG_ON(first_memblock_base != 0);
-
-#ifdef CONFIG_PIN_TLB
-	/* 8xx can only access 24MB at the moment */
-	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01800000));
-#else
-	/* 8xx can only access 8MB at the moment */
-	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
-#endif
-}
-#endif /* CONFIG_8xx */
-- 
2.1.0



* [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (3 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 04/25] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-28 23:39   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either Christophe Leroy
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
648K rodata, 508K init, 290K bss, 6644K reserved)
Kernel virtual memory layout:
  * 0xfffdf000..0xfffff000  : fixmap
  * 0xfde00000..0xfe000000  : consistent mem
  * 0xfddf6000..0xfde00000  : early ioremap
  * 0xc9000000..0xfddf6000  : vmalloc & ioremap
SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1

Mapping IMMR 1:1 is just wrong because it may overlap with another
area. On most mpc8xx boards it is OK because IMMR is set to
0xff000000, but on the EP88xC board for instance, IMMR is at
0xfa200000, which overlaps with the vmalloc & ioremap area shown
above.

This patch fixes the virtual address for remapping IMMR to 0xff000000,
regardless of the value of IMMR.

The size of the IMMR area is 256 kbytes (CPM at offset 0, security
engine at offset 128k), so 512 kbytes is enough and allows handling
the EP88xC case (which is only 2 Mbytes aligned, not 8 Mbytes) the
same way.
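
(In the diff below, this is what the "andis. r9, r9, 0xfff8"
instruction implements: it masks the IMMR value with 0xfff80000,
i.e. rounds it down to a 512k boundary, so the 2M-aligned EP88xC
value is handled exactly like the 8M-aligned ones.)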

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/Kconfig.debug     |  1 -
 arch/powerpc/kernel/head_8xx.S | 10 +++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 3a510f4..70168a2 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -326,7 +326,6 @@ config PPC_EARLY_DEBUG_40x_PHYSADDR
 config PPC_EARLY_DEBUG_CPM_ADDR
 	hex "CPM UART early debug transmit descriptor address"
 	depends on PPC_EARLY_DEBUG_CPM
-	default "0xfa202008" if PPC_EP88XC
 	default "0xf0001ff8" if CPM2
 	default "0xff002008" if CPM1
 	help
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index bcba51b..603124e 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -763,7 +763,7 @@ start_here:
  * virtual to physical.  Also, set the cache mode since that is defined
  * by TLB entries and perform any additional mapping (like of the IMMR).
  * If configured to pin some TLBs, we pin the first 8 Mbytes of kernel,
- * 24 Mbytes of data, and the 8M IMMR space.  Anything not covered by
+ * 24 Mbytes of data, and the 512k IMMR space.  Anything not covered by
  * these mappings is mapped by page tables.
  */
 initial_mmu:
@@ -812,7 +812,7 @@ initial_mmu:
 	ori	r8, r8, MD_APG_INIT@l
 	mtspr	SPRN_MD_AP, r8
 
-	/* Map another 8 MByte at the IMMR to get the processor
+	/* Map 512 kBytes at 0xff000000 for the IMMR to get the processor
 	 * internal registers (among other things).
 	 */
 #ifdef CONFIG_PIN_TLB
@@ -820,12 +820,12 @@ initial_mmu:
 	mtspr	SPRN_MD_CTR, r10
 #endif
 	mfspr	r9, 638			/* Get current IMMR */
-	andis.	r9, r9, 0xff80		/* Get 8Mbyte boundary */
+	andis.	r9, r9, 0xfff8		/* Get 512 kbytes boundary */
 
-	mr	r8, r9			/* Create vaddr for TLB */
+	lis	r8, 0xff00		/* Create vaddr for TLB at 0xff000000 */
 	ori	r8, r8, MD_EVALID	/* Mark it valid */
 	mtspr	SPRN_MD_EPN, r8
-	li	r8, MD_PS8MEG		/* Set 8M byte page */
+	li	r8, MD_PS512K | MD_GUARDED	/* Set 512k byte page */
 	ori	r8, r8, MD_SVALID	/* Make it valid */
 	mtspr	SPRN_MD_TWC, r8
 	mr	r8, r9			/* Create paddr for TLB */
-- 
2.1.0



* [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (4 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-28 23:41   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Just like for areas mapped by BATs, iounmap() cannot vunmap() an
area mapped by TLBCAMs.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/mm/pgtable_32.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 7692d1b..03a073a 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -278,7 +278,9 @@ void iounmap(volatile void __iomem *addr)
 	 * If mapped by BATs then there is nothing to do.
 	 * Calling vfree() generates a benign warning.
 	 */
-	if (v_mapped_by_bats((unsigned long)addr)) return;
+	if (v_mapped_by_bats((unsigned long)addr) ||
+	    v_mapped_by_tlbcam((unsigned long)addr))
+		return;
 
 	if (addr > high_memory && (unsigned long) addr < ioremap_bot)
 		vunmap((void *) (PAGE_MASK & (unsigned long)addr));
-- 
2.1.0



* [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (5 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-28 23:47   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
purpose, so let's group them into single functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/mm/pgtable_32.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 03a073a..3fd9083 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
 #define p_mapped_by_tlbcam(x)	(0UL)
 #endif /* HAVE_TLBCAM */
 
+static inline unsigned long p_mapped_by_other(phys_addr_t pa)
+{
+	unsigned long v;
+
+	v = p_mapped_by_bats(pa);
+	if (v /*&& p_mapped_by_bats(p+size-1)*/)
+		return v;
+
+	return p_mapped_by_tlbcam(pa);
+}
+
+static inline phys_addr_t v_mapped_by_other(unsigned long va)
+{
+	phys_addr_t p;
+
+	p = v_mapped_by_bats(va);
+	if (p)
+		return p;
+
+	return v_mapped_by_tlbcam(va);
+}
+
 #define PGDIR_ORDER	(32 + PGD_T_LOG2 - PGDIR_SHIFT)
 
 #ifndef CONFIG_PPC_4K_PAGES
@@ -237,10 +259,8 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
 	 * same virt address (and this is contiguous).
 	 *  -- Cort
 	 */
-	if ((v = p_mapped_by_bats(p)) /*&& p_mapped_by_bats(p+size-1)*/ )
-		goto out;
-
-	if ((v = p_mapped_by_tlbcam(p)))
+	v = p_mapped_by_other(p);
+	if (v)
 		goto out;
 
 	if (slab_is_available()) {
@@ -278,8 +298,7 @@ void iounmap(volatile void __iomem *addr)
 	 * If mapped by BATs then there is nothing to do.
 	 * Calling vfree() generates a benign warning.
 	 */
-	if (v_mapped_by_bats((unsigned long)addr) ||
-	    v_mapped_by_tlbcam((unsigned long)addr))
+	if (v_mapped_by_other((unsigned long)addr))
 		return;
 
 	if (addr > high_memory && (unsigned long) addr < ioremap_bot)
@@ -405,7 +424,7 @@ static int __change_page_attr(struct page *page, pgprot_t prot)
 	BUG_ON(PageHighMem(page));
 	address = (unsigned long)page_address(page);
 
-	if (v_mapped_by_bats(address) || v_mapped_by_tlbcam(address))
+	if (v_mapped_by_other(address))
 		return 0;
 	if (!get_pteptr(&init_mm, address, &kpte, &kpmd))
 		return -EINVAL;
-- 
2.1.0



* [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (6 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-24 11:41   ` David Laight
  2015-09-28 23:53   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 09/25] powerpc/8xx: show IMMR area in startup memory layout Christophe Leroy
                   ` (16 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Once the linear memory space has been mapped with 8M pages, as
seen in the related commit, we get 11 million DTLB misses during
the reference 600 s period. 77% of the misses are on user addresses
and 23% are on kernel addresses (one fourth in the linear address
space and three fourths in the virtual address space).

Traditionally, each driver manages one computer board, which has its
own components with their own memory maps.
But on embedded chips like the MPC8xx, the SoC has all registers
located in the same IO area.

When looking at the ioremaps done during startup, we see that
many drivers re-map small parts of the IMMR for their own use,
and all those small pieces get their own 4k page, amplifying the
number of TLB misses: on our system we get 0xff000000 mapped 31 times
and 0xff003000 mapped 9 times.

With the patch, on the same principle as what was done for the RAM,
the IMMR gets mapped by a 512k page.

In 4k pages mode, we reserve a 4M area for mapping the IMMR. The TLB
miss handler checks that we are within the first 512k and bails out
without marking the page valid if we are outside.
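
(Concretely: with 4k pages, a PGD entry covers 4M, i.e. eight 512k
chunks. The chunk index lives in the EPN bits selected by the
0x00380000 mask, so the DTLBMiss512k code below tests exactly those
bits and only sets MD_SVALID when they are all zero, i.e. for the
first 512k of the area.)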

In 16k pages mode, it is not realistic to reserve a 64M area, so
we do a standard mapping of the 512k area using 32 pages of 16k:
the CPM will be mapped via the first two pages, and the SEC engine
will be mapped via the 16th and 17th pages.

With this patch applied, the number of DTLB misses during the
10-minute period is reduced to 11.8 million for a duration of 5.8 s,
which represents 2% of the non-idle time, hence yet another 10%
reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v2:
- use bt instead of blt/bgt
- reorganised in order to have only one taken branch for both 512k
and 8M, instead of a first branch for both 8M and 512k and then a
second branch for 512k

 arch/powerpc/include/asm/pgtable-ppc32.h |  5 ++++
 arch/powerpc/kernel/head_8xx.S           | 36 ++++++++++++++++++++++-
 arch/powerpc/mm/8xx_mmu.c                | 50 ++++++++++++++++++++++++++++++++
 arch/powerpc/mm/pgtable_32.c             | 20 +++++++++++++
 4 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc32.h b/arch/powerpc/include/asm/pgtable-ppc32.h
index 9c32656..ad5324f 100644
--- a/arch/powerpc/include/asm/pgtable-ppc32.h
+++ b/arch/powerpc/include/asm/pgtable-ppc32.h
@@ -53,6 +53,11 @@ extern int icache_44x_need_flush;
 #define pgd_ERROR(e) \
 	pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
+#ifdef CONFIG_PPC_8xx
+#define IMMR_BASE	(0xff000000UL)
+#define IMMR_SIZE	(1UL << 19)
+#endif
+
 /*
  * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
  * value (for now) on others, from where we can start layout kernel
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 603124e..41b5d1b 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -253,6 +253,37 @@ DataAccess:
 	. = 0x400
 InstructionAccess:
 
+/*
+ * Bottom part of DTLBMiss handler for 512k pages
+ * not enough space in the primary location
+ */
+#ifdef CONFIG_PPC_4K_PAGES
+/*
+ * 512k pages are only used for mapping IMMR area in 4K pages mode.
+ * Only map the first 512k page of the 4M area covered by the PGD entry.
+ * This should not happen, but if we are called for another page of that
+ * area, don't mark it valid
+ *
+ * In 16k pages mode, IMMR is directly mapped with 16k pages
+ */
+DTLBMiss512k:
+	rlwinm.	r10, r10, 0, 0x00380000
+	bne-	1f
+	ori	r11, r11, MD_SVALID
+1:	mtcr	r3
+	MTSPR_CPU6(SPRN_MD_TWC, r11, r3)
+	rlwinm	r10, r11, 0, 0xffc00000
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY	| \
+			  _PAGE_PRESENT | _PAGE_NO_CACHE
+	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mfspr	r3, SPRN_SPRG_SCRATCH2
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	EXCEPTION_EPILOG_0
+	rfi
+#endif
+
 /* External interrupt */
 	EXCEPTION(0x500, HardwareInterrupt, do_IRQ, EXC_XFER_LITE)
 
@@ -404,6 +435,9 @@ DataStoreTLBMiss:
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
 	mtcr	r11
 	bt-	28,DTLBMiss8M		/* bit 28 = Large page (8M) */
+#ifdef CONFIG_PPC_4K_PAGES
+	bt-	29,DTLBMiss512k		/* bit 29 = Large page (8M or 512K) */
+#endif
 	mtcr	r3
 
 	/* We have a pte table, so load fetch the pte from the table.
@@ -558,7 +592,7 @@ FixupDAR:/* Entry point for dcbx workaround. */
 3:	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
 	mtcr	r11
-	bt	28,200f		/* bit 28 = Large page (8M) */
+	bt	29,200f		/* bit 29 = Large page (8M or 512K) */
 	rlwinm	r11, r11,0,0,19	/* Extract page descriptor page address */
 	/* Insert level 2 index */
 	rlwimi	r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 0ddcb37..eeca14b 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -17,6 +17,54 @@
 #include "mmu_decl.h"
 
 extern int __map_without_ltlbs;
+
+/*
+ * Return PA for this VA if it is in IMMR area, or 0
+ */
+phys_addr_t v_mapped_by_ltlb(unsigned long va)
+{
+	unsigned long p = mfspr(SPRN_IMMR) & 0xfff80000;
+
+	if (__map_without_ltlbs)
+		return 0;
+	if (va >= IMMR_BASE && va < IMMR_BASE + IMMR_SIZE)
+		return p + va - IMMR_BASE;
+	return 0;
+}
+
+/*
+ * Return VA for a given PA or 0 if not mapped
+ */
+unsigned long p_mapped_by_ltlb(phys_addr_t pa)
+{
+	unsigned long p = mfspr(SPRN_IMMR) & 0xfff80000;
+
+	if (__map_without_ltlbs)
+		return 0;
+	if (pa >= p && pa < p + IMMR_SIZE)
+		return IMMR_BASE + pa - p;
+	return 0;
+}
+
+static void mmu_mapin_immr(void)
+{
+	unsigned long p = mfspr(SPRN_IMMR) & 0xfff80000;
+	unsigned long v = IMMR_BASE;
+#ifdef CONFIG_PPC_4K_PAGES
+	pmd_t *pmdp;
+	unsigned long val = p | MD_PS512K | MD_GUARDED;
+
+	pmdp = pmd_offset(pud_offset(pgd_offset_k(v), v), v);
+	pmd_val(*pmdp) = val;
+#else /* CONFIG_PPC_16K_PAGES */
+	unsigned long f = pgprot_val(PAGE_KERNEL_NCG);
+	int offset;
+
+	for (offset = 0; offset < IMMR_SIZE; offset += PAGE_SIZE)
+		map_page(v + offset, p + offset, f);
+#endif
+}
+
 /*
  * MMU_init_hw does the chip-specific initialization of the MMU hardware.
  */
@@ -79,6 +127,8 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
 	 */
 	memblock_set_current_limit(mapped);
 
+	mmu_mapin_immr();
+
 	return mapped;
 }
 
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 3fd9083..1f2fdbc 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -49,6 +49,10 @@ EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
 #define HAVE_TLBCAM	1
 #endif
 
+#ifdef CONFIG_PPC_8xx
+#define HAVE_LTLB	1
+#endif
+
 extern char etext[], _stext[];
 
 #ifdef HAVE_BATS
@@ -67,6 +71,14 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
 #define p_mapped_by_tlbcam(x)	(0UL)
 #endif /* HAVE_TLBCAM */
 
+#ifdef HAVE_LTLB
+phys_addr_t v_mapped_by_ltlb(unsigned long va);
+unsigned long p_mapped_by_ltlb(phys_addr_t pa);
+#else /* !HAVE_LTLB */
+#define v_mapped_by_ltlb(x)	(0UL)
+#define p_mapped_by_ltlb(x)	(0UL)
+#endif /* HAVE_LTLB */
+
 static inline unsigned long p_mapped_by_other(phys_addr_t pa)
 {
 	unsigned long v;
@@ -75,6 +87,10 @@ static inline unsigned long p_mapped_by_other(phys_addr_t pa)
 	if (v /*&& p_mapped_by_bats(p+size-1)*/)
 		return v;
 
+	v = p_mapped_by_ltlb(pa);
+	if (v)
+		return v;
+
 	return p_mapped_by_tlbcam(pa);
 }
 
@@ -86,6 +102,10 @@ static inline phys_addr_t v_mapped_by_other(unsigned long va)
 	if (p)
 		return p;
 
+	p = v_mapped_by_ltlb(va);
+	if (p)
+		return p;
+
 	return v_mapped_by_tlbcam(va);
 }
 
-- 
2.1.0



* [PATCH v2 09/25] powerpc/8xx: show IMMR area in startup memory layout
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (7 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 10/25] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM Christophe Leroy
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Show the IMMR area in the startup memory layout.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/mm/mem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 22d94c3..e105ca6 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -367,6 +367,10 @@ void __init mem_init(void)
 	pr_info("  * 0x%08lx..0x%08lx  : highmem PTEs\n",
 		PKMAP_BASE, PKMAP_ADDR(LAST_PKMAP));
 #endif /* CONFIG_HIGHMEM */
+#ifdef IMMR_BASE
+	pr_info("  * 0x%08lx..0x%08lx  : IMMR\n",
+		IMMR_BASE, IMMR_BASE + IMMR_SIZE);
+#endif
 #ifdef CONFIG_NOT_COHERENT_CACHE
 	pr_info("  * 0x%08lx..0x%08lx  : consistent mem\n",
 		IOREMAP_TOP, IOREMAP_TOP + CONFIG_CONSISTENT_SIZE);
-- 
2.1.0



* [PATCH v2 10/25] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (8 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 09/25] powerpc/8xx: show IMMR area in startup memory layout Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup Christophe Leroy
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

IMMR is now mapped at 0xff000000 by page tables, so it is no
longer necessary to pin TLBs for it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/Kconfig.debug | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 70168a2..452b086 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -220,7 +220,6 @@ config PPC_EARLY_DEBUG_40x
 config PPC_EARLY_DEBUG_CPM
 	bool "Early serial debugging for Freescale CPM-based serial ports"
 	depends on SERIAL_CPM
-	select PIN_TLB if PPC_8xx
 	help
 	  Select this to enable early debugging for Freescale chips
 	  using a CPM-based serial port.  This assumes that the bootwrapper
-- 
2.1.0



* [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (9 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 10/25] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-28 23:58   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 12/25] powerpc32: Remove useless/wrong MMU:setio progress message Christophe Leroy
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

On recent kernels, with some debug options like for instance
CONFIG_LOCKDEP, the BSS requires more than 8M of memory, although
the kernel code fits in the first 8M.
Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
at startup, although pinning TLBs is not necessary for that.

This patch adds a second 8M page to the initial mapping, in order to
have 16M mapped regardless of CONFIG_PIN_TLB, like several other
32-bit PPCs (40x, 601, ...).

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/kernel/head_8xx.S | 2 ++
 arch/powerpc/mm/8xx_mmu.c      | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 41b5d1b..1238fbe 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -871,6 +871,7 @@ initial_mmu:
 	*/
 	addi	r10, r10, 0x0100
 	mtspr	SPRN_MD_CTR, r10
+#endif
 
 	lis	r8, KERNELBASE@h	/* Create vaddr for TLB */
 	addis	r8, r8, 0x0080		/* Add 8M */
@@ -883,6 +884,7 @@ initial_mmu:
 	addis	r11, r11, 0x0080	/* Add 8M */
 	mtspr	SPRN_MD_RPN, r11
 
+#ifdef CONFIG_PIN_TLB
 	addi	r10, r10, 0x0100
 	mtspr	SPRN_MD_CTR, r10
 
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index eeca14b..28283e2 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -144,7 +144,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	/* 8xx can only access 24MB at the moment */
 	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01800000));
 #else
-	/* 8xx can only access 8MB at the moment */
-	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
+	/* 8xx can only access 16MB at the moment */
+	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
 #endif
 }
-- 
2.1.0



* [PATCH v2 12/25] powerpc32: Remove useless/wrong MMU:setio progress message
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (10 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-22 16:50 ` [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations Christophe Leroy
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Commit 771168494719 ("[POWERPC] Remove unused machine call outs")
removed the call to setup_io_mappings(), so remove the associated
progress message.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/mm/init_32.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 1a18e4b..4eb1b8f 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -178,10 +178,6 @@ void __init MMU_init(void)
 	/* Initialize early top-down ioremap allocator */
 	ioremap_bot = IOREMAP_TOP;
 
-	/* Map in I/O resources */
-	if (ppc_md.progress)
-		ppc_md.progress("MMU:setio", 0x302);
-
 	if (ppc_md.progress)
 		ppc_md.progress("MMU:exit", 0x211);
 
-- 
2.1.0



* [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (11 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 12/25] powerpc32: Remove useless/wrong MMU:setio progress message Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-29  0:00   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 14/25] powerpc32: remove ioremap_base Christophe Leroy
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

We spend between 40 and 160 cycles, with a mean of 65 cycles, in
the TLB handling routines (measured with mftbl), so make the handler
simpler even though it adds one instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/kernel/head_8xx.S | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 1238fbe..6c991e9 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -352,30 +352,25 @@ SystemCall:
 #endif
 
 InstructionTLBMiss:
-#ifdef CONFIG_8xx_CPU6
 	mtspr	SPRN_SPRG_SCRATCH2, r3
-#endif
 	EXCEPTION_PROLOG_0
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
+	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
+	INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
 #ifdef CONFIG_MODULES
 	/* Only modules will cause ITLB Misses as we always
 	 * pin the first 8MB of kernel memory */
-	mfspr	r11, SPRN_SRR0	/* Get effective address of fault */
-	INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
-	mfcr	r10
+	mfcr	r3
 	IS_KERNEL(r11, r11)
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r10
-	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
+	mtcr	r3
 #else
-	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
-	INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table base address */
 #endif
 	/* Insert level 1 index */
@@ -408,9 +403,7 @@ InstructionTLBMiss:
 	MTSPR_CPU6(SPRN_MI_RPN, r10, r3)	/* Update TLB entry */
 
 	/* Restore registers */
-#ifdef CONFIG_8xx_CPU6
 	mfspr	r3, SPRN_SPRG_SCRATCH2
-#endif
 	EXCEPTION_EPILOG_0
 	rfi
 
-- 
2.1.0



* [PATCH v2 14/25] powerpc32: remove ioremap_base
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (12 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-29  0:38   ` Scott Wood
  2015-09-22 16:50 ` [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones Christophe Leroy
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

ioremap_base is never initialised and is used nowhere, so remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/mm/mmu_decl.h                  | 1 -
 arch/powerpc/mm/pgtable_32.c                | 1 -
 arch/powerpc/platforms/embedded6xx/mpc10x.h | 8 --------
 3 files changed, 10 deletions(-)

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 99e88bc..6f1e7cd 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -100,7 +100,6 @@ extern void setbat(int index, unsigned long virt, phys_addr_t phys,
 
 extern int __map_without_bats;
 extern int __allow_ioremap_reserved;
-extern unsigned long ioremap_base;
 extern unsigned int rtas_data, rtas_size;
 
 struct hash_pte;
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 1f2fdbc..dc51ab4 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -37,7 +37,6 @@
 
 #include "mmu_decl.h"
 
-unsigned long ioremap_base;
 unsigned long ioremap_bot;
 EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
 
diff --git a/arch/powerpc/platforms/embedded6xx/mpc10x.h b/arch/powerpc/platforms/embedded6xx/mpc10x.h
index b290b63..8de6104 100644
--- a/arch/powerpc/platforms/embedded6xx/mpc10x.h
+++ b/arch/powerpc/platforms/embedded6xx/mpc10x.h
@@ -138,14 +138,6 @@
 #define MPC10X_EUMB_WP_OFFSET		0x000ff000 /* Data path diagnostic, watchpoint reg offset */
 #define MPC10X_EUMB_WP_SIZE		0x00001000 /* Data path diagnostic, watchpoint reg size */
 
-/*
- * Define some recommended places to put the EUMB regs.
- * For both maps, recommend putting the EUMB from 0xeff00000 to 0xefffffff.
- */
-extern unsigned long			ioremap_base;
-#define	MPC10X_MAPA_EUMB_BASE		(ioremap_base - MPC10X_EUMB_SIZE)
-#define	MPC10X_MAPB_EUMB_BASE		MPC10X_MAPA_EUMB_BASE
-
 enum ppc_sys_devices {
 	MPC10X_IIC1,
 	MPC10X_DMA0,
-- 
2.1.0



* [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvements Christophe Leroy
                   ` (13 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 14/25] powerpc32: remove ioremap_base Christophe Leroy
@ 2015-09-22 16:50 ` Christophe Leroy
  2015-09-29  0:03   ` Scott Wood
  2015-09-22 16:51 ` [PATCH v2 16/25] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro Christophe Leroy
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Move the 8xx SPRN defines into reg_8xx.h and add some missing ones.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/include/asm/mmu-8xx.h | 26 +++++++++++++-------------
 arch/powerpc/include/asm/reg_8xx.h | 24 ++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index f05500a..44408d6 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -11,7 +11,7 @@
  * is written, and the contents of several registers are used to
  * create the entry.
  */
-#define SPRN_MI_CTR	784	/* Instruction TLB control register */
+/* SPRN_MI_CTR */	/* Instruction TLB control register */
 #define MI_GPM		0x80000000	/* Set domain manager mode */
 #define MI_PPM		0x40000000	/* Set subpage protection */
 #define MI_CIDEF	0x20000000	/* Set cache inhibit when MMU dis */
@@ -23,7 +23,7 @@
 /* These are the Ks and Kp from the PowerPC books.  For proper operation,
  * Ks = 0, Kp = 1.
  */
-#define SPRN_MI_AP	786
+/* SPRN_MI_AP */
 #define MI_Ks		0x80000000	/* Should not be set */
 #define MI_Kp		0x40000000	/* Should always be set */
 
@@ -44,7 +44,7 @@
  * about the last instruction TLB miss.  When MI_RPN is written, bits in
  * this register are used to create the TLB entry.
  */
-#define SPRN_MI_EPN	787
+/* SPRN_MI_EPN */
 #define MI_EPNMASK	0xfffff000	/* Effective page number for entry */
 #define MI_EVALID	0x00000200	/* Entry is valid */
 #define MI_ASIDMASK	0x0000000f	/* ASID match value */
@@ -54,7 +54,7 @@
  * For the instruction TLB, it contains bits that get loaded into the
  * TLB entry when the MI_RPN is written.
  */
-#define SPRN_MI_TWC	789
+/* SPRN_MI_TWC */
 #define MI_APG		0x000001e0	/* Access protection group (0) */
 #define MI_GUARDED	0x00000010	/* Guarded storage */
 #define MI_PSMASK	0x0000000c	/* Mask of page size bits */
@@ -68,7 +68,7 @@
  * causes a TLB entry to be created for the instruction TLB, using
  * additional information from the MI_EPN, and MI_TWC registers.
  */
-#define SPRN_MI_RPN	790
+/* SPRN_MI_RPN */
 #define MI_SPS16K	0x00000008	/* Small page size (0 = 4k, 1 = 16k) */
 
 /* Define an RPN value for mapping kernel memory to large virtual
@@ -78,7 +78,7 @@
  */
 #define MI_BOOTINIT	0x000001fd
 
-#define SPRN_MD_CTR	792	/* Data TLB control register */
+/* SPRN_MD_CTR */	/* Data TLB control register */
 #define MD_GPM		0x80000000	/* Set domain manager mode */
 #define MD_PPM		0x40000000	/* Set subpage protection */
 #define MD_CIDEF	0x20000000	/* Set cache inhibit when MMU dis */
@@ -89,14 +89,14 @@
 #define MD_IDXMASK	0x00001f00	/* TLB index to be loaded */
 #define MD_RESETVAL	0x04000000	/* Value of register at reset */
 
-#define SPRN_M_CASID	793	/* Address space ID (context) to match */
+/* SPRN_M_CASID */	/* Address space ID (context) to match */
 #define MC_ASIDMASK	0x0000000f	/* Bits used for ASID value */
 
 
 /* These are the Ks and Kp from the PowerPC books.  For proper operation,
  * Ks = 0, Kp = 1.
  */
-#define SPRN_MD_AP	794
+/* SPRN_MD_AP */
 #define MD_Ks		0x80000000	/* Should not be set */
 #define MD_Kp		0x40000000	/* Should always be set */
 
@@ -117,7 +117,7 @@
  * about the last instruction TLB miss.  When MD_RPN is written, bits in
  * this register are used to create the TLB entry.
  */
-#define SPRN_MD_EPN	795
+/* SPRN_MD_EPN */
 #define MD_EPNMASK	0xfffff000	/* Effective page number for entry */
 #define MD_EVALID	0x00000200	/* Entry is valid */
 #define MD_ASIDMASK	0x0000000f	/* ASID match value */
@@ -127,7 +127,7 @@
  * During a software tablewalk, reading this register provides the address
  * of the entry associated with MD_EPN.
  */
-#define SPRN_M_TWB	796
+/* SPRN_M_TWB */
 #define	M_L1TB		0xfffff000	/* Level 1 table base address */
 #define M_L1INDX	0x00000ffc	/* Level 1 index, when read */
 					/* Reset value is undefined */
@@ -137,7 +137,7 @@
  * when the MD_RPN is written.  It is also provides the hardware assist
  * for finding the PTE address during software tablewalk.
  */
-#define SPRN_MD_TWC	797
+/* SPRN_MD_TWC */
 #define MD_L2TB		0xfffff000	/* Level 2 table base address */
 #define MD_L2INDX	0xfffffe00	/* Level 2 index (*pte), when read */
 #define MD_APG		0x000001e0	/* Access protection group (0) */
@@ -155,13 +155,13 @@
  * causes a TLB entry to be created for the data TLB, using
  * additional information from the MD_EPN, and MD_TWC registers.
  */
-#define SPRN_MD_RPN	798
+/* SPRN_MD_RPN */
 #define MD_SPS16K	0x00000008	/* Small page size (0 = 4k, 1 = 16k) */
 
 /* This is a temporary storage register that could be used to save
  * a processor working register during a tablewalk.
  */
-#define SPRN_M_TW	799
+/* SPRN_M_TW */
 
 #ifndef __ASSEMBLY__
 typedef struct {
diff --git a/arch/powerpc/include/asm/reg_8xx.h b/arch/powerpc/include/asm/reg_8xx.h
index e8ea346..150323c 100644
--- a/arch/powerpc/include/asm/reg_8xx.h
+++ b/arch/powerpc/include/asm/reg_8xx.h
@@ -4,6 +4,21 @@
 #ifndef _ASM_POWERPC_REG_8xx_H
 #define _ASM_POWERPC_REG_8xx_H
 
+/* MMU registers */
+#define SPRN_MI_CTR	784	/* Instruction TLB control register */
+#define SPRN_MI_AP	786
+#define SPRN_MI_EPN	787
+#define SPRN_MI_TWC	789
+#define SPRN_MI_RPN	790
+#define SPRN_MD_CTR	792	/* Data TLB control register */
+#define SPRN_M_CASID	793	/* Address space ID (context) to match */
+#define SPRN_MD_AP	794
+#define SPRN_MD_EPN	795
+#define SPRN_M_TWB	796
+#define SPRN_MD_TWC	797
+#define SPRN_MD_RPN	798
+#define SPRN_M_TW	799
+
 /* Cache control on the MPC8xx is provided through some additional
  * special purpose registers.
  */
@@ -14,6 +29,15 @@
 #define SPRN_DC_ADR	569	/* Address needed for some commands */
 #define SPRN_DC_DAT	570	/* Read-only data register */
 
+/* Misc Debug */
+#define SPRN_DPDR	630
+#define SPRN_MI_CAM	816
+#define SPRN_MI_RAM0	817
+#define SPRN_MI_RAM1	818
+#define SPRN_MD_CAM	824
+#define SPRN_MD_RAM0	825
+#define SPRN_MD_RAM1	826
+
 /* Commands.  Only the first few are available to the instruction cache.
 */
 #define	IDC_ENABLE	0x02000000	/* Cache enable */
-- 
2.1.0



* [PATCH v2 16/25] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (14 preceding siblings ...)
  2015-09-22 16:50 ` [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 17/25] powerpc/8xx: remove special handling of CPU6 errata in set_dec() Christophe Leroy
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

MPC8xx has an erratum (CPU6) affecting the use of mtspr() for some
registers. This patch includes the erratum handling directly in the
mtspr() macro so that mtspr() users don't need to bother about it.
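
For illustration (not part of the patch): with the patched reg_8xx.h
included, a caller no longer has to open-code the workaround, e.g.:

	/* Invalidate the instruction cache.  With CONFIG_8xx_CPU6 set,
	 * mtspr() expands to do_mtspr_cpu6() with the dummy address for
	 * SPRN_IC_CST (0x2110); without it, this is a single mtspr insn.
	 */
	mtspr(SPRN_IC_CST, IDC_INVALL);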

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/include/asm/reg.h     |  2 +
 arch/powerpc/include/asm/reg_8xx.h | 82 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index aa1cc5f0..114b1e6 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1203,9 +1203,11 @@
 #define mfspr(rn)	({unsigned long rval; \
 			asm volatile("mfspr %0," __stringify(rn) \
 				: "=r" (rval)); rval;})
+#ifndef mtspr
 #define mtspr(rn, v)	asm volatile("mtspr " __stringify(rn) ",%0" : \
 				     : "r" ((unsigned long)(v)) \
 				     : "memory")
+#endif
 
 static inline unsigned long mfvtb (void)
 {
diff --git a/arch/powerpc/include/asm/reg_8xx.h b/arch/powerpc/include/asm/reg_8xx.h
index 150323c..59f9b72 100644
--- a/arch/powerpc/include/asm/reg_8xx.h
+++ b/arch/powerpc/include/asm/reg_8xx.h
@@ -63,4 +63,86 @@
 #define DC_DFWT		0x40000000	/* Data cache is forced write through */
 #define DC_LES		0x20000000	/* Caches are little endian mode */
 
+#ifdef CONFIG_8xx_CPU6
+#define do_mtspr_cpu6(rn, rn_addr, v)	\
+	do {								\
+		int _reg_cpu6 = rn_addr, _tmp_cpu6[1];		\
+		asm volatile("stw %0, %1;"				\
+			     "lwz %0, %1;"				\
+			     "mtspr " __stringify(rn) ",%2" :		\
+			     : "r" (_reg_cpu6), "m"(_tmp_cpu6),		\
+			       "r" ((unsigned long)(v))			\
+			     : "memory");				\
+	} while (0)
+
+#define do_mtspr(rn, v)	asm volatile("mtspr " __stringify(rn) ",%0" :	\
+				     : "r" ((unsigned long)(v))		\
+				     : "memory")
+#define mtspr(rn, v) \
+	do {								\
+		if (rn == SPRN_IMMR)					\
+			do_mtspr_cpu6(rn, 0x3d30, v);			\
+		else if (rn == SPRN_IC_CST)				\
+			do_mtspr_cpu6(rn, 0x2110, v);			\
+		else if (rn == SPRN_IC_ADR)				\
+			do_mtspr_cpu6(rn, 0x2310, v);			\
+		else if (rn == SPRN_IC_DAT)				\
+			do_mtspr_cpu6(rn, 0x2510, v);			\
+		else if (rn == SPRN_DC_CST)				\
+			do_mtspr_cpu6(rn, 0x3110, v);			\
+		else if (rn == SPRN_DC_ADR)				\
+			do_mtspr_cpu6(rn, 0x3310, v);			\
+		else if (rn == SPRN_DC_DAT)				\
+			do_mtspr_cpu6(rn, 0x3510, v);			\
+		else if (rn == SPRN_MI_CTR)				\
+			do_mtspr_cpu6(rn, 0x2180, v);			\
+		else if (rn == SPRN_MI_AP)				\
+			do_mtspr_cpu6(rn, 0x2580, v);			\
+		else if (rn == SPRN_MI_EPN)				\
+			do_mtspr_cpu6(rn, 0x2780, v);			\
+		else if (rn == SPRN_MI_TWC)				\
+			do_mtspr_cpu6(rn, 0x2b80, v);			\
+		else if (rn == SPRN_MI_RPN)				\
+			do_mtspr_cpu6(rn, 0x2d80, v);			\
+		else if (rn == SPRN_MI_CAM)				\
+			do_mtspr_cpu6(rn, 0x2190, v);			\
+		else if (rn == SPRN_MI_RAM0)				\
+			do_mtspr_cpu6(rn, 0x2390, v);			\
+		else if (rn == SPRN_MI_RAM1)				\
+			do_mtspr_cpu6(rn, 0x2590, v);			\
+		else if (rn == SPRN_MD_CTR)				\
+			do_mtspr_cpu6(rn, 0x3180, v);			\
+		else if (rn == SPRN_M_CASID)				\
+			do_mtspr_cpu6(rn, 0x3380, v);			\
+		else if (rn == SPRN_MD_AP)				\
+			do_mtspr_cpu6(rn, 0x3580, v);			\
+		else if (rn == SPRN_MD_EPN)				\
+			do_mtspr_cpu6(rn, 0x3780, v);			\
+		else if (rn == SPRN_M_TWB)				\
+			do_mtspr_cpu6(rn, 0x3980, v);			\
+		else if (rn == SPRN_MD_TWC)				\
+			do_mtspr_cpu6(rn, 0x3b80, v);			\
+		else if (rn == SPRN_MD_RPN)				\
+			do_mtspr_cpu6(rn, 0x3d80, v);			\
+		else if (rn == SPRN_M_TW)				\
+			do_mtspr_cpu6(rn, 0x3f80, v);			\
+		else if (rn == SPRN_MD_CAM)				\
+			do_mtspr_cpu6(rn, 0x3190, v);			\
+		else if (rn == SPRN_MD_RAM0)				\
+			do_mtspr_cpu6(rn, 0x3390, v);			\
+		else if (rn == SPRN_MD_RAM1)				\
+			do_mtspr_cpu6(rn, 0x3590, v);			\
+		else if (rn == SPRN_DEC)				\
+			do_mtspr_cpu6(rn, 0x2c00, v);			\
+		else if (rn == SPRN_TBWL)				\
+			do_mtspr_cpu6(rn, 0x3880, v);			\
+		else if (rn == SPRN_TBWU)				\
+			do_mtspr_cpu6(rn, 0x3a80, v);			\
+		else if (rn == SPRN_DPDR)				\
+			do_mtspr_cpu6(rn, 0x2d30, v);			\
+		else							\
+			do_mtspr(rn, v);				\
+	} while (0)
+#endif
+
 #endif /* _ASM_POWERPC_REG_8xx_H */
-- 
2.1.0



* [PATCH v2 17/25] powerpc/8xx: remove special handling of CPU6 errata in set_dec()
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (15 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 16/25] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 18/25] powerpc/8xx: rewrite set_context() in C Christophe Leroy
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

The CPU6 erratum is now handled directly in mtspr(), so we can use the
standard set_dec() function in all cases.
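
To illustrate the effect (not from the patch itself): on an 8xx with
CONFIG_8xx_CPU6, the generic set_dec() now goes through the mtspr()
macro from the previous patch, so it behaves like the removed
set_dec_cpu6(val - 1):

	/* set_dec(val) on 8xx/CPU6 now expands to roughly: */
	do_mtspr_cpu6(SPRN_DEC, 0x2c00, val - 1);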

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/include/asm/time.h |  6 +-----
 arch/powerpc/kernel/head_8xx.S  | 18 ------------------
 2 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 10fc784..e18c203 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -32,8 +32,6 @@ extern void tick_broadcast_ipi_handler(void);
 
 extern void generic_calibrate_decr(void);
 
-extern void set_dec_cpu6(unsigned int val);
-
 /* Some sane defaults: 125 MHz timebase, 1GHz processor */
 extern unsigned long ppc_proc_freq;
 #define DEFAULT_PROC_FREQ	(DEFAULT_TB_FREQ * 8)
@@ -167,14 +165,12 @@ static inline void set_dec(int val)
 {
 #if defined(CONFIG_40x)
 	mtspr(SPRN_PIT, val);
-#elif defined(CONFIG_8xx_CPU6)
-	set_dec_cpu6(val - 1);
 #else
 #ifndef CONFIG_BOOKE
 	--val;
 #endif
 	mtspr(SPRN_DEC, val);
-#endif /* not 40x or 8xx_CPU6 */
+#endif /* not 40x */
 }
 
 static inline unsigned long tb_ticks_since(unsigned long tstamp)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 6c991e9..c89d55b 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -955,24 +955,6 @@ _GLOBAL(set_context)
 	SYNC
 	blr
 
-#ifdef CONFIG_8xx_CPU6
-/* It's here because it is unique to the 8xx.
- * It is important we get called with interrupts disabled.  I used to
- * do that, but it appears that all code that calls this already had
- * interrupt disabled.
- */
-	.globl	set_dec_cpu6
-set_dec_cpu6:
-	lis	r7, cpu6_errata_word@h
-	ori	r7, r7, cpu6_errata_word@l
-	li	r4, 0x2c00
-	stw	r4, 8(r7)
-	lwz	r4, 8(r7)
-        mtspr   22, r3		/* Update Decrementer */
-	SYNC
-	blr
-#endif
-
 /*
  * We put a few things here that have to be page-aligned.
  * This stuff goes at the beginning of the data segment,
-- 
2.1.0



* [PATCH v2 18/25] powerpc/8xx: rewrite set_context() in C
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (16 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 17/25] powerpc/8xx: remove special handling of CPU6 errata in set_dec() Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 19/25] powerpc/8xx: rewrite flush_instruction_cache() " Christophe Leroy
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

There is no real need to have set_context() in assembly.
Now that mtspr() handles the CPU6 erratum directly, we can
rewrite set_context() in C for easier maintenance.
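
The C version keeps the same interface as the assembly one, so callers
are unchanged. An illustrative call site (assuming, as on other nohash
platforms, that the ASID lives in mm->context.id):

	/* On context switch: load the new PGD and ASID into the MMU. */
	set_context(next->context.id, next->pgd);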

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/kernel/head_8xx.S | 44 ------------------------------------------
 arch/powerpc/mm/8xx_mmu.c      | 34 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c89d55b..d58ab51 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -912,50 +912,6 @@ initial_mmu:
 
 
 /*
- * Set up to use a given MMU context.
- * r3 is context number, r4 is PGD pointer.
- *
- * We place the physical address of the new task page directory loaded
- * into the MMU base register, and set the ASID compare register with
- * the new "context."
- */
-_GLOBAL(set_context)
-
-#ifdef CONFIG_BDI_SWITCH
-	/* Context switch the PTE pointer for the Abatron BDI2000.
-	 * The PGDIR is passed as second argument.
-	 */
-	lis	r5, KERNELBASE@h
-	lwz	r5, 0xf0(r5)
-	stw	r4, 0x4(r5)
-#endif
-
-	/* Register M_TW will contain base address of level 1 table minus the
-	 * lower part of the kernel PGDIR base address, so that all accesses to
-	 * level 1 table are done relative to lower part of kernel PGDIR base
-	 * address.
-	 */
-	li	r5, (swapper_pg_dir-PAGE_OFFSET)@l
-	sub	r4, r4, r5
-	tophys	(r4, r4)
-#ifdef CONFIG_8xx_CPU6
-	lis	r6, cpu6_errata_word@h
-	ori	r6, r6, cpu6_errata_word@l
-	li	r7, 0x3f80
-	stw	r7, 12(r6)
-	lwz	r7, 12(r6)
-#endif
-	mtspr	SPRN_M_TW, r4		/* Update pointeur to level 1 table */
-#ifdef CONFIG_8xx_CPU6
-	li	r7, 0x3380
-	stw	r7, 12(r6)
-	lwz	r7, 12(r6)
-#endif
-	mtspr	SPRN_M_CASID, r3	/* Update context */
-	SYNC
-	blr
-
-/*
  * We put a few things here that have to be page-aligned.
  * This stuff goes at the beginning of the data segment,
  * which is page-aligned.
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 28283e2..09e5d08 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -148,3 +148,37 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
 #endif
 }
+
+/*
+ * Set up to use a given MMU context.
+ * id is context number, pgd is PGD pointer.
+ *
+ * We place the physical address of the new task page directory loaded
+ * into the MMU base register, and set the ASID compare register with
+ * the new "context."
+ */
+void set_context(unsigned long id, pgd_t *pgd)
+{
+	s16 offset = (s16)(__pa(swapper_pg_dir));
+
+#ifdef CONFIG_BDI_SWITCH
+	pgd_t	**ptr = *(pgd_t ***)(KERNELBASE + 0xf0);
+
+	/* Context switch the PTE pointer for the Abatron BDI2000.
+	 * The PGDIR is passed as second argument.
+	 */
+	*(ptr + 1) = pgd;
+#endif
+
+	/* Register M_TW will contain base address of level 1 table minus the
+	 * lower part of the kernel PGDIR base address, so that all accesses to
+	 * level 1 table are done relative to lower part of kernel PGDIR base
+	 * address.
+	 */
+	mtspr(SPRN_M_TW, __pa(pgd) - offset);
+
+	/* Update context */
+	mtspr(SPRN_M_CASID, id);
+	/* sync */
+	mb();
+}
-- 
2.1.0



* [PATCH v2 19/25] powerpc/8xx: rewrite flush_instruction_cache() in C
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (17 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 18/25] powerpc/8xx: rewrite set_context() in C Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

On PPC8xx, flushing the instruction cache is performed by writing
to the SPRN_IC_CST register. This register is affected by the CPU6
erratum. This patch rewrites the function in C so that the erratum
is handled transparently.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/kernel/misc_32.S | 10 ++++------
 arch/powerpc/mm/8xx_mmu.c     |  7 +++++++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index ed3ab50..72fd7a7 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -296,12 +296,9 @@ _GLOBAL(real_writeb)
  * Flush instruction cache.
  * This is a no-op on the 601.
  */
+#ifndef CONFIG_PPC_8xx
 _GLOBAL(flush_instruction_cache)
-#if defined(CONFIG_8xx)
-	isync
-	lis	r5, IDC_INVALL@h
-	mtspr	SPRN_IC_CST, r5
-#elif defined(CONFIG_4xx)
+#if defined(CONFIG_4xx)
 #ifdef CONFIG_403GCX
 	li      r3, 512
 	mtctr   r3
@@ -334,9 +331,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
 	mfspr	r3,SPRN_HID0
 	ori	r3,r3,HID0_ICFI
 	mtspr	SPRN_HID0,r3
-#endif /* CONFIG_8xx/4xx */
+#endif /* CONFIG_4xx */
 	isync
 	blr
+#endif /* CONFIG_PPC_8xx */
 
 /*
  * Write any modified data cache blocks out to memory
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 09e5d08..08a489c 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -182,3 +182,10 @@ void set_context(unsigned long id, pgd_t *pgd)
 	/* sync */
 	mb();
 }
+
+void flush_instruction_cache(void)
+{
+	isync();
+	mtspr(SPRN_IC_CST, IDC_INVALL);
+	isync();
+}
-- 
2.1.0



* [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (18 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 19/25] powerpc/8xx: rewrite flush_instruction_cache() " Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 17:57   ` Joakim Tjernlund
  2015-09-29  0:23   ` Scott Wood
  2015-09-22 16:51 ` [PATCH v2 21/25] powerpc: add inline functions for cache related instructions Christophe Leroy
                   ` (4 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

clear_pages() is never used, and PPC32 is the only architecture
that (still) has this function; neither PPC64 nor any other
architecture has it.

This patch removes clear_pages() and makes clear_page() an inline
function (same as PPC64), as it is only a few instructions.
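
For example, with 4k pages and the 8xx's 16-byte cache lines
(L1_CACHE_SHIFT = 4), the inline loop issues PAGE_SIZE / L1_CACHE_BYTES
= 256 dcbz instructions, each zeroing a full cache line without
fetching it from memory first.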

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/include/asm/page_32.h | 17 ++++++++++++++---
 arch/powerpc/kernel/misc_32.S      | 16 ----------------
 arch/powerpc/kernel/ppc_ksyms_32.c |  1 -
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
index 68d73b2..6a8e179 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_POWERPC_PAGE_32_H
 #define _ASM_POWERPC_PAGE_32_H
 
+#include <asm/cache.h>
+
 #if defined(CONFIG_PHYSICAL_ALIGN) && (CONFIG_PHYSICAL_START != 0)
 #if (CONFIG_PHYSICAL_START % CONFIG_PHYSICAL_ALIGN) != 0
 #error "CONFIG_PHYSICAL_START must be a multiple of CONFIG_PHYSICAL_ALIGN"
@@ -36,9 +38,18 @@ typedef unsigned long long pte_basic_t;
 typedef unsigned long pte_basic_t;
 #endif
 
-struct page;
-extern void clear_pages(void *page, int order);
-static inline void clear_page(void *page) { clear_pages(page, 0); }
+/*
+ * Clear page using the dcbz instruction, which doesn't cause any
+ * memory traffic (except to write out any cache lines which get
+ * displaced).  This only works on cacheable memory.
+ */
+static inline void clear_page(void *addr)
+{
+	unsigned int i;
+
+	for (i = 0; i < PAGE_SIZE / L1_CACHE_BYTES; i++, addr += L1_CACHE_BYTES)
+		dcbz(addr);
+}
 extern void copy_page(void *to, void *from);
 
 #include <asm-generic/getorder.h>
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 72fd7a7..ce3ca08 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -517,22 +517,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 #endif /* CONFIG_BOOKE */
 
 /*
- * Clear pages using the dcbz instruction, which doesn't cause any
- * memory traffic (except to write out any cache lines which get
- * displaced).  This only works on cacheable memory.
- *
- * void clear_pages(void *page, int order) ;
- */
-_GLOBAL(clear_pages)
-	li	r0,PAGE_SIZE/L1_CACHE_BYTES
-	slw	r0,r0,r4
-	mtctr	r0
-1:	dcbz	0,r3
-	addi	r3,r3,L1_CACHE_BYTES
-	bdnz	1b
-	blr
-
-/*
  * Copy a whole page.  We use the dcbz instruction on the destination
  * to reduce memory traffic (it eliminates the unnecessary reads of
  * the destination into cache).  This requires that the destination
diff --git a/arch/powerpc/kernel/ppc_ksyms_32.c b/arch/powerpc/kernel/ppc_ksyms_32.c
index 30ddd8a..2bfaafe 100644
--- a/arch/powerpc/kernel/ppc_ksyms_32.c
+++ b/arch/powerpc/kernel/ppc_ksyms_32.c
@@ -10,7 +10,6 @@
 #include <asm/pgtable.h>
 #include <asm/dcr.h>
 
-EXPORT_SYMBOL(clear_pages);
 EXPORT_SYMBOL(ISA_DMA_THRESHOLD);
 EXPORT_SYMBOL(DMA_MODE_READ);
 EXPORT_SYMBOL(DMA_MODE_WRITE);
-- 
2.1.0



* [PATCH v2 21/25] powerpc: add inline functions for cache related instructions
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (19 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-29  0:25   ` Scott Wood
  2015-09-22 16:51 ` [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

This patch adds inline functions so that the dcbz, dcbi, dcbf and
dcbst cache instructions can be used from C code.
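
As an illustration of the intended use (sketch only, assuming addr
points into a cacheable buffer):

	dcbz(addr);	/* zero the line containing addr, no memory read */
	dcbst(addr);	/* write a modified line back, keep it valid */
	dcbf(addr);	/* write the line back and invalidate it */
	dcbi(addr);	/* invalidate the line without writing it back */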

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
New in v2

 arch/powerpc/include/asm/cache.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 0dc42c5..a2de4f0 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -76,6 +76,25 @@ extern void _set_L3CR(unsigned long);
 #define _set_L3CR(val)	do { } while(0)
 #endif
 
+static inline void dcbz(void *addr)
+{
+	__asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbi(void *addr)
+{
+	__asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbf(void *addr)
+{
+	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbst(void *addr)
+{
+	__asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory");
+}
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
-- 
2.1.0



* [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (20 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 21/25] powerpc: add inline functions for cache related instructions Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 18:12   ` Joakim Tjernlund
  2015-09-29  0:29   ` Scott Wood
  2015-09-22 16:51 ` [PATCH v2 23/25] powerpc: Simplify test in __dma_sync() Christophe Leroy
                   ` (2 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

The flush/clean/invalidate _dcache_range() functions are all very
similar and quite short. They are mainly used in __dma_sync();
perf_event shows them among the top 3 consuming functions during
heavy Ethernet activity.

This makes them good candidates for inlining, as __dma_sync() does
almost nothing but call them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
New in v2

 arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
 arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
 arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
 3 files changed, 54 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 6229e6b..6169604 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
 }
 #endif
 
-extern void flush_dcache_range(unsigned long start, unsigned long stop);
 #ifdef CONFIG_PPC32
-extern void clean_dcache_range(unsigned long start, unsigned long stop);
-extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
+/*
+ * Write any modified data cache blocks out to memory and invalidate them.
+ * Does not invalidate the corresponding instruction cache blocks.
+ */
+static inline void flush_dcache_range(unsigned long start, unsigned long stop)
+{
+	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
+	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
+	unsigned int i;
+
+	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
+		dcbf(addr);
+	if (i)
+		mb();	/* sync */
+}
+
+/*
+ * Write any modified data cache blocks out to memory.
+ * Does not invalidate the corresponding cache lines (especially for
+ * any corresponding instruction cache).
+ */
+static inline void clean_dcache_range(unsigned long start, unsigned long stop)
+{
+	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
+	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
+	unsigned int i;
+
+	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
+		dcbst(addr);
+	if (i)
+		mb();	/* sync */
+}
+
+/*
+ * Like above, but invalidate the D-cache.  This is used by the 8xx
+ * to invalidate the cache so the PPC core doesn't get stale data
+ * from the CPM (no cache snooping here :-).
+ */
+static inline void invalidate_dcache_range(unsigned long start,
+					   unsigned long stop)
+{
+	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
+	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
+	unsigned int i;
+
+	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
+		dcbi(addr);
+	if (i)
+		mb();	/* sync */
+}
+
 #endif /* CONFIG_PPC32 */
 #ifdef CONFIG_PPC64
+extern void flush_dcache_range(unsigned long start, unsigned long stop);
 extern void flush_inval_dcache_range(unsigned long start, unsigned long stop);
 extern void flush_dcache_phys_range(unsigned long start, unsigned long stop);
 #endif
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index ce3ca08..1728f61 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -375,71 +375,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 	isync
 	blr
 /*
- * Write any modified data cache blocks out to memory.
- * Does not invalidate the corresponding cache lines (especially for
- * any corresponding instruction cache).
- *
- * clean_dcache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(clean_dcache_range)
-	li	r5,L1_CACHE_BYTES-1
-	andc	r3,r3,r5
-	subf	r4,r3,r4
-	add	r4,r4,r5
-	srwi.	r4,r4,L1_CACHE_SHIFT
-	beqlr
-	mtctr	r4
-
-1:	dcbst	0,r3
-	addi	r3,r3,L1_CACHE_BYTES
-	bdnz	1b
-	sync				/* wait for dcbst's to get to ram */
-	blr
-
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
- * Does not invalidate the corresponding instruction cache blocks.
- *
- * flush_dcache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(flush_dcache_range)
-	li	r5,L1_CACHE_BYTES-1
-	andc	r3,r3,r5
-	subf	r4,r3,r4
-	add	r4,r4,r5
-	srwi.	r4,r4,L1_CACHE_SHIFT
-	beqlr
-	mtctr	r4
-
-1:	dcbf	0,r3
-	addi	r3,r3,L1_CACHE_BYTES
-	bdnz	1b
-	sync				/* wait for dcbst's to get to ram */
-	blr
-
-/*
- * Like above, but invalidate the D-cache.  This is used by the 8xx
- * to invalidate the cache so the PPC core doesn't get stale data
- * from the CPM (no cache snooping here :-).
- *
- * invalidate_dcache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(invalidate_dcache_range)
-	li	r5,L1_CACHE_BYTES-1
-	andc	r3,r3,r5
-	subf	r4,r3,r4
-	add	r4,r4,r5
-	srwi.	r4,r4,L1_CACHE_SHIFT
-	beqlr
-	mtctr	r4
-
-1:	dcbi	0,r3
-	addi	r3,r3,L1_CACHE_BYTES
-	bdnz	1b
-	sync				/* wait for dcbi's to get to ram */
-	blr
-
-/*
  * Flush a particular page from the data cache to RAM.
  * Note: this is necessary because the instruction cache does *not*
  * snoop from the data cache.
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 202963e..0546947 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -6,7 +6,9 @@
 #include <asm/cacheflush.h>
 #include <asm/epapr_hcalls.h>
 
+#ifdef CONFIG_PPC64
 EXPORT_SYMBOL(flush_dcache_range);
+#endif
 EXPORT_SYMBOL(flush_icache_range);
 
 EXPORT_SYMBOL(empty_zero_page);
-- 
2.1.0



* [PATCH v2 23/25] powerpc: Simplify test in __dma_sync()
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (21 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 24/25] powerpc32: small optimisation in flush_icache_range() Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 25/25] powerpc32: Remove one insn in mulhdu Christophe Leroy
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

This simplification helps the compiler: we now have only one test
instead of two, which reduces the number of branches.
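
A quick check of the equivalence, taking L1_CACHE_BYTES = 16
(mask 0xf) as an example:

	start = 0x1000, end = 0x1234: (start | end) & 0xf = 0x4 -> flush
	start = 0x1000, end = 0x1230: (start | end) & 0xf = 0x0 -> invalidate

Since end = start + size, the low bits of (start | end) are non-zero
exactly when start or size is misaligned, which is what the two
separate tests checked.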

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
New in v2

 arch/powerpc/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 169aba4..2dc74e5 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
 		 * invalidate only when cache-line aligned otherwise there is
 		 * the potential for discarding uncommitted data from the cache
 		 */
-		if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
+		if ((start | end) & (L1_CACHE_BYTES - 1))
 			flush_dcache_range(start, end);
 		else
 			invalidate_dcache_range(start, end);
-- 
2.1.0



* [PATCH v2 24/25] powerpc32: small optimisation in flush_icache_range()
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (22 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 23/25] powerpc: Simplify test in __dma_sync() Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  2015-09-22 16:51 ` [PATCH v2 25/25] powerpc32: Remove one insn in mulhdu Christophe Leroy
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Inlining of the _dcache_range() functions has shown that the compiler
can do the same thing with one instruction less: rlwinm clears the
low L1_CACHE_SHIFT bits of the address in a single instruction,
replacing the li/andc pair that built and applied the mask.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
New in v2

 arch/powerpc/kernel/misc_32.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 1728f61..1597424 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -348,10 +348,9 @@ BEGIN_FTR_SECTION
 	PURGE_PREFETCHED_INS
 	blr				/* for 601, do nothing */
 END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
-	li	r5,L1_CACHE_BYTES-1
-	andc	r3,r3,r5
+	rlwinm	r3,r3,0,0,31 - L1_CACHE_SHIFT
 	subf	r4,r3,r4
-	add	r4,r4,r5
+	addi	r4,r4,L1_CACHE_BYTES - 1
 	srwi.	r4,r4,L1_CACHE_SHIFT
 	beqlr
 	mtctr	r4
-- 
2.1.0



* [PATCH v2 25/25] powerpc32: Remove one insn in mulhdu
  2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
                   ` (23 preceding siblings ...)
  2015-09-22 16:51 ` [PATCH v2 24/25] powerpc32: small optimisation in flush_icache_range() Christophe Leroy
@ 2015-09-22 16:51 ` Christophe Leroy
  24 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood
  Cc: linux-kernel, linuxppc-dev

Remove one instruction in mulhdu: swapping the roles of r3 and r10
makes the leading 'mr r10,r3' unnecessary.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
New in v2

 arch/powerpc/kernel/misc_32.S | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 1597424..870dc63 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -91,17 +91,16 @@ _GLOBAL(mulhdu)
 	addc	r7,r0,r7
 	addze	r4,r4
 1:	beqlr	cr1		/* all done if high part of A is 0 */
-	mr	r10,r3
 	mullw	r9,r3,r5
-	mulhwu	r3,r3,r5
+	mulhwu	r10,r3,r5
 	beq	2f
-	mullw	r0,r10,r6
-	mulhwu	r8,r10,r6
+	mullw	r0,r3,r6
+	mulhwu	r8,r3,r6
 	addc	r7,r0,r7
 	adde	r4,r4,r8
-	addze	r3,r3
+	addze	r10,r10
 2:	addc	r4,r4,r9
-	addze	r3,r3
+	addze	r3,r10
 	blr
 
 /*
-- 
2.1.0



* Re: [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline
  2015-09-22 16:51 ` [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
@ 2015-09-22 17:57   ` Joakim Tjernlund
  2015-09-29  0:23   ` Scott Wood
  1 sibling, 0 replies; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 17:57 UTC (permalink / raw)
  To: christophe.leroy, paulus, mpe, benh, scottwood; +Cc: linuxppc-dev, linux-kernel

Hi Christophe

Really nice patchset!

On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> clear_pages() is never used, and PPC32 is the only architecture
> (still) having this function. Neither PPC64 nor any other
> architecture has it.
> 
> This patch removes clear_page() and move clear_page() function
> inline (same as PPC64) as it only is a few isns
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> No change in v2
> 
>  arch/powerpc/include/asm/page_32.h | 17 ++++++++++++++---
>  arch/powerpc/kernel/misc_32.S      | 16 ----------------
>  arch/powerpc/kernel/ppc_ksyms_32.c |  1 -
>  3 files changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
> index 68d73b2..6a8e179 100644
> --- a/arch/powerpc/include/asm/page_32.h
> +++ b/arch/powerpc/include/asm/page_32.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_POWERPC_PAGE_32_H
>  #define _ASM_POWERPC_PAGE_32_H
>  
> +#include <asm/cache.h>
> +
>  #if defined(CONFIG_PHYSICAL_ALIGN) && (CONFIG_PHYSICAL_START != 0)
>  #if (CONFIG_PHYSICAL_START % CONFIG_PHYSICAL_ALIGN) != 0
>  #error "CONFIG_PHYSICAL_START must be a multiple of CONFIG_PHYSICAL_ALIGN"
> @@ -36,9 +38,18 @@ typedef unsigned long long pte_basic_t;
>  typedef unsigned long pte_basic_t;
>  #endif
>  
> -struct page;
> -extern void clear_pages(void *page, int order);
> -static inline void clear_page(void *page) { clear_pages(page, 0); }
> +/*
> + * Clear page using the dcbz instruction, which doesn't cause any
> + * memory traffic (except to write out any cache lines which get
> + * displaced).  This only works on cacheable memory.
> + */
> +static inline void clear_page(void *addr)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < PAGE_SIZE / L1_CACHE_BYTES; i++, addr += L1_CACHE_BYTES)
> +		dcbz(addr);
> +}

Does gcc manage to transform this into efficient asm?
Otherwise you could help gcc by using do { .. } while(--i); instead.
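
Something along these lines (untested sketch):

	static inline void clear_page(void *addr)
	{
		unsigned int i = PAGE_SIZE / L1_CACHE_BYTES;

		do {
			dcbz(addr);
			addr += L1_CACHE_BYTES;
		} while (--i);
	}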


* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 16:51 ` [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
@ 2015-09-22 18:12   ` Joakim Tjernlund
  2015-09-22 18:58     ` Scott Wood
  2015-09-29  0:29   ` Scott Wood
  1 sibling, 1 reply; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 18:12 UTC (permalink / raw)
  To: christophe.leroy, paulus, mpe, benh, scottwood; +Cc: linuxppc-dev, linux-kernel

On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> flush/clean/invalidate _dcache_range() functions are all very
> similar and are quite short. They are mainly used in __dma_sync()
> perf_event locate them in the top 3 consumming functions during
> heavy ethernet activity
> 
> They are good candidate for inlining, as __dma_sync() does
> almost nothing but calling them
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> New in v2
> 
>  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
>  arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
>  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
>  3 files changed, 54 insertions(+), 68 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index 6229e6b..6169604 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
>  }
>  #endif
>  
> -extern void flush_dcache_range(unsigned long start, unsigned long stop);
>  #ifdef CONFIG_PPC32
> -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
> +/*
> + * Write any modified data cache blocks out to memory and invalidate them.
> + * Does not invalidate the corresponding instruction cache blocks.
> + */
> +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> +{
> +	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> +	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> +	unsigned int i;
> +
> +	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> +		dcbf(addr);
> +	if (i)
> +		mb();	/* sync */
> +}

This feels optimized for the uncommon case when there is no invalidation.
I THINK it would be better to bail early and use do { .. } while(--i); instead.
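
That is, something like (untested sketch):

	static inline void flush_dcache_range(unsigned long start, unsigned long stop)
	{
		void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
		unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
		unsigned int i = size >> L1_CACHE_SHIFT;

		if (!i)		/* bail early when there is nothing to flush */
			return;
		do {
			dcbf(addr);
			addr += L1_CACHE_BYTES;
		} while (--i);
		mb();	/* sync */
	}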

 Jocke


* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 18:12   ` Joakim Tjernlund
@ 2015-09-22 18:58     ` Scott Wood
  2015-09-22 19:34       ` Joakim Tjernlund
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-22 18:58 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: christophe.leroy, paulus, mpe, benh, linuxppc-dev, linux-kernel

On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > flush/clean/invalidate _dcache_range() functions are all very
> > similar and are quite short. They are mainly used in __dma_sync()
> > perf_event locate them in the top 3 consumming functions during
> > heavy ethernet activity
> > 
> > They are good candidate for inlining, as __dma_sync() does
> > almost nothing but calling them
> > 
> > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > ---
> > New in v2
> > 
> >  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
> >  arch/powerpc/kernel/misc_32.S         | 65 ------------------------------
> > -----
> >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> >  3 files changed, 54 insertions(+), 68 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/cacheflush.h 
> > b/arch/powerpc/include/asm/cacheflush.h
> > index 6229e6b..6169604 100644
> > --- a/arch/powerpc/include/asm/cacheflush.h
> > +++ b/arch/powerpc/include/asm/cacheflush.h
> > @@ -47,12 +47,61 @@ static inline void 
> > __flush_dcache_icache_phys(unsigned long physaddr)
> >  }
> >  #endif
> >  
> > -extern void flush_dcache_range(unsigned long start, unsigned long stop);
> >  #ifdef CONFIG_PPC32
> > -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> > -extern void invalidate_dcache_range(unsigned long start, unsigned long 
> > stop);
> > +/*
> > + * Write any modified data cache blocks out to memory and invalidate 
> > them.
> > + * Does not invalidate the corresponding instruction cache blocks.
> > + */
> > +static inline void flush_dcache_range(unsigned long start, unsigned long 
> > stop)
> > +{
> > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> > +   unsigned int i;
> > +
> > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> > +           dcbf(addr);
> > +   if (i)
> > +           mb();   /* sync */
> > +}
> 
> This feels optimized for the uncommon case when there is no invalidation.

If you mean the "if (i)", yes, that looks odd.

> I THINK it would be better to bail early 

Bail under what conditions?

> and use do { .. } while(--i); instead.

GCC knows how to optimize loops.  Please don't make them less readable.

-Scott



* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 18:58     ` Scott Wood
@ 2015-09-22 19:34       ` Joakim Tjernlund
  2015-09-22 19:42         ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 19:34 UTC (permalink / raw)
  To: scottwood; +Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > flush/clean/invalidate _dcache_range() functions are all very
> > > similar and are quite short. They are mainly used in __dma_sync()
> > > perf_event locate them in the top 3 consumming functions during
> > > heavy ethernet activity
> > > 
> > > They are good candidate for inlining, as __dma_sync() does
> > > almost nothing but calling them
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > ---
> > > New in v2
> > > 
> > >  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
> > >  arch/powerpc/kernel/misc_32.S         | 65 ------------------------------
> > > -----
> > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/include/asm/cacheflush.h 
> > > b/arch/powerpc/include/asm/cacheflush.h
> > > index 6229e6b..6169604 100644
> > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > @@ -47,12 +47,61 @@ static inline void 
> > > __flush_dcache_icache_phys(unsigned long physaddr)
> > >  }
> > >  #endif
> > >  
> > > -extern void flush_dcache_range(unsigned long start, unsigned long stop);
> > >  #ifdef CONFIG_PPC32
> > > -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> > > -extern void invalidate_dcache_range(unsigned long start, unsigned long 
> > > stop);
> > > +/*
> > > + * Write any modified data cache blocks out to memory and invalidate 
> > > them.
> > > + * Does not invalidate the corresponding instruction cache blocks.
> > > + */
> > > +static inline void flush_dcache_range(unsigned long start, unsigned long 
> > > stop)
> > > +{
> > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> > > +   unsigned int i;
> > > +
> > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> > > +           dcbf(addr);
> > > +   if (i)
> > > +           mb();   /* sync */
> > > +}
> > 
> > This feels optimized for the uncommon case when there is no invalidation.
> 
> If you mean the "if (i)", yes, that looks odd.

Yes.

> 
> > I THINK it would be better to bail early 
> 
> Bail under what conditions?

test for "i = 0" and return. 

> 
> > and use do { .. } while(--i); instead.
> 
> GCC knows how to optimize loops.  Please don't make them less readable.

Been a while since I checked, but it used to be bad at transforming
post-inc to pre-inc/dec.
I remain unconvinced until I have seen it.

 Jocke


* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 19:34       ` Joakim Tjernlund
@ 2015-09-22 19:42         ` Scott Wood
  2015-09-22 19:55           ` Joakim Tjernlund
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-22 19:42 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
> On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> > On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > > flush/clean/invalidate _dcache_range() functions are all very
> > > > similar and are quite short. They are mainly used in __dma_sync()
> > > > perf_event locate them in the top 3 consumming functions during
> > > > heavy ethernet activity
> > > > 
> > > > They are good candidate for inlining, as __dma_sync() does
> > > > almost nothing but calling them
> > > > 
> > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > ---
> > > > New in v2
> > > > 
> > > >  arch/powerpc/include/asm/cacheflush.h | 55 
> > > > +++++++++++++++++++++++++++--
> > > >  arch/powerpc/kernel/misc_32.S         | 65 --------------------------
> > > > ----
> > > > -----
> > > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/include/asm/cacheflush.h 
> > > > b/arch/powerpc/include/asm/cacheflush.h
> > > > index 6229e6b..6169604 100644
> > > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > > @@ -47,12 +47,61 @@ static inline void 
> > > > __flush_dcache_icache_phys(unsigned long physaddr)
> > > >  }
> > > >  #endif
> > > >  
> > > > -extern void flush_dcache_range(unsigned long start, unsigned long 
> > > > stop);
> > > >  #ifdef CONFIG_PPC32
> > > > -extern void clean_dcache_range(unsigned long start, unsigned long 
> > > > stop);
> > > > -extern void invalidate_dcache_range(unsigned long start, unsigned 
> > > > long 
> > > > stop);
> > > > +/*
> > > > + * Write any modified data cache blocks out to memory and invalidate 
> > > > them.
> > > > + * Does not invalidate the corresponding instruction cache blocks.
> > > > + */
> > > > +static inline void flush_dcache_range(unsigned long start, unsigned 
> > > > long 
> > > > stop)
> > > > +{
> > > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES -
> > > > 1);
> > > > +   unsigned int i;
> > > > +
> > > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += 
> > > > L1_CACHE_BYTES)
> > > > +           dcbf(addr);
> > > > +   if (i)
> > > > +           mb();   /* sync */
> > > > +}
> > > 
> > > This feels optimized for the uncommon case when there is no 
> > > invalidation.
> > 
> > If you mean the "if (i)", yes, that looks odd.
> 
> Yes.
> 
> > 
> > > I THINK it would be better to bail early 
> > 
> > Bail under what conditions?
> 
> test for "i = 0" and return. 

Why bother?

> 
> > 
> > > and use do { .. } while(--i); instead.
> > 
> > GCC knows how to optimize loops.  Please don't make them less readable.
> 
> Been a while since I checked but it used to be bad att transforming post 
> inc to pre inc/dec
> I remain unconvinced until I have seen it.

I would expect it to use bdnz for this loop, as the loop variable isn't 
referenced in the loop body.

And generally the one proposing uglification-for-optimization should provide 
the evidence. :-)

-Scott



* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 19:42         ` Scott Wood
@ 2015-09-22 19:55           ` Joakim Tjernlund
  2015-09-22 20:07             ` Joakim Tjernlund
  2015-09-22 20:14             ` Scott Wood
  0 siblings, 2 replies; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 19:55 UTC (permalink / raw)
  To: scottwood; +Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
> On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
> > On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> > > On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > > > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > > > flush/clean/invalidate _dcache_range() functions are all very
> > > > > similar and are quite short. They are mainly used in __dma_sync()
> > > > > perf_event locate them in the top 3 consumming functions during
> > > > > heavy ethernet activity
> > > > > 
> > > > > They are good candidate for inlining, as __dma_sync() does
> > > > > almost nothing but calling them
> > > > > 
> > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > > ---
> > > > > New in v2
> > > > > 
> > > > >  arch/powerpc/include/asm/cacheflush.h | 55 
> > > > > +++++++++++++++++++++++++++--
> > > > >  arch/powerpc/kernel/misc_32.S         | 65 --------------------------
> > > > > ----
> > > > > -----
> > > > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > > > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > > > 
> > > > > diff --git a/arch/powerpc/include/asm/cacheflush.h 
> > > > > b/arch/powerpc/include/asm/cacheflush.h
> > > > > index 6229e6b..6169604 100644
> > > > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > > > @@ -47,12 +47,61 @@ static inline void 
> > > > > __flush_dcache_icache_phys(unsigned long physaddr)
> > > > >  }
> > > > >  #endif
> > > > >  
> > > > > -extern void flush_dcache_range(unsigned long start, unsigned long 
> > > > > stop);
> > > > >  #ifdef CONFIG_PPC32
> > > > > -extern void clean_dcache_range(unsigned long start, unsigned long 
> > > > > stop);
> > > > > -extern void invalidate_dcache_range(unsigned long start, unsigned 
> > > > > long 
> > > > > stop);
> > > > > +/*
> > > > > + * Write any modified data cache blocks out to memory and invalidate 
> > > > > them.
> > > > > + * Does not invalidate the corresponding instruction cache blocks.
> > > > > + */
> > > > > +static inline void flush_dcache_range(unsigned long start, unsigned 
> > > > > long 
> > > > > stop)
> > > > > +{
> > > > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > > > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES -
> > > > > 1);
> > > > > +   unsigned int i;
> > > > > +
> > > > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += 
> > > > > L1_CACHE_BYTES)
> > > > > +           dcbf(addr);
> > > > > +   if (i)
> > > > > +           mb();   /* sync */
> > > > > +}
> > > > 
> > > > This feels optimized for the uncommon case when there is no 
> > > > invalidation.
> > > 
> > > If you mean the "if (i)", yes, that looks odd.
> > 
> > Yes.
> > 
> > > 
> > > > I THINK it would be better to bail early 
> > > 
> > > Bail under what conditions?
> > 
> > test for "i = 0" and return. 
> 
> Why bother?

I usually find it better to deal with special cases upfront so the rest
doesn't need to bother. i == 0 is a NOP and it is clearer to show that
upfront.

> 
> > 
> > > 
> > > > and use do { .. } while(--i); instead.
> > > 
> > > GCC knows how to optimize loops.  Please don't make them less readable.
> > 
> > Been a while since I checked but it used to be bad att transforming post 
> > inc to pre inc/dec
> > I remain unconvinced until I have seen it.
> 
> I would expect it to use bdnz for this loop, as the loop variable isn't 
> referenced in the loop body.
> 
> And generally the one proposing uglification-for-optimization should provide 
> the evidence. :-)

When it comes to gcc, past history is my evidence until proven otherwise :)
Maybe I will check again ...


* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 19:55           ` Joakim Tjernlund
@ 2015-09-22 20:07             ` Joakim Tjernlund
  2015-09-22 20:14             ` Scott Wood
  1 sibling, 0 replies; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 20:07 UTC (permalink / raw)
  To: scottwood; +Cc: linuxppc-dev, linux-kernel, paulus


> > And generally the one proposing uglification-for-optimization should provide 
> > the evidence. :-)
> 
> When it comes to gcc, past history is my evidence until proven otherwise :)
> Maybe I will check again ...

OK then:
static inline void mb(void)
{
       __asm__ __volatile__ ("sync" : : : "memory");
}

static inline void dcbf(void *addr)
{
       __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}
#define L1_CACHE_SHIFT 5
#define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)
void flush_dcache_range(unsigned long start, unsigned long stop)
{
       void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
       unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
       unsigned int i;

       for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
               dcbf(addr);
       if (i)
               mb();   /* sync */
}

gives:
flush_dcache_range:
	stwu %r1,-16(%r1)
	rlwinm %r3,%r3,0,0,26
	addi %r4,%r4,31
	subf %r9,%r3,%r4
	srwi. %r10,%r9,5
	beq %cr0,.L1
	mtctr %r10
	.p2align 4,,15
.L4:
#APP
 # 8 "gccloop.c" 1
	dcbf 0, %r3
 # 0 "" 2
#NO_APP
	addi %r3,%r3,32
	bdnz .L4
#APP
 # 3 "gccloop.c" 1
	sync
 # 0 "" 2
#NO_APP
.L1:
	addi %r1,%r1,16
	blr

good enough :)


* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 19:55           ` Joakim Tjernlund
  2015-09-22 20:07             ` Joakim Tjernlund
@ 2015-09-22 20:14             ` Scott Wood
  2015-09-22 20:32               ` Joakim Tjernlund
  1 sibling, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-22 20:14 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 19:55 +0000, Joakim Tjernlund wrote:
> On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
> > On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
> > > On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> > > > On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > > > > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > > > > flush/clean/invalidate _dcache_range() functions are all very
> > > > > > similar and are quite short. They are mainly used in __dma_sync()
> > > > > > perf_event locate them in the top 3 consumming functions during
> > > > > > heavy ethernet activity
> > > > > > 
> > > > > > They are good candidate for inlining, as __dma_sync() does
> > > > > > almost nothing but calling them
> > > > > > 
> > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > > > ---
> > > > > > New in v2
> > > > > > 
> > > > > >  arch/powerpc/include/asm/cacheflush.h | 55 
> > > > > > +++++++++++++++++++++++++++--
> > > > > >  arch/powerpc/kernel/misc_32.S         | 65 ----------------------
> > > > > > ----
> > > > > > ----
> > > > > > -----
> > > > > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > > > > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/powerpc/include/asm/cacheflush.h 
> > > > > > b/arch/powerpc/include/asm/cacheflush.h
> > > > > > index 6229e6b..6169604 100644
> > > > > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > > > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > > > > @@ -47,12 +47,61 @@ static inline void 
> > > > > > __flush_dcache_icache_phys(unsigned long physaddr)
> > > > > >  }
> > > > > >  #endif
> > > > > >  
> > > > > > -extern void flush_dcache_range(unsigned long start, unsigned 
> > > > > > long 
> > > > > > stop);
> > > > > >  #ifdef CONFIG_PPC32
> > > > > > -extern void clean_dcache_range(unsigned long start, unsigned 
> > > > > > long 
> > > > > > stop);
> > > > > > -extern void invalidate_dcache_range(unsigned long start, 
> > > > > > unsigned 
> > > > > > long 
> > > > > > stop);
> > > > > > +/*
> > > > > > + * Write any modified data cache blocks out to memory and 
> > > > > > invalidate 
> > > > > > them.
> > > > > > + * Does not invalidate the corresponding instruction cache 
> > > > > > blocks.
> > > > > > + */
> > > > > > +static inline void flush_dcache_range(unsigned long start, 
> > > > > > unsigned 
> > > > > > long 
> > > > > > stop)
> > > > > > +{
> > > > > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > > > > +   unsigned int size = stop - (unsigned long)addr + 
> > > > > > (L1_CACHE_BYTES -
> > > > > > 1);
> > > > > > +   unsigned int i;
> > > > > > +
> > > > > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += 
> > > > > > L1_CACHE_BYTES)
> > > > > > +           dcbf(addr);
> > > > > > +   if (i)
> > > > > > +           mb();   /* sync */
> > > > > > +}
> > > > > 
> > > > > This feels optimized for the uncommon case when there is no 
> > > > > invalidation.
> > > > 
> > > > If you mean the "if (i)", yes, that looks odd.
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > I THINK it would be better to bail early 
> > > > 
> > > > Bail under what conditions?
> > > 
> > > test for "i = 0" and return. 
> > 
> > Why bother?
> 
> I usually find it better to deal with special cases upfront so the rest
> doesn't need to bother. i=0 is a NOP and it is clearer to show that upfront.

No, I mean why bother special casing this at all?

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 20:14             ` Scott Wood
@ 2015-09-22 20:32               ` Joakim Tjernlund
  2015-09-22 20:35                 ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 20:32 UTC (permalink / raw)
  To: scottwood; +Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 15:14 -0500, Scott Wood wrote:
> On Tue, 2015-09-22 at 19:55 +0000, Joakim Tjernlund wrote:
> > On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
> > > On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
> > > > On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> > > > > On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > > > > > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > > > > > flush/clean/invalidate _dcache_range() functions are all very
> > > > > > > similar and are quite short. They are mainly used in __dma_sync();
> > > > > > > perf_event locates them in the top 3 consuming functions during
> > > > > > > heavy ethernet activity.
> > > > > > > 
> > > > > > > They are good candidates for inlining, as __dma_sync() does
> > > > > > > almost nothing but call them.
> > > > > > > 
> > > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > > > > ---
> > > > > > > New in v2
> > > > > > > 
> > > > > > >  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
> > > > > > >  arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
> > > > > > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > > > > > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> > > > > > > index 6229e6b..6169604 100644
> > > > > > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > > > > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > > > > > @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
> > > > > > >  }
> > > > > > >  #endif
> > > > > > >  
> > > > > > > -extern void flush_dcache_range(unsigned long start, unsigned long stop);
> > > > > > >  #ifdef CONFIG_PPC32
> > > > > > > -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > +/*
> > > > > > > + * Write any modified data cache blocks out to memory and invalidate them.
> > > > > > > + * Does not invalidate the corresponding instruction cache blocks.
> > > > > > > + */
> > > > > > > +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> > > > > > > +{
> > > > > > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > > > > > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> > > > > > > +   unsigned int i;
> > > > > > > +
> > > > > > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> > > > > > > +           dcbf(addr);
> > > > > > > +   if (i)
> > > > > > > +           mb();   /* sync */
> > > > > > > +}
> > > > > > 
> > > > > > This feels optimized for the uncommon case when there is no 
> > > > > > invalidation.
> > > > > 
> > > > > If you mean the "if (i)", yes, that looks odd.
> > > > 
> > > > Yes.
> > > > 
> > > > > 
> > > > > > I THINK it would be better to bail early 
> > > > > 
> > > > > Bail under what conditions?
> > > > 
> > > > test for "i = 0" and return. 
> > > 
> > > Why bother?
> > 
> > I usually find it better to deal with special cases upfront so the rest
> > doesn't need to bother. i=0 is a NOP and it is clearer to show that upfront.
> 
> No, I mean why bother special casing this at all?

I just said why.
You too found the if(i) mb() a bit odd and it took a little time to see why it was there.
Ahh, you mean just skip the if(i) and have mb() unconditionally?
That changes the semantics a little from the ASM version but perhaps that doesn't matter?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 20:32               ` Joakim Tjernlund
@ 2015-09-22 20:35                 ` Scott Wood
  2015-09-22 20:38                   ` Joakim Tjernlund
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-22 20:35 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 20:32 +0000, Joakim Tjernlund wrote:
> On Tue, 2015-09-22 at 15:14 -0500, Scott Wood wrote:
> > On Tue, 2015-09-22 at 19:55 +0000, Joakim Tjernlund wrote:
> > > On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
> > > > On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
> > > > > On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> > > > > > On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > > > > > > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > > > > > > flush/clean/invalidate _dcache_range() functions are all very
> > > > > > > > similar and are quite short. They are mainly used in __dma_sync();
> > > > > > > > perf_event locates them in the top 3 consuming functions during
> > > > > > > > heavy ethernet activity.
> > > > > > > > 
> > > > > > > > They are good candidates for inlining, as __dma_sync() does
> > > > > > > > almost nothing but call them.
> > > > > > > > 
> > > > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > > > > > ---
> > > > > > > > New in v2
> > > > > > > > 
> > > > > > > >  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
> > > > > > > >  arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
> > > > > > > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > > > > > > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> > > > > > > > index 6229e6b..6169604 100644
> > > > > > > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > > > > > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > > > > > > @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > >  
> > > > > > > > -extern void flush_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > >  #ifdef CONFIG_PPC32
> > > > > > > > -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > > -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > > +/*
> > > > > > > > + * Write any modified data cache blocks out to memory and invalidate them.
> > > > > > > > + * Does not invalidate the corresponding instruction cache blocks.
> > > > > > > > + */
> > > > > > > > +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> > > > > > > > +{
> > > > > > > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > > > > > > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> > > > > > > > +   unsigned int i;
> > > > > > > > +
> > > > > > > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> > > > > > > > +           dcbf(addr);
> > > > > > > > +   if (i)
> > > > > > > > +           mb();   /* sync */
> > > > > > > > +}
> > > > > > > 
> > > > > > > This feels optimized for the uncommon case when there is no 
> > > > > > > invalidation.
> > > > > > 
> > > > > > If you mean the "if (i)", yes, that looks odd.
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > 
> > > > > > > I THINK it would be better to bail early 
> > > > > > 
> > > > > > Bail under what conditions?
> > > > > 
> > > > > test for "i = 0" and return. 
> > > > 
> > > > Why bother?
> > > 
> > > I usually find it better to deal with special cases upfront so the rest
> > > doesn't need to bother. i=0 is a NOP and it is clearer to show that upfront.
> > 
> > No, I mean why bother special casing this at all?
> 
> I just said why.
> You too found the if(i) mb() a bit odd and it took a little time to see
> why it was there.
> Ahh, you mean just skip the if(i) and have mb() unconditionally?

Yes.

> That changes the semantics a little from the ASM version but perhaps that 
> doesn't matter?

Adding more barriers than strictly necessary is a performance concern, not a 
semantic change.  How often are we really calling this function over an empty 
range?

Not that it matters much one way or another...

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 20:35                 ` Scott Wood
@ 2015-09-22 20:38                   ` Joakim Tjernlund
  2015-09-22 20:57                     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Joakim Tjernlund @ 2015-09-22 20:38 UTC (permalink / raw)
  To: scottwood; +Cc: christophe.leroy, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 15:35 -0500, Scott Wood wrote:
> On Tue, 2015-09-22 at 20:32 +0000, Joakim Tjernlund wrote:
> > On Tue, 2015-09-22 at 15:14 -0500, Scott Wood wrote:
> > > On Tue, 2015-09-22 at 19:55 +0000, Joakim Tjernlund wrote:
> > > > On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
> > > > > On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
> > > > > > On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
> > > > > > > On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
> > > > > > > > On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
> > > > > > > > > flush/clean/invalidate _dcache_range() functions are all very
> > > > > > > > > similar and are quite short. They are mainly used in __dma_sync();
> > > > > > > > > perf_event locates them in the top 3 consuming functions during
> > > > > > > > > heavy ethernet activity.
> > > > > > > > > 
> > > > > > > > > They are good candidates for inlining, as __dma_sync() does
> > > > > > > > > almost nothing but call them.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > > > > > > ---
> > > > > > > > > New in v2
> > > > > > > > > 
> > > > > > > > >  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
> > > > > > > > >  arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
> > > > > > > > >  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > > > > > > > >  3 files changed, 54 insertions(+), 68 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> > > > > > > > > index 6229e6b..6169604 100644
> > > > > > > > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > > > > > > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > > > > > > > @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
> > > > > > > > >  }
> > > > > > > > >  #endif
> > > > > > > > >  
> > > > > > > > > -extern void flush_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > > >  #ifdef CONFIG_PPC32
> > > > > > > > > -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > > > -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
> > > > > > > > > +/*
> > > > > > > > > + * Write any modified data cache blocks out to memory and invalidate them.
> > > > > > > > > + * Does not invalidate the corresponding instruction cache blocks.
> > > > > > > > > + */
> > > > > > > > > +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> > > > > > > > > +{
> > > > > > > > > +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > > > > > > > +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> > > > > > > > > +   unsigned int i;
> > > > > > > > > +
> > > > > > > > > +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> > > > > > > > > +           dcbf(addr);
> > > > > > > > > +   if (i)
> > > > > > > > > +           mb();   /* sync */
> > > > > > > > > +}
> > > > > > > > 
> > > > > > > > This feels optimized for the uncommon case when there is no 
> > > > > > > > invalidation.
> > > > > > > 
> > > > > > > If you mean the "if (i)", yes, that looks odd.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > 
> > > > > > > > I THINK it would be better to bail early 
> > > > > > > 
> > > > > > > Bail under what conditions?
> > > > > > 
> > > > > > test for "i = 0" and return. 
> > > > > 
> > > > > Why bother?
> > > > 
> > > > I usually find it better to deal with special cases upfront so the rest
> > > > doesn't need to bother. i=0 is a NOP and it is clearer to show that upfront.
> > > 
> > > No, I mean why bother special casing this at all?
> > 
> > I just said why.
> > You too found the if(i) mb() a bit odd and it took a little time to see
> > why it was there.
> > Ahh, you mean just skip the if(i) and have mb() unconditionally?
> 
> Yes.
> 
> > That changes the semantics a little from the ASM version but perhaps that 
> > doesn't matter?
> 
> Adding more barriers than strictly necessary is a performance concern, not a 
> semantic change.
Semantics :)

>   How often are we really calling this function over an empty 
> range?

Hopefully never, it does not make much sense.

> 
> Not that it matters much one way or another...
probably not.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 20:38                   ` Joakim Tjernlund
@ 2015-09-22 20:57                     ` Christophe Leroy
  2015-09-22 22:34                       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 20:57 UTC (permalink / raw)
  To: Joakim Tjernlund, scottwood; +Cc: paulus, mpe, benh, linux-kernel, linuxppc-dev



Le 22/09/2015 22:38, Joakim Tjernlund a écrit :
> On Tue, 2015-09-22 at 15:35 -0500, Scott Wood wrote:
>> On Tue, 2015-09-22 at 20:32 +0000, Joakim Tjernlund wrote:
>>> On Tue, 2015-09-22 at 15:14 -0500, Scott Wood wrote:
>>>> On Tue, 2015-09-22 at 19:55 +0000, Joakim Tjernlund wrote:
>>>>> On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
>>>>>> On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
>>>>>>> On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
>>>>>>>> On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
>>>>>>>>> On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
>>>>>>>>>> flush/clean/invalidate _dcache_range() functions are all very
>>>>>>>>>> similar and are quite short. They are mainly used in __dma_sync();
>>>>>>>>>> perf_event locates them in the top 3 consuming functions during
>>>>>>>>>> heavy ethernet activity.
>>>>>>>>>>
>>>>>>>>>> They are good candidates for inlining, as __dma_sync() does
>>>>>>>>>> almost nothing but call them.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>>>>>>>> ---
>>>>>>>>>> New in v2
>>>>>>>>>>
>>>>>>>>>>   arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
>>>>>>>>>>   arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
>>>>>>>>>>   arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
>>>>>>>>>>   3 files changed, 54 insertions(+), 68 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> index 6229e6b..6169604 100644
>>>>>>>>>> --- a/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> +++ b/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
>>>>>>>>>>   }
>>>>>>>>>>   #endif
>>>>>>>>>>   
>>>>>>>>>> -extern void flush_dcache_range(unsigned long start, unsigned long stop);
>>>>>>>>>>   #ifdef CONFIG_PPC32
>>>>>>>>>> -extern void clean_dcache_range(unsigned long start, unsigned long stop);
>>>>>>>>>> -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
>>>>>>>>>> +/*
>>>>>>>>>> + * Write any modified data cache blocks out to memory and invalidate them.
>>>>>>>>>> + * Does not invalidate the corresponding instruction cache blocks.
>>>>>>>>>> + */
>>>>>>>>>> +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
>>>>>>>>>> +{
>>>>>>>>>> +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
>>>>>>>>>> +   unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
>>>>>>>>>> +   unsigned int i;
>>>>>>>>>> +
>>>>>>>>>> +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
>>>>>>>>>> +           dcbf(addr);
>>>>>>>>>> +   if (i)
>>>>>>>>>> +           mb();   /* sync */
>>>>>>>>>> +}
>>>>>>>>> This feels optimized for the uncommon case when there is no
>>>>>>>>> invalidation.
>>>>>>>> If you mean the "if (i)", yes, that looks odd.
>>>>>>> Yes.
>>>>>>>
>>>>>>>>> I THINK it would be better to bail early
>>>>>>>> Bail under what conditions?
>>>>>>> test for "i = 0" and return.
>>>>>> Why bother?
>>>>> I usually find it better to deal with special cases upfront so the rest
>>>>> doesn't need to bother. i=0 is a NOP and it is clearer to show that upfront.
>>>> No, I mean why bother special casing this at all?
>>> I just said why.
>>> You too found the if(i) mb() a bit odd and it took a little time to see
>>> why it was there.
>>> Ahh, you mean just skip the if(i) and have mb() unconditionally?
>> Yes.
>>
>>> That changes the semantics a little from the ASM version but perhaps that
>>> doesn't matter?
>> Adding more barriers than strictly necessary is a performance concern, not a
>> semantic change.
> Semantics :)
>
>>    How often are we really calling this function over an empty
>> range?
> Hopefully never, it does not make much sense.
>
>> Not that it matters much one way or another...
> probably not.
>

Here is what I get in asm. The first one is with "if (i) mb();". We see gcc 
puts a beqlr. This is the form that is closest to what we had in the 
former misc_32.S.
The second one is with "mb()". Here we get a branch to sync for a useless sync.
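
The two test functions are essentially the flush_dcache_range() above;
roughly this sketch (dcbst and the 16-byte cache line are what the
generated code below shows, the names match the labels):

/* Variant 1: sync only when at least one cache line was flushed. */
void my_flush_dcache_range1(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbst(addr);
	if (i)
		mb();	/* sync */
}

/* Variant 2: identical loop, unconditional sync. */
void my_flush_dcache_range2(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbst(addr);
	mb();	/* sync */
}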

c000e0ac <my_flush_dcache_range1>:
c000e0ac:       54 63 00 36     rlwinm  r3,r3,0,0,27
c000e0b0:       38 84 00 0f     addi    r4,r4,15
c000e0b4:       7d 23 20 50     subf    r9,r3,r4
c000e0b8:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
c000e0bc:       4d 82 00 20     beqlr
c000e0c0:       7d 29 03 a6     mtctr   r9
c000e0c4:       7c 00 18 6c     dcbst   0,r3
c000e0c8:       38 63 00 10     addi    r3,r3,16
c000e0cc:       42 00 ff f8     bdnz    c000e0c4 <my_flush_dcache_range1+0x18>
c000e0d0:       7c 00 04 ac     sync
c000e0d4:       4e 80 00 20     blr

c000e0d8 <my_flush_dcache_range2>:
c000e0d8:       54 63 00 36     rlwinm  r3,r3,0,0,27
c000e0dc:       38 84 00 0f     addi    r4,r4,15
c000e0e0:       7d 23 20 50     subf    r9,r3,r4
c000e0e4:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
c000e0e8:       41 82 00 14     beq     c000e0fc <my_flush_dcache_range2+0x24>
c000e0ec:       7d 29 03 a6     mtctr   r9
c000e0f0:       7c 00 18 6c     dcbst   0,r3
c000e0f4:       38 63 00 10     addi    r3,r3,16
c000e0f8:       42 00 ff f8     bdnz    c000e0f0 <my_flush_dcache_range2+0x18>
c000e0fc:       7c 00 04 ac     sync
c000e100:       4e 80 00 20     blr

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 20:57                     ` Christophe Leroy
@ 2015-09-22 22:34                       ` Scott Wood
  2015-09-22 22:49                         ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-22 22:34 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Joakim Tjernlund, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Tue, 2015-09-22 at 22:57 +0200, Christophe Leroy wrote:
> Here is what I get in asm. The first one is with "if (i) mb();". We see gcc 
> puts a beqlr. This is the form that is closest to what we had in the 
> former misc_32.S.
> The second one is with "mb()". Here we get a branch to sync for a useless sync.

I was more concerned with keeping the code simple than the asm output.

> c000e0ac <my_flush_dcache_range1>:
> c000e0ac:       54 63 00 36     rlwinm  r3,r3,0,0,27
> c000e0b0:       38 84 00 0f     addi    r4,r4,15
> c000e0b4:       7d 23 20 50     subf    r9,r3,r4
> c000e0b8:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
> c000e0bc:       4d 82 00 20     beqlr
> c000e0c0:       7d 29 03 a6     mtctr   r9
> c000e0c4:       7c 00 18 6c     dcbst   0,r3
> c000e0c8:       38 63 00 10     addi    r3,r3,16
> c000e0cc:       42 00 ff f8     bdnz    c000e0c4 <my_flush_dcache_range1+0x18>
> c000e0d0:       7c 00 04 ac     sync
> c000e0d4:       4e 80 00 20     blr
> 
> c000e0d8 <my_flush_dcache_range2>:
> c000e0d8:       54 63 00 36     rlwinm  r3,r3,0,0,27
> c000e0dc:       38 84 00 0f     addi    r4,r4,15
> c000e0e0:       7d 23 20 50     subf    r9,r3,r4
> c000e0e4:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
> c000e0e8:       41 82 00 14     beq     c000e0fc <my_flush_dcache_range2+0x24>
> c000e0ec:       7d 29 03 a6     mtctr   r9
> c000e0f0:       7c 00 18 6c     dcbst   0,r3
> c000e0f4:       38 63 00 10     addi    r3,r3,16
> c000e0f8:       42 00 ff f8     bdnz    c000e0f0 <my_flush_dcache_range2+0x18>
> c000e0fc:       7c 00 04 ac     sync
> c000e100:       4e 80 00 20     blr

Who cares whether the case that should rarely if ever happen gets a beqlr or a
branch to sync+blr?

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 22:34                       ` Scott Wood
@ 2015-09-22 22:49                         ` Christophe Leroy
  2015-09-22 22:52                           ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-09-22 22:49 UTC (permalink / raw)
  To: Scott Wood
  Cc: Joakim Tjernlund, paulus, mpe, benh, linux-kernel, linuxppc-dev



Le 23/09/2015 00:34, Scott Wood a écrit :
> On Tue, 2015-09-22 at 22:57 +0200, Christophe Leroy wrote:
>> >Here is what I get in asm. The first one is with "if (i) mb();". We see gcc
>> >puts a beqlr. This is the form that is closest to what we had in the
>> >former misc_32.S.
>> >The second one is with "mb()". Here we get a branch to sync for a useless sync.
> I was more concerned with keeping the code simple than the asm output.
>
Right, but is that so complicated to say: if we did nothing in the loop, 
no need to sync?

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 22:49                         ` Christophe Leroy
@ 2015-09-22 22:52                           ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-09-22 22:52 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Joakim Tjernlund, paulus, mpe, benh, linux-kernel, linuxppc-dev

On Wed, 2015-09-23 at 00:49 +0200, Christophe Leroy wrote:
> Le 23/09/2015 00:34, Scott Wood a écrit :
> > On Tue, 2015-09-22 at 22:57 +0200, Christophe Leroy wrote:
> > > > Here is what I get in asm. The first one is with "if (i) mb();". We see
> > > > gcc puts a beqlr. This is the form that is closest to what we had in the
> > > > former misc_32.S.
> > > > The second one is with "mb()". Here we get a branch to sync for a useless
> > > > sync.
> > I was more concerned with keeping the code simple than the asm output.
> > 
> Right, but is that so complicated to say: if we did nothing in the loop, 
> no need to sync?

As I said, it doesn't matter very much.  I wouldn't put it in personally, but 
it's not worth a long discussion.

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address
  2015-09-22 16:50 ` [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
@ 2015-09-24 11:41   ` David Laight
  2015-09-24 20:14     ` Scott Wood
  2015-09-28 23:53   ` Scott Wood
  1 sibling, 1 reply; 76+ messages in thread
From: David Laight @ 2015-09-24 11:41 UTC (permalink / raw)
  To: 'Christophe Leroy',
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	scottwood
  Cc: linuxppc-dev, linux-kernel


From: Christophe Leroy
> Sent: 22 September 2015 17:51
...
> Traditionally, each driver manages one computer board which has its
> own components with its own memory maps.
> But on embedded chips like the MPC8xx, the SOC has all registers
> located in the same IO area.
> 
> When looking at ioremaps done during startup, we see that
> many drivers are re-mapping small parts of the IMMR for their own use
> and all those small pieces get their own 4k page, amplifying the
> number of TLB misses: in our system we get 0xff000000 mapped 31 times
> and 0xff003000 mapped 9 times.

Isn't this a more general problem?

If there are multiple remap requests for the same physical page
shouldn't the kernel be just increasing a reference count somewhere
and returning an address in the same virtual page?
This should probably happen regardless of the address.
I presume it must be done for cacheable mappings.
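
Purely as a hypothetical sketch of the idea (find_existing_mapping() and
struct remap_entry are invented for illustration, nothing like them is
claimed to exist today):

struct remap_entry {
	phys_addr_t	phys;
	void __iomem	*virt;
	unsigned long	size;
	unsigned int	refcount;
};

void __iomem *ioremap_shared(phys_addr_t pa, unsigned long size)
{
	/* Invented helper: find a live mapping covering [pa, pa+size) */
	struct remap_entry *e = find_existing_mapping(pa, size);

	if (e) {
		e->refcount++;		/* reuse it, no new 4k page */
		return e->virt + (pa - e->phys);
	}
	return ioremap(pa, size);	/* otherwise map as usual */
}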

Whether things like the IMMR should be mapped with a larger TLB
is a separate matter.

	David


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address
  2015-09-24 11:41   ` David Laight
@ 2015-09-24 20:14     ` Scott Wood
  2015-09-25 14:46       ` David Laight
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-24 20:14 UTC (permalink / raw)
  To: David Laight
  Cc: 'Christophe Leroy',
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel

On Thu, 2015-09-24 at 11:41 +0000, David Laight wrote:
> From: Christophe Leroy
> > Sent: 22 September 2015 17:51
> ...
> > Traditionally, each driver manages one computer board which has its
> > own components with its own memory maps.
> > But on embedded chips like the MPC8xx, the SOC has all registers
> > located in the same IO area.
> > 
> > When looking at ioremaps done during startup, we see that
> > many drivers are re-mapping small parts of the IMMR for their own use
> > and all those small pieces get their own 4k page, amplifying the
> > number of TLB misses: in our system we get 0xff000000 mapped 31 times
> > and 0xff003000 mapped 9 times.
> 
> Isn't this a more general problem?
> 
> If there are multiple remap requests for the same physical page
> shouldn't the kernel be just increasing a reference count somewhere
> and returning an address in the same virtual page?
> This should probably happen regardless of the address.
> I presume it must be done for cacheable mappings.

Why would you assume that?

-Scott



^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address
  2015-09-24 20:14     ` Scott Wood
@ 2015-09-25 14:46       ` David Laight
  2015-09-25 17:09         ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: David Laight @ 2015-09-25 14:46 UTC (permalink / raw)
  To: 'Scott Wood'
  Cc: 'Christophe Leroy',
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel


From: Scott Wood
> Sent: 24 September 2015 21:14
> > Isn't this a more general problem?
> >
> > If there are multiple remap requests for the same physical page
> > shouldn't the kernel be just increasing a reference count somewhere
> > and returning an address in the same virtual page?
> > This should probably happen regardless of the address.
> > I presume it must be done for cacheable mappings.
> 
> Why would you assume that?

Because 'really horrid (tm)' things happen on some cache
architectures if you map the same physical address to
multiple virtual addresses.

	David


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address
  2015-09-25 14:46       ` David Laight
@ 2015-09-25 17:09         ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-09-25 17:09 UTC (permalink / raw)
  To: David Laight
  Cc: 'Christophe Leroy',
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel

On Fri, 2015-09-25 at 14:46 +0000, David Laight wrote:
> From: Scott Wood
> > Sent: 24 September 2015 21:14
> > > Isn't this a more general problem?
> > > 
> > > If there are multiple remap requests for the same physical page
> > > shouldn't the kernel be just increasing a reference count somewhere
> > > and returning an address in the same virtual page?
> > > This should probably happen regardless of the address.
> > > I presume it must be done for cacheable mappings.
> > 
> > Why would you assume that?
> 
> Because 'really horrid (tm)' things happen on some cache
> architectures if you map the same physical address to
> multiple virtual addresses.

PPC is not such an architecture.

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-09-22 16:50 ` [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
@ 2015-09-28 22:07   ` Scott Wood
  2015-10-06 13:35     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-28 22:07 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> We are spending between 40 and 160 cycles with a mean of 65 cycles in
> the TLB handling routines (measured with mftbl), so make it simpler
> although it adds one instruction.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Does this just make it simpler or does it make it faster?  What is the
performance impact?  Is the performance impact seen with or without
CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
mtspr/mfspr combo in order to replace one mfspr.

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap
  2015-09-22 16:50 ` [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
@ 2015-09-28 23:39   ` Scott Wood
  2015-10-08 12:34     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-28 23:39 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:38PM +0200, Christophe Leroy wrote:
> Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
> 648K rodata, 508K init, 290K bss, 6644K reserved)
> Kernel virtual memory layout:
>   * 0xfffdf000..0xfffff000  : fixmap
>   * 0xfde00000..0xfe000000  : consistent mem
>   * 0xfddf6000..0xfde00000  : early ioremap
>   * 0xc9000000..0xfddf6000  : vmalloc & ioremap
> SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> 
> Mapping IMMR 1:1 is just wrong because it may overlap with another
> area. On most mpc8xx boards it is OK because IMMR is set to
> 0xff000000, but for instance on the EP88xC board, IMMR is at 0xfa200000,
> which overlaps with the VM ioremap area.
> 
> This patch fixes the virtual address for remapping IMMR to 0xff000000,
> regardless of the value of IMMR.
> 
> The size of the IMMR area is 256kbytes (CPM at offset 0, security engine
> at offset 128) so 512kbytes is enough and allows handling the EP88xC
> case (which is only 2Mbytes aligned, not 8Mbytes) the same way.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Instead of hardcoding 0xff000000, can you use asm/fixmap.h to allocate a
virtual address at compile time?
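
Something along these lines (just a sketch; FIX_IMMR_BASE is a made-up
entry name):

/* In asm/fixmap.h: an enum fixed_addresses entry reserves the virtual
 * space at compile time, and fix_to_virt() yields a constant address.
 */
enum fixed_addresses {
	FIX_HOLE,
	FIX_IMMR_BASE,		/* made-up anchor for the 512k IMMR area */
	__end_of_fixed_addresses
};

#define VIRT_IMMR_BASE	(fix_to_virt(FIX_IMMR_BASE))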

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either
  2015-09-22 16:50 ` [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either Christophe Leroy
@ 2015-09-28 23:41   ` Scott Wood
  2015-10-06 13:50     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-28 23:41 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:40PM +0200, Christophe Leroy wrote:
> iounmap() cannot vunmap() area mapped by TLBCAMs either
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> No change in v2
> 
>  arch/powerpc/mm/pgtable_32.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 7692d1b..03a073a 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -278,7 +278,9 @@ void iounmap(volatile void __iomem *addr)
>  	 * If mapped by BATs then there is nothing to do.
>  	 * Calling vfree() generates a benign warning.
>  	 */
> -	if (v_mapped_by_bats((unsigned long)addr)) return;
> +	if (v_mapped_by_bats((unsigned long)addr) ||
> +	    v_mapped_by_tlbcam((unsigned long)addr))
> +		return;

This is pretty pointless given that the next patch replaces both with
v_mapped_by_other().

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
  2015-09-22 16:50 ` [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
@ 2015-09-28 23:47   ` Scott Wood
  2015-10-06 14:02     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-28 23:47 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:
> x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
> purpose, so lets group them into a single function.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> No change in v2
> 
>  arch/powerpc/mm/pgtable_32.c | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 03a073a..3fd9083 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
>  #define p_mapped_by_tlbcam(x)	(0UL)
>  #endif /* HAVE_TLBCAM */
>  
> +static inline unsigned long p_mapped_by_other(phys_addr_t pa)
> +{
> +	unsigned long v;
> +
> +	v = p_mapped_by_bats(pa);
> +	if (v /*&& p_mapped_by_bats(p+size-1)*/)
> +		return v;
> +
> +	return p_mapped_by_tlbcam(pa);
> +}

Did you forget to remove that comment?

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address
  2015-09-22 16:50 ` [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
  2015-09-24 11:41   ` David Laight
@ 2015-09-28 23:53   ` Scott Wood
  1 sibling, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-09-28 23:53 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:44PM +0200, Christophe Leroy wrote:

> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 3fd9083..1f2fdbc 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -49,6 +49,10 @@ EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
>  #define HAVE_TLBCAM	1
>  #endif
>  
> +#if CONFIG_PPC_8xx
> +#define HAVE_LTLB	1
> +#endif
> +
>  extern char etext[], _stext[];
>  
>  #ifdef HAVE_BATS
> @@ -67,6 +71,14 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
>  #define p_mapped_by_tlbcam(x)	(0UL)
>  #endif /* HAVE_TLBCAM */
>  
> +#ifdef HAVE_LTLB
> +phys_addr_t v_mapped_by_ltlb(unsigned long va);
> +unsigned long p_mapped_by_ltlb(phys_addr_t pa);
> +#else /* !HAVE_LTLB */
> +#define v_mapped_by_ltlb(x)	(0UL)
> +#define p_mapped_by_ltlb(x)	(0UL)
> +#endif /* HAVE_LTLB */
> +
>  static inline unsigned long p_mapped_by_other(phys_addr_t pa)
>  {
>  	unsigned long v;
> @@ -75,6 +87,10 @@ static inline unsigned long p_mapped_by_other(phys_addr_t pa)
>  	if (v /*&& p_mapped_by_bats(p+size-1)*/)
>  		return v;
>  
> +	v = p_mapped_by_ltlb(pa);
> +	if (v)
> +		return v;
> +
>  	return p_mapped_by_tlbcam(pa);
>  }
>  
> @@ -86,6 +102,10 @@ static inline phys_addr_t v_mapped_by_other(unsigned long va)
>  	if (p)
>  		return p;
>  
> +	p = v_mapped_by_ltlb(va);
> +	if (p)
> +		return p;
> +
>  	return v_mapped_by_tlbcam(va);
>  }

Since there is no kernel with more than one of {bats,ltlb,tlbcam} can we
just call it *_block_mapped() and have each subarch provide its
implementation (or stub) thereof?
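
I.e. something like this sketch:

/* Common hooks; each subarch provides real implementations or stubs
 * returning 0.
 */
phys_addr_t v_block_mapped(unsigned long va);
unsigned long p_block_mapped(phys_addr_t pa);

/* The 8xx versions would just wrap the LTLB lookups: */
phys_addr_t v_block_mapped(unsigned long va)
{
	return v_mapped_by_ltlb(va);
}

unsigned long p_block_mapped(phys_addr_t pa)
{
	return p_mapped_by_ltlb(pa);
}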

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup
  2015-09-22 16:50 ` [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup Christophe Leroy
@ 2015-09-28 23:58   ` Scott Wood
  2015-10-06 14:10     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-28 23:58 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:
> On recent kernels, with some debug options like for instance
> CONFIG_LOCKDEP, the BSS requires more than 8M memory, although
> the kernel code fits in the first 8M.
> Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
> at startup, although pinning the TLB is not necessary for that.
> 
> This patch adds a second 8M page to the initial mapping in order to
> have 16M mapped regardless of CONFIG_PIN_TLB, like several other
> 32-bit PPCs (40x, 601, ...)
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---

Is the assumption that nobody is still running 8xx systems with only 8
MiB RAM on current kernels?

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations
  2015-09-22 16:50 ` [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations Christophe Leroy
@ 2015-09-29  0:00   ` Scott Wood
  2015-10-06 14:12     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-29  0:00 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:
> We are spending between 40 and 160 cycles with a mean of 65 cycles
> in the TLB handling routines (measured with mftbl), so make it simpler
> although it adds one instruction.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> No change in v2
> 
>  arch/powerpc/kernel/head_8xx.S | 15 ++++-----------
>  1 file changed, 4 insertions(+), 11 deletions(-)

Why is this a separate patch from 1/25?

Same comments as on that patch.

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones
  2015-09-22 16:50 ` [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones Christophe Leroy
@ 2015-09-29  0:03   ` Scott Wood
  2015-10-06 14:35     ` Christophe Leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-29  0:03 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:
> Move 8xx SPRN defines into reg_8xx.h and add some missing ones
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> No change in v2

Why are they being moved?  Why are they being separated from the bit
definitions?

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline
  2015-09-22 16:51 ` [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
  2015-09-22 17:57   ` Joakim Tjernlund
@ 2015-09-29  0:23   ` Scott Wood
  1 sibling, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-09-29  0:23 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:51:09PM +0200, Christophe Leroy wrote:
> clear_pages() is never used, and PPC32 is the only architecture
> (still) having this function. Neither PPC64 nor any other
> architecture has it.

It is used, by clear_page().

> This patch removes clear_page() and move clear_page() function
> inline (same as PPC64) as it only is a few isns

It removes clear_pages(), not clear_page().
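
For reference, a minimal sketch of an inline clear_page() of the kind
being described (assuming the dcbz() helper from patch 21 and a
cacheable, cacheline-aligned page):

static inline void clear_page(void *addr)
{
	unsigned int i;

	/* dcbz zeroes a whole data cache block, one line per iteration */
	for (i = 0; i < PAGE_SIZE / L1_CACHE_BYTES; i++, addr += L1_CACHE_BYTES)
		dcbz(addr);
}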

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 21/25] powerpc: add inline functions for cache related instructions
  2015-09-22 16:51 ` [PATCH v2 21/25] powerpc: add inline functions for cache related instructions Christophe Leroy
@ 2015-09-29  0:25   ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-09-29  0:25 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:51:11PM +0200, Christophe Leroy wrote:
> This patch adds inline functions to use dcbz, dcbi, dcbf, dcbst
> from C functions
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> New in v2
> 
>  arch/powerpc/include/asm/cache.h | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)

This patch needs to come before patch 20/25, which uses dcbz().
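
For context, such helpers are one-instruction inline asm wrappers, along
these lines (a sketch, not necessarily the patch's exact code):

static inline void dcbf(void *addr)
{
	/* "memory" clobber: don't reorder accesses around the flush */
	__asm__ __volatile__ ("dcbf 0, %0" : : "r" (addr) : "memory");
}

static inline void dcbz(void *addr)
{
	__asm__ __volatile__ ("dcbz 0, %0" : : "r" (addr) : "memory");
}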

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-22 16:51 ` [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
  2015-09-22 18:12   ` Joakim Tjernlund
@ 2015-09-29  0:29   ` Scott Wood
  2015-10-07 12:49     ` Christophe Leroy
  1 sibling, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-09-29  0:29 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:51:13PM +0200, Christophe Leroy wrote:
> flush/clean/invalidate _dcache_range() functions are all very
> similar and are quite short. They are mainly used in __dma_sync();
> perf_event locates them in the top 3 consuming functions during
> heavy ethernet activity.
> 
> They are good candidates for inlining, as __dma_sync() does
> almost nothing but call them.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> New in v2
> 
>  arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
>  arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
>  arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
>  3 files changed, 54 insertions(+), 68 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index 6229e6b..6169604 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
>  }
>  #endif
>  
> -extern void flush_dcache_range(unsigned long start, unsigned long stop);
>  #ifdef CONFIG_PPC32
> -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
> +/*
> + * Write any modified data cache blocks out to memory and invalidate them.
> + * Does not invalidate the corresponding instruction cache blocks.
> + */
> +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> +{
> +	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> +	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> +	unsigned int i;
> +
> +	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> +		dcbf(addr);
> +	if (i)
> +		mb();	/* sync */
> +}

I know this is 32-bit-specific code, but it's still bad practice to use
"unsigned int" for addresses or sizes thereof.

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 14/25] powerpc32: remove ioremap_base
  2015-09-22 16:50 ` [PATCH v2 14/25] powerpc32: remove ioremap_base Christophe Leroy
@ 2015-09-29  0:38   ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-09-29  0:38 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, Sep 22, 2015 at 06:50:56PM +0200, Christophe Leroy wrote:
> ioremap_base is not initialised and is nowhere used so remove it
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> No change in v2

Could you also clean up the references to ioremap_base in the comments?

-Scott

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-09-28 22:07   ` Scott Wood
@ 2015-10-06 13:35     ` Christophe Leroy
  2015-10-06 16:39       ` Scott Wood
  2015-10-06 16:46       ` Scott Wood
  0 siblings, 2 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-10-06 13:35 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



Le 29/09/2015 00:07, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
>> We are spending between 40 and 160 cycles with a mean of 65 cycles in
>> the TLB handling routines (measured with mftbl), so make it simpler
>> although it adds one instruction.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Does this just make it simpler or does it make it faster?  What is the
> performance impact?  Is the performance impact seen with or without
> CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> mtspr/mfspr combo in order to replace one mfspr.
>
>
The performance impact is not noticeable. Theoretically it adds 1 cycle 
on a mean of 65 cycles, that is 1.5%. Even in the worst case, where we 
spend around 10% of the time in TLB handling exceptions, that represents 
only 0.15% of the total CPU time. So that's almost nothing.
Beyond making it simpler, the main reason is that I need a 
third register for the following patch in the set, otherwise I would 
spend more time saving and restoring CR several times.

Christophe


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either
  2015-09-28 23:41   ` Scott Wood
@ 2015-10-06 13:50     ` Christophe Leroy
  0 siblings, 0 replies; 76+ messages in thread
From: Christophe Leroy @ 2015-10-06 13:50 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev


Le 29/09/2015 01:41, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:40PM +0200, Christophe Leroy wrote:
>> iounmap() cannot vunmap() area mapped by TLBCAMs either
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> No change in v2
>>
>>   arch/powerpc/mm/pgtable_32.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
>> index 7692d1b..03a073a 100644
>> --- a/arch/powerpc/mm/pgtable_32.c
>> +++ b/arch/powerpc/mm/pgtable_32.c
>> @@ -278,7 +278,9 @@ void iounmap(volatile void __iomem *addr)
>>   	 * If mapped by BATs then there is nothing to do.
>>   	 * Calling vfree() generates a benign warning.
>>   	 */
>> -	if (v_mapped_by_bats((unsigned long)addr)) return;
>> +	if (v_mapped_by_bats((unsigned long)addr) ||
>> +	    v_mapped_by_tlbcam((unsigned long)addr))
>> +		return;
> This is pretty pointless given that the next patch replaces both with
> v_mapped_by_other().
>
>
I thought it was cleaner to fix the bug first, in order to keep the 
following patch straightforward, but I can skip it, no problem.

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
  2015-09-28 23:47   ` Scott Wood
@ 2015-10-06 14:02     ` Christophe Leroy
  2015-10-06 15:16       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-10-06 14:02 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



Le 29/09/2015 01:47, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:
>> x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
>> purpose, so lets group them into a single function.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> No change in v2
>>
>>   arch/powerpc/mm/pgtable_32.c | 33 ++++++++++++++++++++++++++-------
>>   1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
>> index 03a073a..3fd9083 100644
>> --- a/arch/powerpc/mm/pgtable_32.c
>> +++ b/arch/powerpc/mm/pgtable_32.c
>> @@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
>>   #define p_mapped_by_tlbcam(x)	(0UL)
>>   #endif /* HAVE_TLBCAM */
>>   
>> +static inline unsigned long p_mapped_by_other(phys_addr_t pa)
>> +{
>> +	unsigned long v;
>> +
>> +	v = p_mapped_by_bats(pa);
>> +	if (v /*&& p_mapped_by_bats(p+size-1)*/)
>> +		return v;
>> +
>> +	return p_mapped_by_tlbcam(pa);
>> +}
> Did you forget to remove that comment?
>
>
No I didn't; I thought it was there for a reason, it has been there since 
2005.
Do you think I should remove it?

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup
  2015-09-28 23:58   ` Scott Wood
@ 2015-10-06 14:10     ` Christophe Leroy
  2015-10-06 15:17       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-10-06 14:10 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



Le 29/09/2015 01:58, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:
>> On recent kernels, with some debug options like for instance
>> CONFIG_LOCKDEP, the BSS requires more than 8M memory, although
>> the kernel code fits in the first 8M.
>> Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
>> at startup, although pinning the TLB is not necessary for that.
>>
>> This patch adds a second 8M page to the initial mapping in order to
>> have 16M mapped regardless of CONFIG_PIN_TLB, like several other
>> 32-bit PPCs (40x, 601, ...)
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
> Is the assumption that nobody is still running 8xx systems with only 8
> MiB RAM on current kernels?
>
>
No, setup_initial_memory_limit() limits the memory to the minimum 
between 16M and the real memory size, so if a platform has only 8M, it 
will still be limited to 8M even with 16M mapped.
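
Roughly this (a sketch of the limiting logic; the min_t() form is an
assumption about the implementation):

void setup_initial_memory_limit(phys_addr_t first_memblock_base,
				phys_addr_t first_memblock_size)
{
	/* Early allocations must stay within what is both present and
	 * covered by the initial 16M mapping. */
	memblock_set_current_limit(min_t(phys_addr_t,
					 first_memblock_size, 16 << 20));
}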

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations
  2015-09-29  0:00   ` Scott Wood
@ 2015-10-06 14:12     ` Christophe Leroy
  2015-10-06 16:48       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-10-06 14:12 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



Le 29/09/2015 02:00, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:
>> We are spending between 40 and 160 cycles with a mean of 65 cycles
>> in the TLB handling routines (measured with mftbl), so make it simpler
>> although it adds one instruction.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> No change in v2
>>
>>   arch/powerpc/kernel/head_8xx.S | 15 ++++-----------
>>   1 file changed, 4 insertions(+), 11 deletions(-)
> Why is this a separate patch from 1/25?
>
> Same comments as on that patch.
>
>
Just because here there is no real need behind the simplification of the 
code, whereas the first one was a pre-requisite for the following patch.
Should I merge them together anyway?

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones
  2015-09-29  0:03   ` Scott Wood
@ 2015-10-06 14:35     ` Christophe Leroy
  2015-10-06 16:56       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-10-06 14:35 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



Le 29/09/2015 02:03, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:
>> Move 8xx SPRN defines into reg_8xx.h and add some missing ones
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> No change in v2
> Why are they being moved?  Why are they being separated from the bit
> definitions?
>
>
It was to keep asm/reg_8xx.h self-sufficient for the following patch.

Also because including asm/mmu-8xx.h creates a circular inclusion issue
(mmu-8xx.h needs page.h, which includes page-32.h; page-32.h includes
cache.h; cache.h includes reg.h, which includes reg_8xx.h). The cycle
starts with an inclusion of asm/cache.h by linux/cache.h, itself
included by linux/printk.h, and I end up with 'implicit declaration' issues.
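
To make the cycle concrete, this is the chain as I see it (a sketch of
the include graph, assuming reg_8xx.h pulled in mmu-8xx.h):

	linux/printk.h
	-> linux/cache.h
	  -> asm/cache.h
	    -> asm/reg.h
	      -> asm/reg_8xx.h
	        -> asm/mmu-8xx.h
	          -> asm/page.h
	            -> asm/page_32.h
	              -> asm/cache.h	(already being processed)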

How can I fix that?

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
  2015-10-06 14:02     ` Christophe Leroy
@ 2015-10-06 15:16       ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-06 15:16 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 16:02 +0200, Christophe Leroy wrote:
> On 29/09/2015 01:47, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:
> > > x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
> > > purpose, so lets group them into a single function.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > ---
> > > No change in v2
> > > 
> > >   arch/powerpc/mm/pgtable_32.c | 33 ++++++++++++++++++++++++++-------
> > >   1 file changed, 26 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> > > index 03a073a..3fd9083 100644
> > > --- a/arch/powerpc/mm/pgtable_32.c
> > > +++ b/arch/powerpc/mm/pgtable_32.c
> > > @@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
> > >   #define p_mapped_by_tlbcam(x)   (0UL)
> > >   #endif /* HAVE_TLBCAM */
> > >   
> > > +static inline unsigned long p_mapped_by_other(phys_addr_t pa)
> > > +{
> > > + unsigned long v;
> > > +
> > > + v = p_mapped_by_bats(pa);
> > > + if (v /*&& p_mapped_by_bats(p+size-1)*/)
> > > +         return v;
> > > +
> > > + return p_mapped_by_tlbcam(pa);
> > > +}
> > Did you forget to remove that comment?
> > 
> > 
> No, I didn't; I thought it was there for a reason, and it has been there
> since 2005.
> Do you think I should remove it?

Oh, you took it from __ioremap_caller.  Commented-out code is generally 
frowned upon, and it makes even less sense now because there's no "size" in 
p_mapped_by_other.
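
Dropped, the helper would just read like this (a sketch of the obvious
cleanup, untested):

static inline unsigned long p_mapped_by_other(phys_addr_t pa)
{
	unsigned long v = p_mapped_by_bats(pa);

	if (v)
		return v;
	return p_mapped_by_tlbcam(pa);
}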

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup
  2015-10-06 14:10     ` Christophe Leroy
@ 2015-10-06 15:17       ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-06 15:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 16:10 +0200, Christophe Leroy wrote:
> On 29/09/2015 01:58, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:
> > > On recent kernels, with some debug options such as CONFIG_LOCKDEP,
> > > the BSS requires more than 8M of memory, although the kernel code
> > > fits in the first 8M.
> > > Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
> > > at startup, although pinning the TLB is not necessary for that.
> > > 
> > > This patch adds a second 8M page to the initial mapping in order to
> > > have 16M mapped regardless of CONFIG_PIN_TLB, like several other
> > > 32-bit PPCs (40x, 601, ...)
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > ---
> > Is the assumption that nobody is still running 8xx systems with only 8
> > MiB RAM on current kernels?
> > 
> > 
> No, setup_initial_memory_limit() limits the memory to the minimum 
> between 16M and the real memory size, so if a platform has only 8M, it 
> will still be limited to 8M even with 16M mapped.

And you just hope you don't get a speculative fetch from the second 8M?

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-10-06 13:35     ` Christophe Leroy
@ 2015-10-06 16:39       ` Scott Wood
  2015-10-06 16:46       ` Scott Wood
  1 sibling, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-06 16:39 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> On 29/09/2015 00:07, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl) so make it
> > > simpler although it adds one instruction.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > Does this just make it simpler or does it make it faster?  What is the
> > performance impact?  Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
> > 
> > 
> The performance impact is not noticeable. Theoretically it adds 1 cycle
> on a mean of 65 cycles, that is 1.5%. Even in the worst case, where we
> spend around 10% of the time in TLB handling exceptions, that represents
> only 0.15% of the total CPU time. So that's almost nothing.
> Beyond making the code simpler, the main reason is that I need a
> third register for the following patch in the set; otherwise I would
> spend more time saving and restoring CR several times.

If you had said in the changelog that it was because future patches would 
need the register to be saved, we could have avoided this exchange...  
Especially with large patchsets, I review the patches one at a time.  Don't 
assume I know what's coming in patch n+1 (and especially not n+m) when I 
review patch n.

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-10-06 13:35     ` Christophe Leroy
  2015-10-06 16:39       ` Scott Wood
@ 2015-10-06 16:46       ` Scott Wood
  2015-10-06 20:30         ` christophe leroy
  1 sibling, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-10-06 16:46 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> On 29/09/2015 00:07, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl) so make it
> > > simpler although it adds one instruction.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > Does this just make it simpler or does it make it faster?  What is the
> > performance impact?  Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
> > 
> > 
> The performance impact is not noticeable. Theoretically it adds 1 cycle
> on a mean of 65 cycles, that is 1.5%. Even in the worst case, where we
> spend around 10% of the time in TLB handling exceptions, that represents
> only 0.15% of the total CPU time. So that's almost nothing.
> Beyond making the code simpler, the main reason is that I need a
> third register for the following patch in the set; otherwise I would
> spend more time saving and restoring CR several times.

FWIW, the added instruction is an SPR access and I doubt that's only one 
cycle.

-Scott



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations
  2015-10-06 14:12     ` Christophe Leroy
@ 2015-10-06 16:48       ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-06 16:48 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 16:12 +0200, Christophe Leroy wrote:
> On 29/09/2015 02:00, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles
> > > in the TLB handling routines (measured with mftbl) so make it
> > > simpler although it adds one instruction
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > ---
> > > No change in v2
> > > 
> > >   arch/powerpc/kernel/head_8xx.S | 15 ++++-----------
> > >   1 file changed, 4 insertions(+), 11 deletions(-)
> > Why is this a separate patch from 1/25?
> > 
> > Same comments as on that patch.
> > 
> > 
> Just because here there is no real need behind the simplification of the
> code, whereas the first one was a prerequisite for the following patch.
> Should I merge them together anyway?

If there's no real need, why do it?  It's not really a major readability 
enhancement...

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones
  2015-10-06 14:35     ` Christophe Leroy
@ 2015-10-06 16:56       ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-06 16:56 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 16:35 +0200, Christophe Leroy wrote:
> On 29/09/2015 02:03, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:
> > > Move 8xx SPRN defines into reg_8xx.h and add some missing ones
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > ---
> > > No change in v2
> > Why are they being moved?  Why are they being separated from the bit
> > definitions?
> > 
> > 
> It was to keep asm/reg_8xx.h self-sufficient for the following patch.

Again, it would have been nice if this were in the commit message.

> Also because including asm/mmu-8xx.h creates a circular inclusion issue
> (mmu-8xx.h needs page.h, which includes page-32.h; page-32.h includes
> cache.h; cache.h includes reg.h, which includes reg_8xx.h). The cycle
> starts with an inclusion of asm/cache.h by linux/cache.h, itself included
> by linux/printk.h, and I end up with 'implicit declaration' issues.
> 
> How can I fix that?

mmu-8xx.h should have been including page.h instead of assuming the caller
has done so...  but another option is to do what mmu-book3e.h does, and use
the kconfig symbols instead of PAGE_SHIFT.
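
The pattern in question is roughly this (quoted from memory and adapted
to the 8xx page sizes, not copied verbatim from mmu-book3e.h):

#if defined(CONFIG_PPC_4K_PAGES)
#define mmu_virtual_psize	MMU_PAGE_4K
#elif defined(CONFIG_PPC_16K_PAGES)
#define mmu_virtual_psize	MMU_PAGE_16K
#else
#error Unsupported page size
#endif

That way the header no longer depends on PAGE_SHIFT, so it no longer
needs page.h at all.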

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-10-06 16:46       ` Scott Wood
@ 2015-10-06 20:30         ` christophe leroy
  2015-10-06 20:38           ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: christophe leroy @ 2015-10-06 20:30 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



On 06/10/2015 18:46, Scott Wood wrote:
> On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
>> On 29/09/2015 00:07, Scott Wood wrote:
>>> On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
>>>> We are spending between 40 and 160 cycles with a mean of 65 cycles in
>>>> the TLB handling routines (measured with mftbl) so make it
>>>> simpler although it adds one instruction.
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>> Does this just make it simpler or does it make it faster?  What is the
>>> performance impact?  Is the performance impact seen with or without
>>> CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
>>> mtspr/mfspr combo in order to replace one mfspr.
>>>
>>>
>> The performance impact is not noticeable. Theoretically it adds 1 cycle
>> on a mean of 65 cycles, that is 1.5%. Even in the worst case, where we
>> spend around 10% of the time in TLB handling exceptions, that represents
>> only 0.15% of the total CPU time. So that's almost nothing.
>> Beyond making the code simpler, the main reason is that I need a
>> third register for the following patch in the set; otherwise I would
>> spend more time saving and restoring CR several times.
> FWIW, the added instruction is an SPR access and I doubt that's only one
> cycle.
>
>
According to the mpc885 reference manual (table 9-1), the instruction
execution timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR
and CTR and to SPRs external to the core" is "serialize + 1 cycle".
Taking into account that the preceding instructions are also 'mtspr', we are
already serialized, so it is only one cycle, I believe.
Am I interpreting it wrong?

Christophe



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
  2015-10-06 20:30         ` christophe leroy
@ 2015-10-06 20:38           ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-06 20:38 UTC (permalink / raw)
  To: christophe leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 2015-10-06 at 22:30 +0200, christophe leroy wrote:
> On 06/10/2015 18:46, Scott Wood wrote:
> > On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> > > On 29/09/2015 00:07, Scott Wood wrote:
> > > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > > > We are spending between 40 and 160 cycles with a mean of 65 cycles
> > > > > in the TLB handling routines (measured with mftbl) so make it
> > > > > simpler although it adds one instruction.
> > > > > 
> > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > Does this just make it simpler or does it make it faster?  What is the
> > > > performance impact?  Is the performance impact seen with or without
> > > > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > > > mtspr/mfspr combo in order to replace one mfspr.
> > > > 
> > > > 
> > > The performance impact is not noticeable. Theoretically it adds 1 cycle
> > > on a mean of 65 cycles, that is 1.5%. Even in the worst case, where we
> > > spend around 10% of the time in TLB handling exceptions, that represents
> > > only 0.15% of the total CPU time. So that's almost nothing.
> > > Beyond making the code simpler, the main reason is that I need a
> > > third register for the following patch in the set; otherwise I would
> > > spend more time saving and restoring CR several times.
> > FWIW, the added instruction is an SPR access and I doubt that's only one
> > cycle.
> > 
> > 
> According to the mpc885 reference manual (table 9-1), the instruction
> execution timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR
> and CTR and to SPRs external to the core" is "serialize + 1 cycle".
> Taking into account that the preceding instructions are also 'mtspr', we are
> already serialized, so it is only one cycle, I believe.
> Am I interpreting it wrong?

I don't know.  The manual doesn't go into much detail about the mechanics of 
serialization.  If it's just about "block[ing] all execution units" without 
any effect on fetching, decoding, etc. then maybe you're right.

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-09-29  0:29   ` Scott Wood
@ 2015-10-07 12:49     ` Christophe Leroy
  2015-10-08 19:12       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-10-07 12:49 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



On 29/09/2015 02:29, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:51:13PM +0200, Christophe Leroy wrote:
>> The flush/clean/invalidate _dcache_range() functions are all very
>> similar and quite short. They are mainly used in __dma_sync();
>> perf_event locates them among the top 3 consuming functions during
>> heavy Ethernet activity.
>>
>> They are good candidates for inlining, as __dma_sync() does
>> almost nothing but call them.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> New in v2
>>
>>   arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
>>   arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
>>   arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
>>   3 files changed, 54 insertions(+), 68 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
>> index 6229e6b..6169604 100644
>> --- a/arch/powerpc/include/asm/cacheflush.h
>> +++ b/arch/powerpc/include/asm/cacheflush.h
>> @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
>>   }
>>   #endif
>>   
>> -extern void flush_dcache_range(unsigned long start, unsigned long stop);
>>   #ifdef CONFIG_PPC32
>> -extern void clean_dcache_range(unsigned long start, unsigned long stop);
>> -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
>> +/*
>> + * Write any modified data cache blocks out to memory and invalidate them.
>> + * Does not invalidate the corresponding instruction cache blocks.
>> + */
>> +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
>> +{
>> +	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
>> +	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
>> +		dcbf(addr);
>> +	if (i)
>> +		mb();	/* sync */
>> +}
> I know this is 32-bit-specific code, but it's still bad practice to use
> "unsigned int" for addresses or sizes thereof.
>
>
Ok, I can fix size, but what about start and stop? If I change that, it
means I also have to fix all the callers. Do you expect me to do that?

And it is very unlikely, but what if for some reason someone wants to
invalidate the entire user address space, which is 3 Gbytes in size? A
signed size would be negative here.

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap
  2015-09-28 23:39   ` Scott Wood
@ 2015-10-08 12:34     ` Christophe Leroy
  2015-10-08 19:13       ` Scott Wood
  0 siblings, 1 reply; 76+ messages in thread
From: Christophe Leroy @ 2015-10-08 12:34 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



On 29/09/2015 01:39, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:38PM +0200, Christophe Leroy wrote:
>> Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
>> 648K rodata, 508K init, 290K bss, 6644K reserved)
>> Kernel virtual memory layout:
>>    * 0xfffdf000..0xfffff000  : fixmap
>>    * 0xfde00000..0xfe000000  : consistent mem
>>    * 0xfddf6000..0xfde00000  : early ioremap
>>    * 0xc9000000..0xfddf6000  : vmalloc & ioremap
>> SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
>>
>> Mapping IMMR 1:1 is just wrong because it may overlap with another
>> area. On most mpc8xx boards it is OK because IMMR is set to
>> 0xff000000, but for instance on the EP88xC board, IMMR is at 0xfa200000,
>> which overlaps with the VM ioremap area.
>>
>> This patch fixes the virtual address for remapping IMMR to 0xff000000,
>> regardless of the value of IMMR.
>>
>> The size of the IMMR area is 256 kbytes (CPM at offset 0, security engine
>> at offset 128k), so 512 kbytes is enough and allows handling the EP88xC
>> case (which is not 8 Mbytes but only 2 Mbytes aligned) the same way.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Instead of hardcoding 0xff000000, can you use asm/fixmap.h to allocate a
> virtual address at compile time?
>
>
Yes, good idea, but in asm/fixmap.h the FIX_XXXX constants are defined as enums.
Is there a way to use them in head_8xx.S?

Christophe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-10-07 12:49     ` Christophe Leroy
@ 2015-10-08 19:12       ` Scott Wood
  2015-10-12 18:08         ` christophe leroy
  0 siblings, 1 reply; 76+ messages in thread
From: Scott Wood @ 2015-10-08 19:12 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Wed, 2015-10-07 at 14:49 +0200, Christophe Leroy wrote:
> On 29/09/2015 02:29, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:51:13PM +0200, Christophe Leroy wrote:
> > > The flush/clean/invalidate _dcache_range() functions are all very
> > > similar and quite short. They are mainly used in __dma_sync();
> > > perf_event locates them among the top 3 consuming functions during
> > > heavy Ethernet activity.
> > > 
> > > They are good candidates for inlining, as __dma_sync() does
> > > almost nothing but call them.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > ---
> > > New in v2
> > > 
> > >   arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
> > >   arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
> > >   arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
> > >   3 files changed, 54 insertions(+), 68 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> > > index 6229e6b..6169604 100644
> > > --- a/arch/powerpc/include/asm/cacheflush.h
> > > +++ b/arch/powerpc/include/asm/cacheflush.h
> > > @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
> > >   }
> > >   #endif
> > >   
> > > -extern void flush_dcache_range(unsigned long start, unsigned long stop);
> > >   #ifdef CONFIG_PPC32
> > > -extern void clean_dcache_range(unsigned long start, unsigned long stop);
> > > -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
> > > +/*
> > > + * Write any modified data cache blocks out to memory and invalidate them.
> > > + * Does not invalidate the corresponding instruction cache blocks.
> > > + */
> > > +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> > > +{
> > > +{
> > > + void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
> > > + unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
> > > + unsigned int i;
> > > +
> > > + for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
> > > +         dcbf(addr);
> > > + if (i)
> > > +         mb();   /* sync */
> > > +}
> > I know this is 32-bit-specific code, but it's still bad practice to use
> > "unsigned int" for addresses or sizes thereof.
> > 
> > 
> Ok, I can fix size, but what about start and stop? If I change that, it
> means I also have to fix all the callers. Do you expect me to do that?

start and stop are already unsigned long.

> And it is very unlikely, but what if for some reason someone wants to
> invalidate the entire user address space, which is 3 Gbytes in size? A
> signed size would be negative here.

Why would size be signed?

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap
  2015-10-08 12:34     ` Christophe Leroy
@ 2015-10-08 19:13       ` Scott Wood
  0 siblings, 0 replies; 76+ messages in thread
From: Scott Wood @ 2015-10-08 19:13 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Thu, 2015-10-08 at 14:34 +0200, Christophe Leroy wrote:
> On 29/09/2015 01:39, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:38PM +0200, Christophe Leroy wrote:
> > > Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
> > > 648K rodata, 508K init, 290K bss, 6644K reserved)
> > > Kernel virtual memory layout:
> > >    * 0xfffdf000..0xfffff000  : fixmap
> > >    * 0xfde00000..0xfe000000  : consistent mem
> > >    * 0xfddf6000..0xfde00000  : early ioremap
> > >    * 0xc9000000..0xfddf6000  : vmalloc & ioremap
> > > SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> > > 
> > > Mapping IMMR 1:1 is just wrong because it may overlap with another
> > > area. On most mpc8xx boards it is OK because IMMR is set to
> > > 0xff000000, but for instance on the EP88xC board, IMMR is at 0xfa200000,
> > > which overlaps with the VM ioremap area.
> > > 
> > > This patch fixes the virtual address for remapping IMMR to 0xff000000,
> > > regardless of the value of IMMR.
> > > 
> > > The size of the IMMR area is 256 kbytes (CPM at offset 0, security engine
> > > at offset 128k), so 512 kbytes is enough and allows handling the EP88xC
> > > case (which is not 8 Mbytes but only 2 Mbytes aligned) the same way.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > Instead of hardcoding 0xff000000, can you use asm/fixmap.h to allocate a
> > virtual address at compile time?
> > 
> > 
> Yes, good idea, but in asm/fixmap.h the FIX_XXXX constants are defined as enums.
> Is there a way to use them in head_8xx.S?

asm-offsets
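
That is, export the value from C so the assembly can see it. Roughly
(a hypothetical sketch; FIX_IMMR_BASE is an assumed enum name, not an
existing one):

	/* in arch/powerpc/kernel/asm-offsets.c */
	DEFINE(VIRT_IMMR_BASE, (unsigned long)__fix_to_virt(FIX_IMMR_BASE));

head_8xx.S can then include asm/asm-offsets.h and use VIRT_IMMR_BASE
directly.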

-Scott


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
  2015-10-08 19:12       ` Scott Wood
@ 2015-10-12 18:08         ` christophe leroy
  0 siblings, 0 replies; 76+ messages in thread
From: christophe leroy @ 2015-10-12 18:08 UTC (permalink / raw)
  To: Scott Wood
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



On 08/10/2015 21:12, Scott Wood wrote:
> On Wed, 2015-10-07 at 14:49 +0200, Christophe Leroy wrote:
>> On 29/09/2015 02:29, Scott Wood wrote:
>>> On Tue, Sep 22, 2015 at 06:51:13PM +0200, Christophe Leroy wrote:
>>>> The flush/clean/invalidate _dcache_range() functions are all very
>>>> similar and quite short. They are mainly used in __dma_sync();
>>>> perf_event locates them among the top 3 consuming functions during
>>>> heavy Ethernet activity.
>>>>
>>>> They are good candidates for inlining, as __dma_sync() does
>>>> almost nothing but call them.
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>> ---
>>>> New in v2
>>>>
>>>>    arch/powerpc/include/asm/cacheflush.h | 55 +++++++++++++++++++++++++++--
>>>>    arch/powerpc/kernel/misc_32.S         | 65 -----------------------------------
>>>>    arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
>>>>    3 files changed, 54 insertions(+), 68 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
>>>> index 6229e6b..6169604 100644
>>>> --- a/arch/powerpc/include/asm/cacheflush.h
>>>> +++ b/arch/powerpc/include/asm/cacheflush.h
>>>> @@ -47,12 +47,61 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
>>>>    }
>>>>    #endif
>>>>    
>>>> -extern void flush_dcache_range(unsigned long start, unsigned long stop);
>>>>    #ifdef CONFIG_PPC32
>>>> -extern void clean_dcache_range(unsigned long start, unsigned long stop);
>>>> -extern void invalidate_dcache_range(unsigned long start, unsigned long stop);
>>>> +/*
>>>> + * Write any modified data cache blocks out to memory and invalidate them.
>>>> + * Does not invalidate the corresponding instruction cache blocks.
>>>> + */
>>>> +static inline void flush_dcache_range(unsigned long start, unsigned long stop)
>>>> +{
>>>> + void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
>>>> + unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
>>>> + unsigned int i;
>>>> +
>>>> + for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
>>>> +         dcbf(addr);
>>>> + if (i)
>>>> +         mb();   /* sync */
>>>> +}
>>> I know this is 32-bit-specific code, but it's still bad practice to use
>>> "unsigned int" for addresses or sizes thereof.
>>>
>>>
>> Ok, I can fix size, but what about start and stop? If I change that, it
>> means I also have to fix all the callers. Do you expect me to do that?
> start and stop are already unsigned long.
>
>> And it is very unlikely, but what if for some reason someone wants to
>> invalidate the entire user address space, which is 3 Gbytes in size? A
>> signed size would be negative here.
> Why would size be signed?

Oops, indeed I misunderstood your comment; I thought you said:
* size has to be signed int instead of unsigned int
* addresses have to be void *

I understand now that size and addresses should be unsigned long instead.
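
So for v3 I suppose it becomes something like this (an untested sketch
of the fix, assuming the dcbf() helper from patch 21 takes a void *):

static inline void flush_dcache_range(unsigned long start, unsigned long stop)
{
	unsigned long addr = start & ~(L1_CACHE_BYTES - 1);
	unsigned long size = stop - addr + (L1_CACHE_BYTES - 1);
	unsigned long i;

	/* one dcbf per cache block over the aligned range */
	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbf((void *)addr);
	if (i)
		mb();	/* sync */
}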

Christophe




^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2015-10-12 18:08 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
2015-09-28 22:07   ` Scott Wood
2015-10-06 13:35     ` Christophe Leroy
2015-10-06 16:39       ` Scott Wood
2015-10-06 16:46       ` Scott Wood
2015-10-06 20:30         ` christophe leroy
2015-10-06 20:38           ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 02/25] powerpc/8xx: Map linear kernel RAM with 8M pages Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 03/25] powerpc: Update documentation for noltlbs kernel parameter Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 04/25] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
2015-09-28 23:39   ` Scott Wood
2015-10-08 12:34     ` Christophe Leroy
2015-10-08 19:13       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either Christophe Leroy
2015-09-28 23:41   ` Scott Wood
2015-10-06 13:50     ` Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
2015-09-28 23:47   ` Scott Wood
2015-10-06 14:02     ` Christophe Leroy
2015-10-06 15:16       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
2015-09-24 11:41   ` David Laight
2015-09-24 20:14     ` Scott Wood
2015-09-25 14:46       ` David Laight
2015-09-25 17:09         ` Scott Wood
2015-09-28 23:53   ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 09/25] powerpc/8xx: show IMMR area in startup memory layout Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 10/25] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup Christophe Leroy
2015-09-28 23:58   ` Scott Wood
2015-10-06 14:10     ` Christophe Leroy
2015-10-06 15:17       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 12/25] powerpc32: Remove useless/wrong MMU:setio progress message Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations Christophe Leroy
2015-09-29  0:00   ` Scott Wood
2015-10-06 14:12     ` Christophe Leroy
2015-10-06 16:48       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 14/25] powerpc32: remove ioremap_base Christophe Leroy
2015-09-29  0:38   ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones Christophe Leroy
2015-09-29  0:03   ` Scott Wood
2015-10-06 14:35     ` Christophe Leroy
2015-10-06 16:56       ` Scott Wood
2015-09-22 16:51 ` [PATCH v2 16/25] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 17/25] powerpc/8xx: remove special handling of CPU6 errata in set_dec() Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 18/25] powerpc/8xx: rewrite set_context() in C Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 19/25] powerpc/8xx: rewrite flush_instruction_cache() " Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
2015-09-22 17:57   ` Joakim Tjernlund
2015-09-29  0:23   ` Scott Wood
2015-09-22 16:51 ` [PATCH v2 21/25] powerpc: add inline functions for cache related instructions Christophe Leroy
2015-09-29  0:25   ` Scott Wood
2015-09-22 16:51 ` [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
2015-09-22 18:12   ` Joakim Tjernlund
2015-09-22 18:58     ` Scott Wood
2015-09-22 19:34       ` Joakim Tjernlund
2015-09-22 19:42         ` Scott Wood
2015-09-22 19:55           ` Joakim Tjernlund
2015-09-22 20:07             ` Joakim Tjernlund
2015-09-22 20:14             ` Scott Wood
2015-09-22 20:32               ` Joakim Tjernlund
2015-09-22 20:35                 ` Scott Wood
2015-09-22 20:38                   ` Joakim Tjernlund
2015-09-22 20:57                     ` Christophe Leroy
2015-09-22 22:34                       ` Scott Wood
2015-09-22 22:49                         ` Christophe Leroy
2015-09-22 22:52                           ` Scott Wood
2015-09-29  0:29   ` Scott Wood
2015-10-07 12:49     ` Christophe Leroy
2015-10-08 19:12       ` Scott Wood
2015-10-12 18:08         ` christophe leroy
2015-09-22 16:51 ` [PATCH v2 23/25] powerpc: Simplify test in __dma_sync() Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 24/25] powerpc32: small optimisation in flush_icache_range() Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 25/25] powerpc32: Remove one insn in mulhdu Christophe Leroy
