* [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2)
From: Benjamin Herrenschmidt @ 2008-12-15  5:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This series of patches is aimed at supporting SMP on non-hash
based processors. It reworks both the MMU context management
and the TLB management, clearly splitting hash32, hash64 and
nohash in each case, and adds SMP-safe context handling along
with some basic SMP TLB management.

There is room for improvement, such as implementing lazy TLB
flushing on processors without HW invalidate-by-PID support,
a better IPI mechanism, support for variable-size PIDs, a
lockless fast path in the MMU context switch, etc.,
but it should basically work.

There are some seemingly unrelated patches in the pile because
they are dependencies of the main ones, so I'm including them.
Some of these may already have been applied in Kumar's or jwb's
trees.

* [PATCH 1/16] powerpc: Fix bogus cache flushing on all 40x and BookE processors v2
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

We were missing the CPU_FTR_NOEXECUTE bit in our cputable for all
these processors. The result is that update_mmu_cache() would flush
the cache for all pages mapped to userspace, which is totally
unnecessary on those processors since we already handle flushing
on execute in the page fault path.

This should provide a nice speed up ;-)
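
For illustration, this is the kind of check the feature bit enables
in the generic code (a sketch only; the exact condition used by
update_mmu_cache() may differ):

	/* sketch: when the CPU faults on execution of non-executable
	 * pages, the exec fault path flushes on demand, so the
	 * unconditional flush here can be skipped */
	if (cpu_has_feature(CPU_FTR_NOEXECUTE))
		return;
	flush_dcache_icache_page(page);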

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

This one fixes the E500 definition and uses a bit that works
for 32-bit processors.

 arch/powerpc/include/asm/cputable.h |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/cputable.h	2008-12-03 13:32:53.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/cputable.h	2008-12-08 15:42:13.000000000 +1100
@@ -163,6 +163,7 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_SPE			ASM_CONST(0x0000000002000000)
 #define CPU_FTR_NEED_PAIRED_STWCX	ASM_CONST(0x0000000004000000)
 #define CPU_FTR_LWSYNC			ASM_CONST(0x0000000008000000)
+#define CPU_FTR_NOEXECUTE		ASM_CONST(0x0000000010000000)
 
 /*
  * Add the 64-bit processor unique features in the top half of the word;
@@ -177,7 +178,6 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_SLB			LONG_ASM_CONST(0x0000000100000000)
 #define CPU_FTR_16M_PAGE		LONG_ASM_CONST(0x0000000200000000)
 #define CPU_FTR_TLBIEL			LONG_ASM_CONST(0x0000000400000000)
-#define CPU_FTR_NOEXECUTE		LONG_ASM_CONST(0x0000000800000000)
 #define CPU_FTR_IABR			LONG_ASM_CONST(0x0000002000000000)
 #define CPU_FTR_MMCRA			LONG_ASM_CONST(0x0000004000000000)
 #define CPU_FTR_CTRL			LONG_ASM_CONST(0x0000008000000000)
@@ -367,19 +367,20 @@ extern const char *powerpc_base_platform
 #define CPU_FTRS_CLASSIC32	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE)
 #define CPU_FTRS_8XX	(CPU_FTR_USE_TB)
-#define CPU_FTRS_40X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN)
-#define CPU_FTRS_44X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN)
+#define CPU_FTRS_40X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
+#define CPU_FTRS_44X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_E200	(CPU_FTR_USE_TB | CPU_FTR_SPE_COMP | \
 	    CPU_FTR_NODSISRALIGN | CPU_FTR_COHERENT_ICACHE | \
-	    CPU_FTR_UNIFIED_ID_CACHE)
+	    CPU_FTR_UNIFIED_ID_CACHE | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_E500	(CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
-	    CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN)
+	    CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN | \
+	    CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_E500_2	(CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
 	    CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_BIG_PHYS | \
-	    CPU_FTR_NODSISRALIGN)
+	    CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_E500MC	(CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_BIG_PHYS | CPU_FTR_NODSISRALIGN | \
-	    CPU_FTR_L2CSR | CPU_FTR_LWSYNC)
+	    CPU_FTR_L2CSR | CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_GENERIC_32	(CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN)
 
 /* 64-bit CPUs */

* [PATCH 2/16] powerpc: Fix asm EMIT_BUG_ENTRY with !CONFIG_BUG
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

Instead of not defining it at all, this defines the macro as
being empty, thus avoiding ifdefs at call sites when CONFIG_BUG
is not set.

Also removes a stray leading space from the existing definition.
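
With the empty definition in place, an assembly call site can use
the macro unconditionally; patch 3 in this series does exactly that
in dcr-low.S:

	1:	trap
		EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0
		blr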

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/include/asm/bug.h |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- linux-work.orig/arch/powerpc/include/asm/bug.h	2008-12-08 14:37:16.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/bug.h	2008-12-08 15:14:21.000000000 +1100
@@ -3,6 +3,7 @@
 #ifdef __KERNEL__
 
 #include <asm/asm-compat.h>
+
 /*
  * Define an illegal instr to trap on the bug.
  * We don't use 0 because that marks the end of a function
@@ -14,6 +15,7 @@
 #ifdef CONFIG_BUG
 
 #ifdef __ASSEMBLY__
+#include <asm/asm-offsets.h>
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 .macro EMIT_BUG_ENTRY addr,file,line,flags
 	 .section __bug_table,"a"
@@ -26,7 +28,7 @@
 	 .previous
 .endm
 #else
- .macro EMIT_BUG_ENTRY addr,file,line,flags
+.macro EMIT_BUG_ENTRY addr,file,line,flags
 	 .section __bug_table,"a"
 5001:	 PPC_LONG \addr
 	 .short \flags
@@ -113,6 +115,13 @@
 #define HAVE_ARCH_BUG_ON
 #define HAVE_ARCH_WARN_ON
 #endif /* __ASSEMBLY __ */
+#else
+#ifdef __ASSEMBLY__
+.macro EMIT_BUG_ENTRY addr,file,line,flags
+.endm
+#else /* !__ASSEMBLY__ */
+#define _EMIT_BUG_ENTRY
+#endif
 #endif /* CONFIG_BUG */
 
 #include <asm-generic/bug.h>

* [PATCH 3/16] powerpc/4xx: Extended DCR support v2
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This adds support for the "extended" DCR addressing via the
indirect mfdcrx/mtdcrx instructions supported by some
4xx cores (440H6 and later).

For now, I have enabled the feature only on AMCC 460 chips.
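
For reference, a worked encoding (my arithmetic, not part of the
patch): with the result in r3 and the DCR number in r5,

	0x7c000206 | (3 << 21) | (5 << 16) == 0x7c650206

which is what a binutils with mfdcrx support would emit for
"mfdcrx r3,r5", so the hand-coded .long in dcr-native.h matches
the real instruction.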

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

This variant uses "440x6" instead of "440H6". I made no other
changes to the code, as I think the codegen is the least bad I've
had so far, and I rely on Josh's further work on cleaning up the
440 core type selection at Kconfig time so that the features are
properly reflected in the POSSIBLE and ALWAYS masks based on
the core selection. That way, if only one core type is selected,
the feature test should resolve at compile time.
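
For context, that compile-time resolution relies on the existing
cpu_has_feature() helper in cputable.h, which (roughly, from
memory) looks like:

	static inline int cpu_has_feature(unsigned long feature)
	{
		return (CPU_FTRS_ALWAYS & feature) ||
		       (CPU_FTRS_POSSIBLE
			& cur_cpu_spec->cpu_features
			& feature);
	}

Once the masks reflect a single selected core, both sides of the
|| become constants and the test folds away.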


 arch/powerpc/include/asm/cputable.h   |    7 ++-
 arch/powerpc/include/asm/dcr-native.h |   63 +++++++++++++++++++++++++++-------
 arch/powerpc/kernel/cputable.c        |    4 +-
 arch/powerpc/sysdev/dcr-low.S         |    8 +++-
 4 files changed, 65 insertions(+), 17 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/cputable.h	2008-12-08 15:56:42.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/cputable.h	2008-12-09 14:04:44.000000000 +1100
@@ -164,6 +164,7 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_NEED_PAIRED_STWCX	ASM_CONST(0x0000000004000000)
 #define CPU_FTR_LWSYNC			ASM_CONST(0x0000000008000000)
 #define CPU_FTR_NOEXECUTE		ASM_CONST(0x0000000010000000)
+#define CPU_FTR_INDEXED_DCR		ASM_CONST(0x0000000020000000)
 
 /*
  * Add the 64-bit processor unique features in the top half of the word;
@@ -369,6 +370,8 @@ extern const char *powerpc_base_platform
 #define CPU_FTRS_8XX	(CPU_FTR_USE_TB)
 #define CPU_FTRS_40X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_44X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
+#define CPU_FTRS_440x6	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE | \
+	    CPU_FTR_INDEXED_DCR)
 #define CPU_FTRS_E200	(CPU_FTR_USE_TB | CPU_FTR_SPE_COMP | \
 	    CPU_FTR_NODSISRALIGN | CPU_FTR_COHERENT_ICACHE | \
 	    CPU_FTR_UNIFIED_ID_CACHE | CPU_FTR_NOEXECUTE)
@@ -455,7 +458,7 @@ enum {
 	    CPU_FTRS_40X |
 #endif
 #ifdef CONFIG_44x
-	    CPU_FTRS_44X |
+	    CPU_FTRS_44X | CPU_FTRS_440x6 |
 #endif
 #ifdef CONFIG_E200
 	    CPU_FTRS_E200 |
@@ -495,7 +498,7 @@ enum {
 	    CPU_FTRS_40X &
 #endif
 #ifdef CONFIG_44x
-	    CPU_FTRS_44X &
+	    CPU_FTRS_44X & CPU_FTRS_440x6 &
 #endif
 #ifdef CONFIG_E200
 	    CPU_FTRS_E200 &
Index: linux-work/arch/powerpc/include/asm/dcr-native.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/dcr-native.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/dcr-native.h	2008-12-08 15:56:43.000000000 +1100
@@ -23,6 +23,7 @@
 #ifndef __ASSEMBLY__
 
 #include <linux/spinlock.h>
+#include <asm/cputable.h>
 
 typedef struct {
 	unsigned int base;
@@ -39,23 +40,45 @@ static inline bool dcr_map_ok_native(dcr
 #define dcr_read_native(host, dcr_n)		mfdcr(dcr_n + host.base)
 #define dcr_write_native(host, dcr_n, value)	mtdcr(dcr_n + host.base, value)
 
-/* Device Control Registers */
-void __mtdcr(int reg, unsigned int val);
-unsigned int __mfdcr(int reg);
+/* Table based DCR accessors */
+extern void __mtdcr(unsigned int reg, unsigned int val);
+extern unsigned int __mfdcr(unsigned int reg);
+
+/* mfdcrx/mtdcrx instruction based accessors. We hand code
+ * the opcodes in order not to depend on newer binutils
+ */
+static inline unsigned int mfdcrx(unsigned int reg)
+{
+	unsigned int ret;
+	asm volatile(".long 0x7c000206 | (%0 << 21) | (%1 << 16)"
+		     : "=r" (ret) : "r" (reg));
+	return ret;
+}
+
+static inline void mtdcrx(unsigned int reg, unsigned int val)
+{
+	asm volatile(".long 0x7c000306 | (%0 << 21) | (%1 << 16)"
+		     : : "r" (val), "r" (reg));
+}
+
 #define mfdcr(rn)						\
 	({unsigned int rval;					\
-	if (__builtin_constant_p(rn))				\
+	if (__builtin_constant_p(rn) && rn < 1024)		\
 		asm volatile("mfdcr %0," __stringify(rn)	\
 		              : "=r" (rval));			\
+	else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR)))	\
+		rval = mfdcrx(rn);				\
 	else							\
 		rval = __mfdcr(rn);				\
 	rval;})
 
 #define mtdcr(rn, v)						\
 do {								\
-	if (__builtin_constant_p(rn))				\
+	if (__builtin_constant_p(rn) && rn < 1024)		\
 		asm volatile("mtdcr " __stringify(rn) ",%0"	\
 			      : : "r" (v)); 			\
+	else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR)))	\
+		mtdcrx(rn, v);					\
 	else							\
 		__mtdcr(rn, v);					\
 } while (0)
@@ -69,8 +92,13 @@ static inline unsigned __mfdcri(int base
 	unsigned int val;
 
 	spin_lock_irqsave(&dcr_ind_lock, flags);
-	__mtdcr(base_addr, reg);
-	val = __mfdcr(base_data);
+	if (cpu_has_feature(CPU_FTR_INDEXED_DCR)) {
+		mtdcrx(base_addr, reg);
+		val = mfdcrx(base_data);
+	} else {
+		__mtdcr(base_addr, reg);
+		val = __mfdcr(base_data);
+	}
 	spin_unlock_irqrestore(&dcr_ind_lock, flags);
 	return val;
 }
@@ -81,8 +109,13 @@ static inline void __mtdcri(int base_add
 	unsigned long flags;
 
 	spin_lock_irqsave(&dcr_ind_lock, flags);
-	__mtdcr(base_addr, reg);
-	__mtdcr(base_data, val);
+	if (cpu_has_feature(CPU_FTR_INDEXED_DCR)) {
+		mtdcrx(base_addr, reg);
+		mtdcrx(base_data, val);
+	} else {
+		__mtdcr(base_addr, reg);
+		__mtdcr(base_data, val);
+	}
 	spin_unlock_irqrestore(&dcr_ind_lock, flags);
 }
 
@@ -93,9 +126,15 @@ static inline void __dcri_clrset(int bas
 	unsigned int val;
 
 	spin_lock_irqsave(&dcr_ind_lock, flags);
-	__mtdcr(base_addr, reg);
-	val = (__mfdcr(base_data) & ~clr) | set;
-	__mtdcr(base_data, val);
+	if (cpu_has_feature(CPU_FTR_INDEXED_DCR)) {
+		mtdcrx(base_addr, reg);
+		val = (mfdcrx(base_data) & ~clr) | set;
+		mtdcrx(base_data, val);
+	} else {
+		__mtdcr(base_addr, reg);
+		val = (__mfdcr(base_data) & ~clr) | set;
+		__mtdcr(base_data, val);
+	}
 	spin_unlock_irqrestore(&dcr_ind_lock, flags);
 }
 
Index: linux-work/arch/powerpc/sysdev/dcr-low.S
===================================================================
--- linux-work.orig/arch/powerpc/sysdev/dcr-low.S	2008-07-07 13:45:04.000000000 +1000
+++ linux-work/arch/powerpc/sysdev/dcr-low.S	2008-12-08 15:56:43.000000000 +1100
@@ -11,14 +11,20 @@
 
 #include <asm/ppc_asm.h>
 #include <asm/processor.h>
+#include <asm/bug.h>
 
 #define DCR_ACCESS_PROLOG(table) \
+	cmpli	cr0,r3,1024;	 \
 	rlwinm  r3,r3,4,18,27;   \
 	lis     r5,table@h;      \
 	ori     r5,r5,table@l;   \
 	add     r3,r3,r5;        \
+	bge-	1f;		 \
 	mtctr   r3;              \
-	bctr
+	bctr;			 \
+1:	trap;			 \
+	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0;	\
+	blr
 
 _GLOBAL(__mfdcr)
 	DCR_ACCESS_PROLOG(__mfdcr_table)
Index: linux-work/arch/powerpc/kernel/cputable.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/cputable.c	2008-11-24 14:48:55.000000000 +1100
+++ linux-work/arch/powerpc/kernel/cputable.c	2008-12-09 14:04:44.000000000 +1100
@@ -1506,7 +1506,7 @@ static struct cpu_spec __initdata cpu_sp
 		.pvr_mask		= 0xffff0002,
 		.pvr_value		= 0x13020002,
 		.cpu_name		= "460EX",
-		.cpu_features		= CPU_FTRS_44X,
+		.cpu_features		= CPU_FTRS_440x6,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
@@ -1518,7 +1518,7 @@ static struct cpu_spec __initdata cpu_sp
 		.pvr_mask		= 0xffff0002,
 		.pvr_value		= 0x13020000,
 		.cpu_name		= "460GT",
-		.cpu_features		= CPU_FTRS_44X,
+		.cpu_features		= CPU_FTRS_440x6,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,

* [PATCH 4/16] powerpc/fsl-booke: Fix problem with _tlbil_va
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

From: Kumar Gala <galak@kernel.crashing.org>

An example calling sequence which we did see:

copy_user_highpage -> kmap_atomic -> flush_tlb_page -> _tlbil_va

We got interrupted after setting up the MAS registers but before
the tlbwe, and the handler for that interrupt also did a
kmap_atomic (IDE code), so on return from the interrupt the MAS
registers no longer contained the proper values.

Since we don't save/restore the MAS registers on normal interrupts,
we need to disable interrupts in _tlbil_va to ensure atomicity.
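
The fix is the classic save/disable/restore pattern around the
MAS setup and the tlbwe; in C terms the new code is roughly
equivalent to (sketch only):

	unsigned long flags;

	local_irq_save(flags);		/* mfmsr r10; wrteei 0 */
	/* program MAS6, tlbsx, update MAS1, tlbwe, msync, isync */
	local_irq_restore(flags);	/* wrtee r10 */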

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
---

 arch/powerpc/kernel/misc_32.S |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index bdc8b0e..d108715 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -479,6 +479,8 @@ _GLOBAL(_tlbil_pid)
  * (no broadcast)
  */
 _GLOBAL(_tlbil_va)
+	mfmsr	r10
+	wrteei	0
 	slwi	r4,r4,16
 	mtspr	SPRN_MAS6,r4		/* assume AS=0 for now */
 	tlbsx	0,r3
@@ -490,6 +492,7 @@ _GLOBAL(_tlbil_va)
 	tlbwe
 	msync
 	isync
+	wrtee	r10
 	blr
 #endif /* CONFIG_FSL_BOOKE */
 
-- 
1.5.6.5

* [PATCH 5/16] powerpc/mm: Add local_flush_tlb_mm() to SW loaded TLB implementations
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This adds a local_flush_tlb_mm() call as a prerequisite for some
SMP work on BookE processors.
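
Right now the local and SMP-wide flavours are identical, as the
inline below shows, but the split matters once flush_tlb_mm() has
to broadcast: a caller that knows the mm only ever ran on the
current CPU could then do something like (hypothetical sketch):

	if (cpus_equal(mm->cpu_vm_mask,
		       cpumask_of_cpu(smp_processor_id())))
		local_flush_tlb_mm(mm);	/* no IPIs needed */
	else
		flush_tlb_mm(mm);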

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/include/asm/tlbflush.h |    5 +++++
 1 file changed, 5 insertions(+)

--- linux-work.orig/arch/powerpc/include/asm/tlbflush.h	2008-12-03 14:33:02.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/tlbflush.h	2008-12-03 14:33:22.000000000 +1100
@@ -40,6 +40,11 @@ extern void _tlbil_va(unsigned long addr
 extern void _tlbia(void);
 #endif
 
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+	_tlbil_pid(mm->context.id);
+}
+
 static inline void flush_tlb_mm(struct mm_struct *mm)
 {
 	_tlbil_pid(mm->context.id);

* [PATCH 6/16] powerpc/mm: Split mmu_context handling v3
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This splits the mmu_context handling between 32-bit hash-based
processors, 64-bit hash-based processors and everybody else. This
is preliminary work for adding SMP support for BookE processors.
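
As a side note, the hash32 context-to-VSID skew moves over
verbatim; a quick worked example of that formula (my arithmetic,
for illustration), for context 1 and effective address 0x10000000:

	CTX_TO_VSID(1, 0x10000000)
		= ((1 * (897 * 16) + (0x10000000 >> 28) * 0x111) & 0xffffff)
		= (14352 + 273) & 0xffffff
		= 14625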

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2: address various comments from Josh and Stephen
v3: properly remove the old mmu_context_32.c and mmu_context_64.c

 arch/powerpc/include/asm/mmu_context.h       |  260 +++------------------------
 arch/powerpc/kernel/asm-offsets.c            |    1 
 arch/powerpc/kernel/head_32.S                |   12 +
 arch/powerpc/kernel/ppc_ksyms.c              |    3 
 arch/powerpc/kernel/swsusp.c                 |    2 
 arch/powerpc/mm/Makefile                     |    7 
 arch/powerpc/mm/mmu_context_32.c             |   84 --------
 arch/powerpc/mm/mmu_context_64.c             |   70 -------
 arch/powerpc/mm/mmu_context_hash32.c         |  103 ++++++++++
 arch/powerpc/mm/mmu_context_hash64.c         |   78 ++++++++
 arch/powerpc/mm/mmu_context_nohash.c         |  162 ++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype       |   10 -
 arch/powerpc/platforms/powermac/cpufreq_32.c |    2 
 drivers/macintosh/via-pmu.c                  |    4 
 14 files changed, 407 insertions(+), 391 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/mmu_context.h	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/mmu_context.h	2008-12-09 16:31:02.000000000 +1100
@@ -2,240 +2,26 @@
 #define __ASM_POWERPC_MMU_CONTEXT_H
 #ifdef __KERNEL__
 
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
 #include <asm/mmu.h>	
 #include <asm/cputable.h>
 #include <asm-generic/mm_hooks.h>
-
-#ifndef CONFIG_PPC64
-#include <asm/atomic.h>
-#include <linux/bitops.h>
-
-/*
- * On 32-bit PowerPC 6xx/7xx/7xxx CPUs, we use a set of 16 VSIDs
- * (virtual segment identifiers) for each context.  Although the
- * hardware supports 24-bit VSIDs, and thus >1 million contexts,
- * we only use 32,768 of them.  That is ample, since there can be
- * at most around 30,000 tasks in the system anyway, and it means
- * that we can use a bitmap to indicate which contexts are in use.
- * Using a bitmap means that we entirely avoid all of the problems
- * that we used to have when the context number overflowed,
- * particularly on SMP systems.
- *  -- paulus.
- */
-
-/*
- * This function defines the mapping from contexts to VSIDs (virtual
- * segment IDs).  We use a skew on both the context and the high 4 bits
- * of the 32-bit virtual address (the "effective segment ID") in order
- * to spread out the entries in the MMU hash table.  Note, if this
- * function is changed then arch/ppc/mm/hashtable.S will have to be
- * changed to correspond.
- */
-#define CTX_TO_VSID(ctx, va)	(((ctx) * (897 * 16) + ((va) >> 28) * 0x111) \
-				 & 0xffffff)
-
-/*
-   The MPC8xx has only 16 contexts.  We rotate through them on each
-   task switch.  A better way would be to keep track of tasks that
-   own contexts, and implement an LRU usage.  That way very active
-   tasks don't always have to pay the TLB reload overhead.  The
-   kernel pages are mapped shared, so the kernel can run on behalf
-   of any task that makes a kernel entry.  Shared does not mean they
-   are not protected, just that the ASID comparison is not performed.
-        -- Dan
-
-   The IBM4xx has 256 contexts, so we can just rotate through these
-   as a way of "switching" contexts.  If the TID of the TLB is zero,
-   the PID/TID comparison is disabled, so we can use a TID of zero
-   to represent all kernel pages as shared among all contexts.
-   	-- Dan
- */
-
-static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
-}
-
-#ifdef CONFIG_8xx
-#define NO_CONTEXT      	16
-#define LAST_CONTEXT    	15
-#define FIRST_CONTEXT    	0
-
-#elif defined(CONFIG_4xx)
-#define NO_CONTEXT      	256
-#define LAST_CONTEXT    	255
-#define FIRST_CONTEXT    	1
-
-#elif defined(CONFIG_E200) || defined(CONFIG_E500)
-#define NO_CONTEXT      	256
-#define LAST_CONTEXT    	255
-#define FIRST_CONTEXT    	1
-
-#else
-
-/* PPC 6xx, 7xx CPUs */
-#define NO_CONTEXT      	((unsigned long) -1)
-#define LAST_CONTEXT    	32767
-#define FIRST_CONTEXT    	1
-#endif
-
-/*
- * Set the current MMU context.
- * On 32-bit PowerPCs (other than the 8xx embedded chips), this is done by
- * loading up the segment registers for the user part of the address space.
- *
- * Since the PGD is immediately available, it is much faster to simply
- * pass this along as a second parameter, which is required for 8xx and
- * can be used for debugging on all processors (if you happen to have
- * an Abatron).
- */
-extern void set_context(unsigned long contextid, pgd_t *pgd);
-
-/*
- * Bitmap of contexts in use.
- * The size of this bitmap is LAST_CONTEXT + 1 bits.
- */
-extern unsigned long context_map[];
-
-/*
- * This caches the next context number that we expect to be free.
- * Its use is an optimization only, we can't rely on this context
- * number to be free, but it usually will be.
- */
-extern unsigned long next_mmu_context;
-
-/*
- * If we don't have sufficient contexts to give one to every task
- * that could be in the system, we need to be able to steal contexts.
- * These variables support that.
- */
-#if LAST_CONTEXT < 30000
-#define FEW_CONTEXTS	1
-extern atomic_t nr_free_contexts;
-extern struct mm_struct *context_mm[LAST_CONTEXT+1];
-extern void steal_context(void);
-#endif
-
-/*
- * Get a new mmu context for the address space described by `mm'.
- */
-static inline void get_mmu_context(struct mm_struct *mm)
-{
-	unsigned long ctx;
-
-	if (mm->context.id != NO_CONTEXT)
-		return;
-#ifdef FEW_CONTEXTS
-	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
-		steal_context();
-#endif
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
-	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
-	mm->context.id = ctx;
-#ifdef FEW_CONTEXTS
-	context_mm[ctx] = mm;
-#endif
-}
-
-/*
- * Set up the context for a new address space.
- */
-static inline int init_new_context(struct task_struct *t, struct mm_struct *mm)
-{
-	mm->context.id = NO_CONTEXT;
-	return 0;
-}
-
-/*
- * We're finished using the context for an address space.
- */
-static inline void destroy_context(struct mm_struct *mm)
-{
-	preempt_disable();
-	if (mm->context.id != NO_CONTEXT) {
-		clear_bit(mm->context.id, context_map);
-		mm->context.id = NO_CONTEXT;
-#ifdef FEW_CONTEXTS
-		atomic_inc(&nr_free_contexts);
-#endif
-	}
-	preempt_enable();
-}
-
-static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
-			     struct task_struct *tsk)
-{
-#ifdef CONFIG_ALTIVEC
-	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-	asm volatile ("dssall;\n"
-#ifndef CONFIG_POWER4
-	 "sync;\n" /* G4 needs a sync here, G5 apparently not */
-#endif
-	 : : );
-#endif /* CONFIG_ALTIVEC */
-
-	tsk->thread.pgdir = next->pgd;
-
-	if (!cpu_isset(smp_processor_id(), next->cpu_vm_mask))
-		cpu_set(smp_processor_id(), next->cpu_vm_mask);
-
-	/* No need to flush userspace segments if the mm doesnt change */
-	if (prev == next)
-		return;
-
-	/* Setup new userspace context */
-	get_mmu_context(next);
-	set_context(next->context.id, next->pgd);
-}
-
-#define deactivate_mm(tsk,mm)	do { } while (0)
+#include <asm/cputhreads.h>
 
 /*
- * After we have set current->mm to a new value, this activates
- * the context for the new mm so we see the new mappings.
+ * Most of the context management is out of line
  */
-#define activate_mm(active_mm, mm)   switch_mm(active_mm, mm, current)
-
 extern void mmu_context_init(void);
-
-
-#else
-
-#include <linux/kernel.h>	
-#include <linux/mm.h>	
-#include <linux/sched.h>
-
-/*
- * Copyright (C) 2001 PPC 64 Team, IBM Corp
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-static inline void enter_lazy_tlb(struct mm_struct *mm,
-				  struct task_struct *tsk)
-{
-}
-
-/*
- * The proto-VSID space has 2^35 - 1 segments available for user mappings.
- * Each segment contains 2^28 bytes.  Each context maps 2^44 bytes,
- * so we can support 2^19-1 contexts (19 == 35 + 28 - 44).
- */
-#define NO_CONTEXT	0
-#define MAX_CONTEXT	((1UL << 19) - 1)
-
 extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
 extern void destroy_context(struct mm_struct *mm);
 
+extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
 extern void switch_stab(struct task_struct *tsk, struct mm_struct *mm);
 extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
+extern void set_context(unsigned long id, pgd_t *pgd);
 
 /*
  * switch_mm is the entry point called from the architecture independent
@@ -244,22 +30,39 @@ extern void switch_slb(struct task_struc
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			     struct task_struct *tsk)
 {
-	if (!cpu_isset(smp_processor_id(), next->cpu_vm_mask))
-		cpu_set(smp_processor_id(), next->cpu_vm_mask);
+	/* Mark this context as having been used on the new CPU */
+	cpu_set(smp_processor_id(), next->cpu_vm_mask);
+
+	/* 32-bit keeps track of the current PGDIR in the thread struct */
+#ifdef CONFIG_PPC32
+	tsk->thread.pgdir = next->pgd;
+#endif /* CONFIG_PPC32 */
 
-	/* No need to flush userspace segments if the mm doesnt change */
+	/* Nothing else to do if we aren't actually switching */
 	if (prev == next)
 		return;
 
+	/* We must stop all altivec streams before changing the HW
+	 * context
+	 */
 #ifdef CONFIG_ALTIVEC
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
 		asm volatile ("dssall");
 #endif /* CONFIG_ALTIVEC */
 
+	/* The actual HW switching method differs between the various
+	 * sub architectures.
+	 */
+#ifdef CONFIG_PPC_STD_MMU_64
 	if (cpu_has_feature(CPU_FTR_SLB))
 		switch_slb(tsk, next);
 	else
 		switch_stab(tsk, next);
+#else
+	/* Out of line for now */
+	switch_mmu_context(prev, next);
+#endif
+
 }
 
 #define deactivate_mm(tsk,mm)	do { } while (0)
@@ -277,6 +80,11 @@ static inline void activate_mm(struct mm
 	local_irq_restore(flags);
 }
 
-#endif /* CONFIG_PPC64 */
+/* We don't currently use enter_lazy_tlb() for anything */
+static inline void enter_lazy_tlb(struct mm_struct *mm,
+				  struct task_struct *tsk)
+{
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
Index: linux-work/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/asm-offsets.c	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/kernel/asm-offsets.c	2008-12-09 16:31:02.000000000 +1100
@@ -60,6 +60,7 @@ int main(void)
 {
 	DEFINE(THREAD, offsetof(struct task_struct, thread));
 	DEFINE(MM, offsetof(struct task_struct, mm));
+	DEFINE(MMCONTEXTID, offsetof(struct mm_struct, context.id));
 #ifdef CONFIG_PPC64
 	DEFINE(AUDITCONTEXT, offsetof(struct task_struct, audit_context));
 #else
Index: linux-work/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/head_32.S	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/kernel/head_32.S	2008-12-09 16:31:02.000000000 +1100
@@ -31,6 +31,7 @@
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 #include <asm/ptrace.h>
+#include <asm/bug.h>
 
 /* 601 only have IBAT; cr0.eq is set on 601 when using this macro */
 #define LOAD_BAT(n, reg, RA, RB)	\
@@ -1070,9 +1071,14 @@ start_here:
 	RFI
 
 /*
+ * void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
+ *
  * Set up the segment registers for a new context.
  */
-_ENTRY(set_context)
+_ENTRY(switch_mmu_context)
+	lwz	r3,MMCONTEXTID(r4)
+	cmpwi	cr0,r3,0
+	blt-	4f
 	mulli	r3,r3,897	/* multiply context by skew factor */
 	rlwinm	r3,r3,4,8,27	/* VSID = (context & 0xfffff) << 4 */
 	addis	r3,r3,0x6000	/* Set Ks, Ku bits */
@@ -1083,6 +1089,7 @@ _ENTRY(set_context)
 	/* Context switch the PTE pointer for the Abatron BDI2000.
 	 * The PGDIR is passed as second argument.
 	 */
+	lwz	r4,MM_PGD(r4)
 	lis	r5, KERNELBASE@h
 	lwz	r5, 0xf0(r5)
 	stw	r4, 0x4(r5)
@@ -1098,6 +1105,9 @@ _ENTRY(set_context)
 	sync
 	isync
 	blr
+4:	trap
+	EMIT_BUG_ENTRY 4b,__FILE__,__LINE__,0
+	blr
 
 /*
  * An undocumented "feature" of 604e requires that the v bit
Index: linux-work/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/ppc_ksyms.c	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/kernel/ppc_ksyms.c	2008-12-09 16:31:02.000000000 +1100
@@ -174,8 +174,7 @@ EXPORT_SYMBOL(cacheable_memcpy);
 #endif
 
 #ifdef CONFIG_PPC32
-EXPORT_SYMBOL(next_mmu_context);
-EXPORT_SYMBOL(set_context);
+EXPORT_SYMBOL(switch_mmu_context);
 #endif
 
 #ifdef CONFIG_PPC_STD_MMU_32
Index: linux-work/arch/powerpc/kernel/swsusp.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/swsusp.c	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/kernel/swsusp.c	2008-12-09 16:31:02.000000000 +1100
@@ -34,6 +34,6 @@ void save_processor_state(void)
 void restore_processor_state(void)
 {
 #ifdef CONFIG_PPC32
-	set_context(current->active_mm->context.id, current->active_mm->pgd);
+	switch_mmu_context(NULL, current->active_mm);
 #endif
 }
Index: linux-work/arch/powerpc/mm/Makefile
===================================================================
--- linux-work.orig/arch/powerpc/mm/Makefile	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/mm/Makefile	2008-12-09 16:31:02.000000000 +1100
@@ -8,15 +8,16 @@ endif
 
 obj-y				:= fault.o mem.o pgtable.o \
 				   init_$(CONFIG_WORD_SIZE).o \
-				   pgtable_$(CONFIG_WORD_SIZE).o \
-				   mmu_context_$(CONFIG_WORD_SIZE).o
+				   pgtable_$(CONFIG_WORD_SIZE).o
+obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o
 hash-$(CONFIG_PPC_NATIVE)	:= hash_native_64.o
 obj-$(CONFIG_PPC64)		+= hash_utils_64.o \
 				   slb_low.o slb.o stab.o \
 				   gup.o mmap.o $(hash-y)
 obj-$(CONFIG_PPC_STD_MMU_32)	+= ppc_mmu_32.o
 obj-$(CONFIG_PPC_STD_MMU)	+= hash_low_$(CONFIG_WORD_SIZE).o \
-				   tlb_$(CONFIG_WORD_SIZE).o
+				   tlb_$(CONFIG_WORD_SIZE).o \
+				   mmu_context_hash$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_40x)		+= 40x_mmu.o
 obj-$(CONFIG_44x)		+= 44x_mmu.o
 obj-$(CONFIG_FSL_BOOKE)		+= fsl_booke_mmu.o
Index: linux-work/arch/powerpc/mm/mmu_context_hash32.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/mmu_context_hash32.c	2008-12-09 16:31:02.000000000 +1100
@@ -0,0 +1,103 @@
+/*
+ * This file contains the routines for handling the MMU on those
+ * PowerPC implementations where the MMU substantially follows the
+ * architecture specification.  This includes the 6xx, 7xx, 7xxx,
+ * 8260, and POWER3 implementations but excludes the 8xx and 4xx.
+ *  -- paulus
+ *
+ *  Derived from arch/ppc/mm/init.c:
+ *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
+ *
+ *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
+ *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
+ *    Copyright (C) 1996 Paul Mackerras
+ *
+ *  Derived from "arch/i386/mm/init.c"
+ *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/init.h>
+
+#include <asm/mmu_context.h>
+#include <asm/tlbflush.h>
+
+/*
+ * On 32-bit PowerPC 6xx/7xx/7xxx CPUs, we use a set of 16 VSIDs
+ * (virtual segment identifiers) for each context.  Although the
+ * hardware supports 24-bit VSIDs, and thus >1 million contexts,
+ * we only use 32,768 of them.  That is ample, since there can be
+ * at most around 30,000 tasks in the system anyway, and it means
+ * that we can use a bitmap to indicate which contexts are in use.
+ * Using a bitmap means that we entirely avoid all of the problems
+ * that we used to have when the context number overflowed,
+ * particularly on SMP systems.
+ *  -- paulus.
+ */
+#define NO_CONTEXT      	((unsigned long) -1)
+#define LAST_CONTEXT    	32767
+#define FIRST_CONTEXT    	1
+
+/*
+ * This function defines the mapping from contexts to VSIDs (virtual
+ * segment IDs).  We use a skew on both the context and the high 4 bits
+ * of the 32-bit virtual address (the "effective segment ID") in order
+ * to spread out the entries in the MMU hash table.  Note, if this
+ * function is changed then arch/ppc/mm/hashtable.S will have to be
+ * changed to correspond.
+ *
+ *
+ * CTX_TO_VSID(ctx, va)	(((ctx) * (897 * 16) + ((va) >> 28) * 0x111) \
+ *				 & 0xffffff)
+ */
+
+static unsigned long next_mmu_context;
+static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
+
+
+/*
+ * Set up the context for a new address space.
+ */
+int init_new_context(struct task_struct *t, struct mm_struct *mm)
+{
+	unsigned long ctx = next_mmu_context;
+
+	while (test_and_set_bit(ctx, context_map)) {
+		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
+		if (ctx > LAST_CONTEXT)
+			ctx = 0;
+	}
+	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
+	mm->context.id = ctx;
+
+	return 0;
+}
+
+/*
+ * We're finished using the context for an address space.
+ */
+void destroy_context(struct mm_struct *mm)
+{
+	preempt_disable();
+	if (mm->context.id != NO_CONTEXT) {
+		clear_bit(mm->context.id, context_map);
+		mm->context.id = NO_CONTEXT;
+	}
+	preempt_enable();
+}
+
+/*
+ * Initialize the context management stuff.
+ */
+void __init mmu_context_init(void)
+{
+	/* Reserve context 0 for kernel use */
+	context_map[0] = (1 << FIRST_CONTEXT) - 1;
+	next_mmu_context = FIRST_CONTEXT;
+}
Index: linux-work/arch/powerpc/mm/mmu_context_hash64.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/mmu_context_hash64.c	2008-12-09 16:31:02.000000000 +1100
@@ -0,0 +1,78 @@
+/*
+ *  MMU context allocation for 64-bit kernels.
+ *
+ *  Copyright (C) 2004 Anton Blanchard, IBM Corp. <anton@samba.org>
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+
+#include <asm/mmu_context.h>
+
+static DEFINE_SPINLOCK(mmu_context_lock);
+static DEFINE_IDR(mmu_context_idr);
+
+/*
+ * The proto-VSID space has 2^35 - 1 segments available for user mappings.
+ * Each segment contains 2^28 bytes.  Each context maps 2^44 bytes,
+ * so we can support 2^19-1 contexts (19 == 35 + 28 - 44).
+ */
+#define NO_CONTEXT	0
+#define MAX_CONTEXT	((1UL << 19) - 1)
+
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	int index;
+	int err;
+
+again:
+	if (!idr_pre_get(&mmu_context_idr, GFP_KERNEL))
+		return -ENOMEM;
+
+	spin_lock(&mmu_context_lock);
+	err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index);
+	spin_unlock(&mmu_context_lock);
+
+	if (err == -EAGAIN)
+		goto again;
+	else if (err)
+		return err;
+
+	if (index > MAX_CONTEXT) {
+		spin_lock(&mmu_context_lock);
+		idr_remove(&mmu_context_idr, index);
+		spin_unlock(&mmu_context_lock);
+		return -ENOMEM;
+	}
+
+	/* The old code would re-promote on fork, we don't do that
+	 * when using slices as it could cause problems promoting slices
+	 * that have been forced down to 4K
+	 */
+	if (slice_mm_new_context(mm))
+		slice_set_user_psize(mm, mmu_virtual_psize);
+	mm->context.id = index;
+
+	return 0;
+}
+
+void destroy_context(struct mm_struct *mm)
+{
+	spin_lock(&mmu_context_lock);
+	idr_remove(&mmu_context_idr, mm->context.id);
+	spin_unlock(&mmu_context_lock);
+
+	mm->context.id = NO_CONTEXT;
+}
Index: linux-work/arch/powerpc/mm/mmu_context_nohash.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/mmu_context_nohash.c	2008-12-09 16:31:02.000000000 +1100
@@ -0,0 +1,162 @@
+/*
+ * This file contains the routines for handling the MMU on those
+ * PowerPC implementations where the MMU is not using the hash
+ * table, such as 8xx, 4xx, BookE's etc...
+ *
+ * Copyright 2008 Ben Herrenschmidt <benh@kernel.crashing.org>
+ *                IBM Corp.
+ *
+ *  Derived from previous arch/powerpc/mm/mmu_context.c
+ *  and arch/powerpc/include/asm/mmu_context.h
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/init.h>
+
+#include <asm/mmu_context.h>
+#include <asm/tlbflush.h>
+
+/*
+ *   The MPC8xx has only 16 contexts.  We rotate through them on each
+ * task switch.  A better way would be to keep track of tasks that
+ * own contexts, and implement an LRU usage.  That way very active
+ * tasks don't always have to pay the TLB reload overhead.  The
+ * kernel pages are mapped shared, so the kernel can run on behalf
+ * of any task that makes a kernel entry.  Shared does not mean they
+ * are not protected, just that the ASID comparison is not performed.
+ *      -- Dan
+ *
+ * The IBM4xx has 256 contexts, so we can just rotate through these
+ * as a way of "switching" contexts.  If the TID of the TLB is zero,
+ * the PID/TID comparison is disabled, so we can use a TID of zero
+ * to represent all kernel pages as shared among all contexts.
+ * 	-- Dan
+ */
+
+#ifdef CONFIG_8xx
+#define NO_CONTEXT      	16
+#define LAST_CONTEXT    	15
+#define FIRST_CONTEXT    	0
+
+#elif defined(CONFIG_4xx)
+#define NO_CONTEXT      	256
+#define LAST_CONTEXT    	255
+#define FIRST_CONTEXT    	1
+
+#elif defined(CONFIG_E200) || defined(CONFIG_E500)
+#define NO_CONTEXT      	256
+#define LAST_CONTEXT    	255
+#define FIRST_CONTEXT    	1
+
+#else
+#error Unsupported processor type
+#endif
+
+static unsigned long next_mmu_context;
+static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
+static atomic_t nr_free_contexts;
+static struct mm_struct *context_mm[LAST_CONTEXT+1];
+static void steal_context(void);
+
+/* Steal a context from a task that has one at the moment.
+ * This is only used on 8xx and 4xx and we presently assume that
+ * they don't do SMP.  If they do then this will have to check
+ * whether the MM we steal is in use.
+ * We also assume that this is only used on systems that don't
+ * use an MMU hash table - this is true for 8xx and 4xx.
+ * This isn't an LRU system, it just frees up each context in
+ * turn (sort-of pseudo-random replacement :).  This would be the
+ * place to implement an LRU scheme if anyone was motivated to do it.
+ *  -- paulus
+ */
+static void steal_context(void)
+{
+	struct mm_struct *mm;
+
+	/* free up context `next_mmu_context' */
+	/* if we shouldn't free context 0, don't... */
+	if (next_mmu_context < FIRST_CONTEXT)
+		next_mmu_context = FIRST_CONTEXT;
+	mm = context_mm[next_mmu_context];
+	flush_tlb_mm(mm);
+	destroy_context(mm);
+}
+
+
+/*
+ * Get a new mmu context for the address space described by `mm'.
+ */
+static inline void get_mmu_context(struct mm_struct *mm)
+{
+	unsigned long ctx;
+
+	if (mm->context.id != NO_CONTEXT)
+		return;
+
+	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
+		steal_context();
+
+	ctx = next_mmu_context;
+	while (test_and_set_bit(ctx, context_map)) {
+		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
+		if (ctx > LAST_CONTEXT)
+			ctx = 0;
+	}
+	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
+	mm->context.id = ctx;
+	context_mm[ctx] = mm;
+}
+
+void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
+{
+	get_mmu_context(next);
+
+	set_context(next->context.id, next->pgd);
+}
+
+/*
+ * Set up the context for a new address space.
+ */
+int init_new_context(struct task_struct *t, struct mm_struct *mm)
+{
+	mm->context.id = NO_CONTEXT;
+	return 0;
+}
+
+/*
+ * We're finished using the context for an address space.
+ */
+void destroy_context(struct mm_struct *mm)
+{
+	preempt_disable();
+	if (mm->context.id != NO_CONTEXT) {
+		clear_bit(mm->context.id, context_map);
+		mm->context.id = NO_CONTEXT;
+		atomic_inc(&nr_free_contexts);
+	}
+	preempt_enable();
+}
+
+
+/*
+ * Initialize the context management stuff.
+ */
+void __init mmu_context_init(void)
+{
+	/*
+	 * Some processors have too few contexts to reserve one for
+	 * init_mm, and require using context 0 for a normal task.
+	 * Other processors reserve the use of context zero for the kernel.
+	 * This code assumes FIRST_CONTEXT < 32.
+	 */
+	context_map[0] = (1 << FIRST_CONTEXT) - 1;
+	next_mmu_context = FIRST_CONTEXT;
+	atomic_set(&nr_free_contexts, LAST_CONTEXT - FIRST_CONTEXT + 1);
+}
+
Index: linux-work/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-work.orig/arch/powerpc/platforms/Kconfig.cputype	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/platforms/Kconfig.cputype	2008-12-09 16:31:02.000000000 +1100
@@ -195,13 +195,21 @@ config SPE
 
 config PPC_STD_MMU
 	bool
-	depends on 6xx || POWER3 || POWER4 || PPC64
+	depends on 6xx || PPC64
 	default y
 
 config PPC_STD_MMU_32
 	def_bool y
 	depends on PPC_STD_MMU && PPC32
 
+config PPC_STD_MMU_64
+	def_bool y
+	depends on PPC_STD_MMU && PPC64
+
+config PPC_MMU_NOHASH
+	def_bool y
+	depends on !PPC_STD_MMU
+
 config PPC_MM_SLICES
 	bool
 	default y if HUGETLB_PAGE || PPC_64K_PAGES
Index: linux-work/arch/powerpc/platforms/powermac/cpufreq_32.c
===================================================================
--- linux-work.orig/arch/powerpc/platforms/powermac/cpufreq_32.c	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/arch/powerpc/platforms/powermac/cpufreq_32.c	2008-12-09 16:31:02.000000000 +1100
@@ -310,7 +310,7 @@ static int pmu_set_cpu_speed(int low_spe
  		_set_L3CR(save_l3cr);
 
 	/* Restore userland MMU context */
-	set_context(current->active_mm->context.id, current->active_mm->pgd);
+	switch_mmu_context(NULL, current->active_mm);
 
 #ifdef DEBUG_FREQ
 	printk(KERN_DEBUG "HID1, after: %x\n", mfspr(SPRN_HID1));
Index: linux-work/drivers/macintosh/via-pmu.c
===================================================================
--- linux-work.orig/drivers/macintosh/via-pmu.c	2008-12-09 16:30:57.000000000 +1100
+++ linux-work/drivers/macintosh/via-pmu.c	2008-12-09 16:31:02.000000000 +1100
@@ -1814,7 +1814,7 @@ static int powerbook_sleep_grackle(void)
  		_set_L2CR(save_l2cr);
 	
 	/* Restore userland MMU context */
-	set_context(current->active_mm->context.id, current->active_mm->pgd);
+	switch_mmu_context(NULL, current->active_mm);
 
 	/* Power things up */
 	pmu_unlock();
@@ -1903,7 +1903,7 @@ powerbook_sleep_Core99(void)
  		_set_L3CR(save_l3cr);
 	
 	/* Restore userland MMU context */
-	set_context(current->active_mm->context.id, current->active_mm->pgd);
+	switch_mmu_context(NULL, current->active_mm);
 
 	/* Tell PMU we are ready */
 	pmu_unlock();
Index: linux-work/arch/powerpc/mm/mmu_context_32.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/mmu_context_32.c	2008-12-09 16:31:22.000000000 +1100
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,84 +0,0 @@
-/*
- * This file contains the routines for handling the MMU on those
- * PowerPC implementations where the MMU substantially follows the
- * architecture specification.  This includes the 6xx, 7xx, 7xxx,
- * 8260, and POWER3 implementations but excludes the 8xx and 4xx.
- *  -- paulus
- *
- *  Derived from arch/ppc/mm/init.c:
- *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
- *
- *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
- *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
- *    Copyright (C) 1996 Paul Mackerras
- *
- *  Derived from "arch/i386/mm/init.c"
- *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- *
- */
-
-#include <linux/mm.h>
-#include <linux/init.h>
-
-#include <asm/mmu_context.h>
-#include <asm/tlbflush.h>
-
-unsigned long next_mmu_context;
-unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
-#ifdef FEW_CONTEXTS
-atomic_t nr_free_contexts;
-struct mm_struct *context_mm[LAST_CONTEXT+1];
-void steal_context(void);
-#endif /* FEW_CONTEXTS */
-
-/*
- * Initialize the context management stuff.
- */
-void __init
-mmu_context_init(void)
-{
-	/*
-	 * Some processors have too few contexts to reserve one for
-	 * init_mm, and require using context 0 for a normal task.
-	 * Other processors reserve the use of context zero for the kernel.
-	 * This code assumes FIRST_CONTEXT < 32.
-	 */
-	context_map[0] = (1 << FIRST_CONTEXT) - 1;
-	next_mmu_context = FIRST_CONTEXT;
-#ifdef FEW_CONTEXTS
-	atomic_set(&nr_free_contexts, LAST_CONTEXT - FIRST_CONTEXT + 1);
-#endif /* FEW_CONTEXTS */
-}
-
-#ifdef FEW_CONTEXTS
-/*
- * Steal a context from a task that has one at the moment.
- * This is only used on 8xx and 4xx and we presently assume that
- * they don't do SMP.  If they do then this will have to check
- * whether the MM we steal is in use.
- * We also assume that this is only used on systems that don't
- * use an MMU hash table - this is true for 8xx and 4xx.
- * This isn't an LRU system, it just frees up each context in
- * turn (sort-of pseudo-random replacement :).  This would be the
- * place to implement an LRU scheme if anyone was motivated to do it.
- *  -- paulus
- */
-void
-steal_context(void)
-{
-	struct mm_struct *mm;
-
-	/* free up context `next_mmu_context' */
-	/* if we shouldn't free context 0, don't... */
-	if (next_mmu_context < FIRST_CONTEXT)
-		next_mmu_context = FIRST_CONTEXT;
-	mm = context_mm[next_mmu_context];
-	flush_tlb_mm(mm);
-	destroy_context(mm);
-}
-#endif /* FEW_CONTEXTS */
Index: linux-work/arch/powerpc/mm/mmu_context_64.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/mmu_context_64.c	2008-12-09 16:31:25.000000000 +1100
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,70 +0,0 @@
-/*
- *  MMU context allocation for 64-bit kernels.
- *
- *  Copyright (C) 2004 Anton Blanchard, IBM Corp. <anton@samba.org>
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- *
- */
-
-#include <linux/sched.h>
-#include <linux/kernel.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/types.h>
-#include <linux/mm.h>
-#include <linux/spinlock.h>
-#include <linux/idr.h>
-
-#include <asm/mmu_context.h>
-
-static DEFINE_SPINLOCK(mmu_context_lock);
-static DEFINE_IDR(mmu_context_idr);
-
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
-{
-	int index;
-	int err;
-
-again:
-	if (!idr_pre_get(&mmu_context_idr, GFP_KERNEL))
-		return -ENOMEM;
-
-	spin_lock(&mmu_context_lock);
-	err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index);
-	spin_unlock(&mmu_context_lock);
-
-	if (err == -EAGAIN)
-		goto again;
-	else if (err)
-		return err;
-
-	if (index > MAX_CONTEXT) {
-		spin_lock(&mmu_context_lock);
-		idr_remove(&mmu_context_idr, index);
-		spin_unlock(&mmu_context_lock);
-		return -ENOMEM;
-	}
-
-	/* The old code would re-promote on fork, we don't do that
-	 * when using slices as it could cause problem promoting slices
-	 * that have been forced down to 4K
-	 */
-	if (slice_mm_new_context(mm))
-		slice_set_user_psize(mm, mmu_virtual_psize);
-	mm->context.id = index;
-
-	return 0;
-}
-
-void destroy_context(struct mm_struct *mm)
-{
-	spin_lock(&mmu_context_lock);
-	idr_remove(&mmu_context_idr, mm->context.id);
-	spin_unlock(&mmu_context_lock);
-
-	mm->context.id = NO_CONTEXT;
-}

* [PATCH 7/16] powerpc/mm: Rework context management for CPUs with no hash table v2
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This reworks the context management code used by 4xx, 8xx and
Freescale BookE. It adds support for SMP by implementing the
concept of a stale context map, used to lazily flush the TLB on
processors where a context may have been invalidated. This also
contains the groundwork for generalizing such lazy TLB flushing
by just picking up a new PID and marking the old one stale. That
will be implemented later.

This is a first implementation that uses a global spinlock.

Ideally, we should try to get at least the fast path (context ID
already assigned) lockless, or limited to a per-context lock,
but for now this will do.

I tried to keep the UP case reasonably simple to avoid adding
too much overhead to 8xx, which does a lot of context stealing
since it effectively has only 16 PIDs available.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2: fix some bugs with active tracking on SMP

 arch/powerpc/include/asm/mmu-40x.h       |    5 
 arch/powerpc/include/asm/mmu-44x.h       |    5 
 arch/powerpc/include/asm/mmu-8xx.h       |    3 
 arch/powerpc/include/asm/mmu-fsl-booke.h |    5 
 arch/powerpc/include/asm/tlbflush.h      |    2 
 arch/powerpc/mm/mmu_context_nohash.c     |  262 +++++++++++++++++++++++++------
 6 files changed, 232 insertions(+), 50 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/mmu-40x.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/mmu-40x.h	2008-12-09 16:42:03.000000000 +1100
@@ -54,8 +54,9 @@
 #ifndef __ASSEMBLY__
 
 typedef struct {
-	unsigned long id;
-	unsigned long vdso_base;
+	unsigned int	id;
+	unsigned int	active;
+	unsigned long	vdso_base;
 } mm_context_t;
 
 #endif /* !__ASSEMBLY__ */
Index: linux-work/arch/powerpc/include/asm/mmu-44x.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/mmu-44x.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/mmu-44x.h	2008-12-15 10:12:39.000000000 +1100
@@ -56,8 +56,9 @@
 extern unsigned int tlb_44x_hwater;
 
 typedef struct {
-	unsigned long id;
-	unsigned long vdso_base;
+	unsigned int	id;
+	unsigned int	active;
+	unsigned long	vdso_base;
 } mm_context_t;
 
 #endif /* !__ASSEMBLY__ */
Index: linux-work/arch/powerpc/include/asm/mmu-fsl-booke.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/mmu-fsl-booke.h	2008-12-08 15:40:33.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/mmu-fsl-booke.h	2008-12-09 16:42:03.000000000 +1100
@@ -76,8 +76,9 @@
 #ifndef __ASSEMBLY__
 
 typedef struct {
-	unsigned long id;
-	unsigned long vdso_base;
+	unsigned int	id;
+	unsigned int	active;
+	unsigned long	vdso_base;
 } mm_context_t;
 #endif /* !__ASSEMBLY__ */
 
Index: linux-work/arch/powerpc/mm/mmu_context_nohash.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/mmu_context_nohash.c	2008-12-09 16:42:03.000000000 +1100
+++ linux-work/arch/powerpc/mm/mmu_context_nohash.c	2008-12-15 10:13:05.000000000 +1100
@@ -14,13 +14,28 @@
  *  as published by the Free Software Foundation; either version
  *  2 of the License, or (at your option) any later version.
  *
+ * TODO:
+ *
+ *   - The global context lock will not scale very well
+ *   - The maps should be dynamically allocated to allow for processors
+ *     that support more PID bits at runtime
+ *   - Implement flush_tlb_mm() by making the context stale and picking
+ *     a new one
+ *   - More aggressively clear stale map bits and maybe find some way to
+ *     also clear mm->cpu_vm_mask bits when processes are migrated
  */
 
+#undef DEBUG
+#define DEBUG_STEAL_ONLY
+#undef DEBUG_MAP_CONSISTENCY
+
+#include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/init.h>
 
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
+#include <linux/spinlock.h>
 
 /*
  *   The MPC8xx has only 16 contexts.  We rotate through them on each
@@ -40,17 +55,14 @@
  */
 
 #ifdef CONFIG_8xx
-#define NO_CONTEXT      	16
 #define LAST_CONTEXT    	15
 #define FIRST_CONTEXT    	0
 
 #elif defined(CONFIG_4xx)
-#define NO_CONTEXT      	256
 #define LAST_CONTEXT    	255
 #define FIRST_CONTEXT    	1
 
 #elif defined(CONFIG_E200) || defined(CONFIG_E500)
-#define NO_CONTEXT      	256
 #define LAST_CONTEXT    	255
 #define FIRST_CONTEXT    	1
 
@@ -58,11 +70,11 @@
 #error Unsupported processor type
 #endif
 
-static unsigned long next_mmu_context;
+static unsigned int next_context, nr_free_contexts;
 static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
-static atomic_t nr_free_contexts;
+static unsigned long stale_map[NR_CPUS][LAST_CONTEXT / BITS_PER_LONG + 1];
 static struct mm_struct *context_mm[LAST_CONTEXT+1];
-static void steal_context(void);
+static DEFINE_SPINLOCK(context_lock);
 
 /* Steal a context from a task that has one at the moment.
  * This is only used on 8xx and 4xx and we presently assume that
@@ -75,49 +87,193 @@ static void steal_context(void);
  * place to implement an LRU scheme if anyone was motivated to do it.
  *  -- paulus
  */
-static void steal_context(void)
+
+/*
+ * For context stealing, we use a slightly different approach for
+ * SMP and UP. Basically, the UP one is simpler and doesn't use
+ * the stale map as we can just flush the local CPU
+ */
+#ifdef CONFIG_SMP
+static unsigned int steal_context_smp(unsigned int id)
 {
 	struct mm_struct *mm;
+	unsigned int cpu, max;
 
-	/* free up context `next_mmu_context' */
-	/* if we shouldn't free context 0, don't... */
-	if (next_mmu_context < FIRST_CONTEXT)
-		next_mmu_context = FIRST_CONTEXT;
-	mm = context_mm[next_mmu_context];
-	flush_tlb_mm(mm);
-	destroy_context(mm);
-}
+ again:
+	max = LAST_CONTEXT - FIRST_CONTEXT;
 
+	/* Attempt to free next_context first and then loop until we manage */
+	while (max--) {
+		/* Pick up the victim mm */
+		mm = context_mm[id];
+
+		/* We have a candidate victim; check if it's active, since on
+		 * SMP we cannot steal active contexts
+		 */
+		if (mm->context.active) {
+			id++;
+			if (id > LAST_CONTEXT)
+				id = FIRST_CONTEXT;
+			continue;
+		}
+		pr_debug("[%d] steal context %d from mm @%p\n",
+			 smp_processor_id(), id, mm);
+
+	/* Mark this mm as having no context anymore */
+		mm->context.id = MMU_NO_CONTEXT;
+
+		/* Mark it stale on all CPUs that used this mm */
+		for_each_cpu_mask_nr(cpu, mm->cpu_vm_mask)
+			__set_bit(id, stale_map[cpu]);
+		return id;
+	}
 
-/*
- * Get a new mmu context for the address space described by `mm'.
+	/* This will happen if you have more CPUs than available contexts;
+	 * all we can do here is wait a bit and try again
+	 */
+	spin_unlock(&context_lock);
+	cpu_relax();
+	spin_lock(&context_lock);
+	goto again;
+}
+#endif  /* CONFIG_SMP */
+
+/* Note that this will also be called on SMP if all other CPUs are
+ * offlined, which means that it may be called for cpu != 0. For
+ * this to work, we somewhat assume that CPUs that are onlined
+ * come up with a fully clean TLB (or are cleaned when offlined)
  */
-static inline void get_mmu_context(struct mm_struct *mm)
+static unsigned int steal_context_up(unsigned int id)
 {
-	unsigned long ctx;
+	struct mm_struct *mm;
+	int cpu = smp_processor_id();
 
-	if (mm->context.id != NO_CONTEXT)
-		return;
+	/* Pick up the victim mm */
+	mm = context_mm[id];
+
+	pr_debug("[%d] steal context %d from mm @%p\n", cpu, id, mm);
+
+	/* Mark this mm as having no context anymore */
+	mm->context.id = MMU_NO_CONTEXT;
+
+	/* Flush the TLB for that context */
+	local_flush_tlb_mm(mm);
+
+	/* XXX This clear should ultimately be part of local_flush_tlb_mm */
+	__clear_bit(id, stale_map[cpu]);
+
+	return id;
+}
 
-	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
-		steal_context();
+#ifdef DEBUG_MAP_CONSISTENCY
+static void context_check_map(void)
+{
+	unsigned int id, nrf, nact;
 
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
-	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
-	mm->context.id = ctx;
-	context_mm[ctx] = mm;
+	nrf = nact = 0;
+	for (id = FIRST_CONTEXT; id <= LAST_CONTEXT; id++) {
+		int used = test_bit(id, context_map);
+		if (!used)
+			nrf++;
+		if (used != (context_mm[id] != NULL))
+			pr_err("MMU: Context %d is %s and MM is %p !\n",
+			       id, used ? "used" : "free", context_mm[id]);
+		if (context_mm[id] != NULL)
+			nact += context_mm[id]->context.active;
+	}
+	if (nrf != nr_free_contexts) {
+		pr_err("MMU: Free context count out of sync ! (%d vs %d)\n",
+		       nr_free_contexts, nrf);
+		nr_free_contexts = nrf;
+	}
+	if (nact > num_online_cpus())
+		pr_err("MMU: More active contexts than CPUs ! (%d vs %d)\n",
+		       nact, num_online_cpus());
 }
+#else
+static void context_check_map(void) { }
+#endif
 
 void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 {
-	get_mmu_context(next);
+	unsigned int id, cpu = smp_processor_id();
+	unsigned long *map;
 
-	set_context(next->context.id, next->pgd);
+	/* No lockless fast path .. yet */
+	spin_lock(&context_lock);
+
+#ifndef DEBUG_STEAL_ONLY
+	pr_debug("[%d] activating context for mm @%p, active=%d, id=%d\n",
+		 cpu, next, next->context.active, next->context.id);
+#endif
+
+#ifdef CONFIG_SMP
+	/* Mark us active and the previous one not anymore */
+	next->context.active++;
+	if (prev) {
+		WARN_ON(prev->context.active < 1);
+		prev->context.active--;
+	}
+#endif /* CONFIG_SMP */
+
+	/* If we already have a valid assigned context, skip all that */
+	id = next->context.id;
+	if (likely(id != MMU_NO_CONTEXT))
+		goto ctxt_ok;
+
+	/* We really don't have a context, let's try to acquire one */
+	id = next_context;
+	if (id > LAST_CONTEXT)
+		id = FIRST_CONTEXT;
+	map = context_map;
+
+	/* No more free contexts, let's try to steal one */
+	if (nr_free_contexts == 0) {
+#ifdef CONFIG_SMP
+		if (num_online_cpus() > 1) {
+			id = steal_context_smp(id);
+			goto stolen;
+		}
+#endif /* CONFIG_SMP */
+		id = steal_context_up(id);
+		goto stolen;
+	}
+	nr_free_contexts--;
+
+	/* We know there's at least one free context, try to find it */
+	while (__test_and_set_bit(id, map)) {
+		id = find_next_zero_bit(map, LAST_CONTEXT+1, id);
+		if (id > LAST_CONTEXT)
+			id = FIRST_CONTEXT;
+	}
+ stolen:
+	next_context = id + 1;
+	context_mm[id] = next;
+	next->context.id = id;
+
+#ifndef DEBUG_STEAL_ONLY
+	pr_debug("[%d] picked up new id %d, nrf is now %d\n",
+		 cpu, id, nr_free_contexts);
+#endif
+
+	context_check_map();
+ ctxt_ok:
+
+	/* If that context got marked stale on this CPU, then flush the
+	 * local TLB for it and unmark it before we use it
+	 */
+	if (test_bit(id, stale_map[cpu])) {
+		pr_debug("[%d] flushing stale context %d for mm @%p !\n",
+			 cpu, id, next);
+		local_flush_tlb_mm(next);
+
+		/* XXX This clear should ultimately be part of local_flush_tlb_mm */
+		__clear_bit(id, stale_map[cpu]);
+	}
+
+	/* Flick the MMU and release lock */
+	set_context(id, next->pgd);
+	spin_unlock(&context_lock);
 }
 
 /*
@@ -125,7 +281,9 @@ void switch_mmu_context(struct mm_struct
  */
 int init_new_context(struct task_struct *t, struct mm_struct *mm)
 {
-	mm->context.id = NO_CONTEXT;
+	mm->context.id = MMU_NO_CONTEXT;
+	mm->context.active = 0;
+
 	return 0;
 }
 
@@ -134,13 +292,25 @@ int init_new_context(struct task_struct 
  */
 void destroy_context(struct mm_struct *mm)
 {
-	preempt_disable();
-	if (mm->context.id != NO_CONTEXT) {
-		clear_bit(mm->context.id, context_map);
-		mm->context.id = NO_CONTEXT;
-		atomic_inc(&nr_free_contexts);
+	unsigned int id;
+
+	if (mm->context.id == MMU_NO_CONTEXT)
+		return;
+
+	WARN_ON(mm->context.active != 0);
+
+	spin_lock(&context_lock);
+	id = mm->context.id;
+	if (id != MMU_NO_CONTEXT) {
+		__clear_bit(id, context_map);
+		mm->context.id = MMU_NO_CONTEXT;
+#ifdef DEBUG_MAP_CONSISTENCY
+		mm->context.active = 0;
+		context_mm[id] = NULL;
+#endif
+		nr_free_contexts++;
 	}
-	preempt_enable();
+	spin_unlock(&context_lock);
 }
 
 
@@ -149,6 +319,12 @@ void destroy_context(struct mm_struct *m
  */
 void __init mmu_context_init(void)
 {
+	/* Mark init_mm as being active on all possible CPUs since
+	 * we'll get called with prev == init_mm the first time
+	 * we schedule on a given CPU
+	 */
+	init_mm.context.active = NR_CPUS;
+
 	/*
 	 * Some processors have too few contexts to reserve one for
 	 * init_mm, and require using context 0 for a normal task.
@@ -156,7 +332,7 @@ void __init mmu_context_init(void)
 	 * This code assumes FIRST_CONTEXT < 32.
 	 */
 	context_map[0] = (1 << FIRST_CONTEXT) - 1;
-	next_mmu_context = FIRST_CONTEXT;
-	atomic_set(&nr_free_contexts, LAST_CONTEXT - FIRST_CONTEXT + 1);
+	next_context = FIRST_CONTEXT;
+	nr_free_contexts = LAST_CONTEXT - FIRST_CONTEXT + 1;
 }
 
Index: linux-work/arch/powerpc/include/asm/tlbflush.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/tlbflush.h	2008-12-09 16:42:03.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/tlbflush.h	2008-12-15 10:13:05.000000000 +1100
@@ -29,6 +29,8 @@
 
 #include <linux/mm.h>
 
+#define MMU_NO_CONTEXT      	((unsigned int)-1)
+
 extern void _tlbie(unsigned long address, unsigned int pid);
 extern void _tlbil_all(void);
 extern void _tlbil_pid(unsigned int pid);
Index: linux-work/arch/powerpc/include/asm/mmu-8xx.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/mmu-8xx.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/mmu-8xx.h	2008-12-09 16:42:03.000000000 +1100
@@ -137,7 +137,8 @@
 
 #ifndef __ASSEMBLY__
 typedef struct {
-	unsigned long id;
+	unsigned int id;
+	unsigned int active;
 	unsigned long vdso_base;
 } mm_context_t;
 #endif /* !__ASSEMBLY__ */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 8/16] powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (6 preceding siblings ...)
  2008-12-15  5:44 ` [PATCH 7/16] powerpc/mm: Rework context management for CPUs with no hash table v2 Benjamin Herrenschmidt
@ 2008-12-15  5:44 ` Benjamin Herrenschmidt
  2008-12-15 20:36   ` Kumar Gala
  2008-12-15  5:44 ` [PATCH 9/16] powerpc/mm: Introduce MMU features v2 Benjamin Herrenschmidt
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This renames the files to clarify the fact that they are used by
the hash-based family of CPUs (the 603 being an exception in that
family, though it is still handled by that code).

This paves the way for the new tlb_nohash.c coming via a subsequent
patch.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/mm/Makefile     |    2 
 arch/powerpc/mm/tlb_32.c     |  190 --------------------------------------
 arch/powerpc/mm/tlb_64.c     |  211 -------------------------------------------
 arch/powerpc/mm/tlb_hash32.c |  190 ++++++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/tlb_hash64.c |  211 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 402 insertions(+), 402 deletions(-)

--- linux-work.orig/arch/powerpc/mm/tlb_32.c	2008-12-09 16:30:49.000000000 +1100
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,190 +0,0 @@
-/*
- * This file contains the routines for TLB flushing.
- * On machines where the MMU uses a hash table to store virtual to
- * physical translations, these routines flush entries from the
- * hash table also.
- *  -- paulus
- *
- *  Derived from arch/ppc/mm/init.c:
- *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
- *
- *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
- *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
- *    Copyright (C) 1996 Paul Mackerras
- *
- *  Derived from "arch/i386/mm/init.c"
- *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/mm.h>
-#include <linux/init.h>
-#include <linux/highmem.h>
-#include <linux/pagemap.h>
-
-#include <asm/tlbflush.h>
-#include <asm/tlb.h>
-
-#include "mmu_decl.h"
-
-/*
- * Called when unmapping pages to flush entries from the TLB/hash table.
- */
-void flush_hash_entry(struct mm_struct *mm, pte_t *ptep, unsigned long addr)
-{
-	unsigned long ptephys;
-
-	if (Hash != 0) {
-		ptephys = __pa(ptep) & PAGE_MASK;
-		flush_hash_pages(mm->context.id, addr, ptephys, 1);
-	}
-}
-EXPORT_SYMBOL(flush_hash_entry);
-
-/*
- * Called by ptep_set_access_flags, must flush on CPUs for which the
- * DSI handler can't just "fixup" the TLB on a write fault
- */
-void flush_tlb_page_nohash(struct vm_area_struct *vma, unsigned long addr)
-{
-	if (Hash != 0)
-		return;
-	_tlbie(addr);
-}
-
-/*
- * Called at the end of a mmu_gather operation to make sure the
- * TLB flush is completely done.
- */
-void tlb_flush(struct mmu_gather *tlb)
-{
-	if (Hash == 0) {
-		/*
-		 * 603 needs to flush the whole TLB here since
-		 * it doesn't use a hash table.
-		 */
-		_tlbia();
-	}
-}
-
-/*
- * TLB flushing:
- *
- *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
- *  - flush_tlb_page(vma, vmaddr) flushes one page
- *  - flush_tlb_range(vma, start, end) flushes a range of pages
- *  - flush_tlb_kernel_range(start, end) flushes kernel pages
- *
- * since the hardware hash table functions as an extension of the
- * tlb as far as the linux tables are concerned, flush it too.
- *    -- Cort
- */
-
-/*
- * 750 SMP is a Bad Idea because the 750 doesn't broadcast all
- * the cache operations on the bus.  Hence we need to use an IPI
- * to get the other CPU(s) to invalidate their TLBs.
- */
-#ifdef CONFIG_SMP_750
-#define FINISH_FLUSH	smp_send_tlb_invalidate(0)
-#else
-#define FINISH_FLUSH	do { } while (0)
-#endif
-
-static void flush_range(struct mm_struct *mm, unsigned long start,
-			unsigned long end)
-{
-	pmd_t *pmd;
-	unsigned long pmd_end;
-	int count;
-	unsigned int ctx = mm->context.id;
-
-	if (Hash == 0) {
-		_tlbia();
-		return;
-	}
-	start &= PAGE_MASK;
-	if (start >= end)
-		return;
-	end = (end - 1) | ~PAGE_MASK;
-	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
-	for (;;) {
-		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
-		if (pmd_end > end)
-			pmd_end = end;
-		if (!pmd_none(*pmd)) {
-			count = ((pmd_end - start) >> PAGE_SHIFT) + 1;
-			flush_hash_pages(ctx, start, pmd_val(*pmd), count);
-		}
-		if (pmd_end == end)
-			break;
-		start = pmd_end + 1;
-		++pmd;
-	}
-}
-
-/*
- * Flush kernel TLB entries in the given range
- */
-void flush_tlb_kernel_range(unsigned long start, unsigned long end)
-{
-	flush_range(&init_mm, start, end);
-	FINISH_FLUSH;
-}
-
-/*
- * Flush all the (user) entries for the address space described by mm.
- */
-void flush_tlb_mm(struct mm_struct *mm)
-{
-	struct vm_area_struct *mp;
-
-	if (Hash == 0) {
-		_tlbia();
-		return;
-	}
-
-	/*
-	 * It is safe to go down the mm's list of vmas when called
-	 * from dup_mmap, holding mmap_sem.  It would also be safe from
-	 * unmap_region or exit_mmap, but not from vmtruncate on SMP -
-	 * but it seems dup_mmap is the only SMP case which gets here.
-	 */
-	for (mp = mm->mmap; mp != NULL; mp = mp->vm_next)
-		flush_range(mp->vm_mm, mp->vm_start, mp->vm_end);
-	FINISH_FLUSH;
-}
-
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
-{
-	struct mm_struct *mm;
-	pmd_t *pmd;
-
-	if (Hash == 0) {
-		_tlbie(vmaddr);
-		return;
-	}
-	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
-	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
-	if (!pmd_none(*pmd))
-		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
-	FINISH_FLUSH;
-}
-
-/*
- * For each address in the range, find the pte for the address
- * and check _PAGE_HASHPTE bit; if it is set, find and destroy
- * the corresponding HPTE.
- */
-void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
-		     unsigned long end)
-{
-	flush_range(vma->vm_mm, start, end);
-	FINISH_FLUSH;
-}
Index: linux-work/arch/powerpc/mm/tlb_64.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/tlb_64.c	2008-12-09 16:30:49.000000000 +1100
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,211 +0,0 @@
-/*
- * This file contains the routines for flushing entries from the
- * TLB and MMU hash table.
- *
- *  Derived from arch/ppc64/mm/init.c:
- *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
- *
- *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
- *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
- *    Copyright (C) 1996 Paul Mackerras
- *
- *  Derived from "arch/i386/mm/init.c"
- *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
- *
- *  Dave Engebretsen <engebret@us.ibm.com>
- *      Rework for PPC64 port.
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- */
-
-#include <linux/kernel.h>
-#include <linux/mm.h>
-#include <linux/init.h>
-#include <linux/percpu.h>
-#include <linux/hardirq.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-#include <asm/tlb.h>
-#include <asm/bug.h>
-
-DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
-
-/* This is declared as we are using the more or less generic
- * arch/powerpc/include/asm/tlb.h file -- tgall
- */
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-/*
- * A linux PTE was changed and the corresponding hash table entry
- * neesd to be flushed. This function will either perform the flush
- * immediately or will batch it up if the current CPU has an active
- * batch on it.
- *
- * Must be called from within some kind of spinlock/non-preempt region...
- */
-void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep, unsigned long pte, int huge)
-{
-	struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);
-	unsigned long vsid, vaddr;
-	unsigned int psize;
-	int ssize;
-	real_pte_t rpte;
-	int i;
-
-	i = batch->index;
-
-	/* We mask the address for the base page size. Huge pages will
-	 * have applied their own masking already
-	 */
-	addr &= PAGE_MASK;
-
-	/* Get page size (maybe move back to caller).
-	 *
-	 * NOTE: when using special 64K mappings in 4K environment like
-	 * for SPEs, we obtain the page size from the slice, which thus
-	 * must still exist (and thus the VMA not reused) at the time
-	 * of this call
-	 */
-	if (huge) {
-#ifdef CONFIG_HUGETLB_PAGE
-		psize = get_slice_psize(mm, addr);;
-#else
-		BUG();
-		psize = pte_pagesize_index(mm, addr, pte); /* shutup gcc */
-#endif
-	} else
-		psize = pte_pagesize_index(mm, addr, pte);
-
-	/* Build full vaddr */
-	if (!is_kernel_addr(addr)) {
-		ssize = user_segment_size(addr);
-		vsid = get_vsid(mm->context.id, addr, ssize);
-		WARN_ON(vsid == 0);
-	} else {
-		vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
-		ssize = mmu_kernel_ssize;
-	}
-	vaddr = hpt_va(addr, vsid, ssize);
-	rpte = __real_pte(__pte(pte), ptep);
-
-	/*
-	 * Check if we have an active batch on this CPU. If not, just
-	 * flush now and return. For now, we don global invalidates
-	 * in that case, might be worth testing the mm cpu mask though
-	 * and decide to use local invalidates instead...
-	 */
-	if (!batch->active) {
-		flush_hash_page(vaddr, rpte, psize, ssize, 0);
-		return;
-	}
-
-	/*
-	 * This can happen when we are in the middle of a TLB batch and
-	 * we encounter memory pressure (eg copy_page_range when it tries
-	 * to allocate a new pte). If we have to reclaim memory and end
-	 * up scanning and resetting referenced bits then our batch context
-	 * will change mid stream.
-	 *
-	 * We also need to ensure only one page size is present in a given
-	 * batch
-	 */
-	if (i != 0 && (mm != batch->mm || batch->psize != psize ||
-		       batch->ssize != ssize)) {
-		__flush_tlb_pending(batch);
-		i = 0;
-	}
-	if (i == 0) {
-		batch->mm = mm;
-		batch->psize = psize;
-		batch->ssize = ssize;
-	}
-	batch->pte[i] = rpte;
-	batch->vaddr[i] = vaddr;
-	batch->index = ++i;
-	if (i >= PPC64_TLB_BATCH_NR)
-		__flush_tlb_pending(batch);
-}
-
-/*
- * This function is called when terminating an mmu batch or when a batch
- * is full. It will perform the flush of all the entries currently stored
- * in a batch.
- *
- * Must be called from within some kind of spinlock/non-preempt region...
- */
-void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
-{
-	cpumask_t tmp;
-	int i, local = 0;
-
-	i = batch->index;
-	tmp = cpumask_of_cpu(smp_processor_id());
-	if (cpus_equal(batch->mm->cpu_vm_mask, tmp))
-		local = 1;
-	if (i == 1)
-		flush_hash_page(batch->vaddr[0], batch->pte[0],
-				batch->psize, batch->ssize, local);
-	else
-		flush_hash_range(i, local);
-	batch->index = 0;
-}
-
-/**
- * __flush_hash_table_range - Flush all HPTEs for a given address range
- *                            from the hash table (and the TLB). But keeps
- *                            the linux PTEs intact.
- *
- * @mm		: mm_struct of the target address space (generally init_mm)
- * @start	: starting address
- * @end         : ending address (not included in the flush)
- *
- * This function is mostly to be used by some IO hotplug code in order
- * to remove all hash entries from a given address range used to map IO
- * space on a removed PCI-PCI bidge without tearing down the full mapping
- * since 64K pages may overlap with other bridges when using 64K pages
- * with 4K HW pages on IO space.
- *
- * Because of that usage pattern, it's only available with CONFIG_HOTPLUG
- * and is implemented for small size rather than speed.
- */
-#ifdef CONFIG_HOTPLUG
-
-void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
-			      unsigned long end)
-{
-	unsigned long flags;
-
-	start = _ALIGN_DOWN(start, PAGE_SIZE);
-	end = _ALIGN_UP(end, PAGE_SIZE);
-
-	BUG_ON(!mm->pgd);
-
-	/* Note: Normally, we should only ever use a batch within a
-	 * PTE locked section. This violates the rule, but will work
-	 * since we don't actually modify the PTEs, we just flush the
-	 * hash while leaving the PTEs intact (including their reference
-	 * to being hashed). This is not the most performance oriented
-	 * way to do things but is fine for our needs here.
-	 */
-	local_irq_save(flags);
-	arch_enter_lazy_mmu_mode();
-	for (; start < end; start += PAGE_SIZE) {
-		pte_t *ptep = find_linux_pte(mm->pgd, start);
-		unsigned long pte;
-
-		if (ptep == NULL)
-			continue;
-		pte = pte_val(*ptep);
-		if (!(pte & _PAGE_HASHPTE))
-			continue;
-		hpte_need_flush(mm, start, ptep, pte, 0);
-	}
-	arch_leave_lazy_mmu_mode();
-	local_irq_restore(flags);
-}
-
-#endif /* CONFIG_HOTPLUG */
Index: linux-work/arch/powerpc/mm/tlb_hash32.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/tlb_hash32.c	2008-12-09 16:31:43.000000000 +1100
@@ -0,0 +1,190 @@
+/*
+ * This file contains the routines for TLB flushing.
+ * On machines where the MMU uses a hash table to store virtual to
+ * physical translations, these routines flush entries from the
+ * hash table also.
+ *  -- paulus
+ *
+ *  Derived from arch/ppc/mm/init.c:
+ *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
+ *
+ *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
+ *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
+ *    Copyright (C) 1996 Paul Mackerras
+ *
+ *  Derived from "arch/i386/mm/init.c"
+ *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+
+#include <asm/tlbflush.h>
+#include <asm/tlb.h>
+
+#include "mmu_decl.h"
+
+/*
+ * Called when unmapping pages to flush entries from the TLB/hash table.
+ */
+void flush_hash_entry(struct mm_struct *mm, pte_t *ptep, unsigned long addr)
+{
+	unsigned long ptephys;
+
+	if (Hash != 0) {
+		ptephys = __pa(ptep) & PAGE_MASK;
+		flush_hash_pages(mm->context.id, addr, ptephys, 1);
+	}
+}
+EXPORT_SYMBOL(flush_hash_entry);
+
+/*
+ * Called by ptep_set_access_flags, must flush on CPUs for which the
+ * DSI handler can't just "fixup" the TLB on a write fault
+ */
+void flush_tlb_page_nohash(struct vm_area_struct *vma, unsigned long addr)
+{
+	if (Hash != 0)
+		return;
+	_tlbie(addr);
+}
+
+/*
+ * Called at the end of a mmu_gather operation to make sure the
+ * TLB flush is completely done.
+ */
+void tlb_flush(struct mmu_gather *tlb)
+{
+	if (Hash == 0) {
+		/*
+		 * 603 needs to flush the whole TLB here since
+		 * it doesn't use a hash table.
+		 */
+		_tlbia();
+	}
+}
+
+/*
+ * TLB flushing:
+ *
+ *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
+ *  - flush_tlb_page(vma, vmaddr) flushes one page
+ *  - flush_tlb_range(vma, start, end) flushes a range of pages
+ *  - flush_tlb_kernel_range(start, end) flushes kernel pages
+ *
+ * since the hardware hash table functions as an extension of the
+ * tlb as far as the linux tables are concerned, flush it too.
+ *    -- Cort
+ */
+
+/*
+ * 750 SMP is a Bad Idea because the 750 doesn't broadcast all
+ * the cache operations on the bus.  Hence we need to use an IPI
+ * to get the other CPU(s) to invalidate their TLBs.
+ */
+#ifdef CONFIG_SMP_750
+#define FINISH_FLUSH	smp_send_tlb_invalidate(0)
+#else
+#define FINISH_FLUSH	do { } while (0)
+#endif
+
+static void flush_range(struct mm_struct *mm, unsigned long start,
+			unsigned long end)
+{
+	pmd_t *pmd;
+	unsigned long pmd_end;
+	int count;
+	unsigned int ctx = mm->context.id;
+
+	if (Hash == 0) {
+		_tlbia();
+		return;
+	}
+	start &= PAGE_MASK;
+	if (start >= end)
+		return;
+	end = (end - 1) | ~PAGE_MASK;
+	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
+	for (;;) {
+		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
+		if (pmd_end > end)
+			pmd_end = end;
+		if (!pmd_none(*pmd)) {
+			count = ((pmd_end - start) >> PAGE_SHIFT) + 1;
+			flush_hash_pages(ctx, start, pmd_val(*pmd), count);
+		}
+		if (pmd_end == end)
+			break;
+		start = pmd_end + 1;
+		++pmd;
+	}
+}
+
+/*
+ * Flush kernel TLB entries in the given range
+ */
+void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+	flush_range(&init_mm, start, end);
+	FINISH_FLUSH;
+}
+
+/*
+ * Flush all the (user) entries for the address space described by mm.
+ */
+void flush_tlb_mm(struct mm_struct *mm)
+{
+	struct vm_area_struct *mp;
+
+	if (Hash == 0) {
+		_tlbia();
+		return;
+	}
+
+	/*
+	 * It is safe to go down the mm's list of vmas when called
+	 * from dup_mmap, holding mmap_sem.  It would also be safe from
+	 * unmap_region or exit_mmap, but not from vmtruncate on SMP -
+	 * but it seems dup_mmap is the only SMP case which gets here.
+	 */
+	for (mp = mm->mmap; mp != NULL; mp = mp->vm_next)
+		flush_range(mp->vm_mm, mp->vm_start, mp->vm_end);
+	FINISH_FLUSH;
+}
+
+void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+	struct mm_struct *mm;
+	pmd_t *pmd;
+
+	if (Hash == 0) {
+		_tlbie(vmaddr);
+		return;
+	}
+	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
+	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
+	if (!pmd_none(*pmd))
+		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
+	FINISH_FLUSH;
+}
+
+/*
+ * For each address in the range, find the pte for the address
+ * and check _PAGE_HASHPTE bit; if it is set, find and destroy
+ * the corresponding HPTE.
+ */
+void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+		     unsigned long end)
+{
+	flush_range(vma->vm_mm, start, end);
+	FINISH_FLUSH;
+}
Index: linux-work/arch/powerpc/mm/tlb_hash64.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/tlb_hash64.c	2008-12-09 16:31:39.000000000 +1100
@@ -0,0 +1,211 @@
+/*
+ * This file contains the routines for flushing entries from the
+ * TLB and MMU hash table.
+ *
+ *  Derived from arch/ppc64/mm/init.c:
+ *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
+ *
+ *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
+ *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
+ *    Copyright (C) 1996 Paul Mackerras
+ *
+ *  Derived from "arch/i386/mm/init.c"
+ *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
+ *
+ *  Dave Engebretsen <engebret@us.ibm.com>
+ *      Rework for PPC64 port.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/percpu.h>
+#include <linux/hardirq.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm/tlb.h>
+#include <asm/bug.h>
+
+DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
+
+/* This is declared as we are using the more or less generic
+ * arch/powerpc/include/asm/tlb.h file -- tgall
+ */
+DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
+
+/*
+ * A linux PTE was changed and the corresponding hash table entry
+ * needs to be flushed. This function will either perform the flush
+ * immediately or will batch it up if the current CPU has an active
+ * batch on it.
+ *
+ * Must be called from within some kind of spinlock/non-preempt region...
+ */
+void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
+		     pte_t *ptep, unsigned long pte, int huge)
+{
+	struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);
+	unsigned long vsid, vaddr;
+	unsigned int psize;
+	int ssize;
+	real_pte_t rpte;
+	int i;
+
+	i = batch->index;
+
+	/* We mask the address for the base page size. Huge pages will
+	 * have applied their own masking already
+	 */
+	addr &= PAGE_MASK;
+
+	/* Get page size (maybe move back to caller).
+	 *
+	 * NOTE: when using special 64K mappings in 4K environment like
+	 * for SPEs, we obtain the page size from the slice, which thus
+	 * must still exist (and thus the VMA not reused) at the time
+	 * of this call
+	 */
+	if (huge) {
+#ifdef CONFIG_HUGETLB_PAGE
+		psize = get_slice_psize(mm, addr);
+#else
+		BUG();
+		psize = pte_pagesize_index(mm, addr, pte); /* shutup gcc */
+#endif
+	} else
+		psize = pte_pagesize_index(mm, addr, pte);
+
+	/* Build full vaddr */
+	if (!is_kernel_addr(addr)) {
+		ssize = user_segment_size(addr);
+		vsid = get_vsid(mm->context.id, addr, ssize);
+		WARN_ON(vsid == 0);
+	} else {
+		vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
+		ssize = mmu_kernel_ssize;
+	}
+	vaddr = hpt_va(addr, vsid, ssize);
+	rpte = __real_pte(__pte(pte), ptep);
+
+	/*
+	 * Check if we have an active batch on this CPU. If not, just
+	 * flush now and return. For now, we do global invalidates
+	 * in that case, might be worth testing the mm cpu mask though
+	 * and decide to use local invalidates instead...
+	 */
+	if (!batch->active) {
+		flush_hash_page(vaddr, rpte, psize, ssize, 0);
+		return;
+	}
+
+	/*
+	 * This can happen when we are in the middle of a TLB batch and
+	 * we encounter memory pressure (eg copy_page_range when it tries
+	 * to allocate a new pte). If we have to reclaim memory and end
+	 * up scanning and resetting referenced bits then our batch context
+	 * will change mid stream.
+	 *
+	 * We also need to ensure only one page size is present in a given
+	 * batch
+	 */
+	if (i != 0 && (mm != batch->mm || batch->psize != psize ||
+		       batch->ssize != ssize)) {
+		__flush_tlb_pending(batch);
+		i = 0;
+	}
+	if (i == 0) {
+		batch->mm = mm;
+		batch->psize = psize;
+		batch->ssize = ssize;
+	}
+	batch->pte[i] = rpte;
+	batch->vaddr[i] = vaddr;
+	batch->index = ++i;
+	if (i >= PPC64_TLB_BATCH_NR)
+		__flush_tlb_pending(batch);
+}
+
+/*
+ * This function is called when terminating an mmu batch or when a batch
+ * is full. It will perform the flush of all the entries currently stored
+ * in a batch.
+ *
+ * Must be called from within some kind of spinlock/non-preempt region...
+ */
+void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
+{
+	cpumask_t tmp;
+	int i, local = 0;
+
+	i = batch->index;
+	tmp = cpumask_of_cpu(smp_processor_id());
+	if (cpus_equal(batch->mm->cpu_vm_mask, tmp))
+		local = 1;
+	if (i == 1)
+		flush_hash_page(batch->vaddr[0], batch->pte[0],
+				batch->psize, batch->ssize, local);
+	else
+		flush_hash_range(i, local);
+	batch->index = 0;
+}
+
+/**
+ * __flush_hash_table_range - Flush all HPTEs for a given address range
+ *                            from the hash table (and the TLB). But keeps
+ *                            the linux PTEs intact.
+ *
+ * @mm		: mm_struct of the target address space (generally init_mm)
+ * @start	: starting address
+ * @end         : ending address (not included in the flush)
+ *
+ * This function is mostly to be used by some IO hotplug code in order
+ * to remove all hash entries from a given address range used to map IO
+ * space on a removed PCI-PCI bridge without tearing down the full mapping
+ * since 64K pages may overlap with other bridges when using 64K pages
+ * with 4K HW pages on IO space.
+ *
+ * Because of that usage pattern, it's only available with CONFIG_HOTPLUG
+ * and is implemented for small size rather than speed.
+ */
+#ifdef CONFIG_HOTPLUG
+
+void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
+			      unsigned long end)
+{
+	unsigned long flags;
+
+	start = _ALIGN_DOWN(start, PAGE_SIZE);
+	end = _ALIGN_UP(end, PAGE_SIZE);
+
+	BUG_ON(!mm->pgd);
+
+	/* Note: Normally, we should only ever use a batch within a
+	 * PTE locked section. This violates the rule, but will work
+	 * since we don't actually modify the PTEs, we just flush the
+	 * hash while leaving the PTEs intact (including their reference
+	 * to being hashed). This is not the most performance oriented
+	 * way to do things but is fine for our needs here.
+	 */
+	local_irq_save(flags);
+	arch_enter_lazy_mmu_mode();
+	for (; start < end; start += PAGE_SIZE) {
+		pte_t *ptep = find_linux_pte(mm->pgd, start);
+		unsigned long pte;
+
+		if (ptep == NULL)
+			continue;
+		pte = pte_val(*ptep);
+		if (!(pte & _PAGE_HASHPTE))
+			continue;
+		hpte_need_flush(mm, start, ptep, pte, 0);
+	}
+	arch_leave_lazy_mmu_mode();
+	local_irq_restore(flags);
+}
+
+#endif /* CONFIG_HOTPLUG */
Index: linux-work/arch/powerpc/mm/Makefile
===================================================================
--- linux-work.orig/arch/powerpc/mm/Makefile	2008-12-09 16:31:02.000000000 +1100
+++ linux-work/arch/powerpc/mm/Makefile	2008-12-09 16:31:43.000000000 +1100
@@ -16,7 +16,7 @@ obj-$(CONFIG_PPC64)		+= hash_utils_64.o 
 				   gup.o mmap.o $(hash-y)
 obj-$(CONFIG_PPC_STD_MMU_32)	+= ppc_mmu_32.o
 obj-$(CONFIG_PPC_STD_MMU)	+= hash_low_$(CONFIG_WORD_SIZE).o \
-				   tlb_$(CONFIG_WORD_SIZE).o \
+				   tlb_hash$(CONFIG_WORD_SIZE).o \
 				   mmu_context_hash$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_40x)		+= 40x_mmu.o
 obj-$(CONFIG_44x)		+= 44x_mmu.o

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 9/16] powerpc/mm: Introduce MMU features v2
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (7 preceding siblings ...)
  2008-12-15  5:44 ` [PATCH 8/16] powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c Benjamin Herrenschmidt
@ 2008-12-15  5:44 ` Benjamin Herrenschmidt
  2008-12-15  5:44 ` [PATCH 10/16] powerpc/mm: Remove flush_HPTE() Benjamin Herrenschmidt
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

We're soon running out of CPU feature bits, and I need to add some
new ones for various MMU-related bits, so this patch separates the
MMU features from the CPU features. I moved over the 32-bit
MMU-related ones and added base features for the MMU type families,
but didn't move over any 64-bit-only features yet.
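
As a rough standalone illustration of the split (plain userspace C,
not kernel code: the struct is trimmed down and the two feature
values simply mirror the definitions added below):

#include <stdio.h>

#define MMU_FTR_HPTE_TABLE	0x00000001u
#define MMU_FTR_HAS_HIGH_BATS	0x00010000u

/* A cpu_spec now carries an mmu_features word separate from
 * cpu_features, tested with its own helper */
struct cpu_spec {
	const char	*cpu_name;
	unsigned long	cpu_features;	/* kernel CPU features */
	unsigned int	mmu_features;	/* MMU features (new) */
};

static struct cpu_spec cur = {
	.cpu_name	= "example-604",
	.mmu_features	= MMU_FTR_HPTE_TABLE,
};

static int mmu_has_feature(unsigned int feature)
{
	return (cur.mmu_features & feature) != 0;
}

int main(void)
{
	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
		printf("%s: hash-table MMU\n", cur.cpu_name);
	if (!mmu_has_feature(MMU_FTR_HAS_HIGH_BATS))
		printf("%s: no high BATs\n", cur.cpu_name);
	return 0;
}

The kernel version additionally grows BEGIN_MMU_FTR_SECTION /
END_MMU_FTR_SECTION markers so that assembly can be patched at boot
from the same bits, mirroring the existing CPU feature fixups.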

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Remove a spurious character that broke the 64-bit build

 arch/powerpc/include/asm/cputable.h       |   85 +++++++++-------------
 arch/powerpc/include/asm/feature-fixups.h |   30 +++++++
 arch/powerpc/include/asm/mmu.h            |   27 +++++++
 arch/powerpc/kernel/cputable.c            |  113 ++++++++++++++++++++++++++++++
 arch/powerpc/kernel/head_32.S             |    8 +-
 arch/powerpc/kernel/head_fsl_booke.S      |    4 -
 arch/powerpc/kernel/module.c              |    6 +
 arch/powerpc/kernel/setup_32.c            |    4 +
 arch/powerpc/kernel/setup_64.c            |    2 
 arch/powerpc/kernel/swsusp_32.S           |    6 -
 arch/powerpc/kernel/vdso.c                |   10 ++
 arch/powerpc/kernel/vdso32/vdso32.lds.S   |    3 
 arch/powerpc/kernel/vdso64/vdso64.lds.S   |    3 
 arch/powerpc/kernel/vmlinux.lds.S         |    6 +
 arch/powerpc/mm/mmu_decl.h                |    2 
 arch/powerpc/mm/ppc_mmu_32.c              |    2 
 arch/powerpc/platforms/powermac/sleep.S   |    5 -
 17 files changed, 255 insertions(+), 61 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/cputable.h	2008-12-09 16:42:03.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/cputable.h	2008-12-09 17:07:31.000000000 +1100
@@ -82,6 +82,7 @@ struct cpu_spec {
 	char		*cpu_name;
 	unsigned long	cpu_features;		/* Kernel features */
 	unsigned int	cpu_user_features;	/* Userland features */
+	unsigned int	mmu_features;		/* MMU features */
 
 	/* cache line sizes */
 	unsigned int	icache_bsize;
@@ -144,17 +145,14 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_USE_TB			ASM_CONST(0x0000000000000040)
 #define CPU_FTR_L2CSR			ASM_CONST(0x0000000000000080)
 #define CPU_FTR_601			ASM_CONST(0x0000000000000100)
-#define CPU_FTR_HPTE_TABLE		ASM_CONST(0x0000000000000200)
 #define CPU_FTR_CAN_NAP			ASM_CONST(0x0000000000000400)
 #define CPU_FTR_L3CR			ASM_CONST(0x0000000000000800)
 #define CPU_FTR_L3_DISABLE_NAP		ASM_CONST(0x0000000000001000)
 #define CPU_FTR_NAP_DISABLE_L2_PR	ASM_CONST(0x0000000000002000)
 #define CPU_FTR_DUAL_PLL_750FX		ASM_CONST(0x0000000000004000)
 #define CPU_FTR_NO_DPM			ASM_CONST(0x0000000000008000)
-#define CPU_FTR_HAS_HIGH_BATS		ASM_CONST(0x0000000000010000)
 #define CPU_FTR_NEED_COHERENT		ASM_CONST(0x0000000000020000)
 #define CPU_FTR_NO_BTIC			ASM_CONST(0x0000000000040000)
-#define CPU_FTR_BIG_PHYS		ASM_CONST(0x0000000000080000)
 #define CPU_FTR_NODSISRALIGN		ASM_CONST(0x0000000000100000)
 #define CPU_FTR_PPC_LE			ASM_CONST(0x0000000000200000)
 #define CPU_FTR_REAL_LE			ASM_CONST(0x0000000000400000)
@@ -266,107 +264,99 @@ extern const char *powerpc_base_platform
 		     !defined(CONFIG_POWER3) && !defined(CONFIG_POWER4) && \
 		     !defined(CONFIG_BOOKE))
 
-#define CPU_FTRS_PPC601	(CPU_FTR_COMMON | CPU_FTR_601 | CPU_FTR_HPTE_TABLE | \
+#define CPU_FTRS_PPC601	(CPU_FTR_COMMON | CPU_FTR_601 | \
 	CPU_FTR_COHERENT_ICACHE | CPU_FTR_UNIFIED_ID_CACHE)
 #define CPU_FTRS_603	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
 #define CPU_FTRS_604	(CPU_FTR_COMMON | \
-	    CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_PPC_LE)
+	    CPU_FTR_USE_TB | CPU_FTR_PPC_LE)
 #define CPU_FTRS_740_NOTAU	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | CPU_FTR_L2CR | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
+	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
 #define CPU_FTRS_740	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | CPU_FTR_L2CR | \
-	    CPU_FTR_TAU | CPU_FTR_HPTE_TABLE | CPU_FTR_MAYBE_CAN_NAP | \
+	    CPU_FTR_TAU | CPU_FTR_MAYBE_CAN_NAP | \
 	    CPU_FTR_PPC_LE)
 #define CPU_FTRS_750	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | CPU_FTR_L2CR | \
-	    CPU_FTR_TAU | CPU_FTR_HPTE_TABLE | CPU_FTR_MAYBE_CAN_NAP | \
+	    CPU_FTR_TAU | CPU_FTR_MAYBE_CAN_NAP | \
 	    CPU_FTR_PPC_LE)
-#define CPU_FTRS_750CL	(CPU_FTRS_750 | CPU_FTR_HAS_HIGH_BATS)
+#define CPU_FTRS_750CL	(CPU_FTRS_750)
 #define CPU_FTRS_750FX1	(CPU_FTRS_750 | CPU_FTR_DUAL_PLL_750FX | CPU_FTR_NO_DPM)
 #define CPU_FTRS_750FX2	(CPU_FTRS_750 | CPU_FTR_NO_DPM)
-#define CPU_FTRS_750FX	(CPU_FTRS_750 | CPU_FTR_DUAL_PLL_750FX | \
-		CPU_FTR_HAS_HIGH_BATS)
+#define CPU_FTRS_750FX	(CPU_FTRS_750 | CPU_FTR_DUAL_PLL_750FX)
 #define CPU_FTRS_750GX	(CPU_FTRS_750FX)
 #define CPU_FTRS_7400_NOTAU	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | CPU_FTR_L2CR | \
-	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_HPTE_TABLE | \
+	    CPU_FTR_ALTIVEC_COMP | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
 #define CPU_FTRS_7400	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | CPU_FTR_L2CR | \
-	    CPU_FTR_TAU | CPU_FTR_ALTIVEC_COMP | CPU_FTR_HPTE_TABLE | \
+	    CPU_FTR_TAU | CPU_FTR_ALTIVEC_COMP | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
 #define CPU_FTRS_7450_20	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | \
 	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE | CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_7450_21	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | \
 	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_L3_DISABLE_NAP | \
 	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE | CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_7450_23	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | CPU_FTR_NEED_PAIRED_STWCX | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | \
 	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE)
 #define CPU_FTRS_7455_1	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | CPU_FTR_NEED_PAIRED_STWCX | \
 	    CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | CPU_FTR_L3CR | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | CPU_FTR_HAS_HIGH_BATS | \
-	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE)
+	    CPU_FTR_SPEC7450 | CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE)
 #define CPU_FTRS_7455_20	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | CPU_FTR_NEED_PAIRED_STWCX | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | \
 	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_L3_DISABLE_NAP | \
-	    CPU_FTR_NEED_COHERENT | CPU_FTR_HAS_HIGH_BATS | CPU_FTR_PPC_LE)
+	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE)
 #define CPU_FTRS_7455	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
-	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | CPU_FTR_NAP_DISABLE_L2_PR | \
 	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE | CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_7447_10	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
-	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | CPU_FTR_NAP_DISABLE_L2_PR | \
 	    CPU_FTR_NEED_COHERENT | CPU_FTR_NO_BTIC | CPU_FTR_PPC_LE | \
 	    CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_7447	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_L3CR | CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
-	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_L3CR | CPU_FTR_SPEC7450 | CPU_FTR_NAP_DISABLE_L2_PR | \
 	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE | CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_7447A	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
-	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_SPEC7450 | CPU_FTR_NAP_DISABLE_L2_PR | \
 	    CPU_FTR_NEED_COHERENT | CPU_FTR_PPC_LE | CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_7448	(CPU_FTR_COMMON | \
 	    CPU_FTR_USE_TB | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_L2CR | CPU_FTR_ALTIVEC_COMP | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_SPEC7450 | \
-	    CPU_FTR_NAP_DISABLE_L2_PR | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_SPEC7450 | CPU_FTR_NAP_DISABLE_L2_PR | \
 	    CPU_FTR_PPC_LE | CPU_FTR_NEED_PAIRED_STWCX)
 #define CPU_FTRS_82XX	(CPU_FTR_COMMON | \
 	    CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB)
 #define CPU_FTRS_G2_LE	(CPU_FTR_COMMON | CPU_FTR_MAYBE_CAN_DOZE | \
-	    CPU_FTR_USE_TB | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_HAS_HIGH_BATS)
+	    CPU_FTR_USE_TB | CPU_FTR_MAYBE_CAN_NAP)
 #define CPU_FTRS_E300	(CPU_FTR_MAYBE_CAN_DOZE | \
-	    CPU_FTR_USE_TB | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_USE_TB | CPU_FTR_MAYBE_CAN_NAP | \
 	    CPU_FTR_COMMON)
 #define CPU_FTRS_E300C2	(CPU_FTR_MAYBE_CAN_DOZE | \
-	    CPU_FTR_USE_TB | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_HAS_HIGH_BATS | \
+	    CPU_FTR_USE_TB | CPU_FTR_MAYBE_CAN_NAP | \
 	    CPU_FTR_COMMON | CPU_FTR_FPU_UNAVAILABLE)
-#define CPU_FTRS_CLASSIC32	(CPU_FTR_COMMON | \
-	    CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE)
+#define CPU_FTRS_CLASSIC32	(CPU_FTR_COMMON | CPU_FTR_USE_TB)
 #define CPU_FTRS_8XX	(CPU_FTR_USE_TB)
 #define CPU_FTRS_40X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_44X	(CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
@@ -379,55 +369,54 @@ extern const char *powerpc_base_platform
 	    CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN | \
 	    CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_E500_2	(CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
-	    CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_BIG_PHYS | \
+	    CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | \
 	    CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_E500MC	(CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
-	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_BIG_PHYS | CPU_FTR_NODSISRALIGN | \
+	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN | \
 	    CPU_FTR_L2CSR | CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_GENERIC_32	(CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN)
 
 /* 64-bit CPUs */
 #define CPU_FTRS_POWER3	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | CPU_FTR_PPC_LE)
+	    CPU_FTR_IABR | CPU_FTR_PPC_LE)
 #define CPU_FTRS_RS64	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | \
+	    CPU_FTR_IABR | \
 	    CPU_FTR_MMCRA | CPU_FTR_CTRL)
 #define CPU_FTRS_POWER4	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
+	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_CP_USE_DCBTZ)
 #define CPU_FTRS_PPC970	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
+	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_CAN_NAP | CPU_FTR_MMCRA | \
 	    CPU_FTR_CP_USE_DCBTZ)
 #define CPU_FTRS_POWER5	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
+	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 	    CPU_FTR_PURR)
 #define CPU_FTRS_POWER6 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
+	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
 	    CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD)
 #define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
+	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
 	    CPU_FTR_DSCR | CPU_FTR_SAO)
 #define CPU_FTRS_CELL	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
+	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_PAUSE_ZERO | CPU_FTR_CI_LARGE_PAGE | \
 	    CPU_FTR_CELL_TB_BUG | CPU_FTR_CP_USE_DCBTZ | \
 	    CPU_FTR_UNALIGNED_LD_STD)
 #define CPU_FTRS_PA6T (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | \
+	    CPU_FTR_PPCAS_ARCH_V2 | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_CI_LARGE_PAGE | \
 	    CPU_FTR_PURR | CPU_FTR_REAL_LE | CPU_FTR_NO_SLBIE_B)
-#define CPU_FTRS_COMPATIBLE	(CPU_FTR_USE_TB | \
-	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2)
+#define CPU_FTRS_COMPATIBLE	(CPU_FTR_USE_TB | CPU_FTR_PPCAS_ARCH_V2)
 
 #ifdef __powerpc64__
 #define CPU_FTRS_POSSIBLE	\
Index: linux-work/arch/powerpc/include/asm/feature-fixups.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/feature-fixups.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/feature-fixups.h	2008-12-09 16:42:03.000000000 +1100
@@ -81,6 +81,36 @@ label##5:					       	\
 #define ALT_FTR_SECTION_END_IFCLR(msk)	\
 	ALT_FTR_SECTION_END_NESTED_IFCLR(msk, 97)
 
+/* MMU feature dependent sections */
+#define BEGIN_MMU_FTR_SECTION_NESTED(label)	START_FTR_SECTION(label)
+#define BEGIN_MMU_FTR_SECTION			START_FTR_SECTION(97)
+
+#define END_MMU_FTR_SECTION_NESTED(msk, val, label) 		\
+	FTR_SECTION_ELSE_NESTED(label)				\
+	MAKE_FTR_SECTION_ENTRY(msk, val, label, __mmu_ftr_fixup)
+
+#define END_MMU_FTR_SECTION(msk, val)		\
+	END_MMU_FTR_SECTION_NESTED(msk, val, 97)
+
+#define END_MMU_FTR_SECTION_IFSET(msk)	END_MMU_FTR_SECTION((msk), (msk))
+#define END_MMU_FTR_SECTION_IFCLR(msk)	END_MMU_FTR_SECTION((msk), 0)
+
+/* MMU feature sections with alternatives, use BEGIN_FTR_SECTION to start */
+#define MMU_FTR_SECTION_ELSE_NESTED(label)	FTR_SECTION_ELSE_NESTED(label)
+#define MMU_FTR_SECTION_ELSE	MMU_FTR_SECTION_ELSE_NESTED(97)
+#define ALT_MMU_FTR_SECTION_END_NESTED(msk, val, label)	\
+	MAKE_FTR_SECTION_ENTRY(msk, val, label, __mmu_ftr_fixup)
+#define ALT_MMU_FTR_SECTION_END_NESTED_IFSET(msk, label)	\
+	ALT_MMU_FTR_SECTION_END_NESTED(msk, msk, label)
+#define ALT_MMU_FTR_SECTION_END_NESTED_IFCLR(msk, label)	\
+	ALT_MMU_FTR_SECTION_END_NESTED(msk, 0, label)
+#define ALT_MMU_FTR_SECTION_END(msk, val)	\
+	ALT_MMU_FTR_SECTION_END_NESTED(msk, val, 97)
+#define ALT_MMU_FTR_SECTION_END_IFSET(msk)	\
+	ALT_MMU_FTR_SECTION_END_NESTED_IFSET(msk, 97)
+#define ALT_MMU_FTR_SECTION_END_IFCLR(msk)	\
+	ALT_MMU_FTR_SECTION_END_NESTED_IFCLR(msk, 97)
+
 /* Firmware feature dependent sections */
 #define BEGIN_FW_FTR_SECTION_NESTED(label)	START_FTR_SECTION(label)
 #define BEGIN_FW_FTR_SECTION			START_FTR_SECTION(97)
Index: linux-work/arch/powerpc/include/asm/mmu.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/mmu.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/mmu.h	2008-12-09 17:07:34.000000000 +1100
@@ -2,6 +2,33 @@
 #define _ASM_POWERPC_MMU_H_
 #ifdef __KERNEL__
 
+#include <asm/asm-compat.h>
+#include <asm/feature-fixups.h>
+
+/*
+ * MMU features bit definitions
+ */
+#define MMU_FTR_HPTE_TABLE		ASM_CONST(0x00000001)
+#define MMU_FTR_TYPE_8xx		ASM_CONST(0x00000002)
+#define MMU_FTR_TYPE_40x		ASM_CONST(0x00000004)
+#define MMU_FTR_TYPE_44x		ASM_CONST(0x00000008)
+#define MMU_FTR_TYPE_FSL_E		ASM_CONST(0x00000010)
+#define MMU_FTR_HAS_HIGH_BATS		ASM_CONST(0x00010000)
+#define MMU_FTR_BIG_PHYS		ASM_CONST(0x00020000)
+
+#ifndef __ASSEMBLY__
+#include <asm/cputable.h>
+
+static inline int mmu_has_feature(unsigned long feature)
+{
+	return (cur_cpu_spec->mmu_features & feature);
+}
+
+extern unsigned int __start___mmu_ftr_fixup, __stop___mmu_ftr_fixup;
+
+#endif /* !__ASSEMBLY__ */
+
+
 #ifdef CONFIG_PPC64
 /* 64-bit classic hash table MMU */
 #  include <asm/mmu-hash64.h>
Index: linux-work/arch/powerpc/kernel/cputable.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/cputable.c	2008-12-09 16:42:03.000000000 +1100
+++ linux-work/arch/powerpc/kernel/cputable.c	2008-12-09 17:07:44.000000000 +1100
@@ -19,6 +19,7 @@
 #include <asm/oprofile_impl.h>
 #include <asm/cputable.h>
 #include <asm/prom.h>		/* for PTRRELOC on ARCH=ppc */
+#include <asm/mmu.h>
 
 struct cpu_spec* cur_cpu_spec = NULL;
 EXPORT_SYMBOL(cur_cpu_spec);
@@ -93,6 +94,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER3 (630)",
 		.cpu_features		= CPU_FTRS_POWER3,
 		.cpu_user_features	= COMMON_USER_PPC64|PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -108,6 +110,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER3 (630+)",
 		.cpu_features		= CPU_FTRS_POWER3,
 		.cpu_user_features	= COMMON_USER_PPC64|PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -123,6 +126,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "RS64-II (northstar)",
 		.cpu_features		= CPU_FTRS_RS64,
 		.cpu_user_features	= COMMON_USER_PPC64,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -138,6 +142,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "RS64-III (pulsar)",
 		.cpu_features		= CPU_FTRS_RS64,
 		.cpu_user_features	= COMMON_USER_PPC64,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -153,6 +158,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "RS64-III (icestar)",
 		.cpu_features		= CPU_FTRS_RS64,
 		.cpu_user_features	= COMMON_USER_PPC64,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -168,6 +174,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "RS64-IV (sstar)",
 		.cpu_features		= CPU_FTRS_RS64,
 		.cpu_user_features	= COMMON_USER_PPC64,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -183,6 +190,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER4 (gp)",
 		.cpu_features		= CPU_FTRS_POWER4,
 		.cpu_user_features	= COMMON_USER_POWER4,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -198,6 +206,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER4+ (gq)",
 		.cpu_features		= CPU_FTRS_POWER4,
 		.cpu_user_features	= COMMON_USER_POWER4,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -214,6 +223,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_PPC970,
 		.cpu_user_features	= COMMON_USER_POWER4 |
 			PPC_FEATURE_HAS_ALTIVEC_COMP,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -232,6 +242,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_PPC970,
 		.cpu_user_features	= COMMON_USER_POWER4 |
 			PPC_FEATURE_HAS_ALTIVEC_COMP,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -250,6 +261,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_PPC970,
 		.cpu_user_features	= COMMON_USER_POWER4 |
 			PPC_FEATURE_HAS_ALTIVEC_COMP,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -268,6 +280,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_PPC970,
 		.cpu_user_features	= COMMON_USER_POWER4 |
 			PPC_FEATURE_HAS_ALTIVEC_COMP,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -286,6 +299,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_PPC970,
 		.cpu_user_features	= COMMON_USER_POWER4 |
 			PPC_FEATURE_HAS_ALTIVEC_COMP,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 8,
@@ -302,6 +316,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER5 (gr)",
 		.cpu_features		= CPU_FTRS_POWER5,
 		.cpu_user_features	= COMMON_USER_POWER5,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 6,
@@ -322,6 +337,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER5+ (gs)",
 		.cpu_features		= CPU_FTRS_POWER5,
 		.cpu_user_features	= COMMON_USER_POWER5_PLUS,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 6,
@@ -338,6 +354,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER5+ (gs)",
 		.cpu_features		= CPU_FTRS_POWER5,
 		.cpu_user_features	= COMMON_USER_POWER5_PLUS,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 6,
@@ -355,6 +372,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER5+",
 		.cpu_features		= CPU_FTRS_POWER5,
 		.cpu_user_features	= COMMON_USER_POWER5_PLUS,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.machine_check		= machine_check_generic,
@@ -368,6 +386,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_POWER6,
 		.cpu_user_features	= COMMON_USER_POWER6 |
 			PPC_FEATURE_POWER6_EXT,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 6,
@@ -387,6 +406,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER6 (architected)",
 		.cpu_features		= CPU_FTRS_POWER6,
 		.cpu_user_features	= COMMON_USER_POWER6,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.machine_check		= machine_check_generic,
@@ -399,6 +419,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER7 (architected)",
 		.cpu_features		= CPU_FTRS_POWER7,
 		.cpu_user_features	= COMMON_USER_POWER7,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.machine_check		= machine_check_generic,
@@ -411,6 +432,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER7 (raw)",
 		.cpu_features		= CPU_FTRS_POWER7,
 		.cpu_user_features	= COMMON_USER_POWER7,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 6,
@@ -433,6 +455,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_user_features	= COMMON_USER_PPC64 |
 			PPC_FEATURE_CELL | PPC_FEATURE_HAS_ALTIVEC_COMP |
 			PPC_FEATURE_SMT,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 4,
@@ -448,6 +471,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "PA6T",
 		.cpu_features		= CPU_FTRS_PA6T,
 		.cpu_user_features	= COMMON_USER_PA6T,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 64,
 		.dcache_bsize		= 64,
 		.num_pmcs		= 6,
@@ -465,6 +489,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "POWER4 (compatible)",
 		.cpu_features		= CPU_FTRS_COMPATIBLE,
 		.cpu_user_features	= COMMON_USER_PPC64,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
 		.num_pmcs		= 6,
@@ -482,6 +507,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_PPC601,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_601_INSTR |
 			PPC_FEATURE_UNIFIED_CACHE | PPC_FEATURE_NO_TB,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_generic,
@@ -493,6 +519,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "603",
 		.cpu_features		= CPU_FTRS_603,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= 0,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -505,6 +532,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "603e",
 		.cpu_features		= CPU_FTRS_603,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= 0,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -517,6 +545,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "603ev",
 		.cpu_features		= CPU_FTRS_603,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= 0,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -529,6 +558,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "604",
 		.cpu_features		= CPU_FTRS_604,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 2,
@@ -542,6 +572,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "604e",
 		.cpu_features		= CPU_FTRS_604,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -555,6 +586,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "604r",
 		.cpu_features		= CPU_FTRS_604,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -568,6 +600,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "604ev",
 		.cpu_features		= CPU_FTRS_604,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -581,6 +614,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "740/750",
 		.cpu_features		= CPU_FTRS_740_NOTAU,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -594,6 +628,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750CX",
 		.cpu_features		= CPU_FTRS_750,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -607,6 +642,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750CX",
 		.cpu_features		= CPU_FTRS_750,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -621,6 +657,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750CXe",
 		.cpu_features		= CPU_FTRS_750,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -635,6 +672,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750CXe",
 		.cpu_features		= CPU_FTRS_750,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -649,6 +687,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750CL",
 		.cpu_features		= CPU_FTRS_750CL,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -663,6 +702,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "745/755",
 		.cpu_features		= CPU_FTRS_750,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -677,6 +717,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750FX",
 		.cpu_features		= CPU_FTRS_750FX1,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -691,6 +732,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750FX",
 		.cpu_features		= CPU_FTRS_750FX2,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -705,6 +747,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750FX",
 		.cpu_features		= CPU_FTRS_750FX,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -719,6 +762,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "750GX",
 		.cpu_features		= CPU_FTRS_750GX,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -733,6 +777,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "740/750",
 		.cpu_features		= CPU_FTRS_740,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -748,6 +793,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7400_NOTAU,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -763,6 +809,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7400,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -778,6 +825,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7400,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -793,6 +841,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7450_20,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -810,6 +859,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7450_21,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -827,6 +877,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7450_23,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -844,6 +895,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7455_1,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -861,6 +913,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7455_20,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -878,6 +931,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7455,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -895,6 +949,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7447_10,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -912,6 +967,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7447_10,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -928,6 +984,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "7447/7457",
 		.cpu_features		= CPU_FTRS_7447,
 		.cpu_user_features	= COMMON_USER | PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -945,6 +1002,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7447A,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -962,6 +1020,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_7448,
 		.cpu_user_features	= COMMON_USER |
 			PPC_FEATURE_HAS_ALTIVEC_COMP | PPC_FEATURE_PPC_LE,
+		.mmu_features		= MMU_FTR_HPTE_TABLE | MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 6,
@@ -978,6 +1037,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "82xx",
 		.cpu_features		= CPU_FTRS_82XX,
 		.cpu_user_features	= COMMON_USER,
+		.mmu_features		= 0,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -990,6 +1050,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "G2_LE",
 		.cpu_features		= CPU_FTRS_G2_LE,
 		.cpu_user_features	= COMMON_USER,
+		.mmu_features		= MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -1002,6 +1063,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "e300c1",
 		.cpu_features		= CPU_FTRS_E300,
 		.cpu_user_features	= COMMON_USER,
+		.mmu_features		= MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -1014,6 +1076,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "e300c2",
 		.cpu_features		= CPU_FTRS_E300C2,
 		.cpu_user_features	= PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU,
+		.mmu_features		= MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -1026,6 +1089,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "e300c3",
 		.cpu_features		= CPU_FTRS_E300,
 		.cpu_user_features	= COMMON_USER,
+		.mmu_features		= MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -1040,6 +1104,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "e300c4",
 		.cpu_features		= CPU_FTRS_E300,
 		.cpu_user_features	= COMMON_USER,
+		.mmu_features		= MMU_FTR_HAS_HIGH_BATS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_603,
@@ -1055,6 +1120,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "(generic PPC)",
 		.cpu_features		= CPU_FTRS_CLASSIC32,
 		.cpu_user_features	= COMMON_USER,
+		.mmu_features		= MMU_FTR_HPTE_TABLE,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_generic,
@@ -1070,6 +1136,7 @@ static struct cpu_spec __initdata cpu_sp
 		 * if the 8xx code is there.... */
 		.cpu_features		= CPU_FTRS_8XX,
 		.cpu_user_features	= PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU,
+		.mmu_features		= MMU_FTR_TYPE_8xx,
 		.icache_bsize		= 16,
 		.dcache_bsize		= 16,
 		.platform		= "ppc823",
@@ -1082,6 +1149,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "403GC",
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 16,
 		.dcache_bsize		= 16,
 		.machine_check		= machine_check_4xx,
@@ -1094,6 +1162,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 		 	PPC_FEATURE_HAS_MMU | PPC_FEATURE_NO_TB,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 16,
 		.dcache_bsize		= 16,
 		.machine_check		= machine_check_4xx,
@@ -1105,6 +1174,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "403G ??",
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 16,
 		.dcache_bsize		= 16,
 		.machine_check		= machine_check_4xx,
@@ -1117,6 +1187,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1129,6 +1200,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1141,6 +1213,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1153,6 +1226,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1165,6 +1239,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1177,6 +1252,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1189,6 +1265,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1201,6 +1278,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1212,6 +1290,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "405LP",
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1224,6 +1303,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1236,6 +1316,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1248,6 +1329,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1260,6 +1342,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1272,6 +1355,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1285,6 +1369,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1297,6 +1382,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_features		= CPU_FTRS_40X,
 		.cpu_user_features	= PPC_FEATURE_32 |
 			PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+		.mmu_features		= MMU_FTR_TYPE_40x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1311,6 +1397,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GR Rev. A",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1322,6 +1409,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440EP Rev. A",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440ep,
@@ -1334,6 +1422,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GR Rev. B",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1345,6 +1434,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440EP Rev. C",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440ep,
@@ -1357,6 +1447,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440EP Rev. B",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440ep,
@@ -1369,6 +1460,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GRX",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440grx,
@@ -1381,6 +1473,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440EPX",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440epx,
@@ -1393,6 +1486,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GP Rev. B",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1404,6 +1498,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GP Rev. C",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1415,6 +1510,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GX Rev. A",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440gx,
@@ -1427,6 +1523,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GX Rev. B",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440gx,
@@ -1439,6 +1536,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GX Rev. C",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440gx,
@@ -1451,6 +1549,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440GX Rev. F",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440gx,
@@ -1463,6 +1562,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440SP Rev. A",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1474,6 +1574,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name               = "440SPe Rev. A",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features      = COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize           = 32,
 		.dcache_bsize           = 32,
 		.cpu_setup		= __setup_cpu_440spe,
@@ -1486,6 +1587,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440SPe Rev. B",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_440spe,
@@ -1498,6 +1600,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "440 in Virtex-5 FXT",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.platform		= "ppc440",
@@ -1508,6 +1611,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "460EX",
 		.cpu_features		= CPU_FTRS_440x6,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_460ex,
@@ -1520,6 +1624,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "460GT",
 		.cpu_features		= CPU_FTRS_440x6,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.cpu_setup		= __setup_cpu_460gt,
@@ -1532,6 +1637,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "(generic 44x PPC)",
 		.cpu_features		= CPU_FTRS_44X,
 		.cpu_user_features	= COMMON_USER_BOOKE,
+		.mmu_features		= MMU_FTR_TYPE_44x,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_4xx,
@@ -1548,6 +1654,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_user_features	= COMMON_USER_BOOKE |
 			PPC_FEATURE_HAS_EFP_SINGLE |
 			PPC_FEATURE_UNIFIED_CACHE,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_e200,
 		.platform		= "ppc5554",
@@ -1562,6 +1669,7 @@ static struct cpu_spec __initdata cpu_sp
 			PPC_FEATURE_HAS_SPE_COMP |
 			PPC_FEATURE_HAS_EFP_SINGLE_COMP |
 			PPC_FEATURE_UNIFIED_CACHE,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_e200,
 		.platform		= "ppc5554",
@@ -1574,6 +1682,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_user_features	= COMMON_USER_BOOKE |
 			PPC_FEATURE_HAS_EFP_SINGLE |
 			PPC_FEATURE_UNIFIED_CACHE,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_e200,
 		.platform		= "ppc5554",
@@ -1588,6 +1697,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_user_features	= COMMON_USER_BOOKE |
 			PPC_FEATURE_HAS_SPE_COMP |
 			PPC_FEATURE_HAS_EFP_SINGLE_COMP,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -1605,6 +1715,7 @@ static struct cpu_spec __initdata cpu_sp
 			PPC_FEATURE_HAS_SPE_COMP |
 			PPC_FEATURE_HAS_EFP_SINGLE_COMP |
 			PPC_FEATURE_HAS_EFP_DOUBLE_COMP,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.num_pmcs		= 4,
@@ -1619,6 +1730,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_name		= "e500mc",
 		.cpu_features		= CPU_FTRS_E500MC,
 		.cpu_user_features	= COMMON_USER_BOOKE | PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS,
 		.icache_bsize		= 64,
 		.dcache_bsize		= 64,
 		.num_pmcs		= 4,
@@ -1635,6 +1747,7 @@ static struct cpu_spec __initdata cpu_sp
 		.cpu_user_features	= COMMON_USER_BOOKE |
 			PPC_FEATURE_HAS_SPE_COMP |
 			PPC_FEATURE_HAS_EFP_SINGLE_COMP,
+		.mmu_features		= MMU_FTR_TYPE_FSL_E,
 		.icache_bsize		= 32,
 		.dcache_bsize		= 32,
 		.machine_check		= machine_check_e500,
Index: linux-work/arch/powerpc/kernel/module.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/module.c	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/kernel/module.c	2008-12-09 16:42:03.000000000 +1100
@@ -78,6 +78,12 @@ int module_finalize(const Elf_Ehdr *hdr,
 				  (void *)sect->sh_addr,
 				  (void *)sect->sh_addr + sect->sh_size);
 
+	sect = find_section(hdr, sechdrs, "__mmu_ftr_fixup");
+	if (sect != NULL)
+		do_feature_fixups(cur_cpu_spec->mmu_features,
+				  (void *)sect->sh_addr,
+				  (void *)sect->sh_addr + sect->sh_size);
+
 #ifdef CONFIG_PPC64
 	sect = find_section(hdr, sechdrs, "__fw_ftr_fixup");
 	if (sect != NULL)
Index: linux-work/arch/powerpc/kernel/setup_32.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/setup_32.c	2008-11-24 14:48:55.000000000 +1100
+++ linux-work/arch/powerpc/kernel/setup_32.c	2008-12-09 17:07:34.000000000 +1100
@@ -97,6 +97,10 @@ notrace unsigned long __init early_init(
 			  PTRRELOC(&__start___ftr_fixup),
 			  PTRRELOC(&__stop___ftr_fixup));
 
+	do_feature_fixups(spec->mmu_features,
+			  PTRRELOC(&__start___mmu_ftr_fixup),
+			  PTRRELOC(&__stop___mmu_ftr_fixup));
+
 	do_lwsync_fixups(spec->cpu_features,
 			 PTRRELOC(&__start___lwsync_fixup),
 			 PTRRELOC(&__stop___lwsync_fixup));
Index: linux-work/arch/powerpc/kernel/setup_64.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/setup_64.c	2008-12-08 15:40:33.000000000 +1100
+++ linux-work/arch/powerpc/kernel/setup_64.c	2008-12-09 16:42:03.000000000 +1100
@@ -362,6 +362,8 @@ void __init setup_system(void)
 	 */
 	do_feature_fixups(cur_cpu_spec->cpu_features,
 			  &__start___ftr_fixup, &__stop___ftr_fixup);
+	do_feature_fixups(cur_cpu_spec->mmu_features,
+			  &__start___mmu_ftr_fixup, &__stop___mmu_ftr_fixup);
 	do_feature_fixups(powerpc_firmware_features,
 			  &__start___fw_ftr_fixup, &__stop___fw_ftr_fixup);
 	do_lwsync_fixups(cur_cpu_spec->cpu_features,
Index: linux-work/arch/powerpc/kernel/vdso.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/vdso.c	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/kernel/vdso.c	2008-12-09 16:42:03.000000000 +1100
@@ -567,6 +567,11 @@ static __init int vdso_fixup_features(st
 		do_feature_fixups(cur_cpu_spec->cpu_features,
 				  start64, start64 + size64);
 
+	start64 = find_section64(v64->hdr, "__mmu_ftr_fixup", &size64);
+	if (start64)
+		do_feature_fixups(cur_cpu_spec->mmu_features,
+				  start64, start64 + size64);
+
 	start64 = find_section64(v64->hdr, "__fw_ftr_fixup", &size64);
 	if (start64)
 		do_feature_fixups(powerpc_firmware_features,
@@ -583,6 +588,11 @@ static __init int vdso_fixup_features(st
 		do_feature_fixups(cur_cpu_spec->cpu_features,
 				  start32, start32 + size32);
 
+	start32 = find_section32(v32->hdr, "__mmu_ftr_fixup", &size32);
+	if (start32)
+		do_feature_fixups(cur_cpu_spec->mmu_features,
+				  start32, start32 + size32);
+
 #ifdef CONFIG_PPC64
 	start32 = find_section32(v32->hdr, "__fw_ftr_fixup", &size32);
 	if (start32)
Index: linux-work/arch/powerpc/kernel/vdso32/vdso32.lds.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/vdso32/vdso32.lds.S	2008-09-29 10:17:03.000000000 +1000
+++ linux-work/arch/powerpc/kernel/vdso32/vdso32.lds.S	2008-12-09 16:42:03.000000000 +1100
@@ -34,6 +34,9 @@ SECTIONS
 	__ftr_fixup	: { *(__ftr_fixup) }
 
 	. = ALIGN(8);
+	__mmu_ftr_fixup	: { *(__mmu_ftr_fixup) }
+
+	. = ALIGN(8);
 	__lwsync_fixup	: { *(__lwsync_fixup) }
 
 #ifdef CONFIG_PPC64
Index: linux-work/arch/powerpc/kernel/vdso64/vdso64.lds.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/vdso64/vdso64.lds.S	2008-09-29 10:17:03.000000000 +1000
+++ linux-work/arch/powerpc/kernel/vdso64/vdso64.lds.S	2008-12-09 16:42:03.000000000 +1100
@@ -35,6 +35,9 @@ SECTIONS
 	__ftr_fixup	: { *(__ftr_fixup) }
 
 	. = ALIGN(8);
+	__mmu_ftr_fixup	: { *(__mmu_ftr_fixup) }
+
+	. = ALIGN(8);
 	__lwsync_fixup	: { *(__lwsync_fixup) }
 
 	. = ALIGN(8);
Index: linux-work/arch/powerpc/kernel/vmlinux.lds.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/vmlinux.lds.S	2008-11-24 16:35:11.000000000 +1100
+++ linux-work/arch/powerpc/kernel/vmlinux.lds.S	2008-12-09 16:42:03.000000000 +1100
@@ -152,6 +152,12 @@ SECTIONS
 		__stop___ftr_fixup = .;
 	}
 	. = ALIGN(8);
+	__mmu_ftr_fixup : AT(ADDR(__mmu_ftr_fixup) - LOAD_OFFSET) {
+		__start___mmu_ftr_fixup = .;
+		*(__mmu_ftr_fixup)
+		__stop___mmu_ftr_fixup = .;
+	}
+	. = ALIGN(8);
 	__lwsync_fixup : AT(ADDR(__lwsync_fixup) - LOAD_OFFSET) {
 		__start___lwsync_fixup = .;
 		*(__lwsync_fixup)
Index: linux-work/arch/powerpc/mm/mmu_decl.h
===================================================================
--- linux-work.orig/arch/powerpc/mm/mmu_decl.h	2008-09-29 10:17:03.000000000 +1000
+++ linux-work/arch/powerpc/mm/mmu_decl.h	2008-12-09 16:42:03.000000000 +1100
@@ -86,7 +86,7 @@ static inline void flush_HPTE(unsigned c
 			      unsigned long pdval)
 {
 	if ((Hash != 0) &&
-	    cpu_has_feature(CPU_FTR_HPTE_TABLE))
+	    mmu_has_feature(MMU_FTR_HPTE_TABLE))
 		flush_hash_pages(0, va, pdval, 1);
 	else
 		_tlbie(va);
Index: linux-work/arch/powerpc/mm/ppc_mmu_32.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/ppc_mmu_32.c	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/mm/ppc_mmu_32.c	2008-12-09 16:42:03.000000000 +1100
@@ -192,7 +192,7 @@ void __init MMU_init_hw(void)
 	extern unsigned int hash_page[];
 	extern unsigned int flush_hash_patch_A[], flush_hash_patch_B[];
 
-	if (!cpu_has_feature(CPU_FTR_HPTE_TABLE)) {
+	if (!mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
 		/*
 		 * Put a blr (procedure return) instruction at the
 		 * start of hash_page, since we can still get DSI
Index: linux-work/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/head_32.S	2008-12-09 16:42:03.000000000 +1100
+++ linux-work/arch/powerpc/kernel/head_32.S	2008-12-09 16:42:03.000000000 +1100
@@ -990,12 +990,12 @@ load_up_mmu:
 	LOAD_BAT(1,r3,r4,r5)
 	LOAD_BAT(2,r3,r4,r5)
 	LOAD_BAT(3,r3,r4,r5)
-BEGIN_FTR_SECTION
+BEGIN_MMU_FTR_SECTION
 	LOAD_BAT(4,r3,r4,r5)
 	LOAD_BAT(5,r3,r4,r5)
 	LOAD_BAT(6,r3,r4,r5)
 	LOAD_BAT(7,r3,r4,r5)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HAS_HIGH_BATS)
 	blr
 
 /*
@@ -1141,7 +1141,7 @@ clear_bats:
 	mtspr	SPRN_IBAT2L,r10
 	mtspr	SPRN_IBAT3U,r10
 	mtspr	SPRN_IBAT3L,r10
-BEGIN_FTR_SECTION
+BEGIN_MMU_FTR_SECTION
 	/* Here's a tweak: at this point, CPU setup have
 	 * not been called yet, so HIGH_BAT_EN may not be
 	 * set in HID0 for the 745x processors. However, it
@@ -1164,7 +1164,7 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_IBAT6L,r10
 	mtspr	SPRN_IBAT7U,r10
 	mtspr	SPRN_IBAT7L,r10
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HAS_HIGH_BATS)
 	blr
 
 flush_tlbs:
Index: linux-work/arch/powerpc/kernel/swsusp_32.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/swsusp_32.S	2008-07-07 13:45:04.000000000 +1000
+++ linux-work/arch/powerpc/kernel/swsusp_32.S	2008-12-09 16:42:03.000000000 +1100
@@ -5,7 +5,7 @@
 #include <asm/thread_info.h>
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
-
+#include <asm/mmu.h>
 
 /*
  * Structure for storing CPU registers on the save area.
@@ -279,7 +279,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	mtibatl	3,r4
 #endif
 
-BEGIN_FTR_SECTION
+BEGIN_MMU_FTR_SECTION
 	li	r4,0
 	mtspr	SPRN_DBAT4U,r4
 	mtspr	SPRN_DBAT4L,r4
@@ -297,7 +297,7 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_IBAT6L,r4
 	mtspr	SPRN_IBAT7U,r4
 	mtspr	SPRN_IBAT7L,r4
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HAS_HIGH_BATS)
 
 	/* Flush all TLBs */
 	lis	r4,0x1000
Index: linux-work/arch/powerpc/platforms/powermac/sleep.S
===================================================================
--- linux-work.orig/arch/powerpc/platforms/powermac/sleep.S	2008-07-07 13:45:04.000000000 +1000
+++ linux-work/arch/powerpc/platforms/powermac/sleep.S	2008-12-09 16:42:03.000000000 +1100
@@ -17,6 +17,7 @@
 #include <asm/cache.h>
 #include <asm/thread_info.h>
 #include <asm/asm-offsets.h>
+#include <asm/mmu.h>
 
 #define MAGIC	0x4c617273	/* 'Lars' */
 
@@ -323,7 +324,7 @@ grackle_wake_up:
 	lwz	r4,SL_IBAT3+4(r1)
 	mtibatl	3,r4
 
-BEGIN_FTR_SECTION
+BEGIN_MMU_FTR_SECTION
 	li	r4,0
 	mtspr	SPRN_DBAT4U,r4
 	mtspr	SPRN_DBAT4L,r4
@@ -341,7 +342,7 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_IBAT6L,r4
 	mtspr	SPRN_IBAT7U,r4
 	mtspr	SPRN_IBAT7L,r4
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HAS_HIGH_BATS)
 
 	/* Flush all TLBs */
 	lis	r4,0x1000
Index: linux-work/arch/powerpc/kernel/head_fsl_booke.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/head_fsl_booke.S	2008-12-08 15:40:33.000000000 +1100
+++ linux-work/arch/powerpc/kernel/head_fsl_booke.S	2008-12-09 16:42:03.000000000 +1100
@@ -767,10 +767,10 @@ finish_tlb_load:
 	rlwimi	r12, r13, 24, 0, 7	/* grab RPN[32:39] */
 	rlwimi	r12, r11, 24, 8, 19	/* grab RPN[40:51] */
 	mtspr	SPRN_MAS3, r12
-BEGIN_FTR_SECTION
+BEGIN_MMU_FTR_SECTION
 	srwi	r10, r13, 8		/* grab RPN[8:31] */
 	mtspr	SPRN_MAS7, r10
-END_FTR_SECTION_IFSET(CPU_FTR_BIG_PHYS)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_BIG_PHYS)
 #else
 	rlwimi	r11, r12, 0, 20, 31	/* Extract RPN from PTE and merge with perms */
 	mtspr	SPRN_MAS3, r11
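
As a quick illustration of how the two halves of the new interface
are consumed (a sketch for illustration, not part of the patch): C
code tests the bits at run time via mmu_has_feature(), while assembly
uses the BEGIN_MMU_FTR_SECTION/END_MMU_FTR_SECTION_IFSET pair, as in
the head_32.S hunk above, so that unneeded instructions are patched
out at boot. The helper name below is made up for the example:

	#include <asm/mmu.h>

	/* Hypothetical helper: key a slow path off the MMU type bits
	 * instead of the old CPU_FTR_* word. */
	static inline int example_have_hash_table(void)
	{
		/* set on classic hash MMUs, clear on 40x/44x/FSL BookE */
		return mmu_has_feature(MMU_FTR_HPTE_TABLE);
	}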

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 10/16] powerpc/mm: Remove flush_HPTE()
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (8 preceding siblings ...)
  2008-12-15  5:44 ` [PATCH 9/16] powerpc/mm: Introduce MMU features v2 Benjamin Herrenschmidt
@ 2008-12-15  5:44 ` Benjamin Herrenschmidt
  2008-12-15  5:44 ` [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3 Benjamin Herrenschmidt
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

The function flush_HPTE() is used in only one place, the implementation
of DEBUG_PAGEALLOC on ppc32.

It's actually a dup of flush_tlb_page() though it's -slightly- more
efficient on hash based processors. We remove it and replace it with
a direct call to the hash flush code on those processors and to
flush_tlb_page() for everybody else.
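
Concretely, the only caller, __change_page_attr() in pgtable_32.c,
ends up with the following (this mirrors the hunk below):

	#ifdef CONFIG_PPC_STD_MMU
		flush_hash_pages(0, address, pmd_val(*kpmd), 1);
	#else
		flush_tlb_page(NULL, address);
	#endif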

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/mm/mmu_decl.h   |   17 -----------------
 arch/powerpc/mm/pgtable_32.c |    6 +++++-
 2 files changed, 5 insertions(+), 18 deletions(-)

--- linux-work.orig/arch/powerpc/mm/mmu_decl.h	2008-12-10 17:01:18.000000000 +1100
+++ linux-work/arch/powerpc/mm/mmu_decl.h	2008-12-10 17:01:35.000000000 +1100
@@ -58,17 +58,14 @@ extern phys_addr_t lowmem_end_addr;
  * architectures.  -- Dan
  */
 #if defined(CONFIG_8xx)
-#define flush_HPTE(X, va, pg)	_tlbie(va, 0 /* 8xx doesn't care about PID */)
 #define MMU_init_hw()		do { } while(0)
 #define mmu_mapin_ram()		(0UL)
 
 #elif defined(CONFIG_4xx)
-#define flush_HPTE(pid, va, pg)	_tlbie(va, pid)
 extern void MMU_init_hw(void);
 extern unsigned long mmu_mapin_ram(void);
 
 #elif defined(CONFIG_FSL_BOOKE)
-#define flush_HPTE(pid, va, pg)	_tlbie(va, pid)
 extern void MMU_init_hw(void);
 extern unsigned long mmu_mapin_ram(void);
 extern void adjust_total_lowmem(void);
@@ -77,18 +74,4 @@ extern void adjust_total_lowmem(void);
 /* anything 32-bit except 4xx or 8xx */
 extern void MMU_init_hw(void);
 extern unsigned long mmu_mapin_ram(void);
-
-/* Be careful....this needs to be updated if we ever encounter 603 SMPs,
- * which includes all new 82xx processors.  We need tlbie/tlbsync here
- * in that case (I think). -- Dan.
- */
-static inline void flush_HPTE(unsigned context, unsigned long va,
-			      unsigned long pdval)
-{
-	if ((Hash != 0) &&
-	    mmu_has_feature(MMU_FTR_HPTE_TABLE))
-		flush_hash_pages(0, va, pdval, 1);
-	else
-		_tlbie(va);
-}
 #endif
Index: linux-work/arch/powerpc/mm/pgtable_32.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/pgtable_32.c	2008-12-10 17:01:49.000000000 +1100
+++ linux-work/arch/powerpc/mm/pgtable_32.c	2008-12-10 17:04:36.000000000 +1100
@@ -342,7 +342,11 @@ static int __change_page_attr(struct pag
 		return -EINVAL;
 	set_pte_at(&init_mm, address, kpte, mk_pte(page, prot));
 	wmb();
-	flush_HPTE(0, address, pmd_val(*kpmd));
+#ifdef CONFIG_PPC_STD_MMU
+	flush_hash_pages(0, address, pmd_val(*kpmd), 1);
+#else
+	flush_tlb_page(NULL, address);
+#endif
 	pte_unmap(kpte);
 
 	return 0;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (9 preceding siblings ...)
  2008-12-15  5:44 ` [PATCH 10/16] powerpc/mm: Remove flush_HPTE() Benjamin Herrenschmidt
@ 2008-12-15  5:44 ` Benjamin Herrenschmidt
  2008-12-15 20:19   ` Kumar Gala
  2008-12-15  5:45 ` [PATCH 12/16] powerpc/mm: Split low level tlb invalidate for nohash processors Benjamin Herrenschmidt
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This patch moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.

Note that I'm using local invalidations for D->I cache coherency.

At worst, if another processor is trying to execute the same page and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).
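
For the SMP page flush, the choice of mechanism boils down to the
following (sketched from the flush_tlb_page() implementation below;
the tlbivax serializing lock and the no-context early exit are
omitted):

	if (mmu_has_feature(MMU_FTR_HAS_TLBIVAX_BCAST)) {
		/* HW broadcast reaches all CPUs, including this one */
		_tlbivax_bcast(vmaddr, pid);
	} else {
		/* IPI the other CPUs that have seen this mm ... */
		smp_call_function_mask(cpu_mask, do_flush_tlb_page_ipi,
				       &p, 1);
		/* ... and invalidate locally */
		_tlbil_va(vmaddr, pid);
	}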

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2. This variant fixes the code to use linux/spinlock.h instead of asm/spinlock.h
v3. Fix inadvertently un-EXPORT_SYMBOL'ed cache flush calls on ppc64
v4. Fix differences in local_* flush variants between CPU types and
    corresponding clash with highmem code. Remove remaining _tlbie calls
    from nohash code.

 arch/powerpc/include/asm/highmem.h  |    4 
 arch/powerpc/include/asm/mmu.h      |    3 
 arch/powerpc/include/asm/tlbflush.h |   84 ++++++--------
 arch/powerpc/kernel/misc_32.S       |    9 +
 arch/powerpc/kernel/ppc_ksyms.c     |    6 -
 arch/powerpc/mm/Makefile            |    2 
 arch/powerpc/mm/fault.c             |    2 
 arch/powerpc/mm/mem.c               |    2 
 arch/powerpc/mm/tlb_hash32.c        |    4 
 arch/powerpc/mm/tlb_nohash.c        |  209 ++++++++++++++++++++++++++++++++++++
 10 files changed, 268 insertions(+), 57 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/tlbflush.h	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/tlbflush.h	2008-12-15 14:36:38.000000000 +1100
@@ -6,7 +6,9 @@
  *
  *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
  *  - flush_tlb_page(vma, vmaddr) flushes one page
- *  - local_flush_tlb_page(vmaddr) flushes one page on the local processor
+ *  - local_flush_tlb_mm(mm) flushes the specified mm context on
+ *                           the local processor
+ *  - local_flush_tlb_page(vma, vmaddr) flushes one page on the local processor
  *  - flush_tlb_page_nohash(vma, vmaddr) flushes one page if SW loaded TLB
  *  - flush_tlb_range(vma, start, end) flushes a range of pages
  *  - flush_tlb_kernel_range(start, end) flushes a range of kernel pages
@@ -18,7 +20,7 @@
  */
 #ifdef __KERNEL__
 
-#if defined(CONFIG_4xx) || defined(CONFIG_8xx) || defined(CONFIG_FSL_BOOKE)
+#ifdef CONFIG_PPC_MMU_NOHASH
 /*
  * TLB flushing for software loaded TLB chips
  *
@@ -31,10 +33,10 @@
 
 #define MMU_NO_CONTEXT      	((unsigned int)-1)
 
-extern void _tlbie(unsigned long address, unsigned int pid);
 extern void _tlbil_all(void);
 extern void _tlbil_pid(unsigned int pid);
 extern void _tlbil_va(unsigned long address, unsigned int pid);
+extern void _tlbivax_bcast(unsigned long address, unsigned int pid);
 
 #if defined(CONFIG_40x) || defined(CONFIG_8xx)
 #define _tlbia()	asm volatile ("tlbia; sync" : : : "memory")
@@ -42,48 +44,26 @@ extern void _tlbil_va(unsigned long addr
 extern void _tlbia(void);
 #endif
 
-static inline void local_flush_tlb_mm(struct mm_struct *mm)
-{
-	_tlbil_pid(mm->context.id);
-}
-
-static inline void flush_tlb_mm(struct mm_struct *mm)
-{
-	_tlbil_pid(mm->context.id);
-}
-
-static inline void local_flush_tlb_page(unsigned long vmaddr)
-{
-	_tlbil_va(vmaddr, 0);
-}
-
-static inline void flush_tlb_page(struct vm_area_struct *vma,
-				  unsigned long vmaddr)
-{
-	_tlbil_va(vmaddr, vma ? vma->vm_mm->context.id : 0);
-}
+extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long end);
+extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
-static inline void flush_tlb_page_nohash(struct vm_area_struct *vma,
-					 unsigned long vmaddr)
-{
-	flush_tlb_page(vma, vmaddr);
-}
+extern void local_flush_tlb_mm(struct mm_struct *mm);
+extern void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 
-static inline void flush_tlb_range(struct vm_area_struct *vma,
-				   unsigned long start, unsigned long end)
-{
-	_tlbil_pid(vma->vm_mm->context.id);
-}
+#ifdef CONFIG_SMP
+extern void flush_tlb_mm(struct mm_struct *mm);
+extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
+#else
+#define flush_tlb_mm(mm)		local_flush_tlb_mm(mm)
+#define flush_tlb_page(vma,addr)	local_flush_tlb_page(vma,addr)
+#endif
+#define flush_tlb_page_nohash(vma,addr)	flush_tlb_page(vma,addr)
 
-static inline void flush_tlb_kernel_range(unsigned long start,
-					  unsigned long end)
-{
-	_tlbil_pid(0);
-}
+#elif defined(CONFIG_PPC_STD_MMU_32)
 
-#elif defined(CONFIG_PPC32)
 /*
- * TLB flushing for "classic" hash-MMMU 32-bit CPUs, 6xx, 7xx, 7xxx
+ * TLB flushing for "classic" hash-MMU 32-bit CPUs, 6xx, 7xx, 7xxx
  */
 extern void _tlbie(unsigned long address);
 extern void _tlbia(void);
@@ -94,14 +74,20 @@ extern void flush_tlb_page_nohash(struct
 extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
-static inline void local_flush_tlb_page(unsigned long vmaddr)
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+					unsigned long vmaddr)
 {
-	flush_tlb_page(NULL, vmaddr);
+	flush_tlb_page(vma, vmaddr);
+}
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+	flush_tlb_mm(mm);
 }
 
-#else
+#elif defined(CONFIG_PPC_STD_MMU_64)
+
 /*
- * TLB flushing for 64-bit has-MMU CPUs
+ * TLB flushing for 64-bit hash-MMU CPUs
  */
 
 #include <linux/percpu.h>
@@ -151,11 +137,16 @@ extern void flush_hash_page(unsigned lon
 extern void flush_hash_range(unsigned long number, int local);
 
 
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+}
+
 static inline void flush_tlb_mm(struct mm_struct *mm)
 {
 }
 
-static inline void local_flush_tlb_page(unsigned long vmaddr)
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+					unsigned long vmaddr)
 {
 }
 
@@ -183,7 +174,8 @@ static inline void flush_tlb_kernel_rang
 extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
 				     unsigned long end);
 
-
+#else
+#error Unsupported MMU type
 #endif
 
 #endif /*__KERNEL__ */
Index: linux-work/arch/powerpc/mm/Makefile
===================================================================
--- linux-work.orig/arch/powerpc/mm/Makefile	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/mm/Makefile	2008-12-15 14:36:38.000000000 +1100
@@ -9,7 +9,7 @@ endif
 obj-y				:= fault.o mem.o pgtable.o \
 				   init_$(CONFIG_WORD_SIZE).o \
 				   pgtable_$(CONFIG_WORD_SIZE).o
-obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o
+obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o
 hash-$(CONFIG_PPC_NATIVE)	:= hash_native_64.o
 obj-$(CONFIG_PPC64)		+= hash_utils_64.o \
 				   slb_low.o slb.o stab.o \
Index: linux-work/arch/powerpc/mm/tlb_nohash.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/tlb_nohash.c	2008-12-15 14:36:20.000000000 +1100
@@ -0,0 +1,209 @@
+/*
+ * This file contains the routines for TLB flushing.
+ * On machines where the MMU does not use a hash table to store virtual to
+ * physical translations (i.e., SW loaded TLBs or Book3E compliant
+ * processors; this does -not- include the 603, which shares its
+ * implementation with hash based processors)
+ *
+ *  -- BenH
+ *
+ * Copyright 2008 Ben Herrenschmidt <benh@kernel.crashing.org>
+ *                IBM Corp.
+ *
+ *  Derived from arch/ppc/mm/init.c:
+ *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
+ *
+ *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
+ *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
+ *    Copyright (C) 1996 Paul Mackerras
+ *
+ *  Derived from "arch/i386/mm/init.c"
+ *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+
+#include <asm/tlbflush.h>
+#include <asm/tlb.h>
+
+#include "mmu_decl.h"
+
+/*
+ * Base TLB flushing operations:
+ *
+ *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
+ *  - flush_tlb_page(vma, vmaddr) flushes one page
+ *  - flush_tlb_range(vma, start, end) flushes a range of pages
+ *  - flush_tlb_kernel_range(start, end) flushes kernel pages
+ *
+ *  - local_* variants of page and mm only apply to the current
+ *    processor
+ */
+
+/*
+ * These are the base non-SMP variants of page and mm flushing
+ */
+void local_flush_tlb_mm(struct mm_struct *mm)
+{
+	unsigned int pid;
+
+	preempt_disable();
+	pid = mm->context.id;
+	if (pid != MMU_NO_CONTEXT)
+		_tlbil_pid(pid);
+	preempt_enable();
+}
+EXPORT_SYMBOL(local_flush_tlb_mm);
+
+void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+	unsigned int pid;
+
+	preempt_disable();
+	pid = vma ? vma->vm_mm->context.id : 0;
+	if (pid != MMU_NO_CONTEXT)
+		_tlbil_va(vmaddr, pid);
+	preempt_enable();
+}
+EXPORT_SYMBOL(local_flush_tlb_page);
+
+
+/*
+ * And here are the SMP non-local implementations
+ */
+#ifdef CONFIG_SMP
+
+static DEFINE_SPINLOCK(tlbivax_lock);
+
+struct tlb_flush_param {
+	unsigned long addr;
+	unsigned int pid;
+};
+
+static void do_flush_tlb_mm_ipi(void *param)
+{
+	struct tlb_flush_param *p = param;
+
+	_tlbil_pid(p ? p->pid : 0);
+}
+
+static void do_flush_tlb_page_ipi(void *param)
+{
+	struct tlb_flush_param *p = param;
+
+	_tlbil_va(p->addr, p->pid);
+}
+
+
+/* Note on invalidations and PID:
+ *
+ * We snapshot the PID with preempt disabled. At this point, it can still
+ * change either because:
+ * - our context is being stolen (PID -> NO_CONTEXT) on another CPU
+ * - we are invalidating some target that isn't currently running here
+ *   and is concurrently acquiring a new PID on another CPU
+ * - some other CPU is re-acquiring a lost PID for this mm
+ * etc...
+ *
+ * However, this shouldn't be a problem as we only guarantee
+ * invalidation of TLB entries present prior to this call, so we
+ * don't care about the PID changing, and invalidating a stale PID
+ * is generally harmless.
+ */
+
+void flush_tlb_mm(struct mm_struct *mm)
+{
+	cpumask_t cpu_mask;
+	unsigned int pid;
+
+	preempt_disable();
+	pid = mm->context.id;
+	if (unlikely(pid == MMU_NO_CONTEXT))
+		goto no_context;
+	cpu_mask = mm->cpu_vm_mask;
+	cpu_clear(smp_processor_id(), cpu_mask);
+	if (!cpus_empty(cpu_mask)) {
+		struct tlb_flush_param p = { .pid = pid };
+		smp_call_function_mask(cpu_mask, do_flush_tlb_mm_ipi, &p, 1);
+	}
+	_tlbil_pid(pid);
+ no_context:
+	preempt_enable();
+}
+EXPORT_SYMBOL(flush_tlb_mm);
+
+void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+	cpumask_t cpu_mask;
+	unsigned int pid;
+
+	preempt_disable();
+	pid = vma ? vma->vm_mm->context.id : 0;
+	if (unlikely(pid == MMU_NO_CONTEXT))
+		goto bail;
+	cpu_mask = vma->vm_mm->cpu_vm_mask;
+	cpu_clear(smp_processor_id(), cpu_mask);
+	if (!cpus_empty(cpu_mask)) {
+		/* If broadcast tlbivax is supported, use it */
+		if (mmu_has_feature(MMU_FTR_HAS_TLBIVAX_BCAST)) {
+			int lock = mmu_has_feature(MMU_FTR_TLBIVAX_NEED_LOCK);
+			if (lock)
+				spin_lock(&tlbivax_lock);
+			_tlbivax_bcast(vmaddr, pid);
+			if (lock)
+				spin_unlock(&tlbivax_lock);
+			goto bail;
+		} else {
+			struct tlb_flush_param p = { .pid = pid, .addr = vmaddr };
+			smp_call_function_mask(cpu_mask,
+					       do_flush_tlb_page_ipi, &p, 1);
+		}
+	}
+	_tlbil_va(vmaddr, pid);
+ bail:
+	preempt_enable();
+}
+EXPORT_SYMBOL(flush_tlb_page);
+
+#endif /* CONFIG_SMP */
+
+/*
+ * Flush kernel TLB entries in the given range
+ */
+void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_SMP
+	preempt_disable();
+	smp_call_function(do_flush_tlb_mm_ipi, NULL, 1);
+	_tlbil_pid(0);
+	preempt_enable();
+#else
+	_tlbil_pid(0);
+#endif
+}
+EXPORT_SYMBOL(flush_tlb_kernel_range);
+
+/*
+ * Currently, for range flushing, we just do a full mm flush. This
+ * should be optimized based on a threshold on the size of the range,
+ * since some implementations can stack multiple tlbivax instructions
+ * before a tlbsync, but for now we keep it simple.
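+ *
+ * A possible shape for that optimization (sketch only): pick a
+ * threshold T and, when (end - start) <= T * PAGE_SIZE, issue
+ * per-page _tlbil_va() or _tlbivax_bcast() calls instead of the
+ * full PID flush done here.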
+ */
+void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+		     unsigned long end)
+
+{
+	flush_tlb_mm(vma->vm_mm);
+}
+EXPORT_SYMBOL(flush_tlb_range);
Index: linux-work/arch/powerpc/kernel/misc_32.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/misc_32.S	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/kernel/misc_32.S	2008-12-15 14:37:28.000000000 +1100
@@ -29,6 +29,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/processor.h>
 #include <asm/kexec.h>
+#include <asm/bug.h>
 
 	.text
 
@@ -496,6 +497,14 @@ _GLOBAL(_tlbil_va)
 	blr
 #endif /* CONFIG_FSL_BOOKE */
 
+/*
+ * Nobody implements this yet
+ */
+_GLOBAL(_tlbivax_bcast)
+1:	trap
+	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0;
+	blr
+
 
 /*
  * Flush instruction cache.
Index: linux-work/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/ppc_ksyms.c	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/kernel/ppc_ksyms.c	2008-12-15 14:36:20.000000000 +1100
@@ -116,12 +116,6 @@ EXPORT_SYMBOL(giveup_spe);
 
 #ifndef CONFIG_PPC64
 EXPORT_SYMBOL(flush_instruction_cache);
-EXPORT_SYMBOL(flush_tlb_kernel_range);
-EXPORT_SYMBOL(flush_tlb_page);
-EXPORT_SYMBOL(_tlbie);
-#if defined(CONFIG_4xx) || defined(CONFIG_8xx) || defined(CONFIG_FSL_BOOKE)
-EXPORT_SYMBOL(_tlbil_va);
-#endif
 #endif
 EXPORT_SYMBOL(__flush_icache_range);
 EXPORT_SYMBOL(flush_dcache_range);
Index: linux-work/arch/powerpc/mm/tlb_hash32.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/tlb_hash32.c	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/mm/tlb_hash32.c	2008-12-15 14:36:20.000000000 +1100
@@ -137,6 +137,7 @@ void flush_tlb_kernel_range(unsigned lon
 	flush_range(&init_mm, start, end);
 	FINISH_FLUSH;
 }
+EXPORT_SYMBOL(flush_tlb_kernel_range);
 
 /*
  * Flush all the (user) entries for the address space described by mm.
@@ -160,6 +161,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 		flush_range(mp->vm_mm, mp->vm_start, mp->vm_end);
 	FINISH_FLUSH;
 }
+EXPORT_SYMBOL(flush_tlb_mm);
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 {
@@ -176,6 +178,7 @@ void flush_tlb_page(struct vm_area_struc
 		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
 	FINISH_FLUSH;
 }
+EXPORT_SYMBOL(flush_tlb_page);
 
 /*
  * For each address in the range, find the pte for the address
@@ -188,3 +191,4 @@ void flush_tlb_range(struct vm_area_stru
 	flush_range(vma->vm_mm, start, end);
 	FINISH_FLUSH;
 }
+EXPORT_SYMBOL(flush_tlb_range);
Index: linux-work/arch/powerpc/include/asm/mmu.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/mmu.h	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/mmu.h	2008-12-15 14:36:20.000000000 +1100
@@ -15,6 +15,9 @@
 #define MMU_FTR_TYPE_FSL_E		ASM_CONST(0x00000010)
 #define MMU_FTR_HAS_HIGH_BATS		ASM_CONST(0x00010000)
 #define MMU_FTR_BIG_PHYS		ASM_CONST(0x00020000)
+#define MMU_FTR_HAS_TLBIVAX_BCAST	ASM_CONST(0x00040000)
+#define MMU_FTR_HAS_TLBILX_PID		ASM_CONST(0x00080000)
+#define MMU_FTR_TLBIVAX_NEED_LOCK	ASM_CONST(0x00100000)
 
 #ifndef __ASSEMBLY__
 #include <asm/cputable.h>
Index: linux-work/arch/powerpc/include/asm/highmem.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/highmem.h	2008-12-08 15:40:33.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/highmem.h	2008-12-15 14:36:20.000000000 +1100
@@ -85,7 +85,7 @@ static inline void *kmap_atomic_prot(str
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
 #endif
 	__set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot));
-	local_flush_tlb_page(vaddr);
+	local_flush_tlb_page(NULL, vaddr);
 
 	return (void*) vaddr;
 }
@@ -113,7 +113,7 @@ static inline void kunmap_atomic(void *k
 	 * this pte without first remap it
 	 */
 	pte_clear(&init_mm, vaddr, kmap_pte-idx);
-	local_flush_tlb_page(vaddr);
+	local_flush_tlb_page(NULL, vaddr);
 #endif
 	pagefault_enable();
 }
Index: linux-work/arch/powerpc/mm/fault.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/fault.c	2008-12-03 13:32:53.000000000 +1100
+++ linux-work/arch/powerpc/mm/fault.c	2008-12-15 14:36:20.000000000 +1100
@@ -284,7 +284,7 @@ good_area:
 				}
 				pte_update(ptep, 0, _PAGE_HWEXEC |
 					   _PAGE_ACCESSED);
-				_tlbie(address, mm->context.id);
+				local_flush_tlb_page(vma, address);
 				pte_unmap_unlock(ptep, ptl);
 				up_read(&mm->mmap_sem);
 				return 0;
Index: linux-work/arch/powerpc/mm/mem.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/mem.c	2008-11-24 14:48:55.000000000 +1100
+++ linux-work/arch/powerpc/mm/mem.c	2008-12-15 14:36:20.000000000 +1100
@@ -488,7 +488,7 @@ void update_mmu_cache(struct vm_area_str
 		 * we invalidate the TLB here, thus avoiding dcbst
 		 * misbehaviour.
 		 */
-		_tlbie(address, 0 /* 8xx doesn't care about PID */);
+		_tlbil_va(address, 0 /* 8xx doesn't care about PID */);
 #endif
 		/* The _PAGE_USER test should really be _PAGE_EXEC, but
 		 * older glibc versions execute some code from no-exec

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 12/16] powerpc/mm: Split low level tlb invalidate for nohash processors
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (10 preceding siblings ...)
  2008-12-15  5:44 ` [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3 Benjamin Herrenschmidt
@ 2008-12-15  5:45 ` Benjamin Herrenschmidt
  2008-12-15  5:45 ` [PATCH 13/16] powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 Benjamin Herrenschmidt
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.

This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.

I moved all the definitions for that stuff out of include/asm/tlbflush.h
and into mm/mmu_decl.h, as they are really internal to the mm code.

The code should have no functional changes. I kept some variants
inline for trivial forms on things like 40x and 8xx. 
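
To make the split concrete, here is what callers now see (a sketch for
illustration only; the actual prototypes are in the mm/mmu_decl.h hunk
below):

	/* nohash CPUs: local-only invalidations */
	void _tlbil_all(void);			/* whole TLB */
	void _tlbil_pid(unsigned int pid);	/* one context */
	void _tlbil_va(unsigned long address, unsigned int pid);

	/* 32-bit hash CPUs: now private to the hash code */
	void _tlbie(unsigned long address);
	void _tlbia(void);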

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/include/asm/tlbflush.h |   14 --
 arch/powerpc/kernel/misc_32.S       |  233 ------------------------------------
 arch/powerpc/kvm/powerpc.c          |    2 
 arch/powerpc/mm/Makefile            |    3 
 arch/powerpc/mm/hash_low_32.S       |   76 +++++++++++
 arch/powerpc/mm/mmu_decl.h          |   48 +++++++
 arch/powerpc/mm/tlb_nohash_low.S    |  165 +++++++++++++++++++++++++
 7 files changed, 292 insertions(+), 249 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/tlbflush.h	2008-12-15 15:46:23.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/tlbflush.h	2008-12-15 15:46:56.000000000 +1100
@@ -33,17 +33,6 @@
 
 #define MMU_NO_CONTEXT      	((unsigned int)-1)
 
-extern void _tlbil_all(void);
-extern void _tlbil_pid(unsigned int pid);
-extern void _tlbil_va(unsigned long address, unsigned int pid);
-extern void _tlbivax_bcast(unsigned long address, unsigned int pid);
-
-#if defined(CONFIG_40x) || defined(CONFIG_8xx)
-#define _tlbia()	asm volatile ("tlbia; sync" : : : "memory")
-#else /* CONFIG_44x || CONFIG_FSL_BOOKE */
-extern void _tlbia(void);
-#endif
-
 extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
@@ -65,9 +54,6 @@ extern void flush_tlb_page(struct vm_are
 /*
  * TLB flushing for "classic" hash-MMU 32-bit CPUs, 6xx, 7xx, 7xxx
  */
-extern void _tlbie(unsigned long address);
-extern void _tlbia(void);
-
 extern void flush_tlb_mm(struct mm_struct *mm);
 extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void flush_tlb_page_nohash(struct vm_area_struct *vma, unsigned long addr);
Index: linux-work/arch/powerpc/mm/mmu_decl.h
===================================================================
--- linux-work.orig/arch/powerpc/mm/mmu_decl.h	2008-12-15 15:46:23.000000000 +1100
+++ linux-work/arch/powerpc/mm/mmu_decl.h	2008-12-15 15:46:56.000000000 +1100
@@ -22,10 +22,58 @@
 #include <asm/tlbflush.h>
 #include <asm/mmu.h>
 
+#ifdef CONFIG_PPC_MMU_NOHASH
+
+/*
+ * On 40x and 8xx, we directly inline tlbia and tlbivax
+ */
+#if defined(CONFIG_40x) || defined(CONFIG_8xx)
+static inline void _tlbil_all(void)
+{
+	asm volatile ("sync; tlbia; isync" : : : "memory")
+}
+static inline void _tlbil_pid(unsigned int pid)
+{
+	asm volatile ("sync; tlbia; isync" : : : "memory")
+}
+#else /* CONFIG_40x || CONFIG_8xx */
+extern void _tlbil_all(void);
+extern void _tlbil_pid(unsigned int pid);
+#endif /* !(CONFIG_40x || CONFIG_8xx) */
+
+/*
+ * On 8xx, we directly inline tlbie, on others, it's extern
+ */
+#ifdef CONFIG_8xx
+static inline void _tlbil_va(unsigned long address, unsigned int pid)
+{
+	asm volatile ("tlbie %0; sync" : : "r" (address) : "memory")
+}
+#else /* CONFIG_8xx */
+extern void _tlbil_va(unsigned long address, unsigned int pid);
+#endif /* CONFIG_8xx */
+
+/*
+ * As of today, we don't support tlbivax broadcast on any
+ * implementation. When that becomes the case, this will be
+ * an extern.
+ */
+static inline void _tlbivax_bcast(unsigned long address, unsigned int pid)
+{
+	BUG();
+}
+
+#else /* CONFIG_PPC_MMU_NOHASH */
+
 extern void hash_preload(struct mm_struct *mm, unsigned long ea,
 			 unsigned long access, unsigned long trap);
 
 
+extern void _tlbie(unsigned long address);
+extern void _tlbia(void);
+
+#endif /* CONFIG_PPC_MMU_NOHASH */
+
 #ifdef CONFIG_PPC32
 extern void mapin_ram(void);
 extern int map_page(unsigned long va, phys_addr_t pa, int flags);
Index: linux-work/arch/powerpc/mm/tlb_nohash_low.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/powerpc/mm/tlb_nohash_low.S	2008-12-15 15:47:57.000000000 +1100
@@ -0,0 +1,165 @@
+/*
+ * This file contains low-level functions for performing various
+ * types of TLB invalidations on various processors with no hash
+ * table.
+ *
+ * This file implements the following functions for all no-hash
+ * processors. Some aren't implemented for some variants. Some
+ * are inline in mmu_decl.h
+ *
+ *	- tlbil_va
+ *	- tlbil_pid
+ *	- tlbil_all
+ *	- tlbivax_bcast (not yet)
+ *
+ * Code mostly moved over from misc_32.S
+ *
+ *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
+ *
+ * Partially rewritten by Cort Dougan (cort@cs.nmt.edu)
+ * Paul Mackerras, Kumar Gala and Benjamin Herrenschmidt.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/cputable.h>
+#include <asm/mmu.h>
+#include <asm/ppc_asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/processor.h>
+
+#if defined(CONFIG_40x)
+
+/*
+ * 40x implementation needs only tlbil_va
+ */
+_GLOBAL(_tlbil_va)
+	/* We run the search with interrupts disabled because we have to change
+	 * the PID and I don't want to preempt when that happens.
+	 */
+	mfmsr	r5
+	mfspr	r6,SPRN_PID
+	wrteei	0
+	mtspr	SPRN_PID,r4
+	tlbsx.	r3, 0, r3
+	mtspr	SPRN_PID,r6
+	wrtee	r5
+	bne	1f
+	sync
+	/* There are only 64 TLB entries, so r3 < 64, which means bit 25 is
+	 * clear. Since 25 is the V bit in the TLB_TAG, loading this value
+	 * will invalidate the TLB entry. */
+	tlbwe	r3, r3, TLB_TAG
+	isync
+1:	blr
+
+#elif defined(CONFIG_8xx)
+
+/*
+ * Nothing to do for 8xx, everything is inline
+ */
+
+#elif defined(CONFIG_44x)
+
+/*
+ * 440 implementation uses tlbsx/we for tlbil_va and a full sweep
+ * of the TLB for everything else.
+ */
+_GLOBAL(_tlbil_va)
+	mfspr	r5,SPRN_MMUCR
+	rlwimi	r5,r4,0,24,31			/* Set TID */
+
+	/* We have to run the search with interrupts disabled, even critical
+	 * and debug interrupts (in fact the only critical exceptions we have
+	 * are debug and machine check).  Otherwise  an interrupt which causes
+	 * a TLB miss can clobber the MMUCR between the mtspr and the tlbsx. */
+	mfmsr	r4
+	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@ha
+	addi	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
+	andc	r6,r4,r6
+	mtmsr	r6
+	mtspr	SPRN_MMUCR,r5
+	tlbsx.	r3, 0, r3
+	mtmsr	r4
+	bne	1f
+	sync
+	/* There are only 64 TLB entries, so r3 < 64,
+	 * which means bit 22, is clear.  Since 22 is
+	 * the V bit in the TLB_PAGEID, loading this
+	 * value will invalidate the TLB entry.
+	 */
+	tlbwe	r3, r3, PPC44x_TLB_PAGEID
+	isync
+1:	blr
+
+_GLOBAL(_tlbil_all)
+_GLOBAL(_tlbil_pid)
+	li	r3,0
+	sync
+
+	/* Load high watermark */
+	lis	r4,tlb_44x_hwater@ha
+	lwz	r5,tlb_44x_hwater@l(r4)
+
+1:	tlbwe	r3,r3,PPC44x_TLB_PAGEID
+	addi	r3,r3,1
+	cmpw	0,r3,r5
+	ble	1b
+
+	isync
+	blr
+
+#elif defined(CONFIG_FSL_BOOKE)
+/*
+ * FSL BookE implementations. Currently _pid and _all are the
+ * same. This will change when tlbilx is actually supported and
+ * performs invalidate-by-PID. This change will be driven by
+ * an mmu_features conditional.
+ */
+
+/*
+ * Flush MMU TLB on the local processor
+ */
+_GLOBAL(_tlbil_pid)
+_GLOBAL(_tlbil_all)
+#define MMUCSR0_TLBFI	(MMUCSR0_TLB0FI | MMUCSR0_TLB1FI | \
+			 MMUCSR0_TLB2FI | MMUCSR0_TLB3FI)
+	li	r3,(MMUCSR0_TLBFI)@l
+	mtspr	SPRN_MMUCSR0, r3
+1:
+	mfspr	r3,SPRN_MMUCSR0
+	andi.	r3,r3,MMUCSR0_TLBFI@l
+	bne	1b
+	msync
+	isync
+	blr
+
+/*
+ * Flush MMU TLB for a particular address, but only on the local processor
+ * (no broadcast)
+ */
+_GLOBAL(_tlbil_va)
+	mfmsr	r10
+	wrteei	0
+	slwi	r4,r4,16
+	mtspr	SPRN_MAS6,r4		/* assume AS=0 for now */
+	tlbsx	0,r3
+	mfspr	r4,SPRN_MAS1		/* check valid */
+	andis.	r3,r4,MAS1_VALID@h
+	beqlr
+	rlwinm	r4,r4,0,1,31
+	mtspr	SPRN_MAS1,r4
+	tlbwe
+	msync
+	isync
+	wrtee	r10
+	blr
+#else
+#error Unsupported processor type !
+#endif
Index: linux-work/arch/powerpc/kernel/misc_32.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/misc_32.S	2008-12-15 15:46:24.000000000 +1100
+++ linux-work/arch/powerpc/kernel/misc_32.S	2008-12-15 15:46:56.000000000 +1100
@@ -272,239 +272,6 @@ _GLOBAL(real_writeb)
 
 #endif /* CONFIG_40x */
 
-/*
- * Flush MMU TLB
- */
-#ifndef CONFIG_FSL_BOOKE
-_GLOBAL(_tlbil_all)
-_GLOBAL(_tlbil_pid)
-#endif
-_GLOBAL(_tlbia)
-#if defined(CONFIG_40x)
-	sync			/* Flush to memory before changing mapping */
-	tlbia
-	isync			/* Flush shadow TLB */
-#elif defined(CONFIG_44x)
-	li	r3,0
-	sync
-
-	/* Load high watermark */
-	lis	r4,tlb_44x_hwater@ha
-	lwz	r5,tlb_44x_hwater@l(r4)
-
-1:	tlbwe	r3,r3,PPC44x_TLB_PAGEID
-	addi	r3,r3,1
-	cmpw	0,r3,r5
-	ble	1b
-
-	isync
-#elif defined(CONFIG_FSL_BOOKE)
-	/* Invalidate all entries in TLB0 */
-	li	r3, 0x04
-	tlbivax	0,3
-	/* Invalidate all entries in TLB1 */
-	li	r3, 0x0c
-	tlbivax	0,3
-	msync
-#ifdef CONFIG_SMP
-	tlbsync
-#endif /* CONFIG_SMP */
-#else /* !(CONFIG_40x || CONFIG_44x || CONFIG_FSL_BOOKE) */
-#if defined(CONFIG_SMP)
-	rlwinm	r8,r1,0,0,(31-THREAD_SHIFT)
-	lwz	r8,TI_CPU(r8)
-	oris	r8,r8,10
-	mfmsr	r10
-	SYNC
-	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
-	rlwinm	r0,r0,0,28,26		/* clear DR */
-	mtmsr	r0
-	SYNC_601
-	isync
-	lis	r9,mmu_hash_lock@h
-	ori	r9,r9,mmu_hash_lock@l
-	tophys(r9,r9)
-10:	lwarx	r7,0,r9
-	cmpwi	0,r7,0
-	bne-	10b
-	stwcx.	r8,0,r9
-	bne-	10b
-	sync
-	tlbia
-	sync
-	TLBSYNC
-	li	r0,0
-	stw	r0,0(r9)		/* clear mmu_hash_lock */
-	mtmsr	r10
-	SYNC_601
-	isync
-#else /* CONFIG_SMP */
-	sync
-	tlbia
-	sync
-#endif /* CONFIG_SMP */
-#endif /* ! defined(CONFIG_40x) */
-	blr
-
-/*
- * Flush MMU TLB for a particular address
- */
-#ifndef CONFIG_FSL_BOOKE
-_GLOBAL(_tlbil_va)
-#endif
-_GLOBAL(_tlbie)
-#if defined(CONFIG_40x)
-	/* We run the search with interrupts disabled because we have to change
-	 * the PID and I don't want to preempt when that happens.
-	 */
-	mfmsr	r5
-	mfspr	r6,SPRN_PID
-	wrteei	0
-	mtspr	SPRN_PID,r4
-	tlbsx.	r3, 0, r3
-	mtspr	SPRN_PID,r6
-	wrtee	r5
-	bne	10f
-	sync
-	/* There are only 64 TLB entries, so r3 < 64, which means bit 25 is clear.
-	 * Since 25 is the V bit in the TLB_TAG, loading this value will invalidate
-	 * the TLB entry. */
-	tlbwe	r3, r3, TLB_TAG
-	isync
-10:
-
-#elif defined(CONFIG_44x)
-	mfspr	r5,SPRN_MMUCR
-	rlwimi	r5,r4,0,24,31			/* Set TID */
-
-	/* We have to run the search with interrupts disabled, even critical
-	 * and debug interrupts (in fact the only critical exceptions we have
-	 * are debug and machine check).  Otherwise  an interrupt which causes
-	 * a TLB miss can clobber the MMUCR between the mtspr and the tlbsx. */
-	mfmsr	r4
-	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@ha
-	addi	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
-	andc	r6,r4,r6
-	mtmsr	r6
-	mtspr	SPRN_MMUCR,r5
-	tlbsx.	r3, 0, r3
-	mtmsr	r4
-	bne	10f
-	sync
-	/* There are only 64 TLB entries, so r3 < 64,
-	 * which means bit 22, is clear.  Since 22 is
-	 * the V bit in the TLB_PAGEID, loading this
-	 * value will invalidate the TLB entry.
-	 */
-	tlbwe	r3, r3, PPC44x_TLB_PAGEID
-	isync
-10:
-#elif defined(CONFIG_FSL_BOOKE)
-	rlwinm	r4, r3, 0, 0, 19
-	ori	r5, r4, 0x08	/* TLBSEL = 1 */
-	tlbivax	0, r4
-	tlbivax	0, r5
-	msync
-#if defined(CONFIG_SMP)
-	tlbsync
-#endif /* CONFIG_SMP */
-#else /* !(CONFIG_40x || CONFIG_44x || CONFIG_FSL_BOOKE) */
-#if defined(CONFIG_SMP)
-	rlwinm	r8,r1,0,0,(31-THREAD_SHIFT)
-	lwz	r8,TI_CPU(r8)
-	oris	r8,r8,11
-	mfmsr	r10
-	SYNC
-	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
-	rlwinm	r0,r0,0,28,26		/* clear DR */
-	mtmsr	r0
-	SYNC_601
-	isync
-	lis	r9,mmu_hash_lock@h
-	ori	r9,r9,mmu_hash_lock@l
-	tophys(r9,r9)
-10:	lwarx	r7,0,r9
-	cmpwi	0,r7,0
-	bne-	10b
-	stwcx.	r8,0,r9
-	bne-	10b
-	eieio
-	tlbie	r3
-	sync
-	TLBSYNC
-	li	r0,0
-	stw	r0,0(r9)		/* clear mmu_hash_lock */
-	mtmsr	r10
-	SYNC_601
-	isync
-#else /* CONFIG_SMP */
-	tlbie	r3
-	sync
-#endif /* CONFIG_SMP */
-#endif /* ! CONFIG_40x */
-	blr
-
-#if defined(CONFIG_FSL_BOOKE)
-/*
- * Flush MMU TLB, but only on the local processor (no broadcast)
- */
-_GLOBAL(_tlbil_all)
-#define MMUCSR0_TLBFI	(MMUCSR0_TLB0FI | MMUCSR0_TLB1FI | \
-			 MMUCSR0_TLB2FI | MMUCSR0_TLB3FI)
-	li	r3,(MMUCSR0_TLBFI)@l
-	mtspr	SPRN_MMUCSR0, r3
-1:
-	mfspr	r3,SPRN_MMUCSR0
-	andi.	r3,r3,MMUCSR0_TLBFI@l
-	bne	1b
-	blr
-
-/*
- * Flush MMU TLB for a particular process id, but only on the local processor
- * (no broadcast)
- */
-_GLOBAL(_tlbil_pid)
-/* we currently do an invalidate all since we don't have per pid invalidate */
-	li	r3,(MMUCSR0_TLBFI)@l
-	mtspr	SPRN_MMUCSR0, r3
-1:
-	mfspr	r3,SPRN_MMUCSR0
-	andi.	r3,r3,MMUCSR0_TLBFI@l
-	bne	1b
-	msync
-	isync
-	blr
-
-/*
- * Flush MMU TLB for a particular address, but only on the local processor
- * (no broadcast)
- */
-_GLOBAL(_tlbil_va)
-	mfmsr	r10
-	wrteei	0
-	slwi	r4,r4,16
-	mtspr	SPRN_MAS6,r4		/* assume AS=0 for now */
-	tlbsx	0,r3
-	mfspr	r4,SPRN_MAS1		/* check valid */
-	andis.	r3,r4,MAS1_VALID@h
-	beqlr
-	rlwinm	r4,r4,0,1,31
-	mtspr	SPRN_MAS1,r4
-	tlbwe
-	msync
-	isync
-	wrtee	r10
-	blr
-#endif /* CONFIG_FSL_BOOKE */
-
-/*
- * Nobody implements this yet
- */
-_GLOBAL(_tlbivax_bcast)
-1:	trap
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0;
-	blr
-
 
 /*
  * Flush instruction cache.
Index: linux-work/arch/powerpc/mm/Makefile
===================================================================
--- linux-work.orig/arch/powerpc/mm/Makefile	2008-12-15 15:46:23.000000000 +1100
+++ linux-work/arch/powerpc/mm/Makefile	2008-12-15 15:46:56.000000000 +1100
@@ -9,7 +9,8 @@ endif
 obj-y				:= fault.o mem.o pgtable.o \
 				   init_$(CONFIG_WORD_SIZE).o \
 				   pgtable_$(CONFIG_WORD_SIZE).o
-obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o
+obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o \
+				   tlb_nohash_low.o
 hash-$(CONFIG_PPC_NATIVE)	:= hash_native_64.o
 obj-$(CONFIG_PPC64)		+= hash_utils_64.o \
 				   slb_low.o slb.o stab.o \
Index: linux-work/arch/powerpc/mm/hash_low_32.S
===================================================================
--- linux-work.orig/arch/powerpc/mm/hash_low_32.S	2008-12-15 15:46:24.000000000 +1100
+++ linux-work/arch/powerpc/mm/hash_low_32.S	2008-12-15 15:46:56.000000000 +1100
@@ -633,3 +633,79 @@ _GLOBAL(flush_hash_patch_B)
 	SYNC_601
 	isync
 	blr
+
+/*
+ * Flush an entry from the TLB
+ */
+_GLOBAL(_tlbie)
+#ifdef CONFIG_SMP
+	rlwinm	r8,r1,0,0,(31-THREAD_SHIFT)
+	lwz	r8,TI_CPU(r8)
+	oris	r8,r8,11
+	mfmsr	r10
+	SYNC
+	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
+	rlwinm	r0,r0,0,28,26		/* clear DR */
+	mtmsr	r0
+	SYNC_601
+	isync
+	lis	r9,mmu_hash_lock@h
+	ori	r9,r9,mmu_hash_lock@l
+	tophys(r9,r9)
+10:	lwarx	r7,0,r9
+	cmpwi	0,r7,0
+	bne-	10b
+	stwcx.	r8,0,r9
+	bne-	10b
+	eieio
+	tlbie	r3
+	sync
+	TLBSYNC
+	li	r0,0
+	stw	r0,0(r9)		/* clear mmu_hash_lock */
+	mtmsr	r10
+	SYNC_601
+	isync
+#else /* CONFIG_SMP */
+	tlbie	r3
+	sync
+#endif /* CONFIG_SMP */
+	blr
+
+/*
+ * Flush the entire TLB. 603/603e only
+ */
+_GLOBAL(_tlbia)
+#if defined(CONFIG_SMP)
+	rlwinm	r8,r1,0,0,(31-THREAD_SHIFT)
+	lwz	r8,TI_CPU(r8)
+	oris	r8,r8,10
+	mfmsr	r10
+	SYNC
+	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
+	rlwinm	r0,r0,0,28,26		/* clear DR */
+	mtmsr	r0
+	SYNC_601
+	isync
+	lis	r9,mmu_hash_lock@h
+	ori	r9,r9,mmu_hash_lock@l
+	tophys(r9,r9)
+10:	lwarx	r7,0,r9
+	cmpwi	0,r7,0
+	bne-	10b
+	stwcx.	r8,0,r9
+	bne-	10b
+	sync
+	tlbia
+	sync
+	TLBSYNC
+	li	r0,0
+	stw	r0,0(r9)		/* clear mmu_hash_lock */
+	mtmsr	r10
+	SYNC_601
+	isync
+#else /* CONFIG_SMP */
+	sync
+	tlbia
+	sync
+#endif /* CONFIG_SMP */
+	blr
Index: linux-work/arch/powerpc/kvm/powerpc.c
===================================================================
--- linux-work.orig/arch/powerpc/kvm/powerpc.c	2008-12-15 15:46:24.000000000 +1100
+++ linux-work/arch/powerpc/kvm/powerpc.c	2008-12-15 15:46:56.000000000 +1100
@@ -330,7 +330,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *
 	/* XXX It would be nice to differentiate between heavyweight exit and
 	 * sched_out here, since we could avoid the TLB flush for heavyweight
 	 * exits. */
-	_tlbia();
+	_tlbil_all();
 }
 
 int kvm_arch_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 13/16] powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (11 preceding siblings ...)
  2008-12-15  5:45 ` [PATCH 12/16] powerpc/mm: Split low level tlb invalidate for nohash processors Benjamin Herrenschmidt
@ 2008-12-15  5:45 ` Benjamin Herrenschmidt
  2008-12-15 12:25   ` [PATCH 13/16] powerpc/44x: No need to mask MSR:CE,ME " Josh Boyer
  2008-12-15  5:45 ` [PATCH 14/16] powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs v2 Benjamin Herrenschmidt
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

The handlers for Critical, Machine Check or Debug interrupts
will save and restore MMUCR nowadays, thus we only need to
disable normal interrupts when invalidating TLB entries.
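
In C terms, the critical section only needs MSR[EE] cleared around the
MMUCR write and the tlbsx, roughly like this (sketch only; the real
code in the hunk below is assembly):

	unsigned long msr;

	asm volatile("mfmsr %0" : "=r" (msr));		   /* save old MSR */
	asm volatile("wrteei 0" : : : "memory");	   /* clear EE only */
	/* ... write MMUCR, run tlbsx with interrupts off ... */
	asm volatile("wrtee %0" : : "r" (msr) : "memory"); /* restore EE */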

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/mm/tlb_nohash_low.S |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

--- linux-work.orig/arch/powerpc/mm/tlb_nohash_low.S	2008-12-15 13:34:57.000000000 +1100
+++ linux-work/arch/powerpc/mm/tlb_nohash_low.S	2008-12-15 13:35:07.000000000 +1100
@@ -75,18 +75,19 @@ _GLOBAL(_tlbil_va)
 	mfspr	r5,SPRN_MMUCR
 	rlwimi	r5,r4,0,24,31			/* Set TID */
 
-	/* We have to run the search with interrupts disabled, even critical
-	 * and debug interrupts (in fact the only critical exceptions we have
-	 * are debug and machine check).  Otherwise  an interrupt which causes
-	 * a TLB miss can clobber the MMUCR between the mtspr and the tlbsx. */
+	/* We have to run the search with interrupts disabled, otherwise
+	 * an interrupt which causes a TLB miss can clobber the MMUCR
+	 * between the mtspr and the tlbsx.
+	 *
+	 * Critical and Machine Check interrupts take care of saving
+	 * and restoring MMUCR, so only normal interrupts have to be
+	 * taken care of.
+	 */
 	mfmsr	r4
-	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@ha
-	addi	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
-	andc	r6,r4,r6
-	mtmsr	r6
+	wrteei	0
 	mtspr	SPRN_MMUCR,r5
 	tlbsx.	r3, 0, r3
-	mtmsr	r4
+	wrtee	r4
 	bne	1f
 	sync
 	/* There are only 64 TLB entries, so r3 < 64,

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 14/16] powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs v2
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (12 preceding siblings ...)
  2008-12-15  5:45 ` [PATCH 13/16] powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 Benjamin Herrenschmidt
@ 2008-12-15  5:45 ` Benjamin Herrenschmidt
  2008-12-17 21:21   ` Kumar Gala
  2008-12-15  5:45 ` [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Benjamin Herrenschmidt
  2008-12-15  5:45 ` [PATCH 16/16] powerpc/44x: 44x TLB doesn't need "Guarded" set for all pages Benjamin Herrenschmidt
  15 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.

Only the main free map and CPU 0 stale map are allocated at boot
time. Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.

This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland is first started anyway.
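
For scale, the maps are tiny; the sizing follows the CTX_MAP_SIZE
macro in the patch (CTX_MAP_BYTES below is just an illustrative,
parameterized rendering of it):

	/* one bit per context, rounded up to whole longs */
	#define CTX_MAP_BYTES(last_context) \
		(sizeof(unsigned long) * ((last_context) / BITS_PER_LONG + 1))

	/* 8xx:      last_context = 15  ->  4 bytes on 32-bit */
	/* 4xx/E500: last_context = 255 -> 32 bytes on 32-bit */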

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2. rebased and add some more debug

 arch/powerpc/kernel/setup_32.c       |    5 +
 arch/powerpc/mm/init_32.c            |    4 
 arch/powerpc/mm/mmu_context_nohash.c |  161 ++++++++++++++++++++++++-----------
 3 files changed, 116 insertions(+), 54 deletions(-)

--- linux-work.orig/arch/powerpc/mm/mmu_context_nohash.c	2008-12-11 14:57:50.000000000 +1100
+++ linux-work/arch/powerpc/mm/mmu_context_nohash.c	2008-12-12 17:28:31.000000000 +1100
@@ -28,54 +28,30 @@
 #undef DEBUG
 #define DEBUG_STEAL_ONLY
 #undef DEBUG_MAP_CONSISTENCY
+/*#define DEBUG_CLAMP_LAST_CONTEXT   15 */
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
 
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
-#include <linux/spinlock.h>
-
-/*
- *   The MPC8xx has only 16 contexts.  We rotate through them on each
- * task switch.  A better way would be to keep track of tasks that
- * own contexts, and implement an LRU usage.  That way very active
- * tasks don't always have to pay the TLB reload overhead.  The
- * kernel pages are mapped shared, so the kernel can run on behalf
- * of any task that makes a kernel entry.  Shared does not mean they
- * are not protected, just that the ASID comparison is not performed.
- *      -- Dan
- *
- * The IBM4xx has 256 contexts, so we can just rotate through these
- * as a way of "switching" contexts.  If the TID of the TLB is zero,
- * the PID/TID comparison is disabled, so we can use a TID of zero
- * to represent all kernel pages as shared among all contexts.
- * 	-- Dan
- */
-
-#ifdef CONFIG_8xx
-#define LAST_CONTEXT    	15
-#define FIRST_CONTEXT    	0
-
-#elif defined(CONFIG_4xx)
-#define LAST_CONTEXT    	255
-#define FIRST_CONTEXT    	1
-
-#elif defined(CONFIG_E200) || defined(CONFIG_E500)
-#define LAST_CONTEXT    	255
-#define FIRST_CONTEXT    	1
-
-#else
-#error Unsupported processor type
-#endif
 
+static unsigned int first_context, last_context;
 static unsigned int next_context, nr_free_contexts;
-static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
-static unsigned long stale_map[NR_CPUS][LAST_CONTEXT / BITS_PER_LONG + 1];
-static struct mm_struct *context_mm[LAST_CONTEXT+1];
+static unsigned long *context_map;
+static unsigned long *stale_map[NR_CPUS];
+static struct mm_struct **context_mm;
 static spinlock_t context_lock = SPIN_LOCK_UNLOCKED;
 
+#define CTX_MAP_SIZE	\
+	(sizeof(unsigned long) * (last_context / BITS_PER_LONG + 1))
+
+
 /* Steal a context from a task that has one at the moment.
  * This is only used on 8xx and 4xx and we presently assume that
  * they don't do SMP.  If they do then this will have to check
@@ -100,7 +76,7 @@ static unsigned int steal_context_smp(un
 	unsigned int cpu, max;
 
  again:
-	max = LAST_CONTEXT - FIRST_CONTEXT;
+	max = last_context - first_context;
 
 	/* Attempt to free next_context first and then loop until we manage */
 	while (max--) {
@@ -112,8 +88,8 @@ static unsigned int steal_context_smp(un
 		 */
 		if (mm->context.active) {
 			id ++;
-			if (id > LAST_CONTEXT)
-				id = FIRST_CONTEXT;
+			if (id > last_context)
+				id = first_context;
 			continue;
 		}
 		pr_debug("[%d] steal context %d from mm @%p\n",
@@ -171,7 +147,7 @@ static void context_check_map(void)
 	unsigned int id, nrf, nact;
 
 	nrf = nact = 0;
-	for (id = FIRST_CONTEXT; id <= LAST_CONTEXT; id++) {
+	for (id = first_context; id <= last_context; id++) {
 		int used = test_bit(id, context_map);
 		if (!used)
 			nrf++;
@@ -189,6 +165,8 @@ static void context_check_map(void)
 	if (nact > num_online_cpus())
 		pr_err("MMU: More active contexts than CPUs ! (%d vs %d)\n",
 		       nact, num_online_cpus());
+	if (first_context > 0 && !test_bit(0, context_map))
+		pr_err("MMU: Context 0 has been freed !!!\n");
 }
 #else
 static void context_check_map(void) { }
@@ -211,6 +189,10 @@ void switch_mmu_context(struct mm_struct
 	/* Mark us active and the previous one not anymore */
 	next->context.active++;
 	if (prev) {
+#ifndef DEBUG_STEAL_ONLY
+		pr_debug(" old context %p active was: %d\n",
+			 prev, prev->context.active);
+#endif
 		WARN_ON(prev->context.active < 1);
 		prev->context.active--;
 	}
@@ -223,8 +205,8 @@ void switch_mmu_context(struct mm_struct
 
 	/* We really don't have a context, let's try to acquire one */
 	id = next_context;
-	if (id > LAST_CONTEXT)
-		id = FIRST_CONTEXT;
+	if (id > last_context)
+		id = first_context;
 	map = context_map;
 
 	/* No more free contexts, let's try to steal one */
@@ -242,9 +224,9 @@ void switch_mmu_context(struct mm_struct
 
 	/* We know there's at least one free context, try to find it */
 	while (__test_and_set_bit(id, map)) {
-		id = find_next_zero_bit(map, LAST_CONTEXT+1, id);
-		if (id > LAST_CONTEXT)
-			id = FIRST_CONTEXT;
+		id = find_next_zero_bit(map, last_context+1, id);
+		if (id > last_context)
+			id = first_context;
 	}
  stolen:
 	next_context = id + 1;
@@ -313,6 +295,42 @@ void destroy_context(struct mm_struct *m
 	spin_unlock(&context_lock);
 }
 
+#ifdef CONFIG_SMP
+
+static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
+					    unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned int)(long)hcpu;
+
+	/* We don't touch CPU 0 map, it's allocated at boot and kept
+	 * around forever
+	 */
+	if (cpu == 0)
+		return NOTIFY_OK;
+
+	switch (action) {
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		pr_debug("MMU: Allocating stale context map for CPU %d\n", cpu);
+		stale_map[cpu] = kzalloc(CTX_MAP_SIZE, GFP_KERNEL);
+		break;
+#ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		pr_debug("MMU: Freeing stale context map for CPU %d\n", cpu);
+		kfree(stale_map[cpu]);
+		stale_map[cpu] = NULL;
+		break;
+#endif
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata mmu_context_cpu_nb = {
+	.notifier_call	= mmu_context_cpu_notify,
+};
+
+#endif /* CONFIG_SMP */
 
 /*
  * Initialize the context management stuff.
@@ -326,13 +344,56 @@ void __init mmu_context_init(void)
 	init_mm.context.active = NR_CPUS;
 
 	/*
+	 *   The MPC8xx has only 16 contexts.  We rotate through them on each
+	 * task switch.  A better way would be to keep track of tasks that
+	 * own contexts, and implement an LRU usage.  That way very active
+	 * tasks don't always have to pay the TLB reload overhead.  The
+	 * kernel pages are mapped shared, so the kernel can run on behalf
+	 * of any task that makes a kernel entry.  Shared does not mean they
+	 * are not protected, just that the ASID comparison is not performed.
+	 *      -- Dan
+	 *
+	 * The IBM4xx has 256 contexts, so we can just rotate through these
+	 * as a way of "switching" contexts.  If the TID of the TLB is zero,
+	 * the PID/TID comparison is disabled, so we can use a TID of zero
+	 * to represent all kernel pages as shared among all contexts.
+	 * 	-- Dan
+	 */
+	if (mmu_has_feature(MMU_FTR_TYPE_8xx)) {
+		first_context = 0;
+		last_context = 15;
+	} else {
+		first_context = 1;
+		last_context = 255;
+	}
+
+#ifdef DEBUG_CLAMP_LAST_CONTEXT
+	last_context = DEBUG_CLAMP_LAST_CONTEXT;
+#endif
+	/*
+	 * Allocate the maps used by context management
+	 */
+	context_map = alloc_bootmem(CTX_MAP_SIZE);
+	context_mm = alloc_bootmem(sizeof(void *) * (last_context + 1));
+	stale_map[0] = alloc_bootmem(CTX_MAP_SIZE);
+
+#ifdef CONFIG_SMP
+	register_cpu_notifier(&mmu_context_cpu_nb);
+#endif
+
+	printk(KERN_INFO
+	       "MMU: Allocated %d bytes of context maps for %d contexts\n",
+	       2 * CTX_MAP_SIZE + (sizeof(void *) * (last_context + 1)),
+	       last_context - first_context + 1);
+
+	/*
 	 * Some processors have too few contexts to reserve one for
 	 * init_mm, and require using context 0 for a normal task.
 	 * Other processors reserve the use of context zero for the kernel.
-	 * This code assumes FIRST_CONTEXT < 32.
+	 * This code assumes first_context < 32.
 	 */
-	context_map[0] = (1 << FIRST_CONTEXT) - 1;
-	next_context = FIRST_CONTEXT;
-	nr_free_contexts = LAST_CONTEXT - FIRST_CONTEXT + 1;
+	context_map[0] = (1 << first_context) - 1;
+	next_context = first_context;
+	nr_free_contexts = last_context - first_context + 1;
 }
 
Index: linux-work/arch/powerpc/kernel/setup_32.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/setup_32.c	2008-12-11 14:57:58.000000000 +1100
+++ linux-work/arch/powerpc/kernel/setup_32.c	2008-12-12 17:26:58.000000000 +1100
@@ -38,6 +38,7 @@
 #include <asm/time.h>
 #include <asm/serial.h>
 #include <asm/udbg.h>
+#include <asm/mmu_context.h>
 
 #include "setup.h"
 
@@ -330,4 +331,8 @@ void __init setup_arch(char **cmdline_p)
 	if ( ppc_md.progress ) ppc_md.progress("arch: exit", 0x3eab);
 
 	paging_init();
+
+	/* Initialize the MMU context management stuff */
+	mmu_context_init();
+
 }
Index: linux-work/arch/powerpc/mm/init_32.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/init_32.c	2008-12-11 14:57:41.000000000 +1100
+++ linux-work/arch/powerpc/mm/init_32.c	2008-12-11 14:58:01.000000000 +1100
@@ -35,7 +35,6 @@
 #include <asm/pgalloc.h>
 #include <asm/prom.h>
 #include <asm/io.h>
-#include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 #include <asm/mmu.h>
 #include <asm/smp.h>
@@ -180,9 +179,6 @@ void __init MMU_init(void)
 	if (ppc_md.progress)
 		ppc_md.progress("MMU:setio", 0x302);
 
-	/* Initialize the context management stuff */
-	mmu_context_init();
-
 	if (ppc_md.progress)
 		ppc_md.progress("MMU:exit", 0x211);
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (13 preceding siblings ...)
  2008-12-15  5:45 ` [PATCH 14/16] powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs v2 Benjamin Herrenschmidt
@ 2008-12-15  5:45 ` Benjamin Herrenschmidt
  2008-12-15 20:54   ` Kumar Gala
  2008-12-15  5:45 ` [PATCH 16/16] powerpc/44x: 44x TLB doesn't need "Guarded" set for all pages Benjamin Herrenschmidt
  15 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.

This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.

It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.

This should help ensure that the PTE actually contains the bit
combinations that we really want.
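
For instance, a driver mmap hook that used to OR in _PAGE_NO_CACHE |
_PAGE_GUARDED by hand now reads like this (sketch of a hypothetical
driver, mirroring the pci-common.c hunk below):

	static int foo_mmap(struct file *file, struct vm_area_struct *vma)
	{
		/* uncacheable + guarded mapping for device registers */
		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
		return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
				       vma->vm_end - vma->vm_start,
				       vma->vm_page_prot);
	}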

I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitly from the TLB miss handler. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/include/asm/pgtable-ppc32.h |   42 ++++++++++++-------------------
 arch/powerpc/include/asm/pgtable-ppc64.h |   13 ---------
 arch/powerpc/include/asm/pgtable.h       |   26 +++++++++++++++++++
 arch/powerpc/kernel/head_44x.S           |    1 
 arch/powerpc/kernel/pci-common.c         |   24 ++++++-----------
 arch/powerpc/mm/hash_low_32.S            |    4 +-
 arch/powerpc/mm/mem.c                    |    4 +-
 arch/powerpc/platforms/cell/spufs/file.c |   27 ++++++-------------
 drivers/video/controlfb.c                |    4 +-
 9 files changed, 68 insertions(+), 77 deletions(-)

--- linux-work.orig/arch/powerpc/include/asm/pgtable-ppc32.h	2008-11-24 14:48:55.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/pgtable-ppc32.h	2008-12-15 15:34:16.000000000 +1100
@@ -228,9 +228,10 @@ extern int icache_44x_need_flush;
  *   - FILE *must* be in the bottom three bits because swap cache
  *     entries use the top 29 bits for TLB2.
  *
- *   - CACHE COHERENT bit (M) has no effect on PPC440 core, because it
- *     doesn't support SMP. So we can use this as software bit, like
- *     DIRTY.
+ *   - CACHE COHERENT bit (M) has no effect on original PPC440 cores,
+ *     because it doesn't support SMP. However, some later 460 variants
+ *     have -some- form of SMP support and so I keep the bit there for
+ *     future use
  *
  * With the PPC 44x Linux implementation, the 0-11th LSBs of the PTE are used
  * for memory protection related functions (see PTE structure in
@@ -436,20 +437,23 @@ extern int icache_44x_need_flush;
 			 _PAGE_USER | _PAGE_ACCESSED | \
 			 _PAGE_RW | _PAGE_HWWRITE | _PAGE_DIRTY | \
 			 _PAGE_EXEC | _PAGE_HWEXEC)
+
 /*
- * Note: the _PAGE_COHERENT bit automatically gets set in the hardware
- * PTE if CONFIG_SMP is defined (hash_page does this); there is no need
- * to have it in the Linux PTE, and in fact the bit could be reused for
- * another purpose.  -- paulus.
+ * We define 2 sets of base prot bits, one for basic pages (ie,
+ * cacheable kernel and user pages) and one for non cacheable
+ * pages. We always set _PAGE_COHERENT when SMP is enabled or
+ * the processor might need it for DMA coherency.
  */
-
-#ifdef CONFIG_44x
-#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_GUARDED)
+#if defined(CONFIG_SMP) || defined(CONFIG_PPC_STD_MMU)
+#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT)
 #else
 #define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED)
 #endif
+#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED)
+
 #define _PAGE_WRENABLE	(_PAGE_RW | _PAGE_DIRTY | _PAGE_HWWRITE)
 #define _PAGE_KERNEL	(_PAGE_BASE | _PAGE_SHARED | _PAGE_WRENABLE)
+#define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED | _PAGE_WRENABLE | _PAGE_NO_CACHE)
 
 #ifdef CONFIG_PPC_STD_MMU
 /* On standard PPC MMU, no user access implies kernel read/write access,
@@ -459,7 +463,7 @@ extern int icache_44x_need_flush;
 #define _PAGE_KERNEL_RO	(_PAGE_BASE | _PAGE_SHARED)
 #endif
 
-#define _PAGE_IO	(_PAGE_KERNEL | _PAGE_NO_CACHE | _PAGE_GUARDED)
+#define _PAGE_IO	(_PAGE_KERNEL_NC | _PAGE_GUARDED)
 #define _PAGE_RAM	(_PAGE_KERNEL | _PAGE_HWEXEC)
 
 #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
@@ -552,9 +556,6 @@ static inline int pte_young(pte_t pte)		
 static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
 static inline int pte_special(pte_t pte)	{ return pte_val(pte) & _PAGE_SPECIAL; }
 
-static inline void pte_uncache(pte_t pte)       { pte_val(pte) |= _PAGE_NO_CACHE; }
-static inline void pte_cache(pte_t pte)         { pte_val(pte) &= ~_PAGE_NO_CACHE; }
-
 static inline pte_t pte_wrprotect(pte_t pte) {
 	pte_val(pte) &= ~(_PAGE_RW | _PAGE_HWWRITE); return pte; }
 static inline pte_t pte_mkclean(pte_t pte) {
@@ -693,10 +694,11 @@ static inline void __set_pte_at(struct m
 #endif
 }
 
+
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
-#if defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP)
+#if defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP) && defined(CONFIG_DEBUG_VM)
 	WARN_ON(pte_present(*ptep));
 #endif
 	__set_pte_at(mm, addr, ptep, pte);
@@ -760,16 +762,6 @@ static inline void __ptep_set_access_fla
 	__changed;							   \
 })
 
-/*
- * Macro to mark a page protection value as "uncacheable".
- */
-#define pgprot_noncached(prot)	(__pgprot(pgprot_val(prot) | _PAGE_NO_CACHE | _PAGE_GUARDED))
-
-struct file;
-extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-				     unsigned long size, pgprot_t vma_prot);
-#define __HAVE_PHYS_MEM_ACCESS_PROT
-
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(A,B)	(((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
Index: linux-work/arch/powerpc/kernel/head_44x.S
===================================================================
--- linux-work.orig/arch/powerpc/kernel/head_44x.S	2008-09-29 10:17:03.000000000 +1000
+++ linux-work/arch/powerpc/kernel/head_44x.S	2008-12-15 15:32:09.000000000 +1100
@@ -570,6 +570,7 @@ finish_tlb_load:
 	rlwimi	r10,r12,29,30,30		/* DIRTY -> SW position */
 	and	r11,r12,r10			/* Mask PTE bits to keep */
 	andi.	r10,r12,_PAGE_USER		/* User page ? */
+	ori	r11,r11,_PAGE_GUARDED		/* 440 errata, needs G set */
 	beq	1f				/* nope, leave U bits empty */
 	rlwimi	r11,r11,3,26,28			/* yes, copy S bits to U */
 1:	tlbwe	r11,r13,PPC44x_TLB_ATTRIB	/* Write ATTRIB */
Index: linux-work/arch/powerpc/mm/hash_low_32.S
===================================================================
--- linux-work.orig/arch/powerpc/mm/hash_low_32.S	2008-12-15 14:37:33.000000000 +1100
+++ linux-work/arch/powerpc/mm/hash_low_32.S	2008-12-15 14:38:00.000000000 +1100
@@ -323,8 +323,8 @@ _GLOBAL(create_hpte)
 	ori	r8,r8,0xe14		/* clear out reserved bits and M */
 	andc	r8,r5,r8		/* PP = user? (rw&dirty? 2: 3): 0 */
 BEGIN_FTR_SECTION
-	ori	r8,r8,_PAGE_COHERENT	/* set M (coherence required) */
-END_FTR_SECTION_IFSET(CPU_FTR_NEED_COHERENT)
+	rlwinm	r8,r8,0,~_PAGE_COHERENT	/* clear M (coherence not required) */
+END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 #ifdef CONFIG_PTE_64BIT
 	/* Put the XPN bits into the PTE */
 	rlwimi	r8,r10,8,20,22
Index: linux-work/arch/powerpc/include/asm/pgtable-ppc64.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/pgtable-ppc64.h	2008-11-24 14:48:55.000000000 +1100
+++ linux-work/arch/powerpc/include/asm/pgtable-ppc64.h	2008-12-15 14:38:00.000000000 +1100
@@ -245,9 +245,6 @@ static inline int pte_young(pte_t pte) {
 static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
 static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL; }
 
-static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
-static inline void pte_cache(pte_t pte)   { pte_val(pte) &= ~_PAGE_NO_CACHE; }
-
 static inline pte_t pte_wrprotect(pte_t pte) {
 	pte_val(pte) &= ~(_PAGE_RW); return pte; }
 static inline pte_t pte_mkclean(pte_t pte) {
@@ -405,16 +402,6 @@ static inline void __ptep_set_access_fla
 	__changed;							   \
 })
 
-/*
- * Macro to mark a page protection value as "uncacheable".
- */
-#define pgprot_noncached(prot)	(__pgprot(pgprot_val(prot) | _PAGE_NO_CACHE | _PAGE_GUARDED))
-
-struct file;
-extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-				     unsigned long size, pgprot_t vma_prot);
-#define __HAVE_PHYS_MEM_ACCESS_PROT
-
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(A,B)	(((pte_val(A) ^ pte_val(B)) & ~_PAGE_HPTEFLAGS) == 0)
 
Index: linux-work/arch/powerpc/include/asm/pgtable.h
===================================================================
--- linux-work.orig/arch/powerpc/include/asm/pgtable.h	2008-09-29 14:21:37.000000000 +1000
+++ linux-work/arch/powerpc/include/asm/pgtable.h	2008-12-15 14:38:00.000000000 +1100
@@ -16,6 +16,32 @@ struct mm_struct;
 #endif
 
 #ifndef __ASSEMBLY__
+
+/*
+ * Macro to mark a page protection value as "uncacheable".
+ */
+
+#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_GUARDED | _PAGE_NO_CACHE | \
+			 _PAGE_WRITETHRU)
+
+#define pgprot_noncached(prot)	  (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
+				            _PAGE_NO_CACHE | _PAGE_GUARDED))
+
+#define pgprot_noncached_wc(prot) (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
+				            _PAGE_NO_CACHE))
+
+#define pgprot_cached(prot)       (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
+				            _PAGE_COHERENT))
+
+#define pgprot_cached_wthru(prot) (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
+				            _PAGE_COHERENT | _PAGE_WRITETHRU))
+
+
+struct file;
+extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+				     unsigned long size, pgprot_t vma_prot);
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
Index: linux-work/arch/powerpc/kernel/pci-common.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/pci-common.c	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/kernel/pci-common.c	2008-12-15 14:38:00.000000000 +1100
@@ -372,13 +372,10 @@ static pgprot_t __pci_mmap_set_pgprot(st
 	}
 
 	/* XXX would be nice to have a way to ask for write-through */
-	prot |= _PAGE_NO_CACHE;
 	if (write_combine)
-		prot &= ~_PAGE_GUARDED;
+		return pgprot_noncached_wc(prot);
 	else
-		prot |= _PAGE_GUARDED;
-
-	return __pgprot(prot);
+		return pgprot_noncached(prot);
 }
 
 /*
@@ -389,19 +386,17 @@ static pgprot_t __pci_mmap_set_pgprot(st
 pgprot_t pci_phys_mem_access_prot(struct file *file,
 				  unsigned long pfn,
 				  unsigned long size,
-				  pgprot_t protection)
+				  pgprot_t prot)
 {
 	struct pci_dev *pdev = NULL;
 	struct resource *found = NULL;
-	unsigned long prot = pgprot_val(protection);
 	resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;
 	int i;
 
 	if (page_is_ram(pfn))
-		return __pgprot(prot);
-
-	prot |= _PAGE_NO_CACHE | _PAGE_GUARDED;
+		return prot;
 
+	prot = pgprot_noncached(prot);
 	for_each_pci_dev(pdev) {
 		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
 			struct resource *rp = &pdev->resource[i];
@@ -422,14 +417,14 @@ pgprot_t pci_phys_mem_access_prot(struct
 	}
 	if (found) {
 		if (found->flags & IORESOURCE_PREFETCH)
-			prot &= ~_PAGE_GUARDED;
+			prot = pgprot_noncached_wc(prot);
 		pci_dev_put(pdev);
 	}
 
 	pr_debug("PCI: Non-PCI map for %llx, prot: %lx\n",
-		 (unsigned long long)offset, prot);
+		 (unsigned long long)offset, pgprot_val(prot));
 
-	return __pgprot(prot);
+	return prot;
 }
 
 
@@ -585,8 +580,7 @@ int pci_mmap_legacy_page_range(struct pc
 	pr_debug(" -> mapping phys %llx\n", (unsigned long long)offset);
 
 	vma->vm_pgoff = offset >> PAGE_SHIFT;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       vma->vm_end - vma->vm_start,
 			       vma->vm_page_prot);
Index: linux-work/arch/powerpc/mm/mem.c
===================================================================
--- linux-work.orig/arch/powerpc/mm/mem.c	2008-12-15 14:36:20.000000000 +1100
+++ linux-work/arch/powerpc/mm/mem.c	2008-12-15 14:38:00.000000000 +1100
@@ -102,8 +102,8 @@ pgprot_t phys_mem_access_prot(struct fil
 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
 
 	if (!page_is_ram(pfn))
-		vma_prot = __pgprot(pgprot_val(vma_prot)
-				    | _PAGE_GUARDED | _PAGE_NO_CACHE);
+		vma_prot = pgprot_noncached(vma_prot);
+
 	return vma_prot;
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
Index: linux-work/arch/powerpc/platforms/cell/spufs/file.c
===================================================================
--- linux-work.orig/arch/powerpc/platforms/cell/spufs/file.c	2008-12-08 15:40:33.000000000 +1100
+++ linux-work/arch/powerpc/platforms/cell/spufs/file.c	2008-12-15 14:38:00.000000000 +1100
@@ -273,12 +273,10 @@ spufs_mem_mmap_fault(struct vm_area_stru
 		return VM_FAULT_NOPAGE;
 
 	if (ctx->state == SPU_STATE_SAVED) {
-		vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-							& ~_PAGE_NO_CACHE);
+		vma->vm_page_prot = pgprot_cached(vma->vm_page_prot);
 		pfn = vmalloc_to_pfn(ctx->csa.lscsa->ls + offset);
 	} else {
-		vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-					     | _PAGE_NO_CACHE);
+		vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
 		pfn = (ctx->spu->local_store_phys + offset) >> PAGE_SHIFT;
 	}
 	vm_insert_pfn(vma, address, pfn);
@@ -338,8 +336,7 @@ static int spufs_mem_mmap(struct file *f
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE);
+	vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_mem_mmap_vmops;
 	return 0;
@@ -452,8 +449,7 @@ static int spufs_cntl_mmap(struct file *
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_cntl_mmap_vmops;
 	return 0;
@@ -1155,8 +1151,7 @@ static int spufs_signal1_mmap(struct fil
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_signal1_mmap_vmops;
 	return 0;
@@ -1292,8 +1287,7 @@ static int spufs_signal2_mmap(struct fil
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_signal2_mmap_vmops;
 	return 0;
@@ -1414,8 +1408,7 @@ static int spufs_mss_mmap(struct file *f
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_mss_mmap_vmops;
 	return 0;
@@ -1476,8 +1469,7 @@ static int spufs_psmap_mmap(struct file 
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_psmap_mmap_vmops;
 	return 0;
@@ -1536,8 +1528,7 @@ static int spufs_mfc_mmap(struct file *f
 		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
-	vma->vm_page_prot = __pgprot(pgprot_val(vma->vm_page_prot)
-				     | _PAGE_NO_CACHE | _PAGE_GUARDED);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	vma->vm_ops = &spufs_mfc_mmap_vmops;
 	return 0;
Index: linux-work/drivers/video/controlfb.c
===================================================================
--- linux-work.orig/drivers/video/controlfb.c	2008-07-07 13:45:06.000000000 +1000
+++ linux-work/drivers/video/controlfb.c	2008-12-15 14:38:00.000000000 +1100
@@ -298,10 +298,10 @@ static int controlfb_mmap(struct fb_info
                        return -EINVAL;
                start = info->fix.mmio_start;
                len = PAGE_ALIGN((start & ~PAGE_MASK)+info->fix.mmio_len);
-               pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE|_PAGE_GUARDED;
+	       vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        } else {
                /* framebuffer */
-               pgprot_val(vma->vm_page_prot) |= _PAGE_WRITETHRU;
+	       vma->vm_page_prot = pgprot_cached_wthru(vma->vm_page_prot);
        }
        start &= PAGE_MASK;
        if ((vma->vm_end - vma->vm_start + off) > len)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 16/16] powerpc/44x: 44x TLB doesn't need "Guarded" set for all pages
  2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
                   ` (14 preceding siblings ...)
  2008-12-15  5:45 ` [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Benjamin Herrenschmidt
@ 2008-12-15  5:45 ` Benjamin Herrenschmidt
  15 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  5:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

After discussing with chip designers, it appears that it's not
necessary to set G everywhere on 440 cores. The various core
errata related to prefetch should be sorted out by firmware by
disabling icache prefetching in CCR0. We add the workaround to
the kernel however just in case oooold firmwares don't do it.

This is valid for -all- 4xx core variants. Later ones hard wire
the absence of prefetch but it doesn't harm to clear the bits
in CCR0 (they should already be cleared anyway).

We still leave G=1 on the linear mapping for now, we need to
stop over-mapping RAM to be able to remove it.
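
For reference, the workaround boils down to this (C-level sketch using
the usual mfspr/mtspr helpers; the assembly in the hunk below is what
actually runs at boot):

	/* clear the low four bits of CCR0 (icache prefetch control) */
	mtspr(SPRN_CCR0, mfspr(SPRN_CCR0) & ~0xful);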

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/kernel/head_44x.S |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- linux-work.orig/arch/powerpc/kernel/head_44x.S	2008-12-10 16:11:35.000000000 +1100
+++ linux-work/arch/powerpc/kernel/head_44x.S	2008-12-10 16:29:08.000000000 +1100
@@ -69,6 +69,17 @@ _ENTRY(_start);
 	li	r24,0		/* CPU number */
 
 /*
+ * In case the firmware didn't do it, we apply some workarounds
+ * that are good for all 440 core variants here
+ */
+	mfspr	r3,SPRN_CCR0
+	rlwinm	r3,r3,0,0,27	/* disable icache prefetch */
+	isync
+	mtspr	SPRN_CCR0,r3
+	isync
+	sync
+
+/*
  * Set up the initial MMU state
  *
  * We are still executing code at the virtual address
@@ -570,7 +581,6 @@ finish_tlb_load:
 	rlwimi	r10,r12,29,30,30		/* DIRTY -> SW position */
 	and	r11,r12,r10			/* Mask PTE bits to keep */
 	andi.	r10,r12,_PAGE_USER		/* User page ? */
-	ori	r11,r11,_PAGE_GUARDED		/* 440 errata, needs G set */
 	beq	1f				/* nope, leave U bits empty */
 	rlwimi	r11,r11,3,26,28			/* yes, copy S bits to U */
 1:	tlbwe	r11,r13,PPC44x_TLB_ATTRIB	/* Write ATTRIB */
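
For reference, the CCR0 hunk above corresponds roughly to the following
C (an illustrative sketch using the usual SPR accessors; the exact CCR0
bit layout is core-dependent):

	/* Clear the low four CCR0 bits (icache prefetch control on 440)
	 * in case firmware left prefetching enabled; rlwinm r3,r3,0,0,27
	 * keeps bits 0..27 and zeroes bits 28..31. */
	unsigned long ccr0 = mfspr(SPRN_CCR0);
	ccr0 &= ~0xfUL;
	isync();
	mtspr(SPRN_CCR0, ccr0);
	isync();
	mb();		/* the trailing "sync" in the asm above */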

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/16] powerpc/fsl-booke: Fix problem with _tlbil_va
  2008-12-15  5:44 ` [PATCH 4/16] powerpc/fsl-booke: Fix problem with _tlbil_va Benjamin Herrenschmidt
@ 2008-12-15  6:59   ` Stephen Rothwell
  2008-12-15  7:04     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Stephen Rothwell @ 2008-12-15  6:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Kumar Gala

[-- Attachment #1: Type: text/plain, Size: 366 bytes --]

Hi Ben,

On Mon, 15 Dec 2008 16:44:21 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> From: Kumar Gala <galak@kernel.crashing.org>
> 
> An example calling sequence which we did see:

This one is already in Linus' tree as of today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/16] powerpc/fsl-booke: Fix problem with _tlbil_va
  2008-12-15  6:59   ` Stephen Rothwell
@ 2008-12-15  7:04     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15  7:04 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, Kumar Gala

On Mon, 2008-12-15 at 17:59 +1100, Stephen Rothwell wrote:
> Hi Ben,
> 
> On Mon, 15 Dec 2008 16:44:21 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >
> > From: Kumar Gala <galak@kernel.crashing.org>
> > 
> > An example calling sequence which we did see:
> 
> This one is already in Linus' tree as of today.

Ah indeed, it wasn't in powerpc yet, which is why I left it in there,
since that's what my series is based on.

I expect a few of those near the top of the pile to also go separate
ways via Kumar or Josh...

I grouped them all to make the dependency chain clear and because that
way it actually builds on top of today's powerpc master :-)

Once we are past reviews etc... we can always sort out the details of
how to merge the various bits. Hopefully soon, since it's now getting
some fairly good testing by Kumar and I, and some of them have already
had -some- amount of review.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 13/16] powerpc/44x: No need to mask MSR:CE,ME or DE in _tlbil_va on 440
  2008-12-15  5:45 ` [PATCH 13/16] powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 Benjamin Herrenschmidt
@ 2008-12-15 12:25   ` Josh Boyer
  0 siblings, 0 replies; 41+ messages in thread
From: Josh Boyer @ 2008-12-15 12:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Kumar Gala

On Mon, 15 Dec 2008 16:45:05 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> The handlers for Critical, Machine Check or Debug interrupts
> will save and restore MMUCR nowadays, thus we only need to
> disable normal interrupts when invalidating TLB entries.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>

> ---
> 
>  arch/powerpc/mm/tlb_nohash_low.S |   19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> --- linux-work.orig/arch/powerpc/mm/tlb_nohash_low.S	2008-12-15 13:34:57.000000000 +1100
> +++ linux-work/arch/powerpc/mm/tlb_nohash_low.S	2008-12-15 13:35:07.000000000 +1100
> @@ -75,18 +75,19 @@ _GLOBAL(_tlbil_va)
>  	mfspr	r5,SPRN_MMUCR
>  	rlwimi	r5,r4,0,24,31			/* Set TID */
> 
> -	/* We have to run the search with interrupts disabled, even critical
> -	 * and debug interrupts (in fact the only critical exceptions we have
> -	 * are debug and machine check).  Otherwise  an interrupt which causes
> -	 * a TLB miss can clobber the MMUCR between the mtspr and the tlbsx. */
> +	/* We have to run the search with interrupts disabled, otherwise
> +	 * an interrupt which causes a TLB miss can clobber the MMUCR
> +	 * between the mtspr and the tlbsx.
> +	 *
> +	 * Critical and Machine Check interrupts take care of saving
> +	 * and restoring MMUCR, so only normal interrupts have to be
> +	 * taken care of.
> +	 */
>  	mfmsr	r4
> -	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@ha
> -	addi	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
> -	andc	r6,r4,r6
> -	mtmsr	r6
> +	wrteei	0
>  	mtspr	SPRN_MMUCR,r5
>  	tlbsx.	r3, 0, r3
> -	mtmsr	r4
> +	wrtee	r4
>  	bne	1f
>  	sync
>  	/* There are only 64 TLB entries, so r3 < 64,
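
For readers unfamiliar with the Book E instructions used here: wrteei
writes an immediate to MSR[EE] only, and wrtee copies just the EE bit
from a register, so the old mfmsr/andc/mtmsr dance is unnecessary and
CE/ME/DE are never touched. As C helpers this is roughly (illustrative
names, not part of the patch):

	static inline unsigned long booke_irq_save(void)
	{
		unsigned long msr = mfmsr();

		__asm__ __volatile__("wrteei 0" : : : "memory"); /* EE <- 0 */
		return msr;
	}

	static inline void booke_irq_restore(unsigned long msr)
	{
		/* only MSR[EE] is taken from the operand */
		__asm__ __volatile__("wrtee %0" : : "r" (msr) : "memory");
	}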

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/16] powerpc/mm: Split mmu_context handling v3
  2008-12-15  5:44 ` [PATCH 6/16] powerpc/mm: Split mmu_context handling v3 Benjamin Herrenschmidt
@ 2008-12-15 15:43   ` Arnd Bergmann
  2008-12-15 20:20     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Arnd Bergmann @ 2008-12-15 15:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Kumar Gala

On Monday 15 December 2008, Benjamin Herrenschmidt wrote:
> +/*
> + * On 32-bit PowerPC 6xx/7xx/7xxx CPUs, we use a set of 16 VSIDs
> + * (virtual segment identifiers) for each context.  Although the
> + * hardware supports 24-bit VSIDs, and thus >1 million contexts,
> + * we only use 32,768 of them.  That is ample, since there can be
> + * at most around 30,000 tasks in the system anyway, and it means
> + * that we can use a bitmap to indicate which contexts are in use.
> + * Using a bitmap means that we entirely avoid all of the problems
> + * that we used to have when the context number overflowed,
> + * particularly on SMP systems.
> + *  -- paulus.
> + */
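
(For reference, the arithmetic in the quoted comment: 24-bit VSIDs allow
2^24 / 16 = 1,048,576 contexts of 16 VSIDs each; using only 32,768 of
them keeps the allocation bitmap at 32768 bits, i.e. 4 KB.)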

Didn't we lift the limit to 30,000 tasks at some point? The comment
in linux/threads.h mentions that the PID space goes up to 4 million.
What actually happens when we increase pid_max beyond 32768 on those
systems and try to use them? Is there another limit in place?

	Arnd <><

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15  5:44 ` [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3 Benjamin Herrenschmidt
@ 2008-12-15 20:19   ` Kumar Gala
  2008-12-15 20:46     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 20:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 14, 2008, at 11:44 PM, Benjamin Herrenschmidt wrote:

> This patch moves the whole no-hash TLB handling out of line into a
> new tlb_nohash.c file, and implements some basic SMP support using
> IPIs and/or broadcast tlbivax instructions.
>
> Note that I'm using local invalidations for D->I cache coherency.
>
> At worst, if another processor is trying to execute the same and
> has the old entry in its TLB, it will just take a fault and re-do
> the TLB flush locally (it won't re-do the cache flush in any case).
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> v2. This variant fixes usage of linux/spinlock.h instead of asm/spinlock.h
> v3. Inadvertently un-EXPORT_SYMBOL'ed some cache flush calls on ppc64
> v4. Fix differences in local_* flush variants between CPU types and
>    corresponding clash with highmem code. Remove remaining _tlbie calls
>    from nohash code.
>
> arch/powerpc/include/asm/highmem.h  |    4
> arch/powerpc/include/asm/mmu.h      |    3
> arch/powerpc/include/asm/tlbflush.h |   84 ++++++--------
> arch/powerpc/kernel/misc_32.S       |    9 +
> arch/powerpc/kernel/ppc_ksyms.c     |    6 -
> arch/powerpc/mm/Makefile            |    2
> arch/powerpc/mm/fault.c             |    2
> arch/powerpc/mm/mem.c               |    2
> arch/powerpc/mm/tlb_hash32.c        |    4
> arch/powerpc/mm/tlb_nohash.c        |  209 ++++++++++++++++++++++++++++++++++++
> 10 files changed, 268 insertions(+), 57 deletions(-)
>

> Index: linux-work/arch/powerpc/mm/tlb_nohash.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-work/arch/powerpc/mm/tlb_nohash.c	2008-12-15 14:36:20.000000000 +1100
> @@ -0,0 +1,209 @@
> +/*
> + * This file contains the routines for TLB flushing.
> + * On machines where the MMU does not use a hash table to store  
> virtual to
> + * physical translations (ie, SW loaded TLBs or Book3E compilant  
> processors,
> + * this does -not- include 603 however which shares the  
> implementation with
> + * hash based processors)
> + *
> + *  -- BenH
> + *
> + * Copyright 2008 Ben Herrenschmidt <benh@kernel.crashing.org>
> + *                IBM Corp.
> + *
> + *  Derived from arch/ppc/mm/init.c:
> + *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
> + *
> + *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
> + *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
> + *    Copyright (C) 1996 Paul Mackerras
> + *
> + *  Derived from "arch/i386/mm/init.c"
> + *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
> + *
> + *  This program is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU General Public License
> + *  as published by the Free Software Foundation; either version
> + *  2 of the License, or (at your option) any later version.
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/init.h>
> +#include <linux/highmem.h>
> +#include <linux/pagemap.h>
> +#include <linux/preempt.h>
> +#include <linux/spinlock.h>
> +
> +#include <asm/tlbflush.h>
> +#include <asm/tlb.h>
> +
> +#include "mmu_decl.h"
> +
> +/*
> + * Basse TLB flushing operations:

One 's'

>
> + *
> + *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
> + *  - flush_tlb_page(vma, vmaddr) flushes one page
> + *  - flush_tlb_range(vma, start, end) flushes a range of pages
> + *  - flush_tlb_kernel_range(start, end) flushes kernel pages
> + *
> + *  - local_* variants of page and mm only apply to the current
> + *    processor
> + */
> +
> +/*
> + * These are the base non-SMP variants of page and mm flushing
> + */
> +void local_flush_tlb_mm(struct mm_struct *mm)
> +{
> +	unsigned int pid;
> +
> +	preempt_disable();
> +	pid = mm->context.id;
> +	if (pid != MMU_NO_CONTEXT)
> +		_tlbil_pid(pid);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(local_flush_tlb_mm);

Do these really get called w/MMU_NO_CONTEXT?  What is the calling code  
trying to flush under those situations?

> +
> +void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> +{
> +	unsigned int pid;
> +
> +	preempt_disable();
> +	pid = vma ? vma->vm_mm->context.id : 0;
> +	if (pid != MMU_NO_CONTEXT)
> +		_tlbil_va(vmaddr, pid);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(local_flush_tlb_page);
> +
> +
> +/*
> + * And here are the SMP non-local implementations
> + */
> +#ifdef CONFIG_SMP
> +
> +static DEFINE_SPINLOCK(tlbivax_lock);
> +
> +struct tlb_flush_param {
> +	unsigned long addr;
> +	unsigned int pid;
> +};
> +
> +static void do_flush_tlb_mm_ipi(void *param)
> +{
> +	struct tlb_flush_param *p = param;
> +
> +	_tlbil_pid(p ? p->pid : 0);
> +}
> +
> +static void do_flush_tlb_page_ipi(void *param)
> +{
> +	struct tlb_flush_param *p = param;
> +
> +	_tlbil_va(p->addr, p->pid);
> +}
> +
> +
> +/* Note on invalidations and PID:
> + *
> + * We snapshot the PID with preempt disabled. At this point, it can still
> + * change either because:
> + * - our context is being stolen (PID -> NO_CONTEXT) on another CPU
> + * - we are invalidating some target that isn't currently running here
> + *   and is concurrently acquiring a new PID on another CPU
> + * - some other CPU is re-acquiring a lost PID for this mm
> + * etc...
> + *
> + * However, this shouldn't be a problem as we only guarantee
> + * invalidation of TLB entries present prior to this call, so we
> + * don't care about the PID changing, and invalidating a stale PID
> + * is generally harmless.
> + */
> +
> +void flush_tlb_mm(struct mm_struct *mm)
> +{
> +	cpumask_t cpu_mask;
> +	unsigned int pid;
> +
> +	preempt_disable();
> +	pid = mm->context.id;
> +	if (unlikely(pid == MMU_NO_CONTEXT))
> +		goto no_context;
> +	cpu_mask = mm->cpu_vm_mask;
> +	cpu_clear(smp_processor_id(), cpu_mask);
> +	if (!cpus_empty(cpu_mask)) {
> +		struct tlb_flush_param p = { .pid = pid };
> +		smp_call_function_mask(cpu_mask, do_flush_tlb_mm_ipi, &p, 1);
> +	}
> +	_tlbil_pid(pid);
> + no_context:
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(flush_tlb_mm);
> +
> +void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> +{
> +	cpumask_t cpu_mask;
> +	unsigned int pid;
> +
> +	preempt_disable();
> +	pid = vma ? vma->vm_mm->context.id : 0;
> +	if (unlikely(pid == MMU_NO_CONTEXT))
> +		goto bail;
> +	cpu_mask = vma->vm_mm->cpu_vm_mask;
> +	cpu_clear(smp_processor_id(), cpu_mask);
> +	if (!cpus_empty(cpu_mask)) {
> +		/* If broadcast tlbivax is supported, use it */
> +		if (mmu_has_feature(MMU_FTR_HAS_TLBIVAX_BCAST)) {
> +			int lock = mmu_has_feature(MMU_FTR_TLBIVAX_NEED_LOCK);
> +			if (lock)
> +				spin_lock(&tlbivax_lock);
> +			_tlbivax_bcast(vmaddr, pid);
> +			if (lock)
> +				spin_unlock(&tlbivax_lock);
> +			goto bail;
> +		} else {
> +			struct tlb_flush_param p = { .pid = pid, .addr = vmaddr };
> +			smp_call_function_mask(cpu_mask,
> +					       do_flush_tlb_page_ipi, &p, 1);
> +		}
> +	}
> +	_tlbil_va(vmaddr, pid);
> + bail:
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(flush_tlb_page);
> +
> +#endif /* CONFIG_SMP */
> +
> +/*
> + * Flush kernel TLB entries in the given range
> + */
> +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> +{
> +#ifdef CONFIG_SMP
> +	preempt_disable();
> +	smp_call_function(do_flush_tlb_mm_ipi, NULL, 1);
> +	_tlbil_pid(0);
> +	preempt_enable();
> +#else
> +	_tlbil_pid(0);
> +#endif
> +}
> +EXPORT_SYMBOL(flush_tlb_kernel_range);
> +
> +/*
> + * Currently, for range flushing, we just do a full mm flush. This should
> + * be optimized based on a threshold on the size of the range, since
> + * some implementations can stack multiple tlbivax before a tlbsync but
> + * for now, we keep it that way
> + */
> +void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> +		     unsigned long end)
> +
> +{
> +	flush_tlb_mm(vma->vm_mm);
> +}
> +EXPORT_SYMBOL(flush_tlb_range);

[snip]

> Index: linux-work/arch/powerpc/include/asm/mmu.h
> ===================================================================
> --- linux-work.orig/arch/powerpc/include/asm/mmu.h	2008-12-15 14:36:20.000000000 +1100
> +++ linux-work/arch/powerpc/include/asm/mmu.h	2008-12-15 14:36:20.000000000 +1100
> @@ -15,6 +15,9 @@
> #define MMU_FTR_TYPE_FSL_E		ASM_CONST(0x00000010)
> #define MMU_FTR_HAS_HIGH_BATS		ASM_CONST(0x00010000)
> #define MMU_FTR_BIG_PHYS		ASM_CONST(0x00020000)
> +#define MMU_FTR_HAS_TLBIVAX_BCAST	ASM_CONST(0x00040000)
> +#define MMU_FTR_HAS_TLBILX_PID		ASM_CONST(0x00080000)

Can we make these FTR_USE_ instead of FTR_HAS_?  On e500 we have
TLBIVAX_BCAST but don't plan to use it.  I'd prefer not to have to
answer questions about that.

> +#define MMU_FTR_TLBIVAX_NEED_LOCK	ASM_CONST(0x00100000)

Is this really ivax lock or sync lock?

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/16] powerpc/mm: Split mmu_context handling v3
  2008-12-15 15:43   ` Arnd Bergmann
@ 2008-12-15 20:20     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 20:20 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linuxppc-dev, Kumar Gala

On Mon, 2008-12-15 at 16:43 +0100, Arnd Bergmann wrote:
> Didn't we lift the limit to 30,000 tasks at some point? The comment
> in linux/threads.h mentions that the PID space goes up to 4 million.
> What actually happens when we increase pid_max beyond 32768 on those
> systems and try to use them? Is there another limit in place?

Well, the mm context allocator for hash based CPUs cannot hand out more
than 32767 contexts. I suspect it's just going to spin in
init_new_context. I could make it fail instead...
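
The "make it fail" variant would be something like this (a sketch using
the hash32 context allocator's names, not an actual patch):

	int init_new_context(struct task_struct *t, struct mm_struct *mm)
	{
		unsigned long id;

		id = find_first_zero_bit(context_map, LAST_CONTEXT + 1);
		if (id > LAST_CONTEXT)
			return -ENOMEM;	/* fork() fails cleanly instead */
		__set_bit(id, context_map);
		mm->context.id = id;
		return 0;
	}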

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/16] powerpc/mm: Add local_flush_tlb_mm() to SW loaded TLB implementations
  2008-12-15  5:44 ` [PATCH 5/16] powerpc/mm: Add local_flush_tlb_mm() to SW loaded TLB implementations Benjamin Herrenschmidt
@ 2008-12-15 20:30   ` Kumar Gala
  0 siblings, 0 replies; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 20:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 14, 2008, at 11:44 PM, Benjamin Herrenschmidt wrote:

> This adds a local_flush_tlb_mm() call as a pre-requisite for some
> SMP work for BookE processors
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> arch/powerpc/include/asm/tlbflush.h |    5 +++++
> 1 file changed, 5 insertions(+)

Acked-by: Kumar Gala <galak@kernel.crashing.org>

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 8/16] powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c
  2008-12-15  5:44 ` [PATCH 8/16] powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c Benjamin Herrenschmidt
@ 2008-12-15 20:36   ` Kumar Gala
  2008-12-15 20:46     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 20:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 14, 2008, at 11:44 PM, Benjamin Herrenschmidt wrote:

> This renames the files to clarify the fact that they are used by
> the hash based family of CPUs (the 603 being an exception in that
> family but is still handled by that code).
>
> This paves the way for the new tlb_nohash.c coming via a subsequent
> patch.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> arch/powerpc/mm/Makefile     |    2
> arch/powerpc/mm/tlb_32.c     |  190 --------------------------------------
> arch/powerpc/mm/tlb_64.c     |  211 -------------------------------------------
> arch/powerpc/mm/tlb_hash32.c |  190 ++++++++++++++++++++++++++++++++++++++
> arch/powerpc/mm/tlb_hash64.c |  211 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 402 insertions(+), 402 deletions(-)


Acked-by: Kumar Gala <galak@kernel.crashing.org>

(I'm told git-format-patch -M is useful for such patches)

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 20:19   ` Kumar Gala
@ 2008-12-15 20:46     ` Benjamin Herrenschmidt
  2008-12-15 20:57       ` Kumar Gala
  0 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 20:46 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev


> > +void local_flush_tlb_mm(struct mm_struct *mm)
> > +{
> > +	unsigned int pid;
> > +
> > +	preempt_disable();
> > +	pid = mm->context.id;
> > +	if (pid != MMU_NO_CONTEXT)
> > +		_tlbil_pid(pid);
> > +	preempt_enable();
> > +}
> > +EXPORT_SYMBOL(local_flush_tlb_mm);
> 
> Do these really get called w/MMU_NO_CONTEXT?  What is the calling code  
> trying to flush under those situations?

A bit of paranoia, but yes, I think they can. flush_* can be called on a
non-current mm, thus an mm without a context (mostly because it's been
stolen), due to calls to unmap_mapping_range() or something like that,
which can be called by some filesystems (especially network filesystems
trying to reflect remote changes on mmap'ed regions, I think). Under some
circumstances, ptrace can also cause flushes of non-current mm's.
 
> 
> > Index: linux-work/arch/powerpc/include/asm/mmu.h
> > ===================================================================
> > --- linux-work.orig/arch/powerpc/include/asm/mmu.h	2008-12-15 14:36:20.000000000 +1100
> > +++ linux-work/arch/powerpc/include/asm/mmu.h	2008-12-15 14:36:20.000000000 +1100
> > @@ -15,6 +15,9 @@
> > #define MMU_FTR_TYPE_FSL_E		ASM_CONST(0x00000010)
> > #define MMU_FTR_HAS_HIGH_BATS		ASM_CONST(0x00010000)
> > #define MMU_FTR_BIG_PHYS		ASM_CONST(0x00020000)
> > +#define MMU_FTR_HAS_TLBIVAX_BCAST	ASM_CONST(0x00040000)
> > +#define MMU_FTR_HAS_TLBILX_PID		ASM_CONST(0x00080000)
> 
> Can we make these FTR_USE_ instead of FTR_HAS_.  On e500 we have  
> TLBIVAX_BCAST but dont plan to use it.  I'd prefer not to have to  
> answer questions about that.

Hehehe :-) I can change that easily yes.

> > +#define MMU_FTR_TLBIVAX_NEED_LOCK	ASM_CONST(0x00100000)
> 
> Is this really ivax lock or sync lock?

The whole thing. Not totally clear, do you have a better name? Some CPUs
want a lock on sync and some on ivax; I plan to lock the whole sequence.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 8/16] powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c
  2008-12-15 20:36   ` Kumar Gala
@ 2008-12-15 20:46     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 20:46 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

On Mon, 2008-12-15 at 14:36 -0600, Kumar Gala wrote:
> 
> Acked-by: Kumar Gala <galak@kernel.crashing.org>
> 
> (I'm told git-format-patch -M is useful for such patches)

I'm still using quilt for my own development.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
  2008-12-15  5:45 ` [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Benjamin Herrenschmidt
@ 2008-12-15 20:54   ` Kumar Gala
  2008-12-15 21:01     ` Benjamin Herrenschmidt
  2008-12-15 21:03     ` Michael Ellerman
  0 siblings, 2 replies; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 20:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

>
> --- linux-work.orig/arch/powerpc/include/asm/pgtable-ppc32.h	2008-11-24 14:48:55.000000000 +1100
> +++ linux-work/arch/powerpc/include/asm/pgtable-ppc32.h	2008-12-15 15:34:16.000000000 +1100

[snip]

>
> @@ -436,20 +437,23 @@ extern int icache_44x_need_flush;
> 			 _PAGE_USER | _PAGE_ACCESSED | \
> 			 _PAGE_RW | _PAGE_HWWRITE | _PAGE_DIRTY | \
> 			 _PAGE_EXEC | _PAGE_HWEXEC)
> +
> /*
> - * Note: the _PAGE_COHERENT bit automatically gets set in the hardware
> - * PTE if CONFIG_SMP is defined (hash_page does this); there is no need
> - * to have it in the Linux PTE, and in fact the bit could be reused for
> - * another purpose.  -- paulus.
> + * We define 2 sets of base prot bits, one for basic pages (ie,
> + * cacheable kernel and user pages) and one for non cacheable
> + * pages. We always set _PAGE_COHERENT when SMP is enabled or
> + * the processor might need it for DMA coherency.
>  */
> -
> -#ifdef CONFIG_44x
> -#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_GUARDED)
> +#if defined(CONFIG_SMP) || defined(CONFIG_PPC_STD_MMU)
> +#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT)
> #else
> #define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED)
> #endif
> +#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED)
> +
> #define _PAGE_WRENABLE	(_PAGE_RW | _PAGE_DIRTY | _PAGE_HWWRITE)
> #define _PAGE_KERNEL	(_PAGE_BASE | _PAGE_SHARED | _PAGE_WRENABLE)
> +#define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED | _PAGE_WRENABLE | _PAGE_NO_CACHE)

Either _BASE_NC should have _PAGE_NO_CACHE set or you need a different  
name here for _PAGE_KERNEL_NC

> #ifdef CONFIG_PPC_STD_MMU
> /* On standard PPC MMU, no user access implies kernel read/write access,
> @@ -459,7 +463,7 @@ extern int icache_44x_need_flush;
> #define _PAGE_KERNEL_RO	(_PAGE_BASE | _PAGE_SHARED)
> #endif
>
> -#define _PAGE_IO	(_PAGE_KERNEL | _PAGE_NO_CACHE | _PAGE_GUARDED)
> +#define _PAGE_IO	(_PAGE_KERNEL_NC | _PAGE_GUARDED)
> #define _PAGE_RAM	(_PAGE_KERNEL | _PAGE_HWEXEC)

I think we should do:

#define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED | _PAGE_WRENABLE)
#define _PAGE_IO	(_PAGE_KERNEL_NC | _PAGE_NO_CACHE | _PAGE_GUARDED)

> Index: linux-work/arch/powerpc/include/asm/pgtable.h
> ===================================================================
> --- linux-work.orig/arch/powerpc/include/asm/pgtable.h	2008-09-29 14:21:37.000000000 +1000
> +++ linux-work/arch/powerpc/include/asm/pgtable.h	2008-12-15 14:38:00.000000000 +1100
> @@ -16,6 +16,32 @@ struct mm_struct;
> #endif
>
> #ifndef __ASSEMBLY__
> +
> +/*
> + * Macro to mark a page protection value as "uncacheable".
> + */
> +
> +#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_COHERENT | _PAGE_COHERENT | \
> + 			 _PAGE_WRITETHRU)

we like coherent so much we set it thrice?

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 20:46     ` Benjamin Herrenschmidt
@ 2008-12-15 20:57       ` Kumar Gala
  2008-12-15 21:03         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 20:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 15, 2008, at 2:46 PM, Benjamin Herrenschmidt wrote:

>>>
>>> Index: linux-work/arch/powerpc/include/asm/mmu.h
>>> ===================================================================
>>> --- linux-work.orig/arch/powerpc/include/asm/mmu.h	2008-12-15
>>> 14:36:20.000000000 +1100
>>> +++ linux-work/arch/powerpc/include/asm/mmu.h	2008-12-15
>>> 14:36:20.000000000 +1100
>>> @@ -15,6 +15,9 @@
>>> #define MMU_FTR_TYPE_FSL_E		ASM_CONST(0x00000010)
>>> #define MMU_FTR_HAS_HIGH_BATS		ASM_CONST(0x00010000)
>>> #define MMU_FTR_BIG_PHYS		ASM_CONST(0x00020000)
>>> +#define MMU_FTR_HAS_TLBIVAX_BCAST	ASM_CONST(0x00040000)
>>> +#define MMU_FTR_HAS_TLBILX_PID		ASM_CONST(0x00080000)
>>
>> Can we make these FTR_USE_ instead of FTR_HAS_?  On e500 we have
>> TLBIVAX_BCAST but don't plan to use it.  I'd prefer not to have to
>> answer questions about that.
>
> Hehehe :-) I can change that easily yes.

Probably good to add something in the commit message about how
FTR_USE_ implies FTR_HAS_, just so when someone comes back through the
git history they know we thought about it.

>>> +#define MMU_FTR_TLBIVAX_NEED_LOCK	ASM_CONST(0x00100000)
>>
>> Is this really ivax lock or sync lock?
>
> The whole thing. Not totally clear, do you have a better name? Some CPUs
> want a lock on sync and some on ivax; I plan to lock the whole
> sequence.

MMU_FTR_TLBIVAX_OR_SYNC_NEED_LOCK ?

It's probably a good idea to have a clear definition of what each of
these flags means in the commit message.

-k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
  2008-12-15 20:54   ` Kumar Gala
@ 2008-12-15 21:01     ` Benjamin Herrenschmidt
  2008-12-15 21:08       ` Kumar Gala
  2008-12-15 21:03     ` Michael Ellerman
  1 sibling, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 21:01 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev


> > -#ifdef CONFIG_44x
> > -#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_GUARDED)
> > +#if defined(CONFIG_SMP) || defined(CONFIG_PPC_STD_MMU)
> > +#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT)
> > #else
> > #define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED)
> > #endif
> > +#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED)
> > +
> > #define _PAGE_WRENABLE	(_PAGE_RW | _PAGE_DIRTY | _PAGE_HWWRITE)
> > #define _PAGE_KERNEL	(_PAGE_BASE | _PAGE_SHARED | _PAGE_WRENABLE)
> > +#define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED |  
> > _PAGE_WRENABLE | _PAGE_NO_CACHE)
> 
> Either _BASE_NC should have _PAGE_NO_CACHE set or you need a different  
> name here for _PAGE_KERNEL_NC

Not sure what you mean.. _PAGE_KERNEL_NC has no cache in it, and
_BASE_NC doesn't ... oh well.. because it's the base type used by
KERNEL_NC :-) I agree it's not the clearest, I can just move
_PAGE_NO_CACHE to _PAGE_BASE_NC, that will make it clearer I suppose,
but I don't see anything being actually incorrect, or do I miss
something ?

> I think we should do:
> 
> #define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED | _PAGE_WRENABLE)
> #define _PAGE_IO	(_PAGE_KERNEL_NC | _PAGE_NO_CACHE | _PAGE_GUARDED)

I don't understand.... _PAGE_KERNEL_NC is supposedly non cacheable, I
should probably move _PAGE_NO_CACHE to _PAGE_BASE_NC...

> > +#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_COHERENT |  
> > _PAGE_COHERENT | \
> > + 			 _PAGE_WRITETHRU)
> 
> we like coherent so much we set it thrice?

Nice :-) Yeah, it should be _COHERENT, _GUARDED, _NO_CACHE and
_WRITETHRU, I'll fix that.
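
i.e. the corrected mask would read:

	#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_GUARDED | \
				 _PAGE_NO_CACHE | _PAGE_WRITETHRU)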

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 20:57       ` Kumar Gala
@ 2008-12-15 21:03         ` Benjamin Herrenschmidt
  2008-12-15 21:10           ` Kumar Gala
  0 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 21:03 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev


> > The whole thing. Not totally clear, do you have a better name? Some CPUs
> > want a lock on sync and some on ivax; I plan to lock the whole
> > sequence.
> 
> MMU_FTR_TLBIVAX_OR_SYNC_NEED_LOCK ?

Which completely blows away the nice tab'ing :-)

MMU_FTR_LOCK_BCAST_TLB_OPS ?

>> It's probably a good idea to have a clear definition of what each of
>> these flags means in the commit message.

No, I'd rather have that in a comment in the code.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
  2008-12-15 20:54   ` Kumar Gala
  2008-12-15 21:01     ` Benjamin Herrenschmidt
@ 2008-12-15 21:03     ` Michael Ellerman
  2008-12-15 21:05       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 41+ messages in thread
From: Michael Ellerman @ 2008-12-15 21:03 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

On Mon, 2008-12-15 at 14:54 -0600, Kumar Gala wrote:
> >
> > #ifndef __ASSEMBLY__
> > +
> > +/*
> > + * Macro to mark a page protection value as "uncacheable".
> > + */
> > +
> > +#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_COHERENT | _PAGE_COHERENT | \
> > + 			 _PAGE_WRITETHRU)
> 
> we like coherent so much we set it thrice?

That makes it really-really-really coherent.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
  2008-12-15 21:03     ` Michael Ellerman
@ 2008-12-15 21:05       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 21:05 UTC (permalink / raw)
  To: michael; +Cc: linuxppc-dev, Kumar Gala

On Tue, 2008-12-16 at 08:03 +1100, Michael Ellerman wrote:
> On Mon, 2008-12-15 at 14:54 -0600, Kumar Gala wrote:
> > >
> > > #ifndef __ASSEMBLY__
> > > +
> > > +/*
> > > + * Macro to mark a page protection value as "uncacheable".
> > > + */
> > > +
> > > +#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_COHERENT | _PAGE_COHERENT | \
> > > + 			 _PAGE_WRITETHRU)
> > 
> > we like coherent so much we set it thrice?
> 
> That makes it really-really-really coherent.

Actually the mask is used to remove those bits so not quite :-)

Should be all WIMG, I'll send a fix.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
  2008-12-15 21:01     ` Benjamin Herrenschmidt
@ 2008-12-15 21:08       ` Kumar Gala
  0 siblings, 0 replies; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 21:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 15, 2008, at 3:01 PM, Benjamin Herrenschmidt wrote:

>
>>> -#ifdef CONFIG_44x
>>> -#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_GUARDED)
>>> +#if defined(CONFIG_SMP) || defined(CONFIG_PPC_STD_MMU)
>>> +#define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED |  
>>> _PAGE_COHERENT)
>>> #else
>>> #define _PAGE_BASE	(_PAGE_PRESENT | _PAGE_ACCESSED)
>>> #endif
>>> +#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED)
>>> +
>>> #define _PAGE_WRENABLE	(_PAGE_RW | _PAGE_DIRTY | _PAGE_HWWRITE)
>>> #define _PAGE_KERNEL	(_PAGE_BASE | _PAGE_SHARED | _PAGE_WRENABLE)
>>> +#define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED |
>>> _PAGE_WRENABLE | _PAGE_NO_CACHE)
>>
>> Either _BASE_NC should have _PAGE_NO_CACHE set or you need a  
>> different
>> name here for _PAGE_KERNEL_NC
>
> Not sure what you mean.. _PAGE_KERNEL_NC has no cache in it, and
> _BASE_NC doesn't ... oh well.. because it's the base type used by
> KERNEL_NC :-) I agree it's not the clearest, I can just move
> _PAGE_NO_CACHE to _PAGE_BASE_NC, that will make it clearer I suppose,
> but I don't see anything being actually incorrect, or do I miss
> something ?
>
>> I think we should do:
>>
>> #define _PAGE_KERNEL_NC	(_PAGE_BASE_NC | _PAGE_SHARED |  
>> _PAGE_WRENABLE)
>> #define _PAGE_IO	(_PAGE_KERNEL_NC | _PAGE_NO_CACHE | _PAGE_GUARDED)
>
> I don't understand.... _PAGE_KERNEL_NC is supposedly non cacheable, I
> should probably move _PAGE_NO_CACHE to _PAGE_BASE_NC...

I just want _NC to mean the same thing.  As you say, just set  
_PAGE_NO_CACHE in _PAGE_BASE_NC and we should be good.

>>> +#define _PAGE_CACHE_CTL	(_PAGE_COHERENT | _PAGE_COHERENT |
>>> _PAGE_COHERENT | \
>>> + 			 _PAGE_WRITETHRU)
>>
>> we like coherent so much we set it thrice?
>
> Nice :-) Yeah, it should be _COHERENT, _GUARDED, _NO_CACHE and
> _WRITETHRU, I'll fix that.

was guessing as much.

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 21:03         ` Benjamin Herrenschmidt
@ 2008-12-15 21:10           ` Kumar Gala
  2008-12-15 21:18             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 21:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 15, 2008, at 3:03 PM, Benjamin Herrenschmidt wrote:

>
>>> The whole thing. Not totally clear, do you have a better name? Some
>>> CPUs
>>> want a lock on sync and some on ivax; I plan to lock the whole
>>> sequence.
>>
>> MMU_FTR_TLBIVAX_OR_SYNC_NEED_LOCK ?
>
> Which completely blows away the nice tab'ing :-)
>
> MMU_FTR_LOCK_BCAST_TLB_OPS ?

Hmm.. are you mixing the two different locking needs together?  There is
locking of ivax vs tlbwe, and there is locking around multiple "msgs"
on the bus.  I know for us we can have any # of ivax's on the bus, but
only one tlbsync.

>> Its probably a good idea to have a clear definition of what each of
>> these flags means in the commit message.
>
> No, I'd rather have that in a comment in the code.

that's fine w/me.

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 21:10           ` Kumar Gala
@ 2008-12-15 21:18             ` Benjamin Herrenschmidt
  2008-12-15 22:19               ` Kumar Gala
  0 siblings, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 21:18 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

On Mon, 2008-12-15 at 15:10 -0600, Kumar Gala wrote:

> Hmm.. are you mixing the two different locking needs together?  There is
> locking of ivax vs tlbwe, and there is locking around multiple "msgs"
> on the bus.  I know for us we can have any # of ivax's on the bus, but
> only one tlbsync.

I'm purely talking about the latter. Right now I only issue one ivax +
one tlbsync anyway, but I was thinking about having _tlbivax_bcast take a
count for small ranges. I would still lock the whole thing, though,
because some implementations I know of don't like multiple ivax colliding
from different sources either.
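
Sketched out, that idea might look like this (hypothetical helper, not
in this series):

	if (mmu_has_feature(MMU_FTR_TLBIVAX_NEED_LOCK))
		spin_lock(&tlbivax_lock);	/* one lock around the lot */
	/* hypothetical: issue up to "count" tlbivax, then one tlbsync */
	_tlbivax_bcast_range(vmaddr, count, pid);
	if (mmu_has_feature(MMU_FTR_TLBIVAX_NEED_LOCK))
		spin_unlock(&tlbivax_lock);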

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 21:18             ` Benjamin Herrenschmidt
@ 2008-12-15 22:19               ` Kumar Gala
  2008-12-15 23:31                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Kumar Gala @ 2008-12-15 22:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 15, 2008, at 3:18 PM, Benjamin Herrenschmidt wrote:

> On Mon, 2008-12-15 at 15:10 -0600, Kumar Gala wrote:
>
>> Hmm.. are you mixing the two different locking needs together?  There
>> is
>> locking of ivax vs tlbwe, and there is locking around multiple "msgs"
>> on the bus.  I know for us we can have any # of ivax's on the bus,
>> but
>> only one tlbsync.
>
> I'm purely talking about the latter. Right now I only issue one ivax +
> one tlbsync anyway, but I was thinking about having _tlbivax_bcast
> take a
> count for small ranges. I would still lock the whole thing, though,
> because some implementations I know of don't like multiple ivax
> colliding from different sources either.

Ok.  Let's use MMU_FTR_LOCK_BCAST_TLB_OPS and have a comment about
locking, because bus implementations can't handle multiple ivax and/or
multiple syncs.

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3
  2008-12-15 22:19               ` Kumar Gala
@ 2008-12-15 23:31                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2008-12-15 23:31 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

On Mon, 2008-12-15 at 16:19 -0600, Kumar Gala wrote:
> Ok.  Let's use MMU_FTR_LOCK_BCAST_TLB_OPS and have a comment about
> locking, because bus implementations can't handle multiple ivax and/or
> multiple syncs.
> 
I used MMU_FTR_LOCK_BCAST_INVAL :-) And I put in a comment that says:

/* This indicates that the processor cannot handle multiple outstanding
 * broadcast tlbivax or tlbsync. This makes the code use a spinlock
 * around such invalidate forms.
 */

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/16] powerpc/4xx: Extended DCR support v2
  2008-12-15  5:44 ` [PATCH 3/16] powerpc/4xx: Extended DCR support v2 Benjamin Herrenschmidt
@ 2008-12-17 17:33   ` Josh Boyer
  0 siblings, 0 replies; 41+ messages in thread
From: Josh Boyer @ 2008-12-17 17:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Kumar Gala

On Mon, Dec 15, 2008 at 04:44:17PM +1100, Benjamin Herrenschmidt wrote:
>This adds support for the "extended" DCR addressing via
>the indirect mfdcrx/mtdcrx instructions supported by some
>4xx cores (440H6 and later)
>
>I enabled the feature for now only on AMCC 460 chips
>
>Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
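
For context, the indirect forms take the DCR number in a register
instead of encoding it in the instruction, so accessors reduce to
something like (a sketch with gcc inline asm):

	static inline unsigned int mfdcrx(unsigned int dcrn)
	{
		unsigned int val;

		__asm__ __volatile__("mfdcrx %0,%1" : "=r" (val) : "r" (dcrn));
		return val;
	}

	static inline void mtdcrx(unsigned int dcrn, unsigned int val)
	{
		__asm__ __volatile__("mtdcrx %0,%1" : : "r" (dcrn), "r" (val));
	}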

Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>

I actually ran some tests on this patch on a Bamboo board that
lacks the new instructions.  While tbench might not be great here,
it's the thing I had handy and it does drive some DCR access given
that every interrupt uses DCR instructions to handle the UIC bits
(and some of the MAL stuff I'm assuming).

The results with and without the patch were pretty close.
Differences were in the noise range.  I wanted to test on
Canyonlands, which does have the new instructions, but mine
appears to be DOA.

So in summary, it doesn't make things worse :).

josh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/16] powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs v2
  2008-12-15  5:45 ` [PATCH 14/16] powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs v2 Benjamin Herrenschmidt
@ 2008-12-17 21:21   ` Kumar Gala
  0 siblings, 0 replies; 41+ messages in thread
From: Kumar Gala @ 2008-12-17 21:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

> /*
>  * Initialize the context management stuff.
> @@ -326,13 +344,56 @@ void __init mmu_context_init(void)
> 	init_mm.context.active = NR_CPUS;
>
> 	/*
> +	 *   The MPC8xx has only 16 contexts.  We rotate through them on each
> +	 * task switch.  A better way would be to keep track of tasks that
> +	 * own contexts, and implement an LRU usage.  That way very active
> +	 * tasks don't always have to pay the TLB reload overhead.  The
> +	 * kernel pages are mapped shared, so the kernel can run on behalf
> +	 * of any task that makes a kernel entry.  Shared does not mean they
> +	 *  are not protected, just that the ASID comparison is not performed.
> +	 *      -- Dan
> +	 *
> +	 * The IBM4xx has 256 contexts, so we can just rotate through these
> +	 * as a way of "switching" contexts.  If the TID of the TLB is zero,
> +	 * the PID/TID comparison is disabled, so we can use a TID of zero
> +	 * to represent all kernel pages as shared among all contexts.

Can we expand the comment or change it to say that all other nonhash
parts (40x, 4xx, fsl-booke) right now have 256 contexts?

>
> +	 * 	-- Dan
> +	 */
> +	if (mmu_has_feature(MMU_FTR_TYPE_8xx)) {
> +		first_context = 0;
> +		last_context = 15;
> +	} else {
> +		first_context = 1;
> +		last_context = 255;
> +	}
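
With first/last_context picked at boot like this, the context bitmap can
then be sized at runtime instead of being a fixed static array; roughly
(a sketch, helper names assumed):

	/* one bit per context, rounded up to whole longs */
	#define CTX_MAP_SIZE	(sizeof(long) * (last_context / BITS_PER_LONG + 1))

	context_map = alloc_bootmem(CTX_MAP_SIZE);
	context_mm  = alloc_bootmem(sizeof(void *) * (last_context + 1));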

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 7/16] powerpc/mm: Rework context management for CPUs with no hash table v2
  2008-12-15  5:44 ` [PATCH 7/16] powerpc/mm: Rework context management for CPUs with no hash table v2 Benjamin Herrenschmidt
@ 2008-12-17 21:30   ` Kumar Gala
  0 siblings, 0 replies; 41+ messages in thread
From: Kumar Gala @ 2008-12-17 21:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev


On Dec 14, 2008, at 11:44 PM, Benjamin Herrenschmidt wrote:

> This reworks the context management code used by 4xx,8xx and
> freescale BookE. It adds support for SMP by implementing a
> concept of stale context map to lazily flush the TLB on
> processors where a context may have been invalidated. This
> also contains the ground work for generalizing such lazy TLB
> flushing by just picking up a new PID and marking the old one
> stale. This will be implemented later.
>
> This is a first implementation that uses a global spinlock.
>
> Ideally, we should try to get at least the fast path (context ID
> already assigned) lockless or limited to a per context lock,
> but for now this will do.
>
> I tried to keep the UP case reasonably simple to avoid adding
> too much overhead to 8xx which does a lot of context stealing
> since it effectively has only 16 PIDs available.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> v2. remove some bugs with active tracking on SMP
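
The stale-map idea in the quoted changelog boils down to something like
this on the context-switch path (a sketch with assumed names, not the
patch itself):

	/* If this CPU saw the incoming context invalidated while it
	 * wasn't running it, flush the PID locally before reuse. */
	if (test_bit(id, stale_map[cpu])) {
		local_flush_tlb_mm(next);
		__clear_bit(id, stale_map[cpu]);
	}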

I'd personally like a bit more commentary on how the stale map  
addresses the SMP issues in the commit message.

Also, Paul had a comment that we've kept around related to 8xx/4xx SMP  
as well as LRU.. is that still relevant?

- k

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread

Thread overview: 41+ messages
2008-12-15  5:43 [PATCH 0/16] powerpc: Preliminary work to enable SMP BookE (v2) Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 1/16] powerpc: Fix bogus cache flushing on all 40x and BookE processors v2 Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 2/16] powerpc: Fix asm EMIT_BUG_ENTRY with !CONFIG_BUG Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 3/16] powerpc/4xx: Extended DCR support v2 Benjamin Herrenschmidt
2008-12-17 17:33   ` Josh Boyer
2008-12-15  5:44 ` [PATCH 4/16] powerpc/fsl-booke: Fix problem with _tlbil_va Benjamin Herrenschmidt
2008-12-15  6:59   ` Stephen Rothwell
2008-12-15  7:04     ` Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 5/16] powerpc/mm: Add local_flush_tlb_mm() to SW loaded TLB implementations Benjamin Herrenschmidt
2008-12-15 20:30   ` Kumar Gala
2008-12-15  5:44 ` [PATCH 6/16] powerpc/mm: Split mmu_context handling v3 Benjamin Herrenschmidt
2008-12-15 15:43   ` Arnd Bergmann
2008-12-15 20:20     ` Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 7/16] powerpc/mm: Rework context management for CPUs with no hash table v2 Benjamin Herrenschmidt
2008-12-17 21:30   ` Kumar Gala
2008-12-15  5:44 ` [PATCH 8/16] powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c Benjamin Herrenschmidt
2008-12-15 20:36   ` Kumar Gala
2008-12-15 20:46     ` Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 9/16] powerpc/mm: Introduce MMU features v2 Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 10/16] powerpc/mm: Remove flush_HPTE() Benjamin Herrenschmidt
2008-12-15  5:44 ` [PATCH 11/16] powerpc/mm: Add SMP support to no-hash TLB handling v3 Benjamin Herrenschmidt
2008-12-15 20:19   ` Kumar Gala
2008-12-15 20:46     ` Benjamin Herrenschmidt
2008-12-15 20:57       ` Kumar Gala
2008-12-15 21:03         ` Benjamin Herrenschmidt
2008-12-15 21:10           ` Kumar Gala
2008-12-15 21:18             ` Benjamin Herrenschmidt
2008-12-15 22:19               ` Kumar Gala
2008-12-15 23:31                 ` Benjamin Herrenschmidt
2008-12-15  5:45 ` [PATCH 12/16] powerpc/mm: Split low level tlb invalidate for nohash processors Benjamin Herrenschmidt
2008-12-15  5:45 ` [PATCH 13/16] powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 Benjamin Herrenschmidt
2008-12-15 12:25   ` [PATCH 13/16] powerpc/44x: No need to mask MSR:CE,ME " Josh Boyer
2008-12-15  5:45 ` [PATCH 14/16] powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs v2 Benjamin Herrenschmidt
2008-12-17 21:21   ` Kumar Gala
2008-12-15  5:45 ` [PATCH 15/16] powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Benjamin Herrenschmidt
2008-12-15 20:54   ` Kumar Gala
2008-12-15 21:01     ` Benjamin Herrenschmidt
2008-12-15 21:08       ` Kumar Gala
2008-12-15 21:03     ` Michael Ellerman
2008-12-15 21:05       ` Benjamin Herrenschmidt
2008-12-15  5:45 ` [PATCH 16/16] powerpc/44x: 44x TLB doesn't need "Guarded" set for all pages Benjamin Herrenschmidt
