linux-kernel.vger.kernel.org archive mirror
* [PATCH] arch/tile: remove references to cpu_*_map.
       [not found] <4F761E1C.80808.com>
@ 2012-02-15  4:58 ` Rusty Russell
  2012-03-27 17:47 ` [PATCH] arch/tile/Kconfig: remove pointless "!M386" test Chris Metcalf
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Rusty Russell @ 2012-02-15  4:58 UTC (permalink / raw)
  To: Chris Metcalf, Rusty Russell, Jiri Kosina, Joe Perches, linux-kernel

This has been obsolescent for a while; time for the final push.
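
For reference, the replacement idiom (a sketch, not part of this
patch): cpu_possible_map was a struct cpumask value, while
cpu_possible_mask is a const struct cpumask pointer, so copies go
through cpumask_copy() (or a structure assignment through the
pointer, as in the diff below) and the cpumask_* helpers take the
pointer directly:

	/* Hypothetical example of the new style. */
	static struct cpumask cpu_lotar_map;

	static void __init setup_lotar_example(void)
	{
		/* Was: cpu_lotar_map = cpu_possible_map; */
		cpumask_copy(&cpu_lotar_map, cpu_possible_mask);

		/* Helpers take the pointer form directly. */
		cpumask_or(&cpu_lotar_map, cpu_possible_mask,
			   &cpu_lotar_map);
	}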

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/setup.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 825bc64..1d43be6 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1244,7 +1244,7 @@ static void __init setup_cpu_maps(void)
 			      sizeof(cpu_lotar_map));
 	if (rc < 0) {
 		pr_err("warning: no HV_INQ_TILES_LOTAR; using AVAIL\n");
-		cpu_lotar_map = cpu_possible_map;
+		cpu_lotar_map = *cpu_possible_mask;
 	}
 
 #if CHIP_HAS_CBOX_HOME_MAP()
@@ -1254,9 +1254,9 @@ static void __init setup_cpu_maps(void)
 			      sizeof(hash_for_home_map));
 	if (rc < 0)
 		early_panic("hv_inquire_tiles(HFH_CACHE) failed: rc %d\n", rc);
-	cpumask_or(&cpu_cacheable_map, &cpu_possible_map, &hash_for_home_map);
+	cpumask_or(&cpu_cacheable_map, cpu_possible_mask, &hash_for_home_map);
 #else
-	cpu_cacheable_map = cpu_possible_map;
+	cpu_cacheable_map = *cpu_possible_mask;
 #endif
 }
 
-- 
1.6.5.2


* [PATCH] arch/tile/Kconfig: remove pointless "!M386" test.
       [not found] <4F761E1C.80808.com>
  2012-02-15  4:58 ` [PATCH] arch/tile: remove references to cpu_*_map Rusty Russell
@ 2012-03-27 17:47 ` Chris Metcalf
  2012-03-27 17:53 ` [PATCH] arch/tile/Kconfig: rename tile_defconfig to tilepro_defconfig Chris Metcalf
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 17:47 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Looks like a cut-and-paste bug from the x86 version, where the
"!M386" guard makes sense because the 386 lacks the CMPXCHG
instruction; tile has no such constraint.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 11270ca..30413c1 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -12,7 +12,7 @@ config TILE
 	select GENERIC_PENDING_IRQ if SMP
 	select GENERIC_IRQ_SHOW
 	select SYS_HYPERVISOR
-	select ARCH_HAVE_NMI_SAFE_CMPXCHG if !M386
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 
 # FIXME: investigate whether we need/want these options.
 #	select HAVE_IOREMAP_PROT
-- 
1.6.5.2


* [PATCH] arch/tile/Kconfig: rename tile_defconfig to tilepro_defconfig
       [not found] <4F761E1C.80808.com>
  2012-02-15  4:58 ` [PATCH] arch/tile: remove references to cpu_*_map Rusty Russell
  2012-03-27 17:47 ` [PATCH] arch/tile/Kconfig: remove pointless "!M386" test Chris Metcalf
@ 2012-03-27 17:53 ` Chris Metcalf
  2012-03-27 17:56 ` [PATCH] arch/tile/Kconfig: don't specify CONFIG_PAGE_OFFSET for 64-bit builds Chris Metcalf
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 17:53 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

We switched to using "tilepro" for the 32-bit stuff a while ago,
but missed this one usage.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 30413c1..30393aa 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -69,6 +69,9 @@ config ARCH_PHYS_ADDR_T_64BIT
 config ARCH_DMA_ADDR_T_64BIT
 	def_bool y
 
+config NEED_DMA_MAP_STATE
+	def_bool y
+
 config LOCKDEP_SUPPORT
 	def_bool y
 
@@ -118,7 +121,7 @@ config 64BIT
 
 config ARCH_DEFCONFIG
 	string
-	default "arch/tile/configs/tile_defconfig" if !TILEGX
+	default "arch/tile/configs/tilepro_defconfig" if !TILEGX
 	default "arch/tile/configs/tilegx_defconfig" if TILEGX
 
 source "init/Kconfig"
-- 
1.6.5.2


* [PATCH] arch/tile/Kconfig: don't specify CONFIG_PAGE_OFFSET for 64-bit builds
       [not found] <4F761E1C.80808.com>
                   ` (2 preceding siblings ...)
  2012-03-27 17:53 ` [PATCH] arch/tile/Kconfig: rename tile_defconfig to tilepro_defconfig Chris Metcalf
@ 2012-03-27 17:56 ` Chris Metcalf
  2012-03-27 18:04 ` [PATCH] arch/tile: fix typo in <arch/spr_def.h> Chris Metcalf
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 17:56 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

It's fixed at half the VA space and there's no point in configuring it.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 30393aa..96033e2 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -243,6 +243,7 @@ endchoice
 
 config PAGE_OFFSET
 	hex
+	depends on !64BIT
 	default 0xF0000000 if VMSPLIT_3_75G
 	default 0xE0000000 if VMSPLIT_3_5G
 	default 0xB0000000 if VMSPLIT_2_75G
-- 
1.6.5.2


* [PATCH] arch/tile: fix typo in <arch/spr_def.h>
       [not found] <4F761E1C.80808.com>
                   ` (3 preceding siblings ...)
  2012-03-27 17:56 ` [PATCH] arch/tile/Kconfig: don't specify CONFIG_PAGE_OFFSET for 64-bit builds Chris Metcalf
@ 2012-03-27 18:04 ` Chris Metcalf
  2012-03-27 18:10 ` [PATCH] arch/tile: revert comment for atomic64_add_unless() Chris Metcalf
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 18:04 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

We aren't yet using this definition in the kernel, but fix it up
before someone goes looking for it.
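
(For reference, _concat4() just token-pastes the kernel protection
level into the SPR name; e.g., assuming the default CONFIG_KERNEL_PL
of 1, the corrected definition expands like this:)

	/* _concat4(SPR_IPI_EVENT_SET_, CONFIG_KERNEL_PL,,)
	 * with CONFIG_KERNEL_PL == 1 token-pastes to:
	 *	SPR_IPI_EVENT_SET_1
	 */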

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/arch/spr_def.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/tile/include/arch/spr_def.h b/arch/tile/include/arch/spr_def.h
index f548efe..d6ba449 100644
--- a/arch/tile/include/arch/spr_def.h
+++ b/arch/tile/include/arch/spr_def.h
@@ -60,8 +60,8 @@
 	_concat4(SPR_IPI_EVENT_, CONFIG_KERNEL_PL,,)
 #define SPR_IPI_EVENT_RESET_K \
 	_concat4(SPR_IPI_EVENT_RESET_, CONFIG_KERNEL_PL,,)
-#define SPR_IPI_MASK_SET_K \
-	_concat4(SPR_IPI_MASK_SET_, CONFIG_KERNEL_PL,,)
+#define SPR_IPI_EVENT_SET_K \
+	_concat4(SPR_IPI_EVENT_SET_, CONFIG_KERNEL_PL,,)
 #define INT_IPI_K \
 	_concat4(INT_IPI_, CONFIG_KERNEL_PL,,)
 
-- 
1.6.5.2


* [PATCH] arch/tile: revert comment for atomic64_add_unless().
       [not found] <4F761E1C.80808.com>
                   ` (4 preceding siblings ...)
  2012-03-27 18:04 ` [PATCH] arch/tile: fix typo in <arch/spr_def.h> Chris Metcalf
@ 2012-03-27 18:10 ` Chris Metcalf
  2012-03-30 21:19   ` Arun Sharma
  2012-03-27 18:17 ` [PATCH] arch/tile: fix gcc 4.6 warnings in <asm/bitops_64.h> Chris Metcalf
                   ` (37 subsequent siblings)
  43 siblings, 1 reply; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 18:10 UTC (permalink / raw)
  To: Chris Metcalf, Andrew Morton, Eric Dumazet, Mike Frysinger,
	Arun Sharma, linux-kernel

It still returns whether @v was not @u, not the old value,
unlike __atomic_add_unless().
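
For illustration (a sketch, not from this patch):

	atomic64_t v64 = ATOMIC64_INIT(5);
	atomic_t v32 = ATOMIC_INIT(5);

	/* atomic64_add_unless(): non-zero return means the add happened. */
	if (atomic64_add_unless(&v64, 1, 0))
		pr_debug("v64 was not 0, so it was incremented\n");

	/* __atomic_add_unless(): returns the old value instead, so the
	 * caller does the comparison itself. */
	if (__atomic_add_unless(&v32, 1, 0) != 0)
		pr_debug("v32 was not 0, so it was incremented\n");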

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/atomic_32.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/atomic_32.h b/arch/tile/include/asm/atomic_32.h
index c03349e..004d6d9 100644
--- a/arch/tile/include/asm/atomic_32.h
+++ b/arch/tile/include/asm/atomic_32.h
@@ -199,7 +199,7 @@ static inline u64 atomic64_add_return(u64 i, atomic64_t *v)
  * @u: ...unless v is equal to u.
  *
  * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns the old value of @v.
+ * Returns non-zero if @v was not @u, and zero otherwise.
  */
 static inline u64 atomic64_add_unless(atomic64_t *v, u64 a, u64 u)
 {
-- 
1.6.5.2


* [PATCH] arch/tile: fix gcc 4.6 warnings in <asm/bitops_64.h>
       [not found] <4F761E1C.80808.com>
                   ` (5 preceding siblings ...)
  2012-03-27 18:10 ` [PATCH] arch/tile: revert comment for atomic64_add_unless() Chris Metcalf
@ 2012-03-27 18:17 ` Chris Metcalf
  2012-03-27 19:21 ` [PATCH] arch/tile: use 0 for IRQ_RESCHEDULE instead of 1 Chris Metcalf
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 18:17 UTC (permalink / raw)
  To: Chris Metcalf, Andrew Morton, Mike Frysinger, Eric Dumazet,
	Akinobu Mita, linux-kernel

Fix some signedness and variable usage warnings in change_bit()
and test_and_change_bit().
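
For context, both functions use the usual cmpxchg guess loop; after
this change the locals are consistently unsigned long, so e.g. the
change_bit() body is now essentially:

	unsigned long mask = (1UL << (nr % BITS_PER_LONG));
	unsigned long guess, oldval;

	addr += nr / BITS_PER_LONG;
	oldval = *addr;
	do {
		guess = oldval;
		oldval = atomic64_cmpxchg((atomic64_t *)addr,
					  guess, guess ^ mask);
	} while (guess != oldval);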

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/bitops_64.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/tile/include/asm/bitops_64.h b/arch/tile/include/asm/bitops_64.h
index e9c8e38..161a381 100644
--- a/arch/tile/include/asm/bitops_64.h
+++ b/arch/tile/include/asm/bitops_64.h
@@ -39,10 +39,10 @@ static inline void clear_bit(unsigned nr, volatile unsigned long *addr)
 
 static inline void change_bit(unsigned nr, volatile unsigned long *addr)
 {
-	unsigned long old, mask = (1UL << (nr % BITS_PER_LONG));
-	long guess, oldval;
+	unsigned long mask = (1UL << (nr % BITS_PER_LONG));
+	unsigned long guess, oldval;
 	addr += nr / BITS_PER_LONG;
-	old = *addr;
+	oldval = *addr;
 	do {
 		guess = oldval;
 		oldval = atomic64_cmpxchg((atomic64_t *)addr,
@@ -86,7 +86,7 @@ static inline int test_and_change_bit(unsigned nr,
 				      volatile unsigned long *addr)
 {
 	unsigned long mask = (1UL << (nr % BITS_PER_LONG));
-	long guess, oldval = *addr;
+	unsigned long guess, oldval;
 	addr += nr / BITS_PER_LONG;
 	oldval = *addr;
 	do {
-- 
1.6.5.2


* [PATCH] arch/tile: use 0 for IRQ_RESCHEDULE instead of 1
       [not found] <4F761E1C.80808.com>
                   ` (6 preceding siblings ...)
  2012-03-27 18:17 ` [PATCH] arch/tile: fix gcc 4.6 warnings in <asm/bitops_64.h> Chris Metcalf
@ 2012-03-27 19:21 ` Chris Metcalf
  2012-03-27 19:40 ` [PATCH] arch/tile: use interrupt critical sections less Chris Metcalf
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 19:21 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

This avoids assigning IRQ 0 to PCI devices, because we've seen that
doing so doesn't always work well; a fair amount of driver code
treats IRQ 0 as meaning "no IRQ".
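
For example, a typical driver pattern looks like this (a sketch; the
device name and handler are hypothetical):

	if (!dev->irq)		/* IRQ 0 reads as "no IRQ assigned" */
		return -ENXIO;
	ret = request_irq(dev->irq, mydev_handler, 0, "mydev", dev);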

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/irq.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/irq.h b/arch/tile/include/asm/irq.h
index f80f8ce..33cff9a 100644
--- a/arch/tile/include/asm/irq.h
+++ b/arch/tile/include/asm/irq.h
@@ -21,7 +21,7 @@
 #define NR_IRQS 32
 
 /* IRQ numbers used for linux IPIs. */
-#define IRQ_RESCHEDULE 1
+#define IRQ_RESCHEDULE 0
 
 #define irq_canonicalize(irq)   (irq)
 
-- 
1.6.5.2


* [PATCH] arch/tile: use interrupt critical sections less
       [not found] <4F761E1C.80808.com>
                   ` (7 preceding siblings ...)
  2012-03-27 19:21 ` [PATCH] arch/tile: use 0 for IRQ_RESCHEDULE instead of 1 Chris Metcalf
@ 2012-03-27 19:40 ` Chris Metcalf
  2012-03-29 17:30 ` [PATCH] arch/tile: support building big-endian kernel Chris Metcalf
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-27 19:40 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, Andrew Morton, Julia Lawall,
	Peter Zijlstra, linux-kernel

In general we want to avoid ever touching memory while within an
interrupt critical section, since the hypervisor delivers page
faults via a different path when we are in an interrupt critical
section, and we carefully decided with tilegx that we didn't need
to support that path in the kernel.  (On tilepro we did implement
it as part of supporting atomic instructions in software.)

In practice we always need to touch the kernel stack, since that's
where we store the interrupt state before releasing the critical
section, but this change cleans up a few things.  The IRQ_ENABLE
macro is split up so that when we want to enable interrupts in a
deferred way (e.g. for cpu_idle or for interrupt return) we can
read the per-cpu enable mask before entering the critical section.
The cache-migration code is changed to use interrupt masking instead
of interrupt critical sections.  And, the interrupt-entry code is
changed so that we defer loading "tp" from per-cpu data until after
we have released the interrupt critical section.
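
For reference, the new save/mask/restore idiom around the initial
page-table installation looks like this (matching the mm/init.c hunk
below):

	unsigned long long irqmask = interrupt_mask_save_mask();

	/* Mask everything, including NMIs: flush_and_install_context()
	 * cannot legally touch the stack while the cache is flushed. */
	interrupt_mask_set_mask(-1ULL);
	rc = flush_and_install_context(__pa(pgtables),
				       init_pgprot((unsigned long)pgtables),
				       __get_cpu_var(current_asid),
				       cpumask_bits(my_cpu_mask));
	interrupt_mask_restore_mask(irqmask);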

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/irqflags.h |   34 +++++++++++++----
 arch/tile/kernel/entry.S         |    3 +-
 arch/tile/kernel/intvec_64.S     |   78 +++++++++++++++++++++-----------------
 arch/tile/mm/init.c              |    4 ++
 arch/tile/mm/migrate.h           |    6 +++
 arch/tile/mm/migrate_32.S        |   36 ++++-------------
 arch/tile/mm/migrate_64.S        |   34 +++-------------
 7 files changed, 96 insertions(+), 99 deletions(-)

diff --git a/arch/tile/include/asm/irqflags.h b/arch/tile/include/asm/irqflags.h
index 5db0ce5..b4e96fe 100644
--- a/arch/tile/include/asm/irqflags.h
+++ b/arch/tile/include/asm/irqflags.h
@@ -28,10 +28,10 @@
  */
 #if CHIP_HAS_AUX_PERF_COUNTERS()
 #define LINUX_MASKABLE_INTERRUPTS_HI \
-       (~(INT_MASK_HI(INT_PERF_COUNT) | INT_MASK_HI(INT_AUX_PERF_COUNT)))
+	(~(INT_MASK_HI(INT_PERF_COUNT) | INT_MASK_HI(INT_AUX_PERF_COUNT)))
 #else
 #define LINUX_MASKABLE_INTERRUPTS_HI \
-       (~(INT_MASK_HI(INT_PERF_COUNT)))
+	(~(INT_MASK_HI(INT_PERF_COUNT)))
 #endif
 
 #else
@@ -90,6 +90,14 @@
 	__insn_mtspr(SPR_INTERRUPT_MASK_RESET_K_0, (unsigned long)(__m)); \
 	__insn_mtspr(SPR_INTERRUPT_MASK_RESET_K_1, (unsigned long)(__m>>32)); \
 } while (0)
+#define interrupt_mask_save_mask() \
+	(__insn_mfspr(SPR_INTERRUPT_MASK_SET_K_0) | \
+	 (((unsigned long long)__insn_mfspr(SPR_INTERRUPT_MASK_SET_K_1))<<32))
+#define interrupt_mask_restore_mask(mask) do { \
+	unsigned long long __m = (mask); \
+	__insn_mtspr(SPR_INTERRUPT_MASK_K_0, (unsigned long)(__m)); \
+	__insn_mtspr(SPR_INTERRUPT_MASK_K_1, (unsigned long)(__m>>32)); \
+} while (0)
 #else
 #define interrupt_mask_set(n) \
 	__insn_mtspr(SPR_INTERRUPT_MASK_SET_K, (1UL << (n)))
@@ -101,6 +109,10 @@
 	__insn_mtspr(SPR_INTERRUPT_MASK_SET_K, (mask))
 #define interrupt_mask_reset_mask(mask) \
 	__insn_mtspr(SPR_INTERRUPT_MASK_RESET_K, (mask))
+#define interrupt_mask_save_mask() \
+	__insn_mfspr(SPR_INTERRUPT_MASK_K)
+#define interrupt_mask_restore_mask(mask) \
+	__insn_mtspr(SPR_INTERRUPT_MASK_K, (mask))
 #endif
 
 /*
@@ -122,7 +134,7 @@ DECLARE_PER_CPU(unsigned long long, interrupts_enabled_mask);
 
 /* Disable all interrupts, including NMIs. */
 #define arch_local_irq_disable_all() \
-	interrupt_mask_set_mask(-1UL)
+	interrupt_mask_set_mask(-1ULL)
 
 /* Re-enable all maskable interrupts. */
 #define arch_local_irq_enable() \
@@ -179,7 +191,7 @@ DECLARE_PER_CPU(unsigned long long, interrupts_enabled_mask);
 #ifdef __tilegx__
 
 #if INT_MEM_ERROR != 0
-# error Fix IRQ_DISABLED() macro
+# error Fix IRQS_DISABLED() macro
 #endif
 
 /* Return 0 or 1 to indicate whether interrupts are currently disabled. */
@@ -207,9 +219,10 @@ DECLARE_PER_CPU(unsigned long long, interrupts_enabled_mask);
 	mtspr   SPR_INTERRUPT_MASK_SET_K, tmp
 
 /* Enable interrupts. */
-#define IRQ_ENABLE(tmp0, tmp1)					\
+#define IRQ_ENABLE_LOAD(tmp0, tmp1)				\
 	GET_INTERRUPTS_ENABLED_MASK_PTR(tmp0);			\
-	ld      tmp0, tmp0;					\
+	ld      tmp0, tmp0
+#define IRQ_ENABLE_APPLY(tmp0, tmp1)				\
 	mtspr   SPR_INTERRUPT_MASK_RESET_K, tmp0
 
 #else /* !__tilegx__ */
@@ -253,17 +266,22 @@ DECLARE_PER_CPU(unsigned long long, interrupts_enabled_mask);
 	mtspr   SPR_INTERRUPT_MASK_SET_K_1, tmp
 
 /* Enable interrupts. */
-#define IRQ_ENABLE(tmp0, tmp1)					\
+#define IRQ_ENABLE_LOAD(tmp0, tmp1)				\
 	GET_INTERRUPTS_ENABLED_MASK_PTR(tmp0);			\
 	{							\
 	 lw     tmp0, tmp0;					\
 	 addi   tmp1, tmp0, 4					\
 	};							\
-	lw      tmp1, tmp1;					\
+	lw      tmp1, tmp1
+#define IRQ_ENABLE_APPLY(tmp0, tmp1)				\
 	mtspr   SPR_INTERRUPT_MASK_RESET_K_0, tmp0;		\
 	mtspr   SPR_INTERRUPT_MASK_RESET_K_1, tmp1
 #endif
 
+#define IRQ_ENABLE(tmp0, tmp1)					\
+	IRQ_ENABLE_LOAD(tmp0, tmp1);				\
+	IRQ_ENABLE_APPLY(tmp0, tmp1)
+
 /*
  * Do the CPU's IRQ-state tracing from assembly code. We call a
  * C function, but almost everywhere we do, we don't mind clobbering
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 431e9ae..f8d6155 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -99,8 +99,9 @@ STD_ENTRY(smp_nap)
  */
 STD_ENTRY(_cpu_idle)
 	movei r1, 1
+	IRQ_ENABLE_LOAD(r2, r3)
 	mtspr INTERRUPT_CRITICAL_SECTION, r1
-	IRQ_ENABLE(r2, r3)             /* unmask, but still with ICS set */
+	IRQ_ENABLE_APPLY(r2, r3)       /* unmask, but still with ICS set */
 	mtspr INTERRUPT_CRITICAL_SECTION, zero
 	.global _cpu_idle_nap
 _cpu_idle_nap:
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 79c93e1..3c1f626 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -219,7 +219,9 @@ intvec_\vecname:
 	 * This routine saves just the first four registers, plus the
 	 * stack context so we can do proper backtracing right away,
 	 * and defers to handle_interrupt to save the rest.
-	 * The backtracer needs pc, ex1, lr, sp, r52, and faultnum.
+	 * The backtracer needs pc, ex1, lr, sp, r52, and faultnum,
+	 * and needs sp set to its final location at the bottom of
+	 * the stack frame.
 	 */
 	addli   r0, r0, PTREGS_OFFSET_LR - (PTREGS_SIZE + KSTK_PTREGS_GAP)
 	wh64    r0   /* cache line 7 */
@@ -449,23 +451,6 @@ intvec_\vecname:
 	push_reg r5, r52
 	st      r52, r4
 
-	/* Load tp with our per-cpu offset. */
-#ifdef CONFIG_SMP
-	{
-	 mfspr  r20, SPR_SYSTEM_SAVE_K_0
-	 moveli r21, hw2_last(__per_cpu_offset)
-	}
-	{
-	 shl16insli r21, r21, hw1(__per_cpu_offset)
-	 bfextu r20, r20, 0, LOG2_THREAD_SIZE-1
-	}
-	shl16insli r21, r21, hw0(__per_cpu_offset)
-	shl3add r20, r20, r21
-	ld      tp, r20
-#else
-	move    tp, zero
-#endif
-
 	/*
 	 * If we will be returning to the kernel, we will need to
 	 * reset the interrupt masks to the state they had before.
@@ -488,6 +473,44 @@ intvec_\vecname:
 	.endif
 	st      r21, r32
 
+	/*
+	 * we've captured enough state to the stack (including in
+	 * particular our EX_CONTEXT state) that we can now release
+	 * the interrupt critical section and replace it with our
+	 * standard "interrupts disabled" mask value.  This allows
+	 * synchronous interrupts (and profile interrupts) to punch
+	 * through from this point onwards.
+	 *
+	 * It's important that no code before this point touch memory
+	 * other than our own stack (to keep the invariant that this
+	 * is all that gets touched under ICS), and that no code after
+	 * this point reference any interrupt-specific SPR, in particular
+	 * the EX_CONTEXT_K_ values.
+	 */
+	.ifc \function,handle_nmi
+	IRQ_DISABLE_ALL(r20)
+	.else
+	IRQ_DISABLE(r20, r21)
+	.endif
+	mtspr   INTERRUPT_CRITICAL_SECTION, zero
+
+	/* Load tp with our per-cpu offset. */
+#ifdef CONFIG_SMP
+	{
+	 mfspr  r20, SPR_SYSTEM_SAVE_K_0
+	 moveli r21, hw2_last(__per_cpu_offset)
+	}
+	{
+	 shl16insli r21, r21, hw1(__per_cpu_offset)
+	 bfextu r20, r20, 0, LOG2_THREAD_SIZE-1
+	}
+	shl16insli r21, r21, hw0(__per_cpu_offset)
+	shl3add r20, r20, r21
+	ld      tp, r20
+#else
+	move    tp, zero
+#endif
+
 #ifdef __COLLECT_LINKER_FEEDBACK__
 	/*
 	 * Notify the feedback routines that we were in the
@@ -512,21 +535,6 @@ intvec_\vecname:
 #endif
 
 	/*
-	 * we've captured enough state to the stack (including in
-	 * particular our EX_CONTEXT state) that we can now release
-	 * the interrupt critical section and replace it with our
-	 * standard "interrupts disabled" mask value.  This allows
-	 * synchronous interrupts (and profile interrupts) to punch
-	 * through from this point onwards.
-	 */
-	.ifc \function,handle_nmi
-	IRQ_DISABLE_ALL(r20)
-	.else
-	IRQ_DISABLE(r20, r21)
-	.endif
-	mtspr   INTERRUPT_CRITICAL_SECTION, zero
-
-	/*
 	 * Prepare the first 256 stack bytes to be rapidly accessible
 	 * without having to fetch the background data.
 	 */
@@ -718,9 +726,10 @@ STD_ENTRY(interrupt_return)
 	beqzt   r30, .Lrestore_regs
 	j       3f
 2:	TRACE_IRQS_ON
+	IRQ_ENABLE_LOAD(r20, r21)
 	movei   r0, 1
 	mtspr   INTERRUPT_CRITICAL_SECTION, r0
-	IRQ_ENABLE(r20, r21)
+	IRQ_ENABLE_APPLY(r20, r21)
 	beqzt   r30, .Lrestore_regs
 3:
 
@@ -737,7 +746,6 @@ STD_ENTRY(interrupt_return)
 	 * that will save some cycles if this turns out to be a syscall.
 	 */
 .Lrestore_regs:
-	FEEDBACK_REENTER(interrupt_return)   /* called from elsewhere */
 
 	/*
 	 * Rotate so we have one high bit and one low bit to test.
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 7309988..51c5e51 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -450,6 +450,7 @@ static pgd_t pgtables[PTRS_PER_PGD]
  */
 static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 {
+	unsigned long long irqmask;
 	unsigned long address, pfn;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -630,10 +631,13 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 	 *  - install pgtables[] as the real page table
 	 *  - flush the TLB so the new page table takes effect
 	 */
+	irqmask = interrupt_mask_save_mask();
+	interrupt_mask_set_mask(-1ULL);
 	rc = flush_and_install_context(__pa(pgtables),
 				       init_pgprot((unsigned long)pgtables),
 				       __get_cpu_var(current_asid),
 				       cpumask_bits(my_cpu_mask));
+	interrupt_mask_restore_mask(irqmask);
 	BUG_ON(rc != 0);
 
 	/* Copy the page table back to the normal swapper_pg_dir. */
diff --git a/arch/tile/mm/migrate.h b/arch/tile/mm/migrate.h
index cd45a08..91683d9 100644
--- a/arch/tile/mm/migrate.h
+++ b/arch/tile/mm/migrate.h
@@ -24,6 +24,9 @@
 /*
  * This function is used as a helper when setting up the initial
  * page table (swapper_pg_dir).
+ *
+ * You must mask ALL interrupts prior to invoking this code, since
+ * you can't legally touch the stack during the cache flush.
  */
 extern int flush_and_install_context(HV_PhysAddr page_table, HV_PTE access,
 				     HV_ASID asid,
@@ -39,6 +42,9 @@ extern int flush_and_install_context(HV_PhysAddr page_table, HV_PTE access,
  *
  * Note that any non-NULL pointers must not point to the page that
  * is handled by the stack_pte itself.
+ *
+ * You must mask ALL interrupts prior to invoking this code, since
+ * you can't legally touch the stack during the cache flush.
  */
 extern int homecache_migrate_stack_and_flush(pte_t stack_pte, unsigned long va,
 				     size_t length, pte_t *stack_ptep,
diff --git a/arch/tile/mm/migrate_32.S b/arch/tile/mm/migrate_32.S
index ac01a7c..5305814 100644
--- a/arch/tile/mm/migrate_32.S
+++ b/arch/tile/mm/migrate_32.S
@@ -40,8 +40,7 @@
 #define FRAME_R32	16
 #define FRAME_R33	20
 #define FRAME_R34	24
-#define FRAME_R35	28
-#define FRAME_SIZE	32
+#define FRAME_SIZE	28
 
 
 
@@ -66,12 +65,11 @@
 #define r_my_cpumask	r5
 
 /* Locals (callee-save); must not be more than FRAME_xxx above. */
-#define r_save_ics	r30
-#define r_context_lo	r31
-#define r_context_hi	r32
-#define r_access_lo	r33
-#define r_access_hi	r34
-#define r_asid		r35
+#define r_context_lo	r30
+#define r_context_hi	r31
+#define r_access_lo	r32
+#define r_access_hi	r33
+#define r_asid		r34
 
 STD_ENTRY(flush_and_install_context)
 	/*
@@ -104,11 +102,7 @@ STD_ENTRY(flush_and_install_context)
 	 sw r_tmp, r33
 	 addi r_tmp, sp, FRAME_R34
 	}
-	{
-	 sw r_tmp, r34
-	 addi r_tmp, sp, FRAME_R35
-	}
-	sw r_tmp, r35
+	sw r_tmp, r34
 
 	/* Move some arguments to callee-save registers. */
 	{
@@ -121,13 +115,6 @@ STD_ENTRY(flush_and_install_context)
 	}
 	move r_asid, r_asid_in
 
-	/* Disable interrupts, since we can't use our stack. */
-	{
-	 mfspr r_save_ics, INTERRUPT_CRITICAL_SECTION
-	 movei r_tmp, 1
-	}
-	mtspr INTERRUPT_CRITICAL_SECTION, r_tmp
-
 	/* First, flush our L2 cache. */
 	{
 	 move r0, zero  /* cache_pa */
@@ -163,7 +150,7 @@ STD_ENTRY(flush_and_install_context)
 	}
 	{
 	 move r4, r_asid
-	 movei r5, HV_CTX_DIRECTIO
+	 moveli r5, HV_CTX_DIRECTIO | CTX_PAGE_FLAG
 	}
 	jal hv_install_context
 	bnz r0, .Ldone
@@ -175,9 +162,6 @@ STD_ENTRY(flush_and_install_context)
 	}
 
 .Ldone:
-	/* Reset interrupts back how they were before. */
-	mtspr INTERRUPT_CRITICAL_SECTION, r_save_ics
-
 	/* Restore the callee-saved registers and return. */
 	addli lr, sp, FRAME_SIZE
 	{
@@ -202,10 +186,6 @@ STD_ENTRY(flush_and_install_context)
 	}
 	{
 	 lw r34, r_tmp
-	 addli r_tmp, sp, FRAME_R35
-	}
-	{
-	 lw r35, r_tmp
 	 addi sp, sp, FRAME_SIZE
 	}
 	jrp lr
diff --git a/arch/tile/mm/migrate_64.S b/arch/tile/mm/migrate_64.S
index e76fea6..1d15b10 100644
--- a/arch/tile/mm/migrate_64.S
+++ b/arch/tile/mm/migrate_64.S
@@ -38,8 +38,7 @@
 #define FRAME_R30	16
 #define FRAME_R31	24
 #define FRAME_R32	32
-#define FRAME_R33	40
-#define FRAME_SIZE	48
+#define FRAME_SIZE	40
 
 
 
@@ -60,10 +59,9 @@
 #define r_my_cpumask	r3
 
 /* Locals (callee-save); must not be more than FRAME_xxx above. */
-#define r_save_ics	r30
-#define r_context	r31
-#define r_access	r32
-#define r_asid		r33
+#define r_context	r30
+#define r_access	r31
+#define r_asid		r32
 
 /*
  * Caller-save locals and frame constants are the same as
@@ -93,11 +91,7 @@ STD_ENTRY(flush_and_install_context)
 	 st r_tmp, r31
 	 addi r_tmp, sp, FRAME_R32
 	}
-	{
-	 st r_tmp, r32
-	 addi r_tmp, sp, FRAME_R33
-	}
-	st r_tmp, r33
+	st r_tmp, r32
 
 	/* Move some arguments to callee-save registers. */
 	{
@@ -106,13 +100,6 @@ STD_ENTRY(flush_and_install_context)
 	}
 	move r_asid, r_asid_in
 
-	/* Disable interrupts, since we can't use our stack. */
-	{
-	 mfspr r_save_ics, INTERRUPT_CRITICAL_SECTION
-	 movei r_tmp, 1
-	}
-	mtspr INTERRUPT_CRITICAL_SECTION, r_tmp
-
 	/* First, flush our L2 cache. */
 	{
 	 move r0, zero  /* cache_pa */
@@ -147,7 +134,7 @@ STD_ENTRY(flush_and_install_context)
 	}
 	{
 	 move r2, r_asid
-	 movei r3, HV_CTX_DIRECTIO
+	 moveli r3, HV_CTX_DIRECTIO | CTX_PAGE_FLAG
 	}
 	jal hv_install_context
 	bnez r0, 1f
@@ -158,10 +145,7 @@ STD_ENTRY(flush_and_install_context)
 	 jal hv_flush_all
 	}
 
-1:      /* Reset interrupts back how they were before. */
-	mtspr INTERRUPT_CRITICAL_SECTION, r_save_ics
-
-	/* Restore the callee-saved registers and return. */
+1:	/* Restore the callee-saved registers and return. */
 	addli lr, sp, FRAME_SIZE
 	{
 	 ld lr, lr
@@ -177,10 +161,6 @@ STD_ENTRY(flush_and_install_context)
 	}
 	{
 	 ld r32, r_tmp
-	 addli r_tmp, sp, FRAME_R33
-	}
-	{
-	 ld r33, r_tmp
 	 addi sp, sp, FRAME_SIZE
 	}
 	jrp lr
-- 
1.6.5.2


* [PATCH] arch/tile: support building big-endian kernel
       [not found] <4F761E1C.80808.com>
                   ` (8 preceding siblings ...)
  2012-03-27 19:40 ` [PATCH] arch/tile: use interrupt critical sections less Chris Metcalf
@ 2012-03-29 17:30 ` Chris Metcalf
  2012-03-29 17:39 ` [PATCH] arch/tile: optimize get_user/put_user and friends Chris Metcalf
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 17:30 UTC (permalink / raw)
  To: Chris Metcalf, Rusty Russell, Mike Frysinger, Jonas Bonn,
	Geert Uytterhoeven, Dmitry Torokhov, linux-kernel

The toolchain supports big-endian mode now, so add support for building
the kernel to run big-endian as well.
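
As one example of what changes, the 64-bit string routines now pull
their "bytes before the string" mask and find-first-byte primitives
from a small header (string-endian.h, below) rather than open-coding
the little-endian forms.  A host-side sketch of the two mask flavors
(the kernel versions use __insn_shl(), which takes its shift count
mod 64 and so needs no guard):

	#include <stdint.h>

	static uint64_t mask_le(uint64_t s_int)
	{
		return (1ULL << ((s_int & 7) << 3)) - 1;
	}

	static uint64_t mask_be(uint64_t s_int)
	{
		unsigned int bits = (s_int & 7) << 3;
		return bits ? ~0ULL << (64 - bits) : 0; /* avoid UB at 64 */
	}

	/* For a pointer 3 bytes into an aligned 8-byte word:
	 *   mask_le(3) == 0x0000000000ffffff  (low-order bytes first)
	 *   mask_be(3) == 0xffffff0000000000  (high-order bytes first)
	 */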

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/byteorder.h |   20 ++++++++++++++++++++
 arch/tile/include/asm/elf.h       |    5 +++++
 arch/tile/kernel/module.c         |   12 +++++++++++-
 arch/tile/kernel/single_step.c    |   16 ++++++++++++----
 arch/tile/lib/memchr_64.c         |    8 +++-----
 arch/tile/lib/memcpy_64.c         |   23 +++++++++++++++++++++--
 arch/tile/lib/strchr_64.c         |   15 +++++----------
 arch/tile/lib/string-endian.h     |   33 +++++++++++++++++++++++++++++++++
 arch/tile/lib/strlen_64.c         |   11 ++++-------
 9 files changed, 114 insertions(+), 29 deletions(-)
 create mode 100644 arch/tile/lib/string-endian.h

diff --git a/arch/tile/include/asm/byteorder.h b/arch/tile/include/asm/byteorder.h
index 9558416..fb72ecf 100644
--- a/arch/tile/include/asm/byteorder.h
+++ b/arch/tile/include/asm/byteorder.h
@@ -1 +1,21 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ */
+
+#if defined (__BIG_ENDIAN__)
+#include <linux/byteorder/big_endian.h>
+#elif defined (__LITTLE_ENDIAN__)
 #include <linux/byteorder/little_endian.h>
+#else
+#error "__BIG_ENDIAN__ or __LITTLE_ENDIAN__ must be defined."
+#endif
diff --git a/arch/tile/include/asm/elf.h b/arch/tile/include/asm/elf.h
index 623a6bb..d16d006 100644
--- a/arch/tile/include/asm/elf.h
+++ b/arch/tile/include/asm/elf.h
@@ -44,7 +44,11 @@ typedef elf_fpreg_t elf_fpregset_t[ELF_NFPREG];
 #else
 #define ELF_CLASS	ELFCLASS32
 #endif
+#ifdef __BIG_ENDIAN__
+#define ELF_DATA	ELFDATA2MSB
+#else
 #define ELF_DATA	ELFDATA2LSB
+#endif
 
 /*
  * There seems to be a bug in how compat_binfmt_elf.c works: it
@@ -59,6 +63,7 @@ enum { ELF_ARCH = CHIP_ELF_TYPE() };
  */
 #define elf_check_arch(x)  \
 	((x)->e_ident[EI_CLASS] == ELF_CLASS && \
+	 (x)->e_ident[EI_DATA] == ELF_DATA && \
 	 (x)->e_machine == CHIP_ELF_TYPE())
 
 /* The module loader only handles a few relocation types. */
diff --git a/arch/tile/kernel/module.c b/arch/tile/kernel/module.c
index b90ab99..bb2dc1e 100644
--- a/arch/tile/kernel/module.c
+++ b/arch/tile/kernel/module.c
@@ -157,7 +157,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
 
 		switch (ELF_R_TYPE(rel[i].r_info)) {
 
-#define MUNGE(func) (*location = ((*location & ~func(-1)) | func(value)))
+#ifdef __LITTLE_ENDIAN
+# define MUNGE(func) \
+	(*location = ((*location & ~func(-1)) | func(value)))
+#else
+/*
+ * Instructions are always little-endian, so when we read them as data,
+ * we have to swap them around before and after modifying them.
+ */
+# define MUNGE(func) \
+	(*location = swab64((swab64(*location) & ~func(-1)) | func(value)))
+#endif
 
 #ifndef __tilegx__
 		case R_TILE_32:
diff --git a/arch/tile/kernel/single_step.c b/arch/tile/kernel/single_step.c
index b7a8795..b231ef4 100644
--- a/arch/tile/kernel/single_step.c
+++ b/arch/tile/kernel/single_step.c
@@ -152,9 +152,6 @@ static tile_bundle_bits rewrite_load_store_unaligned(
 	if (((unsigned long)addr % size) == 0)
 		return bundle;
 
-#ifndef __LITTLE_ENDIAN
-# error We assume little-endian representation with copy_xx_user size 2 here
-#endif
 	/* Handle unaligned load/store */
 	if (mem_op == MEMOP_LOAD || mem_op == MEMOP_LOAD_POSTINCR) {
 		unsigned short val_16;
@@ -175,8 +172,19 @@ static tile_bundle_bits rewrite_load_store_unaligned(
 			state->update = 1;
 		}
 	} else {
+		unsigned short val_16;
 		val = (val_reg == TREG_ZERO) ? 0 : regs->regs[val_reg];
-		err = copy_to_user(addr, &val, size);
+		switch (size) {
+		case 2:
+			val_16 = val;
+			err = copy_to_user(addr, &val_16, sizeof(val_16));
+			break;
+		case 4:
+			err = copy_to_user(addr, &val, sizeof(val));
+			break;
+		default:
+			BUG();
+		}
 	}
 
 	if (err) {
diff --git a/arch/tile/lib/memchr_64.c b/arch/tile/lib/memchr_64.c
index 84fdc8d..6f867db 100644
--- a/arch/tile/lib/memchr_64.c
+++ b/arch/tile/lib/memchr_64.c
@@ -15,6 +15,7 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/module.h>
+#include "string-endian.h"
 
 void *memchr(const void *s, int c, size_t n)
 {
@@ -39,11 +40,8 @@ void *memchr(const void *s, int c, size_t n)
 
 	/* Read the first word, but munge it so that bytes before the array
 	 * will not match goal.
-	 *
-	 * Note that this shift count expression works because we know
-	 * shift counts are taken mod 64.
 	 */
-	before_mask = (1ULL << (s_int << 3)) - 1;
+	before_mask = MASK(s_int);
 	v = (*p | before_mask) ^ (goal & before_mask);
 
 	/* Compute the address of the last byte. */
@@ -65,7 +63,7 @@ void *memchr(const void *s, int c, size_t n)
 	/* We found a match, but it might be in a byte past the end
 	 * of the array.
 	 */
-	ret = ((char *)p) + (__insn_ctz(bits) >> 3);
+	ret = ((char *)p) + (CFZ(bits) >> 3);
 	return (ret <= last_byte_ptr) ? ret : NULL;
 }
 EXPORT_SYMBOL(memchr);
diff --git a/arch/tile/lib/memcpy_64.c b/arch/tile/lib/memcpy_64.c
index 3fab9a6..c79b8e7 100644
--- a/arch/tile/lib/memcpy_64.c
+++ b/arch/tile/lib/memcpy_64.c
@@ -15,7 +15,6 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/module.h>
-#define __memcpy memcpy
 /* EXPORT_SYMBOL() is in arch/tile/lib/exports.c since this should be asm. */
 
 /* Must be 8 bytes in size. */
@@ -188,6 +187,7 @@ int USERCOPY_FUNC(void *__restrict dstv, const void *__restrict srcv, size_t n)
 
 	/* n != 0 if we get here.  Write out any trailing bytes. */
 	dst1 = (char *)dst8;
+#ifndef __BIG_ENDIAN__
 	if (n & 4) {
 		ST4((uint32_t *)dst1, final);
 		dst1 += 4;
@@ -202,11 +202,30 @@ int USERCOPY_FUNC(void *__restrict dstv, const void *__restrict srcv, size_t n)
 	}
 	if (n)
 		ST1((uint8_t *)dst1, final);
+#else
+	if (n & 4) {
+		ST4((uint32_t *)dst1, final >> 32);
+		dst1 += 4;
+	}
+	else
+	{
+		final >>= 32;
+	}
+	if (n & 2) {
+		ST2((uint16_t *)dst1, final >> 16);
+		dst1 += 2;
+	}
+	else
+	{
+		final >>= 16;
+	}
+	if (n & 1)
+		ST1((uint8_t *)dst1, final >> 8);
+#endif
 
 	return RETVAL;
 }
 
-
 #ifdef USERCOPY_FUNC
 #undef ST1
 #undef ST2
diff --git a/arch/tile/lib/strchr_64.c b/arch/tile/lib/strchr_64.c
index 617a927..f39f9dc 100644
--- a/arch/tile/lib/strchr_64.c
+++ b/arch/tile/lib/strchr_64.c
@@ -15,8 +15,7 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/module.h>
-
-#undef strchr
+#include "string-endian.h"
 
 char *strchr(const char *s, int c)
 {
@@ -33,13 +32,9 @@ char *strchr(const char *s, int c)
 	 * match neither zero nor goal (we make sure the high bit of each
 	 * byte is 1, and the low 7 bits are all the opposite of the goal
 	 * byte).
-	 *
-	 * Note that this shift count expression works because we know shift
-	 * counts are taken mod 64.
 	 */
-	const uint64_t before_mask = (1ULL << (s_int << 3)) - 1;
-	uint64_t v = (*p | before_mask) ^
-		(goal & __insn_v1shrsi(before_mask, 1));
+	const uint64_t before_mask = MASK(s_int);
+	uint64_t v = (*p | before_mask) ^ (goal & __insn_v1shrui(before_mask, 1));
 
 	uint64_t zero_matches, goal_matches;
 	while (1) {
@@ -55,8 +50,8 @@ char *strchr(const char *s, int c)
 		v = *++p;
 	}
 
-	z = __insn_ctz(zero_matches);
-	g = __insn_ctz(goal_matches);
+	z = CFZ(zero_matches);
+	g = CFZ(goal_matches);
 
 	/* If we found c before '\0' we got a match. Note that if c == '\0'
 	 * then g == z, and we correctly return the address of the '\0'
diff --git a/arch/tile/lib/string-endian.h b/arch/tile/lib/string-endian.h
new file mode 100644
index 0000000..c0eed7c
--- /dev/null
+++ b/arch/tile/lib/string-endian.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ *
+ * Provide a mask based on the pointer alignment that
+ * sets up non-zero bytes before the beginning of the string.
+ * The MASK expression works because shift counts are taken mod 64.
+ * Also, specify how to count "first" and "last" bits
+ * when the bits have been read as a word.
+ */
+
+#include <asm/byteorder.h>
+
+#ifdef __LITTLE_ENDIAN
+#define MASK(x) (__insn_shl(1ULL, (x << 3)) - 1)
+#define NULMASK(x) ((2ULL << x) - 1)
+#define CFZ(x) __insn_ctz(x)
+#define REVCZ(x) __insn_clz(x)
+#else
+#define MASK(x) (__insn_shl(-2LL, ((-x << 3) - 1)))
+#define NULMASK(x) (-2LL << (63 - x))
+#define CFZ(x) __insn_clz(x)
+#define REVCZ(x) __insn_ctz(x)
+#endif
diff --git a/arch/tile/lib/strlen_64.c b/arch/tile/lib/strlen_64.c
index 1c92d46..9583fc3 100644
--- a/arch/tile/lib/strlen_64.c
+++ b/arch/tile/lib/strlen_64.c
@@ -15,8 +15,7 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/module.h>
-
-#undef strlen
+#include "string-endian.h"
 
 size_t strlen(const char *s)
 {
@@ -24,15 +23,13 @@ size_t strlen(const char *s)
 	const uintptr_t s_int = (uintptr_t) s;
 	const uint64_t *p = (const uint64_t *)(s_int & -8);
 
-	/* Read the first word, but force bytes before the string to be nonzero.
-	 * This expression works because we know shift counts are taken mod 64.
-	 */
-	uint64_t v = *p | ((1ULL << (s_int << 3)) - 1);
+	/* Read and MASK the first word. */
+	uint64_t v = *p | MASK(s_int);
 
 	uint64_t bits;
 	while ((bits = __insn_v1cmpeqi(v, 0)) == 0)
 		v = *++p;
 
-	return ((const char *)p) + (__insn_ctz(bits) >> 3) - s;
+	return ((const char *)p) + (CFZ(bits) >> 3) - s;
 }
 EXPORT_SYMBOL(strlen);
-- 
1.6.5.2


* [PATCH] arch/tile: optimize get_user/put_user and friends
       [not found] <4F761E1C.80808.com>
                   ` (9 preceding siblings ...)
  2012-03-29 17:30 ` [PATCH] arch/tile: support building big-endian kernel Chris Metcalf
@ 2012-03-29 17:39 ` Chris Metcalf
  2012-03-29 17:58 ` [PATCH] arch/tile: Allow tilegx to build with either 16K or 64K page size Chris Metcalf
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 17:39 UTC (permalink / raw)
  To: Chris Metcalf, Andrew Morton, Eric Dumazet, Mike Frysinger,
	Arun Sharma, Arnd Bergmann, Dmitry Torokhov, linux-kernel

Use direct load/store for get_user/put_user.

Previously, we would call out to a helper routine that would do the
appropriate thing and then return, handling the possible exception
internally.  Now we inline the load or store, along with a "we succeeded"
indication in a register; if the load or store faults, we write a
"we failed" indication into the same register and then return to the
following instruction.  This is more efficient and gives us more compact
code, as well as being more in line with what other architectures do.

The special futex assembly source file for TILE-Gx also disappears in
this change; we just use the same inlining idiom there as well, putting
the appropriate atomic operations directly into futex_atomic_op_inuser()
(and thus into the FUTEX_WAIT function).

The underlying atomic copy_from_user, copy_to_user functions were
renamed using the (cryptic) x86 convention as copy_from_user_ll and
copy_to_user_ll.
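
The caller-visible contract is unchanged; a 4-byte access still looks
like this (a sketch), the difference being that the load is now
inlined with an exception-table fixup rather than a call into
usercopy_*.S:

	u32 word;

	if (__get_user(word, (u32 __user *)uaddr))
		return -EFAULT;	/* faulting load: the fixup stub zeroed
				 * "word" and put -EFAULT in the error
				 * register */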

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/atomic_32.h |   10 ++
 arch/tile/include/asm/futex.h     |  143 ++++++++++++++++--------
 arch/tile/include/asm/uaccess.h   |  222 ++++++++++++++++++++++---------------
 arch/tile/kernel/Makefile         |    1 -
 arch/tile/lib/atomic_32.c         |   47 +--------
 arch/tile/lib/exports.c           |    8 --
 arch/tile/lib/usercopy_32.S       |   76 -------------
 arch/tile/lib/usercopy_64.S       |   49 --------
 8 files changed, 241 insertions(+), 315 deletions(-)

diff --git a/arch/tile/include/asm/atomic_32.h b/arch/tile/include/asm/atomic_32.h
index 004d6d9..d2a45e0 100644
--- a/arch/tile/include/asm/atomic_32.h
+++ b/arch/tile/include/asm/atomic_32.h
@@ -302,7 +302,14 @@ void __init_atomic_per_cpu(void);
 void __atomic_fault_unlock(int *lock_ptr);
 #endif
 
+/* Return a pointer to the lock for the given address. */
+int *__atomic_hashed_lock(volatile void *v);
+
 /* Private helper routines in lib/atomic_asm_32.S */
+struct __get_user {
+	unsigned long val;
+	int err;
+};
 extern struct __get_user __atomic_cmpxchg(volatile int *p,
 					  int *lock, int o, int n);
 extern struct __get_user __atomic_xchg(volatile int *p, int *lock, int n);
@@ -318,6 +325,9 @@ extern u64 __atomic64_xchg_add(volatile u64 *p, int *lock, u64 n);
 extern u64 __atomic64_xchg_add_unless(volatile u64 *p,
 				      int *lock, u64 o, u64 n);
 
+/* Return failure from the atomic wrappers. */
+struct __get_user __atomic_bad_address(int __user *addr);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_TILE_ATOMIC_32_H */
diff --git a/arch/tile/include/asm/futex.h b/arch/tile/include/asm/futex.h
index d03ec12..5909ac3 100644
--- a/arch/tile/include/asm/futex.h
+++ b/arch/tile/include/asm/futex.h
@@ -28,29 +28,81 @@
 #include <linux/futex.h>
 #include <linux/uaccess.h>
 #include <linux/errno.h>
+#include <asm/atomic.h>
 
-extern struct __get_user futex_set(u32 __user *v, int i);
-extern struct __get_user futex_add(u32 __user *v, int n);
-extern struct __get_user futex_or(u32 __user *v, int n);
-extern struct __get_user futex_andn(u32 __user *v, int n);
-extern struct __get_user futex_cmpxchg(u32 __user *v, int o, int n);
+/*
+ * Support macros for futex operations.  Do not use these macros directly.
+ * They assume "ret", "val", "oparg", and "uaddr" in the lexical context.
+ * __futex_cmpxchg() additionally assumes "oldval".
+ */
+
+#ifdef __tilegx__
+
+#define __futex_asm(OP) \
+	asm("1: {" #OP " %1, %3, %4; movei %0, 0 }\n"		\
+	    ".pushsection .fixup,\"ax\"\n"			\
+	    "0: { movei %0, %5; j 9f }\n"			\
+	    ".section __ex_table,\"a\"\n"			\
+	    ".quad 1b, 0b\n"					\
+	    ".popsection\n"					\
+	    "9:"						\
+	    : "=r" (ret), "=r" (val), "+m" (*(uaddr))		\
+	    : "r" (uaddr), "r" (oparg), "i" (-EFAULT))
+
+#define __futex_set() __futex_asm(exch4)
+#define __futex_add() __futex_asm(fetchadd4)
+#define __futex_or() __futex_asm(fetchor4)
+#define __futex_andn() ({ oparg = ~oparg; __futex_asm(fetchand4); })
+#define __futex_cmpxchg() \
+	({ __insn_mtspr(SPR_CMPEXCH_VALUE, oldval); __futex_asm(cmpexch4); })
+
+#define __futex_xor()						\
+	({							\
+		u32 oldval, n = oparg;				\
+		if ((ret = __get_user(oldval, uaddr)) == 0) {	\
+			do {					\
+				oparg = oldval ^ n;		\
+				__futex_cmpxchg();		\
+			} while (ret == 0 && oldval != val);	\
+		}						\
+	})
+
+/* No need to prefetch, since the atomic ops go to the home cache anyway. */
+#define __futex_prolog()
 
-#ifndef __tilegx__
-extern struct __get_user futex_xor(u32 __user *v, int n);
 #else
-static inline struct __get_user futex_xor(u32 __user *uaddr, int n)
-{
-	struct __get_user asm_ret = __get_user_4(uaddr);
-	if (!asm_ret.err) {
-		int oldval, newval;
-		do {
-			oldval = asm_ret.val;
-			newval = oldval ^ n;
-			asm_ret = futex_cmpxchg(uaddr, oldval, newval);
-		} while (asm_ret.err == 0 && oldval != asm_ret.val);
+
+#define __futex_call(FN)						\
+	{								\
+		struct __get_user gu = FN((u32 __force *)uaddr, lock, oparg); \
+		val = gu.val;						\
+		ret = gu.err;						\
 	}
-	return asm_ret;
-}
+
+#define __futex_set() __futex_call(__atomic_xchg)
+#define __futex_add() __futex_call(__atomic_xchg_add)
+#define __futex_or() __futex_call(__atomic_or)
+#define __futex_andn() __futex_call(__atomic_andn)
+#define __futex_xor() __futex_call(__atomic_xor)
+
+#define __futex_cmpxchg()						\
+	{								\
+		struct __get_user gu = __atomic_cmpxchg((u32 __force *)uaddr, \
+							lock, oldval, oparg); \
+		val = gu.val;						\
+		ret = gu.err;						\
+	}
+
+/*
+ * Find the lock pointer for the atomic calls to use, and issue a
+ * prefetch to the user address to bring it into cache.  Similar to
+ * __atomic_setup(), but we can't do a read into the L1 since it might
+ * fault; instead we do a prefetch into the L2.
+ */
+#define __futex_prolog()					\
+	int *lock;						\
+	__insn_prefetch(uaddr);					\
+	lock = __atomic_hashed_lock((int __force *)uaddr)
 #endif
 
 static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
@@ -59,8 +111,12 @@ static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
 	int cmp = (encoded_op >> 24) & 15;
 	int oparg = (encoded_op << 8) >> 20;
 	int cmparg = (encoded_op << 20) >> 20;
-	int ret;
-	struct __get_user asm_ret;
+	int uninitialized_var(val), ret;
+
+	__futex_prolog();
+
+	/* The 32-bit futex code makes this assumption, so validate it here. */
+	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(int));
 
 	if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
 		oparg = 1 << oparg;
@@ -71,46 +127,45 @@ static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
 	pagefault_disable();
 	switch (op) {
 	case FUTEX_OP_SET:
-		asm_ret = futex_set(uaddr, oparg);
+		__futex_set();
 		break;
 	case FUTEX_OP_ADD:
-		asm_ret = futex_add(uaddr, oparg);
+		__futex_add();
 		break;
 	case FUTEX_OP_OR:
-		asm_ret = futex_or(uaddr, oparg);
+		__futex_or();
 		break;
 	case FUTEX_OP_ANDN:
-		asm_ret = futex_andn(uaddr, oparg);
+		__futex_andn();
 		break;
 	case FUTEX_OP_XOR:
-		asm_ret = futex_xor(uaddr, oparg);
+		__futex_xor();
 		break;
 	default:
-		asm_ret.err = -ENOSYS;
+		ret = -ENOSYS;
+		break;
 	}
 	pagefault_enable();
 
-	ret = asm_ret.err;
-
 	if (!ret) {
 		switch (cmp) {
 		case FUTEX_OP_CMP_EQ:
-			ret = (asm_ret.val == cmparg);
+			ret = (val == cmparg);
 			break;
 		case FUTEX_OP_CMP_NE:
-			ret = (asm_ret.val != cmparg);
+			ret = (val != cmparg);
 			break;
 		case FUTEX_OP_CMP_LT:
-			ret = (asm_ret.val < cmparg);
+			ret = (val < cmparg);
 			break;
 		case FUTEX_OP_CMP_GE:
-			ret = (asm_ret.val >= cmparg);
+			ret = (val >= cmparg);
 			break;
 		case FUTEX_OP_CMP_LE:
-			ret = (asm_ret.val <= cmparg);
+			ret = (val <= cmparg);
 			break;
 		case FUTEX_OP_CMP_GT:
-			ret = (asm_ret.val > cmparg);
+			ret = (val > cmparg);
 			break;
 		default:
 			ret = -ENOSYS;
@@ -120,22 +175,20 @@ static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
 }
 
 static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
-						u32 oldval, u32 newval)
+						u32 oldval, u32 oparg)
 {
-	struct __get_user asm_ret;
+	int ret, val;
+
+	__futex_prolog();
 
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
 		return -EFAULT;
 
-	asm_ret = futex_cmpxchg(uaddr, oldval, newval);
-	*uval = asm_ret.val;
-	return asm_ret.err;
-}
+	__futex_cmpxchg();
 
-#ifndef __tilegx__
-/* Return failure from the atomic wrappers. */
-struct __get_user __atomic_bad_address(int __user *addr);
-#endif
+	*uval = val;
+	return ret;
+}
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index ef34d2c..c3dd275 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -114,45 +114,75 @@ struct exception_table_entry {
 extern int fixup_exception(struct pt_regs *regs);
 
 /*
- * We return the __get_user_N function results in a structure,
- * thus in r0 and r1.  If "err" is zero, "val" is the result
- * of the read; otherwise, "err" is -EFAULT.
- *
- * We rarely need 8-byte values on a 32-bit architecture, but
- * we size the structure to accommodate.  In practice, for the
- * the smaller reads, we can zero the high word for free, and
- * the caller will ignore it by virtue of casting anyway.
+ * Support macros for __get_user().
+ *
+ * Implementation note: The "case 8" logic of casting to the type of
+ * the result of subtracting the value from itself is basically a way
+ * of keeping all integer types the same, but casting any pointers to
+ * ptrdiff_t, i.e. also an integer type.  This way there are no
+ * questionable casts seen by the compiler on an ILP32 platform.
+ *
+ * Note that __get_user() and __put_user() assume proper alignment.
  */
-struct __get_user {
-	unsigned long long val;
-	int err;
-};
 
-/*
- * FIXME: we should express these as inline extended assembler, since
- * they're fundamentally just a variable dereference and some
- * supporting exception_table gunk.  Note that (a la i386) we can
- * extend the copy_to_user and copy_from_user routines to call into
- * such extended assembler routines, though we will have to use a
- * different return code in that case (1, 2, or 4, rather than -EFAULT).
- */
-extern struct __get_user __get_user_1(const void __user *);
-extern struct __get_user __get_user_2(const void __user *);
-extern struct __get_user __get_user_4(const void __user *);
-extern struct __get_user __get_user_8(const void __user *);
-extern int __put_user_1(long, void __user *);
-extern int __put_user_2(long, void __user *);
-extern int __put_user_4(long, void __user *);
-extern int __put_user_8(long long, void __user *);
-
-/* Unimplemented routines to cause linker failures */
-extern struct __get_user __get_user_bad(void);
-extern int __put_user_bad(void);
+#ifdef __LP64__
+#define _ASM_PTR	".quad"
+#else
+#define _ASM_PTR	".long"
+#endif
+
+#define __get_user_asm(OP, x, ptr, ret)					\
+	asm volatile("1: {" #OP " %1, %2; movei %0, 0 }\n"		\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "0: { movei %1, 0; movei %0, %3 }\n"		\
+		     "j 9f\n"						\
+		     ".section __ex_table,\"a\"\n"			\
+		     _ASM_PTR " 1b, 0b\n"				\
+		     ".popsection\n"					\
+		     "9:"						\
+		     : "=r" (ret), "=r" (x)				\
+		     : "r" (ptr), "i" (-EFAULT))
+
+#ifdef __tilegx__
+#define __get_user_1(x, ptr, ret) __get_user_asm(ld1u, x, ptr, ret)
+#define __get_user_2(x, ptr, ret) __get_user_asm(ld2u, x, ptr, ret)
+#define __get_user_4(x, ptr, ret) __get_user_asm(ld4u, x, ptr, ret)
+#define __get_user_8(x, ptr, ret) __get_user_asm(ld, x, ptr, ret)
+#else
+#define __get_user_1(x, ptr, ret) __get_user_asm(lb_u, x, ptr, ret)
+#define __get_user_2(x, ptr, ret) __get_user_asm(lh_u, x, ptr, ret)
+#define __get_user_4(x, ptr, ret) __get_user_asm(lw, x, ptr, ret)
+#ifdef __LITTLE_ENDIAN
+#define __lo32(a, b) a
+#define __hi32(a, b) b
+#else
+#define __lo32(a, b) b
+#define __hi32(a, b) a
+#endif
+#define __get_user_8(x, ptr, ret)					\
+	({								\
+		unsigned int __a, __b;					\
+		asm volatile("1: { lw %1, %3; addi %2, %3, 4 }\n"	\
+			     "2: { lw %2, %2; movei %0, 0 }\n"		\
+			     ".pushsection .fixup,\"ax\"\n"		\
+			     "0: { movei %1, 0; movei %2, 0 }\n"	\
+			     "{ movei %0, %4; j 9f }\n"			\
+			     ".section __ex_table,\"a\"\n"		\
+			     ".word 1b, 0b\n"				\
+			     ".word 2b, 0b\n"				\
+			     ".popsection\n"				\
+			     "9:"					\
+			     : "=r" (ret), "=r" (__a), "=&r" (__b)	\
+			     : "r" (ptr), "i" (-EFAULT));		\
+		(x) = (__typeof(x))(__typeof((x)-(x)))			\
+			(((u64)__hi32(__a, __b) << 32) |		\
+			 __lo32(__a, __b));				\
+	})
+#endif
+
+extern int __get_user_bad(void)
+  __attribute__((warning("sizeof __get_user argument not 1, 2, 4 or 8")));
 
-/*
- * Careful: we have to cast the result to the type of the pointer
- * for sign reasons.
- */
 /**
  * __get_user: - Get a simple variable from user space, with less checking.
  * @x:   Variable to store result.
@@ -174,30 +204,62 @@ extern int __put_user_bad(void);
  * function.
  */
 #define __get_user(x, ptr)						\
-({	struct __get_user __ret;					\
-	__typeof__(*(ptr)) const __user *__gu_addr = (ptr);		\
-	__chk_user_ptr(__gu_addr);					\
-	switch (sizeof(*(__gu_addr))) {					\
-	case 1:								\
-		__ret = __get_user_1(__gu_addr);			\
-		break;							\
-	case 2:								\
-		__ret = __get_user_2(__gu_addr);			\
-		break;							\
-	case 4:								\
-		__ret = __get_user_4(__gu_addr);			\
-		break;							\
-	case 8:								\
-		__ret = __get_user_8(__gu_addr);			\
-		break;							\
-	default:							\
-		__ret = __get_user_bad();				\
-		break;							\
-	}								\
-	(x) = (__typeof__(*__gu_addr)) (__typeof__(*__gu_addr - *__gu_addr)) \
-	  __ret.val;			                                \
-	__ret.err;							\
-})
+	({								\
+		int __ret;						\
+		__chk_user_ptr(ptr);					\
+		switch (sizeof(*(ptr))) {				\
+		case 1: __get_user_1(x, ptr, __ret); break;		\
+		case 2: __get_user_2(x, ptr, __ret); break;		\
+		case 4: __get_user_4(x, ptr, __ret); break;		\
+		case 8: __get_user_8(x, ptr, __ret); break;		\
+		default: __ret = __get_user_bad(); break;		\
+		}							\
+		__ret;							\
+	})
+
+/* Support macros for __put_user(). */
+
+#define __put_user_asm(OP, x, ptr, ret)			\
+	asm volatile("1: {" #OP " %1, %2; movei %0, 0 }\n"		\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "0: { movei %0, %3; j 9f }\n"			\
+		     ".section __ex_table,\"a\"\n"			\
+		     _ASM_PTR " 1b, 0b\n"				\
+		     ".popsection\n"					\
+		     "9:"						\
+		     : "=r" (ret)					\
+		     : "r" (ptr), "r" (x), "i" (-EFAULT))
+
+#ifdef __tilegx__
+#define __put_user_1(x, ptr, ret) __put_user_asm(st1, x, ptr, ret)
+#define __put_user_2(x, ptr, ret) __put_user_asm(st2, x, ptr, ret)
+#define __put_user_4(x, ptr, ret) __put_user_asm(st4, x, ptr, ret)
+#define __put_user_8(x, ptr, ret) __put_user_asm(st, x, ptr, ret)
+#else
+#define __put_user_1(x, ptr, ret) __put_user_asm(sb, x, ptr, ret)
+#define __put_user_2(x, ptr, ret) __put_user_asm(sh, x, ptr, ret)
+#define __put_user_4(x, ptr, ret) __put_user_asm(sw, x, ptr, ret)
+#define __put_user_8(x, ptr, ret)					\
+	({								\
+		u64 __x = (__typeof((x)-(x)))(x);			\
+		int __lo = (int) __x, __hi = (int) (__x >> 32);		\
+		asm volatile("1: { sw %1, %2; addi %0, %1, 4 }\n"	\
+			     "2: { sw %0, %3; movei %0, 0 }\n"		\
+			     ".pushsection .fixup,\"ax\"\n"		\
+			     "0: { movei %0, %4; j 9f }\n"		\
+			     ".section __ex_table,\"a\"\n"		\
+			     ".word 1b, 0b\n"				\
+			     ".word 2b, 0b\n"				\
+			     ".popsection\n"				\
+			     "9:"					\
+			     : "=&r" (ret)				\
+			     : "r" (ptr), "r" (__lo32(__lo, __hi)),	\
+			     "r" (__hi32(__lo, __hi)), "i" (-EFAULT));	\
+	})
+#endif
+
+extern int __put_user_bad(void)
+  __attribute__((warning("sizeof __put_user argument not 1, 2, 4 or 8")));
 
 /**
  * __put_user: - Write a simple value into user space, with less checking.
@@ -217,39 +279,19 @@ extern int __put_user_bad(void);
  * function.
  *
  * Returns zero on success, or -EFAULT on error.
- *
- * Implementation note: The "case 8" logic of casting to the type of
- * the result of subtracting the value from itself is basically a way
- * of keeping all integer types the same, but casting any pointers to
- * ptrdiff_t, i.e. also an integer type.  This way there are no
- * questionable casts seen by the compiler on an ILP32 platform.
  */
 #define __put_user(x, ptr)						\
 ({									\
-	int __pu_err = 0;						\
-	__typeof__(*(ptr)) __user *__pu_addr = (ptr);			\
-	typeof(*__pu_addr) __pu_val = (x);				\
-	__chk_user_ptr(__pu_addr);					\
-	switch (sizeof(__pu_val)) {					\
-	case 1:								\
-		__pu_err = __put_user_1((long)__pu_val, __pu_addr);	\
-		break;							\
-	case 2:								\
-		__pu_err = __put_user_2((long)__pu_val, __pu_addr);	\
-		break;							\
-	case 4:								\
-		__pu_err = __put_user_4((long)__pu_val, __pu_addr);	\
-		break;							\
-	case 8:								\
-		__pu_err =						\
-		  __put_user_8((__typeof__(__pu_val - __pu_val))__pu_val,\
-			__pu_addr);					\
-		break;							\
-	default:							\
-		__pu_err = __put_user_bad();				\
-		break;							\
+	int __ret;							\
+	__chk_user_ptr(ptr);						\
+	switch (sizeof(*(ptr))) {					\
+	case 1: __put_user_1(x, ptr, __ret); break;			\
+	case 2: __put_user_2(x, ptr, __ret); break;			\
+	case 4: __put_user_4(x, ptr, __ret); break;			\
+	case 8: __put_user_8(x, ptr, __ret); break;			\
+	default: __ret = __put_user_bad(); break;			\
 	}								\
-	__pu_err;							\
+	__ret;								\
 })
 
 /*
@@ -378,7 +420,7 @@ static inline unsigned long __must_check copy_from_user(void *to,
 /**
  * __copy_in_user() - copy data within user space, with less checking.
  * @to:   Destination address, in user space.
- * @from: Source address, in kernel space.
+ * @from: Source address, in user space.
  * @n:    Number of bytes to copy.
  *
  * Context: User context only.  This function may sleep.
diff --git a/arch/tile/kernel/Makefile b/arch/tile/kernel/Makefile
index b4dbc05..d6261e4 100644
--- a/arch/tile/kernel/Makefile
+++ b/arch/tile/kernel/Makefile
@@ -9,7 +9,6 @@ obj-y := backtrace.o entry.o init_task.o irq.o messaging.o \
 	intvec_$(BITS).o regs_$(BITS).o tile-desc_$(BITS).o
 
 obj-$(CONFIG_HARDWALL)		+= hardwall.o
-obj-$(CONFIG_TILEGX)		+= futex_64.o
 obj-$(CONFIG_COMPAT)		+= compat.o compat_signal.o
 obj-$(CONFIG_SMP)		+= smpboot.o smp.o tlb.o
 obj-$(CONFIG_MODULES)		+= module.o
diff --git a/arch/tile/lib/atomic_32.c b/arch/tile/lib/atomic_32.c
index 771b251..f5cada7 100644
--- a/arch/tile/lib/atomic_32.c
+++ b/arch/tile/lib/atomic_32.c
@@ -18,7 +18,6 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 #include <linux/atomic.h>
-#include <asm/futex.h>
 #include <arch/chip.h>
 
 /* See <asm/atomic_32.h> */
@@ -50,7 +49,7 @@ int atomic_locks[PAGE_SIZE / sizeof(int)] __page_aligned_bss;
 
 #endif /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
 
-static inline int *__atomic_hashed_lock(volatile void *v)
+int *__atomic_hashed_lock(volatile void *v)
 {
 	/* NOTE: this code must match "sys_cmpxchg" in kernel/intvec_32.S */
 #if ATOMIC_LOCKS_FOUND_VIA_TABLE()
@@ -191,47 +190,6 @@ u64 _atomic64_cmpxchg(atomic64_t *v, u64 o, u64 n)
 EXPORT_SYMBOL(_atomic64_cmpxchg);
 
 
-static inline int *__futex_setup(int __user *v)
-{
-	/*
-	 * Issue a prefetch to the counter to bring it into cache.
-	 * As for __atomic_setup, but we can't do a read into the L1
-	 * since it might fault; instead we do a prefetch into the L2.
-	 */
-	__insn_prefetch(v);
-	return __atomic_hashed_lock((int __force *)v);
-}
-
-struct __get_user futex_set(u32 __user *v, int i)
-{
-	return __atomic_xchg((int __force *)v, __futex_setup(v), i);
-}
-
-struct __get_user futex_add(u32 __user *v, int n)
-{
-	return __atomic_xchg_add((int __force *)v, __futex_setup(v), n);
-}
-
-struct __get_user futex_or(u32 __user *v, int n)
-{
-	return __atomic_or((int __force *)v, __futex_setup(v), n);
-}
-
-struct __get_user futex_andn(u32 __user *v, int n)
-{
-	return __atomic_andn((int __force *)v, __futex_setup(v), n);
-}
-
-struct __get_user futex_xor(u32 __user *v, int n)
-{
-	return __atomic_xor((int __force *)v, __futex_setup(v), n);
-}
-
-struct __get_user futex_cmpxchg(u32 __user *v, int o, int n)
-{
-	return __atomic_cmpxchg((int __force *)v, __futex_setup(v), o, n);
-}
-
 /*
  * If any of the atomic or futex routines hit a bad address (not in
  * the page tables at kernel PL) this routine is called.  The futex
@@ -323,7 +281,4 @@ void __init __init_atomic_per_cpu(void)
 	BUILD_BUG_ON((PAGE_SIZE >> 3) > ATOMIC_HASH_SIZE);
 
 #endif /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
-
-	/* The futex code makes this assumption, so we validate it here. */
-	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(int));
 }
diff --git a/arch/tile/lib/exports.c b/arch/tile/lib/exports.c
index 2a81d32..dd5f0a3 100644
--- a/arch/tile/lib/exports.c
+++ b/arch/tile/lib/exports.c
@@ -18,14 +18,6 @@
 
 /* arch/tile/lib/usercopy.S */
 #include <linux/uaccess.h>
-EXPORT_SYMBOL(__get_user_1);
-EXPORT_SYMBOL(__get_user_2);
-EXPORT_SYMBOL(__get_user_4);
-EXPORT_SYMBOL(__get_user_8);
-EXPORT_SYMBOL(__put_user_1);
-EXPORT_SYMBOL(__put_user_2);
-EXPORT_SYMBOL(__put_user_4);
-EXPORT_SYMBOL(__put_user_8);
 EXPORT_SYMBOL(strnlen_user_asm);
 EXPORT_SYMBOL(strncpy_from_user_asm);
 EXPORT_SYMBOL(clear_user_asm);
diff --git a/arch/tile/lib/usercopy_32.S b/arch/tile/lib/usercopy_32.S
index 979f76d..b62d002 100644
--- a/arch/tile/lib/usercopy_32.S
+++ b/arch/tile/lib/usercopy_32.S
@@ -19,82 +19,6 @@
 
 /* Access user memory, but use MMU to avoid propagating kernel exceptions. */
 
-	.pushsection .fixup,"ax"
-
-get_user_fault:
-	{ move r0, zero; move r1, zero }
-	{ movei r2, -EFAULT; jrp lr }
-	ENDPROC(get_user_fault)
-
-put_user_fault:
-	{ movei r0, -EFAULT; jrp lr }
-	ENDPROC(put_user_fault)
-
-	.popsection
-
-/*
- * __get_user_N functions take a pointer in r0, and return 0 in r2
- * on success, with the value in r0; or else -EFAULT in r2.
- */
-#define __get_user_N(bytes, LOAD) \
-	STD_ENTRY(__get_user_##bytes); \
-1:	{ LOAD r0, r0; move r1, zero; move r2, zero }; \
-	jrp lr; \
-	STD_ENDPROC(__get_user_##bytes); \
-	.pushsection __ex_table,"a"; \
-	.word 1b, get_user_fault; \
-	.popsection
-
-__get_user_N(1, lb_u)
-__get_user_N(2, lh_u)
-__get_user_N(4, lw)
-
-/*
- * __get_user_8 takes a pointer in r0, and returns 0 in r2
- * on success, with the value in r0/r1; or else -EFAULT in r2.
- */
-	STD_ENTRY(__get_user_8);
-1:	{ lw r0, r0; addi r1, r0, 4 };
-2:	{ lw r1, r1; move r2, zero };
-	jrp lr;
-	STD_ENDPROC(__get_user_8);
-	.pushsection __ex_table,"a";
-	.word 1b, get_user_fault;
-	.word 2b, get_user_fault;
-	.popsection
-
-/*
- * __put_user_N functions take a value in r0 and a pointer in r1,
- * and return 0 in r0 on success or -EFAULT on failure.
- */
-#define __put_user_N(bytes, STORE) \
-	STD_ENTRY(__put_user_##bytes); \
-1:	{ STORE r1, r0; move r0, zero }; \
-	jrp lr; \
-	STD_ENDPROC(__put_user_##bytes); \
-	.pushsection __ex_table,"a"; \
-	.word 1b, put_user_fault; \
-	.popsection
-
-__put_user_N(1, sb)
-__put_user_N(2, sh)
-__put_user_N(4, sw)
-
-/*
- * __put_user_8 takes a value in r0/r1 and a pointer in r2,
- * and returns 0 in r0 on success or -EFAULT on failure.
- */
-STD_ENTRY(__put_user_8)
-1:      { sw r2, r0; addi r2, r2, 4 }
-2:      { sw r2, r1; move r0, zero }
-	jrp lr
-	STD_ENDPROC(__put_user_8)
-	.pushsection __ex_table,"a"
-	.word 1b, put_user_fault
-	.word 2b, put_user_fault
-	.popsection
-
-
 /*
  * strnlen_user_asm takes the pointer in r0, and the length bound in r1.
  * It returns the length, including the terminating NUL, or zero on exception.
diff --git a/arch/tile/lib/usercopy_64.S b/arch/tile/lib/usercopy_64.S
index 2ff44f8..adb2dbb 100644
--- a/arch/tile/lib/usercopy_64.S
+++ b/arch/tile/lib/usercopy_64.S
@@ -19,55 +19,6 @@
 
 /* Access user memory, but use MMU to avoid propagating kernel exceptions. */
 
-	.pushsection .fixup,"ax"
-
-get_user_fault:
-	{ movei r1, -EFAULT; move r0, zero }
-	jrp lr
-	ENDPROC(get_user_fault)
-
-put_user_fault:
-	{ movei r0, -EFAULT; jrp lr }
-	ENDPROC(put_user_fault)
-
-	.popsection
-
-/*
- * __get_user_N functions take a pointer in r0, and return 0 in r1
- * on success, with the value in r0; or else -EFAULT in r1.
- */
-#define __get_user_N(bytes, LOAD) \
-	STD_ENTRY(__get_user_##bytes); \
-1:	{ LOAD r0, r0; move r1, zero }; \
-	jrp lr; \
-	STD_ENDPROC(__get_user_##bytes); \
-	.pushsection __ex_table,"a"; \
-	.quad 1b, get_user_fault; \
-	.popsection
-
-__get_user_N(1, ld1u)
-__get_user_N(2, ld2u)
-__get_user_N(4, ld4u)
-__get_user_N(8, ld)
-
-/*
- * __put_user_N functions take a value in r0 and a pointer in r1,
- * and return 0 in r0 on success or -EFAULT on failure.
- */
-#define __put_user_N(bytes, STORE) \
-	STD_ENTRY(__put_user_##bytes); \
-1:	{ STORE r1, r0; move r0, zero }; \
-	jrp lr; \
-	STD_ENDPROC(__put_user_##bytes); \
-	.pushsection __ex_table,"a"; \
-	.quad 1b, put_user_fault; \
-	.popsection
-
-__put_user_N(1, st1)
-__put_user_N(2, st2)
-__put_user_N(4, st4)
-__put_user_N(8, st)
-
 /*
  * strnlen_user_asm takes the pointer in r0, and the length bound in r1.
  * It returns the length, including the terminating NUL, or zero on exception.
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: Allow tilegx to build with either 16K or 64K page size
       [not found] <4F761E1C.80808.com>
                   ` (10 preceding siblings ...)
  2012-03-29 17:39 ` [PATCH] arch/tile: optimize get_user/put_user and friends Chris Metcalf
@ 2012-03-29 17:58 ` Chris Metcalf
  2012-03-29 18:02 ` [PATCH] arch/tile: avoid false corrupt frame warning in early boot Chris Metcalf
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 17:58 UTC (permalink / raw)
  To: Chris Metcalf, KOSAKI Motohiro, Dmitry Torokhov, Lucas De Marchi,
	Andrew Morton, Geert Uytterhoeven, Joe Perches, Ralf Baechle,
	Rusty Russell, Jiri Kosina, Benjamin Herrenschmidt,
	Jesper Nilsson, Russell King, Martin Schwidefsky, Julia Lawall,
	Peter Zijlstra, linux-kernel

This change introduces new flags for the hv_install_context()
API, which passes a page table pointer to the hypervisor.  Clients
can explicitly request 4K, 16K, or 64K small pages when they
install a new context.  In practice, the page size is fixed at
kernel compile time and the same size is always requested every
time a new page table is installed.
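
As a sketch of the resulting calling convention (mirroring the
<asm/mmu_context.h> hunk below; CTX_PAGE_FLAG is derived from the
CONFIG_PAGE_SIZE_* options, with 0 requesting the default size):

	int rc = hv_install_context(__pa(pgdir), prot, asid,
				    HV_CTX_DIRECTIO | CTX_PAGE_FLAG);
	if (rc < 0)
		panic("hv_install_context failed: %d", rc);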

The <hv/hypervisor.h> header changes so that it provides more abstract
macros for managing "page" things like PFNs and page tables.  For
example there is now a HV_DEFAULT_PAGE_SIZE_SMALL instead of the old
HV_PAGE_SIZE_SMALL.  The various PFN routines have been eliminated and
only PA- or PTFN-based ones remain (since PTFNs are always expressed
in terms of a fixed 2KB "page" size).  The page-table management macros are
renamed with a leading underscore and take page-size arguments with
the presumption that clients will use those macros in some single
place to provide the "real" macros they will use themselves.
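
For example (as the pgtable_32.h hunk below does), a client is expected
to wrap the underscored macros once, with its chosen page-size shifts,
to produce its "real" macros:

	#define PTRS_PER_PTE	_HV_L2_ENTRIES(HPAGE_SHIFT, PAGE_SHIFT)
	#define PTE_INDEX(va)	_HV_L2_INDEX(va, HPAGE_SHIFT, PAGE_SHIFT)
	#define SIZEOF_PTE	_HV_L2_SIZE(HPAGE_SHIFT, PAGE_SHIFT)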

I happened to notice that the old hv_set_caching() API was broken
(it assumed 4KB pages), so I changed it to work correctly, at least
nominally, with other page sizes.
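
A sketch of the new semantics, following the hv_set_caching() comment
in the hunk below ("pa" here stands for the priority page's physical
address; this is an illustration, not the hypervisor's actual code):

	unsigned long bitmask = 0, i;
	unsigned long start = (pa % CHIP_L2_CACHE_SIZE()) >> 12;

	for (i = 0; i < (PAGE_SIZE >> 12); i++)
		bitmask |= 1UL << (start + i);	/* one bit per 4KB region */
	hv_set_caching(bitmask);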

Tag modules with the page size so you can't load a module built with
a conflicting page size.  (And add a test for SMP while we're at it.)

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig                    |   25 ++++
 arch/tile/include/asm/Kbuild         |    1 -
 arch/tile/include/asm/mmu.h          |    2 +-
 arch/tile/include/asm/mmu_context.h  |    8 +-
 arch/tile/include/asm/module.h       |   40 +++++++
 arch/tile/include/asm/page.h         |   13 ++-
 arch/tile/include/asm/pgalloc.h      |   92 +++++++++++----
 arch/tile/include/asm/pgtable.h      |   10 +-
 arch/tile/include/asm/pgtable_32.h   |   14 ++-
 arch/tile/include/asm/pgtable_64.h   |   28 +++--
 arch/tile/include/hv/drv_xgbe_intf.h |    2 +-
 arch/tile/include/hv/hypervisor.h    |  214 +++++++++++++++++++---------------
 arch/tile/kernel/head_32.S           |    8 +-
 arch/tile/kernel/head_64.S           |   22 ++--
 arch/tile/kernel/machine_kexec.c     |    7 +-
 arch/tile/kernel/setup.c             |    8 +-
 arch/tile/kernel/smp.c               |    2 +-
 arch/tile/kernel/stack.c             |   13 +-
 arch/tile/lib/memcpy_tile64.c        |    8 +-
 arch/tile/mm/init.c                  |   11 +--
 arch/tile/mm/pgtable.c               |   27 ++---
 21 files changed, 351 insertions(+), 204 deletions(-)
 create mode 100644 arch/tile/include/asm/module.h

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 96033e2..d5f2e57 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -138,6 +138,31 @@ config NR_CPUS
 	  smaller kernel memory footprint results from using a smaller
 	  value on chips with fewer tiles.
 
+if TILEGX
+
+choice
+	prompt "Kernel page size"
+	default PAGE_SIZE_64KB
+	help
+	  This lets you select the page size of the kernel.  For best
+	  performance on memory-intensive applications, a page size of 64KB
+	  is recommended.  For workloads involving many small files, many
+	  connections, etc., it may be better to select 16KB, which uses
+	  memory more efficiently at some cost in TLB performance.
+
+	  Note that this option is TILE-Gx specific; currently
+	  the TILEPro page size is set by rebuilding the hypervisor.
+
+config PAGE_SIZE_16KB
+	bool "16KB"
+
+config PAGE_SIZE_64KB
+	bool "64KB"
+
+endchoice
+
+endif
+
 source "kernel/time/Kconfig"
 
 source "kernel/Kconfig.hz"
diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild
index 0bb4264..6b2e681 100644
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += ipcbuf.h
 generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
-generic-y += module.h
 generic-y += msgbuf.h
 generic-y += mutex.h
 generic-y += param.h
diff --git a/arch/tile/include/asm/mmu.h b/arch/tile/include/asm/mmu.h
index 92f94c7..e2c7890 100644
--- a/arch/tile/include/asm/mmu.h
+++ b/arch/tile/include/asm/mmu.h
@@ -21,7 +21,7 @@ struct mm_context {
 	 * Written under the mmap_sem semaphore; read without the
 	 * semaphore but atomically, but it is conservatively set.
 	 */
-	unsigned int priority_cached;
+	unsigned long priority_cached;
 };
 
 typedef struct mm_context mm_context_t;
diff --git a/arch/tile/include/asm/mmu_context.h b/arch/tile/include/asm/mmu_context.h
index 15fb246..37f0b74 100644
--- a/arch/tile/include/asm/mmu_context.h
+++ b/arch/tile/include/asm/mmu_context.h
@@ -30,11 +30,15 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 	return 0;
 }
 
-/* Note that arch/tile/kernel/head.S also calls hv_install_context() */
+/*
+ * Note that arch/tile/kernel/head_NN.S and arch/tile/mm/migrate_NN.S
+ * also call hv_install_context().
+ */
 static inline void __install_page_table(pgd_t *pgdir, int asid, pgprot_t prot)
 {
 	/* FIXME: DIRECTIO should not always be set. FIXME. */
-	int rc = hv_install_context(__pa(pgdir), prot, asid, HV_CTX_DIRECTIO);
+	int rc = hv_install_context(__pa(pgdir), prot, asid,
+				    HV_CTX_DIRECTIO | CTX_PAGE_FLAG);
 	if (rc < 0)
 		panic("hv_install_context failed: %d", rc);
 }
diff --git a/arch/tile/include/asm/module.h b/arch/tile/include/asm/module.h
new file mode 100644
index 0000000..44ed07c
--- /dev/null
+++ b/arch/tile/include/asm/module.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ */
+
+#ifndef _ASM_TILE_MODULE_H
+#define _ASM_TILE_MODULE_H
+
+#include <arch/chip.h>
+
+#include <asm-generic/module.h>
+
+/* We can't use modules built with different page sizes. */
+#if defined(CONFIG_PAGE_SIZE_16KB)
+# define MODULE_PGSZ " 16KB"
+#elif defined(CONFIG_PAGE_SIZE_64KB)
+# define MODULE_PGSZ " 64KB"
+#else
+# define MODULE_PGSZ ""
+#endif
+
+/* We don't really support no-SMP so tag if someone tries. */
+#ifdef CONFIG_SMP
+#define MODULE_NOSMP ""
+#else
+#define MODULE_NOSMP " nosmp"
+#endif
+
+#define MODULE_ARCH_VERMAGIC CHIP_ARCH_NAME MODULE_PGSZ MODULE_NOSMP
+
+#endif /* _ASM_TILE_MODULE_H */
diff --git a/arch/tile/include/asm/page.h b/arch/tile/include/asm/page.h
index db93518..c750943 100644
--- a/arch/tile/include/asm/page.h
+++ b/arch/tile/include/asm/page.h
@@ -20,8 +20,17 @@
 #include <arch/chip.h>
 
 /* PAGE_SHIFT and HPAGE_SHIFT determine the page sizes. */
-#define PAGE_SHIFT	HV_LOG2_PAGE_SIZE_SMALL
-#define HPAGE_SHIFT	HV_LOG2_PAGE_SIZE_LARGE
+#if defined(CONFIG_PAGE_SIZE_16KB)
+#define PAGE_SHIFT	14
+#define CTX_PAGE_FLAG	HV_CTX_PG_SM_16K
+#elif defined(CONFIG_PAGE_SIZE_64KB)
+#define PAGE_SHIFT	16
+#define CTX_PAGE_FLAG	HV_CTX_PG_SM_64K
+#else
+#define PAGE_SHIFT	HV_LOG2_DEFAULT_PAGE_SIZE_SMALL
+#define CTX_PAGE_FLAG	0
+#endif
+#define HPAGE_SHIFT	HV_LOG2_DEFAULT_PAGE_SIZE_LARGE
 
 #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
 #define HPAGE_SIZE	(_AC(1, UL) << HPAGE_SHIFT)
diff --git a/arch/tile/include/asm/pgalloc.h b/arch/tile/include/asm/pgalloc.h
index e919c0b..1b90250 100644
--- a/arch/tile/include/asm/pgalloc.h
+++ b/arch/tile/include/asm/pgalloc.h
@@ -19,24 +19,24 @@
 #include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <asm/fixmap.h>
+#include <asm/page.h>
 #include <hv/hypervisor.h>
 
 /* Bits for the size of the second-level page table. */
-#define L2_KERNEL_PGTABLE_SHIFT \
-  (HV_LOG2_PAGE_SIZE_LARGE - HV_LOG2_PAGE_SIZE_SMALL + HV_LOG2_PTE_SIZE)
+#define L2_KERNEL_PGTABLE_SHIFT _HV_LOG2_L2_SIZE(HPAGE_SHIFT, PAGE_SHIFT)
+
+/* How big is a kernel L2 page table? */
+#define L2_KERNEL_PGTABLE_SIZE (1UL << L2_KERNEL_PGTABLE_SHIFT)
 
 /* We currently allocate user L2 page tables by page (unlike kernel L2s). */
-#if L2_KERNEL_PGTABLE_SHIFT < HV_LOG2_PAGE_SIZE_SMALL
-#define L2_USER_PGTABLE_SHIFT HV_LOG2_PAGE_SIZE_SMALL
+#if L2_KERNEL_PGTABLE_SHIFT < PAGE_SHIFT
+#define L2_USER_PGTABLE_SHIFT PAGE_SHIFT
 #else
 #define L2_USER_PGTABLE_SHIFT L2_KERNEL_PGTABLE_SHIFT
 #endif
 
 /* How many pages do we need, as an "order", for a user L2 page table? */
-#define L2_USER_PGTABLE_ORDER (L2_USER_PGTABLE_SHIFT - HV_LOG2_PAGE_SIZE_SMALL)
-
-/* How big is a kernel L2 page table? */
-#define L2_KERNEL_PGTABLE_SIZE (1 << L2_KERNEL_PGTABLE_SHIFT)
+#define L2_USER_PGTABLE_ORDER (L2_USER_PGTABLE_SHIFT - PAGE_SHIFT)
 
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
@@ -50,14 +50,14 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 static inline void pmd_populate_kernel(struct mm_struct *mm,
 				       pmd_t *pmd, pte_t *ptep)
 {
-	set_pmd(pmd, ptfn_pmd(__pa(ptep) >> HV_LOG2_PAGE_TABLE_ALIGN,
+	set_pmd(pmd, ptfn_pmd(HV_CPA_TO_PTFN(__pa(ptep)),
 			      __pgprot(_PAGE_PRESENT)));
 }
 
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 				pgtable_t page)
 {
-	set_pmd(pmd, ptfn_pmd(HV_PFN_TO_PTFN(page_to_pfn(page)),
+	set_pmd(pmd, ptfn_pmd(HV_CPA_TO_PTFN(PFN_PHYS(page_to_pfn(page))),
 			      __pgprot(_PAGE_PRESENT)));
 }
 
@@ -68,8 +68,20 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
 
-extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address);
-extern void pte_free(struct mm_struct *mm, struct page *pte);
+extern pgtable_t pgtable_alloc_one(struct mm_struct *mm, unsigned long address,
+				   int order);
+extern void pgtable_free(struct mm_struct *mm, struct page *pte, int order);
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
+				      unsigned long address)
+{
+	return pgtable_alloc_one(mm, address, L2_USER_PGTABLE_ORDER);
+}
+
+static inline void pte_free(struct mm_struct *mm, struct page *pte)
+{
+	pgtable_free(mm, pte, L2_USER_PGTABLE_ORDER);
+}
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
 
@@ -85,8 +97,13 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 	pte_free(mm, virt_to_page(pte));
 }
 
-extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
-			   unsigned long address);
+extern void __pgtable_free_tlb(struct mmu_gather *tlb, struct page *pte,
+			       unsigned long address, int order);
+static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
+				  unsigned long address)
+{
+	__pgtable_free_tlb(tlb, pte, address, L2_USER_PGTABLE_ORDER);
+}
 
 #define check_pgt_cache()	do { } while (0)
 
@@ -104,19 +121,44 @@ void shatter_pmd(pmd_t *pmd);
 void shatter_huge_page(unsigned long addr);
 
 #ifdef __tilegx__
-/* We share a single page allocator for both L1 and L2 page tables. */
-#if HV_L1_SIZE != HV_L2_SIZE
-# error Rework assumption that L1 and L2 page tables are same size.
-#endif
-#define L1_USER_PGTABLE_ORDER L2_USER_PGTABLE_ORDER
+
 #define pud_populate(mm, pud, pmd) \
   pmd_populate_kernel((mm), (pmd_t *)(pud), (pte_t *)(pmd))
-#define pmd_alloc_one(mm, addr) \
-  ((pmd_t *)page_to_virt(pte_alloc_one((mm), (addr))))
-#define pmd_free(mm, pmdp) \
-  pte_free((mm), virt_to_page(pmdp))
-#define __pmd_free_tlb(tlb, pmdp, address) \
-  __pte_free_tlb((tlb), virt_to_page(pmdp), (address))
+
+/* Bits for the size of the L1 (intermediate) page table. */
+#define L1_KERNEL_PGTABLE_SHIFT _HV_LOG2_L1_SIZE(HPAGE_SHIFT)
+
+/* How big is a kernel L1 page table? */
+#define L1_KERNEL_PGTABLE_SIZE (1UL << L1_KERNEL_PGTABLE_SHIFT)
+
+/* We currently allocate L1 page tables by page. */
+#if L1_KERNEL_PGTABLE_SHIFT < PAGE_SHIFT
+#define L1_USER_PGTABLE_SHIFT PAGE_SHIFT
+#else
+#define L1_USER_PGTABLE_SHIFT L1_KERNEL_PGTABLE_SHIFT
 #endif
 
+/* How many pages do we need, as an "order", for an L1 page table? */
+#define L1_USER_PGTABLE_ORDER (L1_USER_PGTABLE_SHIFT - PAGE_SHIFT)
+
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+	struct page *p = pgtable_alloc_one(mm, address, L1_USER_PGTABLE_ORDER);
+	return (pmd_t *)page_to_virt(p);
+}
+
+static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
+{
+	pgtable_free(mm, virt_to_page(pmdp), L1_USER_PGTABLE_ORDER);
+}
+
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+				  unsigned long address)
+{
+	__pgtable_free_tlb(tlb, virt_to_page(pmdp), address,
+			   L1_USER_PGTABLE_ORDER);
+}
+
+#endif /* __tilegx__ */
+
 #endif /* _ASM_TILE_PGALLOC_H */
diff --git a/arch/tile/include/asm/pgtable.h b/arch/tile/include/asm/pgtable.h
index 17ad0ed..ae43301 100644
--- a/arch/tile/include/asm/pgtable.h
+++ b/arch/tile/include/asm/pgtable.h
@@ -27,9 +27,11 @@
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <linux/pfn.h>
 #include <asm/processor.h>
 #include <asm/fixmap.h>
 #include <asm/system.h>
+#include <asm/page.h>
 
 struct mm_struct;
 struct vm_area_struct;
@@ -163,7 +165,7 @@ extern void set_page_homes(void);
   (pgprot_t) { ((oldprot).val & ~_PAGE_ALL) | (newprot).val }
 
 /* Just setting the PFN to zero suffices. */
-#define pte_pgprot(x) hv_pte_set_pfn((x), 0)
+#define pte_pgprot(x) hv_pte_set_pa((x), 0)
 
 /*
  * For PTEs and PDEs, we must clear the Present bit first when
@@ -263,7 +265,7 @@ static inline int pte_none(pte_t pte)
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	return hv_pte_get_pfn(pte);
+	return PFN_DOWN(hv_pte_get_pa(pte));
 }
 
 /* Set or get the remote cache cpu in a pgprot with remote caching. */
@@ -272,7 +274,7 @@ extern int get_remote_cache_cpu(pgprot_t prot);
 
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 {
-	return hv_pte_set_pfn(prot, pfn);
+	return hv_pte_set_pa(prot, PFN_PHYS(pfn));
 }
 
 /* Support for priority mappings. */
@@ -472,7 +474,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * OK for pte_lockptr(), since we just end up with potentially one
  * lock being used for several pte_t arrays.
  */
-#define pmd_page(pmd) pfn_to_page(HV_PTFN_TO_PFN(pmd_ptfn(pmd)))
+#define pmd_page(pmd) pfn_to_page(PFN_DOWN(HV_PTFN_TO_CPA(pmd_ptfn(pmd))))
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
diff --git a/arch/tile/include/asm/pgtable_32.h b/arch/tile/include/asm/pgtable_32.h
index 27e20f6..4ce4a7a 100644
--- a/arch/tile/include/asm/pgtable_32.h
+++ b/arch/tile/include/asm/pgtable_32.h
@@ -20,11 +20,12 @@
  * The level-1 index is defined by the huge page size.  A PGD is composed
  * of PTRS_PER_PGD pgd_t's and is the top level of the page table.
  */
-#define PGDIR_SHIFT	HV_LOG2_PAGE_SIZE_LARGE
-#define PGDIR_SIZE	HV_PAGE_SIZE_LARGE
+#define PGDIR_SHIFT	HPAGE_SHIFT
+#define PGDIR_SIZE	HPAGE_SIZE
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
-#define PTRS_PER_PGD	(1 << (32 - PGDIR_SHIFT))
-#define SIZEOF_PGD	(PTRS_PER_PGD * sizeof(pgd_t))
+#define PTRS_PER_PGD	_HV_L1_ENTRIES(HPAGE_SHIFT)
+#define PGD_INDEX(va)	_HV_L1_INDEX(va, HPAGE_SHIFT)
+#define SIZEOF_PGD	_HV_L1_SIZE(HPAGE_SHIFT)
 
 /*
  * The level-2 index is defined by the difference between the huge
@@ -33,8 +34,9 @@
  * Note that the hypervisor docs use PTE for what we call pte_t, so
  * this nomenclature is somewhat confusing.
  */
-#define PTRS_PER_PTE (1 << (HV_LOG2_PAGE_SIZE_LARGE - HV_LOG2_PAGE_SIZE_SMALL))
-#define SIZEOF_PTE	(PTRS_PER_PTE * sizeof(pte_t))
+#define PTRS_PER_PTE	_HV_L2_ENTRIES(HPAGE_SHIFT, PAGE_SHIFT)
+#define PTE_INDEX(va)	_HV_L2_INDEX(va, HPAGE_SHIFT, PAGE_SHIFT)
+#define SIZEOF_PTE	_HV_L2_SIZE(HPAGE_SHIFT, PAGE_SHIFT)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/tile/include/asm/pgtable_64.h b/arch/tile/include/asm/pgtable_64.h
index e105f3a..2492fa5 100644
--- a/arch/tile/include/asm/pgtable_64.h
+++ b/arch/tile/include/asm/pgtable_64.h
@@ -21,17 +21,19 @@
 #define PGDIR_SIZE	HV_L1_SPAN
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 #define PTRS_PER_PGD	HV_L0_ENTRIES
-#define SIZEOF_PGD	(PTRS_PER_PGD * sizeof(pgd_t))
+#define PGD_INDEX(va)	HV_L0_INDEX(va)
+#define SIZEOF_PGD	HV_L0_SIZE
 
 /*
  * The level-1 index is defined by the huge page size.  A PMD is composed
  * of PTRS_PER_PMD pgd_t's and is the middle level of the page table.
  */
-#define PMD_SHIFT	HV_LOG2_PAGE_SIZE_LARGE
-#define PMD_SIZE	HV_PAGE_SIZE_LARGE
+#define PMD_SHIFT	HPAGE_SHIFT
+#define PMD_SIZE	HPAGE_SIZE
 #define PMD_MASK	(~(PMD_SIZE-1))
-#define PTRS_PER_PMD	(1 << (PGDIR_SHIFT - PMD_SHIFT))
-#define SIZEOF_PMD	(PTRS_PER_PMD * sizeof(pmd_t))
+#define PTRS_PER_PMD	_HV_L1_ENTRIES(HPAGE_SHIFT)
+#define PMD_INDEX(va)	_HV_L1_INDEX(va, HPAGE_SHIFT)
+#define SIZEOF_PMD	_HV_L1_SIZE(HPAGE_SHIFT)
 
 /*
  * The level-2 index is defined by the difference between the huge
@@ -40,17 +42,19 @@
  * Note that the hypervisor docs use PTE for what we call pte_t, so
  * this nomenclature is somewhat confusing.
  */
-#define PTRS_PER_PTE (1 << (HV_LOG2_PAGE_SIZE_LARGE - HV_LOG2_PAGE_SIZE_SMALL))
-#define SIZEOF_PTE	(PTRS_PER_PTE * sizeof(pte_t))
+#define PTRS_PER_PTE	_HV_L2_ENTRIES(HPAGE_SHIFT, PAGE_SHIFT)
+#define PTE_INDEX(va)	_HV_L2_INDEX(va, HPAGE_SHIFT, PAGE_SHIFT)
+#define SIZEOF_PTE	_HV_L2_SIZE(HPAGE_SHIFT, PAGE_SHIFT)
 
 /*
- * Align the vmalloc area to an L2 page table, and leave a guard page
- * at the beginning and end.  The vmalloc code also puts in an internal
+ * Align the vmalloc area to an L2 page table.  Omit guard pages at
+ * the beginning and end for simplicity (particularly in the per-cpu
+ * memory allocation code).  The vmalloc code puts in an internal
  * guard page between each allocation.
  */
 #define _VMALLOC_END	HUGE_VMAP_BASE
-#define VMALLOC_END	(_VMALLOC_END - PAGE_SIZE)
-#define VMALLOC_START	(_VMALLOC_START + PAGE_SIZE)
+#define VMALLOC_END	_VMALLOC_END
+#define VMALLOC_START	_VMALLOC_START
 
 #define HUGE_VMAP_END	(HUGE_VMAP_BASE + PGDIR_SIZE)
 
@@ -98,7 +102,7 @@ static inline int pud_bad(pud_t pud)
  * A pud_t points to a pmd_t array.  Since we can have multiple per
  * page, we don't have a one-to-one mapping of pud_t's to pages.
  */
-#define pud_page(pud) pfn_to_page(HV_PTFN_TO_PFN(pud_ptfn(pud)))
+#define pud_page(pud) pfn_to_page(PFN_DOWN(HV_PTFN_TO_CPA(pud_ptfn(pud))))
 
 static inline unsigned long pud_index(unsigned long address)
 {
diff --git a/arch/tile/include/hv/drv_xgbe_intf.h b/arch/tile/include/hv/drv_xgbe_intf.h
index f13188a..2a20b26 100644
--- a/arch/tile/include/hv/drv_xgbe_intf.h
+++ b/arch/tile/include/hv/drv_xgbe_intf.h
@@ -460,7 +460,7 @@ typedef void* lepp_comp_t;
  *  linux's "MAX_SKB_FRAGS", and presumably over-estimates by one, for
  *  our page size of exactly 65536.  We add one for a "body" fragment.
  */
-#define LEPP_MAX_FRAGS (65536 / HV_PAGE_SIZE_SMALL + 2 + 1)
+#define LEPP_MAX_FRAGS (65536 / HV_DEFAULT_PAGE_SIZE_SMALL + 2 + 1)
 
 /** Total number of bytes needed for an lepp_tso_cmd_t. */
 #define LEPP_TSO_CMD_SIZE(num_frags, header_size) \
diff --git a/arch/tile/include/hv/hypervisor.h b/arch/tile/include/hv/hypervisor.h
index 793123e..16c18e4 100644
--- a/arch/tile/include/hv/hypervisor.h
+++ b/arch/tile/include/hv/hypervisor.h
@@ -17,8 +17,8 @@
  * The hypervisor's public API.
  */
 
-#ifndef _TILE_HV_H
-#define _TILE_HV_H
+#ifndef _HV_HV_H
+#define _HV_HV_H
 
 #include <arch/chip.h>
 
@@ -42,25 +42,29 @@
  */
 #define HV_L1_SPAN (__HV_SIZE_ONE << HV_LOG2_L1_SPAN)
 
-/** The log2 of the size of small pages, in bytes. This value should
- * be verified at runtime by calling hv_sysconf(HV_SYSCONF_PAGE_SIZE_SMALL).
+/** The log2 of the initial size of small pages, in bytes.
+ * See HV_DEFAULT_PAGE_SIZE_SMALL.
  */
-#define HV_LOG2_PAGE_SIZE_SMALL 16
+#define HV_LOG2_DEFAULT_PAGE_SIZE_SMALL 16
 
-/** The size of small pages, in bytes. This value should be verified
+/** The initial size of small pages, in bytes. This value should be verified
  * at runtime by calling hv_sysconf(HV_SYSCONF_PAGE_SIZE_SMALL).
+ * It may also be modified when installing a new context.
  */
-#define HV_PAGE_SIZE_SMALL (__HV_SIZE_ONE << HV_LOG2_PAGE_SIZE_SMALL)
+#define HV_DEFAULT_PAGE_SIZE_SMALL \
+  (__HV_SIZE_ONE << HV_LOG2_DEFAULT_PAGE_SIZE_SMALL)
 
-/** The log2 of the size of large pages, in bytes. This value should be
- * verified at runtime by calling hv_sysconf(HV_SYSCONF_PAGE_SIZE_LARGE).
+/** The log2 of the initial size of large pages, in bytes.
+ * See HV_DEFAULT_PAGE_SIZE_LARGE.
  */
-#define HV_LOG2_PAGE_SIZE_LARGE 24
+#define HV_LOG2_DEFAULT_PAGE_SIZE_LARGE 24
 
-/** The size of large pages, in bytes. This value should be verified
+/** The initial size of large pages, in bytes. This value should be verified
  * at runtime by calling hv_sysconf(HV_SYSCONF_PAGE_SIZE_LARGE).
+ * It may also be modified when installing a new context.
  */
-#define HV_PAGE_SIZE_LARGE (__HV_SIZE_ONE << HV_LOG2_PAGE_SIZE_LARGE)
+#define HV_DEFAULT_PAGE_SIZE_LARGE \
+  (__HV_SIZE_ONE << HV_LOG2_DEFAULT_PAGE_SIZE_LARGE)
 
 /** The log2 of the granularity at which page tables must be aligned;
  *  in other words, the CPA for a page table must have this many zero
@@ -401,7 +405,13 @@ typedef enum {
    *  that the temperature has hit an upper limit and is no longer being
    *  accurately tracked.
    */
-  HV_SYSCONF_BOARD_TEMP      = 6
+  HV_SYSCONF_BOARD_TEMP      = 6,
+
+  /** Legal page size bitmask for hv_install_context().
+   * For example, if 16KB and 64KB small pages are supported,
+   * it would return "HV_CTX_PG_SM_16K | HV_CTX_PG_SM_64K".
+   */
+  HV_SYSCONF_VALID_PAGE_SIZES = 7,
 
 } HV_SysconfQuery;
 
@@ -649,6 +659,12 @@ void hv_set_rtc(HV_RTCTime time);
  *  new page table does not need to contain any mapping for the
  *  hv_install_context address itself.
  *
+ *  At most one HV_CTX_PG_SM_* flag may be specified in "flags";
+ *  if multiple flags are specified, HV_EINVAL is returned.
+ *  Specifying none of the flags results in using the default page size.
+ *  All cores participating in a given client must request the same
+ *  page size, or the results are undefined.
+ *
  * @param page_table Root of the page table.
  * @param access PTE providing info on how to read the page table.  This
  *   value must be consistent between multiple tiles sharing a page table,
@@ -667,6 +683,11 @@ int hv_install_context(HV_PhysAddr page_table, HV_PTE access, HV_ASID asid,
 #define HV_CTX_DIRECTIO     0x1   /**< Direct I/O requests are accepted from
                                        PL0. */
 
+#define HV_CTX_PG_SM_4K     0x10  /**< Use 4K small pages, if available. */
+#define HV_CTX_PG_SM_16K    0x20  /**< Use 16K small pages, if available. */
+#define HV_CTX_PG_SM_64K    0x40  /**< Use 64K small pages, if available. */
+#define HV_CTX_PG_SM_MASK   0xf0  /**< Mask of all possible small pages. */
+
 #ifndef __ASSEMBLER__
 
 /** Value returned from hv_inquire_context(). */
@@ -1238,11 +1259,14 @@ HV_Errno hv_set_command_line(HV_VirtAddr buf, int length);
  * with the existing priority pages) or "red/black" (if they don't).
  * The bitmask provides information on which parts of the cache
  * have been used for pinned pages so far on this tile; if (1 << N)
- * appears in the bitmask, that indicates that a page has been marked
- * "priority" whose PFN equals N, mod 8.
+ * appears in the bitmask, that indicates that a 4KB region of the
+ * cache starting at (N * 4KB) is in use by a "priority" page.
+ * The portion of cache used by a particular page can be computed
+ * by taking the page's PA, modulo CHIP_L2_CACHE_SIZE(), and setting
+ * all the "4KB" bits corresponding to the actual page size.
  * @param bitmask A bitmap of priority page set values
  */
-void hv_set_caching(unsigned int bitmask);
+void hv_set_caching(unsigned long bitmask);
 
 
 /** Zero out a specified number of pages.
@@ -1868,15 +1892,6 @@ int hv_flush_remote(HV_PhysAddr cache_pa, unsigned long cache_control,
                                               of word */
 #define HV_PTE_PTFN_BITS             29  /**< Number of bits in a PTFN */
 
-/** Position of the PFN field within the PTE (subset of the PTFN). */
-#define HV_PTE_INDEX_PFN (HV_PTE_INDEX_PTFN + (HV_LOG2_PAGE_SIZE_SMALL - \
-                                               HV_LOG2_PAGE_TABLE_ALIGN))
-
-/** Length of the PFN field within the PTE (subset of the PTFN). */
-#define HV_PTE_INDEX_PFN_BITS (HV_PTE_INDEX_PTFN_BITS - \
-                               (HV_LOG2_PAGE_SIZE_SMALL - \
-                                HV_LOG2_PAGE_TABLE_ALIGN))
-
 /*
  * Legal values for the PTE's mode field
  */
@@ -2229,40 +2244,11 @@ hv_pte_set_mode(HV_PTE pte, unsigned int val)
  *
  * This field contains the upper bits of the CPA (client physical
  * address) of the target page; the complete CPA is this field with
- * HV_LOG2_PAGE_SIZE_SMALL zero bits appended to it.
+ * HV_LOG2_PAGE_TABLE_ALIGN zero bits appended to it.
  *
- * For PTEs in a level-1 page table where the Page bit is set, the
- * CPA must be aligned modulo the large page size.
- */
-static __inline unsigned int
-hv_pte_get_pfn(const HV_PTE pte)
-{
-  return pte.val >> HV_PTE_INDEX_PFN;
-}
-
-
-/** Set the page frame number into a PTE.  See hv_pte_get_pfn. */
-static __inline HV_PTE
-hv_pte_set_pfn(HV_PTE pte, unsigned int val)
-{
-  /*
-   * Note that the use of "PTFN" in the next line is intentional; we
-   * don't want any garbage lower bits left in that field.
-   */
-  pte.val &= ~(((1ULL << HV_PTE_PTFN_BITS) - 1) << HV_PTE_INDEX_PTFN);
-  pte.val |= (__hv64) val << HV_PTE_INDEX_PFN;
-  return pte;
-}
-
-/** Get the page table frame number from the PTE.
- *
- * This field contains the upper bits of the CPA (client physical
- * address) of the target page table; the complete CPA is this field with
- * with HV_PAGE_TABLE_ALIGN zero bits appended to it.
- *
- * For PTEs in a level-1 page table when the Page bit is not set, the
- * CPA must be aligned modulo the sticter of HV_PAGE_TABLE_ALIGN and
- * the level-2 page table size.
+ * For all PTEs in the lowest-level page table, and for all PTEs with
+ * the Page bit set in all page tables, the CPA must be aligned modulo
+ * the relevant page size.
  */
 static __inline unsigned long
 hv_pte_get_ptfn(const HV_PTE pte)
@@ -2270,7 +2256,6 @@ hv_pte_get_ptfn(const HV_PTE pte)
   return pte.val >> HV_PTE_INDEX_PTFN;
 }
 
-
 /** Set the page table frame number into a PTE.  See hv_pte_get_ptfn. */
 static __inline HV_PTE
 hv_pte_set_ptfn(HV_PTE pte, unsigned long val)
@@ -2280,6 +2265,20 @@ hv_pte_set_ptfn(HV_PTE pte, unsigned long val)
   return pte;
 }
 
+/** Get the client physical address from the PTE.  See hv_pte_set_ptfn. */
+static __inline HV_PhysAddr
+hv_pte_get_pa(const HV_PTE pte)
+{
+  return (__hv64) hv_pte_get_ptfn(pte) << HV_LOG2_PAGE_TABLE_ALIGN;
+}
+
+/** Set the client physical address into a PTE.  See hv_pte_get_ptfn. */
+static __inline HV_PTE
+hv_pte_set_pa(HV_PTE pte, HV_PhysAddr pa)
+{
+  return hv_pte_set_ptfn(pte, pa >> HV_LOG2_PAGE_TABLE_ALIGN);
+}
+
 
 /** Get the remote tile caching this page.
  *
@@ -2315,28 +2314,20 @@ hv_pte_set_lotar(HV_PTE pte, unsigned int val)
 
 #endif  /* !__ASSEMBLER__ */
 
-/** Converts a client physical address to a pfn. */
-#define HV_CPA_TO_PFN(p) ((p) >> HV_LOG2_PAGE_SIZE_SMALL)
-
-/** Converts a pfn to a client physical address. */
-#define HV_PFN_TO_CPA(p) (((HV_PhysAddr)(p)) << HV_LOG2_PAGE_SIZE_SMALL)
-
 /** Converts a client physical address to a ptfn. */
 #define HV_CPA_TO_PTFN(p) ((p) >> HV_LOG2_PAGE_TABLE_ALIGN)
 
 /** Converts a ptfn to a client physical address. */
 #define HV_PTFN_TO_CPA(p) (((HV_PhysAddr)(p)) << HV_LOG2_PAGE_TABLE_ALIGN)
 
-/** Converts a ptfn to a pfn. */
-#define HV_PTFN_TO_PFN(p) \
-  ((p) >> (HV_LOG2_PAGE_SIZE_SMALL - HV_LOG2_PAGE_TABLE_ALIGN))
-
-/** Converts a pfn to a ptfn. */
-#define HV_PFN_TO_PTFN(p) \
-  ((p) << (HV_LOG2_PAGE_SIZE_SMALL - HV_LOG2_PAGE_TABLE_ALIGN))
-
 #if CHIP_VA_WIDTH() > 32
 
+/*
+ * Note that we currently do not allow customizing the page size
+ * of the L0 pages, but fix them at 4GB, so we do not use the
+ * "_HV_xxx" nomenclature for the L0 macros.
+ */
+
 /** Log number of HV_PTE entries in L0 page table */
 #define HV_LOG2_L0_ENTRIES (CHIP_VA_WIDTH() - HV_LOG2_L1_SPAN)
 
@@ -2366,69 +2357,104 @@ hv_pte_set_lotar(HV_PTE pte, unsigned int val)
 #endif /* CHIP_VA_WIDTH() > 32 */
 
 /** Log number of HV_PTE entries in L1 page table */
-#define HV_LOG2_L1_ENTRIES (HV_LOG2_L1_SPAN - HV_LOG2_PAGE_SIZE_LARGE)
+#define _HV_LOG2_L1_ENTRIES(log2_page_size_large) \
+  (HV_LOG2_L1_SPAN - log2_page_size_large)
 
 /** Number of HV_PTE entries in L1 page table */
-#define HV_L1_ENTRIES (1 << HV_LOG2_L1_ENTRIES)
+#define _HV_L1_ENTRIES(log2_page_size_large) \
+  (1 << _HV_LOG2_L1_ENTRIES(log2_page_size_large))
 
 /** Log size of L1 page table in bytes */
-#define HV_LOG2_L1_SIZE (HV_LOG2_PTE_SIZE + HV_LOG2_L1_ENTRIES)
+#define _HV_LOG2_L1_SIZE(log2_page_size_large) \
+  (HV_LOG2_PTE_SIZE + _HV_LOG2_L1_ENTRIES(log2_page_size_large))
 
 /** Size of L1 page table in bytes */
-#define HV_L1_SIZE (1 << HV_LOG2_L1_SIZE)
+#define _HV_L1_SIZE(log2_page_size_large) \
+  (1 << _HV_LOG2_L1_SIZE(log2_page_size_large))
 
 /** Log number of HV_PTE entries in level-2 page table */
-#define HV_LOG2_L2_ENTRIES (HV_LOG2_PAGE_SIZE_LARGE - HV_LOG2_PAGE_SIZE_SMALL)
+#define _HV_LOG2_L2_ENTRIES(log2_page_size_large, log2_page_size_small) \
+  (log2_page_size_large - log2_page_size_small)
 
 /** Number of HV_PTE entries in level-2 page table */
-#define HV_L2_ENTRIES (1 << HV_LOG2_L2_ENTRIES)
+#define _HV_L2_ENTRIES(log2_page_size_large, log2_page_size_small) \
+  (1 << _HV_LOG2_L2_ENTRIES(log2_page_size_large, log2_page_size_small))
 
 /** Log size of level-2 page table in bytes */
-#define HV_LOG2_L2_SIZE (HV_LOG2_PTE_SIZE + HV_LOG2_L2_ENTRIES)
+#define _HV_LOG2_L2_SIZE(log2_page_size_large, log2_page_size_small) \
+  (HV_LOG2_PTE_SIZE + \
+   _HV_LOG2_L2_ENTRIES(log2_page_size_large, log2_page_size_small))
 
 /** Size of level-2 page table in bytes */
-#define HV_L2_SIZE (1 << HV_LOG2_L2_SIZE)
+#define _HV_L2_SIZE(log2_page_size_large, log2_page_size_small) \
+  (1 << _HV_LOG2_L2_SIZE(log2_page_size_large, log2_page_size_small))
 
 #ifdef __ASSEMBLER__
 
 #if CHIP_VA_WIDTH() > 32
 
 /** Index in L1 for a specific VA */
-#define HV_L1_INDEX(va) \
-  (((va) >> HV_LOG2_PAGE_SIZE_LARGE) & (HV_L1_ENTRIES - 1))
+#define _HV_L1_INDEX(va, log2_page_size_large) \
+  (((va) >> log2_page_size_large) & (_HV_L1_ENTRIES(log2_page_size_large) - 1))
 
 #else /* CHIP_VA_WIDTH() > 32 */
 
 /** Index in L1 for a specific VA */
-#define HV_L1_INDEX(va) \
-  (((va) >> HV_LOG2_PAGE_SIZE_LARGE))
+#define _HV_L1_INDEX(va, log2_page_size_large) \
+  (((va) >> log2_page_size_large))
 
 #endif /* CHIP_VA_WIDTH() > 32 */
 
 /** Index in level-2 page table for a specific VA */
-#define HV_L2_INDEX(va) \
-  (((va) >> HV_LOG2_PAGE_SIZE_SMALL) & (HV_L2_ENTRIES - 1))
+#define _HV_L2_INDEX(va, log2_page_size_large, log2_page_size_small) \
+  (((va) >> log2_page_size_small) & \
+   (_HV_L2_ENTRIES(log2_page_size_large, log2_page_size_small) - 1))
 
 #else /* __ASSEMBLER __ */
 
 #if CHIP_VA_WIDTH() > 32
 
 /** Index in L1 for a specific VA */
-#define HV_L1_INDEX(va) \
-  (((HV_VirtAddr)(va) >> HV_LOG2_PAGE_SIZE_LARGE) & (HV_L1_ENTRIES - 1))
+#define _HV_L1_INDEX(va, log2_page_size_large) \
+  (((HV_VirtAddr)(va) >> log2_page_size_large) & \
+   (_HV_L1_ENTRIES(log2_page_size_large) - 1))
 
 #else /* CHIP_VA_WIDTH() > 32 */
 
 /** Index in L1 for a specific VA */
-#define HV_L1_INDEX(va) \
-  (((HV_VirtAddr)(va) >> HV_LOG2_PAGE_SIZE_LARGE))
+#define _HV_L1_INDEX(va, log2_page_size_large) \
+  (((HV_VirtAddr)(va) >> log2_page_size_large))
 
 #endif /* CHIP_VA_WIDTH() > 32 */
 
 /** Index in level-2 page table for a specific VA */
-#define HV_L2_INDEX(va) \
-  (((HV_VirtAddr)(va) >> HV_LOG2_PAGE_SIZE_SMALL) & (HV_L2_ENTRIES - 1))
+#define _HV_L2_INDEX(va, log2_page_size_large, log2_page_size_small) \
+  (((HV_VirtAddr)(va) >> log2_page_size_small) & \
+   (_HV_L2_ENTRIES(log2_page_size_large, log2_page_size_small) - 1))
 
 #endif /* __ASSEMBLER __ */
 
-#endif /* _TILE_HV_H */
+/** Position of the PFN field within the PTE (subset of the PTFN). */
+#define _HV_PTE_INDEX_PFN(log2_page_size) \
+  (HV_PTE_INDEX_PTFN + (log2_page_size - HV_LOG2_PAGE_TABLE_ALIGN))
+
+/** Length of the PFN field within the PTE (subset of the PTFN). */
+#define _HV_PTE_INDEX_PFN_BITS(log2_page_size) \
+  (HV_PTE_INDEX_PTFN_BITS - (log2_page_size - HV_LOG2_PAGE_TABLE_ALIGN))
+
+/** Converts a client physical address to a pfn. */
+#define _HV_CPA_TO_PFN(p, log2_page_size) ((p) >> log2_page_size)
+
+/** Converts a pfn to a client physical address. */
+#define _HV_PFN_TO_CPA(p, log2_page_size) \
+  (((HV_PhysAddr)(p)) << log2_page_size)
+
+/** Converts a ptfn to a pfn. */
+#define _HV_PTFN_TO_PFN(p, log2_page_size) \
+  ((p) >> (log2_page_size - HV_LOG2_PAGE_TABLE_ALIGN))
+
+/** Converts a pfn to a ptfn. */
+#define _HV_PFN_TO_PTFN(p, log2_page_size) \
+  ((p) << (log2_page_size - HV_LOG2_PAGE_TABLE_ALIGN))
+
+#endif /* _HV_HV_H */
diff --git a/arch/tile/kernel/head_32.S b/arch/tile/kernel/head_32.S
index 1a39b7c..f71bfee 100644
--- a/arch/tile/kernel/head_32.S
+++ b/arch/tile/kernel/head_32.S
@@ -69,7 +69,7 @@ ENTRY(_start)
 	}
 	{
 	  moveli lr, lo16(1f)
-	  move r5, zero
+	  moveli r5, CTX_PAGE_FLAG
 	}
 	{
 	  auli lr, lr, ha16(1f)
@@ -141,11 +141,11 @@ ENTRY(empty_zero_page)
 
 	.macro PTE va, cpa, bits1, no_org=0
 	.ifeq \no_org
-	.org swapper_pg_dir + HV_L1_INDEX(\va) * HV_PTE_SIZE
+	.org swapper_pg_dir + PGD_INDEX(\va) * HV_PTE_SIZE
 	.endif
 	.word HV_PTE_PAGE | HV_PTE_DIRTY | HV_PTE_PRESENT | HV_PTE_ACCESSED | \
 	      (HV_PTE_MODE_CACHE_NO_L3 << HV_PTE_INDEX_MODE)
-	.word (\bits1) | (HV_CPA_TO_PFN(\cpa) << (HV_PTE_INDEX_PFN - 32))
+	.word (\bits1) | (HV_CPA_TO_PTFN(\cpa) << (HV_PTE_INDEX_PTFN - 32))
 	.endm
 
 __PAGE_ALIGNED_DATA
@@ -166,7 +166,7 @@ ENTRY(swapper_pg_dir)
 	/* The true text VAs are mapped as VA = PA + MEM_SV_INTRPT */
 	PTE MEM_SV_INTRPT, 0, (1 << (HV_PTE_INDEX_READABLE - 32)) | \
 			      (1 << (HV_PTE_INDEX_EXECUTABLE - 32))
-	.org swapper_pg_dir + HV_L1_SIZE
+	.org swapper_pg_dir + PGDIR_SIZE
 	END(swapper_pg_dir)
 
 	/*
diff --git a/arch/tile/kernel/head_64.S b/arch/tile/kernel/head_64.S
index 6bc3a93..f9a2734 100644
--- a/arch/tile/kernel/head_64.S
+++ b/arch/tile/kernel/head_64.S
@@ -114,7 +114,7 @@ ENTRY(_start)
 	  shl16insli r0, r0, hw0(swapper_pg_dir - PAGE_OFFSET)
 	}
 	{
-	  move r3, zero
+	  moveli r3, CTX_PAGE_FLAG
 	  j hv_install_context
 	}
 1:
@@ -210,19 +210,19 @@ ENTRY(empty_zero_page)
 	.macro PTE cpa, bits1
 	.quad HV_PTE_PAGE | HV_PTE_DIRTY | HV_PTE_PRESENT | HV_PTE_ACCESSED |\
 	      HV_PTE_GLOBAL | (HV_PTE_MODE_CACHE_NO_L3 << HV_PTE_INDEX_MODE) |\
-	      (\bits1) | (HV_CPA_TO_PFN(\cpa) << HV_PTE_INDEX_PFN)
+	      (\bits1) | (HV_CPA_TO_PTFN(\cpa) << HV_PTE_INDEX_PTFN)
 	.endm
 
 __PAGE_ALIGNED_DATA
 	.align PAGE_SIZE
 ENTRY(swapper_pg_dir)
-	.org swapper_pg_dir + HV_L0_INDEX(PAGE_OFFSET) * HV_PTE_SIZE
+	.org swapper_pg_dir + PGD_INDEX(PAGE_OFFSET) * HV_PTE_SIZE
 .Lsv_data_pmd:
 	.quad 0  /* PTE temp_data_pmd - PAGE_OFFSET, 0 */
-	.org swapper_pg_dir + HV_L0_INDEX(MEM_SV_START) * HV_PTE_SIZE
+	.org swapper_pg_dir + PGD_INDEX(MEM_SV_START) * HV_PTE_SIZE
 .Lsv_code_pmd:
 	.quad 0  /* PTE temp_code_pmd - PAGE_OFFSET, 0 */
-	.org swapper_pg_dir + HV_L0_SIZE
+	.org swapper_pg_dir + SIZEOF_PGD
 	END(swapper_pg_dir)
 
 	.align HV_PAGE_TABLE_ALIGN
@@ -233,11 +233,11 @@ ENTRY(temp_data_pmd)
 	 * permissions later.
 	 */
 	.set addr, 0
-	.rept HV_L1_ENTRIES
+	.rept PTRS_PER_PMD
 	PTE addr, HV_PTE_READABLE | HV_PTE_WRITABLE
-	.set addr, addr + HV_PAGE_SIZE_LARGE
+	.set addr, addr + HPAGE_SIZE
 	.endr
-	.org temp_data_pmd + HV_L1_SIZE
+	.org temp_data_pmd + SIZEOF_PMD
 	END(temp_data_pmd)
 
 	.align HV_PAGE_TABLE_ALIGN
@@ -248,11 +248,11 @@ ENTRY(temp_code_pmd)
 	 * permissions later.
 	 */
 	.set addr, 0
-	.rept HV_L1_ENTRIES
+	.rept PTRS_PER_PMD
 	PTE addr, HV_PTE_READABLE | HV_PTE_EXECUTABLE
-	.set addr, addr + HV_PAGE_SIZE_LARGE
+	.set addr, addr + HPAGE_SIZE
 	.endr
-	.org temp_code_pmd + HV_L1_SIZE
+	.org temp_code_pmd + SIZEOF_PMD
 	END(temp_code_pmd)
 
 	/*
diff --git a/arch/tile/kernel/machine_kexec.c b/arch/tile/kernel/machine_kexec.c
index 6255f2e..b0fa37c 100644
--- a/arch/tile/kernel/machine_kexec.c
+++ b/arch/tile/kernel/machine_kexec.c
@@ -251,6 +251,7 @@ static void setup_quasi_va_is_pa(void)
 void machine_kexec(struct kimage *image)
 {
 	void *reboot_code_buffer;
+	pte_t *ptep;
 	void (*rnk)(unsigned long, void *, unsigned long)
 		__noreturn;
 
@@ -266,8 +267,10 @@ void machine_kexec(struct kimage *image)
 	 */
 	homecache_change_page_home(image->control_code_page, 0,
 				   smp_processor_id());
-	reboot_code_buffer = vmap(&image->control_code_page, 1, 0,
-				  __pgprot(_PAGE_KERNEL | _PAGE_EXECUTABLE));
+	reboot_code_buffer = page_address(image->control_code_page);
+	BUG_ON(reboot_code_buffer == NULL);
+	ptep = virt_to_pte(NULL, (unsigned long)reboot_code_buffer);
+	__set_pte(ptep, pte_mkexec(*ptep));
 	memcpy(reboot_code_buffer, relocate_new_kernel,
 	       relocate_new_kernel_size);
 	__flush_icache_range(
diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 5f85d8b..1a7276a 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1390,13 +1390,13 @@ void __init setup_per_cpu_areas(void)
 		for (i = 0; i < size; i += PAGE_SIZE, ++pfn, ++pg) {
 
 			/* Update the vmalloc mapping and page home. */
-			pte_t *ptep =
-				virt_to_pte(NULL, (unsigned long)ptr + i);
+			unsigned long addr = (unsigned long)ptr + i;
+			pte_t *ptep = virt_to_pte(NULL, addr);
 			pte_t pte = *ptep;
 			BUG_ON(pfn != pte_pfn(pte));
 			pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_TILE_L3);
 			pte = set_remote_cache_cpu(pte, cpu);
-			set_pte(ptep, pte);
+			set_pte_at(&init_mm, addr, ptep, pte);
 
 			/* Update the lowmem mapping for consistency. */
 			lowmem_va = (unsigned long)pfn_to_kaddr(pfn);
@@ -1409,7 +1409,7 @@ void __init setup_per_cpu_areas(void)
 				BUG_ON(pte_huge(*ptep));
 			}
 			BUG_ON(pfn != pte_pfn(*ptep));
-			set_pte(ptep, pte);
+			set_pte_at(&init_mm, lowmem_va, ptep, pte);
 		}
 	}
 
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index c52224d..7579607 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -216,7 +216,7 @@ void __init ipi_init(void)
 		if (hv_get_ipi_pte(tile, KERNEL_PL, &pte) != 0)
 			panic("Failed to initialize IPI for cpu %d\n", cpu);
 
-		offset = hv_pte_get_pfn(pte) << PAGE_SHIFT;
+		offset = PFN_PHYS(pte_pfn(pte));
 		ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
 	}
 #endif
diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c
index 37ee4d0..9016d1c 100644
--- a/arch/tile/kernel/stack.c
+++ b/arch/tile/kernel/stack.c
@@ -56,12 +56,12 @@ static int valid_address(struct KBacktraceIterator *kbt, unsigned long address)
 	if (l1_pgtable == NULL)
 		return 0;	/* can't read user space in other tasks */
 
+	pte = l1_pgtable[PGD_INDEX(address)];
 #ifdef CONFIG_64BIT
 	/* Find the real l1_pgtable by looking in the l0_pgtable. */
-	pte = l1_pgtable[HV_L0_INDEX(address)];
 	if (!hv_pte_get_present(pte))
 		return 0;
-	pfn = hv_pte_get_pfn(pte);
+	pfn = pte_pfn(pte);
 	if (pte_huge(pte)) {
 		if (!pfn_valid(pfn)) {
 			pr_err("L0 huge page has bad pfn %#lx\n", pfn);
@@ -72,11 +72,11 @@ static int valid_address(struct KBacktraceIterator *kbt, unsigned long address)
 	page = pfn_to_page(pfn);
 	BUG_ON(PageHighMem(page));  /* No HIGHMEM on 64-bit. */
 	l1_pgtable = (HV_PTE *)pfn_to_kaddr(pfn);
+	pte = l1_pgtable[PMD_INDEX(address)];
 #endif
-	pte = l1_pgtable[HV_L1_INDEX(address)];
 	if (!hv_pte_get_present(pte))
 		return 0;
-	pfn = hv_pte_get_pfn(pte);
+	pfn = pte_pfn(pte);
 	if (pte_huge(pte)) {
 		if (!pfn_valid(pfn)) {
 			pr_err("huge page has bad pfn %#lx\n", pfn);
@@ -87,12 +87,11 @@ static int valid_address(struct KBacktraceIterator *kbt, unsigned long address)
 
 	page = pfn_to_page(pfn);
 	if (PageHighMem(page)) {
-		pr_err("L2 page table not in LOWMEM (%#llx)\n",
-		       HV_PFN_TO_CPA(pfn));
+		pr_err("L2 page table not in LOWMEM (%#llx)\n", PFN_PHYS(pfn));
 		return 0;
 	}
 	l2_pgtable = (HV_PTE *)pfn_to_kaddr(pfn);
-	pte = l2_pgtable[HV_L2_INDEX(address)];
+	pte = l2_pgtable[PTE_INDEX(address)];
 	return hv_pte_get_present(pte) && hv_pte_get_readable(pte);
 }
 
diff --git a/arch/tile/lib/memcpy_tile64.c b/arch/tile/lib/memcpy_tile64.c
index b2fe15e..3bc4b4e 100644
--- a/arch/tile/lib/memcpy_tile64.c
+++ b/arch/tile/lib/memcpy_tile64.c
@@ -160,7 +160,7 @@ retry_source:
 			break;
 		if (get_remote_cache_cpu(src_pte) == smp_processor_id())
 			break;
-		src_page = pfn_to_page(hv_pte_get_pfn(src_pte));
+		src_page = pfn_to_page(pte_pfn(src_pte));
 		get_page(src_page);
 		if (pte_val(src_pte) != pte_val(*src_ptep)) {
 			put_page(src_page);
@@ -168,7 +168,7 @@ retry_source:
 		}
 		if (pte_huge(src_pte)) {
 			/* Adjust the PTE to correspond to a small page */
-			int pfn = hv_pte_get_pfn(src_pte);
+			int pfn = pte_pfn(src_pte);
 			pfn += (((unsigned long)source & (HPAGE_SIZE-1))
 				>> PAGE_SHIFT);
 			src_pte = pfn_pte(pfn, src_pte);
@@ -188,7 +188,7 @@ retry_dest:
 			put_page(src_page);
 			break;
 		}
-		dst_page = pfn_to_page(hv_pte_get_pfn(dst_pte));
+		dst_page = pfn_to_page(pte_pfn(dst_pte));
 		if (dst_page == src_page) {
 			/*
 			 * Source and dest are on the same page; this
@@ -206,7 +206,7 @@ retry_dest:
 		}
 		if (pte_huge(dst_pte)) {
 			/* Adjust the PTE to correspond to a small page */
-			int pfn = hv_pte_get_pfn(dst_pte);
+			int pfn = pte_pfn(dst_pte);
 			pfn += (((unsigned long)dest & (HPAGE_SIZE-1))
 				>> PAGE_SHIFT);
 			dst_pte = pfn_pte(pfn, dst_pte);
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 51c5e51..d1c6391 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -83,7 +83,7 @@ static int num_l2_ptes[MAX_NUMNODES];
 
 static void init_prealloc_ptes(int node, int pages)
 {
-	BUG_ON(pages & (HV_L2_ENTRIES-1));
+	BUG_ON(pages & (PTRS_PER_PTE - 1));
 	if (pages) {
 		num_l2_ptes[node] = pages;
 		l2_ptes[node] = __alloc_bootmem(pages * sizeof(pte_t),
@@ -132,14 +132,9 @@ static void __init assign_pte(pmd_t *pmd, pte_t *page_table)
 
 #ifdef __tilegx__
 
-#if HV_L1_SIZE != HV_L2_SIZE
-# error Rework assumption that L1 and L2 page tables are same size.
-#endif
-
-/* Since pmd_t arrays and pte_t arrays are the same size, just use casts. */
 static inline pmd_t *alloc_pmd(void)
 {
-	return (pmd_t *)alloc_pte();
+	return __alloc_bootmem(L1_KERNEL_PGTABLE_SIZE, HV_PAGE_TABLE_ALIGN, 0);
 }
 
 static inline void assign_pmd(pud_t *pud, pmd_t *pmd)
@@ -808,7 +803,7 @@ void __init paging_init(void)
 	 * changing init_mm once we get up and running, and there's no
 	 * need for e.g. vmalloc_sync_all().
 	 */
-	BUILD_BUG_ON(pgd_index(VMALLOC_START) != pgd_index(VMALLOC_END));
+	BUILD_BUG_ON(pgd_index(VMALLOC_START) != pgd_index(VMALLOC_END - 1));
 	pud = pud_offset(pgd_base + pgd_index(VMALLOC_START), VMALLOC_START);
 	assign_pmd(pud, alloc_pmd());
 #endif
diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index de7d8e2..14e7e9e 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -288,13 +288,12 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 #define L2_USER_PGTABLE_PAGES (1 << L2_USER_PGTABLE_ORDER)
 
-struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+struct page *pgtable_alloc_one(struct mm_struct *mm, unsigned long address,
+			       int order)
 {
 	gfp_t flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO;
 	struct page *p;
-#if L2_USER_PGTABLE_ORDER > 0
 	int i;
-#endif
 
 #ifdef CONFIG_HIGHPTE
 	flags |= __GFP_HIGHMEM;
@@ -304,17 +303,15 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	if (p == NULL)
 		return NULL;
 
-#if L2_USER_PGTABLE_ORDER > 0
 	/*
 	 * Make every page have a page_count() of one, not just the first.
 	 * We don't use __GFP_COMP since it doesn't look like it works
 	 * correctly with tlb_remove_page().
 	 */
-	for (i = 1; i < L2_USER_PGTABLE_PAGES; ++i) {
+	for (i = 1; i < order; ++i) {
 		init_page_count(p+i);
 		inc_zone_page_state(p+i, NR_PAGETABLE);
 	}
-#endif
 
 	pgtable_page_ctor(p);
 	return p;
@@ -325,28 +322,28 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
  * process).  We have to correct whatever pte_alloc_one() did before
  * returning the pages to the allocator.
  */
-void pte_free(struct mm_struct *mm, struct page *p)
+void pgtable_free(struct mm_struct *mm, struct page *p, int order)
 {
 	int i;
 
 	pgtable_page_dtor(p);
 	__free_page(p);
 
-	for (i = 1; i < L2_USER_PGTABLE_PAGES; ++i) {
+	for (i = 1; i < order; ++i) {
 		__free_page(p+i);
 		dec_zone_page_state(p+i, NR_PAGETABLE);
 	}
 }
 
-void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
-		    unsigned long address)
+void __pgtable_free_tlb(struct mmu_gather *tlb, struct page *pte,
+			unsigned long address, int order)
 {
 	int i;
 
 	pgtable_page_dtor(pte);
 	tlb_remove_page(tlb, pte);
 
-	for (i = 1; i < L2_USER_PGTABLE_PAGES; ++i) {
+	for (i = 1; i < order; ++i) {
 		tlb_remove_page(tlb, pte + i);
 		dec_zone_page_state(pte + i, NR_PAGETABLE);
 	}
@@ -481,7 +478,7 @@ void set_pte(pte_t *ptep, pte_t pte)
 /* Can this mm load a PTE with cached_priority set? */
 static inline int mm_is_priority_cached(struct mm_struct *mm)
 {
-	return mm->context.priority_cached;
+	return mm->context.priority_cached != 0;
 }
 
 /*
@@ -491,8 +488,8 @@ static inline int mm_is_priority_cached(struct mm_struct *mm)
 void start_mm_caching(struct mm_struct *mm)
 {
 	if (!mm_is_priority_cached(mm)) {
-		mm->context.priority_cached = -1U;
-		hv_set_caching(-1U);
+		mm->context.priority_cached = -1UL;
+		hv_set_caching(-1UL);
 	}
 }
 
@@ -507,7 +504,7 @@ void start_mm_caching(struct mm_struct *mm)
  * Presumably we'll come back later and have more luck and clear
  * the value then; for now we'll just keep the cache marked for priority.
  */
-static unsigned int update_priority_cached(struct mm_struct *mm)
+static unsigned long update_priority_cached(struct mm_struct *mm)
 {
 	if (mm->context.priority_cached && down_write_trylock(&mm->mmap_sem)) {
 		struct vm_area_struct *vm;
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: avoid false corrupt frame warning in early boot
       [not found] <4F761E1C.80808.com>
                   ` (11 preceding siblings ...)
  2012-03-29 17:58 ` [PATCH] arch/tile: Allow tilegx to build with either 16K or 64K page size Chris Metcalf
@ 2012-03-29 18:02 ` Chris Metcalf
  2012-03-29 18:05 ` [PATCH] arch/tile: make sure to build memcpy_user_64 without frame pointer Chris Metcalf
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 18:02 UTC (permalink / raw)
  To: Chris Metcalf, Paul E. McKenney, Frederic Weisbecker,
	Josh Triplett, Dmitry Torokhov, linux-kernel

With lockstat we can end up trying to get a backtrace before
"high_memory" is initialized, so don't worry about range testing
if it is zero.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/process.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 4c1ac6e..3be7eb5 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -286,7 +286,7 @@ struct task_struct *validate_current(void)
 	static struct task_struct corrupt = { .comm = "<corrupt>" };
 	struct task_struct *tsk = current;
 	if (unlikely((unsigned long)tsk < PAGE_OFFSET ||
-		     (void *)tsk > high_memory ||
+		     (high_memory && (void *)tsk > high_memory) ||
 		     ((unsigned long)tsk & (__alignof__(*tsk) - 1)) != 0)) {
 		pr_err("Corrupt 'current' %p (sp %#lx)\n", tsk, stack_pointer);
 		tsk = &corrupt;
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: make sure to build memcpy_user_64 without frame pointer
       [not found] <4F761E1C.80808.com>
                   ` (12 preceding siblings ...)
  2012-03-29 18:02 ` [PATCH] arch/tile: avoid false corrupt frame warning in early boot Chris Metcalf
@ 2012-03-29 18:05 ` Chris Metcalf
  2012-03-29 18:06 ` [PATCH] arch/tile: various improvements to stack backtracer Chris Metcalf
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 18:05 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, linux-kernel

Add a comment explaining why this is important, and add a CFLAGS_REMOVE
clause to the Makefile to make sure it happens.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/lib/Makefile         |    1 +
 arch/tile/lib/memcpy_user_64.c |    8 +++++++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/tile/lib/Makefile b/arch/tile/lib/Makefile
index 0c26086..985f598 100644
--- a/arch/tile/lib/Makefile
+++ b/arch/tile/lib/Makefile
@@ -7,6 +7,7 @@ lib-y = cacheflush.o checksum.o cpumask.o delay.o uaccess.o \
 	strchr_$(BITS).o strlen_$(BITS).o
 
 ifeq ($(CONFIG_TILEGX),y)
+CFLAGS_REMOVE_memcpy_user_64.o = -fno-omit-frame-pointer
 lib-y += memcpy_user_64.o
 else
 lib-y += atomic_32.o atomic_asm_32.o memcpy_tile64.o
diff --git a/arch/tile/lib/memcpy_user_64.c b/arch/tile/lib/memcpy_user_64.c
index 4763b3a..37440ca 100644
--- a/arch/tile/lib/memcpy_user_64.c
+++ b/arch/tile/lib/memcpy_user_64.c
@@ -14,7 +14,13 @@
  * Do memcpy(), but trap and return "n" when a load or store faults.
  *
  * Note: this idiom only works when memcpy() compiles to a leaf function.
- * If "sp" is updated during memcpy, the "jrp lr" will be incorrect.
+ * Here, a leaf function means not only that it makes no calls, but
+ * also that it performs no stack operations (sp, stack frame pointer)
+ * and uses no callee-saved registers; otherwise "jrp lr" will be
+ * incorrect, since the stack-frame unwind is bypassed.  memcpy() is
+ * simple enough that these conditions hold here, but we need to be
+ * careful when modifying this file.  This is not a clean solution,
+ * but it is the best one so far.
  *
  * Also note that we are capturing "n" from the containing scope here.
  */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: various improvements to stack backtracer
       [not found] <4F761E1C.80808.com>
                   ` (13 preceding siblings ...)
  2012-03-29 18:05 ` [PATCH] arch/tile: make sure to build memcpy_user_64 without frame pointer Chris Metcalf
@ 2012-03-29 18:06 ` Chris Metcalf
  2012-03-29 18:52 ` [PATCH] arch/tile: work around a hardware issue with the return-address stack Chris Metcalf
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 18:06 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Fix a long-standing bug in the stack backtracer where we would print
garbage to the console instead of kernel function names if the kernel
wasn't built with symbol support (e.g. mboot).

Make sure to tag every line of userspace backtrace output when we
actually hold the mmap_sem; that way, if a line has no tag, we know
it's because we couldn't trylock the semaphore.

Stop doing a TLB flush and examining page tables during backtrace.
Instead, just trust that __copy_from_user_inatomic() will properly fault
and return a failure, which it should do in all cases.

Fix a latent bug where the backtracer would directly examine a signal
context in user space, rather than copying it safely to kernel memory
first.  This meant that a race with another thread could potentially
have caused a kernel panic.

Guard against unaligned sp when trying to restart backtrace at an
interrupt or signal handler point in the kernel backtracer.

Report kernel symbolic information for the call instruction rather
than for the following instruction.  We still report the actual numeric
address corresponding to the instruction after the call, for the sake
of consistency with the normal expectations for stack backtracers.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/stack.h |    1 -
 arch/tile/kernel/stack.c      |  230 ++++++++++++++++++++---------------------
 2 files changed, 112 insertions(+), 119 deletions(-)

diff --git a/arch/tile/include/asm/stack.h b/arch/tile/include/asm/stack.h
index 4d97a2d..0e9d382 100644
--- a/arch/tile/include/asm/stack.h
+++ b/arch/tile/include/asm/stack.h
@@ -25,7 +25,6 @@
 struct KBacktraceIterator {
 	BacktraceIterator it;
 	struct task_struct *task;     /* task we are backtracing */
-	pte_t *pgtable;		      /* page table for user space access */
 	int end;		      /* iteration complete. */
 	int new_context;              /* new context is starting */
 	int profile;                  /* profiling, so stop on async intrpt */
diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c
index 9016d1c..1ab6377 100644
--- a/arch/tile/kernel/stack.c
+++ b/arch/tile/kernel/stack.c
@@ -21,9 +21,10 @@
 #include <linux/stacktrace.h>
 #include <linux/uaccess.h>
 #include <linux/mmzone.h>
+#include <linux/dcache.h>
+#include <linux/fs.h>
 #include <asm/backtrace.h>
 #include <asm/page.h>
-#include <asm/tlbflush.h>
 #include <asm/ucontext.h>
 #include <asm/sigframe.h>
 #include <asm/stack.h>
@@ -44,71 +45,23 @@ static int in_kernel_stack(struct KBacktraceIterator *kbt, unsigned long sp)
 	return sp >= kstack_base && sp < kstack_base + THREAD_SIZE;
 }
 
-/* Is address valid for reading? */
-static int valid_address(struct KBacktraceIterator *kbt, unsigned long address)
-{
-	HV_PTE *l1_pgtable = kbt->pgtable;
-	HV_PTE *l2_pgtable;
-	unsigned long pfn;
-	HV_PTE pte;
-	struct page *page;
-
-	if (l1_pgtable == NULL)
-		return 0;	/* can't read user space in other tasks */
-
-	pte = l1_pgtable[PGD_INDEX(address)];
-#ifdef CONFIG_64BIT
-	/* Find the real l1_pgtable by looking in the l0_pgtable. */
-	if (!hv_pte_get_present(pte))
-		return 0;
-	pfn = pte_pfn(pte);
-	if (pte_huge(pte)) {
-		if (!pfn_valid(pfn)) {
-			pr_err("L0 huge page has bad pfn %#lx\n", pfn);
-			return 0;
-		}
-		return hv_pte_get_present(pte) && hv_pte_get_readable(pte);
-	}
-	page = pfn_to_page(pfn);
-	BUG_ON(PageHighMem(page));  /* No HIGHMEM on 64-bit. */
-	l1_pgtable = (HV_PTE *)pfn_to_kaddr(pfn);
-	pte = l1_pgtable[PMD_INDEX(address)];
-#endif
-	if (!hv_pte_get_present(pte))
-		return 0;
-	pfn = pte_pfn(pte);
-	if (pte_huge(pte)) {
-		if (!pfn_valid(pfn)) {
-			pr_err("huge page has bad pfn %#lx\n", pfn);
-			return 0;
-		}
-		return hv_pte_get_present(pte) && hv_pte_get_readable(pte);
-	}
-
-	page = pfn_to_page(pfn);
-	if (PageHighMem(page)) {
-		pr_err("L2 page table not in LOWMEM (%#llx)\n", PFN_PHYS(pfn));
-		return 0;
-	}
-	l2_pgtable = (HV_PTE *)pfn_to_kaddr(pfn);
-	pte = l2_pgtable[PTE_INDEX(address)];
-	return hv_pte_get_present(pte) && hv_pte_get_readable(pte);
-}
-
 /* Callback for backtracer; basically a glorified memcpy */
 static bool read_memory_func(void *result, unsigned long address,
 			     unsigned int size, void *vkbt)
 {
 	int retval;
 	struct KBacktraceIterator *kbt = (struct KBacktraceIterator *)vkbt;
+
+	if (address == 0)
+		return 0;
 	if (__kernel_text_address(address)) {
 		/* OK to read kernel code. */
 	} else if (address >= PAGE_OFFSET) {
 		/* We only tolerate kernel-space reads of this task's stack */
 		if (!in_kernel_stack(kbt, address))
 			return 0;
-	} else if (!valid_address(kbt, address)) {
-		return 0;	/* invalid user-space address */
+	} else if (!kbt->is_current) {
+		return 0;	/* can't read from other user address spaces */
 	}
 	pagefault_disable();
 	retval = __copy_from_user_inatomic(result,
@@ -126,6 +79,8 @@ static struct pt_regs *valid_fault_handler(struct KBacktraceIterator* kbt)
 	unsigned long sp = kbt->it.sp;
 	struct pt_regs *p;
 
+	if (sp % sizeof(long) != 0)
+		return NULL;
 	if (!in_kernel_stack(kbt, sp))
 		return NULL;
 	if (!in_kernel_stack(kbt, sp + C_ABI_SAVE_AREA_SIZE + PTREGS_SIZE-1))
@@ -168,27 +123,27 @@ static int is_sigreturn(unsigned long pc)
 }
 
 /* Return a pt_regs pointer for a valid signal handler frame */
-static struct pt_regs *valid_sigframe(struct KBacktraceIterator* kbt)
+static struct pt_regs *valid_sigframe(struct KBacktraceIterator* kbt,
+				      struct rt_sigframe* kframe)
 {
 	BacktraceIterator *b = &kbt->it;
 
-	if (b->pc == VDSO_BASE) {
-		struct rt_sigframe *frame;
-		unsigned long sigframe_top =
-			b->sp + sizeof(struct rt_sigframe) - 1;
-		if (!valid_address(kbt, b->sp) ||
-		    !valid_address(kbt, sigframe_top)) {
-			if (kbt->verbose)
-				pr_err("  (odd signal: sp %#lx?)\n",
-				       (unsigned long)(b->sp));
+	if (b->pc == VDSO_BASE && b->sp < PAGE_OFFSET &&
+	    b->sp % sizeof(long) == 0) {
+		int retval;
+		pagefault_disable();
+		retval = __copy_from_user_inatomic(
+			kframe, (void __user __force *)b->sp,
+			sizeof(*kframe));
+		pagefault_enable();
+		if (retval != 0 ||
+		    (unsigned int)(kframe->info.si_signo) >= _NSIG)
 			return NULL;
-		}
-		frame = (struct rt_sigframe *)b->sp;
 		if (kbt->verbose) {
 			pr_err("  <received signal %d>\n",
-			       frame->info.si_signo);
+			       kframe->info.si_signo);
 		}
-		return (struct pt_regs *)&frame->uc.uc_mcontext;
+		return (struct pt_regs *)&kframe->uc.uc_mcontext;
 	}
 	return NULL;
 }
@@ -201,10 +156,11 @@ static int KBacktraceIterator_is_sigreturn(struct KBacktraceIterator *kbt)
 static int KBacktraceIterator_restart(struct KBacktraceIterator *kbt)
 {
 	struct pt_regs *p;
+	struct rt_sigframe kframe;
 
 	p = valid_fault_handler(kbt);
 	if (p == NULL)
-		p = valid_sigframe(kbt);
+		p = valid_sigframe(kbt, &kframe);
 	if (p == NULL)
 		return 0;
 	backtrace_init(&kbt->it, read_memory_func, kbt,
@@ -264,41 +220,19 @@ void KBacktraceIterator_init(struct KBacktraceIterator *kbt,
 
 	/*
 	 * Set up callback information.  We grab the kernel stack base
-	 * so we will allow reads of that address range, and if we're
-	 * asking about the current process we grab the page table
-	 * so we can check user accesses before trying to read them.
-	 * We flush the TLB to avoid any weird skew issues.
+	 * so we will allow reads of that address range.
 	 */
-	is_current = (t == NULL);
+	is_current = (t == NULL || t == current);
 	kbt->is_current = is_current;
 	if (is_current)
 		t = validate_current();
 	kbt->task = t;
-	kbt->pgtable = NULL;
 	kbt->verbose = 0;   /* override in caller if desired */
 	kbt->profile = 0;   /* override in caller if desired */
 	kbt->end = KBT_ONGOING;
-	kbt->new_context = 0;
-	if (is_current) {
-		HV_PhysAddr pgdir_pa = hv_inquire_context().page_table;
-		if (pgdir_pa == (unsigned long)swapper_pg_dir - PAGE_OFFSET) {
-			/*
-			 * Not just an optimization: this also allows
-			 * this to work at all before va/pa mappings
-			 * are set up.
-			 */
-			kbt->pgtable = swapper_pg_dir;
-		} else {
-			struct page *page = pfn_to_page(PFN_DOWN(pgdir_pa));
-			if (!PageHighMem(page))
-				kbt->pgtable = __va(pgdir_pa);
-			else
-				pr_err("page table not in LOWMEM"
-				       " (%#llx)\n", pgdir_pa);
-		}
-		local_flush_tlb_all();
+	kbt->new_context = 1;
+	if (is_current)
 		validate_stack(regs);
-	}
 
 	if (regs == NULL) {
 		if (is_current || t->state == TASK_RUNNING) {
@@ -344,6 +278,78 @@ void KBacktraceIterator_next(struct KBacktraceIterator *kbt)
 }
 EXPORT_SYMBOL(KBacktraceIterator_next);
 
+static void describe_addr(struct KBacktraceIterator *kbt,
+			  unsigned long address,
+			  int have_mmap_sem, char *buf, size_t bufsize)
+{
+	struct vm_area_struct *vma;
+	size_t namelen, remaining;
+	unsigned long size, offset, adjust;
+	char *p, *modname;
+	const char *name;
+	int rc;
+
+	/*
+	 * Look one byte back for every caller frame (i.e. those that
+	 * aren't a new context) so we look up symbol data for the
+	 * call itself, not the following instruction, which may be on
+	 * a different line (or in a different function).
+	 */
+	adjust = !kbt->new_context;
+	address -= adjust;
+
+	if (address >= PAGE_OFFSET) {
+		/* Handle kernel symbols. */
+		BUG_ON(bufsize < KSYM_NAME_LEN);
+		name = kallsyms_lookup(address, &size, &offset,
+				       &modname, buf);
+		if (name == NULL) {
+			buf[0] = '\0';
+			return;
+		}
+		namelen = strlen(buf);
+		remaining = (bufsize - 1) - namelen;
+		p = buf + namelen;
+		rc = snprintf(p, remaining, "+%#lx/%#lx ",
+			      offset + adjust, size);
+		if (modname && rc < remaining)
+			snprintf(p + rc, remaining - rc, "[%s] ", modname);
+		buf[bufsize-1] = '\0';
+		return;
+	}
+
+	/* If we don't have the mmap_sem, we can't show any more info. */
+	buf[0] = '\0';
+	if (!have_mmap_sem)
+		return;
+
+	/* Find vma info. */
+	vma = find_vma(kbt->task->mm, address);
+	if (vma == NULL || address < vma->vm_start) {
+		snprintf(buf, bufsize, "[unmapped address] ");
+		return;
+	}
+
+	if (vma->vm_file) {
+		char *s;
+		p = d_path(&vma->vm_file->f_path, buf, bufsize);
+		if (IS_ERR(p))
+			p = "?";
+		s = strrchr(p, '/');
+		if (s)
+			p = s+1;
+	} else {
+		p = "anon";
+	}
+
+	/* Generate a string description of the vma info. */
+	namelen = strlen(p);
+	remaining = (bufsize - 1) - namelen;
+	memmove(buf, p, namelen);
+	snprintf(buf + namelen, remaining, "[%lx+%lx] ",
+		 vma->vm_start, vma->vm_end - vma->vm_start);
+}
+
 /*
  * This method wraps the backtracer's more generic support.
  * It is only invoked from the architecture-specific code; show_stack()
@@ -352,6 +358,7 @@ EXPORT_SYMBOL(KBacktraceIterator_next);
 void tile_show_stack(struct KBacktraceIterator *kbt, int headers)
 {
 	int i;
+	int have_mmap_sem = 0;
 
 	if (headers) {
 		/*
@@ -368,31 +375,16 @@ void tile_show_stack(struct KBacktraceIterator *kbt, int headers)
 	kbt->verbose = 1;
 	i = 0;
 	for (; !KBacktraceIterator_end(kbt); KBacktraceIterator_next(kbt)) {
-		char *modname;
-		const char *name;
-		unsigned long address = kbt->it.pc;
-		unsigned long offset, size;
 		char namebuf[KSYM_NAME_LEN+100];
+		unsigned long address = kbt->it.pc;
 
-		if (address >= PAGE_OFFSET)
-			name = kallsyms_lookup(address, &size, &offset,
-					       &modname, namebuf);
-		else
-			name = NULL;
-
-		if (!name)
-			namebuf[0] = '\0';
-		else {
-			size_t namelen = strlen(namebuf);
-			size_t remaining = (sizeof(namebuf) - 1) - namelen;
-			char *p = namebuf + namelen;
-			int rc = snprintf(p, remaining, "+%#lx/%#lx ",
-					  offset, size);
-			if (modname && rc < remaining)
-				snprintf(p + rc, remaining - rc,
-					 "[%s] ", modname);
-			namebuf[sizeof(namebuf)-1] = '\0';
-		}
+		/* Try to acquire the mmap_sem as we pass into userspace. */
+		if (address < PAGE_OFFSET && !have_mmap_sem && kbt->task->mm)
+			have_mmap_sem =
+				down_read_trylock(&kbt->task->mm->mmap_sem);
+
+		describe_addr(kbt, address, have_mmap_sem,
+			      namebuf, sizeof(namebuf));
 
 		pr_err("  frame %d: 0x%lx %s(sp 0x%lx)\n",
 		       i++, address, namebuf, (unsigned long)(kbt->it.sp));
@@ -407,6 +399,8 @@ void tile_show_stack(struct KBacktraceIterator *kbt, int headers)
 		pr_err("Stack dump stopped; next frame identical to this one\n");
 	if (headers)
 		pr_err("Stack dump complete\n");
+	if (have_mmap_sem)
+		up_read(&kbt->task->mm->mmap_sem);
 }
 EXPORT_SYMBOL(tile_show_stack);
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: work around a hardware issue with the return-address stack
       [not found] <4F761E1C.80808.com>
                   ` (14 preceding siblings ...)
  2012-03-29 18:06 ` [PATCH] arch/tile: various improvements to stack backtracer Chris Metcalf
@ 2012-03-29 18:52 ` Chris Metcalf
  2012-03-29 19:23 ` [PATCH] arch/tile: improve trap handling a bit Chris Metcalf
                   ` (27 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 18:52 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, linux-kernel

To work around a hardware erratum, in certain circumstances we need to
execute a series of jump-and-link instructions to fill the hardware
return-address stack with nonzero values.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/traps.h |    6 +++++-
 arch/tile/kernel/intvec_64.S  |   12 ++++++++++++
 arch/tile/kernel/traps.c      |    6 +++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/tile/include/asm/traps.h b/arch/tile/include/asm/traps.h
index 5f20f92..e28c3df 100644
--- a/arch/tile/include/asm/traps.h
+++ b/arch/tile/include/asm/traps.h
@@ -64,7 +64,11 @@ void do_breakpoint(struct pt_regs *, int fault_num);
 
 
 #ifdef __tilegx__
+/* kernel/single_step.c */
 void gx_singlestep_handle(struct pt_regs *, int fault_num);
+
+/* kernel/intvec_64.S */
+void fill_ra_stack(void);
 #endif
 
-#endif /* _ASM_TILE_SYSCALLS_H */
+#endif /* _ASM_TILE_TRAPS_H */
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 3c1f626..709e224 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1164,6 +1164,18 @@ int_unalign:
 	push_extra_callee_saves r0
 	j       do_trap
 
+/* Fill the return address stack with nonzero entries. */
+STD_ENTRY(fill_ra_stack)
+	{
+	 move	r0, lr
+	 jal	1f
+	}
+1:	jal	2f
+2:	jal	3f
+3:	jal	4f
+4:	jrp	r0
+	STD_ENDPROC(fill_ra_stack)
+
 /* Include .intrpt1 array of interrupt vectors */
 	.section ".intrpt1", "ax"
 
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 4f47b8a..1e91fda 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -288,7 +288,10 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 		address = regs->pc;
 		break;
 #ifdef __tilegx__
-	case INT_ILL_TRANS:
+	case INT_ILL_TRANS: {
+		/* Avoid a hardware erratum with the return address stack. */
+		fill_ra_stack();
+
 		signo = SIGSEGV;
 		code = SEGV_MAPERR;
 		if (reason & SPR_ILL_TRANS_REASON__I_STREAM_VA_RMASK)
@@ -296,6 +299,7 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 		else
 			address = 0;  /* FIXME: GX: single-step for address */
 		break;
+	}
 #endif
 	default:
 		panic("Unexpected do_trap interrupt number %d", fault_num);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: improve trap handling a bit
       [not found] <4F761E1C.80808.com>
                   ` (15 preceding siblings ...)
  2012-03-29 18:52 ` [PATCH] arch/tile: work around a hardware issue with the return-address stack Chris Metcalf
@ 2012-03-29 19:23 ` Chris Metcalf
  2012-03-29 19:25 ` [PATCH] arch/tile: support <asm/cachectl.h> header for cacheflush() syscall Chris Metcalf
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:23 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, linux-kernel

We now respond to MEM_ERROR traps (e.g. an atomic instruction to
non-cacheable memory) with a SIGBUS.

We also no longer generate a console crash message if a user
process dies due to a SIGTRAP.
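
For illustration, a userspace program could observe the new MEM_ERROR
behavior with a handler along these lines (a hypothetical sketch, not
part of this patch; the faulting access itself is elided):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	/* With this patch, e.g. an atomic instruction to non-cacheable
	 * memory lands here with info->si_code == BUS_OBJERR. */
	static const char msg[] = "got SIGBUS\n";
	write(STDERR_FILENO, msg, sizeof(msg) - 1);  /* async-signal-safe */
	_exit(1);
}

int main(void)
{
	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);
	/* ... perform the faulting access here ... */
	return 0;
}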

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/intvec_64.S |    2 +-
 arch/tile/kernel/traps.c     |    9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 709e224..6a1ea82 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1186,7 +1186,7 @@ STD_ENTRY(fill_ra_stack)
 #define do_hardwall_trap bad_intr
 #endif
 
-	int_hand     INT_MEM_ERROR, MEM_ERROR, bad_intr
+	int_hand     INT_MEM_ERROR, MEM_ERROR, do_trap
 	int_hand     INT_SINGLE_STEP_3, SINGLE_STEP_3, bad_intr
 #if CONFIG_KERNEL_PL == 2
 	int_hand     INT_SINGLE_STEP_2, SINGLE_STEP_2, gx_singlestep_handle
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 1e91fda..4c33057 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -199,7 +199,7 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 {
 	siginfo_t info = { 0 };
 	int signo, code;
-	unsigned long address;
+	unsigned long address = 0;
 	bundle_bits instr;
 
 	/* Re-enable interrupts. */
@@ -222,6 +222,10 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 	}
 
 	switch (fault_num) {
+	case INT_MEM_ERROR:
+		signo = SIGBUS;
+		code = BUS_OBJERR;
+		break;
 	case INT_ILL:
 		if (copy_from_user(&instr, (void __user *)regs->pc,
 				   sizeof(instr))) {
@@ -311,7 +315,8 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 	info.si_addr = (void __user *)address;
 	if (signo == SIGILL)
 		info.si_trapno = fault_num;
-	trace_unhandled_signal("trap", regs, address, signo);
+	if (signo != SIGTRAP)
+		trace_unhandled_signal("trap", regs, address, signo);
 	force_sig_info(signo, &info, current);
 }
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: support <asm/cachectl.h> header for cacheflush() syscall
       [not found] <4F761E1C.80808.com>
                   ` (16 preceding siblings ...)
  2012-03-29 19:23 ` [PATCH] arch/tile: improve trap handling a bit Chris Metcalf
@ 2012-03-29 19:25 ` Chris Metcalf
  2012-03-29 19:29 ` [PATCH] arch/tile: fix a couple of comments that needed updating Chris Metcalf
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:25 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, Arnd Bergmann, linux-kernel

We already had a syscall that did some dcache flushing, but it was
not used in practice.  Make it MIPS-compatible instead, so it can
perform both the DCACHE and ICACHE actions.  We have code that wants
to use the ICACHE flush mode from userspace, so this change enables
that.
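
As a rough userspace sketch (assuming the exported <asm/cachectl.h>
and <asm/unistd.h> headers are installed; the helper name is
hypothetical), self-modifying code could invoke the new syscall like
this:

#include <unistd.h>
#include <sys/syscall.h>
#include <asm/cachectl.h>	/* ICACHE, DCACHE, BCACHE */
#include <asm/unistd.h>		/* __NR_cacheflush */

/* After writing new instructions at [addr, addr+len), make them
 * visible to all threads in this address space. */
static int flush_new_code(void *addr, size_t len)
{
	return syscall(__NR_cacheflush, (unsigned long)addr, len, ICACHE);
}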

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/Kbuild     |    1 +
 arch/tile/include/asm/cachectl.h |   42 ++++++++++++++++++++++++++++++++++++++
 arch/tile/include/asm/compat.h   |    3 --
 arch/tile/include/asm/syscalls.h |    3 +-
 arch/tile/include/asm/unistd.h   |    4 +-
 arch/tile/kernel/sys.c           |   10 +++++++-
 6 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 arch/tile/include/asm/cachectl.h

diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild
index 6b2e681..143473e 100644
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -2,6 +2,7 @@ include include/asm-generic/Kbuild.asm
 
 header-y += ../arch/
 
+header-y += cachectl.h
 header-y += ucontext.h
 header-y += hardwall.h
 
diff --git a/arch/tile/include/asm/cachectl.h b/arch/tile/include/asm/cachectl.h
new file mode 100644
index 0000000..af4c9f9
--- /dev/null
+++ b/arch/tile/include/asm/cachectl.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ */
+
+#ifndef _ASM_TILE_CACHECTL_H
+#define _ASM_TILE_CACHECTL_H
+
+/*
+ * Options for cacheflush system call.
+ *
+ * The ICACHE flush is performed on all cores currently running the
+ * current process's address space.  The intent is for user
+ * applications to be able to modify code, invoke the system call,
+ * then allow arbitrary other threads in the same address space to see
+ * the newly-modified code.  Passing a length of CHIP_L1I_CACHE_SIZE()
+ * or more invalidates the entire icache on all cores in the address
+ * space.  (Note: currently this option invalidates the entire icache
+ * regardless of the requested address and length, but we may choose
+ * to honor the arguments at some point.)
+ *
+ * Flush and invalidation of memory can normally be performed with the
+ * __insn_flush(), __insn_inv(), and __insn_finv() instructions from
+ * userspace.  The DCACHE option to the system call allows userspace
+ * to flush the entire L1+L2 data cache from the core.  In this case,
+ * the address and length arguments are not used.  The DCACHE flush is
+ * restricted to the current core, not all cores in the address space.
+ */
+#define	ICACHE	(1<<0)		/* invalidate L1 instruction cache */
+#define	DCACHE	(1<<1)		/* flush and invalidate data cache */
+#define	BCACHE	(ICACHE|DCACHE)	/* flush both caches               */
+
+#endif	/* _ASM_TILE_CACHECTL_H */
diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
index 4b4b289..69adc08 100644
--- a/arch/tile/include/asm/compat.h
+++ b/arch/tile/include/asm/compat.h
@@ -242,9 +242,6 @@ long compat_sys_fallocate(int fd, int mode,
 long compat_sys_sched_rr_get_interval(compat_pid_t pid,
 				      struct compat_timespec __user *interval);
 
-/* Tilera Linux syscalls that don't have "compat" versions. */
-#define compat_sys_flush_cache sys_flush_cache
-
 /* These are the intvec_64.S trampolines. */
 long _compat_sys_execve(const char __user *path,
 			const compat_uptr_t __user *argv,
diff --git a/arch/tile/include/asm/syscalls.h b/arch/tile/include/asm/syscalls.h
index 3b5507c..06f0464 100644
--- a/arch/tile/include/asm/syscalls.h
+++ b/arch/tile/include/asm/syscalls.h
@@ -43,7 +43,8 @@ long sys32_fadvise64(int fd, u32 offset_lo, u32 offset_hi,
 		     u32 len, int advice);
 int sys32_fadvise64_64(int fd, u32 offset_lo, u32 offset_hi,
 		       u32 len_lo, u32 len_hi, int advice);
-long sys_flush_cache(void);
+long sys_cacheflush(unsigned long addr, unsigned long len,
+		    unsigned long flags);
 #ifndef __tilegx__  /* No mmap() in the 32-bit kernel. */
 #define sys_mmap sys_mmap
 #endif
diff --git a/arch/tile/include/asm/unistd.h b/arch/tile/include/asm/unistd.h
index f70bf1c..a017246 100644
--- a/arch/tile/include/asm/unistd.h
+++ b/arch/tile/include/asm/unistd.h
@@ -24,8 +24,8 @@
 #include <asm-generic/unistd.h>
 
 /* Additional Tilera-specific syscalls. */
-#define __NR_flush_cache	(__NR_arch_specific_syscall + 1)
-__SYSCALL(__NR_flush_cache, sys_flush_cache)
+#define __NR_cacheflush	(__NR_arch_specific_syscall + 1)
+__SYSCALL(__NR_cacheflush, sys_cacheflush)
 
 #ifndef __tilegx__
 /* "Fast" syscalls provide atomic support for 32-bit chips. */
diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
index cb44ba7..b08095b 100644
--- a/arch/tile/kernel/sys.c
+++ b/arch/tile/kernel/sys.c
@@ -32,11 +32,17 @@
 #include <asm/syscalls.h>
 #include <asm/pgtable.h>
 #include <asm/homecache.h>
+#include <asm/cachectl.h>
 #include <arch/chip.h>
 
-SYSCALL_DEFINE0(flush_cache)
+SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, len,
+		unsigned long, flags)
 {
-	homecache_evict(cpumask_of(smp_processor_id()));
+	if (flags & DCACHE)
+		homecache_evict(cpumask_of(smp_processor_id()));
+	if (flags & ICACHE)
+		flush_remote(0, HV_FLUSH_EVICT_L1I, mm_cpumask(current->mm),
+			     0, 0, 0, NULL, NULL, 0);
 	return 0;
 }
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix a couple of comments that needed updating
       [not found] <4F761E1C.80808.com>
                   ` (17 preceding siblings ...)
  2012-03-29 19:25 ` [PATCH] arch/tile: support <asm/cachectl.h> header for cacheflush() syscall Chris Metcalf
@ 2012-03-29 19:29 ` Chris Metcalf
  2012-03-29 19:30 ` [PATCH] arch/tile/Makefile: use KCFLAGS when figuring out the libgcc path Chris Metcalf
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:29 UTC (permalink / raw)
  To: Chris Metcalf, Rusty Russell, Jiri Kosina, Joe Perches,
	Paul E. McKenney, Lucas De Marchi, Josh Triplett, linux-kernel

Not associated with any code changes, so I'm just lumping these
comment changes into a commit by themselves.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/setup.c |   11 +++++++++--
 arch/tile/mm/fault.c     |    2 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 1a7276a..62b1903 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -913,6 +913,13 @@ void __cpuinit setup_cpu(int boot)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+/*
+ * Note that the kernel can potentially support compression techniques
+ * other than gz, though we don't do so by default.  If we ever
+ * decide to do so we can either look for other filename extensions,
+ * or just allow a file with this name to be compressed with an
+ * arbitrary compressor (somewhat counterintuitively).
+ */
 static int __initdata set_initramfs_file;
 static char __initdata initramfs_file[128] = "initramfs.cpio.gz";
 
@@ -928,9 +935,9 @@ static int __init setup_initramfs_file(char *str)
 early_param("initramfs_file", setup_initramfs_file);
 
 /*
- * We look for an additional "initramfs.cpio.gz" file in the hvfs.
+ * We look for an "initramfs.cpio.gz" file in the hvfs.
  * If there is one, we allocate some memory for it and it will be
- * unpacked to the initramfs after any built-in initramfs_data.
+ * unpacked to the initramfs.
  */
 static void __init load_hv_initrd(void)
 {
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index c1eaaa1..5f1fdeb 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -131,7 +131,7 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 }
 
 /*
- * Handle a fault on the vmalloc or module mapping area
+ * Handle a fault on the vmalloc area.
  */
 static inline int vmalloc_fault(pgd_t *pgd, unsigned long address)
 {
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile/Makefile: use KCFLAGS when figuring out the libgcc path.
       [not found] <4F761E1C.80808.com>
                   ` (18 preceding siblings ...)
  2012-03-29 19:29 ` [PATCH] arch/tile: fix a couple of comments that needed updating Chris Metcalf
@ 2012-03-29 19:30 ` Chris Metcalf
  2012-03-29 19:34 ` [PATCH] arch/tile: don't wait for migrating PTEs in an NMI handler Chris Metcalf
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:30 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Makefile |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/tile/Makefile b/arch/tile/Makefile
index 17acce7..5e4d3b9 100644
--- a/arch/tile/Makefile
+++ b/arch/tile/Makefile
@@ -30,7 +30,8 @@ ifneq ($(CONFIG_DEBUG_EXTRA_FLAGS),"")
 KBUILD_CFLAGS   += $(CONFIG_DEBUG_EXTRA_FLAGS)
 endif
 
-LIBGCC_PATH     := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
+LIBGCC_PATH     := \
+  $(shell $(CC) $(KBUILD_CFLAGS) $(KCFLAGS) -print-libgcc-file-name)
 
 # Provide the path to use for "make defconfig".
 KBUILD_DEFCONFIG := $(ARCH)_defconfig
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: don't wait for migrating PTEs in an NMI handler
       [not found] <4F761E1C.80808.com>
                   ` (19 preceding siblings ...)
  2012-03-29 19:30 ` [PATCH] arch/tile/Makefile: use KCFLAGS when figuring out the libgcc path Chris Metcalf
@ 2012-03-29 19:34 ` Chris Metcalf
  2012-03-29 19:36 ` [PATCH] arch/tile: don't set the homecache of a PTE unless appropriate Chris Metcalf
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:34 UTC (permalink / raw)
  To: Chris Metcalf, Paul E. McKenney, Lucas De Marchi, Josh Triplett,
	linux-kernel

Doing so raises the possibility of self-deadlock: we may be taking
a backtrace for an oprofile or perf interrupt while we are in the
middle of migrating our own stack page.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/fault.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index 5f1fdeb..bcba159 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -204,9 +204,14 @@ static pgd_t *get_current_pgd(void)
  * interrupt or a critical region, and must do as little as possible.
  * Similarly, we can't use atomic ops here, since we may be handling a
  * fault caused by an atomic op access.
+ *
+ * If we find a migrating PTE while we're in an NMI context, and we're
+ * at a PC that has a registered exception handler, we don't wait,
+ * since this thread may (e.g.) have been interrupted while migrating
+ * its own stack, which would then cause us to self-deadlock.
  */
 static int handle_migrating_pte(pgd_t *pgd, int fault_num,
-				unsigned long address,
+				unsigned long address, unsigned long pc,
 				int is_kernel_mode, int write)
 {
 	pud_t *pud;
@@ -228,6 +233,8 @@ static int handle_migrating_pte(pgd_t *pgd, int fault_num,
 		pte_offset_kernel(pmd, address);
 	pteval = *pte;
 	if (pte_migrating(pteval)) {
+		if (in_nmi() && search_exception_tables(pc))
+			return 0;
 		wait_for_migration(pte);
 		return 1;
 	}
@@ -301,7 +308,7 @@ static int handle_page_fault(struct pt_regs *regs,
 	 * rather than trying to patch up the existing PTE.
 	 */
 	pgd = get_current_pgd();
-	if (handle_migrating_pte(pgd, fault_num, address,
+	if (handle_migrating_pte(pgd, fault_num, address, regs->pc,
 				 is_kernel_mode, write))
 		return 1;
 
@@ -666,7 +673,7 @@ struct intvec_state do_page_fault_ics(struct pt_regs *regs, int fault_num,
 	 */
 	if (fault_num == INT_DTLB_ACCESS)
 		write = 1;
-	if (handle_migrating_pte(pgd, fault_num, address, 1, write))
+	if (handle_migrating_pte(pgd, fault_num, address, pc, 1, write))
 		return state;
 
 	/* Return zero so that we continue on with normal fault handling. */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: don't set the homecache of a PTE unless appropriate
       [not found] <4F761E1C.80808.com>
                   ` (20 preceding siblings ...)
  2012-03-29 19:34 ` [PATCH] arch/tile: don't wait for migrating PTEs in an NMI handler Chris Metcalf
@ 2012-03-29 19:36 ` Chris Metcalf
  2012-03-29 19:40 ` [PATCH] arch/tile: don't enable irqs unconditionally in page fault handler Chris Metcalf
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:36 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

We make sure not to try to set the home for an MMIO PTE (on tilegx)
or a PTE that isn't referencing memory managed by Linux.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/pgtable.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 14e7e9e..277645f 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -467,10 +467,18 @@ void __set_pte(pte_t *ptep, pte_t pte)
 
 void set_pte(pte_t *ptep, pte_t pte)
 {
-	struct page *page = pfn_to_page(pte_pfn(pte));
-
-	/* Update the home of a PTE if necessary */
-	pte = pte_set_home(pte, page_home(page));
+	if (pte_present(pte) &&
+	    (!CHIP_HAS_MMIO() || hv_pte_get_mode(pte) != HV_PTE_MODE_MMIO)) {
+		/* The PTE actually references physical memory. */
+		unsigned long pfn = pte_pfn(pte);
+		if (pfn_valid(pfn)) {
+			/* Update the home of the PTE from the struct page. */
+			pte = pte_set_home(pte, page_home(pfn_to_page(pfn)));
+		} else if (hv_pte_get_mode(pte) == 0) {
+			/* remap_pfn_range(), etc, must supply PTE mode. */
+			panic("set_pte(): out-of-range PFN and mode 0\n");
+		}
+	}
 
 	__set_pte(ptep, pte);
 }
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: don't enable irqs unconditionally in page fault handler
       [not found] <4F761E1C.80808.com>
                   ` (21 preceding siblings ...)
  2012-03-29 19:36 ` [PATCH] arch/tile: don't set the homecache of a PTE unless appropriate Chris Metcalf
@ 2012-03-29 19:40 ` Chris Metcalf
  2012-03-29 19:42 ` [PATCH] arch/tile: support loading kernels larger than 16 MB Chris Metcalf
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:40 UTC (permalink / raw)
  To: Chris Metcalf, Paul E. McKenney, Lucas De Marchi, Josh Triplett,
	linux-kernel

If we took a page fault while we had interrupts disabled, we
shouldn't enable them in the page fault handler.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/fault.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index bcba159..5d10a74 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -343,9 +343,12 @@ static int handle_page_fault(struct pt_regs *regs,
 	/*
 	 * If we're trying to touch user-space addresses, we must
 	 * be either at PL0, or else with interrupts enabled in the
-	 * kernel, so either way we can re-enable interrupts here.
+	 * kernel, so either way we can re-enable interrupts here
+	 * unless we are doing atomic access to user space with
+	 * interrupts disabled.
 	 */
-	local_irq_enable();
+	if (!(regs->flags & PT_FLAGS_DISABLE_IRQ))
+		local_irq_enable();
 
 	mm = tsk->mm;
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: support loading kernels larger than 16 MB
       [not found] <4F761E1C.80808.com>
                   ` (22 preceding siblings ...)
  2012-03-29 19:40 ` [PATCH] arch/tile: don't enable irqs unconditionally in page fault handler Chris Metcalf
@ 2012-03-29 19:42 ` Chris Metcalf
  2012-03-29 19:43 ` [PATCH] arch/tile: fix bug in delay_backoff() Chris Metcalf
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:42 UTC (permalink / raw)
  To: Chris Metcalf, Andrew Morton, Julia Lawall, Peter Zijlstra, linux-kernel

Previously we only handled kernels up to a single huge page in size.
Now we create additional PTEs appropriately.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/init.c |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index d1c6391..6119f46 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -554,6 +554,7 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 
 	address = MEM_SV_INTRPT;
 	pmd = get_pmd(pgtables, address);
+	pfn = 0;  /* code starts at PA 0 */
 	if (ktext_small) {
 		/* Allocate an L2 PTE for the kernel text */
 		int cpu = 0;
@@ -576,10 +577,15 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 		}
 
 		BUG_ON(address != (unsigned long)_stext);
-		pfn = 0;  /* code starts at PA 0 */
-		pte = alloc_pte();
-		for (pte_ofs = 0; address < (unsigned long)_einittext;
-		     pfn++, pte_ofs++, address += PAGE_SIZE) {
+		pte = NULL;
+		for (; address < (unsigned long)_einittext;
+		     pfn++, address += PAGE_SIZE) {
+			pte_ofs = pte_index(address);
+			if (pte_ofs == 0) {
+				if (pte)
+					assign_pte(pmd++, pte);
+				pte = alloc_pte();
+			}
 			if (!ktext_local) {
 				prot = set_remote_cache_cpu(prot, cpu);
 				cpu = cpumask_next(cpu, &ktext_mask);
@@ -588,7 +594,8 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 			}
 			pte[pte_ofs] = pfn_pte(pfn, prot);
 		}
-		assign_pte(pmd, pte);
+		if (pte)
+			assign_pte(pmd, pte);
 	} else {
 		pte_t pteval = pfn_pte(0, PAGE_KERNEL_EXEC);
 		pteval = pte_mkhuge(pteval);
@@ -611,7 +618,9 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 		else
 			pteval = hv_pte_set_mode(pteval,
 						 HV_PTE_MODE_CACHE_NO_L3);
-		*(pte_t *)pmd = pteval;
+		for (; address < (unsigned long)_einittext;
+		     pfn += PFN_DOWN(HPAGE_SIZE), address += HPAGE_SIZE)
+			*(pte_t *)(pmd++) = pfn_pte(pfn, pteval);
 	}
 
 	/* Set swapper_pgprot here so it is flushed to memory right away. */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix bug in delay_backoff()
       [not found] <4F761E1C.80808.com>
                   ` (23 preceding siblings ...)
  2012-03-29 19:42 ` [PATCH] arch/tile: support loading kernels larger than 16 MB Chris Metcalf
@ 2012-03-29 19:43 ` Chris Metcalf
  2012-03-29 19:44 ` [PATCH] arch/tile: don't leak kernel memory when we unload modules Chris Metcalf
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:43 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

We were carefully computing a value to use for the number of loops
to spin for, and then ignoring it.
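
For reference, the intended computation can be modeled in plain C
roughly as follows (an illustrative sketch, not the kernel code; the
cap of 8 and the use of rand() for jitter are assumptions here):

#include <stdlib.h>

static unsigned int backoff_loops(int iterations)
{
	/* Grow the delay exponentially, with a cap. */
	int exponent = iterations < 8 ? iterations : 8;
	unsigned int loops = 1u << exponent;

	/* Add pseudo-random jitter in [0, loops) to avoid lockstep. */
	loops += (unsigned int)rand() & (loops - 1);

	/* The bug: this value was computed and then ignored. */
	return loops;
}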

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/lib/spinlock_common.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/lib/spinlock_common.h b/arch/tile/lib/spinlock_common.h
index c101098..6ac3750 100644
--- a/arch/tile/lib/spinlock_common.h
+++ b/arch/tile/lib/spinlock_common.h
@@ -60,5 +60,5 @@ static void delay_backoff(int iterations)
 	loops += __insn_crc32_32(stack_pointer, get_cycles_low()) &
 		(loops - 1);
 
-	relax(1 << exponent);
+	relax(loops);
 }
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: don't leak kernel memory when we unload modules
       [not found] <4F761E1C.80808.com>
                   ` (24 preceding siblings ...)
  2012-03-29 19:43 ` [PATCH] arch/tile: fix bug in delay_backoff() Chris Metcalf
@ 2012-03-29 19:44 ` Chris Metcalf
  2012-03-29 19:48 ` [PATCH] arch/tile: support kexec() for tilegx Chris Metcalf
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:44 UTC (permalink / raw)
  To: Chris Metcalf, Rusty Russell, Mike Frysinger, Jonas Bonn,
	Geert Uytterhoeven, linux-kernel

We were failing to track the memory when we allocated it.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/module.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/tile/kernel/module.c b/arch/tile/kernel/module.c
index bb2dc1e..001cbfa 100644
--- a/arch/tile/kernel/module.c
+++ b/arch/tile/kernel/module.c
@@ -67,6 +67,8 @@ void *module_alloc(unsigned long size)
 	area = __get_vm_area(size, VM_ALLOC, MEM_MODULE_START, MEM_MODULE_END);
 	if (!area)
 		goto error;
+	area->nr_pages = npages;
+	area->pages = pages;
 
 	if (map_vm_area(area, prot_rwx, &pages)) {
 		vunmap(area->addr);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: support kexec() for tilegx
       [not found] <4F761E1C.80808.com>
                   ` (25 preceding siblings ...)
  2012-03-29 19:44 ` [PATCH] arch/tile: don't leak kernel memory when we unload modules Chris Metcalf
@ 2012-03-29 19:48 ` Chris Metcalf
  2012-03-29 19:50 ` [PATCH] arch/tile: fix up locking in pgtable.c slightly Chris Metcalf
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:48 UTC (permalink / raw)
  To: Chris Metcalf, Arnd Bergmann, Andrew Morton, Geert Uytterhoeven,
	Joe Perches, Ralf Baechle, linux-kernel

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/kexec.h         |   12 ++
 arch/tile/kernel/Makefile             |    2 +-
 arch/tile/kernel/machine_kexec.c      |   35 +++-
 arch/tile/kernel/relocate_kernel.S    |  280 ---------------------------------
 arch/tile/kernel/relocate_kernel_32.S |  280 +++++++++++++++++++++++++++++++++
 arch/tile/kernel/relocate_kernel_64.S |  260 ++++++++++++++++++++++++++++++
 6 files changed, 580 insertions(+), 289 deletions(-)
 delete mode 100644 arch/tile/kernel/relocate_kernel.S
 create mode 100644 arch/tile/kernel/relocate_kernel_32.S
 create mode 100644 arch/tile/kernel/relocate_kernel_64.S

diff --git a/arch/tile/include/asm/kexec.h b/arch/tile/include/asm/kexec.h
index c11a6cc..fc98ccf 100644
--- a/arch/tile/include/asm/kexec.h
+++ b/arch/tile/include/asm/kexec.h
@@ -19,12 +19,24 @@
 
 #include <asm/page.h>
 
+#ifndef __tilegx__
 /* Maximum physical address we can use pages from. */
 #define KEXEC_SOURCE_MEMORY_LIMIT TASK_SIZE
 /* Maximum address we can reach in physical address mode. */
 #define KEXEC_DESTINATION_MEMORY_LIMIT TASK_SIZE
 /* Maximum address we can use for the control code buffer. */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
+#else
+/* We need to limit the memory to below PGDIR_SIZE, since we only
+ * set up page tables for [0, PGDIR_SIZE) before the final kexec.
+ */
+/* Maximum physical address we can use pages from. */
+#define KEXEC_SOURCE_MEMORY_LIMIT PGDIR_SIZE
+/* Maximum address we can reach in physical address mode. */
+#define KEXEC_DESTINATION_MEMORY_LIMIT PGDIR_SIZE
+/* Maximum address we can use for the control code buffer. */
+#define KEXEC_CONTROL_MEMORY_LIMIT PGDIR_SIZE
+#endif
 
 #define KEXEC_CONTROL_PAGE_SIZE	PAGE_SIZE
 
diff --git a/arch/tile/kernel/Makefile b/arch/tile/kernel/Makefile
index d6261e4..f19116d 100644
--- a/arch/tile/kernel/Makefile
+++ b/arch/tile/kernel/Makefile
@@ -13,5 +13,5 @@ obj-$(CONFIG_COMPAT)		+= compat.o compat_signal.o
 obj-$(CONFIG_SMP)		+= smpboot.o smp.o tlb.o
 obj-$(CONFIG_MODULES)		+= module.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o
+obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel_$(BITS).o
 obj-$(CONFIG_PCI)		+= pci.o
diff --git a/arch/tile/kernel/machine_kexec.c b/arch/tile/kernel/machine_kexec.c
index b0fa37c..f0b54a9 100644
--- a/arch/tile/kernel/machine_kexec.c
+++ b/arch/tile/kernel/machine_kexec.c
@@ -31,6 +31,8 @@
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
 #include <asm/checksum.h>
+#include <asm/tlbflush.h>
+#include <asm/homecache.h>
 #include <hv/hypervisor.h>
 
 
@@ -222,11 +224,22 @@ struct page *kimage_alloc_pages_arch(gfp_t gfp_mask, unsigned int order)
 	return alloc_pages_node(0, gfp_mask, order);
 }
 
+/*
+ * Address range in which pa=va mapping is set in setup_quasi_va_is_pa().
+ * For tilepro, PAGE_OFFSET is used since this is the largest possible
+ * value there, while for tilegx we limit it to the entire middle-level
+ * page table, which we assume has been allocated and is large enough.
+ */
+#ifndef __tilegx__
+#define	QUASI_VA_IS_PA_ADDR_RANGE PAGE_OFFSET
+#else
+#define	QUASI_VA_IS_PA_ADDR_RANGE PGDIR_SIZE
+#endif
+
 static void setup_quasi_va_is_pa(void)
 {
-	HV_PTE *pgtable;
 	HV_PTE pte;
-	int i;
+	unsigned long i;
 
 	/*
 	 * Flush our TLB to prevent conflicts between the previous contents
@@ -234,16 +247,22 @@ static void setup_quasi_va_is_pa(void)
 	 */
 	local_flush_tlb_all();
 
-	/* setup VA is PA, at least up to PAGE_OFFSET */
-
-	pgtable = (HV_PTE *)current->mm->pgd;
+	/*
+	 * Set up VA == PA, at least up to QUASI_VA_IS_PA_ADDR_RANGE.
+	 * Note that here we assume that each level-1 page table
+	 * entry covers HPAGE_SIZE.
+	 */
 	pte = hv_pte(_PAGE_KERNEL | _PAGE_HUGE_PAGE);
 	pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_NO_L3);
-
-	for (i = 0; i < pgd_index(PAGE_OFFSET); i++) {
+	for (i = 0; i < (QUASI_VA_IS_PA_ADDR_RANGE >> HPAGE_SHIFT); i++) {
+		unsigned long vaddr = i << HPAGE_SHIFT;
+		pgd_t *pgd = pgd_offset(current->mm, vaddr);
+		pud_t *pud = pud_offset(pgd, vaddr);
+		pte_t *ptep = (pte_t *) pmd_offset(pud, vaddr);
 		unsigned long pfn = i << (HPAGE_SHIFT - PAGE_SHIFT);
+
 		if (pfn_valid(pfn))
-			__set_pte(&pgtable[i], pfn_pte(pfn, pte));
+			__set_pte(ptep, pfn_pte(pfn, pte));
 	}
 }
 
diff --git a/arch/tile/kernel/relocate_kernel.S b/arch/tile/kernel/relocate_kernel.S
deleted file mode 100644
index 010b418..0000000
--- a/arch/tile/kernel/relocate_kernel.S
+++ /dev/null
@@ -1,280 +0,0 @@
-/*
- * Copyright 2010 Tilera Corporation. All Rights Reserved.
- *
- *   This program is free software; you can redistribute it and/or
- *   modify it under the terms of the GNU General Public License
- *   as published by the Free Software Foundation, version 2.
- *
- *   This program is distributed in the hope that it will be useful, but
- *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- *   NON INFRINGEMENT.  See the GNU General Public License for
- *   more details.
- *
- * copy new kernel into place and then call hv_reexec
- *
- */
-
-#include <linux/linkage.h>
-#include <arch/chip.h>
-#include <asm/page.h>
-#include <hv/hypervisor.h>
-
-#define ___hvb	MEM_SV_INTRPT + HV_GLUE_START_CPA
-
-#define ___hv_dispatch(f) (___hvb + (HV_DISPATCH_ENTRY_SIZE * f))
-
-#define ___hv_console_putc ___hv_dispatch(HV_DISPATCH_CONSOLE_PUTC)
-#define ___hv_halt         ___hv_dispatch(HV_DISPATCH_HALT)
-#define ___hv_reexec       ___hv_dispatch(HV_DISPATCH_REEXEC)
-#define ___hv_flush_remote ___hv_dispatch(HV_DISPATCH_FLUSH_REMOTE)
-
-#undef RELOCATE_NEW_KERNEL_VERBOSE
-
-STD_ENTRY(relocate_new_kernel)
-
-	move	r30, r0		/* page list */
-	move	r31, r1		/* address of page we are on */
-	move	r32, r2		/* start address of new kernel */
-
-	shri	r1, r1, PAGE_SHIFT
-	addi	r1, r1, 1
-	shli	sp, r1, PAGE_SHIFT
-	addi	sp, sp, -8
-	/* we now have a stack (whether we need one or not) */
-
-	moveli	r40, lo16(___hv_console_putc)
-	auli	r40, r40, ha16(___hv_console_putc)
-
-#ifdef RELOCATE_NEW_KERNEL_VERBOSE
-	moveli	r0, 'r'
-	jalr	r40
-
-	moveli	r0, '_'
-	jalr	r40
-
-	moveli	r0, 'n'
-	jalr	r40
-
-	moveli	r0, '_'
-	jalr	r40
-
-	moveli	r0, 'k'
-	jalr	r40
-
-	moveli	r0, '\n'
-	jalr	r40
-#endif
-
-	/*
-	 * Throughout this code r30 is pointer to the element of page
-	 * list we are working on.
-	 *
-	 * Normally we get to the next element of the page list by
-	 * incrementing r30 by four.  The exception is if the element
-	 * on the page list is an IND_INDIRECTION in which case we use
-	 * the element with the low bits masked off as the new value
-	 * of r30.
-	 *
-	 * To get this started, we need the value passed to us (which
-	 * will always be an IND_INDIRECTION) in memory somewhere with
-	 * r30 pointing at it.  To do that, we push the value passed
-	 * to us on the stack and make r30 point to it.
-	 */
-
-	sw	sp, r30
-	move	r30, sp
-	addi	sp, sp, -8
-
-#if CHIP_HAS_CBOX_HOME_MAP()
-	/*
-	 * On TILEPro, we need to flush all tiles' caches, since we may
-	 * have been doing hash-for-home caching there.  Note that we
-	 * must do this _after_ we're completely done modifying any memory
-	 * other than our output buffer (which we know is locally cached).
-	 * We want the caches to be fully clean when we do the reexec,
-	 * because the hypervisor is going to do this flush again at that
-	 * point, and we don't want that second flush to overwrite any memory.
-	 */
-	{
-	 move	r0, zero	 /* cache_pa */
-	 move	r1, zero
-	}
-	{
-	 auli	r2, zero, ha16(HV_FLUSH_EVICT_L2) /* cache_control */
-	 movei	r3, -1		 /* cache_cpumask; -1 means all client tiles */
-	}
-	{
-	 move	r4, zero	 /* tlb_va */
-	 move	r5, zero	 /* tlb_length */
-	}
-	{
-	 move	r6, zero	 /* tlb_pgsize */
-	 move	r7, zero	 /* tlb_cpumask */
-	}
-	{
-	 move	r8, zero	 /* asids */
-	 moveli	r20, lo16(___hv_flush_remote)
-	}
-	{
-	 move	r9, zero	 /* asidcount */
-	 auli	r20, r20, ha16(___hv_flush_remote)
-	}
-
-	jalr	r20
-#endif
-
-	/* r33 is destination pointer, default to zero */
-
-	moveli	r33, 0
-
-.Lloop:	lw	r10, r30
-
-	andi	r9, r10, 0xf	/* low 4 bits tell us what type it is */
-	xor	r10, r10, r9	/* r10 is now value with low 4 bits stripped */
-
-	seqi	r0, r9, 0x1	/* IND_DESTINATION */
-	bzt	r0, .Ltry2
-
-	move	r33, r10
-
-#ifdef RELOCATE_NEW_KERNEL_VERBOSE
-	moveli	r0, 'd'
-	jalr	r40
-#endif
-
-	addi	r30, r30, 4
-	j	.Lloop
-
-.Ltry2:
-	seqi	r0, r9, 0x2	/* IND_INDIRECTION */
-	bzt	r0, .Ltry4
-
-	move	r30, r10
-
-#ifdef RELOCATE_NEW_KERNEL_VERBOSE
-	moveli	r0, 'i'
-	jalr	r40
-#endif
-
-	j	.Lloop
-
-.Ltry4:
-	seqi	r0, r9, 0x4	/* IND_DONE */
-	bzt	r0, .Ltry8
-
-	mf
-
-#ifdef RELOCATE_NEW_KERNEL_VERBOSE
-	moveli	r0, 'D'
-	jalr	r40
-	moveli	r0, '\n'
-	jalr	r40
-#endif
-
-	move	r0, r32
-	moveli	r1, 0		/* arg to hv_reexec is 64 bits */
-
-	moveli	r41, lo16(___hv_reexec)
-	auli	r41, r41, ha16(___hv_reexec)
-
-	jalr	r41
-
-	/* we should not get here */
-
-	moveli	r0, '?'
-	jalr	r40
-	moveli	r0, '\n'
-	jalr	r40
-
-	j	.Lhalt
-
-.Ltry8:	seqi	r0, r9, 0x8	/* IND_SOURCE */
-	bz	r0, .Lerr	/* unknown type */
-
-	/* copy page at r10 to page at r33 */
-
-	move	r11, r33
-
-	moveli	r0, lo16(PAGE_SIZE)
-	auli	r0, r0, ha16(PAGE_SIZE)
-	add	r33, r33, r0
-
-	/* copy word at r10 to word at r11 until r11 equals r33 */
-
-	/* We know page size must be multiple of 16, so we can unroll
-	 * 16 times safely without any edge case checking.
-	 *
-	 * Issue a flush of the destination every 16 words to avoid
-	 * incoherence when starting the new kernel.  (Now this is
-	 * just good paranoia because the hv_reexec call will also
-	 * take care of this.)
-	 */
-
-1:
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0; addi	r11, r11, 4 }
-	{ lw	r0, r10; addi	r10, r10, 4 }
-	{ sw	r11, r0 }
-	{ flush r11    ; addi	r11, r11, 4 }
-
-	seq	r0, r33, r11
-	bzt	r0, 1b
-
-#ifdef RELOCATE_NEW_KERNEL_VERBOSE
-	moveli	r0, 's'
-	jalr	r40
-#endif
-
-	addi	r30, r30, 4
-	j	.Lloop
-
-
-.Lerr:	moveli	r0, 'e'
-	jalr	r40
-	moveli	r0, 'r'
-	jalr	r40
-	moveli	r0, 'r'
-	jalr	r40
-	moveli	r0, '\n'
-	jalr	r40
-.Lhalt:
-	moveli	r41, lo16(___hv_halt)
-	auli	r41, r41, ha16(___hv_halt)
-
-	jalr	r41
-	STD_ENDPROC(relocate_new_kernel)
-
-	.section .rodata,"a"
-
-	.globl relocate_new_kernel_size
-relocate_new_kernel_size:
-	.long .Lend_relocate_new_kernel - relocate_new_kernel
diff --git a/arch/tile/kernel/relocate_kernel_32.S b/arch/tile/kernel/relocate_kernel_32.S
new file mode 100644
index 0000000..010b418
--- /dev/null
+++ b/arch/tile/kernel/relocate_kernel_32.S
@@ -0,0 +1,280 @@
+/*
+ * Copyright 2010 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ *
+ * copy new kernel into place and then call hv_reexec
+ *
+ */
+
+#include <linux/linkage.h>
+#include <arch/chip.h>
+#include <asm/page.h>
+#include <hv/hypervisor.h>
+
+#define ___hvb	MEM_SV_INTRPT + HV_GLUE_START_CPA
+
+#define ___hv_dispatch(f) (___hvb + (HV_DISPATCH_ENTRY_SIZE * f))
+
+#define ___hv_console_putc ___hv_dispatch(HV_DISPATCH_CONSOLE_PUTC)
+#define ___hv_halt         ___hv_dispatch(HV_DISPATCH_HALT)
+#define ___hv_reexec       ___hv_dispatch(HV_DISPATCH_REEXEC)
+#define ___hv_flush_remote ___hv_dispatch(HV_DISPATCH_FLUSH_REMOTE)
+
+#undef RELOCATE_NEW_KERNEL_VERBOSE
+
+STD_ENTRY(relocate_new_kernel)
+
+	move	r30, r0		/* page list */
+	move	r31, r1		/* address of page we are on */
+	move	r32, r2		/* start address of new kernel */
+
+	shri	r1, r1, PAGE_SHIFT
+	addi	r1, r1, 1
+	shli	sp, r1, PAGE_SHIFT
+	addi	sp, sp, -8
+	/* we now have a stack (whether we need one or not) */
+
+	moveli	r40, lo16(___hv_console_putc)
+	auli	r40, r40, ha16(___hv_console_putc)
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'r'
+	jalr	r40
+
+	moveli	r0, '_'
+	jalr	r40
+
+	moveli	r0, 'n'
+	jalr	r40
+
+	moveli	r0, '_'
+	jalr	r40
+
+	moveli	r0, 'k'
+	jalr	r40
+
+	moveli	r0, '\n'
+	jalr	r40
+#endif
+
+	/*
+	 * Throughout this code r30 is pointer to the element of page
+	 * list we are working on.
+	 *
+	 * Normally we get to the next element of the page list by
+	 * incrementing r30 by four.  The exception is if the element
+	 * on the page list is an IND_INDIRECTION in which case we use
+	 * the element with the low bits masked off as the new value
+	 * of r30.
+	 *
+	 * To get this started, we need the value passed to us (which
+	 * will always be an IND_INDIRECTION) in memory somewhere with
+	 * r30 pointing at it.  To do that, we push the value passed
+	 * to us on the stack and make r30 point to it.
+	 */
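+
+	/*
+	 * Hypothetical C sketch of this walk (illustrative only, not
+	 * compiled in; entries are 32-bit words here):
+	 *
+	 *	for (entry = list; ; ) {
+	 *		ulong v = *entry & ~0xfUL, type = *entry & 0xf;
+	 *		if (type == 0x1) {		// IND_DESTINATION
+	 *			dst = v; entry++;
+	 *		} else if (type == 0x2) {	// IND_INDIRECTION
+	 *			entry = (ulong *)v;
+	 *		} else if (type == 0x4) {	// IND_DONE
+	 *			break;			// then hv_reexec()
+	 *		} else if (type == 0x8) {	// IND_SOURCE
+	 *			copy_page(dst, v);
+	 *			dst += PAGE_SIZE; entry++;
+	 *		}
+	 *	}
+	 */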
+
+	sw	sp, r30
+	move	r30, sp
+	addi	sp, sp, -8
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+	/*
+	 * On TILEPro, we need to flush all tiles' caches, since we may
+	 * have been doing hash-for-home caching there.  Note that we
+	 * must do this _after_ we're completely done modifying any memory
+	 * other than our output buffer (which we know is locally cached).
+	 * We want the caches to be fully clean when we do the reexec,
+	 * because the hypervisor is going to do this flush again at that
+	 * point, and we don't want that second flush to overwrite any memory.
+	 */
+	{
+	 move	r0, zero	 /* cache_pa */
+	 move	r1, zero
+	}
+	{
+	 auli	r2, zero, ha16(HV_FLUSH_EVICT_L2) /* cache_control */
+	 movei	r3, -1		 /* cache_cpumask; -1 means all client tiles */
+	}
+	{
+	 move	r4, zero	 /* tlb_va */
+	 move	r5, zero	 /* tlb_length */
+	}
+	{
+	 move	r6, zero	 /* tlb_pgsize */
+	 move	r7, zero	 /* tlb_cpumask */
+	}
+	{
+	 move	r8, zero	 /* asids */
+	 moveli	r20, lo16(___hv_flush_remote)
+	}
+	{
+	 move	r9, zero	 /* asidcount */
+	 auli	r20, r20, ha16(___hv_flush_remote)
+	}
+
+	jalr	r20
+#endif
+
+	/* r33 is destination pointer, default to zero */
+
+	moveli	r33, 0
+
+.Lloop:	lw	r10, r30
+
+	andi	r9, r10, 0xf	/* low 4 bits tell us what type it is */
+	xor	r10, r10, r9	/* r10 is now value with low 4 bits stripped */
+
+	seqi	r0, r9, 0x1	/* IND_DESTINATION */
+	bzt	r0, .Ltry2
+
+	move	r33, r10
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'd'
+	jalr	r40
+#endif
+
+	addi	r30, r30, 4
+	j	.Lloop
+
+.Ltry2:
+	seqi	r0, r9, 0x2	/* IND_INDIRECTION */
+	bzt	r0, .Ltry4
+
+	move	r30, r10
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'i'
+	jalr	r40
+#endif
+
+	j	.Lloop
+
+.Ltry4:
+	seqi	r0, r9, 0x4	/* IND_DONE */
+	bzt	r0, .Ltry8
+
+	mf
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'D'
+	jalr	r40
+	moveli	r0, '\n'
+	jalr	r40
+#endif
+
+	move	r0, r32
+	moveli	r1, 0		/* arg to hv_reexec is 64 bits */
+
+	moveli	r41, lo16(___hv_reexec)
+	auli	r41, r41, ha16(___hv_reexec)
+
+	jalr	r41
+
+	/* we should not get here */
+
+	moveli	r0, '?'
+	jalr	r40
+	moveli	r0, '\n'
+	jalr	r40
+
+	j	.Lhalt
+
+.Ltry8:	seqi	r0, r9, 0x8	/* IND_SOURCE */
+	bz	r0, .Lerr	/* unknown type */
+
+	/* copy page at r10 to page at r33 */
+
+	move	r11, r33
+
+	moveli	r0, lo16(PAGE_SIZE)
+	auli	r0, r0, ha16(PAGE_SIZE)
+	add	r33, r33, r0
+
+	/* copy word at r10 to word at r11 until r11 equals r33 */
+
+	/* We know the page size must be a multiple of 16 words, so we
+	 * can unroll 16 times safely without any edge-case checking.
+	 *
+	 * Issue a flush of the destination every 16 words to avoid
+	 * incoherence when starting the new kernel.  (Now this is
+	 * just good paranoia because the hv_reexec call will also
+	 * take care of this.)
+	 */
+
+1:
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0; addi	r11, r11, 4 }
+	{ lw	r0, r10; addi	r10, r10, 4 }
+	{ sw	r11, r0 }
+	{ flush r11    ; addi	r11, r11, 4 }
+
+	seq	r0, r33, r11
+	bzt	r0, 1b
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 's'
+	jalr	r40
+#endif
+
+	addi	r30, r30, 4
+	j	.Lloop
+
+
+.Lerr:	moveli	r0, 'e'
+	jalr	r40
+	moveli	r0, 'r'
+	jalr	r40
+	moveli	r0, 'r'
+	jalr	r40
+	moveli	r0, '\n'
+	jalr	r40
+.Lhalt:
+	moveli	r41, lo16(___hv_halt)
+	auli	r41, r41, ha16(___hv_halt)
+
+	jalr	r41
+	STD_ENDPROC(relocate_new_kernel)
+
+	.section .rodata,"a"
+
+	.globl relocate_new_kernel_size
+relocate_new_kernel_size:
+	.long .Lend_relocate_new_kernel - relocate_new_kernel
diff --git a/arch/tile/kernel/relocate_kernel_64.S b/arch/tile/kernel/relocate_kernel_64.S
new file mode 100644
index 0000000..1c09a4f
--- /dev/null
+++ b/arch/tile/kernel/relocate_kernel_64.S
@@ -0,0 +1,260 @@
+/*
+ * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ *
+ * copy new kernel into place and then call hv_reexec
+ *
+ */
+
+#include <linux/linkage.h>
+#include <arch/chip.h>
+#include <asm/page.h>
+#include <hv/hypervisor.h>
+
+#undef RELOCATE_NEW_KERNEL_VERBOSE
+
+STD_ENTRY(relocate_new_kernel)
+
+	move	r30, r0		/* page list */
+	move	r31, r1		/* address of page we are on */
+	move	r32, r2		/* start address of new kernel */
+
+	shrui	r1, r1, PAGE_SHIFT
+	addi	r1, r1, 1
+	shli	sp, r1, PAGE_SHIFT
+	addi	sp, sp, -8
+	/* we now have a stack (whether we need one or not) */
+
+	moveli	r40, hw2_last(hv_console_putc)
+	shl16insli r40, r40, hw1(hv_console_putc)
+	shl16insli r40, r40, hw0(hv_console_putc)
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'r'
+	jalr	r40
+
+	moveli	r0, '_'
+	jalr	r40
+
+	moveli	r0, 'n'
+	jalr	r40
+
+	moveli	r0, '_'
+	jalr	r40
+
+	moveli	r0, 'k'
+	jalr	r40
+
+	moveli	r0, '\n'
+	jalr	r40
+#endif
+
+	/*
+	 * Throughout this code r30 is pointer to the element of page
+	 * list we are working on.
+	 *
+	 * Normally we get to the next element of the page list by
+	 * incrementing r30 by eight.  The exception is if the element
+	 * on the page list is an IND_INDIRECTION in which case we use
+	 * the element with the low bits masked off as the new value
+	 * of r30.
+	 *
+	 * To get this started, we need the value passed to us (which
+	 * will always be an IND_INDIRECTION) in memory somewhere with
+	 * r30 pointing at it.  To do that, we push the value passed
+	 * to us on the stack and make r30 point to it.
+	 */
+
+	st	sp, r30
+	move	r30, sp
+	addi	sp, sp, -16
+
+#if CHIP_HAS_CBOX_HOME_MAP()
+	/*
+	 * On TILE-GX, we need to flush all tiles' caches, since we may
+	 * have been doing hash-for-home caching there.  Note that we
+	 * must do this _after_ we're completely done modifying any memory
+	 * other than our output buffer (which we know is locally cached).
+	 * We want the caches to be fully clean when we do the reexec,
+	 * because the hypervisor is going to do this flush again at that
+	 * point, and we don't want that second flush to overwrite any memory.
+	 */
+	{
+	 move	r0, zero	 /* cache_pa */
+	 moveli	r1, hw2_last(HV_FLUSH_EVICT_L2)
+	}
+	{
+	 shl16insli	r1, r1, hw1(HV_FLUSH_EVICT_L2)
+	 movei	r2, -1		 /* cache_cpumask; -1 means all client tiles */
+	}
+	{
+	 shl16insli	r1, r1, hw0(HV_FLUSH_EVICT_L2)  /* cache_control */
+	 move	r3, zero	 /* tlb_va */
+	}
+	{
+	 move	r4, zero	 /* tlb_length */
+	 move	r5, zero	 /* tlb_pgsize */
+	}
+	{
+	 move	r6, zero	 /* tlb_cpumask */
+	 move	r7, zero	 /* asids */
+	}
+	{
+	 moveli	r20, hw2_last(hv_flush_remote)
+	 move	r8, zero	 /* asidcount */
+	}
+	shl16insli	r20, r20, hw1(hv_flush_remote)
+	shl16insli	r20, r20, hw0(hv_flush_remote)
+
+	jalr	r20
+#endif
+
+	/* r33 is destination pointer, default to zero */
+
+	moveli	r33, 0
+
+.Lloop:	ld	r10, r30
+
+	andi	r9, r10, 0xf	/* low 4 bits tell us what type it is */
+	xor	r10, r10, r9	/* r10 is now value with low 4 bits stripped */
+
+	cmpeqi	r0, r9, 0x1	/* IND_DESTINATION */
+	beqzt	r0, .Ltry2
+
+	move	r33, r10
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'd'
+	jalr	r40
+#endif
+
+	addi	r30, r30, 8
+	j	.Lloop
+
+.Ltry2:
+	cmpeqi	r0, r9, 0x2	/* IND_INDIRECTION */
+	beqzt	r0, .Ltry4
+
+	move	r30, r10
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'i'
+	jalr	r40
+#endif
+
+	j	.Lloop
+
+.Ltry4:
+	cmpeqi	r0, r9, 0x4	/* IND_DONE */
+	beqzt	r0, .Ltry8
+
+	mf
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 'D'
+	jalr	r40
+	moveli	r0, '\n'
+	jalr	r40
+#endif
+
+	move	r0, r32
+
+	moveli	r41, hw2_last(hv_reexec)
+	shl16insli	r41, r41, hw1(hv_reexec)
+	shl16insli	r41, r41, hw0(hv_reexec)
+
+	jalr	r41
+
+	/* we should not get here */
+
+	moveli	r0, '?'
+	jalr	r40
+	moveli	r0, '\n'
+	jalr	r40
+
+	j	.Lhalt
+
+.Ltry8:	cmpeqi	r0, r9, 0x8	/* IND_SOURCE */
+	beqz	r0, .Lerr	/* unknown type */
+
+	/* copy page at r10 to page at r33 */
+
+	move	r11, r33
+
+	moveli	r0, hw2_last(PAGE_SIZE)
+	shl16insli	r0, r0, hw1(PAGE_SIZE)
+	shl16insli	r0, r0, hw0(PAGE_SIZE)
+	add	r33, r33, r0
+
+	/* copy word at r10 to word at r11 until r11 equals r33 */
+
+	/* We know the page size must be a multiple of 8 doublewords, so
+	 * we can unroll 8 times safely without any edge-case checking.
+	 *
+	 * Issue a flush of the destination every 8 words to avoid
+	 * incoherence when starting the new kernel.  (Now this is
+	 * just good paranoia because the hv_reexec call will also
+	 * take care of this.)
+	 */
+
+1:
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0; addi	r11, r11, 8 }
+	{ ld	r0, r10; addi	r10, r10, 8 }
+	{ st	r11, r0 }
+	{ flush r11    ; addi	r11, r11, 8 }
+
+	cmpeq	r0, r33, r11
+	beqzt	r0, 1b
+
+#ifdef RELOCATE_NEW_KERNEL_VERBOSE
+	moveli	r0, 's'
+	jalr	r40
+#endif
+
+	addi	r30, r30, 8
+	j	.Lloop
+
+
+.Lerr:	moveli	r0, 'e'
+	jalr	r40
+	moveli	r0, 'r'
+	jalr	r40
+	moveli	r0, 'r'
+	jalr	r40
+	moveli	r0, '\n'
+	jalr	r40
+.Lhalt:
+	moveli r41, hw2_last(hv_halt)
+	shl16insli r41, r41, hw1(hv_halt)
+	shl16insli r41, r41, hw0(hv_halt)
+
+	jalr	r41
+	STD_ENDPROC(relocate_new_kernel)
+
+	.section .rodata,"a"
+
+	.globl relocate_new_kernel_size
+relocate_new_kernel_size:
+	.long .Lend_relocate_new_kernel - relocate_new_kernel
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix up locking in pgtable.c slightly
       [not found] <4F761E1C.80808.com>
                   ` (26 preceding siblings ...)
  2012-03-29 19:48 ` [PATCH] arch/tile: support kexec() for tilegx Chris Metcalf
@ 2012-03-29 19:50 ` Chris Metcalf
  2012-03-29 19:56 ` [PATCH] arch/tile: use memparse() for "maxmem" and "maxnodemem" options Chris Metcalf
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:50 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

We should be holding the init_mm.page_table_lock in shatter_huge_page()
since we are modifying the kernel page tables.  Then, only if we are
walking the other root page tables to update them, do we want to take
the pgd_lock.

Also add a comment noting that we always take the pgd_lock with
interrupts disabled, and therefore are not at risk from the tlbflush
IPI deadlock seen on x86.
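
For reference, a minimal sketch of the resulting lock nesting, assuming
the folded-PMD configuration (not the literal patched function):

	/* outer lock: we are modifying the kernel page tables */
	spin_lock_irqsave(&init_mm.page_table_lock, flags);
	...
	/* inner lock: only while walking pgd_list to update each root */
	spin_lock(&pgd_lock);
	list_for_each(pos, &pgd_list)
		/* copy the new pmd into each root page table */;
	spin_unlock(&pgd_lock);
	...
	/* flush remote TLBs, then release the outer lock */
	spin_unlock_irqrestore(&init_mm.page_table_lock, flags);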

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/pgtable.c |   22 ++++++++++++----------
 1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 277645f..9c6985f 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -178,14 +178,10 @@ void shatter_huge_page(unsigned long addr)
 	if (!pmd_huge_page(*pmd))
 		return;
 
-	/*
-	 * Grab the pgd_lock, since we may need it to walk the pgd_list,
-	 * and since we need some kind of lock here to avoid races.
-	 */
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock_irqsave(&init_mm.page_table_lock, flags);
 	if (!pmd_huge_page(*pmd)) {
 		/* Lost the race to convert the huge page. */
-		spin_unlock_irqrestore(&pgd_lock, flags);
+		spin_unlock_irqrestore(&init_mm.page_table_lock, flags);
 		return;
 	}
 
@@ -195,6 +191,7 @@ void shatter_huge_page(unsigned long addr)
 
 #ifdef __PAGETABLE_PMD_FOLDED
 	/* Walk every pgd on the system and update the pmd there. */
+	spin_lock(&pgd_lock);
 	list_for_each(pos, &pgd_list) {
 		pmd_t *copy_pmd;
 		pgd = list_to_pgd(pos) + pgd_index(addr);
@@ -202,6 +199,7 @@ void shatter_huge_page(unsigned long addr)
 		copy_pmd = pmd_offset(pud, addr);
 		__set_pmd(copy_pmd, *pmd);
 	}
+	spin_unlock(&pgd_lock);
 #endif
 
 	/* Tell every cpu to notice the change. */
@@ -209,7 +207,7 @@ void shatter_huge_page(unsigned long addr)
 		     cpu_possible_mask, NULL, 0);
 
 	/* Hold the lock until the TLB flush is finished to avoid races. */
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock_irqrestore(&init_mm.page_table_lock, flags);
 }
 
 /*
@@ -218,9 +216,13 @@ void shatter_huge_page(unsigned long addr)
  * against pageattr.c; it is the unique case in which a valid change
  * of kernel pagetables can't be lazily synchronized by vmalloc faults.
  * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
- * -- wli
+ *
+ * The lock is always taken with interrupts disabled, unlike on x86
+ * and other platforms, because we need to take the lock in
+ * shatter_huge_page(), which may be called from an interrupt context.
+ * We are not at risk from the tlbflush IPI deadlock that was seen on
+ * x86, since we use the flush_remote() API to have the hypervisor do
+ * the TLB flushes regardless of irq disabling.
  */
 DEFINE_SPINLOCK(pgd_lock);
 LIST_HEAD(pgd_list);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: use memparse() for "maxmem" and "maxnodemem" options
       [not found] <4F761E1C.80808.com>
                   ` (27 preceding siblings ...)
  2012-03-29 19:50 ` [PATCH] arch/tile: fix up locking in pgtable.c slightly Chris Metcalf
@ 2012-03-29 19:56 ` Chris Metcalf
  2012-03-29 19:57 ` [PATCH] arch/tile: add "nop" after "nap" to help GX idle power draw Chris Metcalf
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:56 UTC (permalink / raw)
  To: Chris Metcalf, Rusty Russell, Jiri Kosina, Joe Perches, linux-kernel

This is more standard and avoids having to remember what units
the options actually take.
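
For example, memparse() (from lib/cmdline.c) returns a byte count and
honors the usual size suffixes, so all of these now work (illustrative
values):

	memparse("512M", NULL);       /* -> 536870912           */
	memparse("1G", NULL);         /* -> 1073741824          */
	memparse("0x4000000", NULL);  /* hex works: -> 67108864 */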

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/setup.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 62b1903..90251dc 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -103,13 +103,11 @@ unsigned long __initdata pci_reserve_end_pfn = -1U;
 
 static int __init setup_maxmem(char *str)
 {
-	long maxmem_mb;
-	if (str == NULL || strict_strtol(str, 0, &maxmem_mb) != 0 ||
-	    maxmem_mb == 0)
+	unsigned long long maxmem;
+	if (str == NULL || (maxmem = memparse(str, NULL)) == 0)
 		return -EINVAL;
 
-	maxmem_pfn = (maxmem_mb >> (HPAGE_SHIFT - 20)) <<
-		(HPAGE_SHIFT - PAGE_SHIFT);
+	maxmem_pfn = (maxmem >> HPAGE_SHIFT) << (HPAGE_SHIFT - PAGE_SHIFT);
 	pr_info("Forcing RAM used to no more than %dMB\n",
 	       maxmem_pfn >> (20 - PAGE_SHIFT));
 	return 0;
@@ -119,14 +117,15 @@ early_param("maxmem", setup_maxmem);
 static int __init setup_maxnodemem(char *str)
 {
 	char *endp;
-	long maxnodemem_mb, node;
+	unsigned long long maxnodemem;
+	long node;
 
 	node = str ? simple_strtoul(str, &endp, 0) : INT_MAX;
-	if (node >= MAX_NUMNODES || *endp != ':' ||
-	    strict_strtol(endp+1, 0, &maxnodemem_mb) != 0)
+	if (node >= MAX_NUMNODES || *endp != ':')
 		return -EINVAL;
 
-	maxnodemem_pfn[node] = (maxnodemem_mb >> (HPAGE_SHIFT - 20)) <<
+	maxnodemem = memparse(endp+1, NULL);
+	maxnodemem_pfn[node] = (maxnodemem >> HPAGE_SHIFT) <<
 		(HPAGE_SHIFT - PAGE_SHIFT);
 	pr_info("Forcing RAM used on node %ld to no more than %dMB\n",
 	       node, maxnodemem_pfn[node] >> (20 - PAGE_SHIFT));
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: add "nop" after "nap" to help GX idle power draw
       [not found] <4F761E1C.80808.com>
                   ` (28 preceding siblings ...)
  2012-03-29 19:56 ` [PATCH] arch/tile: use memparse() for "maxmem" and "maxnodemem" options Chris Metcalf
@ 2012-03-29 19:57 ` Chris Metcalf
  2012-03-29 19:59 ` [PATCH] arch/tile: implement panic_smp_self_stop() Chris Metcalf
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:57 UTC (permalink / raw)
  To: Chris Metcalf, Benjamin Herrenschmidt, Jesper Nilsson,
	Russell King, Martin Schwidefsky, linux-kernel

This keeps the hardware instruction-stream ("istream") prefetcher from
doing unnecessary work.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/entry.S |    2 ++
 arch/tile/kernel/smp.c   |    2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index f8d6155..133c4b5 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -85,6 +85,7 @@ STD_ENTRY(cpu_idle_on_new_stack)
 /* Loop forever on a nap during SMP boot. */
 STD_ENTRY(smp_nap)
 	nap
+	nop       /* avoid provoking the icache prefetch with a jump */
 	j smp_nap /* we are not architecturally guaranteed not to exit nap */
 	jrp lr    /* clue in the backtracer */
 	STD_ENDPROC(smp_nap)
@@ -106,5 +107,6 @@ STD_ENTRY(_cpu_idle)
 	.global _cpu_idle_nap
 _cpu_idle_nap:
 	nap
+	nop       /* avoid provoking the icache prefetch with a jump */
 	jrp lr
 	STD_ENDPROC(_cpu_idle)
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index 7579607..f86887a 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -122,7 +122,7 @@ static void smp_stop_cpu_interrupt(void)
 	set_cpu_online(smp_processor_id(), 0);
 	arch_local_irq_disable_all();
 	for (;;)
-		asm("nap");
+		asm("nap; nop");
 }
 
 /* This function calls the 'stop' function on all other CPUs in the system. */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: implement panic_smp_self_stop()
       [not found] <4F761E1C.80808.com>
                   ` (29 preceding siblings ...)
  2012-03-29 19:57 ` [PATCH] arch/tile: add "nop" after "nap" to help GX idle power draw Chris Metcalf
@ 2012-03-29 19:59 ` Chris Metcalf
  2012-03-29 20:11 ` [PATCH] arch/tile: fix single-stepping over swint1 instructions on tilegx Chris Metcalf
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 19:59 UTC (permalink / raw)
  To: Chris Metcalf, Benjamin Herrenschmidt, Jesper Nilsson,
	Russell King, Martin Schwidefsky, linux-kernel

This allows the later-panicking tiles to wait in a lower power state
until they get interrupted with an smp_send_stop().

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/smp.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index f86887a..5b46a12 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -132,6 +132,12 @@ void smp_send_stop(void)
 	send_IPI_allbutself(MSG_TAG_STOP_CPU);
 }
 
+/* On panic, just wait; we may get an smp_send_stop() later on. */
+void panic_smp_self_stop(void)
+{
+	while (1)
+		asm("nap; nop");
+}
 
 /*
  * Dispatch code called from hv_message_intr() for HV_MSG_TILE hv messages.
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix single-stepping over swint1 instructions on tilegx
       [not found] <4F761E1C.80808.com>
                   ` (30 preceding siblings ...)
  2012-03-29 19:59 ` [PATCH] arch/tile: implement panic_smp_self_stop() Chris Metcalf
@ 2012-03-29 20:11 ` Chris Metcalf
  2012-03-29 20:14 ` [PATCH] arch/tile: fix pointer cast in cacheflush.c Chris Metcalf
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 20:11 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, linux-kernel

If we are single-stepping and make a syscall, we now call ptrace_notify()
explicitly on the return path back to user space.  We return to a pc
value artificially set to the next instruction, so without this call we
would never register that we stepped over the syscall instruction
(swint1).
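
Roughly, the new return path is equivalent to this C sketch (illustrative
only; the real code is the intvec_64.S assembly below):

	if (ti_flags & _TIF_SYSCALL_TRACE)
		do_syscall_trace();
	if (ti_flags & _TIF_SINGLESTEP)
		ptrace_notify(SIGTRAP);  /* register the step over swint1 */
	/* then jump into the middle of interrupt_return */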

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/intvec_64.S |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 6a1ea82..cbf7334 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -22,6 +22,7 @@
 #include <asm/irqflags.h>
 #include <asm/asm-offsets.h>
 #include <asm/types.h>
+#include <asm/signal.h>
 #include <hv/hypervisor.h>
 #include <arch/abi.h>
 #include <arch/interrupts.h>
@@ -1047,11 +1048,25 @@ handle_syscall:
 
 	/* Do syscall trace again, if requested. */
 	ld	r30, r31
-	andi    r30, r30, _TIF_SYSCALL_TRACE
-	beqzt	r30, 1f
+	andi    r0, r30, _TIF_SYSCALL_TRACE
+	{
+	 andi    r0, r30, _TIF_SINGLESTEP
+	 beqzt   r0, 1f
+	}
 	jal	do_syscall_trace
 	FEEDBACK_REENTER(handle_syscall)
-1:	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+	andi    r0, r30, _TIF_SINGLESTEP
+
+1:	beqzt	r0, 2f
+
+	/* Single stepping -- notify ptrace. */
+	{
+	 movei   r0, SIGTRAP
+	 jal     ptrace_notify
+	}
+	FEEDBACK_REENTER(handle_syscall)
+
+2:	j       .Lresume_userspace   /* jump into middle of interrupt_return */
 
 .Lcompat_syscall:
 	/*
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix pointer cast in cacheflush.c
       [not found] <4F761E1C.80808.com>
                   ` (31 preceding siblings ...)
  2012-03-29 20:11 ` [PATCH] arch/tile: fix single-stepping over swint1 instructions on tilegx Chris Metcalf
@ 2012-03-29 20:14 ` Chris Metcalf
  2012-03-29 20:19 ` [PATCH] arch/tile: export the page_home() function Chris Metcalf
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 20:14 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Pragmatically it couldn't be wrong to cast pointers to long to compare
them (since all kernel addresses are in the top half of VA space),
but it's more correct to cast to unsigned long.
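
For illustration, hypothetical addresses that straddle the sign bit show
where a signed compare would invert the ordering:

	void *lo = (void *) 0x0000000080000000UL;
	void *hi = (void *) 0xffffffff80000000UL;
	(long)hi < (long)lo;                    /* 1: wrong order      */
	(unsigned long)hi < (unsigned long)lo;  /* 0: correct ordering */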

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/lib/cacheflush.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/lib/cacheflush.c b/arch/tile/lib/cacheflush.c
index 8928aac..6af2b97 100644
--- a/arch/tile/lib/cacheflush.c
+++ b/arch/tile/lib/cacheflush.c
@@ -109,7 +109,7 @@ void finv_buffer_remote(void *buffer, size_t size, int hfh)
 
 	/* Figure out how far back we need to go. */
 	base = p - (step_size * (load_count - 2));
-	if ((long)base < (long)buffer)
+	if ((unsigned long)base < (unsigned long)buffer)
 		base = buffer;
 
 	/*
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: export the page_home() function.
       [not found] <4F761E1C.80808.com>
                   ` (32 preceding siblings ...)
  2012-03-29 20:14 ` [PATCH] arch/tile: fix pointer cast in cacheflush.c Chris Metcalf
@ 2012-03-29 20:19 ` Chris Metcalf
  2012-03-30 19:29 ` [PATCH] arch/tile: stop mentioning the "kvm" subdirectory Chris Metcalf
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-29 20:19 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/homecache.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/tile/mm/homecache.c b/arch/tile/mm/homecache.c
index 1cc6ae4..499f737 100644
--- a/arch/tile/mm/homecache.c
+++ b/arch/tile/mm/homecache.c
@@ -394,6 +394,7 @@ int page_home(struct page *page)
 		return pte_to_home(*virt_to_pte(NULL, kva));
 	}
 }
+EXPORT_SYMBOL(page_home);
 
 void homecache_change_page_home(struct page *page, int order, int home)
 {
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: stop mentioning the "kvm" subdirectory
       [not found] <4F761E1C.80808.com>
                   ` (33 preceding siblings ...)
  2012-03-29 20:19 ` [PATCH] arch/tile: export the page_home() function Chris Metcalf
@ 2012-03-30 19:29 ` Chris Metcalf
  2012-03-30 19:46 ` [PATCH] arch/tile: use atomic exchange in arch_write_unlock() Chris Metcalf
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 19:29 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Referencing the not-yet-present directory causes "make clean" to fail,
for example.  Once KVM support is complete, we'll reinstate the subdir
reference.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Makefile |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/tile/Makefile b/arch/tile/Makefile
index 5e4d3b9..9520bc5 100644
--- a/arch/tile/Makefile
+++ b/arch/tile/Makefile
@@ -54,8 +54,6 @@ libs-y		+= $(LIBGCC_PATH)
 # See arch/tile/Kbuild for content of core part of the kernel
 core-y		+= arch/tile/
 
-core-$(CONFIG_KVM) += arch/tile/kvm/
-
 ifdef TILERA_ROOT
 INSTALL_PATH ?= $(TILERA_ROOT)/tile/boot
 endif
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: use atomic exchange in arch_write_unlock()
       [not found] <4F761E1C.80808.com>
                   ` (34 preceding siblings ...)
  2012-03-30 19:29 ` [PATCH] arch/tile: stop mentioning the "kvm" subdirectory Chris Metcalf
@ 2012-03-30 19:46 ` Chris Metcalf
  2012-03-30 19:47 ` [PATCH] arch/tile: fix finv_buffer_remote() for tilegx Chris Metcalf
                   ` (7 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 19:46 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, linux-kernel

This idiom is used elsewhere when we do an unlock by writing a zero,
but I missed it here.  Using an atomic operation avoids waiting
on the write buffer for the unlocking write to be sent to the home cache.
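
As a sketch of the pattern (assumed helper names, not literal kernel
code):

	static inline void release_with_store(volatile u32 *word)
	{
		__insn_mf();	/* order critical-section stores first */
		*word = 0;	/* store may sit in the write buffer   */
	}

	static inline void release_with_exch(volatile u32 *word)
	{
		__insn_mf();
		__insn_exch4((u32 *)word, 0);	/* atomic; no wait */
	}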

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/spinlock_64.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 72be590..5f8b6a0 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -137,7 +137,7 @@ static inline void arch_read_unlock(arch_rwlock_t *rw)
 static inline void arch_write_unlock(arch_rwlock_t *rw)
 {
 	__insn_mf();
-	rw->lock = 0;
+	__insn_exch4(&rw->lock, 0);  /* Avoid waiting in the write buffer. */
 }
 
 static inline int arch_read_trylock(arch_rwlock_t *rw)
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix finv_buffer_remote() for tilegx
       [not found] <4F761E1C.80808.com>
                   ` (35 preceding siblings ...)
  2012-03-30 19:46 ` [PATCH] arch/tile: use atomic exchange in arch_write_unlock() Chris Metcalf
@ 2012-03-30 19:47 ` Chris Metcalf
  2012-03-30 19:55 ` [PATCH] arch/tile: fix a reference to cpu_possible_map in a comment Chris Metcalf
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 19:47 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

This change fixes some correctness issues in this code.  The new code
is likely less performant than it could be, but it should no longer be
vulnerable to races with memory operations on the memory network while
invalidating a range of memory.  This code runs infrequently, so
performance isn't critical, but correctness definitely is.
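
For concreteness, the non-hash-for-home numbers on TILE-Gx now work out
roughly as follows (assuming a chip with four memory controllers, i.e.
CHIP_LOG_NUM_MSHIMS() == 2; illustrative only):

	step_size  = 512;	/* STRIPE_WIDTH: PA bits 9-10 stripe */
	load_count = 1 << 2;	/* one load per memory controller    */
	/* back up (load_count - 2) stripes from the last load issued */
	base = p - (step_size * (load_count - 2));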

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/lib/cacheflush.c |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/tile/lib/cacheflush.c b/arch/tile/lib/cacheflush.c
index 6af2b97..db4fb89 100644
--- a/arch/tile/lib/cacheflush.c
+++ b/arch/tile/lib/cacheflush.c
@@ -39,7 +39,21 @@ void finv_buffer_remote(void *buffer, size_t size, int hfh)
 {
 	char *p, *base;
 	size_t step_size, load_count;
+
+	/*
+	 * On TILEPro the striping granularity is a fixed 8KB; on
+	 * TILE-Gx it is configurable, and we rely on the fact that
+	 * the hypervisor always configures maximum striping, so that
+	 * bits 9 and 10 of the PA are part of the stripe function and
+	 * every 512 bytes we hit a striping boundary.
+	 */
+#ifdef __tilegx__
+	const unsigned long STRIPE_WIDTH = 512;
+#else
 	const unsigned long STRIPE_WIDTH = 8192;
+#endif
+
 #ifdef __tilegx__
 	/*
 	 * On TILE-Gx, we must disable the dstream prefetcher before doing
@@ -74,7 +88,7 @@ void finv_buffer_remote(void *buffer, size_t size, int hfh)
 	 * memory, that one load would be sufficient, but since we may
 	 * be, we also need to back up to the last load issued to
 	 * another memory controller, which would be the point where
-	 * we crossed an 8KB boundary (the granularity of striping
+	 * we crossed a "striping" boundary (the granularity of striping
 	 * across memory controllers).  Keep backing up and doing this
 	 * until we are before the beginning of the buffer, or have
 	 * hit all the controllers.
@@ -88,12 +102,22 @@ void finv_buffer_remote(void *buffer, size_t size, int hfh)
 	 * every cache line on a full memory stripe on each
 	 * controller" that we simply do that, to simplify the logic.
 	 *
-	 * FIXME: See bug 9535 for some issues with this code.
+	 * On TILE-Gx the hash-for-home function is much more complex,
+	 * with the upshot being we can't readily guarantee we have
+	 * hit both entries in the 128-entry AMT that were hit by any
+	 * load in the entire range, so we just re-load them all.
+	 * With larger buffers, we may want to consider using a hypervisor
+	 * trap to issue loads directly to each hash-for-home tile for
+	 * each controller (doing it from Linux would trash the TLB).
 	 */
 	if (hfh) {
 		step_size = L2_CACHE_BYTES;
+#ifdef __tilegx__
+		load_count = (size + L2_CACHE_BYTES - 1) / L2_CACHE_BYTES;
+#else
 		load_count = (STRIPE_WIDTH / L2_CACHE_BYTES) *
 			      (1 << CHIP_LOG_NUM_MSHIMS());
+#endif
 	} else {
 		step_size = STRIPE_WIDTH;
 		load_count = (1 << CHIP_LOG_NUM_MSHIMS());
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix a reference to cpu_possible_map in a comment
       [not found] <4F761E1C.80808.com>
                   ` (36 preceding siblings ...)
  2012-03-30 19:47 ` [PATCH] arch/tile: fix finv_buffer_remote() for tilegx Chris Metcalf
@ 2012-03-30 19:55 ` Chris Metcalf
  2012-03-30 20:01 ` [PATCH] arch/tile: fix hardwall for tilegx and generalize for idn and ipi Chris Metcalf
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 19:55 UTC (permalink / raw)
  To: Chris Metcalf, Rusty Russell, Jiri Kosina, Joe Perches, linux-kernel

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/setup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 1d43be6..ee68fd2 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1158,7 +1158,7 @@ EXPORT_SYMBOL(hash_for_home_map);
 
 /*
  * cpu_cacheable_map lists all the cpus whose caches the hypervisor can
- * flush on our behalf.  It is set to cpu_possible_map OR'ed with
+ * flush on our behalf.  It is set to cpu_possible_mask OR'ed with
  * hash_for_home_map, and it is what should be passed to
  * hv_flush_remote() to flush all caches.  Note that if there are
  * dedicated hypervisor driver tiles that have authorized use of their
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: fix hardwall for tilegx and generalize for idn and ipi
       [not found] <4F761E1C.80808.com>
                   ` (37 preceding siblings ...)
  2012-03-30 19:55 ` [PATCH] arch/tile: fix a reference to cpu_possible_map in a comment Chris Metcalf
@ 2012-03-30 20:01 ` Chris Metcalf
  2012-03-30 20:21 ` [PATCH] arch/tile: allow querying cpu module information from the hypervisor Chris Metcalf
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 20:01 UTC (permalink / raw)
  To: Chris Metcalf, Dmitry Torokhov, Arnd Bergmann, Oleg Nesterov,
	Paul E. McKenney, Frederic Weisbecker, Josh Triplett,
	linux-kernel

The hardwall drain code was not properly implemented for tilegx,
just tilepro, so you couldn't reliably restart an application that
made use of the udn.

In addition, the code was only applicable to the udn (user dynamic
network).  On tilegx there is a second user network available (the
"idn"), and there is support for having I/O shims deliver user-level
interrupts to applications ("ipi"), which functions in a very similar
way to the inter-core permissions used for udn/idn.  So this change
also generalizes the code from supporting just the udn to supporting
udn/idn/ipi on tilegx.

By default we now use /dev/hardwall/{udn,idn,ipi} with separate
minor numbers for the three devices.
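
A hypothetical user-space sketch of the new interface (cpumask setup and
error handling omitted; "mask" is assumed to cover a rectangle of cpus):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <asm/hardwall.h>

	int fd = open("/dev/hardwall/udn", O_RDWR);
	ioctl(fd, HARDWALL_CREATE(sizeof(mask)), &mask); /* reserve cpus */
	ioctl(fd, HARDWALL_ACTIVATE);	/* bind the calling task to it */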

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/arch/spr_def_32.h |   56 +++
 arch/tile/include/arch/spr_def_64.h |   43 ++
 arch/tile/include/asm/hardwall.h    |   18 +-
 arch/tile/include/asm/processor.h   |   17 +-
 arch/tile/include/asm/system.h      |   10 +-
 arch/tile/kernel/hardwall.c         |  754 +++++++++++++++++++++++------------
 arch/tile/kernel/intvec_64.S        |    2 +-
 arch/tile/kernel/process.c          |   16 +-
 8 files changed, 636 insertions(+), 280 deletions(-)

diff --git a/arch/tile/include/arch/spr_def_32.h b/arch/tile/include/arch/spr_def_32.h
index bbc1f4c..78bbce2 100644
--- a/arch/tile/include/arch/spr_def_32.h
+++ b/arch/tile/include/arch/spr_def_32.h
@@ -65,6 +65,31 @@
 #define SPR_EX_CONTEXT_2_1__ICS_RMASK 0x1
 #define SPR_EX_CONTEXT_2_1__ICS_MASK  0x4
 #define SPR_FAIL 0x4e09
+#define SPR_IDN_AVAIL_EN 0x3e05
+#define SPR_IDN_CA_DATA 0x0b00
+#define SPR_IDN_DATA_AVAIL 0x0b03
+#define SPR_IDN_DEADLOCK_TIMEOUT 0x3406
+#define SPR_IDN_DEMUX_CA_COUNT 0x0a05
+#define SPR_IDN_DEMUX_COUNT_0 0x0a06
+#define SPR_IDN_DEMUX_COUNT_1 0x0a07
+#define SPR_IDN_DEMUX_CTL 0x0a08
+#define SPR_IDN_DEMUX_QUEUE_SEL 0x0a0a
+#define SPR_IDN_DEMUX_STATUS 0x0a0b
+#define SPR_IDN_DEMUX_WRITE_FIFO 0x0a0c
+#define SPR_IDN_DIRECTION_PROTECT 0x2e05
+#define SPR_IDN_PENDING 0x0a0e
+#define SPR_IDN_REFILL_EN 0x0e05
+#define SPR_IDN_SP_FIFO_DATA 0x0a0f
+#define SPR_IDN_SP_FIFO_SEL 0x0a10
+#define SPR_IDN_SP_FREEZE 0x0a11
+#define SPR_IDN_SP_FREEZE__SP_FRZ_MASK  0x1
+#define SPR_IDN_SP_FREEZE__DEMUX_FRZ_MASK  0x2
+#define SPR_IDN_SP_FREEZE__NON_DEST_EXT_MASK  0x4
+#define SPR_IDN_SP_STATE 0x0a12
+#define SPR_IDN_TAG_0 0x0a13
+#define SPR_IDN_TAG_1 0x0a14
+#define SPR_IDN_TAG_VALID 0x0a15
+#define SPR_IDN_TILE_COORD 0x0a16
 #define SPR_INTCTRL_0_STATUS 0x4a07
 #define SPR_INTCTRL_1_STATUS 0x4807
 #define SPR_INTCTRL_2_STATUS 0x4607
@@ -87,12 +112,36 @@
 #define SPR_INTERRUPT_MASK_SET_1_1 0x480e
 #define SPR_INTERRUPT_MASK_SET_2_0 0x460c
 #define SPR_INTERRUPT_MASK_SET_2_1 0x460d
+#define SPR_MPL_AUX_PERF_COUNT_SET_0 0x6000
+#define SPR_MPL_AUX_PERF_COUNT_SET_1 0x6001
+#define SPR_MPL_AUX_PERF_COUNT_SET_2 0x6002
 #define SPR_MPL_DMA_CPL_SET_0 0x5800
 #define SPR_MPL_DMA_CPL_SET_1 0x5801
 #define SPR_MPL_DMA_CPL_SET_2 0x5802
 #define SPR_MPL_DMA_NOTIFY_SET_0 0x3800
 #define SPR_MPL_DMA_NOTIFY_SET_1 0x3801
 #define SPR_MPL_DMA_NOTIFY_SET_2 0x3802
+#define SPR_MPL_IDN_ACCESS_SET_0 0x0a00
+#define SPR_MPL_IDN_ACCESS_SET_1 0x0a01
+#define SPR_MPL_IDN_ACCESS_SET_2 0x0a02
+#define SPR_MPL_IDN_AVAIL_SET_0 0x3e00
+#define SPR_MPL_IDN_AVAIL_SET_1 0x3e01
+#define SPR_MPL_IDN_AVAIL_SET_2 0x3e02
+#define SPR_MPL_IDN_CA_SET_0 0x3a00
+#define SPR_MPL_IDN_CA_SET_1 0x3a01
+#define SPR_MPL_IDN_CA_SET_2 0x3a02
+#define SPR_MPL_IDN_COMPLETE_SET_0 0x1200
+#define SPR_MPL_IDN_COMPLETE_SET_1 0x1201
+#define SPR_MPL_IDN_COMPLETE_SET_2 0x1202
+#define SPR_MPL_IDN_FIREWALL_SET_0 0x2e00
+#define SPR_MPL_IDN_FIREWALL_SET_1 0x2e01
+#define SPR_MPL_IDN_FIREWALL_SET_2 0x2e02
+#define SPR_MPL_IDN_REFILL_SET_0 0x0e00
+#define SPR_MPL_IDN_REFILL_SET_1 0x0e01
+#define SPR_MPL_IDN_REFILL_SET_2 0x0e02
+#define SPR_MPL_IDN_TIMER_SET_0 0x3400
+#define SPR_MPL_IDN_TIMER_SET_1 0x3401
+#define SPR_MPL_IDN_TIMER_SET_2 0x3402
 #define SPR_MPL_INTCTRL_0_SET_0 0x4a00
 #define SPR_MPL_INTCTRL_0_SET_1 0x4a01
 #define SPR_MPL_INTCTRL_0_SET_2 0x4a02
@@ -102,6 +151,9 @@
 #define SPR_MPL_INTCTRL_2_SET_0 0x4600
 #define SPR_MPL_INTCTRL_2_SET_1 0x4601
 #define SPR_MPL_INTCTRL_2_SET_2 0x4602
+#define SPR_MPL_PERF_COUNT_SET_0 0x4200
+#define SPR_MPL_PERF_COUNT_SET_1 0x4201
+#define SPR_MPL_PERF_COUNT_SET_2 0x4202
 #define SPR_MPL_SN_ACCESS_SET_0 0x0800
 #define SPR_MPL_SN_ACCESS_SET_1 0x0801
 #define SPR_MPL_SN_ACCESS_SET_2 0x0802
@@ -181,6 +233,7 @@
 #define SPR_UDN_DEMUX_STATUS 0x0c0d
 #define SPR_UDN_DEMUX_WRITE_FIFO 0x0c0e
 #define SPR_UDN_DIRECTION_PROTECT 0x3005
+#define SPR_UDN_PENDING 0x0c10
 #define SPR_UDN_REFILL_EN 0x1005
 #define SPR_UDN_SP_FIFO_DATA 0x0c11
 #define SPR_UDN_SP_FIFO_SEL 0x0c12
@@ -195,6 +248,9 @@
 #define SPR_UDN_TAG_3 0x0c18
 #define SPR_UDN_TAG_VALID 0x0c19
 #define SPR_UDN_TILE_COORD 0x0c1a
+#define SPR_WATCH_CTL 0x4209
+#define SPR_WATCH_MASK 0x420a
+#define SPR_WATCH_VAL 0x420b
 
 #endif /* !defined(__ARCH_SPR_DEF_H__) */
 
diff --git a/arch/tile/include/arch/spr_def_64.h b/arch/tile/include/arch/spr_def_64.h
index cd3e5f9..0da86fa 100644
--- a/arch/tile/include/arch/spr_def_64.h
+++ b/arch/tile/include/arch/spr_def_64.h
@@ -52,6 +52,13 @@
 #define SPR_EX_CONTEXT_2_1__ICS_RMASK 0x1
 #define SPR_EX_CONTEXT_2_1__ICS_MASK  0x4
 #define SPR_FAIL 0x2707
+#define SPR_IDN_AVAIL_EN 0x1a05
+#define SPR_IDN_DATA_AVAIL 0x0a80
+#define SPR_IDN_DEADLOCK_TIMEOUT 0x1806
+#define SPR_IDN_DEMUX_COUNT_0 0x0a05
+#define SPR_IDN_DEMUX_COUNT_1 0x0a06
+#define SPR_IDN_DIRECTION_PROTECT 0x1405
+#define SPR_IDN_PENDING 0x0a08
 #define SPR_ILL_TRANS_REASON__I_STREAM_VA_RMASK 0x1
 #define SPR_INTCTRL_0_STATUS 0x2505
 #define SPR_INTCTRL_1_STATUS 0x2405
@@ -88,9 +95,27 @@
 #define SPR_IPI_MASK_SET_0 0x1f0a
 #define SPR_IPI_MASK_SET_1 0x1e0a
 #define SPR_IPI_MASK_SET_2 0x1d0a
+#define SPR_MPL_AUX_PERF_COUNT_SET_0 0x2100
+#define SPR_MPL_AUX_PERF_COUNT_SET_1 0x2101
+#define SPR_MPL_AUX_PERF_COUNT_SET_2 0x2102
 #define SPR_MPL_AUX_TILE_TIMER_SET_0 0x1700
 #define SPR_MPL_AUX_TILE_TIMER_SET_1 0x1701
 #define SPR_MPL_AUX_TILE_TIMER_SET_2 0x1702
+#define SPR_MPL_IDN_ACCESS_SET_0 0x0a00
+#define SPR_MPL_IDN_ACCESS_SET_1 0x0a01
+#define SPR_MPL_IDN_ACCESS_SET_2 0x0a02
+#define SPR_MPL_IDN_AVAIL_SET_0 0x1a00
+#define SPR_MPL_IDN_AVAIL_SET_1 0x1a01
+#define SPR_MPL_IDN_AVAIL_SET_2 0x1a02
+#define SPR_MPL_IDN_COMPLETE_SET_0 0x0500
+#define SPR_MPL_IDN_COMPLETE_SET_1 0x0501
+#define SPR_MPL_IDN_COMPLETE_SET_2 0x0502
+#define SPR_MPL_IDN_FIREWALL_SET_0 0x1400
+#define SPR_MPL_IDN_FIREWALL_SET_1 0x1401
+#define SPR_MPL_IDN_FIREWALL_SET_2 0x1402
+#define SPR_MPL_IDN_TIMER_SET_0 0x1800
+#define SPR_MPL_IDN_TIMER_SET_1 0x1801
+#define SPR_MPL_IDN_TIMER_SET_2 0x1802
 #define SPR_MPL_INTCTRL_0_SET_0 0x2500
 #define SPR_MPL_INTCTRL_0_SET_1 0x2501
 #define SPR_MPL_INTCTRL_0_SET_2 0x2502
@@ -100,6 +125,21 @@
 #define SPR_MPL_INTCTRL_2_SET_0 0x2300
 #define SPR_MPL_INTCTRL_2_SET_1 0x2301
 #define SPR_MPL_INTCTRL_2_SET_2 0x2302
+#define SPR_MPL_IPI_0 0x1f04
+#define SPR_MPL_IPI_0_SET_0 0x1f00
+#define SPR_MPL_IPI_0_SET_1 0x1f01
+#define SPR_MPL_IPI_0_SET_2 0x1f02
+#define SPR_MPL_IPI_1 0x1e04
+#define SPR_MPL_IPI_1_SET_0 0x1e00
+#define SPR_MPL_IPI_1_SET_1 0x1e01
+#define SPR_MPL_IPI_1_SET_2 0x1e02
+#define SPR_MPL_IPI_2 0x1d04
+#define SPR_MPL_IPI_2_SET_0 0x1d00
+#define SPR_MPL_IPI_2_SET_1 0x1d01
+#define SPR_MPL_IPI_2_SET_2 0x1d02
+#define SPR_MPL_PERF_COUNT_SET_0 0x2000
+#define SPR_MPL_PERF_COUNT_SET_1 0x2001
+#define SPR_MPL_PERF_COUNT_SET_2 0x2002
 #define SPR_MPL_UDN_ACCESS_SET_0 0x0b00
 #define SPR_MPL_UDN_ACCESS_SET_1 0x0b01
 #define SPR_MPL_UDN_ACCESS_SET_2 0x0b02
@@ -167,6 +207,9 @@
 #define SPR_UDN_DEMUX_COUNT_2 0x0b07
 #define SPR_UDN_DEMUX_COUNT_3 0x0b08
 #define SPR_UDN_DIRECTION_PROTECT 0x1505
+#define SPR_UDN_PENDING 0x0b0a
+#define SPR_WATCH_MASK 0x200a
+#define SPR_WATCH_VAL 0x200b
 
 #endif /* !defined(__ARCH_SPR_DEF_H__) */
 
diff --git a/arch/tile/include/asm/hardwall.h b/arch/tile/include/asm/hardwall.h
index 2ac4228..47514a5 100644
--- a/arch/tile/include/asm/hardwall.h
+++ b/arch/tile/include/asm/hardwall.h
@@ -11,12 +11,14 @@
  *   NON INFRINGEMENT.  See the GNU General Public License for
  *   more details.
  *
- * Provide methods for the HARDWALL_FILE for accessing the UDN.
+ * Provide methods for access control of per-cpu resources like
+ * UDN, IDN, or IPI.
  */
 
 #ifndef _ASM_TILE_HARDWALL_H
 #define _ASM_TILE_HARDWALL_H
 
+#include <arch/chip.h>
 #include <linux/ioctl.h>
 
 #define HARDWALL_IOCTL_BASE 0xa2
@@ -24,8 +26,9 @@
 /*
  * The HARDWALL_CREATE() ioctl is a macro with a "size" argument.
  * The resulting ioctl value is passed to the kernel in conjunction
- * with a pointer to a little-endian bitmask of cpus, which must be
- * physically in a rectangular configuration on the chip.
+ * with a pointer to a standard kernel bitmask of cpus.
+ * For network resources (UDN or IDN) the bitmask must physically
+ * represent a rectangular configuration on the chip.
  * The "size" is the number of bytes of cpu mask data.
  */
 #define _HARDWALL_CREATE 1
@@ -44,13 +47,7 @@
 #define HARDWALL_GET_ID \
  _IO(HARDWALL_IOCTL_BASE, _HARDWALL_GET_ID)
 
-#ifndef __KERNEL__
-
-/* This is the canonical name expected by userspace. */
-#define HARDWALL_FILE "/dev/hardwall"
-
-#else
-
+#ifdef __KERNEL__
 /* /proc hooks for hardwall. */
 struct proc_dir_entry;
 #ifdef CONFIG_HARDWALL
@@ -59,7 +56,6 @@ int proc_pid_hardwall(struct task_struct *task, char *buffer);
 #else
 static inline void proc_tile_hardwall_init(struct proc_dir_entry *root) {}
 #endif
-
 #endif
 
 #endif /* _ASM_TILE_HARDWALL_H */
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 34c1e01..e85a9af 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -76,6 +76,17 @@ struct async_tlb {
 
 #ifdef CONFIG_HARDWALL
 struct hardwall_info;
+struct hardwall_task {
+	/* Which hardwall is this task tied to? (or NULL if none) */
+	struct hardwall_info *info;
+	/* Chains this task into the list at info->task_head. */
+	struct list_head list;
+};
+#ifdef __tilepro__
+#define HARDWALL_TYPES 1   /* udn */
+#else
+#define HARDWALL_TYPES 3   /* udn, idn, and ipi */
+#endif
 #endif
 
 struct thread_struct {
@@ -116,10 +127,8 @@ struct thread_struct {
 	unsigned long dstream_pf;
 #endif
 #ifdef CONFIG_HARDWALL
-	/* Is this task tied to an activated hardwall? */
-	struct hardwall_info *hardwall;
-	/* Chains this task into the list at hardwall->list. */
-	struct list_head hardwall_list;
+	/* Hardwall information for various resources. */
+	struct hardwall_task hardwall[HARDWALL_TYPES];
 #endif
 #if CHIP_HAS_TILE_DMA()
 	/* Async DMA TLB fault information */
diff --git a/arch/tile/include/asm/system.h b/arch/tile/include/asm/system.h
index 23d1842..9a6f58e 100644
--- a/arch/tile/include/asm/system.h
+++ b/arch/tile/include/asm/system.h
@@ -223,14 +223,14 @@ void restrict_dma_mpls(void);
 #ifdef CONFIG_HARDWALL
 /* User-level network management functions */
 void reset_network_state(void);
-void grant_network_mpls(void);
-void restrict_network_mpls(void);
-int hardwall_deactivate(struct task_struct *task);
+void hardwall_switch_tasks(struct task_struct *prev, struct task_struct *next);
+void hardwall_deactivate_all(struct task_struct *task);
+int hardwall_ipi_valid(int cpu);
 
 /* Hook hardwall code into changes in affinity. */
 #define arch_set_cpus_allowed(p, new_mask) do { \
-	if (p->thread.hardwall && !cpumask_equal(&p->cpus_allowed, new_mask)) \
-		hardwall_deactivate(p); \
+	if (!cpumask_equal(&p->cpus_allowed, new_mask)) \
+		hardwall_deactivate_all(p); \
 } while (0)
 #endif
 
diff --git a/arch/tile/kernel/hardwall.c b/arch/tile/kernel/hardwall.c
index 8c41891..20273ee 100644
--- a/arch/tile/kernel/hardwall.c
+++ b/arch/tile/kernel/hardwall.c
@@ -33,59 +33,157 @@
 
 
 /*
- * This data structure tracks the rectangle data, etc., associated
- * one-to-one with a "struct file *" from opening HARDWALL_FILE.
+ * Implement a per-cpu "hardwall" resource class such as UDN or IPI.
+ * We use "hardwall" nomenclature throughout for historical reasons.
+ * The lock here controls access to the list data structure as well as
+ * to the items on the list.
+ */
+struct hardwall_type {
+	int index;
+	int is_xdn;
+	int is_idn;
+	int disabled;
+	const char *name;
+	struct list_head list;
+	spinlock_t lock;
+	struct proc_dir_entry *proc_dir;
+};
+
+enum hardwall_index {
+	HARDWALL_UDN = 0,
+#ifndef __tilepro__
+	HARDWALL_IDN = 1,
+	HARDWALL_IPI = 2,
+#endif
+	_HARDWALL_TYPES
+};
+
+static struct hardwall_type hardwall_types[] = {
+	{  /* user-space access to UDN */
+		0,
+		1,
+		0,
+		0,
+		"udn",
+		LIST_HEAD_INIT(hardwall_types[HARDWALL_UDN].list),
+		__SPIN_LOCK_INITIALIZER(hardwall_types[HARDWALL_UDN].lock),
+		NULL
+	},
+#ifndef __tilepro__
+	{  /* user-space access to IDN */
+		1,
+		1,
+		1,
+		1,  /* disabled pending hypervisor support */
+		"idn",
+		LIST_HEAD_INIT(hardwall_types[HARDWALL_IDN].list),
+		__SPIN_LOCK_INITIALIZER(hardwall_types[HARDWALL_IDN].lock),
+		NULL
+	},
+	{  /* access to user-space IPI */
+		2,
+		0,
+		0,
+		0,
+		"ipi",
+		LIST_HEAD_INIT(hardwall_types[HARDWALL_IPI].list),
+		__SPIN_LOCK_INITIALIZER(hardwall_types[HARDWALL_IPI].lock),
+		NULL
+	},
+#endif
+};
+
+/*
+ * This data structure tracks the cpu data, etc., associated
+ * one-to-one with a "struct file *" from opening a hardwall device file.
  * Note that the file's private data points back to this structure.
  */
 struct hardwall_info {
-	struct list_head list;             /* "rectangles" list */
+	struct list_head list;             /* for hardwall_types.list */
 	struct list_head task_head;        /* head of tasks in this hardwall */
-	struct cpumask cpumask;            /* cpus in the rectangle */
+	struct hardwall_type *type;        /* type of this resource */
+	struct cpumask cpumask;            /* cpus reserved */
+	int id;                            /* integer id for this hardwall */
+	int teardown_in_progress;          /* are we tearing this one down? */
+
+	/* Remaining fields only valid for user-network resources. */
 	int ulhc_x;                        /* upper left hand corner x coord */
 	int ulhc_y;                        /* upper left hand corner y coord */
 	int width;                         /* rectangle width */
 	int height;                        /* rectangle height */
-	int id;                            /* integer id for this hardwall */
-	int teardown_in_progress;          /* are we tearing this one down? */
+#if CHIP_HAS_REV1_XDN()
+	atomic_t xdn_pending_count;        /* cores in phase 1 of drain */
+#endif
 };
 
-/* Currently allocated hardwall rectangles */
-static LIST_HEAD(rectangles);
 
 /* /proc/tile/hardwall */
 static struct proc_dir_entry *hardwall_proc_dir;
 
 /* Functions to manage files in /proc/tile/hardwall. */
-static void hardwall_add_proc(struct hardwall_info *rect);
-static void hardwall_remove_proc(struct hardwall_info *rect);
-
-/*
- * Guard changes to the hardwall data structures.
- * This could be finer grained (e.g. one lock for the list of hardwall
- * rectangles, then separate embedded locks for each one's list of tasks),
- * but there are subtle correctness issues when trying to start with
- * a task's "hardwall" pointer and lock the correct rectangle's embedded
- * lock in the presence of a simultaneous deactivation, so it seems
- * easier to have a single lock, given that none of these data
- * structures are touched very frequently during normal operation.
- */
-static DEFINE_SPINLOCK(hardwall_lock);
+static void hardwall_add_proc(struct hardwall_info *);
+static void hardwall_remove_proc(struct hardwall_info *);
 
 /* Allow disabling UDN access. */
-static int udn_disabled;
 static int __init noudn(char *str)
 {
 	pr_info("User-space UDN access is disabled\n");
-	udn_disabled = 1;
+	hardwall_types[HARDWALL_UDN].disabled = 1;
 	return 0;
 }
 early_param("noudn", noudn);
 
+#ifndef __tilepro__
+/* Allow disabling IDN access. */
+static int __init noidn(char *str)
+{
+	pr_info("User-space IDN access is disabled\n");
+	hardwall_types[HARDWALL_IDN].disabled = 1;
+	return 0;
+}
+early_param("noidn", noidn);
+
+/* Allow disabling IPI access. */
+static int __init noipi(char *str)
+{
+	pr_info("User-space IPI access is disabled\n");
+	hardwall_types[HARDWALL_IPI].disabled = 1;
+	return 0;
+}
+early_param("noipi", noipi);
+#endif
+
 
 /*
- * Low-level primitives
+ * Low-level primitives for UDN/IDN
  */
 
+#ifdef __tilepro__
+#define mtspr_XDN(hwt, name, val) \
+	do { (void)(hwt); __insn_mtspr(SPR_UDN_##name, (val)); } while (0)
+#define mtspr_MPL_XDN(hwt, name, val) \
+	do { (void)(hwt); __insn_mtspr(SPR_MPL_UDN_##name, (val)); } while (0)
+#define mfspr_XDN(hwt, name) \
+	((void)(hwt), __insn_mfspr(SPR_UDN_##name))
+#else
+#define mtspr_XDN(hwt, name, val)					\
+	do {								\
+		if ((hwt)->is_idn)					\
+			__insn_mtspr(SPR_IDN_##name, (val));		\
+		else							\
+			__insn_mtspr(SPR_UDN_##name, (val));		\
+	} while (0)
+#define mtspr_MPL_XDN(hwt, name, val)					\
+	do {								\
+		if ((hwt)->is_idn)					\
+			__insn_mtspr(SPR_MPL_IDN_##name, (val));	\
+		else							\
+			__insn_mtspr(SPR_MPL_UDN_##name, (val));	\
+	} while (0)
+#define mfspr_XDN(hwt, name) \
+  ((hwt)->is_idn ? __insn_mfspr(SPR_IDN_##name) : __insn_mfspr(SPR_UDN_##name))
+#endif
+
 /* Set a CPU bit if the CPU is online. */
 #define cpu_online_set(cpu, dst) do { \
 	if (cpu_online(cpu))          \
@@ -101,7 +199,7 @@ static int contains(struct hardwall_info *r, int x, int y)
 }
 
 /* Compute the rectangle parameters and validate the cpumask. */
-static int setup_rectangle(struct hardwall_info *r, struct cpumask *mask)
+static int check_rectangle(struct hardwall_info *r, struct cpumask *mask)
 {
 	int x, y, cpu, ulhc, lrhc;
 
@@ -114,8 +212,6 @@ static int setup_rectangle(struct hardwall_info *r, struct cpumask *mask)
 	r->ulhc_y = cpu_y(ulhc);
 	r->width = cpu_x(lrhc) - r->ulhc_x + 1;
 	r->height = cpu_y(lrhc) - r->ulhc_y + 1;
-	cpumask_copy(&r->cpumask, mask);
-	r->id = ulhc;   /* The ulhc cpu id can be the hardwall id. */
 
 	/* Width and height must be positive */
 	if (r->width <= 0 || r->height <= 0)
@@ -128,7 +224,7 @@ static int setup_rectangle(struct hardwall_info *r, struct cpumask *mask)
 				return -EINVAL;
 
 	/*
-	 * Note that offline cpus can't be drained when this UDN
+	 * Note that offline cpus can't be drained when this user network
 	 * rectangle eventually closes.  We used to detect this
 	 * situation and print a warning, but it annoyed users and
 	 * they ignored it anyway, so now we just return without a
@@ -137,16 +233,6 @@ static int setup_rectangle(struct hardwall_info *r, struct cpumask *mask)
 	return 0;
 }
 
-/* Do the two given rectangles overlap on any cpu? */
-static int overlaps(struct hardwall_info *a, struct hardwall_info *b)
-{
-	return a->ulhc_x + a->width > b->ulhc_x &&    /* A not to the left */
-		b->ulhc_x + b->width > a->ulhc_x &&   /* B not to the left */
-		a->ulhc_y + a->height > b->ulhc_y &&  /* A not above */
-		b->ulhc_y + b->height > a->ulhc_y;    /* B not above */
-}
-
-
 /*
  * Hardware management of hardwall setup, teardown, trapping,
  * and enabling/disabling PL0 access to the networks.
@@ -157,23 +243,35 @@ enum direction_protect {
 	N_PROTECT = (1 << 0),
 	E_PROTECT = (1 << 1),
 	S_PROTECT = (1 << 2),
-	W_PROTECT = (1 << 3)
+	W_PROTECT = (1 << 3),
+	C_PROTECT = (1 << 4),
 };
 
-static void enable_firewall_interrupts(void)
+static inline int xdn_which_interrupt(struct hardwall_type *hwt)
+{
+#ifndef __tilepro__
+	if (hwt->is_idn)
+		return INT_IDN_FIREWALL;
+#endif
+	return INT_UDN_FIREWALL;
+}
+
+static void enable_firewall_interrupts(struct hardwall_type *hwt)
 {
-	arch_local_irq_unmask_now(INT_UDN_FIREWALL);
+	arch_local_irq_unmask_now(xdn_which_interrupt(hwt));
 }
 
-static void disable_firewall_interrupts(void)
+static void disable_firewall_interrupts(struct hardwall_type *hwt)
 {
-	arch_local_irq_mask_now(INT_UDN_FIREWALL);
+	arch_local_irq_mask_now(xdn_which_interrupt(hwt));
 }
 
 /* Set up hardwall on this cpu based on the passed hardwall_info. */
-static void hardwall_setup_ipi_func(void *info)
+static void hardwall_setup_func(void *info)
 {
 	struct hardwall_info *r = info;
+	struct hardwall_type *hwt = r->type;
+
 	int cpu = smp_processor_id();
 	int x = cpu % smp_width;
 	int y = cpu / smp_width;
@@ -187,13 +285,12 @@ static void hardwall_setup_ipi_func(void *info)
 	if (y == r->ulhc_y + r->height - 1)
 		bits |= S_PROTECT;
 	BUG_ON(bits == 0);
-	__insn_mtspr(SPR_UDN_DIRECTION_PROTECT, bits);
-	enable_firewall_interrupts();
-
+	mtspr_XDN(hwt, DIRECTION_PROTECT, bits);
+	enable_firewall_interrupts(hwt);
 }
 
 /* Set up all cpus on edge of rectangle to enable/disable hardwall SPRs. */
-static void hardwall_setup(struct hardwall_info *r)
+static void hardwall_protect_rectangle(struct hardwall_info *r)
 {
 	int x, y, cpu, delta;
 	struct cpumask rect_cpus;
@@ -217,37 +314,50 @@ static void hardwall_setup(struct hardwall_info *r)
 	}
 
 	/* Then tell all the cpus to set up their protection SPR */
-	on_each_cpu_mask(&rect_cpus, hardwall_setup_ipi_func, r, 1);
+	on_each_cpu_mask(&rect_cpus, hardwall_setup_func, r, 1);
 }
 
 void __kprobes do_hardwall_trap(struct pt_regs* regs, int fault_num)
 {
 	struct hardwall_info *rect;
+	struct hardwall_type *hwt;
 	struct task_struct *p;
 	struct siginfo info;
-	int x, y;
 	int cpu = smp_processor_id();
 	int found_processes;
 	unsigned long flags;
-
 	struct pt_regs *old_regs = set_irq_regs(regs);
+
 	irq_enter();
 
+	/* Figure out which network trapped. */
+	switch (fault_num) {
+#ifndef __tilepro__
+	case INT_IDN_FIREWALL:
+		hwt = &hardwall_types[HARDWALL_IDN];
+		break;
+#endif
+	case INT_UDN_FIREWALL:
+		hwt = &hardwall_types[HARDWALL_UDN];
+		break;
+	default:
+		BUG();
+	}
+	BUG_ON(hwt->disabled);
+
 	/* This tile trapped a network access; find the rectangle. */
-	x = cpu % smp_width;
-	y = cpu / smp_width;
-	spin_lock_irqsave(&hardwall_lock, flags);
-	list_for_each_entry(rect, &rectangles, list) {
-		if (contains(rect, x, y))
+	spin_lock_irqsave(&hwt->lock, flags);
+	list_for_each_entry(rect, &hwt->list, list) {
+		if (cpumask_test_cpu(cpu, &rect->cpumask))
 			break;
 	}
 
 	/*
 	 * It shouldn't be possible not to find this cpu on the
 	 * rectangle list, since only cpus in rectangles get hardwalled.
-	 * The hardwall is only removed after the UDN is drained.
+	 * The hardwall is only removed after the user network is drained.
 	 */
-	BUG_ON(&rect->list == &rectangles);
+	BUG_ON(&rect->list == &hwt->list);
 
 	/*
 	 * If we already started teardown on this hardwall, don't worry;
@@ -255,30 +365,32 @@ void __kprobes do_hardwall_trap(struct pt_regs* regs, int fault_num)
 	 * to quiesce.
 	 */
 	if (rect->teardown_in_progress) {
-		pr_notice("cpu %d: detected hardwall violation %#lx"
+		pr_notice("cpu %d: detected %s hardwall violation %#lx"
 		       " while teardown already in progress\n",
-		       cpu, (long) __insn_mfspr(SPR_UDN_DIRECTION_PROTECT));
+			  cpu, hwt->name,
+			  (long)mfspr_XDN(hwt, DIRECTION_PROTECT));
 		goto done;
 	}
 
 	/*
 	 * Kill off any process that is activated in this rectangle.
 	 * We bypass security to deliver the signal, since it must be
-	 * one of the activated processes that generated the UDN
+	 * one of the activated processes that generated the user network
 	 * message that caused this trap, and all the activated
 	 * processes shared a single open file so are pretty tightly
 	 * bound together from a security point of view to begin with.
 	 */
 	rect->teardown_in_progress = 1;
 	wmb(); /* Ensure visibility of rectangle before notifying processes. */
-	pr_notice("cpu %d: detected hardwall violation %#lx...\n",
-	       cpu, (long) __insn_mfspr(SPR_UDN_DIRECTION_PROTECT));
+	pr_notice("cpu %d: detected %s hardwall violation %#lx...\n",
+		  cpu, hwt->name, (long)mfspr_XDN(hwt, DIRECTION_PROTECT));
 	info.si_signo = SIGILL;
 	info.si_errno = 0;
 	info.si_code = ILL_HARDWALL;
 	found_processes = 0;
-	list_for_each_entry(p, &rect->task_head, thread.hardwall_list) {
-		BUG_ON(p->thread.hardwall != rect);
+	list_for_each_entry(p, &rect->task_head,
+			    thread.hardwall[hwt->index].list) {
+		BUG_ON(p->thread.hardwall[hwt->index].info != rect);
 		if (!(p->flags & PF_EXITING)) {
 			found_processes = 1;
 			pr_notice("hardwall: killing %d\n", p->pid);
@@ -289,7 +401,7 @@ void __kprobes do_hardwall_trap(struct pt_regs* regs, int fault_num)
 		pr_notice("hardwall: no associated processes!\n");
 
  done:
-	spin_unlock_irqrestore(&hardwall_lock, flags);
+	spin_unlock_irqrestore(&hwt->lock, flags);
 
 	/*
 	 * We have to disable firewall interrupts now, or else when we
@@ -298,48 +410,87 @@ void __kprobes do_hardwall_trap(struct pt_regs* regs, int fault_num)
 	 * haven't yet drained the network, and that would allow packets
 	 * to cross out of the hardwall region.
 	 */
-	disable_firewall_interrupts();
+	disable_firewall_interrupts(hwt);
 
 	irq_exit();
 	set_irq_regs(old_regs);
 }
 
-/* Allow access from user space to the UDN. */
-void grant_network_mpls(void)
+/* Allow access from user space to the user network. */
+void grant_hardwall_mpls(struct hardwall_type *hwt)
 {
-	__insn_mtspr(SPR_MPL_UDN_ACCESS_SET_0, 1);
-	__insn_mtspr(SPR_MPL_UDN_AVAIL_SET_0, 1);
-	__insn_mtspr(SPR_MPL_UDN_COMPLETE_SET_0, 1);
-	__insn_mtspr(SPR_MPL_UDN_TIMER_SET_0, 1);
+#ifndef __tilepro__
+	if (!hwt->is_xdn) {
+		__insn_mtspr(SPR_MPL_IPI_0_SET_0, 1);
+		return;
+	}
+#endif
+	mtspr_MPL_XDN(hwt, ACCESS_SET_0, 1);
+	mtspr_MPL_XDN(hwt, AVAIL_SET_0, 1);
+	mtspr_MPL_XDN(hwt, COMPLETE_SET_0, 1);
+	mtspr_MPL_XDN(hwt, TIMER_SET_0, 1);
 #if !CHIP_HAS_REV1_XDN()
-	__insn_mtspr(SPR_MPL_UDN_REFILL_SET_0, 1);
-	__insn_mtspr(SPR_MPL_UDN_CA_SET_0, 1);
+	mtspr_MPL_XDN(hwt, REFILL_SET_0, 1);
+	mtspr_MPL_XDN(hwt, CA_SET_0, 1);
 #endif
 }
 
-/* Deny access from user space to the UDN. */
-void restrict_network_mpls(void)
+/* Deny access from user space to the user network. */
+void restrict_hardwall_mpls(struct hardwall_type *hwt)
 {
-	__insn_mtspr(SPR_MPL_UDN_ACCESS_SET_1, 1);
-	__insn_mtspr(SPR_MPL_UDN_AVAIL_SET_1, 1);
-	__insn_mtspr(SPR_MPL_UDN_COMPLETE_SET_1, 1);
-	__insn_mtspr(SPR_MPL_UDN_TIMER_SET_1, 1);
+#ifndef __tilepro__
+	if (!hwt->is_xdn) {
+		__insn_mtspr(SPR_MPL_IPI_0_SET_1, 1);
+		return;
+	}
+#endif
+	mtspr_MPL_XDN(hwt, ACCESS_SET_1, 1);
+	mtspr_MPL_XDN(hwt, AVAIL_SET_1, 1);
+	mtspr_MPL_XDN(hwt, COMPLETE_SET_1, 1);
+	mtspr_MPL_XDN(hwt, TIMER_SET_1, 1);
 #if !CHIP_HAS_REV1_XDN()
-	__insn_mtspr(SPR_MPL_UDN_REFILL_SET_1, 1);
-	__insn_mtspr(SPR_MPL_UDN_CA_SET_1, 1);
+	mtspr_MPL_XDN(hwt, REFILL_SET_1, 1);
+	mtspr_MPL_XDN(hwt, CA_SET_1, 1);
 #endif
 }
 
+/* Restrict or deny as necessary for the task we're switching to. */
+void hardwall_switch_tasks(struct task_struct *prev,
+			   struct task_struct *next)
+{
+	int i;
+	for (i = 0; i < HARDWALL_TYPES; ++i) {
+		if (prev->thread.hardwall[i].info != NULL) {
+			if (next->thread.hardwall[i].info == NULL)
+				restrict_hardwall_mpls(&hardwall_types[i]);
+		} else if (next->thread.hardwall[i].info != NULL) {
+			grant_hardwall_mpls(&hardwall_types[i]);
+		}
+	}
+}
+
+/* Does this task have the right to IPI the given cpu? */
+int hardwall_ipi_valid(int cpu)
+{
+#ifdef __tilegx__
+	struct hardwall_info *info =
+		current->thread.hardwall[HARDWALL_IPI].info;
+	return info && cpumask_test_cpu(cpu, &info->cpumask);
+#else
+	return 0;
+#endif
+}
 
 /*
- * Code to create, activate, deactivate, and destroy hardwall rectangles.
+ * Code to create, activate, deactivate, and destroy hardwall resources.
  */
 
-/* Create a hardwall for the given rectangle */
-static struct hardwall_info *hardwall_create(
-	size_t size, const unsigned char __user *bits)
+/* Create a hardwall for the given resource */
+static struct hardwall_info *hardwall_create(struct hardwall_type *hwt,
+					     size_t size,
+					     const unsigned char __user *bits)
 {
-	struct hardwall_info *iter, *rect;
+	struct hardwall_info *iter, *info;
 	struct cpumask mask;
 	unsigned long flags;
 	int rc;
@@ -370,55 +521,62 @@ static struct hardwall_info *hardwall_create(
 		}
 	}
 
-	/* Allocate a new rectangle optimistically. */
-	rect = kmalloc(sizeof(struct hardwall_info),
+	/* Allocate a new hardwall_info optimistically. */
+	info = kmalloc(sizeof(struct hardwall_info),
 			GFP_KERNEL | __GFP_ZERO);
-	if (rect == NULL)
+	if (info == NULL)
 		return ERR_PTR(-ENOMEM);
-	INIT_LIST_HEAD(&rect->task_head);
+	INIT_LIST_HEAD(&info->task_head);
+	info->type = hwt;
 
 	/* Compute the rectangle size and validate that it's plausible. */
-	rc = setup_rectangle(rect, &mask);
-	if (rc != 0) {
-		kfree(rect);
-		return ERR_PTR(rc);
+	cpumask_copy(&info->cpumask, &mask);
+	info->id = find_first_bit(cpumask_bits(&mask), nr_cpumask_bits);
+	if (hwt->is_xdn) {
+		rc = check_rectangle(info, &mask);
+		if (rc != 0) {
+			kfree(info);
+			return ERR_PTR(rc);
+		}
 	}
 
 	/* Confirm it doesn't overlap and add it to the list. */
-	spin_lock_irqsave(&hardwall_lock, flags);
-	list_for_each_entry(iter, &rectangles, list) {
-		if (overlaps(iter, rect)) {
-			spin_unlock_irqrestore(&hardwall_lock, flags);
-			kfree(rect);
+	spin_lock_irqsave(&hwt->lock, flags);
+	list_for_each_entry(iter, &hwt->list, list) {
+		if (cpumask_intersects(&iter->cpumask, &info->cpumask)) {
+			spin_unlock_irqrestore(&hwt->lock, flags);
+			kfree(info);
 			return ERR_PTR(-EBUSY);
 		}
 	}
-	list_add_tail(&rect->list, &rectangles);
-	spin_unlock_irqrestore(&hardwall_lock, flags);
+	list_add_tail(&info->list, &hwt->list);
+	spin_unlock_irqrestore(&hwt->lock, flags);
 
 	/* Set up appropriate hardwalling on all affected cpus. */
-	hardwall_setup(rect);
+	if (hwt->is_xdn)
+		hardwall_protect_rectangle(info);
 
 	/* Create a /proc/tile/hardwall entry. */
-	hardwall_add_proc(rect);
+	hardwall_add_proc(info);
 
-	return rect;
+	return info;
 }
 
 /* Activate a given hardwall on this cpu for this process. */
-static int hardwall_activate(struct hardwall_info *rect)
+static int hardwall_activate(struct hardwall_info *info)
 {
-	int cpu, x, y;
+	int cpu;
 	unsigned long flags;
 	struct task_struct *p = current;
 	struct thread_struct *ts = &p->thread;
+	struct hardwall_type *hwt;
 
-	/* Require a rectangle. */
-	if (rect == NULL)
+	/* Require a hardwall. */
+	if (info == NULL)
 		return -ENODATA;
 
-	/* Not allowed to activate a rectangle that is being torn down. */
-	if (rect->teardown_in_progress)
+	/* Not allowed to activate a hardwall that is being torn down. */
+	if (info->teardown_in_progress)
 		return -EINVAL;
 
 	/*
@@ -428,78 +586,87 @@ static int hardwall_activate(struct hardwall_info *rect)
 	if (cpumask_weight(&p->cpus_allowed) != 1)
 		return -EPERM;
 
-	/* Make sure we are bound to a cpu in this rectangle. */
+	/* Make sure we are bound to a cpu assigned to this resource. */
 	cpu = smp_processor_id();
 	BUG_ON(cpumask_first(&p->cpus_allowed) != cpu);
-	x = cpu_x(cpu);
-	y = cpu_y(cpu);
-	if (!contains(rect, x, y))
+	if (!cpumask_test_cpu(cpu, &info->cpumask))
 		return -EINVAL;
 
 	/* If we are already bound to this hardwall, it's a no-op. */
-	if (ts->hardwall) {
-		BUG_ON(ts->hardwall != rect);
+	hwt = info->type;
+	if (ts->hardwall[hwt->index].info) {
+		BUG_ON(ts->hardwall[hwt->index].info != info);
 		return 0;
 	}
 
-	/* Success!  This process gets to use the user networks on this cpu. */
-	ts->hardwall = rect;
-	spin_lock_irqsave(&hardwall_lock, flags);
-	list_add(&ts->hardwall_list, &rect->task_head);
-	spin_unlock_irqrestore(&hardwall_lock, flags);
-	grant_network_mpls();
-	printk(KERN_DEBUG "Pid %d (%s) activated for hardwall: cpu %d\n",
-	       p->pid, p->comm, cpu);
+	/* Success!  This process gets to use the resource on this cpu. */
+	ts->hardwall[hwt->index].info = info;
+	spin_lock_irqsave(&hwt->lock, flags);
+	list_add(&ts->hardwall[hwt->index].list, &info->task_head);
+	spin_unlock_irqrestore(&hwt->lock, flags);
+	grant_hardwall_mpls(hwt);
+	printk(KERN_DEBUG "Pid %d (%s) activated for %s hardwall: cpu %d\n",
+	       p->pid, p->comm, hwt->name, cpu);
 	return 0;
 }
 
 /*
- * Deactivate a task's hardwall.  Must hold hardwall_lock.
+ * Deactivate a task's hardwall.  Must hold lock for hardwall_type.
  * This method may be called from free_task(), so we don't want to
  * rely on too many fields of struct task_struct still being valid.
  * We assume the cpus_allowed, pid, and comm fields are still valid.
  */
-static void _hardwall_deactivate(struct task_struct *task)
+static void _hardwall_deactivate(struct hardwall_type *hwt,
+				 struct task_struct *task)
 {
 	struct thread_struct *ts = &task->thread;
 
 	if (cpumask_weight(&task->cpus_allowed) != 1) {
-		pr_err("pid %d (%s) releasing networks with"
+		pr_err("pid %d (%s) releasing %s hardwall with"
 		       " an affinity mask containing %d cpus!\n",
-		       task->pid, task->comm,
+		       task->pid, task->comm, hwt->name,
 		       cpumask_weight(&task->cpus_allowed));
 		BUG();
 	}
 
-	BUG_ON(ts->hardwall == NULL);
-	ts->hardwall = NULL;
-	list_del(&ts->hardwall_list);
+	BUG_ON(ts->hardwall[hwt->index].info == NULL);
+	ts->hardwall[hwt->index].info = NULL;
+	list_del(&ts->hardwall[hwt->index].list);
 	if (task == current)
-		restrict_network_mpls();
+		restrict_hardwall_mpls(hwt);
 }
 
 /* Deactivate a task's hardwall. */
-int hardwall_deactivate(struct task_struct *task)
+static int hardwall_deactivate(struct hardwall_type *hwt,
+			       struct task_struct *task)
 {
 	unsigned long flags;
 	int activated;
 
-	spin_lock_irqsave(&hardwall_lock, flags);
-	activated = (task->thread.hardwall != NULL);
+	spin_lock_irqsave(&hwt->lock, flags);
+	activated = (task->thread.hardwall[hwt->index].info != NULL);
 	if (activated)
-		_hardwall_deactivate(task);
-	spin_unlock_irqrestore(&hardwall_lock, flags);
+		_hardwall_deactivate(hwt, task);
+	spin_unlock_irqrestore(&hwt->lock, flags);
 
 	if (!activated)
 		return -EINVAL;
 
-	printk(KERN_DEBUG "Pid %d (%s) deactivated for hardwall: cpu %d\n",
-	       task->pid, task->comm, smp_processor_id());
+	printk(KERN_DEBUG "Pid %d (%s) deactivated for %s hardwall: cpu %d\n",
+	       task->pid, task->comm, hwt->name, smp_processor_id());
 	return 0;
 }
 
-/* Stop a UDN switch before draining the network. */
-static void stop_udn_switch(void *ignored)
+void hardwall_deactivate_all(struct task_struct *task)
+{
+	int i;
+	for (i = 0; i < HARDWALL_TYPES; ++i)
+		if (task->thread.hardwall[i].info)
+			hardwall_deactivate(&hardwall_types[i], task);
+}
+
+/* Stop the switch before draining the network. */
+static void stop_xdn_switch(void *arg)
 {
 #if !CHIP_HAS_REV1_XDN()
 	/* Freeze the switch and the demux. */
@@ -507,13 +674,71 @@ static void stop_udn_switch(void *ignored)
 		     SPR_UDN_SP_FREEZE__SP_FRZ_MASK |
 		     SPR_UDN_SP_FREEZE__DEMUX_FRZ_MASK |
 		     SPR_UDN_SP_FREEZE__NON_DEST_EXT_MASK);
+#else
+	/*
+	 * Drop all packets bound for the core or off the edge.
+	 * We rely on the normal hardwall protection setup code
+	 * to have set the low four bits to trigger firewall interrupts,
+	 * and shift those bits up to trigger "drop on send" semantics,
+	 * plus adding "drop on send to core" for all switches.
+	 * In practice it seems the switches latch the DIRECTION_PROTECT
+	 * SPR so they won't start dropping if they're already
+	 * delivering the last message to the core, but it doesn't
+	 * hurt to enable it here.
+	 */
+	struct hardwall_type *hwt = arg;
+	unsigned long protect = mfspr_XDN(hwt, DIRECTION_PROTECT);
+	mtspr_XDN(hwt, DIRECTION_PROTECT, (protect | C_PROTECT) << 5);
 #endif
 }
 
+static void empty_xdn_demuxes(struct hardwall_type *hwt)
+{
+#ifndef __tilepro__
+	if (hwt->is_idn) {
+		while (__insn_mfspr(SPR_IDN_DATA_AVAIL) & (1 << 0))
+			(void) __tile_idn0_receive();
+		while (__insn_mfspr(SPR_IDN_DATA_AVAIL) & (1 << 1))
+			(void) __tile_idn1_receive();
+		return;
+	}
+#endif
+	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 0))
+		(void) __tile_udn0_receive();
+	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 1))
+		(void) __tile_udn1_receive();
+	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 2))
+		(void) __tile_udn2_receive();
+	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 3))
+		(void) __tile_udn3_receive();
+}
+
 /* Drain all the state from a stopped switch. */
-static void drain_udn_switch(void *ignored)
+static void drain_xdn_switch(void *arg)
 {
-#if !CHIP_HAS_REV1_XDN()
+	struct hardwall_info *info = arg;
+	struct hardwall_type *hwt = info->type;
+
+#if CHIP_HAS_REV1_XDN()
+	/*
+	 * The switches have been configured to drop any messages
+	 * destined for cores (or off the edge of the rectangle).
+	 * But the current message may continue to be delivered,
+	 * so we wait until all the cores have finished any pending
+	 * messages before we stop draining.
+	 */
+	int pending = mfspr_XDN(hwt, PENDING);
+	while (pending--) {
+		empty_xdn_demuxes(hwt);
+		if (hwt->is_idn)
+			__tile_idn_send(0);
+		else
+			__tile_udn_send(0);
+	}
+	atomic_dec(&info->xdn_pending_count);
+	while (atomic_read(&info->xdn_pending_count))
+		empty_xdn_demuxes(hwt);
+#else
 	int i;
 	int from_tile_words, ca_count;
 
@@ -533,15 +758,7 @@ static void drain_udn_switch(void *ignored)
 		(void) __insn_mfspr(SPR_UDN_DEMUX_WRITE_FIFO);
 
 	/* Empty out demuxes. */
-	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 0))
-		(void) __tile_udn0_receive();
-	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 1))
-		(void) __tile_udn1_receive();
-	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 2))
-		(void) __tile_udn2_receive();
-	while (__insn_mfspr(SPR_UDN_DATA_AVAIL) & (1 << 3))
-		(void) __tile_udn3_receive();
-	BUG_ON((__insn_mfspr(SPR_UDN_DATA_AVAIL) & 0xF) != 0);
+	empty_xdn_demuxes(hwt);
 
 	/* Empty out catch all. */
 	ca_count = __insn_mfspr(SPR_UDN_DEMUX_CA_COUNT);
@@ -563,21 +780,25 @@ static void drain_udn_switch(void *ignored)
 #endif
 }
 
-/* Reset random UDN state registers at boot up and during hardwall teardown. */
-void reset_network_state(void)
+/* Reset random XDN state registers at boot up and during hardwall teardown. */
+static void reset_xdn_network_state(struct hardwall_type *hwt)
 {
-#if !CHIP_HAS_REV1_XDN()
-	/* Reset UDN coordinates to their standard value */
-	unsigned int cpu = smp_processor_id();
-	unsigned int x = cpu % smp_width;
-	unsigned int y = cpu / smp_width;
-#endif
-
-	if (udn_disabled)
+	if (hwt->disabled)
 		return;
 
+	/* Clear out other random registers so we have a clean slate. */
+	mtspr_XDN(hwt, DIRECTION_PROTECT, 0);
+	mtspr_XDN(hwt, AVAIL_EN, 0);
+	mtspr_XDN(hwt, DEADLOCK_TIMEOUT, 0);
+
 #if !CHIP_HAS_REV1_XDN()
-	__insn_mtspr(SPR_UDN_TILE_COORD, (x << 18) | (y << 7));
+	/* Reset UDN coordinates to their standard value */
+	{
+		unsigned int cpu = smp_processor_id();
+		unsigned int x = cpu % smp_width;
+		unsigned int y = cpu / smp_width;
+		__insn_mtspr(SPR_UDN_TILE_COORD, (x << 18) | (y << 7));
+	}
 
 	/* Set demux tags to predefined values and enable them. */
 	__insn_mtspr(SPR_UDN_TAG_VALID, 0xf);
@@ -585,56 +806,50 @@ void reset_network_state(void)
 	__insn_mtspr(SPR_UDN_TAG_1, (1 << 1));
 	__insn_mtspr(SPR_UDN_TAG_2, (1 << 2));
 	__insn_mtspr(SPR_UDN_TAG_3, (1 << 3));
-#endif
 
-	/* Clear out other random registers so we have a clean slate. */
-	__insn_mtspr(SPR_UDN_AVAIL_EN, 0);
-	__insn_mtspr(SPR_UDN_DEADLOCK_TIMEOUT, 0);
-#if !CHIP_HAS_REV1_XDN()
+	/* Set other rev0 random registers to a clean state. */
 	__insn_mtspr(SPR_UDN_REFILL_EN, 0);
 	__insn_mtspr(SPR_UDN_DEMUX_QUEUE_SEL, 0);
 	__insn_mtspr(SPR_UDN_SP_FIFO_SEL, 0);
-#endif
 
 	/* Start the switch and demux. */
-#if !CHIP_HAS_REV1_XDN()
 	__insn_mtspr(SPR_UDN_SP_FREEZE, 0);
 #endif
 }
 
-/* Restart a UDN switch after draining. */
-static void restart_udn_switch(void *ignored)
+void reset_network_state(void)
 {
-	reset_network_state();
-
-	/* Disable firewall interrupts. */
-	__insn_mtspr(SPR_UDN_DIRECTION_PROTECT, 0);
-	disable_firewall_interrupts();
+	reset_xdn_network_state(&hardwall_types[HARDWALL_UDN]);
+#ifndef __tilepro__
+	reset_xdn_network_state(&hardwall_types[HARDWALL_IDN]);
+#endif
 }
 
-/* Build a struct cpumask containing all valid tiles in bounding rectangle. */
-static void fill_mask(struct hardwall_info *r, struct cpumask *result)
+/* Restart an XDN switch after draining. */
+static void restart_xdn_switch(void *arg)
 {
-	int x, y, cpu;
+	struct hardwall_type *hwt = arg;
 
-	cpumask_clear(result);
+#if CHIP_HAS_REV1_XDN()
+	/* One last drain step to avoid races with injection and draining. */
+	empty_xdn_demuxes(hwt);
+#endif
 
-	cpu = r->ulhc_y * smp_width + r->ulhc_x;
-	for (y = 0; y < r->height; ++y, cpu += smp_width - r->width) {
-		for (x = 0; x < r->width; ++x, ++cpu)
-			cpu_online_set(cpu, result);
-	}
+	reset_xdn_network_state(hwt);
+
+	/* Disable firewall interrupts. */
+	disable_firewall_interrupts(hwt);
 }
 
 /* Last reference to a hardwall is gone, so clear the network. */
-static void hardwall_destroy(struct hardwall_info *rect)
+static void hardwall_destroy(struct hardwall_info *info)
 {
 	struct task_struct *task;
+	struct hardwall_type *hwt;
 	unsigned long flags;
-	struct cpumask mask;
 
-	/* Make sure this file actually represents a rectangle. */
-	if (rect == NULL)
+	/* Make sure this file actually represents a hardwall. */
+	if (info == NULL)
 		return;
 
 	/*
@@ -644,39 +859,53 @@ static void hardwall_destroy(struct hardwall_info *rect)
 	 * deactivate any remaining tasks before freeing the
 	 * hardwall_info object itself.
 	 */
-	spin_lock_irqsave(&hardwall_lock, flags);
-	list_for_each_entry(task, &rect->task_head, thread.hardwall_list)
-		_hardwall_deactivate(task);
-	spin_unlock_irqrestore(&hardwall_lock, flags);
-
-	/* Drain the UDN. */
-	printk(KERN_DEBUG "Clearing hardwall rectangle %dx%d %d,%d\n",
-	       rect->width, rect->height, rect->ulhc_x, rect->ulhc_y);
-	fill_mask(rect, &mask);
-	on_each_cpu_mask(&mask, stop_udn_switch, NULL, 1);
-	on_each_cpu_mask(&mask, drain_udn_switch, NULL, 1);
+	hwt = info->type;
+	info->teardown_in_progress = 1;
+	spin_lock_irqsave(&hwt->lock, flags);
+	list_for_each_entry(task, &info->task_head,
+			    thread.hardwall[hwt->index].list)
+		_hardwall_deactivate(hwt, task);
+	spin_unlock_irqrestore(&hwt->lock, flags);
+
+	if (hwt->is_xdn) {
+		/* Configure the switches for draining the user network. */
+		printk(KERN_DEBUG
+		       "Clearing %s hardwall rectangle %dx%d %d,%d\n",
+		       hwt->name, info->width, info->height,
+		       info->ulhc_x, info->ulhc_y);
+		on_each_cpu_mask(&info->cpumask, stop_xdn_switch, hwt, 1);
+
+		/* Drain the network. */
+#if CHIP_HAS_REV1_XDN()
+		atomic_set(&info->xdn_pending_count,
+			   cpumask_weight(&info->cpumask));
+		on_each_cpu_mask(&info->cpumask, drain_xdn_switch, info, 0);
+#else
+		on_each_cpu_mask(&info->cpumask, drain_xdn_switch, info, 1);
+#endif
 
-	/* Restart switch and disable firewall. */
-	on_each_cpu_mask(&mask, restart_udn_switch, NULL, 1);
+		/* Restart switch and disable firewall. */
+		on_each_cpu_mask(&info->cpumask, restart_xdn_switch, hwt, 1);
+	}
 
 	/* Remove the /proc/tile/hardwall entry. */
-	hardwall_remove_proc(rect);
-
-	/* Now free the rectangle from the list. */
-	spin_lock_irqsave(&hardwall_lock, flags);
-	BUG_ON(!list_empty(&rect->task_head));
-	list_del(&rect->list);
-	spin_unlock_irqrestore(&hardwall_lock, flags);
-	kfree(rect);
+	hardwall_remove_proc(info);
+
+	/* Now free the hardwall from the list. */
+	spin_lock_irqsave(&hwt->lock, flags);
+	BUG_ON(!list_empty(&info->task_head));
+	list_del(&info->list);
+	spin_unlock_irqrestore(&hwt->lock, flags);
+	kfree(info);
 }
 
 
 static int hardwall_proc_show(struct seq_file *sf, void *v)
 {
-	struct hardwall_info *rect = sf->private;
+	struct hardwall_info *info = sf->private;
 	char buf[256];
 
-	int rc = cpulist_scnprintf(buf, sizeof(buf), &rect->cpumask);
+	int rc = cpulist_scnprintf(buf, sizeof(buf), &info->cpumask);
 	buf[rc++] = '\n';
 	seq_write(sf, buf, rc);
 	return 0;
@@ -695,31 +924,45 @@ static const struct file_operations hardwall_proc_fops = {
 	.release	= single_release,
 };
 
-static void hardwall_add_proc(struct hardwall_info *rect)
+static void hardwall_add_proc(struct hardwall_info *info)
 {
 	char buf[64];
-	snprintf(buf, sizeof(buf), "%d", rect->id);
-	proc_create_data(buf, 0444, hardwall_proc_dir,
-			 &hardwall_proc_fops, rect);
+	snprintf(buf, sizeof(buf), "%d", info->id);
+	proc_create_data(buf, 0444, info->type->proc_dir,
+			 &hardwall_proc_fops, info);
 }
 
-static void hardwall_remove_proc(struct hardwall_info *rect)
+static void hardwall_remove_proc(struct hardwall_info *info)
 {
 	char buf[64];
-	snprintf(buf, sizeof(buf), "%d", rect->id);
-	remove_proc_entry(buf, hardwall_proc_dir);
+	snprintf(buf, sizeof(buf), "%d", info->id);
+	remove_proc_entry(buf, info->type->proc_dir);
 }
 
 int proc_pid_hardwall(struct task_struct *task, char *buffer)
 {
-	struct hardwall_info *rect = task->thread.hardwall;
-	return rect ? sprintf(buffer, "%d\n", rect->id) : 0;
+	int i;
+	int n = 0;
+	for (i = 0; i < HARDWALL_TYPES; ++i) {
+		struct hardwall_info *info = task->thread.hardwall[i].info;
+		if (info)
+			n += sprintf(&buffer[n], "%s: %d\n",
+				     info->type->name, info->id);
+	}
+	return n;
 }
 
 void proc_tile_hardwall_init(struct proc_dir_entry *root)
 {
-	if (!udn_disabled)
-		hardwall_proc_dir = proc_mkdir("hardwall", root);
+	int i;
+	for (i = 0; i < HARDWALL_TYPES; ++i) {
+		struct hardwall_type *hwt = &hardwall_types[i];
+		if (hwt->disabled)
+			continue;
+		if (hardwall_proc_dir == NULL)
+			hardwall_proc_dir = proc_mkdir("hardwall", root);
+		hwt->proc_dir = proc_mkdir(hwt->name, hardwall_proc_dir);
+	}
 }
 
 
@@ -729,34 +972,45 @@ void proc_tile_hardwall_init(struct proc_dir_entry *root)
 
 static long hardwall_ioctl(struct file *file, unsigned int a, unsigned long b)
 {
-	struct hardwall_info *rect = file->private_data;
+	struct hardwall_info *info = file->private_data;
+	int minor = iminor(file->f_mapping->host);
+	struct hardwall_type* hwt;
 
 	if (_IOC_TYPE(a) != HARDWALL_IOCTL_BASE)
 		return -EINVAL;
 
+	BUILD_BUG_ON(HARDWALL_TYPES != _HARDWALL_TYPES);
+	BUILD_BUG_ON(HARDWALL_TYPES !=
+		     sizeof(hardwall_types)/sizeof(hardwall_types[0]));
+
+	if (minor < 0 || minor >= HARDWALL_TYPES)
+		return -EINVAL;
+	hwt = &hardwall_types[minor];
+	WARN_ON(info && hwt != info->type);
+
 	switch (_IOC_NR(a)) {
 	case _HARDWALL_CREATE:
-		if (udn_disabled)
+		if (hwt->disabled)
 			return -ENOSYS;
-		if (rect != NULL)
+		if (info != NULL)
 			return -EALREADY;
-		rect = hardwall_create(_IOC_SIZE(a),
-					(const unsigned char __user *)b);
-		if (IS_ERR(rect))
-			return PTR_ERR(rect);
-		file->private_data = rect;
+		info = hardwall_create(hwt, _IOC_SIZE(a),
+				       (const unsigned char __user *)b);
+		if (IS_ERR(info))
+			return PTR_ERR(info);
+		file->private_data = info;
 		return 0;
 
 	case _HARDWALL_ACTIVATE:
-		return hardwall_activate(rect);
+		return hardwall_activate(info);
 
 	case _HARDWALL_DEACTIVATE:
-		if (current->thread.hardwall != rect)
+		if (current->thread.hardwall[hwt->index].info != info)
 			return -EINVAL;
-		return hardwall_deactivate(current);
+		return hardwall_deactivate(hwt, current);
 
 	case _HARDWALL_GET_ID:
-		return rect ? rect->id : -EINVAL;
+		return info ? info->id : -EINVAL;
 
 	default:
 		return -EINVAL;
@@ -775,26 +1029,28 @@ static long hardwall_compat_ioctl(struct file *file,
 /* The user process closed the file; revoke access to user networks. */
 static int hardwall_flush(struct file *file, fl_owner_t owner)
 {
-	struct hardwall_info *rect = file->private_data;
+	struct hardwall_info *info = file->private_data;
 	struct task_struct *task, *tmp;
 	unsigned long flags;
 
-	if (rect) {
+	if (info) {
 		/*
 		 * NOTE: if multiple threads are activated on this hardwall
 		 * file, the other threads will continue having access to the
-		 * UDN until they are context-switched out and back in again.
+		 * user network until they are context-switched out and back
+		 * in again.
 		 *
 		 * NOTE: A NULL files pointer means the task is being torn
 		 * down, so in that case we also deactivate it.
 		 */
-		spin_lock_irqsave(&hardwall_lock, flags);
-		list_for_each_entry_safe(task, tmp, &rect->task_head,
-					 thread.hardwall_list) {
+		struct hardwall_type *hwt = info->type;
+		spin_lock_irqsave(&hwt->lock, flags);
+		list_for_each_entry_safe(task, tmp, &info->task_head,
+					 thread.hardwall[hwt->index].list) {
 			if (task->files == owner || task->files == NULL)
-				_hardwall_deactivate(task);
+				_hardwall_deactivate(hwt, task);
 		}
-		spin_unlock_irqrestore(&hardwall_lock, flags);
+		spin_unlock_irqrestore(&hwt->lock, flags);
 	}
 
 	return 0;
@@ -824,11 +1080,11 @@ static int __init dev_hardwall_init(void)
 	int rc;
 	dev_t dev;
 
-	rc = alloc_chrdev_region(&dev, 0, 1, "hardwall");
+	rc = alloc_chrdev_region(&dev, 0, HARDWALL_TYPES, "hardwall");
 	if (rc < 0)
 		return rc;
 	cdev_init(&hardwall_dev, &dev_hardwall_fops);
-	rc = cdev_add(&hardwall_dev, dev, 1);
+	rc = cdev_add(&hardwall_dev, dev, HARDWALL_TYPES);
 	if (rc < 0)
 		return rc;
 
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index cbf7334..7fa656a 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1226,7 +1226,7 @@ STD_ENTRY(fill_ra_stack)
 	int_hand     INT_UNALIGN_DATA, UNALIGN_DATA, int_unalign
 	int_hand     INT_DTLB_MISS, DTLB_MISS, do_page_fault
 	int_hand     INT_DTLB_ACCESS, DTLB_ACCESS, do_page_fault
-	int_hand     INT_IDN_FIREWALL, IDN_FIREWALL, bad_intr
+	int_hand     INT_IDN_FIREWALL, IDN_FIREWALL, do_hardwall_trap
 	int_hand     INT_UDN_FIREWALL, UDN_FIREWALL, do_hardwall_trap
 	int_hand     INT_TILE_TIMER, TILE_TIMER, do_timer_interrupt
 	int_hand     INT_IDN_TIMER, IDN_TIMER, bad_intr
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 3be7eb5..0caf686 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -145,10 +145,10 @@ void free_thread_info(struct thread_info *info)
 	 * Calling deactivate here just frees up the data structures.
 	 * If the task we're freeing held the last reference to a
 	 * hardwall fd, it would have been released prior to this point
-	 * anyway via exit_files(), and "hardwall" would be NULL by now.
+	 * anyway via exit_files(), and the hardwall_task.info pointers
+	 * would be NULL by now.
 	 */
-	if (info->task->thread.hardwall)
-		hardwall_deactivate(info->task);
+	hardwall_deactivate_all(info->task);
 #endif
 
 	if (step_state) {
@@ -264,7 +264,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 
 #ifdef CONFIG_HARDWALL
 	/* New thread does not own any networks. */
-	p->thread.hardwall = NULL;
+	memset(&p->thread.hardwall[0], 0,
+	       sizeof(struct hardwall_task) * HARDWALL_TYPES);
 #endif
 
 
@@ -534,12 +535,7 @@ struct task_struct *__sched _switch_to(struct task_struct *prev,
 
 #ifdef CONFIG_HARDWALL
 	/* Enable or disable access to the network registers appropriately. */
-	if (prev->thread.hardwall != NULL) {
-		if (next->thread.hardwall == NULL)
-			restrict_network_mpls();
-	} else if (next->thread.hardwall != NULL) {
-		grant_network_mpls();
-	}
+	hardwall_switch_tasks(prev, next);
 #endif
 
 	/*
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: allow querying cpu module information from the hypervisor
       [not found] <4F761E1C.80808.com>
                   ` (38 preceding siblings ...)
  2012-03-30 20:01 ` [PATCH] arch/tile: fix hardwall for tilegx and generalize for idn and ipi Chris Metcalf
@ 2012-03-30 20:21 ` Chris Metcalf
  2012-03-30 20:24 ` [PATCH] arch/tile: return SIGBUS for addresses that are unaligned AND invalid Chris Metcalf
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 20:21 UTC (permalink / raw)
  To: Chris Metcalf, Lucas De Marchi, Kay Sievers, Greg Kroah-Hartman,
	Arnd Bergmann, linux-kernel

This just adds a few more attributes to the information Linux
can query from the hypervisor for the /sys/hypervisor/board/ directory,
providing part, serial#, revision#, and description for cpu modules
(as opposed to the board itself, or any mezzanine boards).
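
Once applied, the new values read like any other sysfs attribute; a
minimal user-space sketch follows (illustrative only -- the file name
is derived from the HV_CONF_ATTR() declarations in this patch, and
none of the surrounding code is part of it):

	#include <stdio.h>

	int main(void)
	{
		char buf[128];
		FILE *f = fopen("/sys/hypervisor/board/cpumod_part", "r");

		if (f != NULL) {
			if (fgets(buf, sizeof(buf), f) != NULL)
				printf("cpu module part number: %s", buf);
			fclose(f);
		}
		return 0;
	}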

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/hv/hypervisor.h |   14 +++++++++++++-
 arch/tile/kernel/sysfs.c          |    8 ++++++++
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/hv/hypervisor.h b/arch/tile/include/hv/hypervisor.h
index 3bc4045..5ae7faa 100644
--- a/arch/tile/include/hv/hypervisor.h
+++ b/arch/tile/include/hv/hypervisor.h
@@ -508,7 +508,19 @@ typedef enum {
   HV_CONFSTR_SWITCH_CONTROL  = 14,
 
   /** Chip revision level. */
-  HV_CONFSTR_CHIP_REV        = 15
+  HV_CONFSTR_CHIP_REV        = 15,
+
+  /** CPU module part number. */
+  HV_CONFSTR_CPUMOD_PART_NUM = 16,
+
+  /** CPU module serial number. */
+  HV_CONFSTR_CPUMOD_SERIAL_NUM = 17,
+
+  /** CPU module revision level. */
+  HV_CONFSTR_CPUMOD_REV      = 18,
+
+  /** Human-readable CPU module description. */
+  HV_CONFSTR_CPUMOD_DESC     = 19
 
 } HV_ConfstrQuery;
 
diff --git a/arch/tile/kernel/sysfs.c b/arch/tile/kernel/sysfs.c
index 71ae728..e25b0a8 100644
--- a/arch/tile/kernel/sysfs.c
+++ b/arch/tile/kernel/sysfs.c
@@ -93,6 +93,10 @@ HV_CONF_ATTR(mezz_part,		HV_CONFSTR_MEZZ_PART_NUM)
 HV_CONF_ATTR(mezz_serial,	HV_CONFSTR_MEZZ_SERIAL_NUM)
 HV_CONF_ATTR(mezz_revision,	HV_CONFSTR_MEZZ_REV)
 HV_CONF_ATTR(mezz_description,	HV_CONFSTR_MEZZ_DESC)
+HV_CONF_ATTR(cpumod_part,	HV_CONFSTR_CPUMOD_PART_NUM)
+HV_CONF_ATTR(cpumod_serial,	HV_CONFSTR_CPUMOD_SERIAL_NUM)
+HV_CONF_ATTR(cpumod_revision,	HV_CONFSTR_CPUMOD_REV)
+HV_CONF_ATTR(cpumod_description,HV_CONFSTR_CPUMOD_DESC)
 HV_CONF_ATTR(switch_control,	HV_CONFSTR_SWITCH_CONTROL)
 
 static struct attribute *board_attrs[] = {
@@ -104,6 +108,10 @@ static struct attribute *board_attrs[] = {
 	&dev_attr_mezz_serial.attr,
 	&dev_attr_mezz_revision.attr,
 	&dev_attr_mezz_description.attr,
+	&dev_attr_cpumod_part.attr,
+	&dev_attr_cpumod_serial.attr,
+	&dev_attr_cpumod_revision.attr,
+	&dev_attr_cpumod_description.attr,
 	&dev_attr_switch_control.attr,
 	NULL
 };
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: return SIGBUS for addresses that are unaligned AND invalid
       [not found] <4F761E1C.80808.com>
                   ` (39 preceding siblings ...)
  2012-03-30 20:21 ` [PATCH] arch/tile: allow querying cpu module information from the hypervisor Chris Metcalf
@ 2012-03-30 20:24 ` Chris Metcalf
  2012-03-30 20:27 ` [PATCH] arch/tile: remove bogus performance optimization Chris Metcalf
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 20:24 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

Previously we were returning SIGSEGV in this case.  It seems cleaner
to return SIGBUS since the hardware figures out alignment traps
before TLB violations, so SIGBUS is the "more correct" signal.
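
When unaligned fixups are disabled (the unaligned_fixup == 0 case in
the diff below), the new behavior can be observed from user space with
a sketch like this (illustrative only, not part of the patch):

	#include <signal.h>
	#include <unistd.h>

	static void handler(int sig)
	{
		/* With this change we expect SIGBUS, not SIGSEGV. */
		write(1, "SIGBUS\n", 7);
		_exit(0);
	}

	int main(void)
	{
		signal(SIGBUS, handler);
		/* Address 0x1 is both unaligned and unmapped. */
		return *(volatile int *)0x1;
	}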

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/single_step.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/tile/kernel/single_step.c b/arch/tile/kernel/single_step.c
index b231ef4..36ccd37 100644
--- a/arch/tile/kernel/single_step.c
+++ b/arch/tile/kernel/single_step.c
@@ -152,6 +152,25 @@ static tile_bundle_bits rewrite_load_store_unaligned(
 	if (((unsigned long)addr % size) == 0)
 		return bundle;
 
+	/*
+	 * Return SIGBUS with the unaligned address, if requested.
+	 * Note that we return SIGBUS even for completely invalid addresses
+	 * as long as they are in fact unaligned; this matches what the
+	 * tilepro hardware would be doing, if it could provide us with the
+	 * actual bad address in an SPR, which it doesn't.
+	 */
+	if (unaligned_fixup == 0) {
+		siginfo_t info = {
+			.si_signo = SIGBUS,
+			.si_code = BUS_ADRALN,
+			.si_addr = addr
+		};
+		trace_unhandled_signal("unaligned trap", regs,
+				       (unsigned long)addr, SIGBUS);
+		force_sig_info(info.si_signo, &info, current);
+		return (tilepro_bundle_bits) 0;
+	}
+
 	/* Handle unaligned load/store */
 	if (mem_op == MEMOP_LOAD || mem_op == MEMOP_LOAD_POSTINCR) {
 		unsigned short val_16;
@@ -199,18 +218,6 @@ static tile_bundle_bits rewrite_load_store_unaligned(
 		return (tile_bundle_bits) 0;
 	}
 
-	if (unaligned_fixup == 0) {
-		siginfo_t info = {
-			.si_signo = SIGBUS,
-			.si_code = BUS_ADRALN,
-			.si_addr = addr
-		};
-		trace_unhandled_signal("unaligned trap", regs,
-				       (unsigned long)addr, SIGBUS);
-		force_sig_info(info.si_signo, &info, current);
-		return (tile_bundle_bits) 0;
-	}
-
 	if (unaligned_printk || unaligned_fixup_count == 0) {
 		pr_info("Process %d/%s: PC %#lx: Fixup of"
 			" unaligned %s at %#lx.\n",
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: remove bogus performance optimization
       [not found] <4F761E1C.80808.com>
                   ` (40 preceding siblings ...)
  2012-03-30 20:24 ` [PATCH] arch/tile: return SIGBUS for addresses that are unaligned AND invalid Chris Metcalf
@ 2012-03-30 20:27 ` Chris Metcalf
  2012-03-30 20:29 ` [PATCH] arch/tile: avoid accidentally unmasking NMI-type interrupt Chris Metcalf
  2012-03-30 20:31 ` [PATCH] arch/tile: add descriptive text if the kernel reports a bad trap Chris Metcalf
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 20:27 UTC (permalink / raw)
  To: Chris Metcalf, Andrew Morton, Julia Lawall, Peter Zijlstra, linux-kernel

We were re-homing the initial task's kernel stack on the boot cpu,
but in fact it's better to let it stay globally homed, since that
task isn't bound to the boot cpu anyway.  This is more of a general
cleanup than an actual performance optimization, but it removes
code, which is a good thing. :-)

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/init.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 5276f05..0138e8c 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -250,11 +250,6 @@ static pgprot_t __init init_pgprot(ulong address)
 		return construct_pgprot(PAGE_KERNEL_RO, PAGE_HOME_IMMUTABLE);
 	}
 
-	/* As a performance optimization, keep the boot init stack here. */
-	if (address >= (ulong)&init_thread_union &&
-	    address < (ulong)&init_thread_union + THREAD_SIZE)
-		return construct_pgprot(PAGE_KERNEL, smp_processor_id());
-
 #ifndef __tilegx__
 #if !ATOMIC_LOCKS_FOUND_VIA_TABLE()
 	/* Force the atomic_locks[] array page to be hash-for-home. */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: avoid accidentally unmasking NMI-type interrupt
       [not found] <4F761E1C.80808.com>
                   ` (41 preceding siblings ...)
  2012-03-30 20:27 ` [PATCH] arch/tile: remove bogus performance optimization Chris Metcalf
@ 2012-03-30 20:29 ` Chris Metcalf
  2012-03-30 20:31 ` [PATCH] arch/tile: add descriptive text if the kernel reports a bad trap Chris Metcalf
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 20:29 UTC (permalink / raw)
  To: Chris Metcalf, Andrew Morton, Eric Dumazet, Mike Frysinger,
	Arun Sharma, Dmitry Torokhov, linux-kernel

The return path as we reload registers and core state requires that r30
hold a boolean indicating whether we are returning from an NMI, but in a
couple of cases we weren't setting this properly, with the result that we
could accidentally unmask the NMI interrupt(s), which could cause confusion.
Now we set r30 in every place where we jump into the interrupt return path.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/intvec_32.S |   24 ++++++++++++++++++++----
 arch/tile/kernel/intvec_64.S |   19 ++++++++++++++++---
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index aecc8ed..5d56a1e 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -799,6 +799,10 @@ handle_interrupt:
  * This routine takes a boolean in r30 indicating if this is an NMI.
  * If so, we also expect a boolean in r31 indicating whether to
  * re-enable the oprofile interrupts.
+ *
+ * Note that .Lresume_userspace is jumped to directly in several
+ * places, and we need to make sure r30 is set correctly in those
+ * callers as well.
  */
 STD_ENTRY(interrupt_return)
 	/* If we're resuming to kernel space, don't check thread flags. */
@@ -1237,7 +1241,10 @@ handle_syscall:
 	bzt     r30, 1f
 	jal	do_syscall_trace
 	FEEDBACK_REENTER(handle_syscall)
-1:	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+1:	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 
 .Linvalid_syscall:
 	/* Report an invalid syscall back to the user program */
@@ -1246,7 +1253,10 @@ handle_syscall:
 	 movei  r28, -ENOSYS
 	}
 	sw      r29, r28
-	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 	STD_ENDPROC(handle_syscall)
 
 	/* Return the address for oprofile to suppress in backtraces. */
@@ -1262,7 +1272,10 @@ STD_ENTRY(ret_from_fork)
 	jal     sim_notify_fork
 	jal     schedule_tail
 	FEEDBACK_REENTER(ret_from_fork)
-	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 	STD_ENDPROC(ret_from_fork)
 
 	/*
@@ -1376,7 +1389,10 @@ handle_ill:
 
 	jal     send_sigtrap    /* issue a SIGTRAP */
 	FEEDBACK_REENTER(handle_ill)
-	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 
 .Ldispatch_normal_ill:
 	{
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 7fa656a..8b5daed 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -614,6 +614,10 @@ handle_interrupt:
  * This routine takes a boolean in r30 indicating if this is an NMI.
  * If so, we also expect a boolean in r31 indicating whether to
  * re-enable the oprofile interrupts.
+ *
+ * Note that .Lresume_userspace is jumped to directly in several
+ * places, and we need to make sure r30 is set correctly in those
+ * callers as well.
  */
 STD_ENTRY(interrupt_return)
 	/* If we're resuming to kernel space, don't check thread flags. */
@@ -1066,7 +1070,10 @@ handle_syscall:
 	}
 	FEEDBACK_REENTER(handle_syscall)
 
-2:	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+2:	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 
 .Lcompat_syscall:
 	/*
@@ -1100,7 +1107,10 @@ handle_syscall:
 	 movei  r28, -ENOSYS
 	}
 	st      r29, r28
-	j       .Lresume_userspace   /* jump into middle of interrupt_return */
+	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 	STD_ENDPROC(handle_syscall)
 
 	/* Return the address for oprofile to suppress in backtraces. */
@@ -1116,7 +1126,10 @@ STD_ENTRY(ret_from_fork)
 	jal     sim_notify_fork
 	jal     schedule_tail
 	FEEDBACK_REENTER(ret_from_fork)
-	j       .Lresume_userspace
+	{
+	 movei  r30, 0               /* not an NMI */
+	 j      .Lresume_userspace   /* jump into middle of interrupt_return */
+	}
 	STD_ENDPROC(ret_from_fork)
 
 /* Various stub interrupt handlers and syscall handlers */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] arch/tile: add descriptive text if the kernel reports a bad trap
       [not found] <4F761E1C.80808.com>
                   ` (42 preceding siblings ...)
  2012-03-30 20:29 ` [PATCH] arch/tile: avoid accidentally unmasking NMI-type interrupt Chris Metcalf
@ 2012-03-30 20:31 ` Chris Metcalf
  43 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2012-03-30 20:31 UTC (permalink / raw)
  To: Chris Metcalf, linux-kernel

If the kernel unexpectedly takes a bad trap, it's convenient to
have it report the type of trap as part of the error.  This gives
customers a bit more context before they call up customer support.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/traps.c |   30 ++++++++++++++++++++++++++++--
 1 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 4c33057..871d497 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -194,6 +194,25 @@ static int special_ill(bundle_bits bundle, int *sigp, int *codep)
 	return 1;
 }
 
+static const char *const int_name[] = {
+	[INT_MEM_ERROR] = "Memory error",
+	[INT_ILL] = "Illegal instruction",
+	[INT_GPV] = "General protection violation",
+	[INT_UDN_ACCESS] = "UDN access",
+	[INT_IDN_ACCESS] = "IDN access",
+#if CHIP_HAS_SN()
+	[INT_SN_ACCESS] = "SN access",
+#endif
+	[INT_SWINT_3] = "Software interrupt 3",
+	[INT_SWINT_2] = "Software interrupt 2",
+	[INT_SWINT_0] = "Software interrupt 0",
+	[INT_UNALIGN_DATA] = "Unaligned data",
+	[INT_DOUBLE_FAULT] = "Double fault",
+#ifdef __tilegx__
+	[INT_ILL_TRANS] = "Illegal virtual address",
+#endif
+};
+
 void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 		       unsigned long reason)
 {
@@ -210,10 +229,17 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,
 	 * current process and hope for the best.
 	 */
 	if (!user_mode(regs)) {
+		const char *name;
 		if (fixup_exception(regs))  /* only UNALIGN_DATA in practice */
 			return;
-		pr_alert("Kernel took bad trap %d at PC %#lx\n",
-		       fault_num, regs->pc);
+		if (fault_num >= 0 &&
+		    fault_num < sizeof(int_name)/sizeof(int_name[0]) &&
+		    int_name[fault_num] != NULL)
+			name = int_name[fault_num];
+		else
+			name = "Unknown interrupt";
+		pr_alert("Kernel took bad trap %d (%s) at PC %#lx\n",
+			 fault_num, name, regs->pc);
 		if (fault_num == INT_GPV)
 			pr_alert("GPV_REASON is %#lx\n", reason);
 		show_regs(regs);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] arch/tile: revert comment for atomic64_add_unless().
  2012-03-27 18:10 ` [PATCH] arch/tile: revert comment for atomic64_add_unless() Chris Metcalf
@ 2012-03-30 21:19   ` Arun Sharma
  0 siblings, 0 replies; 45+ messages in thread
From: Arun Sharma @ 2012-03-30 21:19 UTC (permalink / raw)
  To: Chris Metcalf; +Cc: Andrew Morton, Eric Dumazet, Mike Frysinger, linux-kernel

On 3/27/12 11:10 AM, Chris Metcalf wrote:
> --- a/arch/tile/include/asm/atomic_32.h
> +++ b/arch/tile/include/asm/atomic_32.h
> @@ -199,7 +199,7 @@ static inline u64 atomic64_add_return(u64 i, atomic64_t *v)
>    * @u: ...unless v is equal to u.
>    *
>    * Atomically adds @a to @v, so long as @v was not already @u.
> - * Returns the old value of @v.
> + * Returns non-zero if @v was not @u, and zero otherwise.
>    */
>   static inline u64 atomic64_add_unless(atomic64_t *v, u64 a, u64 u)
>   {

Acked-by: Arun Sharma <asharma@fb.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread
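
A minimal usage sketch of the semantics in the corrected comment
(illustrative only; get_ref() is a hypothetical helper): the non-zero
return value is what makes the usual "increment unless already zero"
refcounting idiom work.

	#include <linux/atomic.h>

	/* Take a reference only if the count has not already hit zero. */
	static inline int get_ref(atomic64_t *refcnt)
	{
		/* Non-zero: the add happened, i.e. the count was not 0. */
		return atomic64_add_unless(refcnt, 1, 0);
	}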

end of thread, other threads:[~2012-03-30 23:06 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4F761E1C.80808.com>
2012-02-15  4:58 ` [PATCH] arch/tile: remove references to cpu_*_map Rusty Russell
2012-03-27 17:47 ` [PATCH] arch/tile/Kconfig: remove pointless "!M386" test Chris Metcalf
2012-03-27 17:53 ` [PATCH] arch/tile/Kconfig: rename tile_defconfig to tilepro_defconfig Chris Metcalf
2012-03-27 17:56 ` [PATCH] arch/tile/Kconfig: don't specify CONFIG_PAGE_OFFSET for 64-bit builds Chris Metcalf
2012-03-27 18:04 ` [PATCH] arch/tile: fix typo in <arch/spr_def.h> Chris Metcalf
2012-03-27 18:10 ` [PATCH] arch/tile: revert comment for atomic64_add_unless() Chris Metcalf
2012-03-30 21:19   ` Arun Sharma
2012-03-27 18:17 ` [PATCH] arch/tile: fix gcc 4.6 warnings in <asm/bitops_64.h> Chris Metcalf
2012-03-27 19:21 ` [PATCH] arch/tile: use 0 for IRQ_RESCHEDULE instead of 1 Chris Metcalf
2012-03-27 19:40 ` [PATCH] arch/tile: use interrupt critical sections less Chris Metcalf
2012-03-29 17:30 ` [PATCH] arch/tile: support building big-endian kernel Chris Metcalf
2012-03-29 17:39 ` [PATCH] arch/tile: optimize get_user/put_user and friends Chris Metcalf
2012-03-29 17:58 ` [PATCH] arch/tile: Allow tilegx to build with either 16K or 64K page size Chris Metcalf
2012-03-29 18:02 ` [PATCH] arch/tile: avoid false corrupt frame warning in early boot Chris Metcalf
2012-03-29 18:05 ` [PATCH] arch/tile: make sure to build memcpy_user_64 without frame pointer Chris Metcalf
2012-03-29 18:06 ` [PATCH] arch/tile: various improvements to stack backtracer Chris Metcalf
2012-03-29 18:52 ` [PATCH] arch/tile: work around a hardware issue with the return-address stack Chris Metcalf
2012-03-29 19:23 ` [PATCH] arch/tile: improve trap handling a bit Chris Metcalf
2012-03-29 19:25 ` [PATCH] arch/tile: support <asm/cachectl.h> header for cacheflush() syscall Chris Metcalf
2012-03-29 19:29 ` [PATCH] arch/tile: fix a couple of comments that needed updating Chris Metcalf
2012-03-29 19:30 ` [PATCH] arch/tile/Makefile: use KCFLAGS when figuring out the libgcc path Chris Metcalf
2012-03-29 19:34 ` [PATCH] arch/tile: don't wait for migrating PTEs in an NMI handler Chris Metcalf
2012-03-29 19:36 ` [PATCH] arch/tile: don't set the homecache of a PTE unless appropriate Chris Metcalf
2012-03-29 19:40 ` [PATCH] arch/tile: don't enable irqs unconditionally in page fault handler Chris Metcalf
2012-03-29 19:42 ` [PATCH] arch/tile: support loading kernels larger than 16 MB Chris Metcalf
2012-03-29 19:43 ` [PATCH] arch/tile: fix bug in delay_backoff() Chris Metcalf
2012-03-29 19:44 ` [PATCH] arch/tile: don't leak kernel memory when we unload modules Chris Metcalf
2012-03-29 19:48 ` [PATCH] arch/tile: support kexec() for tilegx Chris Metcalf
2012-03-29 19:50 ` [PATCH] arch/tile: fix up locking in pgtable.c slightly Chris Metcalf
2012-03-29 19:56 ` [PATCH] arch/tile: use memparse() for "maxmem" and "maxnodemem" options Chris Metcalf
2012-03-29 19:57 ` [PATCH] arch/tile: add "nop" after "nap" to help GX idle power draw Chris Metcalf
2012-03-29 19:59 ` [PATCH] arch/tile: implement panic_smp_self_stop() Chris Metcalf
2012-03-29 20:11 ` [PATCH] arch/tile: fix single-stepping over swint1 instructions on tilegx Chris Metcalf
2012-03-29 20:14 ` [PATCH] arch/tile: fix pointer cast in cacheflush.c Chris Metcalf
2012-03-29 20:19 ` [PATCH] arch/tile: export the page_home() function Chris Metcalf
2012-03-30 19:29 ` [PATCH] arch/tile: stop mentioning the "kvm" subdirectory Chris Metcalf
2012-03-30 19:46 ` [PATCH] arch/tile: use atomic exchange in arch_write_unlock() Chris Metcalf
2012-03-30 19:47 ` [PATCH] arch/tile: fix finv_buffer_remote() for tilegx Chris Metcalf
2012-03-30 19:55 ` [PATCH] arch/tile: fix a reference to cpu_possible_map in a comment Chris Metcalf
2012-03-30 20:01 ` [PATCH] arch/tile: fix hardwall for tilegx and generalize for idn and ipi Chris Metcalf
2012-03-30 20:21 ` [PATCH] arch/tile: allow querying cpu module information from the hypervisor Chris Metcalf
2012-03-30 20:24 ` [PATCH] arch/tile: return SIGBUS for addresses that are unaligned AND invalid Chris Metcalf
2012-03-30 20:27 ` [PATCH] arch/tile: remove bogus performance optimization Chris Metcalf
2012-03-30 20:29 ` [PATCH] arch/tile: avoid accidentally unmasking NMI-type interrupt Chris Metcalf
2012-03-30 20:31 ` [PATCH] arch/tile: add descriptive text if the kernel reports a bad trap Chris Metcalf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).