* arch/tile: various fixes for 2.6.39
@ 2011-02-27 23:52 Chris Metcalf
  2011-02-27 23:52 ` [PATCH] arch/tile: catch up with section naming convention in 2.6.35 Chris Metcalf
                   ` (16 more replies)
  0 siblings, 17 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-27 23:52 UTC (permalink / raw)
  To: linux-kernel

The following emails (follow-ups to this one) are all more or less
standalone changes to the arch/tile code, intended for 2.6.39.  Feel free
to skip over the series if you're not interested in Tilera internals.

I've broken out the patches that follow up on LKML discussions, or
that have a wider distribution than just LKML (e.g. netdev); they are
not part of this series.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


* [PATCH] arch/tile: catch up with section naming convention in 2.6.35
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
@ 2011-02-27 23:52 ` Chris Metcalf
  2011-03-01 20:52   ` Sam Ravnborg
  2011-02-28 18:08 ` [PATCH] arch/tile: bug fix: exec'ed task thought it was still single-stepping Chris Metcalf
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 20+ messages in thread
From: Chris Metcalf @ 2011-02-27 23:52 UTC (permalink / raw)
  To: linux-kernel

The convention changed to, e.g., ".data..page_aligned".  This commit
fixes the places in the tile architecture that were still using the
old convention.  One tile-specific section (.init.page) was dropped
in favor of just using an "aligned" attribute.
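
As a concrete illustration of the convention (a minimal sketch; the
variable name is hypothetical, and PAGE_SIZE comes from <asm/page.h>):

	/* Old: ".data.page_aligned" -- new: ".data..page_aligned" */
	static char scratch_page[PAGE_SIZE]
		__attribute__((__section__(".data..page_aligned"),
			       aligned(PAGE_SIZE)));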

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/cache.h  |    2 +-
 arch/tile/kernel/head_32.S     |    4 ++--
 arch/tile/kernel/vmlinux.lds.S |    5 +----
 arch/tile/lib/atomic_32.c      |    2 +-
 arch/tile/mm/init.c            |    2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/tile/include/asm/cache.h b/arch/tile/include/asm/cache.h
index 08a2815..392e533 100644
--- a/arch/tile/include/asm/cache.h
+++ b/arch/tile/include/asm/cache.h
@@ -40,7 +40,7 @@
 #define INTERNODE_CACHE_BYTES   L2_CACHE_BYTES
 
 /* Group together read-mostly things to avoid cache false sharing */
-#define __read_mostly __attribute__((__section__(".data.read_mostly")))
+#define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 /*
  * Attribute for data that is kept read/write coherent until the end of
diff --git a/arch/tile/kernel/head_32.S b/arch/tile/kernel/head_32.S
index 90e7c44..fc49b62 100644
--- a/arch/tile/kernel/head_32.S
+++ b/arch/tile/kernel/head_32.S
@@ -133,7 +133,7 @@ ENTRY(_start)
 	}
 	ENDPROC(_start)
 
-.section ".bss.page_aligned","w"
+.section ".bss..page_aligned","w"
 	.align PAGE_SIZE
 ENTRY(empty_zero_page)
 	.fill PAGE_SIZE,1,0
@@ -148,7 +148,7 @@ ENTRY(empty_zero_page)
 	.word (\bits1) | (HV_CPA_TO_PFN(\cpa) << HV_PTE_INDEX_PFN)
 	.endm
 
-.section ".data.page_aligned","wa"
+.section ".data..page_aligned","wa"
 	.align PAGE_SIZE
 ENTRY(swapper_pg_dir)
 	/*
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 25fdc0c..4e211c1 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -59,10 +59,7 @@ SECTIONS
 
   . = ALIGN(PAGE_SIZE);
   VMLINUX_SYMBOL(_sinitdata) = .;
-  .init.page : AT (ADDR(.init.page) - LOAD_OFFSET) {
-    *(.init.page)
-  } :data =0
-  INIT_DATA_SECTION(16)
+  INIT_DATA_SECTION(16) :data =0
   PERCPU(PAGE_SIZE)
   . = ALIGN(PAGE_SIZE);
   VMLINUX_SYMBOL(_einitdata) = .;
diff --git a/arch/tile/lib/atomic_32.c b/arch/tile/lib/atomic_32.c
index 7a5cc70..59e3a02 100644
--- a/arch/tile/lib/atomic_32.c
+++ b/arch/tile/lib/atomic_32.c
@@ -47,7 +47,7 @@ struct atomic_locks_on_cpu *atomic_lock_ptr[ATOMIC_HASH_L1_SIZE]
 
 /* This page is remapped on startup to be hash-for-home. */
 int atomic_locks[PAGE_SIZE / sizeof(int) /* Only ATOMIC_HASH_SIZE is used */]
-  __attribute__((aligned(PAGE_SIZE), section(".bss.page_aligned")));
+  __attribute__((aligned(PAGE_SIZE), section(".bss..page_aligned")));
 
 #endif /* ATOMIC_LOCKS_FOUND_VIA_TABLE() */
 
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 0b9ce69..e34597e 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -445,7 +445,7 @@ static pmd_t *__init get_pmd(pgd_t pgtables[], unsigned long va)
 
 /* Temporary page table we use for staging. */
 static pgd_t pgtables[PTRS_PER_PGD]
- __attribute__((section(".init.page")));
+ __attribute__((aligned(HV_PAGE_TABLE_ALIGN)));
 
 /*
  * This maps the physical memory to kernel virtual address space, a total
-- 
1.6.5.2



* [PATCH] arch/tile: bug fix: exec'ed task thought it was still single-stepping
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
  2011-02-27 23:52 ` [PATCH] arch/tile: catch up with section naming convention in 2.6.35 Chris Metcalf
@ 2011-02-28 18:08 ` Chris Metcalf
  2011-02-28 18:21 ` [PATCH] arch/tile: fix __ndelay etc to work better Chris Metcalf
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 18:08 UTC (permalink / raw)
  To: linux-kernel

To handle single-step, tile mmap's a page of memory in the process
space for each thread and uses it to construct a version of the
instruction that we want to single step.  If the process exec's,
though, we lose that mapping, and the kernel needs to be aware that
it will need to recreate it if the exec'ed process then tries to
single-step as well.

Also correct some int32_t to s32 for better kernel style.
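
In outline, the fix is a per-thread cleanup hook invoked after a
successful exec (a condensed sketch of the code below; kfree(NULL) is
a no-op, so it is safe even for a thread that never single-stepped):

	#include <linux/slab.h>
	#include <linux/thread_info.h>

	void single_step_execve(void)
	{
		struct thread_info *ti = current_thread_info();

		kfree(ti->step_state);	/* kfree(NULL) is a no-op */
		ti->step_state = NULL;
	}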

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/ptrace.h |    3 +++
 arch/tile/kernel/process.c     |    4 ++++
 arch/tile/kernel/single_step.c |   21 +++++++++++++++++++--
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/tile/include/asm/ptrace.h b/arch/tile/include/asm/ptrace.h
index ac6d343..6be2246 100644
--- a/arch/tile/include/asm/ptrace.h
+++ b/arch/tile/include/asm/ptrace.h
@@ -141,6 +141,9 @@ struct single_step_state {
 /* Single-step the instruction at regs->pc */
 extern void single_step_once(struct pt_regs *regs);
 
+/* Clean up after execve(). */
+extern void single_step_execve(void);
+
 struct task_struct;
 
 extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index e90eb53..5db8b5b 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -574,6 +574,8 @@ SYSCALL_DEFINE4(execve, const char __user *, path,
 		goto out;
 	error = do_execve(filename, argv, envp, regs);
 	putname(filename);
+	if (error == 0)
+		single_step_execve();
 out:
 	return error;
 }
@@ -593,6 +595,8 @@ long compat_sys_execve(const char __user *path,
 		goto out;
 	error = compat_do_execve(filename, argv, envp, regs);
 	putname(filename);
+	if (error == 0)
+		single_step_execve();
 out:
 	return error;
 }
diff --git a/arch/tile/kernel/single_step.c b/arch/tile/kernel/single_step.c
index 1eb3b39..84a729e 100644
--- a/arch/tile/kernel/single_step.c
+++ b/arch/tile/kernel/single_step.c
@@ -56,7 +56,7 @@ enum mem_op {
 	MEMOP_STORE_POSTINCR
 };
 
-static inline tile_bundle_bits set_BrOff_X1(tile_bundle_bits n, int32_t offset)
+static inline tile_bundle_bits set_BrOff_X1(tile_bundle_bits n, s32 offset)
 {
 	tile_bundle_bits result;
 
@@ -254,6 +254,18 @@ P("\n");
 	return bundle;
 }
 
+/*
+ * Called after execve() has started the new image.  This allows us
+ * to reset the info state.  Note that the mmap'ed memory, if there
+ * was any, has already been unmapped by the exec.
+ */
+void single_step_execve(void)
+{
+	struct thread_info *ti = current_thread_info();
+	kfree(ti->step_state);
+	ti->step_state = NULL;
+}
+
 /**
  * single_step_once() - entry point when single stepping has been triggered.
  * @regs: The machine register state
@@ -373,7 +385,7 @@ void single_step_once(struct pt_regs *regs)
 		/* branches */
 		case BRANCH_OPCODE_X1:
 		{
-			int32_t offset = signExtend17(get_BrOff_X1(bundle));
+			s32 offset = signExtend17(get_BrOff_X1(bundle));
 
 			/*
 			 * For branches, we use a rewriting trick to let the
@@ -731,4 +743,9 @@ void single_step_once(struct pt_regs *regs)
 	__insn_mtspr(SPR_SINGLE_STEP_EN_K_K, 1 << USER_PL);
 }
 
+void single_step_execve(void)
+{
+	/* Nothing */
+}
+
 #endif /* !__tilegx__ */
-- 
1.6.5.2



* [PATCH] arch/tile: fix __ndelay etc to work better
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
  2011-02-27 23:52 ` [PATCH] arch/tile: catch up with section naming convention in 2.6.35 Chris Metcalf
  2011-02-28 18:08 ` [PATCH] arch/tile: bug fix: exec'ed task thought it was still single-stepping Chris Metcalf
@ 2011-02-28 18:21 ` Chris Metcalf
  2011-02-28 18:24 ` [PATCH] arch/tile: stop disabling INTCTRL_1 interrupts during hypervisor downcalls Chris Metcalf
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 18:21 UTC (permalink / raw)
  To: linux-kernel

The current implementations of __ndelay and __udelay call a hypervisor
service to delay, but the hypervisor service isn't actually implemented
very well, and the consensus is that Linux should handle delaying
natively rather than relying on a hypervisor service.

By converting nanoseconds to cycles, and then spinning until the
cycle counter reaches the desired cycle, we get several benefits:
first, we are sensitive to the actual clock speed; second, we use
less power by issuing a slow SPR read once every six cycles while
we delay; and third, we properly handle the case of an interrupt by
exiting at the target time rather than after some number of cycles.
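
As a worked example of the conversion (the clock speed here is purely
an assumption for illustration): with cycles = (nsecs * mult) >> shift,
a 1000 ns delay on a 1 GHz core works out to ~1000 cycles.  A hedged
sketch of the resulting spin loop, not the kernel code itself:

	#include <asm/timex.h>		/* get_cycles(), cycles_t */
	#include <asm/processor.h>	/* cpu_relax() */

	static inline void sketch_ndelay(unsigned long nsecs,
					 unsigned int mult, unsigned int shift)
	{
		cycles_t target = get_cycles() +
			(((unsigned long long)nsecs * mult) >> shift);

		while (get_cycles() < target)
			cpu_relax();	/* slow SPR read; saves power */
	}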

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/timex.h     |    3 +++
 arch/tile/include/hv/hypervisor.h |    5 +++++
 arch/tile/kernel/entry.S          |    6 ------
 arch/tile/kernel/time.c           |   10 ++++++++++
 arch/tile/lib/delay.c             |   21 ++++++++++++++++-----
 5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/tile/include/asm/timex.h b/arch/tile/include/asm/timex.h
index 3baf5fc..29921f0 100644
--- a/arch/tile/include/asm/timex.h
+++ b/arch/tile/include/asm/timex.h
@@ -38,6 +38,9 @@ static inline cycles_t get_cycles(void)
 
 cycles_t get_clock_rate(void);
 
+/* Convert nanoseconds to core clock cycles. */
+cycles_t ns2cycles(unsigned long nsecs);
+
 /* Called at cpu initialization to set some low-level constants. */
 void setup_clock(void);
 
diff --git a/arch/tile/include/hv/hypervisor.h b/arch/tile/include/hv/hypervisor.h
index f672544..103986b 100644
--- a/arch/tile/include/hv/hypervisor.h
+++ b/arch/tile/include/hv/hypervisor.h
@@ -964,6 +964,11 @@ HV_ASIDRange hv_inquire_asid(int idx);
 
 /** Waits for at least the specified number of nanoseconds then returns.
  *
+ * NOTE: this deprecated function currently assumes a 750 MHz clock,
+ * and is thus not generally suitable for use.  New code should call
+ * hv_sysconf(HV_SYSCONF_CPU_SPEED), compute a cycle count to wait for,
+ * and delay by looping while checking the cycle counter SPR.
+ *
  * @param nanosecs The number of nanoseconds to sleep.
  */
 void hv_nanosleep(int nanosecs);
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index fd8dc42..c3aa067 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -38,12 +38,6 @@ STD_ENTRY(kernel_execve)
 	jrp lr
 	STD_ENDPROC(kernel_execve)
 
-/* Delay a fixed number of cycles. */
-STD_ENTRY(__delay)
-	{ addi r0, r0, -1; bnzt r0, . }
-	jrp lr
-	STD_ENDPROC(__delay)
-
 /*
  * We don't run this function directly, but instead copy it to a page
  * we map into every user process.  See vdso_setup().
diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index f2e156e..49a605b 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -224,3 +224,13 @@ int setup_profiling_timer(unsigned int multiplier)
 {
 	return -EINVAL;
 }
+
+/*
+ * Use the tile timer to convert nsecs to core clock cycles, relying
+ * on it having the same frequency as SPR_CYCLE.
+ */
+cycles_t ns2cycles(unsigned long nsecs)
+{
+	struct clock_event_device *dev = &__get_cpu_var(tile_timer);
+	return ((u64)nsecs * dev->mult) >> dev->shift;
+}
diff --git a/arch/tile/lib/delay.c b/arch/tile/lib/delay.c
index 5801b03..cdacdd1 100644
--- a/arch/tile/lib/delay.c
+++ b/arch/tile/lib/delay.c
@@ -15,20 +15,31 @@
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/thread_info.h>
-#include <asm/fixmap.h>
-#include <hv/hypervisor.h>
+#include <asm/timex.h>
 
 void __udelay(unsigned long usecs)
 {
-	hv_nanosleep(usecs * 1000);
+	if (usecs > ULONG_MAX / 1000) {
+		WARN_ON_ONCE(usecs > ULONG_MAX / 1000);
+		usecs = ULONG_MAX / 1000;
+	}
+	__ndelay(usecs * 1000);
 }
 EXPORT_SYMBOL(__udelay);
 
 void __ndelay(unsigned long nsecs)
 {
-	hv_nanosleep(nsecs);
+	cycles_t target = get_cycles();
+	target += ns2cycles(nsecs);
+	while (get_cycles() < target)
+		cpu_relax();
 }
 EXPORT_SYMBOL(__ndelay);
 
-/* FIXME: should be declared in a header somewhere. */
+void __delay(unsigned long cycles)
+{
+	cycles_t target = get_cycles() + cycles;
+	while (get_cycles() < target)
+		cpu_relax();
+}
 EXPORT_SYMBOL(__delay);
-- 
1.6.5.2



* [PATCH] arch/tile: stop disabling INTCTRL_1 interrupts during hypervisor downcalls
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (2 preceding siblings ...)
  2011-02-28 18:21 ` [PATCH] arch/tile: fix __ndelay etc to work better Chris Metcalf
@ 2011-02-28 18:24 ` Chris Metcalf
  2011-02-28 18:32 ` [PATCH] arch/tile: warn and retry if an IPI is not accepted by the target cpu Chris Metcalf
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 18:24 UTC (permalink / raw)
  To: linux-kernel

The problem was that blocking INTCTRL_1 interrupts for the duration of
a hypervisor downcall could leave IPIs disabled during the softirq
processing after the downcall (e.g. for I/O), since both IPI and device
interrupts use the INTCTRL_1 downcall mechanism.  When this happened at
the wrong time, it could lead to deadlock.

Luckily, we were already maintaining the per-interrupt state we need,
and using it in the proper way in the hypervisor, so all we had to do
was to change Linux to stop blocking downcall interrupts for the entire
length of the downcall.  (Now they're blocked while we're executing the
downcall routine itself, but not while we're executing any subsequent
softirq routines.)  The hypervisor is doing a very small amount of
work it no longer needs to do (masking INTCTRL_1 on entry to the client
interrupt routine), but doing so means that older versions of Tile Linux
will continue to work with a current hypervisor, so that seems reasonable.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/intvec_32.S |   54 ++++--------------------------------------
 1 files changed, 5 insertions(+), 49 deletions(-)

diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index 5eed4a0..abf92f5 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -32,10 +32,6 @@
 # error "No support for kernel preemption currently"
 #endif
 
-#if INT_INTCTRL_K < 32 || INT_INTCTRL_K >= 48
-# error INT_INTCTRL_K coded to set high interrupt mask
-#endif
-
 #define PTREGS_PTR(reg, ptreg) addli reg, sp, C_ABI_SAVE_AREA_SIZE + (ptreg)
 
 #define PTREGS_OFFSET_SYSCALL PTREGS_OFFSET_REG(TREG_SYSCALL_NR)
@@ -1199,46 +1195,6 @@ STD_ENTRY(interrupt_return)
 	STD_ENDPROC(interrupt_return)
 
 	/*
-	 * This interrupt variant clears the INT_INTCTRL_K interrupt mask bit
-	 * before returning, so we can properly get more downcalls.
-	 */
-	.pushsection .text.handle_interrupt_downcall,"ax"
-handle_interrupt_downcall:
-	finish_interrupt_save handle_interrupt_downcall
-	check_single_stepping normal, .Ldispatch_downcall
-.Ldispatch_downcall:
-
-	/* Clear INTCTRL_K from the set of interrupts we ever enable. */
-	GET_INTERRUPTS_ENABLED_MASK_PTR(r30)
-	{
-	 addi   r30, r30, 4
-	 movei  r31, INT_MASK(INT_INTCTRL_K)
-	}
-	{
-	 lw     r20, r30
-	 nor    r21, r31, zero
-	}
-	and     r20, r20, r21
-	sw      r30, r20
-
-	{
-	 jalr   r0
-	 PTREGS_PTR(r0, PTREGS_OFFSET_BASE)
-	}
-	FEEDBACK_REENTER(handle_interrupt_downcall)
-
-	/* Allow INTCTRL_K to be enabled next time we enable interrupts. */
-	lw      r20, r30
-	or      r20, r20, r31
-	sw      r30, r20
-
-	{
-	 movei  r30, 0   /* not an NMI */
-	 j      interrupt_return
-	}
-	STD_ENDPROC(handle_interrupt_downcall)
-
-	/*
 	 * Some interrupts don't check for single stepping
 	 */
 	.pushsection .text.handle_interrupt_no_single_step,"ax"
@@ -2014,17 +1970,17 @@ int_unalign:
 #endif
 	int_hand     INT_INTCTRL_0, INTCTRL_0, bad_intr
 	int_hand     INT_MESSAGE_RCV_DWNCL, MESSAGE_RCV_DWNCL, \
-		     hv_message_intr, handle_interrupt_downcall
+		     hv_message_intr
 	int_hand     INT_DEV_INTR_DWNCL, DEV_INTR_DWNCL, \
-		     tile_dev_intr, handle_interrupt_downcall
+		     tile_dev_intr
 	int_hand     INT_I_ASID, I_ASID, bad_intr
 	int_hand     INT_D_ASID, D_ASID, bad_intr
 	int_hand     INT_DMATLB_MISS_DWNCL, DMATLB_MISS_DWNCL, \
-		     do_page_fault, handle_interrupt_downcall
+		     do_page_fault
 	int_hand     INT_SNITLB_MISS_DWNCL, SNITLB_MISS_DWNCL, \
-		     do_page_fault, handle_interrupt_downcall
+		     do_page_fault
 	int_hand     INT_DMATLB_ACCESS_DWNCL, DMATLB_ACCESS_DWNCL, \
-		     do_page_fault, handle_interrupt_downcall
+		     do_page_fault
 	int_hand     INT_SN_CPL, SN_CPL, bad_intr
 	int_hand     INT_DOUBLE_FAULT, DOUBLE_FAULT, do_trap
 #if CHIP_HAS_AUX_PERF_COUNTERS()
-- 
1.6.5.2



* [PATCH] arch/tile: warn and retry if an IPI is not accepted by the target cpu
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (3 preceding siblings ...)
  2011-02-28 18:24 ` [PATCH] arch/tile: stop disabling INTCTRL_1 interrupts during hypervisor downcalls Chris Metcalf
@ 2011-02-28 18:32 ` Chris Metcalf
  2011-02-28 18:35 ` [PATCH] arch/tile: export <asm/hardwall.h> to userspace Chris Metcalf
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 18:32 UTC (permalink / raw)
  To: linux-kernel

Previously we assumed this was impossible, but in fact it can happen.
Handle it gracefully by retrying after issuing a warning.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/smp.c |   33 +++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index 9575b37..a429310 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -36,6 +36,22 @@ static unsigned long __iomem *ipi_mappings[NR_CPUS];
 /* Set by smp_send_stop() to avoid recursive panics. */
 static int stopping_cpus;
 
+static void __send_IPI_many(HV_Recipient *recip, int nrecip, int tag)
+{
+	int sent = 0;
+	while (sent < nrecip) {
+		int rc = hv_send_message(recip, nrecip,
+					 (HV_VirtAddr)&tag, sizeof(tag));
+		if (rc < 0) {
+			if (!stopping_cpus)  /* avoid recursive panic */
+				panic("hv_send_message returned %d", rc);
+			break;
+		}
+		WARN_ONCE(rc == 0, "hv_send_message() returned zero\n");
+		sent += rc;
+	}
+}
+
 void send_IPI_single(int cpu, int tag)
 {
 	HV_Recipient recip = {
@@ -43,14 +59,13 @@ void send_IPI_single(int cpu, int tag)
 		.x = cpu % smp_width,
 		.state = HV_TO_BE_SENT
 	};
-	int rc = hv_send_message(&recip, 1, (HV_VirtAddr)&tag, sizeof(tag));
-	BUG_ON(rc <= 0);
+	__send_IPI_many(&recip, 1, tag);
 }
 
 void send_IPI_many(const struct cpumask *mask, int tag)
 {
 	HV_Recipient recip[NR_CPUS];
-	int cpu, sent;
+	int cpu;
 	int nrecip = 0;
 	int my_cpu = smp_processor_id();
 	for_each_cpu(cpu, mask) {
@@ -61,17 +76,7 @@ void send_IPI_many(const struct cpumask *mask, int tag)
 		r->x = cpu % smp_width;
 		r->state = HV_TO_BE_SENT;
 	}
-	sent = 0;
-	while (sent < nrecip) {
-		int rc = hv_send_message(recip, nrecip,
-					 (HV_VirtAddr)&tag, sizeof(tag));
-		if (rc <= 0) {
-			if (!stopping_cpus)  /* avoid recursive panic */
-				panic("hv_send_message returned %d", rc);
-			break;
-		}
-		sent += rc;
-	}
+	__send_IPI_many(recip, nrecip, tag);
 }
 
 void send_IPI_allbutself(int tag)
-- 
1.6.5.2



* [PATCH] arch/tile: export <asm/hardwall.h> to userspace
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (4 preceding siblings ...)
  2011-02-28 18:32 ` [PATCH] arch/tile: warn and retry if an IPI is not accepted by the target cpu Chris Metcalf
@ 2011-02-28 18:35 ` Chris Metcalf
  2011-02-28 20:01 ` [PATCH] arch/tile: avoid a simulator warning during bootup Chris Metcalf
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 18:35 UTC (permalink / raw)
  To: linux-kernel

This should have been part of the initial hardwall submission to
LKML but was overlooked.  The header provides the ioctl definitions for
manipulating the hardwall fd, so it needs to be available to userspace.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/Kbuild |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild
index 3b8f55b..849ab2f 100644
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -1,3 +1,4 @@
 include include/asm-generic/Kbuild.asm
 
 header-y += ucontext.h
+header-y += hardwall.h
-- 
1.6.5.2



* [PATCH] arch/tile: avoid a simulator warning during bootup
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (5 preceding siblings ...)
  2011-02-28 18:35 ` [PATCH] arch/tile: export <asm/hardwall.h> to userspace Chris Metcalf
@ 2011-02-28 20:01 ` Chris Metcalf
  2011-02-28 20:14 ` [PATCH] arch/tile: fix reversed test of strict_strtol() return value Chris Metcalf
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:01 UTC (permalink / raw)
  To: linux-kernel

As the added comment says, we can sometimes see a coherence warning
from our simulator if the "swapper_pgprot" variable on the boot cpu
has not been evicted from cache by the time the other cpus come up.
Force it to be evicted so we never see the warning.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/init.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index e34597e..9a62479 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -653,6 +653,17 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
 	memcpy(pgd_base, pgtables, sizeof(pgtables));
 	__install_page_table(pgd_base, __get_cpu_var(current_asid),
 			     swapper_pgprot);
+
+	/*
+	 * We just read swapper_pgprot and thus brought it into the cache,
+	 * with its new home & caching mode.  When we start the other CPUs,
+	 * they're going to reference swapper_pgprot via their initial fake
+	 * VA-is-PA mappings, which cache everything locally.  At that
+	 * time, if it's in our cache with a conflicting home, the
+	 * simulator's coherence checker will complain.  So, flush it out
+	 * of our cache; we're not going to ever use it again anyway.
+	 */
+	__insn_finv(&swapper_pgprot);
 }
 
 /*
-- 
1.6.5.2



* [PATCH] arch/tile: fix reversed test of strict_strtol() return value
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (6 preceding siblings ...)
  2011-02-28 20:01 ` [PATCH] arch/tile: avoid a simulator warning during bootup Chris Metcalf
@ 2011-02-28 20:14 ` Chris Metcalf
  2011-02-28 20:19 ` [PATCH] arch/tile: sync up with <arch/sim.h> and <arch/sim_def.h> changes Chris Metcalf
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:14 UTC (permalink / raw)
  To: linux-kernel

strict_strtol() returns zero on success, so the old test honored the
argument only when parsing failed.  This fixes the "initfree" boot
argument.
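
For reference, the return-value convention at issue (mirroring the
fixed hunk below; strict_strtol() returns 0 on success and a negative
errno on failure):

	long val;

	if (strict_strtol(str, 0, &val) == 0)
		initfree = val;	/* parsed OK: honor the argument */
	/* else: parsing failed, keep the default */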

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/mm/init.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 9a62479..2ef9ce5 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -1000,7 +1000,7 @@ static long __write_once initfree = 1;
 static int __init set_initfree(char *str)
 {
 	long val;
-	if (strict_strtol(str, 0, &val)) {
+	if (strict_strtol(str, 0, &val) == 0) {
 		initfree = val;
 		pr_info("initfree: %s free init pages\n",
 			initfree ? "will" : "won't");
-- 
1.6.5.2



* [PATCH] arch/tile: sync up with <arch/sim.h> and <arch/sim_def.h> changes
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (7 preceding siblings ...)
  2011-02-28 20:14 ` [PATCH] arch/tile: fix reversed test of strict_strtol() return value Chris Metcalf
@ 2011-02-28 20:19 ` Chris Metcalf
  2011-02-28 20:22 ` [PATCH] arch/tile: use a cleaner technique to enable interrupt for cpu_idle() Chris Metcalf
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:19 UTC (permalink / raw)
  To: linux-kernel

These headers are used by Linux but are maintained upstream.
This change incorporates a few minor fixes to these headers,
including a new sim_print() function, cleaner support for the
sim_syscall() API, and a sim_query_cpu_speed() method.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/arch/sim.h     |   48 ++++++++++++++++++++++++++++---------
 arch/tile/include/arch/sim_def.h |    3 ++
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/tile/include/arch/sim.h b/arch/tile/include/arch/sim.h
index 74b7c16..e54b7b0 100644
--- a/arch/tile/include/arch/sim.h
+++ b/arch/tile/include/arch/sim.h
@@ -152,16 +152,33 @@ sim_dump(unsigned int mask)
 /**
  * Print a string to the simulator stdout.
  *
- * @param str The string to be written; a newline is automatically added.
+ * @param str The string to be written.
+ */
+static __inline void
+sim_print(const char* str)
+{
+  for ( ; *str != '\0'; str++)
+  {
+    __insn_mtspr(SPR_SIM_CONTROL, SIM_CONTROL_PUTC |
+                 (*str << _SIM_CONTROL_OPERATOR_BITS));
+  }
+  __insn_mtspr(SPR_SIM_CONTROL, SIM_CONTROL_PUTC |
+               (SIM_PUTC_FLUSH_BINARY << _SIM_CONTROL_OPERATOR_BITS));
+}
+
+
+/**
+ * Print a string to the simulator stdout.
+ *
+ * @param str The string to be written (a newline is automatically added).
  */
 static __inline void
 sim_print_string(const char* str)
 {
-  int i;
-  for (i = 0; str[i] != 0; i++)
+  for ( ; *str != '\0'; str++)
   {
     __insn_mtspr(SPR_SIM_CONTROL, SIM_CONTROL_PUTC |
-                 (str[i] << _SIM_CONTROL_OPERATOR_BITS));
+                 (*str << _SIM_CONTROL_OPERATOR_BITS));
   }
   __insn_mtspr(SPR_SIM_CONTROL, SIM_CONTROL_PUTC |
                (SIM_PUTC_FLUSH_STRING << _SIM_CONTROL_OPERATOR_BITS));
@@ -203,7 +220,7 @@ sim_command(const char* str)
  * we are passing to the simulator are actually valid in the registers
  * (i.e. returned from memory) prior to the SIM_CONTROL spr.
  */
-static __inline int _sim_syscall0(int val)
+static __inline long _sim_syscall0(int val)
 {
   long result;
   __asm__ __volatile__ ("mtspr SIM_CONTROL, r0"
@@ -211,7 +228,7 @@ static __inline int _sim_syscall0(int val)
   return result;
 }
 
-static __inline int _sim_syscall1(int val, long arg1)
+static __inline long _sim_syscall1(int val, long arg1)
 {
   long result;
   __asm__ __volatile__ ("{ and zero, r1, r1; mtspr SIM_CONTROL, r0 }"
@@ -219,7 +236,7 @@ static __inline int _sim_syscall1(int val, long arg1)
   return result;
 }
 
-static __inline int _sim_syscall2(int val, long arg1, long arg2)
+static __inline long _sim_syscall2(int val, long arg1, long arg2)
 {
   long result;
   __asm__ __volatile__ ("{ and zero, r1, r2; mtspr SIM_CONTROL, r0 }"
@@ -233,7 +250,7 @@ static __inline int _sim_syscall2(int val, long arg1, long arg2)
    the register values for arguments 3 and up may still be in flight
    to the core from a stack frame reload. */
 
-static __inline int _sim_syscall3(int val, long arg1, long arg2, long arg3)
+static __inline long _sim_syscall3(int val, long arg1, long arg2, long arg3)
 {
   long result;
   __asm__ __volatile__ ("{ and zero, r3, r3 };"
@@ -244,7 +261,7 @@ static __inline int _sim_syscall3(int val, long arg1, long arg2, long arg3)
   return result;
 }
 
-static __inline int _sim_syscall4(int val, long arg1, long arg2, long arg3,
+static __inline long _sim_syscall4(int val, long arg1, long arg2, long arg3,
                                   long arg4)
 {
   long result;
@@ -256,7 +273,7 @@ static __inline int _sim_syscall4(int val, long arg1, long arg2, long arg3,
   return result;
 }
 
-static __inline int _sim_syscall5(int val, long arg1, long arg2, long arg3,
+static __inline long _sim_syscall5(int val, long arg1, long arg2, long arg3,
                                   long arg4, long arg5)
 {
   long result;
@@ -268,7 +285,6 @@ static __inline int _sim_syscall5(int val, long arg1, long arg2, long arg3,
   return result;
 }
 
-
 /**
  * Make a special syscall to the simulator itself, if running under
  * simulation. This is used as the implementation of other functions
@@ -281,7 +297,8 @@ static __inline int _sim_syscall5(int val, long arg1, long arg2, long arg3,
  */
 #define _sim_syscall(syscall_num, nr, args...) \
   _sim_syscall##nr( \
-    ((syscall_num) << _SIM_CONTROL_OPERATOR_BITS) | SIM_CONTROL_SYSCALL, args)
+    ((syscall_num) << _SIM_CONTROL_OPERATOR_BITS) | SIM_CONTROL_SYSCALL, \
+    ##args)
 
 
 /* Values for the "access_mask" parameters below. */
@@ -365,6 +382,13 @@ sim_validate_lines_evicted(unsigned long long pa, unsigned long length)
 }
 
 
+/* Return the current CPU speed in cycles per second. */
+static __inline long
+sim_query_cpu_speed(void)
+{
+  return _sim_syscall(SIM_SYSCALL_QUERY_CPU_SPEED, 0);
+}
+
 #endif /* !__DOXYGEN__ */
 
 
diff --git a/arch/tile/include/arch/sim_def.h b/arch/tile/include/arch/sim_def.h
index 7a17082..4b44a2b 100644
--- a/arch/tile/include/arch/sim_def.h
+++ b/arch/tile/include/arch/sim_def.h
@@ -243,6 +243,9 @@
  */
 #define SIM_SYSCALL_VALIDATE_LINES_EVICTED 5
 
+/** Syscall number for sim_query_cpu_speed(). */
+#define SIM_SYSCALL_QUERY_CPU_SPEED 6
+
 
 /*
  * Bit masks which can be shifted by 8, combined with
-- 
1.6.5.2



* [PATCH] arch/tile: use a cleaner technique to enable interrupt for cpu_idle()
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (8 preceding siblings ...)
  2011-02-28 20:19 ` [PATCH] arch/tile: sync up with <arch/sim.h> and <arch/sim_def.h> changes Chris Metcalf
@ 2011-02-28 20:22 ` Chris Metcalf
  2011-02-28 20:28 ` [PATCH] arch/tile: use extended assembly to inline __mb_incoherent() Chris Metcalf
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:22 UTC (permalink / raw)
  To: linux-kernel

Previously we used iret to atomically return to kernel PL with
interrupts enabled.  However, it turns out that we are architecturally
guaranteed that setting and then clearing the "interrupt critical
section" bit defers any interrupt until the instruction following the
clear, so we now do that instead, since it's cleaner.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/entry.S |   16 +++++-----------
 1 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index c3aa067..431e9ae 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -91,23 +91,17 @@ STD_ENTRY(smp_nap)
 
 /*
  * Enable interrupts racelessly and then nap until interrupted.
+ * Architecturally, we are guaranteed that enabling interrupts via
+ * mtspr to INTERRUPT_CRITICAL_SECTION only interrupts at the next PC.
  * This function's _cpu_idle_nap address is special; see intvec.S.
  * When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
  * as a result return to the function that called _cpu_idle().
  */
 STD_ENTRY(_cpu_idle)
-	{
-	 lnk r0
-	 movei r1, KERNEL_PL
-	}
-	{
-	 addli r0, r0, _cpu_idle_nap - .
-	 mtspr INTERRUPT_CRITICAL_SECTION, r1
-	}
+	movei r1, 1
+	mtspr INTERRUPT_CRITICAL_SECTION, r1
 	IRQ_ENABLE(r2, r3)             /* unmask, but still with ICS set */
-	mtspr SPR_EX_CONTEXT_K_1, r1   /* Kernel PL, ICS clear */
-	mtspr SPR_EX_CONTEXT_K_0, r0
-	iret
+	mtspr INTERRUPT_CRITICAL_SECTION, zero
 	.global _cpu_idle_nap
 _cpu_idle_nap:
 	nap
-- 
1.6.5.2



* [PATCH] arch/tile: use extended assembly to inline __mb_incoherent()
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (9 preceding siblings ...)
  2011-02-28 20:22 ` [PATCH] arch/tile: use a cleaner technique to enable interrupt for cpu_idle() Chris Metcalf
@ 2011-02-28 20:28 ` Chris Metcalf
  2011-02-28 20:30 ` [PATCH] arch/tile: fix two bugs in the backtracer code Chris Metcalf
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:28 UTC (permalink / raw)
  To: linux-kernel

This avoids having to maintain an additional separate assembly
file, and of course the inline is slightly more efficient as well.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/system.h |   19 ++++++++++++++++++-
 arch/tile/lib/Makefile         |    5 ++---
 arch/tile/lib/exports.c        |    3 ---
 arch/tile/lib/mb_incoherent.S  |   34 ----------------------------------
 4 files changed, 20 insertions(+), 41 deletions(-)
 delete mode 100644 arch/tile/lib/mb_incoherent.S

diff --git a/arch/tile/include/asm/system.h b/arch/tile/include/asm/system.h
index 5388850..23d1842 100644
--- a/arch/tile/include/asm/system.h
+++ b/arch/tile/include/asm/system.h
@@ -90,7 +90,24 @@
 #endif
 
 #if !CHIP_HAS_MF_WAITS_FOR_VICTIMS()
-int __mb_incoherent(void);  /* Helper routine for mb_incoherent(). */
+#include <hv/syscall_public.h>
+/*
+ * Issue an uncacheable load to each memory controller, then
+ * wait until those loads have completed.
+ */
+static inline void __mb_incoherent(void)
+{
+	long clobber_r10;
+	asm volatile("swint2"
+		     : "=R10" (clobber_r10)
+		     : "R10" (HV_SYS_fence_incoherent)
+		     : "r0", "r1", "r2", "r3", "r4",
+		       "r5", "r6", "r7", "r8", "r9",
+		       "r11", "r12", "r13", "r14",
+		       "r15", "r16", "r17", "r18", "r19",
+		       "r20", "r21", "r22", "r23", "r24",
+		       "r25", "r26", "r27", "r28", "r29");
+}
 #endif
 
 /* Fence to guarantee visibility of stores to incoherent memory. */
diff --git a/arch/tile/lib/Makefile b/arch/tile/lib/Makefile
index 93122d5..0c26086 100644
--- a/arch/tile/lib/Makefile
+++ b/arch/tile/lib/Makefile
@@ -2,9 +2,8 @@
 # Makefile for TILE-specific library files..
 #
 
-lib-y = cacheflush.o checksum.o cpumask.o delay.o \
-	mb_incoherent.o uaccess.o memmove.o \
-	memcpy_$(BITS).o memchr_$(BITS).o memset_$(BITS).o \
+lib-y = cacheflush.o checksum.o cpumask.o delay.o uaccess.o \
+	memmove.o memcpy_$(BITS).o memchr_$(BITS).o memset_$(BITS).o \
 	strchr_$(BITS).o strlen_$(BITS).o
 
 ifeq ($(CONFIG_TILEGX),y)
diff --git a/arch/tile/lib/exports.c b/arch/tile/lib/exports.c
index 1509c55..ce5dbf5 100644
--- a/arch/tile/lib/exports.c
+++ b/arch/tile/lib/exports.c
@@ -45,9 +45,6 @@ EXPORT_SYMBOL(__copy_from_user_zeroing);
 EXPORT_SYMBOL(__copy_in_user_inatomic);
 #endif
 
-/* arch/tile/lib/mb_incoherent.S */
-EXPORT_SYMBOL(__mb_incoherent);
-
 /* hypervisor glue */
 #include <hv/hypervisor.h>
 EXPORT_SYMBOL(hv_dev_open);
diff --git a/arch/tile/lib/mb_incoherent.S b/arch/tile/lib/mb_incoherent.S
deleted file mode 100644
index 989ad7b..0000000
--- a/arch/tile/lib/mb_incoherent.S
+++ /dev/null
@@ -1,34 +0,0 @@
-/*
- * Copyright 2010 Tilera Corporation. All Rights Reserved.
- *
- *   This program is free software; you can redistribute it and/or
- *   modify it under the terms of the GNU General Public License
- *   as published by the Free Software Foundation, version 2.
- *
- *   This program is distributed in the hope that it will be useful, but
- *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- *   NON INFRINGEMENT.  See the GNU General Public License for
- *   more details.
- *
- * Assembly code for invoking the HV's fence_incoherent syscall.
- */
-
-#include <linux/linkage.h>
-#include <hv/syscall_public.h>
-#include <arch/abi.h>
-#include <arch/chip.h>
-
-#if !CHIP_HAS_MF_WAITS_FOR_VICTIMS()
-
-/*
- * Invoke the hypervisor's fence_incoherent syscall, which guarantees
- * that all victims for cachelines homed on this tile have reached memory.
- */
-STD_ENTRY(__mb_incoherent)
-	moveli TREG_SYSCALL_NR_NAME, HV_SYS_fence_incoherent
-	swint2
-	jrp lr
-	STD_ENDPROC(__mb_incoherent)
-
-#endif
-- 
1.6.5.2



* [PATCH] arch/tile: fix two bugs in the backtracer code
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (10 preceding siblings ...)
  2011-02-28 20:28 ` [PATCH] arch/tile: use extended assembly to inline __mb_incoherent() Chris Metcalf
@ 2011-02-28 20:30 ` Chris Metcalf
  2011-02-28 20:48 ` [PATCH] arch/tile: enhance existing finv_buffer_remote() routine Chris Metcalf
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:30 UTC (permalink / raw)
  To: linux-kernel

The first is that we were using an incorrect hand-rolled variant
of __kernel_text_address() which didn't handle module PCs.  We now
just use the standard API.

The second is that we weren't accounting for the three-level
page table when we were trying to pre-verify the addresses on
the 64-bit TILE-Gx processor; we now do that correctly.
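
A sketch of the distinction (the helper name is hypothetical): the old
hand-rolled test accepted only PCs inside the core kernel's text
mapping, so module PCs failed it, while __kernel_text_address() checks
module text as well.

	#include <linux/kernel.h>	/* __kernel_text_address() */

	static int pc_is_backtraceable(unsigned long pc)
	{
		return __kernel_text_address(pc);  /* kernel or module text */
	}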

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/kernel/stack.c |   28 +++++++++++++++++++---------
 1 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c
index 0d54106..dd81713 100644
--- a/arch/tile/kernel/stack.c
+++ b/arch/tile/kernel/stack.c
@@ -44,13 +44,6 @@ static int in_kernel_stack(struct KBacktraceIterator *kbt, VirtualAddress sp)
 	return sp >= kstack_base && sp < kstack_base + THREAD_SIZE;
 }
 
-/* Is address in the specified kernel code? */
-static int in_kernel_text(VirtualAddress address)
-{
-	return (address >= MEM_SV_INTRPT &&
-		address < MEM_SV_INTRPT + HPAGE_SIZE);
-}
-
 /* Is address valid for reading? */
 static int valid_address(struct KBacktraceIterator *kbt, VirtualAddress address)
 {
@@ -63,6 +56,23 @@ static int valid_address(struct KBacktraceIterator *kbt, VirtualAddress address)
 	if (l1_pgtable == NULL)
 		return 0;	/* can't read user space in other tasks */
 
+#ifdef CONFIG_64BIT
+	/* Find the real l1_pgtable by looking in the l0_pgtable. */
+	pte = l1_pgtable[HV_L0_INDEX(address)];
+	if (!hv_pte_get_present(pte))
+		return 0;
+	pfn = hv_pte_get_pfn(pte);
+	if (pte_huge(pte)) {
+		if (!pfn_valid(pfn)) {
+			pr_err("L0 huge page has bad pfn %#lx\n", pfn);
+			return 0;
+		}
+		return hv_pte_get_present(pte) && hv_pte_get_readable(pte);
+	}
+	page = pfn_to_page(pfn);
+	BUG_ON(PageHighMem(page));  /* No HIGHMEM on 64-bit. */
+	l1_pgtable = (HV_PTE *)pfn_to_kaddr(pfn);
+#endif
 	pte = l1_pgtable[HV_L1_INDEX(address)];
 	if (!hv_pte_get_present(pte))
 		return 0;
@@ -92,7 +102,7 @@ static bool read_memory_func(void *result, VirtualAddress address,
 {
 	int retval;
 	struct KBacktraceIterator *kbt = (struct KBacktraceIterator *)vkbt;
-	if (in_kernel_text(address)) {
+	if (__kernel_text_address(address)) {
 		/* OK to read kernel code. */
 	} else if (address >= PAGE_OFFSET) {
 		/* We only tolerate kernel-space reads of this task's stack */
@@ -132,7 +142,7 @@ static struct pt_regs *valid_fault_handler(struct KBacktraceIterator* kbt)
 		}
 	}
 	if (EX1_PL(p->ex1) == KERNEL_PL &&
-	    in_kernel_text(p->pc) &&
+	    __kernel_text_address(p->pc) &&
 	    in_kernel_stack(kbt, p->sp) &&
 	    p->sp >= sp) {
 		if (kbt->verbose)
-- 
1.6.5.2



* [PATCH] arch/tile: enhance existing finv_buffer_remote() routine
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (11 preceding siblings ...)
  2011-02-28 20:30 ` [PATCH] arch/tile: fix two bugs in the backtracer code Chris Metcalf
@ 2011-02-28 20:48 ` Chris Metcalf
  2011-02-28 20:51 ` [PATCH] arch/tile: export some additional module symbols Chris Metcalf
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:48 UTC (permalink / raw)
  To: linux-kernel

It now takes an additional argument so it can be used to
flush-and-invalidate pages that are cached using hash-for-home
as well as those that are cached with their coherence point on a
single cpu.

This allows it to be used more widely for changing the coherence
point of arbitrary pages when necessary.
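
A hedged sketch of the widened interface from a caller's point of view
(mirroring the homecache.c hunk below; page_home() and PAGE_HOME_HASH
come from the tile homecache support):

	int hfh = (page_home(page) == PAGE_HOME_HASH);
	void *va = kmap_atomic(page);

	finv_buffer_remote(va, PAGE_SIZE, hfh);
	kunmap_atomic(va);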

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/cacheflush.h |   55 ++-----------------
 arch/tile/lib/cacheflush.c         |  102 ++++++++++++++++++++++++++++++++++++
 arch/tile/mm/homecache.c           |   36 +++++++++++--
 drivers/net/tile/tilepro.c         |    4 +-
 4 files changed, 141 insertions(+), 56 deletions(-)

diff --git a/arch/tile/include/asm/cacheflush.h b/arch/tile/include/asm/cacheflush.h
index 14a3f85..12fb0fb 100644
--- a/arch/tile/include/asm/cacheflush.h
+++ b/arch/tile/include/asm/cacheflush.h
@@ -138,55 +138,12 @@ static inline void finv_buffer(void *buffer, size_t size)
 }
 
 /*
- * Flush & invalidate a VA range that is homed remotely on a single core,
- * waiting until the memory controller holds the flushed values.
+ * Flush and invalidate a VA range that is homed remotely, waiting
+ * until the memory controller holds the flushed values.  If "hfh" is
+ * true, we will do a more expensive flush involving additional loads
+ * to make sure we have touched all the possible home cpus of a buffer
+ * that is homed with "hash for home".
  */
-static inline void finv_buffer_remote(void *buffer, size_t size)
-{
-	char *p;
-	int i;
-
-	/*
-	 * Flush and invalidate the buffer out of the local L1/L2
-	 * and request the home cache to flush and invalidate as well.
-	 */
-	__finv_buffer(buffer, size);
-
-	/*
-	 * Wait for the home cache to acknowledge that it has processed
-	 * all the flush-and-invalidate requests.  This does not mean
-	 * that the flushed data has reached the memory controller yet,
-	 * but it does mean the home cache is processing the flushes.
-	 */
-	__insn_mf();
-
-	/*
-	 * Issue a load to the last cache line, which can't complete
-	 * until all the previously-issued flushes to the same memory
-	 * controller have also completed.  If we weren't striping
-	 * memory, that one load would be sufficient, but since we may
-	 * be, we also need to back up to the last load issued to
-	 * another memory controller, which would be the point where
-	 * we crossed an 8KB boundary (the granularity of striping
-	 * across memory controllers).  Keep backing up and doing this
-	 * until we are before the beginning of the buffer, or have
-	 * hit all the controllers.
-	 */
-	for (i = 0, p = (char *)buffer + size - 1;
-	     i < (1 << CHIP_LOG_NUM_MSHIMS()) && p >= (char *)buffer;
-	     ++i) {
-		const unsigned long STRIPE_WIDTH = 8192;
-
-		/* Force a load instruction to issue. */
-		*(volatile char *)p;
-
-		/* Jump to end of previous stripe. */
-		p -= STRIPE_WIDTH;
-		p = (char *)((unsigned long)p | (STRIPE_WIDTH - 1));
-	}
-
-	/* Wait for the loads (and thus flushes) to have completed. */
-	__insn_mf();
-}
+void finv_buffer_remote(void *buffer, size_t size, int hfh);
 
 #endif /* _ASM_TILE_CACHEFLUSH_H */
diff --git a/arch/tile/lib/cacheflush.c b/arch/tile/lib/cacheflush.c
index 11b6164..35c1d8c 100644
--- a/arch/tile/lib/cacheflush.c
+++ b/arch/tile/lib/cacheflush.c
@@ -21,3 +21,105 @@ void __flush_icache_range(unsigned long start, unsigned long end)
 {
 	invalidate_icache((const void *)start, end - start, PAGE_SIZE);
 }
+
+
+/* Force a load instruction to issue. */
+static inline void force_load(char *p)
+{
+	*(volatile char *)p;
+}
+
+/*
+ * Flush and invalidate a VA range that is homed remotely on a single
+ * core (if "!hfh") or homed via hash-for-home (if "hfh"), waiting
+ * until the memory controller holds the flushed values.
+ */
+void finv_buffer_remote(void *buffer, size_t size, int hfh)
+{
+	char *p, *base;
+	size_t step_size, load_count;
+	const unsigned long STRIPE_WIDTH = 8192;
+
+	/*
+	 * Flush and invalidate the buffer out of the local L1/L2
+	 * and request the home cache to flush and invalidate as well.
+	 */
+	__finv_buffer(buffer, size);
+
+	/*
+	 * Wait for the home cache to acknowledge that it has processed
+	 * all the flush-and-invalidate requests.  This does not mean
+	 * that the flushed data has reached the memory controller yet,
+	 * but it does mean the home cache is processing the flushes.
+	 */
+	__insn_mf();
+
+	/*
+	 * Issue a load to the last cache line, which can't complete
+	 * until all the previously-issued flushes to the same memory
+	 * controller have also completed.  If we weren't striping
+	 * memory, that one load would be sufficient, but since we may
+	 * be, we also need to back up to the last load issued to
+	 * another memory controller, which would be the point where
+	 * we crossed an 8KB boundary (the granularity of striping
+	 * across memory controllers).  Keep backing up and doing this
+	 * until we are before the beginning of the buffer, or have
+	 * hit all the controllers.
+	 *
+	 * If we are flushing a hash-for-home buffer, it's even worse.
+	 * Each line may be homed on a different tile, and each tile
+	 * may have up to four lines that are on different
+	 * controllers.  So as we walk backwards, we have to touch
+	 * enough cache lines to satisfy these constraints.  In
+	 * practice this ends up being close enough to "load from
+	 * every cache line on a full memory stripe on each
+	 * controller" that we simply do that, to simplify the logic.
+	 *
+	 * FIXME: See bug 9535 for some issues with this code.
+	 */
+	if (hfh) {
+		step_size = L2_CACHE_BYTES;
+		load_count = (STRIPE_WIDTH / L2_CACHE_BYTES) *
+			      (1 << CHIP_LOG_NUM_MSHIMS());
+	} else {
+		step_size = STRIPE_WIDTH;
+		load_count = (1 << CHIP_LOG_NUM_MSHIMS());
+	}
+
+	/* Load the last byte of the buffer. */
+	p = (char *)buffer + size - 1;
+	force_load(p);
+
+	/* Bump down to the end of the previous stripe or cache line. */
+	p -= step_size;
+	p = (char *)((unsigned long)p | (step_size - 1));
+
+	/* Figure out how far back we need to go. */
+	base = p - (step_size * (load_count - 2));
+	if ((long)base < (long)buffer)
+		base = buffer;
+
+	/*
+	 * Fire all the loads we need.  The MAF only has eight entries
+	 * so we can have at most eight outstanding loads, so we
+	 * unroll by that amount.
+	 */
+#pragma unroll 8
+	for (; p >= base; p -= step_size)
+		force_load(p);
+
+	/*
+	 * Repeat, but with inv's instead of loads, to get rid of the
+	 * data we just loaded into our own cache and the old home L3.
+	 * No need to unroll since inv's don't target a register.
+	 */
+	p = (char *)buffer + size - 1;
+	__insn_inv(p);
+	p -= step_size;
+	p = (char *)((unsigned long)p | (step_size - 1));
+	for (; p >= base; p -= step_size)
+		__insn_inv(p);
+
+	/* Wait for the load+inv's (and thus finvs) to have completed. */
+	__insn_mf();
+}
diff --git a/arch/tile/mm/homecache.c b/arch/tile/mm/homecache.c
index d78df3a..f344f4f 100644
--- a/arch/tile/mm/homecache.c
+++ b/arch/tile/mm/homecache.c
@@ -179,23 +179,46 @@ void flush_remote(unsigned long cache_pfn, unsigned long cache_control,
 	panic("Unsafe to continue.");
 }
 
+void flush_remote_page(struct page *page, int order)
+{
+	int i, pages = (1 << order);
+	for (i = 0; i < pages; ++i, ++page) {
+		void *p = kmap_atomic(page);
+		int hfh = 0;
+		int home = page_home(page);
+#if CHIP_HAS_CBOX_HOME_MAP()
+		if (home == PAGE_HOME_HASH)
+			hfh = 1;
+		else
+#endif
+			BUG_ON(home < 0 || home >= NR_CPUS);
+		finv_buffer_remote(p, PAGE_SIZE, hfh);
+		kunmap_atomic(p);
+	}
+}
+
 void homecache_evict(const struct cpumask *mask)
 {
 	flush_remote(0, HV_FLUSH_EVICT_L2, mask, 0, 0, 0, NULL, NULL, 0);
 }
 
-/* Return a mask of the cpus whose caches currently own these pages. */
-static void homecache_mask(struct page *page, int pages,
-			   struct cpumask *home_mask)
+/*
+ * Return a mask of the cpus whose caches currently own these pages.
+ * The return value is whether the pages are all coherently cached
+ * (i.e. none are immutable, incoherent, or uncached).
+ */
+static int homecache_mask(struct page *page, int pages,
+			  struct cpumask *home_mask)
 {
 	int i;
+	int cached_coherently = 1;
 	cpumask_clear(home_mask);
 	for (i = 0; i < pages; ++i) {
 		int home = page_home(&page[i]);
 		if (home == PAGE_HOME_IMMUTABLE ||
 		    home == PAGE_HOME_INCOHERENT) {
 			cpumask_copy(home_mask, cpu_possible_mask);
-			return;
+			return 0;
 		}
 #if CHIP_HAS_CBOX_HOME_MAP()
 		if (home == PAGE_HOME_HASH) {
@@ -203,11 +226,14 @@ static void homecache_mask(struct page *page, int pages,
 			continue;
 		}
 #endif
-		if (home == PAGE_HOME_UNCACHED)
+		if (home == PAGE_HOME_UNCACHED) {
+			cached_coherently = 0;
 			continue;
+		}
 		BUG_ON(home < 0 || home >= NR_CPUS);
 		cpumask_set_cpu(home, home_mask);
 	}
+	return cached_coherently;
 }
 
 /*
diff --git a/drivers/net/tile/tilepro.c b/drivers/net/tile/tilepro.c
index 7cb301d..f901299 100644
--- a/drivers/net/tile/tilepro.c
+++ b/drivers/net/tile/tilepro.c
@@ -1620,7 +1620,7 @@ static unsigned int tile_net_tx_frags(lepp_frag_t *frags,
 	if (b_len != 0) {
 
 		if (!hash_default)
-			finv_buffer_remote(b_data, b_len);
+			finv_buffer_remote(b_data, b_len, 0);
 
 		cpa = __pa(b_data);
 		frags[n].cpa_lo = cpa;
@@ -1643,7 +1643,7 @@ static unsigned int tile_net_tx_frags(lepp_frag_t *frags,
 		if (!hash_default) {
 			void *va = pfn_to_kaddr(pfn) + f->page_offset;
 			BUG_ON(PageHighMem(f->page));
-			finv_buffer_remote(va, f->size);
+			finv_buffer_remote(va, f->size, 0);
 		}
 
 		cpa = ((phys_addr_t)pfn << PAGE_SHIFT) + f->page_offset;
-- 
1.6.5.2



* [PATCH] arch/tile: export some additional module symbols
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (12 preceding siblings ...)
  2011-02-28 20:48 ` [PATCH] arch/tile: enhance existing finv_buffer_remote() routine Chris Metcalf
@ 2011-02-28 20:51 ` Chris Metcalf
  2011-02-28 20:58 ` [PATCH] arch/tile: fix some comments and whitespace Chris Metcalf
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:51 UTC (permalink / raw)
  To: linux-kernel

This adds a grab bag of symbols that have been missing for
various modules.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/lib/exports.c |    7 +++++++
 arch/tile/mm/init.c     |    1 +
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/tile/lib/exports.c b/arch/tile/lib/exports.c
index ce5dbf5..49284fa 100644
--- a/arch/tile/lib/exports.c
+++ b/arch/tile/lib/exports.c
@@ -29,6 +29,9 @@ EXPORT_SYMBOL(__put_user_8);
 EXPORT_SYMBOL(strnlen_user_asm);
 EXPORT_SYMBOL(strncpy_from_user_asm);
 EXPORT_SYMBOL(clear_user_asm);
+EXPORT_SYMBOL(flush_user_asm);
+EXPORT_SYMBOL(inv_user_asm);
+EXPORT_SYMBOL(finv_user_asm);
 
 /* arch/tile/kernel/entry.S */
 #include <linux/kernel.h>
@@ -82,4 +85,8 @@ int64_t __muldi3(int64_t, int64_t);
 EXPORT_SYMBOL(__muldi3);
 uint64_t __lshrdi3(uint64_t, unsigned int);
 EXPORT_SYMBOL(__lshrdi3);
+uint64_t __ashrdi3(uint64_t, unsigned int);
+EXPORT_SYMBOL(__ashrdi3);
+uint64_t __ashldi3(uint64_t, unsigned int);
+EXPORT_SYMBOL(__ashldi3);
 #endif
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 2ef9ce5..f89ed5d 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -69,6 +69,7 @@
 
 #ifndef __tilegx__
 unsigned long VMALLOC_RESERVE = CONFIG_VMALLOC_RESERVE;
+EXPORT_SYMBOL(VMALLOC_RESERVE);
 #endif
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-- 
1.6.5.2



* [PATCH] arch/tile: fix some comments and whitespace
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (13 preceding siblings ...)
  2011-02-28 20:51 ` [PATCH] arch/tile: export some additional module symbols Chris Metcalf
@ 2011-02-28 20:58 ` Chris Metcalf
  2011-02-28 21:01 ` [PATCH] arch/tile: add some more VMSPLIT options and use consistent naming Chris Metcalf
  2011-02-28 21:37 ` [PATCH] arch/tile: support 4KB page size as well as 64KB Chris Metcalf
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 20:58 UTC (permalink / raw)
  To: linux-kernel

This is a grab bag of changes with no actual change to generated code.
It includes fixes for whitespace and comment typos, plus the removal
of a couple of stale comments.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig             |   20 ++++++++++----------
 arch/tile/kernel/intvec_32.S  |    4 ++--
 arch/tile/lib/atomic_32.c     |    2 +-
 arch/tile/lib/atomic_asm_32.S |    2 +-
 arch/tile/mm/fault.c          |    8 --------
 5 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 9e4eb51..92f7ea0 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -1,5 +1,5 @@
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/config-language.txt.
+# see Documentation/kbuild/kconfig-language.txt.
 
 config TILE
 	def_bool y
@@ -15,14 +15,14 @@ config TILE
 
 # FIXME: investigate whether we need/want these options.
 #	select HAVE_IOREMAP_PROT
-#       select HAVE_OPTPROBES
-#       select HAVE_REGS_AND_STACK_ACCESS_API
-#       select HAVE_HW_BREAKPOINT
-#       select PERF_EVENTS
-#       select HAVE_USER_RETURN_NOTIFIER
-#       config NO_BOOTMEM
-#       config ARCH_SUPPORTS_DEBUG_PAGEALLOC
-#       config HUGETLB_PAGE_SIZE_VARIABLE
+#	select HAVE_OPTPROBES
+#	select HAVE_REGS_AND_STACK_ACCESS_API
+#	select HAVE_HW_BREAKPOINT
+#	select PERF_EVENTS
+#	select HAVE_USER_RETURN_NOTIFIER
+#	config NO_BOOTMEM
+#	config ARCH_SUPPORTS_DEBUG_PAGEALLOC
+#	config HUGETLB_PAGE_SIZE_VARIABLE
 
 config MMU
 	def_bool y
@@ -40,7 +40,7 @@ config HAVE_SETUP_PER_CPU_AREA
 	def_bool y
 
 config NEED_PER_CPU_PAGE_FIRST_CHUNK
-        def_bool y
+	def_bool y
 
 config SYS_SUPPORTS_HUGETLBFS
 	def_bool y
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index abf92f5..eabf1ef 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1584,7 +1584,7 @@ ENTRY(sys_cmpxchg)
 	 * about aliasing among multiple mappings of the same physical page,
 	 * and we ignore the low 3 bits so we have one lock that covers
 	 * both a cmpxchg64() and a cmpxchg() on either its low or high word.
-	 * NOTE: this code must match __atomic_hashed_lock() in lib/atomic.c.
+	 * NOTE: this must match __atomic_hashed_lock() in lib/atomic_32.c.
 	 */
 
 #if ATOMIC_LOCKS_FOUND_VIA_TABLE()
@@ -1718,7 +1718,7 @@ ENTRY(sys_cmpxchg)
 
 	/*
 	 * Perform the actual cmpxchg or atomic_update.
-	 * Note that __futex_mark_unlocked() in uClibc relies on
+	 * Note that the system <arch/atomic.h> header relies on
 	 * atomic_update() to always perform an "mf", so don't make
 	 * it optional or conditional without modifying that code.
 	 */
diff --git a/arch/tile/lib/atomic_32.c b/arch/tile/lib/atomic_32.c
index 59e3a02..020866a 100644
--- a/arch/tile/lib/atomic_32.c
+++ b/arch/tile/lib/atomic_32.c
@@ -53,7 +53,7 @@ int atomic_locks[PAGE_SIZE / sizeof(int) /* Only ATOMIC_HASH_SIZE is used */]
 
 static inline int *__atomic_hashed_lock(volatile void *v)
 {
-	/* NOTE: this code must match "sys_cmpxchg" in kernel/intvec.S */
+	/* NOTE: this code must match "sys_cmpxchg" in kernel/intvec_32.S */
 #if ATOMIC_LOCKS_FOUND_VIA_TABLE()
 	unsigned long i =
 		(unsigned long) v & ((PAGE_SIZE-1) & -sizeof(long long));
diff --git a/arch/tile/lib/atomic_asm_32.S b/arch/tile/lib/atomic_asm_32.S
index 5a5514b..82f64cc 100644
--- a/arch/tile/lib/atomic_asm_32.S
+++ b/arch/tile/lib/atomic_asm_32.S
@@ -14,7 +14,7 @@
  * Support routines for atomic operations.  Each function takes:
  *
  * r0: address to manipulate
- * r1: pointer to atomic lock guarding this operation (for FUTEX_LOCK_REG)
+ * r1: pointer to atomic lock guarding this operation (for ATOMIC_LOCK_REG)
  * r2: new value to write, or for cmpxchg/add_unless, value to compare against
  * r3: (cmpxchg/xchg_add_unless) new value to write or add;
  *     (atomic64 ops) high word of value to write
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index dcebfc8..758f597 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -655,14 +655,6 @@ struct intvec_state do_page_fault_ics(struct pt_regs *regs, int fault_num,
 	}
 
 	/*
-	 * NOTE: the one other type of access that might bring us here
-	 * are the memory ops in __tns_atomic_acquire/__tns_atomic_release,
-	 * but we don't have to check specially for them since we can
-	 * always safely return to the address of the fault and retry,
-	 * since no separate atomic locks are involved.
-	 */
-
-	/*
 	 * Now that we have released the atomic lock (if necessary),
 	 * it's safe to spin if the PTE that caused the fault was migrating.
 	 */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] arch/tile: add some more VMSPLIT options and use consistent naming
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (14 preceding siblings ...)
  2011-02-28 20:58 ` [PATCH] arch/tile: fix some comments and whitespace Chris Metcalf
@ 2011-02-28 21:01 ` Chris Metcalf
  2011-02-28 21:37 ` [PATCH] arch/tile: support 4KB page size as well as 64KB Chris Metcalf
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 21:01 UTC (permalink / raw)
  To: linux-kernel

This renames VMSPLIT_3G_OPT to VMSPLIT_2_75G, and adds VMSPLIT_2_5G
and VMSPLIT_2_25G.

For applications that are both memory-intensive and network-buffer
intensive, it can be helpful to tune the virtual address at which
kernel memory starts.
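
To make the naming concrete (illustrative arithmetic, not part of the
patch): each split name gives the size of the user portion of the 4GB
32-bit address space, and PAGE_OFFSET is that size expressed as an
address, e.g.

	VMSPLIT_2_75G:  PAGE_OFFSET = 0xB0000000 = 2.75 * 2^30

leaving the 1.25GB above PAGE_OFFSET for the kernel, enough to
direct-map a full 1GB of low memory with room left over for other
kernel mappings.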

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 92f7ea0..eed0fc5 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -234,8 +234,12 @@ choice
 		bool "3.5G/0.5G user/kernel split"
 	config VMSPLIT_3G
 		bool "3G/1G user/kernel split"
-	config VMSPLIT_3G_OPT
-		bool "3G/1G user/kernel split (for full 1G low memory)"
+	config VMSPLIT_2_75G
+		bool "2.75G/1.25G user/kernel split (for full 1G low memory)"
+	config VMSPLIT_2_5G
+		bool "2.5G/1.5G user/kernel split"
+	config VMSPLIT_2_25G
+		bool "2.25G/1.75G user/kernel split"
 	config VMSPLIT_2G
 		bool "2G/2G user/kernel split"
 	config VMSPLIT_1G
@@ -246,7 +250,9 @@ config PAGE_OFFSET
 	hex
 	default 0xF0000000 if VMSPLIT_3_75G
 	default 0xE0000000 if VMSPLIT_3_5G
-	default 0xB0000000 if VMSPLIT_3G_OPT
+	default 0xB0000000 if VMSPLIT_2_75G
+	default 0xA0000000 if VMSPLIT_2_5G
+	default 0x90000000 if VMSPLIT_2_25G
 	default 0x80000000 if VMSPLIT_2G
 	default 0x40000000 if VMSPLIT_1G
 	default 0xC0000000
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] arch/tile: support 4KB page size as well as 64KB
  2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
                   ` (15 preceding siblings ...)
  2011-02-28 21:01 ` [PATCH] arch/tile: add some more VMSPLIT options and use consistent naming Chris Metcalf
@ 2011-02-28 21:37 ` Chris Metcalf
  16 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-02-28 21:37 UTC (permalink / raw)
  To: linux-kernel

The Tilera architecture traditionally uses a 64KB page size to
improve TLB utilization and performance when the hardware is used
primarily to run a single application.

For more generic server scenarios, it can be beneficial to run with
a 4KB page size instead, so this commit allows that to be specified
(by modifying the arch/tile/include/hv/pagesize.h header).
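
For example, the 4KB configuration of that header might look
something like this (a sketch; the actual values are provided by the
hypervisor headers):

	/* <hv/pagesize.h>: log2 of the small and huge page sizes. */
	#define HV_LOG2_PAGE_SIZE_SMALL	12	/* 4KB small pages */
	#define HV_LOG2_PAGE_SIZE_LARGE	24	/* 16MB huge pages */

PAGE_SHIFT and HPAGE_SHIFT are then derived from these values rather
than hard-coded (see the <asm/page.h> hunk below).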

As part of this change, we also reworked the PTE management slightly
so that all PTE writes go through a __set_pte() function, where we
can do some additional validation.  The set_pte_order() function was
eliminated, since its "order" argument was unused.
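
On 32-bit tile a PTE is 64 bits wide but must be written with two
32-bit stores, so the store order matters.  The idea behind
__set_pte() is roughly the following (a simplified sketch of the code
in the pgtable.c hunk below):

	/*
	 * The "present" bit lives in the low word, so make it
	 * visible last when installing a PTE and first when tearing
	 * one down; that way no observer ever sees a half-written
	 * but apparently-valid PTE.
	 */
	if (pte_present(pte)) {
		((u32 *)ptep)[1] = (u32)(pte_val(pte) >> 32);
		barrier();
		((u32 *)ptep)[0] = (u32)(pte_val(pte));
	} else {
		((u32 *)ptep)[0] = (u32)(pte_val(pte));
		barrier();
		((u32 *)ptep)[1] = (u32)(pte_val(pte) >> 32);
	}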

One bug this work uncovered was in the PCI DMA code, which wasn't
flushing the full specified range.  This was benign with 64KB pages,
but with 4KB pages some of the larger flushes were done incorrectly.
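
As a worked example (illustrative numbers): a 10KB buffer whose
physical address starts 2KB into a 4KB page spans three pages.  The
new __dma_map_pa_range() helper below flushes each of those pages in
turn, charging only the 2KB the buffer occupies on the first page
against the remaining size, then a full page at a time until the
count goes non-positive.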

The per-cpu memory reservation code also needed updating to conform
with the newer percpu allocator; previously it always reserved 64KB,
which was always sufficient, but with 4KB granularity we now have to
pay closer attention and reserve the amount of memory that will
actually be requested when the percpu code starts allocating.
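
Illustrative arithmetic (using the generic percpu defaults of this
era, roughly an 8KB module reserve and a 12KB early dynamic reserve;
neither constant is defined by this patch): with, say, 20KB of static
per-cpu data, percpu_size() now reserves 40KB per cpu, i.e. ten 4KB
pages, where the old code simply reserved a single 64KB page.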

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/Kconfig                   |    6 --
 arch/tile/include/asm/hugetlb.h     |    2 +-
 arch/tile/include/asm/page.h        |   34 +++-----
 arch/tile/include/asm/pgalloc.h     |    7 +-
 arch/tile/include/asm/pgtable.h     |   31 +++----
 arch/tile/include/asm/pgtable_32.h  |    8 ++-
 arch/tile/include/asm/stack.h       |    3 +-
 arch/tile/include/asm/thread_info.h |    1 +
 arch/tile/kernel/intvec_32.S        |   16 +++-
 arch/tile/kernel/machine_kexec.c    |    7 +-
 arch/tile/kernel/pci-dma.c          |   38 ++++----
 arch/tile/kernel/process.c          |    2 +-
 arch/tile/kernel/setup.c            |   20 +++--
 arch/tile/lib/memcpy_tile64.c       |    4 +-
 arch/tile/mm/homecache.c            |    2 +-
 arch/tile/mm/init.c                 |   18 +----
 arch/tile/mm/migrate_32.S           |    1 +
 arch/tile/mm/pgtable.c              |  170 +++++++++++++++++++++++++++++------
 18 files changed, 235 insertions(+), 135 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index eed0fc5..f3b7870 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -202,12 +202,6 @@ config NODES_SHIFT
 	  By default, 2, i.e. 2^2 == 4 DDR2 controllers.
 	  In a system with more controllers, this value should be raised.
 
-# Need 16MB areas to enable hugetlb
-# See build-time check in arch/tile/mm/init.c.
-config FORCE_MAX_ZONEORDER
-	int
-	default 9
-
 choice
 	depends on !TILEGX
 	prompt "Memory split" if EXPERT
diff --git a/arch/tile/include/asm/hugetlb.h b/arch/tile/include/asm/hugetlb.h
index 0521c27..d396d18 100644
--- a/arch/tile/include/asm/hugetlb.h
+++ b/arch/tile/include/asm/hugetlb.h
@@ -54,7 +54,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 				   pte_t *ptep, pte_t pte)
 {
-	set_pte_order(ptep, pte, HUGETLB_PAGE_ORDER);
+	set_pte(ptep, pte);
 }
 
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
diff --git a/arch/tile/include/asm/page.h b/arch/tile/include/asm/page.h
index 7979a45..3eb5352 100644
--- a/arch/tile/include/asm/page.h
+++ b/arch/tile/include/asm/page.h
@@ -16,10 +16,11 @@
 #define _ASM_TILE_PAGE_H
 
 #include <linux/const.h>
+#include <hv/pagesize.h>
 
 /* PAGE_SHIFT and HPAGE_SHIFT determine the page sizes. */
-#define PAGE_SHIFT	16
-#define HPAGE_SHIFT	24
+#define PAGE_SHIFT	HV_LOG2_PAGE_SIZE_SMALL
+#define HPAGE_SHIFT	HV_LOG2_PAGE_SIZE_LARGE
 
 #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
 #define HPAGE_SIZE	(_AC(1, UL) << HPAGE_SHIFT)
@@ -29,25 +30,18 @@
 
 #ifdef __KERNEL__
 
-#include <hv/hypervisor.h>
-#include <arch/chip.h>
-
 /*
- * The {,H}PAGE_SHIFT values must match the HV_LOG2_PAGE_SIZE_xxx
- * definitions in <hv/hypervisor.h>.  We validate this at build time
- * here, and again at runtime during early boot.  We provide a
- * separate definition since userspace doesn't have <hv/hypervisor.h>.
- *
- * Be careful to distinguish PAGE_SHIFT from HV_PTE_INDEX_PFN, since
- * they are the same on i386 but not TILE.
+ * If the Kconfig doesn't specify, set a maximum zone order that
+ * is enough so that we can create huge pages from small pages given
+ * the respective sizes of the two page types.  See <linux/mmzone.h>.
  */
-#if HV_LOG2_PAGE_SIZE_SMALL != PAGE_SHIFT
-# error Small page size mismatch in Linux
-#endif
-#if HV_LOG2_PAGE_SIZE_LARGE != HPAGE_SHIFT
-# error Huge page size mismatch in Linux
+#ifndef CONFIG_FORCE_MAX_ZONEORDER
+#define CONFIG_FORCE_MAX_ZONEORDER (HPAGE_SHIFT - PAGE_SHIFT + 1)
 #endif
 
+#include <hv/hypervisor.h>
+#include <arch/chip.h>
+
 #ifndef __ASSEMBLY__
 
 #include <linux/types.h>
@@ -81,12 +75,6 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
  * Hypervisor page tables are made of the same basic structure.
  */
 
-typedef __u64 pteval_t;
-typedef __u64 pmdval_t;
-typedef __u64 pudval_t;
-typedef __u64 pgdval_t;
-typedef __u64 pgprotval_t;
-
 typedef HV_PTE pte_t;
 typedef HV_PTE pgd_t;
 typedef HV_PTE pgprot_t;
diff --git a/arch/tile/include/asm/pgalloc.h b/arch/tile/include/asm/pgalloc.h
index cf52791..e919c0b 100644
--- a/arch/tile/include/asm/pgalloc.h
+++ b/arch/tile/include/asm/pgalloc.h
@@ -41,9 +41,9 @@
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
 #ifdef CONFIG_64BIT
-	set_pte_order(pmdp, pmd, L2_USER_PGTABLE_ORDER);
+	set_pte(pmdp, pmd);
 #else
-	set_pte_order(&pmdp->pud.pgd, pmd.pud.pgd, L2_USER_PGTABLE_ORDER);
+	set_pte(&pmdp->pud.pgd, pmd.pud.pgd);
 #endif
 }
 
@@ -100,6 +100,9 @@ pte_t *get_prealloc_pte(unsigned long pfn);
 /* During init, we can shatter kernel huge pages if needed. */
 void shatter_pmd(pmd_t *pmd);
 
+/* After init, a more complex technique is required. */
+void shatter_huge_page(unsigned long addr);
+
 #ifdef __tilegx__
 /* We share a single page allocator for both L1 and L2 page tables. */
 #if HV_L1_SIZE != HV_L2_SIZE
diff --git a/arch/tile/include/asm/pgtable.h b/arch/tile/include/asm/pgtable.h
index a6604e9..1a20b7e 100644
--- a/arch/tile/include/asm/pgtable.h
+++ b/arch/tile/include/asm/pgtable.h
@@ -233,15 +233,23 @@ static inline void __pte_clear(pte_t *ptep)
 #define pgd_ERROR(e) \
 	pr_err("%s:%d: bad pgd 0x%016llx.\n", __FILE__, __LINE__, pgd_val(e))
 
+/* Return PA and protection info for a given kernel VA. */
+int va_to_cpa_and_pte(void *va, phys_addr_t *cpa, pte_t *pte);
+
+/*
+ * __set_pte() ensures we write the 64-bit PTE with 32-bit words in
+ * the right order on 32-bit platforms and also allows us to write
+ * hooks to check valid PTEs, etc., if we want.
+ */
+void __set_pte(pte_t *ptep, pte_t pte);
+
 /*
- * set_pte_order() sets the given PTE and also sanity-checks the
+ * set_pte() sets the given PTE and also sanity-checks the
  * requested PTE against the page homecaching.  Unspecified parts
  * of the PTE are filled in when it is written to memory, i.e. all
  * caching attributes if "!forcecache", or the home cpu if "anyhome".
  */
-extern void set_pte_order(pte_t *ptep, pte_t pte, int order);
-
-#define set_pte(ptep, pteval) set_pte_order(ptep, pteval, 0)
+extern void set_pte(pte_t *ptep, pte_t pte);
 #define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
 #define set_pte_atomic(pteptr, pteval) set_pte(pteptr, pteval)
 
@@ -293,21 +301,6 @@ extern void check_mm_caching(struct mm_struct *prev, struct mm_struct *next);
 #define __swp_entry_to_pte(swp)	((pte_t) { (((long long) ((swp).val)) << 32) })
 
 /*
- * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
- *
- *  dst - pointer to pgd range anwhere on a pgd page
- *  src - ""
- *  count - the number of pgds to copy.
- *
- * dst and src can be on the same page, but the range must not overlap,
- * and must not cross a page boundary.
- */
-static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
-{
-       memcpy(dst, src, count * sizeof(pgd_t));
-}
-
-/*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  */
diff --git a/arch/tile/include/asm/pgtable_32.h b/arch/tile/include/asm/pgtable_32.h
index 53ec348..9f98529 100644
--- a/arch/tile/include/asm/pgtable_32.h
+++ b/arch/tile/include/asm/pgtable_32.h
@@ -24,6 +24,7 @@
 #define PGDIR_SIZE	HV_PAGE_SIZE_LARGE
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 #define PTRS_PER_PGD	(1 << (32 - PGDIR_SHIFT))
+#define SIZEOF_PGD	(PTRS_PER_PGD * sizeof(pgd_t))
 
 /*
  * The level-2 index is defined by the difference between the huge
@@ -33,6 +34,7 @@
  * this nomenclature is somewhat confusing.
  */
 #define PTRS_PER_PTE (1 << (HV_LOG2_PAGE_SIZE_LARGE - HV_LOG2_PAGE_SIZE_SMALL))
+#define SIZEOF_PTE	(PTRS_PER_PTE * sizeof(pte_t))
 
 #ifndef __ASSEMBLY__
 
@@ -94,7 +96,6 @@ static inline int pgd_addr_invalid(unsigned long addr)
  */
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 
 extern int ptep_test_and_clear_young(struct vm_area_struct *,
 				     unsigned long addr, pte_t *);
@@ -110,6 +111,11 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 	return pte;
 }
 
+static inline void __set_pmd(pmd_t *pmdp, pmd_t pmdval)
+{
+	set_pte(&pmdp->pud.pgd, pmdval.pud.pgd);
+}
+
 /* Create a pmd from a PTFN. */
 static inline pmd_t ptfn_pmd(unsigned long ptfn, pgprot_t prot)
 {
diff --git a/arch/tile/include/asm/stack.h b/arch/tile/include/asm/stack.h
index f908473..4d97a2d 100644
--- a/arch/tile/include/asm/stack.h
+++ b/arch/tile/include/asm/stack.h
@@ -18,13 +18,14 @@
 #include <linux/types.h>
 #include <linux/sched.h>
 #include <asm/backtrace.h>
+#include <asm/page.h>
 #include <hv/hypervisor.h>
 
 /* Everything we need to keep track of a backtrace iteration */
 struct KBacktraceIterator {
 	BacktraceIterator it;
 	struct task_struct *task;     /* task we are backtracing */
-	HV_PTE *pgtable;	      /* page table for user space access */
+	pte_t *pgtable;		      /* page table for user space access */
 	int end;		      /* iteration complete. */
 	int new_context;              /* new context is starting */
 	int profile;                  /* profiling, so stop on async intrpt */
diff --git a/arch/tile/include/asm/thread_info.h b/arch/tile/include/asm/thread_info.h
index 3872f2b..9e8e9c4 100644
--- a/arch/tile/include/asm/thread_info.h
+++ b/arch/tile/include/asm/thread_info.h
@@ -68,6 +68,7 @@ struct thread_info {
 #else
 #define THREAD_SIZE_ORDER (0)
 #endif
+#define THREAD_SIZE_PAGES (1 << THREAD_SIZE_ORDER)
 
 #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
 #define LOG2_THREAD_SIZE (PAGE_SHIFT + THREAD_SIZE_ORDER)
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index eabf1ef..fffcfa6 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1556,7 +1556,10 @@ STD_ENTRY(_sys_clone)
 	.align 64
 	/* Align much later jump on the start of a cache line. */
 #if !ATOMIC_LOCKS_FOUND_VIA_TABLE()
-	nop; nop
+	nop
+#if PAGE_SIZE >= 0x10000
+	nop
+#endif
 #endif
 ENTRY(sys_cmpxchg)
 
@@ -1587,6 +1590,10 @@ ENTRY(sys_cmpxchg)
 	 * NOTE: this must match __atomic_hashed_lock() in lib/atomic_32.c.
 	 */
 
+#if (PAGE_OFFSET & 0xffff) != 0
+# error Code here assumes PAGE_OFFSET can be loaded with just hi16()
+#endif
+
 #if ATOMIC_LOCKS_FOUND_VIA_TABLE()
 	{
 	 /* Check for unaligned input. */
@@ -1679,11 +1686,14 @@ ENTRY(sys_cmpxchg)
 	 lw	r26, r0
 	}
 	{
-	 /* atomic_locks is page aligned so this suffices to get its addr. */
-	 auli	r21, zero, hi16(atomic_locks)
+	 auli	r21, zero, ha16(atomic_locks)
 
 	 bbns   r23, .Lcmpxchg_badaddr
 	}
+#if PAGE_SIZE < 0x10000
+	/* atomic_locks is page-aligned so for big pages we don't need this. */
+	addli   r21, r21, lo16(atomic_locks)
+#endif
 	{
 	 /*
 	  * Insert the hash bits into the page-aligned pointer.
diff --git a/arch/tile/kernel/machine_kexec.c b/arch/tile/kernel/machine_kexec.c
index 0d8b9e9..e00d717 100644
--- a/arch/tile/kernel/machine_kexec.c
+++ b/arch/tile/kernel/machine_kexec.c
@@ -240,8 +240,11 @@ static void setup_quasi_va_is_pa(void)
 	pte = hv_pte(_PAGE_KERNEL | _PAGE_HUGE_PAGE);
 	pte = hv_pte_set_mode(pte, HV_PTE_MODE_CACHE_NO_L3);
 
-	for (i = 0; i < pgd_index(PAGE_OFFSET); i++)
-		pgtable[i] = pfn_pte(i << (HPAGE_SHIFT - PAGE_SHIFT), pte);
+	for (i = 0; i < pgd_index(PAGE_OFFSET); i++) {
+		unsigned long pfn = i << (HPAGE_SHIFT - PAGE_SHIFT);
+		if (pfn_valid(pfn))
+			__set_pte(&pgtable[i], pfn_pte(pfn, pte));
+	}
 }
 
 
diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
index 5ad5e13..658752b 100644
--- a/arch/tile/kernel/pci-dma.c
+++ b/arch/tile/kernel/pci-dma.c
@@ -86,6 +86,21 @@ EXPORT_SYMBOL(dma_free_coherent);
  * can count on nothing having been touched.
  */
 
+/* Flush a PA range from cache page by page. */
+static void __dma_map_pa_range(dma_addr_t dma_addr, size_t size)
+{
+	struct page *page = pfn_to_page(PFN_DOWN(dma_addr));
+	size_t bytesleft = PAGE_SIZE - (dma_addr & (PAGE_SIZE - 1));
+
+	while ((ssize_t)size > 0) {
+		/* Flush the page. */
+		homecache_flush_cache(page++, 0);
+
+		/* Figure out if we need to continue on the next page. */
+		size -= bytesleft;
+		bytesleft = PAGE_SIZE;
+	}
+}
 
 /*
  * dma_map_single can be passed any memory address, and there appear
@@ -97,26 +112,12 @@ EXPORT_SYMBOL(dma_free_coherent);
 dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size,
 	       enum dma_data_direction direction)
 {
-	struct page *page;
-	dma_addr_t dma_addr;
-	int thispage;
+	dma_addr_t dma_addr = __pa(ptr);
 
 	BUG_ON(!valid_dma_direction(direction));
 	WARN_ON(size == 0);
 
-	dma_addr = __pa(ptr);
-
-	/* We might have been handed a buffer that wraps a page boundary */
-	while ((int)size > 0) {
-		/* The amount to flush that's on this page */
-		thispage = PAGE_SIZE - ((unsigned long)ptr & (PAGE_SIZE - 1));
-		thispage = min((int)thispage, (int)size);
-		/* Is this valid for any page we could be handed? */
-		page = pfn_to_page(kaddr_to_pfn(ptr));
-		homecache_flush_cache(page, 0);
-		ptr += thispage;
-		size -= thispage;
-	}
+	__dma_map_pa_range(dma_addr, size);
 
 	return dma_addr;
 }
@@ -140,10 +141,8 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
 	WARN_ON(nents == 0 || sglist->length == 0);
 
 	for_each_sg(sglist, sg, nents, i) {
-		struct page *page;
 		sg->dma_address = sg_phys(sg);
-		page = pfn_to_page(sg->dma_address >> PAGE_SHIFT);
-		homecache_flush_cache(page, 0);
+		__dma_map_pa_range(sg->dma_address, sg->length);
 	}
 
 	return nents;
@@ -163,6 +162,7 @@ dma_addr_t dma_map_page(struct device *dev, struct page *page,
 {
 	BUG_ON(!valid_dma_direction(direction));
 
+	BUG_ON(offset + size > PAGE_SIZE);
 	homecache_flush_cache(page, 0);
 
 	return page_to_pa(page) + offset;
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 5db8b5b..b9cd962 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -165,7 +165,7 @@ void free_thread_info(struct thread_info *info)
 		kfree(step_state);
 	}
 
-	free_page((unsigned long)info);
+	free_pages((unsigned long)info, THREAD_SIZE_ORDER);
 }
 
 static void save_arch_state(struct thread_struct *t);
diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index f185736..3696b18 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -59,6 +59,8 @@ unsigned long __initdata node_memmap_pfn[MAX_NUMNODES];
 unsigned long __initdata node_percpu_pfn[MAX_NUMNODES];
 unsigned long __initdata node_free_pfn[MAX_NUMNODES];
 
+static unsigned long __initdata node_percpu[MAX_NUMNODES];
+
 #ifdef CONFIG_HIGHMEM
 /* Page frame index of end of lowmem on each controller. */
 unsigned long __cpuinitdata node_lowmem_end_pfn[MAX_NUMNODES];
@@ -554,7 +556,6 @@ static void __init setup_bootmem_allocator(void)
 		reserve_bootmem(crashk_res.start,
 			crashk_res.end - crashk_res.start + 1, 0);
 #endif
-
 }
 
 void *__init alloc_remap(int nid, unsigned long size)
@@ -568,11 +569,13 @@ void *__init alloc_remap(int nid, unsigned long size)
 
 static int __init percpu_size(void)
 {
-	int size = ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE);
-#ifdef CONFIG_MODULES
-	if (size < PERCPU_ENOUGH_ROOM)
-		size = PERCPU_ENOUGH_ROOM;
-#endif
+	int size = __per_cpu_end - __per_cpu_start;
+	size += PERCPU_MODULE_RESERVE;
+	size += PERCPU_DYNAMIC_EARLY_SIZE;
+	if (size < PCPU_MIN_UNIT_SIZE)
+		size = PCPU_MIN_UNIT_SIZE;
+	size = roundup(size, PAGE_SIZE);
+
 	/* In several places we assume the per-cpu data fits on a huge page. */
 	BUG_ON(kdata_huge && size > HPAGE_SIZE);
 	return size;
@@ -589,7 +592,6 @@ static inline unsigned long alloc_bootmem_pfn(int size, unsigned long goal)
 static void __init zone_sizes_init(void)
 {
 	unsigned long zones_size[MAX_NR_ZONES] = { 0 };
-	unsigned long node_percpu[MAX_NUMNODES] = { 0 };
 	int size = percpu_size();
 	int num_cpus = smp_height * smp_width;
 	int i;
@@ -674,7 +676,7 @@ static void __init zone_sizes_init(void)
 		NODE_DATA(i)->bdata = NODE_DATA(0)->bdata;
 
 		free_area_init_node(i, zones_size, start, NULL);
-		printk(KERN_DEBUG "  DMA zone: %ld per-cpu pages\n",
+		printk(KERN_DEBUG "  Normal zone: %ld per-cpu pages\n",
 		       PFN_UP(node_percpu[i]));
 
 		/* Track the type of memory on each node */
@@ -1312,6 +1314,8 @@ static void *__init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 
 	BUG_ON(size % PAGE_SIZE != 0);
 	pfn_offset[nid] += size / PAGE_SIZE;
+	BUG_ON(node_percpu[nid] < size);
+	node_percpu[nid] -= size;
 	if (percpu_pfn[cpu] == 0)
 		percpu_pfn[cpu] = pfn;
 	return pfn_to_kaddr(pfn);
diff --git a/arch/tile/lib/memcpy_tile64.c b/arch/tile/lib/memcpy_tile64.c
index f7d4a6a..b2fe15e 100644
--- a/arch/tile/lib/memcpy_tile64.c
+++ b/arch/tile/lib/memcpy_tile64.c
@@ -96,7 +96,7 @@ static void memcpy_multicache(void *dest, const void *source,
 	newsrc = __fix_to_virt(idx) + ((unsigned long)source & (PAGE_SIZE-1));
 	pmdp = pmd_offset(pud_offset(pgd_offset_k(newsrc), newsrc), newsrc);
 	ptep = pte_offset_kernel(pmdp, newsrc);
-	*ptep = src_pte;   /* set_pte() would be confused by this */
+	__set_pte(ptep, src_pte);   /* set_pte() would be confused by this */
 	local_flush_tlb_page(NULL, newsrc, PAGE_SIZE);
 
 	/* Actually move the data. */
@@ -109,7 +109,7 @@ static void memcpy_multicache(void *dest, const void *source,
 	 */
 	src_pte = hv_pte_set_mode(src_pte, HV_PTE_MODE_CACHE_NO_L3);
 	src_pte = hv_pte_set_writable(src_pte); /* need write access for inv */
-	*ptep = src_pte;   /* set_pte() would be confused by this */
+	__set_pte(ptep, src_pte);   /* set_pte() would be confused by this */
 	local_flush_tlb_page(NULL, newsrc, PAGE_SIZE);
 
 	/*
diff --git a/arch/tile/mm/homecache.c b/arch/tile/mm/homecache.c
index f344f4f..cbe6f4f 100644
--- a/arch/tile/mm/homecache.c
+++ b/arch/tile/mm/homecache.c
@@ -412,7 +412,7 @@ void homecache_change_page_home(struct page *page, int order, int home)
 		pte_t *ptep = virt_to_pte(NULL, kva);
 		pte_t pteval = *ptep;
 		BUG_ON(!pte_present(pteval) || pte_huge(pteval));
-		*ptep = pte_set_home(pteval, home);
+		__set_pte(ptep, pte_set_home(pteval, home));
 	}
 }
 
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index f89ed5d..d6e87fd 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -53,18 +53,6 @@
 
 #include "migrate.h"
 
-/*
- * We could set FORCE_MAX_ZONEORDER to "(HPAGE_SHIFT - PAGE_SHIFT + 1)"
- * in the Tile Kconfig, but this generates configure warnings.
- * Do it here and force people to get it right to compile this file.
- * The problem is that with 4KB small pages and 16MB huge pages,
- * the default value doesn't allow us to group enough small pages
- * together to make up a huge page.
- */
-#if CONFIG_FORCE_MAX_ZONEORDER < HPAGE_SHIFT - PAGE_SHIFT + 1
-# error "Change FORCE_MAX_ZONEORDER in arch/tile/Kconfig to match page size"
-#endif
-
 #define clear_pgd(pmdptr) (*(pmdptr) = hv_pte(0))
 
 #ifndef __tilegx__
@@ -962,11 +950,7 @@ struct kmem_cache *pgd_cache;
 
 void __init pgtable_cache_init(void)
 {
-	pgd_cache = kmem_cache_create("pgd",
-				PTRS_PER_PGD*sizeof(pgd_t),
-				PTRS_PER_PGD*sizeof(pgd_t),
-				0,
-				NULL);
+	pgd_cache = kmem_cache_create("pgd", SIZEOF_PGD, SIZEOF_PGD, 0, NULL);
 	if (!pgd_cache)
 		panic("pgtable_cache_init(): Cannot create pgd cache");
 }
diff --git a/arch/tile/mm/migrate_32.S b/arch/tile/mm/migrate_32.S
index f738765..ac01a7c 100644
--- a/arch/tile/mm/migrate_32.S
+++ b/arch/tile/mm/migrate_32.S
@@ -18,6 +18,7 @@
 #include <linux/linkage.h>
 #include <linux/threads.h>
 #include <asm/page.h>
+#include <asm/thread_info.h>
 #include <asm/types.h>
 #include <asm/asm-offsets.h>
 #include <hv/hypervisor.h>
diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 2c850d9..1a2b36f 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -142,6 +142,76 @@ pte_t *_pte_offset_map(pmd_t *dir, unsigned long address)
 }
 #endif
 
+/**
+ * shatter_huge_page() - ensure a given address is mapped by a small page.
+ *
+ * This function converts a huge PTE mapping kernel LOWMEM into a bunch
+ * of small PTEs with the same caching.  No cache flush required, but we
+ * must do a global TLB flush.
+ *
+ * Any caller that wishes to modify a kernel mapping that might
+ * have been made with a huge page should call this function,
+ * since doing so properly avoids race conditions with installing the
+ * newly-shattered page and then flushing all the TLB entries.
+ *
+ * @addr: Address at which to shatter any existing huge page.
+ */
+void shatter_huge_page(unsigned long addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long flags = 0;  /* happy compiler */
+#ifdef __PAGETABLE_PMD_FOLDED
+	struct list_head *pos;
+#endif
+
+	/* Get a pointer to the pmd entry that we need to change. */
+	addr &= HPAGE_MASK;
+	BUG_ON(pgd_addr_invalid(addr));
+	BUG_ON(addr < PAGE_OFFSET);  /* only for kernel LOWMEM */
+	pgd = swapper_pg_dir + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	BUG_ON(!pud_present(*pud));
+	pmd = pmd_offset(pud, addr);
+	BUG_ON(!pmd_present(*pmd));
+	if (!pmd_huge_page(*pmd))
+		return;
+
+	/*
+	 * Grab the pgd_lock, since we may need it to walk the pgd_list,
+	 * and since we need some kind of lock here to avoid races.
+	 */
+	spin_lock_irqsave(&pgd_lock, flags);
+	if (!pmd_huge_page(*pmd)) {
+		/* Lost the race to convert the huge page. */
+		spin_unlock_irqrestore(&pgd_lock, flags);
+		return;
+	}
+
+	/* Shatter the huge page into the preallocated L2 page table. */
+	pmd_populate_kernel(&init_mm, pmd,
+			    get_prealloc_pte(pte_pfn(*(pte_t *)pmd)));
+
+#ifdef __PAGETABLE_PMD_FOLDED
+	/* Walk every pgd on the system and update the pmd there. */
+	list_for_each(pos, &pgd_list) {
+		pmd_t *copy_pmd;
+		pgd = list_to_pgd(pos) + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+		copy_pmd = pmd_offset(pud, addr);
+		__set_pmd(copy_pmd, *pmd);
+	}
+#endif
+
+	/* Tell every cpu to notice the change. */
+	flush_remote(0, 0, NULL, addr, HPAGE_SIZE, HPAGE_SIZE,
+		     cpu_possible_mask, NULL, 0);
+
+	/* Hold the lock until the TLB flush is finished to avoid races. */
+	spin_unlock_irqrestore(&pgd_lock, flags);
+}
+
 /*
  * List of all pgd's needed so it can invalidate entries in both cached
  * and uncached pgd's. This is essentially codepath-based locking
@@ -184,9 +254,9 @@ static void pgd_ctor(pgd_t *pgd)
 	BUG_ON(((u64 *)swapper_pg_dir)[pgd_index(MEM_USER_INTRPT)] != 0);
 #endif
 
-	clone_pgd_range(pgd + KERNEL_PGD_INDEX_START,
-			swapper_pg_dir + KERNEL_PGD_INDEX_START,
-			KERNEL_PGD_PTRS);
+	memcpy(pgd + KERNEL_PGD_INDEX_START,
+	       swapper_pg_dir + KERNEL_PGD_INDEX_START,
+	       KERNEL_PGD_PTRS * sizeof(pgd_t));
 
 	pgd_list_add(pgd);
 	spin_unlock_irqrestore(&pgd_lock, flags);
@@ -220,8 +290,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	gfp_t flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO|__GFP_COMP;
+	gfp_t flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO;
 	struct page *p;
+#if L2_USER_PGTABLE_ORDER > 0
+	int i;
+#endif
 
 #ifdef CONFIG_HIGHPTE
 	flags |= __GFP_HIGHMEM;
@@ -231,6 +304,18 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	if (p == NULL)
 		return NULL;
 
+#if L2_USER_PGTABLE_ORDER > 0
+	/*
+	 * Make every page have a page_count() of one, not just the first.
+	 * We don't use __GFP_COMP since it doesn't look like it works
+	 * correctly with tlb_remove_page().
+	 */
+	for (i = 1; i < L2_USER_PGTABLE_PAGES; ++i) {
+		init_page_count(p+i);
+		inc_zone_page_state(p+i, NR_PAGETABLE);
+	}
+#endif
+
 	pgtable_page_ctor(p);
 	return p;
 }
@@ -242,8 +327,15 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
  */
 void pte_free(struct mm_struct *mm, struct page *p)
 {
+	int i;
+
 	pgtable_page_dtor(p);
-	__free_pages(p, L2_USER_PGTABLE_ORDER);
+	__free_page(p);
+
+	for (i = 1; i < L2_USER_PGTABLE_PAGES; ++i) {
+		__free_page(p+i);
+		dec_zone_page_state(p+i, NR_PAGETABLE);
+	}
 }
 
 void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
@@ -252,8 +344,12 @@ void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
 	int i;
 
 	pgtable_page_dtor(pte);
-	for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i)
+	tlb_remove_page(tlb, pte);
+
+	for (i = 1; i < L2_USER_PGTABLE_PAGES; ++i) {
 		tlb_remove_page(tlb, pte + i);
+		dec_zone_page_state(pte + i, NR_PAGETABLE);
+	}
 }
 
 #ifndef __tilegx__
@@ -335,35 +431,51 @@ int get_remote_cache_cpu(pgprot_t prot)
 	return x + y * smp_width;
 }
 
-void set_pte_order(pte_t *ptep, pte_t pte, int order)
+/*
+ * Convert a kernel VA to a PA and homing information.
+ */
+int va_to_cpa_and_pte(void *va, unsigned long long *cpa, pte_t *pte)
 {
-	unsigned long pfn = pte_pfn(pte);
-	struct page *page = pfn_to_page(pfn);
+	struct page *page = virt_to_page(va);
+	pte_t null_pte = { 0 };
 
-	/* Update the home of a PTE if necessary */
-	pte = pte_set_home(pte, page_home(page));
+	*cpa = __pa(va);
+
+	/* Note that this is not writing a page table, just returning a pte. */
+	*pte = pte_set_home(null_pte, page_home(page));
+
+	return 0; /* return non-zero if not hfh? */
+}
+EXPORT_SYMBOL(va_to_cpa_and_pte);
 
+void __set_pte(pte_t *ptep, pte_t pte)
+{
 #ifdef __tilegx__
 	*ptep = pte;
 #else
-	/*
-	 * When setting a PTE, write the high bits first, then write
-	 * the low bits.  This sets the "present" bit only after the
-	 * other bits are in place.  If a particular PTE update
-	 * involves transitioning from one valid PTE to another, it
-	 * may be necessary to call set_pte_order() more than once,
-	 * transitioning via a suitable intermediate state.
-	 * Note that this sequence also means that if we are transitioning
-	 * from any migrating PTE to a non-migrating one, we will not
-	 * see a half-updated PTE with the migrating bit off.
-	 */
-#if HV_PTE_INDEX_PRESENT >= 32 || HV_PTE_INDEX_MIGRATING >= 32
-# error Must write the present and migrating bits last
-#endif
-	((u32 *)ptep)[1] = (u32)(pte_val(pte) >> 32);
-	barrier();
-	((u32 *)ptep)[0] = (u32)(pte_val(pte));
-#endif
+# if HV_PTE_INDEX_PRESENT >= 32 || HV_PTE_INDEX_MIGRATING >= 32
+#  error Must write the present and migrating bits last
+# endif
+	if (pte_present(pte)) {
+		((u32 *)ptep)[1] = (u32)(pte_val(pte) >> 32);
+		barrier();
+		((u32 *)ptep)[0] = (u32)(pte_val(pte));
+	} else {
+		((u32 *)ptep)[0] = (u32)(pte_val(pte));
+		barrier();
+		((u32 *)ptep)[1] = (u32)(pte_val(pte) >> 32);
+	}
+#endif /* __tilegx__ */
+}
+
+void set_pte(pte_t *ptep, pte_t pte)
+{
+	struct page *page = pfn_to_page(pte_pfn(pte));
+
+	/* Update the home of a PTE if necessary */
+	pte = pte_set_home(pte, page_home(page));
+
+	__set_pte(ptep, pte);
 }
 
 /* Can this mm load a PTE with cached_priority set? */
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] arch/tile: catch up with section naming convention in 2.6.35
  2011-02-27 23:52 ` [PATCH] arch/tile: catch up with section naming convention in 2.6.35 Chris Metcalf
@ 2011-03-01 20:52   ` Sam Ravnborg
  2011-03-01 21:17     ` Chris Metcalf
  0 siblings, 1 reply; 20+ messages in thread
From: Sam Ravnborg @ 2011-03-01 20:52 UTC (permalink / raw)
  To: Chris Metcalf; +Cc: linux-kernel

Hi Chris.

A few improvement proposals.

	Sam

> index 90e7c44..fc49b62 100644
> --- a/arch/tile/kernel/head_32.S
> +++ b/arch/tile/kernel/head_32.S
> @@ -133,7 +133,7 @@ ENTRY(_start)
>  	}
>  	ENDPROC(_start)
>  
> -.section ".bss.page_aligned","w"
> +.section ".bss..page_aligned","w"

Here you could use

    __PAGE_ALIGNED_BSS


>  	.align PAGE_SIZE
>  ENTRY(empty_zero_page)
>  	.fill PAGE_SIZE,1,0
> @@ -148,7 +148,7 @@ ENTRY(empty_zero_page)
>  	.word (\bits1) | (HV_CPA_TO_PFN(\cpa) << HV_PTE_INDEX_PFN)
>  	.endm
>  
> -.section ".data.page_aligned","wa"
> +.section ".data..page_aligned","wa"

and here you could use

    __PAGE_ALIGNED_DATA


>  /* This page is remapped on startup to be hash-for-home. */
>  int atomic_locks[PAGE_SIZE / sizeof(int) /* Only ATOMIC_HASH_SIZE is used */]
> -  __attribute__((aligned(PAGE_SIZE), section(".bss.page_aligned")));
> +  __attribute__((aligned(PAGE_SIZE), section(".bss..page_aligned")));

    __page_aligned_bss
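
For reference, these helpers come from <linux/linkage.h>; with them
the hunks above would read something like (sketch only, not
compile-tested):

	#include <linux/linkage.h>

	__PAGE_ALIGNED_BSS
		.align PAGE_SIZE
	ENTRY(empty_zero_page)
		.fill PAGE_SIZE,1,0

and on the C side the page alignment comes along for free:

	int atomic_locks[PAGE_SIZE / sizeof(int)]
		__page_aligned_bss;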


	Sam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arch/tile: catch up with section naming convention in 2.6.35
  2011-03-01 20:52   ` Sam Ravnborg
@ 2011-03-01 21:17     ` Chris Metcalf
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Metcalf @ 2011-03-01 21:17 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: linux-kernel

On 3/1/2011 3:52 PM, Sam Ravnborg wrote:
> Hi Chris.
>
> A few improvement proposals.
>
>> [...]
> Here you could use
>
>     __PAGE_ALIGNED_BSS
>> [...]
> and here you could use
>
>     __PAGE_ALIGNED_DATA
>> [...]
>     __page_aligned_bss

Thanks!  I updated the change to use the symbolic names.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-03-01 21:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-27 23:52 arch/tile: various fixes for 2.6.39 Chris Metcalf
2011-02-27 23:52 ` [PATCH] arch/tile: catch up with section naming convention in 2.6.35 Chris Metcalf
2011-03-01 20:52   ` Sam Ravnborg
2011-03-01 21:17     ` Chris Metcalf
2011-02-28 18:08 ` [PATCH] arch/tile: bug fix: exec'ed task thought it was still single-stepping Chris Metcalf
2011-02-28 18:21 ` [PATCH] arch/tile: fix __ndelay etc to work better Chris Metcalf
2011-02-28 18:24 ` [PATCH] arch/tile: stop disabling INTCTRL_1 interrupts during hypervisor downcalls Chris Metcalf
2011-02-28 18:32 ` [PATCH] arch/tile: warn and retry if an IPI is not accepted by the target cpu Chris Metcalf
2011-02-28 18:35 ` [PATCH] arch/tile: export <asm/hardwall.h> to userspace Chris Metcalf
2011-02-28 20:01 ` [PATCH] arch/tile: avoid a simulator warning during bootup Chris Metcalf
2011-02-28 20:14 ` [PATCH] arch/tile: fix reversed test of strict_strtol() return value Chris Metcalf
2011-02-28 20:19 ` [PATCH] arch/tile: sync up with <arch/sim.h> and <arch/sim_def.h> changes Chris Metcalf
2011-02-28 20:22 ` [PATCH] arch/tile: use a cleaner technique to enable interrupt for cpu_idle() Chris Metcalf
2011-02-28 20:28 ` [PATCH] arch/tile: use extended assembly to inline __mb_incoherent() Chris Metcalf
2011-02-28 20:30 ` [PATCH] arch/tile: fix two bugs in the backtracer code Chris Metcalf
2011-02-28 20:48 ` [PATCH] arch/tile: enhance existing finv_buffer_remote() routine Chris Metcalf
2011-02-28 20:51 ` [PATCH] arch/tile: export some additional module symbols Chris Metcalf
2011-02-28 20:58 ` [PATCH] arch/tile: fix some comments and whitespace Chris Metcalf
2011-02-28 21:01 ` [PATCH] arch/tile: add some more VMSPLIT options and use consistent naming Chris Metcalf
2011-02-28 21:37 ` [PATCH] arch/tile: support 4KB page size as well as 64KB Chris Metcalf
