* [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems
@ 2013-10-03 21:17 Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 1/6] ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
` (5 more replies)
0 siblings, 6 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:17 UTC (permalink / raw)
To: linux-arm-kernel
This is v3 of the series, which addresses the comments/suggestions
we received on v2 [1]. The v1 version is at [2].
The series extends the existing v2p runtime patching to
LPAE machines which can have physical memory beyond 4 GB. Keystone
is one such ARM machine.
The 64 bit patching support patch has been significantly revised with
inputs from Nicolas Pitre. The patch-set has been tested in various modes:
LPAE/non-LPAE and ARM/THUMB2. For the THUMB2 build, we found an issue
with the devicemaps_init() code sequence, and the last patch in the
series addresses it. That patch was missing from the last version.
Special thanks to Nicolas for his valuable feedback on the earlier
versions.
There was a point about dual patching and avoiding the two-step approach,
but there is no easy way, at least none we can think of, apart from ripping
out the current patching code and directly operating on pv_offsets, which
was already nacked a while back. In either case, this would be an
optimisation and can be carried out as a next step.
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Santosh Shilimkar (4):
ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions
ARM: mm: Introduce virt_to_idmap() with an arch hook
ARM: mm: Move the idmap print to appropriate place in the code
ARM: mm: Recreate kernel mappings in early_paging_init()
Sricharan R (2):
ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
ARM: mm: Change the order of TLB/cache maintenance operations.
arch/arm/include/asm/mach/arch.h | 1 +
arch/arm/include/asm/memory.h | 73 +++++++++++++++++++++++++++++----
arch/arm/kernel/armksyms.c | 1 +
arch/arm/kernel/head.S | 60 +++++++++++++++++----------
arch/arm/kernel/patch.c | 3 ++
arch/arm/kernel/setup.c | 3 ++
arch/arm/kernel/smp.c | 2 +-
arch/arm/mm/idmap.c | 8 ++--
arch/arm/mm/mmu.c | 84 +++++++++++++++++++++++++++++++++++++-
9 files changed, 199 insertions(+), 36 deletions(-)
Regards,
Santosh
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-July/188108.html
[2] http://lwn.net/Articles/556175/
--
1.7.9.5
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v3 1/6] ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
@ 2013-10-03 21:17 ` Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
` (4 subsequent siblings)
5 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:17 UTC (permalink / raw)
To: linux-arm-kernel
Fix the remaining address types used when converting back and forth
between physical and virtual addresses.
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/memory.h | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index e750a93..c133bd9 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -185,22 +185,32 @@ extern unsigned long __pv_phys_offset;
: "=r" (to) \
: "r" (from), "I" (type))
-static inline unsigned long __virt_to_phys(unsigned long x)
+static inline phys_addr_t __virt_to_phys(unsigned long x)
{
unsigned long t;
__pv_stub(x, t, "add", __PV_BITS_31_24);
return t;
}
-static inline unsigned long __phys_to_virt(unsigned long x)
+static inline unsigned long __phys_to_virt(phys_addr_t x)
{
unsigned long t;
__pv_stub(x, t, "sub", __PV_BITS_31_24);
return t;
}
+
#else
-#define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
-#define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
+
+static inline phys_addr_t __virt_to_phys(unsigned long x)
+{
+ return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
+}
+
+static inline unsigned long __phys_to_virt(phys_addr_t x)
+{
+ return x - PHYS_OFFSET + PAGE_OFFSET;
+}
+
#endif
#endif
#endif /* __ASSEMBLY__ */
@@ -238,14 +248,14 @@ static inline phys_addr_t virt_to_phys(const volatile void *x)
static inline void *phys_to_virt(phys_addr_t x)
{
- return (void *)(__phys_to_virt((unsigned long)(x)));
+ return (void *)__phys_to_virt(x);
}
/*
* Drivers should NOT use these either.
*/
#define __pa(x) __virt_to_phys((unsigned long)(x))
-#define __va(x) ((void *)__phys_to_virt((unsigned long)(x)))
+#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
/*
--
1.7.9.5
* [PATCH v3 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 1/6] ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
@ 2013-10-03 21:17 ` Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
` (3 subsequent siblings)
5 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:17 UTC (permalink / raw)
To: linux-arm-kernel
On some PAE systems (e.g. TI Keystone), memory is above the
32-bit addressable limit, and the interconnect provides an
aliased view of parts of physical memory in the 32-bit addressable
space. This alias is strictly for boot time usage, and is not
otherwise usable because of coherency limitations. On such systems,
the idmap mechanism needs to take this aliased mapping into account.
This patch introduces virt_to_idmap() and an arch function pointer which
can be populated by platforms that need it. The necessary idmap call sites
are also converted to the now-available virt_to_idmap(). A function pointer
is used rather than an #ifdef so as to remain compatible with
multi-platform builds.
Most platforms won't touch it, in which case virt_to_idmap()
falls back to the existing virt_to_phys().
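The fallback pattern above is small enough to sketch in standalone C; the
PAGE_OFFSET/PHYS_OFFSET/IDMAP_ALIAS values below are illustrative,
Keystone-like assumptions, not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Illustrative layout (assumptions, not from the patch). */
#define PAGE_OFFSET  0xC0000000UL
#define PHYS_OFFSET  0x800000000ULL   /* RAM above the 4G boundary */
#define IDMAP_ALIAS  0x80000000ULL    /* 32-bit addressable boot alias */

static phys_addr_t __virt_to_phys(unsigned long x)
{
	return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
}

/* NULL on most platforms; populated only where an idmap alias exists. */
phys_addr_t (*arch_virt_to_idmap)(unsigned long x);

static phys_addr_t virt_to_idmap(unsigned long x)
{
	if (arch_virt_to_idmap)
		return arch_virt_to_idmap(x);
	return __virt_to_phys(x);       /* the common fallback path */
}

/* Hypothetical platform hook: translate into the aliased 32-bit view. */
static phys_addr_t alias_virt_to_idmap(unsigned long x)
{
	return __virt_to_phys(x) - PHYS_OFFSET + IDMAP_ALIAS;
}
```

Because the pointer defaults to NULL, platforms that never assign it pay
only a NULL check and keep the plain virt_to_phys() behaviour.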
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/memory.h | 16 ++++++++++++++++
arch/arm/kernel/smp.c | 2 +-
arch/arm/mm/idmap.c | 5 +++--
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index c133bd9..d9b96c65 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -173,6 +173,7 @@
*/
#define __PV_BITS_31_24 0x81000000
+extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
extern unsigned long __pv_phys_offset;
#define PHYS_OFFSET __pv_phys_offset
@@ -259,6 +260,21 @@ static inline void *phys_to_virt(phys_addr_t x)
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
/*
+ * These are for systems that have a hardware interconnect supported alias of
+ * physical memory for idmap purposes. Most cases should leave these
+ * untouched.
+ */
+static inline phys_addr_t __virt_to_idmap(unsigned long x)
+{
+ if (arch_virt_to_idmap)
+ return arch_virt_to_idmap(x);
+ else
+ return __virt_to_phys(x);
+}
+
+#define virt_to_idmap(x) __virt_to_idmap((unsigned long)(x))
+
+/*
* Virtual <-> DMA view memory address translations
* Again, these are *only* valid on the kernel direct mapped RAM
* memory. Use of these is *deprecated* (and that doesn't mean
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 72024ea..a0eb830 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -80,7 +80,7 @@ void __init smp_set_ops(struct smp_operations *ops)
static unsigned long get_arch_pgd(pgd_t *pgd)
{
- phys_addr_t pgdir = virt_to_phys(pgd);
+ phys_addr_t pgdir = virt_to_idmap(pgd);
BUG_ON(pgdir & ARCH_PGD_MASK);
return pgdir >> ARCH_PGD_SHIFT;
}
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 83cb3ac..c0a1e48 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -10,6 +10,7 @@
#include <asm/system_info.h>
pgd_t *idmap_pgd;
+phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
#ifdef CONFIG_ARM_LPAE
static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
@@ -67,8 +68,8 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
unsigned long addr, end;
unsigned long next;
- addr = virt_to_phys(text_start);
- end = virt_to_phys(text_end);
+ addr = virt_to_idmap(text_start);
+ end = virt_to_idmap(text_end);
prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
--
1.7.9.5
* [PATCH v3 3/6] ARM: mm: Move the idmap print to appropriate place in the code
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 1/6] ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
@ 2013-10-03 21:17 ` Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
` (2 subsequent siblings)
5 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:17 UTC (permalink / raw)
To: linux-arm-kernel
Commit 9e9a367c29cebd2 ("ARM: Section based HYP idmap") moved
the address conversion inside identity_mapping_add() without the
corresponding print, which carries useful idmap information.
Move the print inside identity_mapping_add() as well to
fix this.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/mm/idmap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index c0a1e48..8e0e52e 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -70,6 +70,7 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
addr = virt_to_idmap(text_start);
end = virt_to_idmap(text_end);
+ pr_info("Setting up static identity map for 0x%lx - 0x%lx\n", addr, end);
prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
@@ -91,8 +92,6 @@ static int __init init_static_idmap(void)
if (!idmap_pgd)
return -ENOMEM;
- pr_info("Setting up static identity map for 0x%p - 0x%p\n",
- __idmap_text_start, __idmap_text_end);
identity_mapping_add(idmap_pgd, __idmap_text_start,
__idmap_text_end, 0);
--
1.7.9.5
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
` (2 preceding siblings ...)
2013-10-03 21:17 ` [PATCH v3 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
@ 2013-10-03 21:17 ` Santosh Shilimkar
2013-10-04 0:17 ` Nicolas Pitre
2013-10-03 21:17 ` [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
2013-10-03 21:18 ` [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations Santosh Shilimkar
5 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Sricharan R <r.sricharan@ti.com>
The current phys_to_virt patching mechanism works only for 32 bit
physical addresses, and this patch extends the idea to 64 bit physical
addresses.
The 64 bit v2p patching mechanism patches the higher 8 bits of the physical
address with a constant using a 'mov' instruction, and the lower 32 bits are
patched using 'add'. While this is correct, on platforms where the lowmem
addressable physical memory spans across the 4GB boundary, a carry bit can be
produced as a result of the addition of the lower 32 bits. This has to be
taken into account and added into the upper 32 bits. The patched __pv_offset
and va are added in the lower 32 bits, where __pv_offset can be in two's
complement form when PA_START < VA_START, and that can result in a false
carry bit.
e.g.
1) PA = 0x80000000; VA = 0xC0000000
__pv_offset = PA - VA = 0xC0000000 (2's complement)
2) PA = 0x2 80000000; VA = 0xC0000000
__pv_offset = PA - VA = 0x1 C0000000
So adding __pv_offset + VA produces a carry for (1) as well, but it is not
a true overflow. In order to differentiate a true carry, __pv_offset is
extended to 64 bits and its upper 32 bits hold 0xffffffff when __pv_offset
is in two's complement form; 'mvn #0' is patched in instead of 'mov' for
the same reason. Since the mov, add and sub instructions are patched
with different constants inside the same stub, the rotation field
of the opcode is used to differentiate between them.
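The rotation-field trick relies on the ARM modified-immediate encoding: an
8-bit constant rotated right by twice a 4-bit count. A small illustrative
decoder (not the patch code) shows why the two constants are
distinguishable:

```c
#include <assert.h>
#include <stdint.h>

/* ARM data-processing immediate: the low 12 bits of the opcode hold an
 * 8-bit value (bits 7:0) rotated right by twice the 4-bit rotation
 * count (bits 11:8). */
static uint32_t decode_arm_imm(uint32_t imm12)
{
	uint32_t val = imm12 & 0xff;
	uint32_t rot = 2 * ((imm12 >> 8) & 0xf);

	return rot ? (val >> rot) | (val << (32 - rot)) : val;
}
```

__PV_BITS_31_24 (0x81000000) encodes as imm12 = 0x481 (rotation field 4),
while __PV_BITS_7_0 (0x81) encodes as 0x081 (rotation field 0), so a `tst`
against 0xf00 separates the add/sub stubs from the mov/mvn stub.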
With this, the above examples of the v2p translation become, for VA = 0xC0000000:
1) PA[63:32] = 0xffffffff
PA[31:0] = VA + 0xC0000000 --> results in a carry
PA[63:32] = PA[63:32] + carry
PA[63:0] = 0x0 80000000
2) PA[63:32] = 0x1
PA[31:0] = VA + 0xC0000000 --> results in a carry
PA[63:32] = PA[63:32] + carry
PA[63:0] = 0x2 80000000
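The two worked cases can be replayed with plain 64-bit arithmetic; this
sketch mirrors the adds/adc sequence in C (illustrative, not the generated
assembly):

```c
#include <assert.h>
#include <stdint.h>

/* Mimics the patched stub sequence: the high word comes from the
 * patched 'mov' (or 'mvn #0'), the low word from 'adds', and the
 * carry is folded in with 'adc ..., #0'. */
static uint64_t v2p(uint32_t va, uint32_t pv_offset_hi, uint32_t pv_offset_lo)
{
	uint32_t lo = va + pv_offset_lo;     /* adds %Q0, %1, %2 */
	uint32_t carry = lo < va;            /* carry out of the 32-bit add */
	uint32_t hi = pv_offset_hi + carry;  /* adc %R0, %R0, #0 */

	return ((uint64_t)hi << 32) | lo;
}
```

For case (1) the high word is 0xffffffff from 'mvn #0', so the false carry
wraps it back to 0; for case (2) the carry is genuine and bumps the high
word from 0x1 to 0x2.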
The above ideas were suggested by Nicolas Pitre <nico@linaro.org> as
part of the review of first and second versions of the subject patch.
There is no corresponding change on the phys_to_virt() side, because
computations on the upper 32-bits would be discarded anyway.
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/memory.h | 35 +++++++++++++++++++++---
arch/arm/kernel/armksyms.c | 1 +
arch/arm/kernel/head.S | 60 ++++++++++++++++++++++++++---------------
arch/arm/kernel/patch.c | 3 +++
4 files changed, 75 insertions(+), 24 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index d9b96c65..942ad84 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -172,9 +172,12 @@
* so that all we need to do is modify the 8-bit constant field.
*/
#define __PV_BITS_31_24 0x81000000
+#define __PV_BITS_7_0 0x81
extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
-extern unsigned long __pv_phys_offset;
+extern u64 __pv_phys_offset;
+extern u64 __pv_offset;
+
#define PHYS_OFFSET __pv_phys_offset
#define __pv_stub(from,to,instr,type) \
@@ -186,10 +189,36 @@ extern unsigned long __pv_phys_offset;
: "=r" (to) \
: "r" (from), "I" (type))
+#define __pv_stub_mov_hi(t) \
+ __asm__ volatile("@ __pv_stub_mov\n" \
+ "1: mov %R0, %1\n" \
+ " .pushsection .pv_table,\"a\"\n" \
+ " .long 1b\n" \
+ " .popsection\n" \
+ : "=r" (t) \
+ : "I" (__PV_BITS_7_0))
+
+#define __pv_add_carry_stub(x, y) \
+ __asm__ volatile("@ __pv_add_carry_stub\n" \
+ "1: adds %Q0, %1, %2\n" \
+ " adc %R0, %R0, #0\n" \
+ " .pushsection .pv_table,\"a\"\n" \
+ " .long 1b\n" \
+ " .popsection\n" \
+ : "+r" (y) \
+ : "r" (x), "I" (__PV_BITS_31_24) \
+ : "cc")
+
static inline phys_addr_t __virt_to_phys(unsigned long x)
{
- unsigned long t;
- __pv_stub(x, t, "add", __PV_BITS_31_24);
+ phys_addr_t t;
+
+ if (sizeof(phys_addr_t) == 4) {
+ __pv_stub(x, t, "add", __PV_BITS_31_24);
+ } else {
+ __pv_stub_mov_hi(t);
+ __pv_add_carry_stub(x, t);
+ }
return t;
}
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..1f031dd 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
#ifdef CONFIG_ARM_PATCH_PHYS_VIRT
EXPORT_SYMBOL(__pv_phys_offset);
+EXPORT_SYMBOL(__pv_offset);
#endif
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 2c7cc1e..90d04d7 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -536,6 +536,14 @@ ENTRY(fixup_smp)
ldmfd sp!, {r4 - r6, pc}
ENDPROC(fixup_smp)
+#ifdef __ARMEB__
+#define LOW_OFFSET 0x4
+#define HIGH_OFFSET 0x0
+#else
+#define LOW_OFFSET 0x0
+#define HIGH_OFFSET 0x4
+#endif
+
#ifdef CONFIG_ARM_PATCH_PHYS_VIRT
/* __fixup_pv_table - patch the stub instructions with the delta between
@@ -546,17 +554,20 @@ ENDPROC(fixup_smp)
__HEAD
__fixup_pv_table:
adr r0, 1f
- ldmia r0, {r3-r5, r7}
- sub r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
+ ldmia r0, {r3-r7}
+ mvn ip, #0
+ subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
add r4, r4, r3 @ adjust table start address
add r5, r5, r3 @ adjust table end address
- add r7, r7, r3 @ adjust __pv_phys_offset address
- str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset
+ add r6, r6, r3 @ adjust __pv_phys_offset address
+ add r7, r7, r3 @ adjust __pv_offset address
+ str r8, [r6, #LOW_OFFSET] @ save computed PHYS_OFFSET to __pv_phys_offset
+ strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits
mov r6, r3, lsr #24 @ constant for add/sub instructions
teq r3, r6, lsl #24 @ must be 16MiB aligned
THUMB( it ne @ cross section branch )
bne __error
- str r6, [r7, #4] @ save to __pv_offset
+ str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits
b __fixup_a_pv_table
ENDPROC(__fixup_pv_table)
@@ -565,9 +576,18 @@ ENDPROC(__fixup_pv_table)
.long __pv_table_begin
.long __pv_table_end
2: .long __pv_phys_offset
+ .long __pv_offset
.text
__fixup_a_pv_table:
+ adr r0, 3f
+ ldr r6, [r0]
+ add r6, r6, r3
+ ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word
+ ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word
+ mov r6, r6, lsr #24
+ cmn r0, #1
+ moveq r0, #0x400000 @ set bit 22, mov to mvn instruction
#ifdef CONFIG_THUMB2_KERNEL
lsls r6, #24
beq 2f
@@ -582,9 +602,15 @@ __fixup_a_pv_table:
b 2f
1: add r7, r3
ldrh ip, [r7, #2]
- and ip, 0x8f00
- orr ip, r6 @ mask in offset bits 31-24
+ tst ip, #0x4000
+ and ip, #0x8f00
+ orrne ip, r6 @ mask in offset bits 31-24
+ orreq ip, r0 @ mask in offset bits 7-0
strh ip, [r7, #2]
+ ldrheq ip, [r7]
+ biceq ip, #0x20
+ orreq ip, ip, r0, lsr #16
+ strheq ip, [r7]
2: cmp r4, r5
ldrcc r7, [r4], #4 @ use branch for delay slot
bcc 1b
@@ -593,7 +619,10 @@ __fixup_a_pv_table:
b 2f
1: ldr ip, [r7, r3]
bic ip, ip, #0x000000ff
- orr ip, ip, r6 @ mask in offset bits 31-24
+ tst ip, #0xf00 @ check the rotation field
+ orrne ip, ip, r6 @ mask in offset bits 31-24
+ biceq ip, ip, #0x400000 @ clear bit 22
+ orreq ip, ip, r0 @ mask in offset bits 7-0
str ip, [r7, r3]
2: cmp r4, r5
ldrcc r7, [r4], #4 @ use branch for delay slot
@@ -602,28 +631,17 @@ __fixup_a_pv_table:
#endif
ENDPROC(__fixup_a_pv_table)
+3: .long __pv_offset
+
ENTRY(fixup_pv_table)
stmfd sp!, {r4 - r7, lr}
- ldr r2, 2f @ get address of __pv_phys_offset
mov r3, #0 @ no offset
mov r4, r0 @ r0 = table start
add r5, r0, r1 @ r1 = table size
- ldr r6, [r2, #4] @ get __pv_offset
bl __fixup_a_pv_table
ldmfd sp!, {r4 - r7, pc}
ENDPROC(fixup_pv_table)
- .align
-2: .long __pv_phys_offset
-
- .data
- .globl __pv_phys_offset
- .type __pv_phys_offset, %object
-__pv_phys_offset:
- .long 0
- .size __pv_phys_offset, . - __pv_phys_offset
-__pv_offset:
- .long 0
#endif
#include "head-common.S"
diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index 07314af..8356312 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -8,6 +8,9 @@
#include "patch.h"
+u64 __pv_phys_offset __attribute__((section(".data")));
+u64 __pv_offset __attribute__((section(".data")));
+
struct patch {
void *addr;
unsigned int insn;
--
1.7.9.5
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
` (3 preceding siblings ...)
2013-10-03 21:17 ` [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
@ 2013-10-03 21:17 ` Santosh Shilimkar
2013-10-04 0:23 ` Nicolas Pitre
2013-10-04 15:59 ` Will Deacon
2013-10-03 21:18 ` [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations Santosh Shilimkar
5 siblings, 2 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:17 UTC (permalink / raw)
To: linux-arm-kernel
This patch adds a step in the init sequence, in order to recreate
the kernel code/data page table mappings prior to full paging
initialization. This is necessary on LPAE systems that run from
physical address space outside the 4G limit. On these systems,
this implementation provides a machine descriptor hook that allows
the PHYS_OFFSET to be overridden in a machine specific fashion.
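The hook is invoked like the other optional machine_desc callbacks; a
minimal standalone sketch of the call pattern (the structures are
simplified stand-ins and the Keystone-style addresses are assumptions):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's __pv_phys_offset. */
static uint64_t pv_phys_offset = 0x80000000ULL;   /* boot-time alias view */

struct machine_desc {
	const char *name;
	void (*init_meminfo)(void);   /* optional hook added by this patch */
};

/* Hypothetical Keystone hook: switch PHYS_OFFSET to the real >4G view. */
static void keystone_init_meminfo(void)
{
	pv_phys_offset = 0x800000000ULL;
}

static void early_paging_init(const struct machine_desc *mdesc)
{
	if (!mdesc->init_meminfo)
		return;               /* most machines: nothing to do */
	mdesc->init_meminfo();
	/* ...the real code then re-runs fixup_pv_table() and rebuilds
	 * the kernel mappings with the new offset... */
}
```

Machines that leave init_meminfo unset go through the no-op path, so the
hook costs nothing on existing platforms.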
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/mach/arch.h | 1 +
arch/arm/kernel/setup.c | 3 ++
arch/arm/mm/mmu.c | 82 ++++++++++++++++++++++++++++++++++++++
3 files changed, 86 insertions(+)
diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 402a2bc..17a3fa2 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -49,6 +49,7 @@ struct machine_desc {
bool (*smp_init)(void);
void (*fixup)(struct tag *, char **,
struct meminfo *);
+ void (*init_meminfo)(void);
void (*reserve)(void);/* reserve mem blocks */
void (*map_io)(void);/* IO mapping function */
void (*init_early)(void);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0e1e2b3..b9a6dac 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -73,6 +73,7 @@ __setup("fpe=", fpe_setup);
#endif
extern void paging_init(const struct machine_desc *desc);
+extern void early_paging_init(const struct machine_desc *, struct proc_info_list *);
extern void sanity_check_meminfo(void);
extern enum reboot_mode reboot_mode;
extern void setup_dma_zone(const struct machine_desc *desc);
@@ -878,6 +879,8 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
+
+ early_paging_init(mdesc, lookup_processor_type(read_cpuid_id()));
sanity_check_meminfo();
arm_memblock_init(&meminfo, mdesc);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b1d17ee..47c7497 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -28,6 +28,7 @@
#include <asm/highmem.h>
#include <asm/system_info.h>
#include <asm/traps.h>
+#include <asm/procinfo.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -1315,6 +1316,87 @@ static void __init map_lowmem(void)
}
}
+#ifdef CONFIG_ARM_LPAE
+extern void fixup_pv_table(const void *, unsigned long);
+extern const void *__pv_table_begin, *__pv_table_end;
+
+/*
+ * early_paging_init() recreates boot time page table setup, allowing machines
+ * to switch over to a high (>4G) address space on LPAE systems
+ */
+void __init early_paging_init(const struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+ pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
+ unsigned long map_start, map_end;
+ pgd_t *pgd0, *pgdk;
+ pud_t *pud0, *pudk;
+ pmd_t *pmd0, *pmdk;
+ phys_addr_t phys;
+ int i;
+
+ /* remap kernel code and data */
+ map_start = init_mm.start_code;
+ map_end = init_mm.brk;
+
+ /* get a handle on things... */
+ pgd0 = pgd_offset_k(0);
+ pud0 = pud_offset(pgd0, 0);
+ pmd0 = pmd_offset(pud0, 0);
+
+ pgdk = pgd_offset_k(map_start);
+ pudk = pud_offset(pgdk, map_start);
+ pmdk = pmd_offset(pudk, map_start);
+
+ phys = PHYS_OFFSET;
+
+ if (mdesc->init_meminfo) {
+ mdesc->init_meminfo();
+ /* Run the patch stub to update the constants */
+ fixup_pv_table(&__pv_table_begin,
+ (&__pv_table_end - &__pv_table_begin) << 2);
+
+ /*
+ * Cache cleaning operations for self-modifying code
+ * We should clean the entries by MVA but running a
+ * for loop over every pv_table entry pointer would
+ * just complicate the code.
+ */
+ flush_cache_louis();
+ dsb();
+ isb();
+ }
+
+ /* remap level 1 table */
+ for (i = 0; i < PTRS_PER_PGD; i++) {
+ *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
+ pmd0 += PTRS_PER_PMD;
+ }
+
+ /* remap pmds for kernel mapping */
+ phys = __pa(map_start) & PMD_MASK;
+ do {
+ *pmdk++ = __pmd(phys | pmdprot);
+ phys += PMD_SIZE;
+ } while (phys < map_end);
+
+ flush_cache_all();
+ cpu_set_ttbr(0, __pa(pgd0));
+ cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
+ local_flush_tlb_all();
+}
+
+#else
+
+void __init early_paging_init(const struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+ if (mdesc->init_meminfo)
+ mdesc->init_meminfo();
+}
+
+#endif
+
/*
* paging_init() sets up the page tables, initialises the zone memory
* maps, and sets up the zero page, bad page and bad page tables.
--
1.7.9.5
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
` (4 preceding siblings ...)
2013-10-03 21:17 ` [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
@ 2013-10-03 21:18 ` Santosh Shilimkar
2013-10-04 0:25 ` Nicolas Pitre
` (2 more replies)
5 siblings, 3 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-03 21:18 UTC (permalink / raw)
To: linux-arm-kernel
From: Sricharan R <r.sricharan@ti.com>
As per the ARM ARMv7 Architecture Reference Manual, the sequence of TLB
maintenance operations after making changes to the translation table is
to clean the dcache first, then invalidate the TLB. With the current
sequence we see cache corruption when flush_cache_all() is called after
local_flush_tlb_all().
STR rx, [Translation table entry]
; write new entry to the translation table
Clean cache line [Translation table entry]
DSB
; ensures visibility of the data cleaned from the D Cache
Invalidate TLB entry by MVA (and ASID if non-global) [page address]
Invalidate BTC
DSB
; ensure completion of the Invalidate TLB operation
ISB
; ensure table changes visible to instruction fetch
The issue is seen only with LPAE + a THUMB2-built kernel + 64 bit patching,
which is a little bit weird.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/mm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 47c7497..49cba8a 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1280,8 +1280,8 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
* any write-allocated cache lines in the vector page are written
* back. After this point, we can start to touch devices again.
*/
- local_flush_tlb_all();
flush_cache_all();
+ local_flush_tlb_all();
}
static void __init kmap_init(void)
--
1.7.9.5
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-03 21:17 ` [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
@ 2013-10-04 0:17 ` Nicolas Pitre
2013-10-04 5:37 ` Sricharan R
0 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2013-10-04 0:17 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 3 Oct 2013, Santosh Shilimkar wrote:
> From: Sricharan R <r.sricharan@ti.com>
>
> The current phys_to_virt patching mechanism works only for 32 bit
> physical addresses, and this patch extends the idea to 64 bit physical
> addresses.
>
> The 64 bit v2p patching mechanism patches the higher 8 bits of the physical
> address with a constant using a 'mov' instruction, and the lower 32 bits are
> patched using 'add'. While this is correct, on platforms where the lowmem
> addressable physical memory spans across the 4GB boundary, a carry bit can be
> produced as a result of the addition of the lower 32 bits. This has to be
> taken into account and added into the upper 32 bits. The patched __pv_offset
> and va are added in the lower 32 bits, where __pv_offset can be in two's
> complement form when PA_START < VA_START, and that can result in a false
> carry bit.
>
> e.g
> 1) PA = 0x80000000; VA = 0xC0000000
> __pv_offset = PA - VA = 0xC0000000 (2's complement)
>
> 2) PA = 0x2 80000000; VA = 0xC0000000
> __pv_offset = PA - VA = 0x1 C0000000
>
> So adding __pv_offset + VA produces a carry for (1) as well, but it is not
> a true overflow. In order to differentiate a true carry, __pv_offset is
> extended to 64 bits and its upper 32 bits hold 0xffffffff when __pv_offset
> is in two's complement form; 'mvn #0' is patched in instead of 'mov' for
> the same reason. Since the mov, add and sub instructions are patched
> with different constants inside the same stub, the rotation field
> of the opcode is used to differentiate between them.
>
> With this, the above examples of the v2p translation become, for VA = 0xC0000000:
> 1) PA[63:32] = 0xffffffff
> PA[31:0] = VA + 0xC0000000 --> results in a carry
> PA[63:32] = PA[63:32] + carry
>
> PA[63:0] = 0x0 80000000
>
> 2) PA[63:32] = 0x1
> PA[31:0] = VA + 0xC0000000 --> results in a carry
> PA[63:32] = PA[63:32] + carry
>
> PA[63:0] = 0x2 80000000
>
> The above ideas were suggested by Nicolas Pitre <nico@linaro.org> as
> part of the review of first and second versions of the subject patch.
>
> There is no corresponding change on the phys_to_virt() side, because
> computations on the upper 32-bits would be discarded anyway.
>
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
>
> Signed-off-by: Sricharan R <r.sricharan@ti.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Almost there ...
> ---
> arch/arm/include/asm/memory.h | 35 +++++++++++++++++++++---
> arch/arm/kernel/armksyms.c | 1 +
> arch/arm/kernel/head.S | 60 ++++++++++++++++++++++++++---------------
> arch/arm/kernel/patch.c | 3 +++
> 4 files changed, 75 insertions(+), 24 deletions(-)
>
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index d9b96c65..942ad84 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -172,9 +172,12 @@
> * so that all we need to do is modify the 8-bit constant field.
> */
> #define __PV_BITS_31_24 0x81000000
> +#define __PV_BITS_7_0 0x81
>
> extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
> -extern unsigned long __pv_phys_offset;
> +extern u64 __pv_phys_offset;
> +extern u64 __pv_offset;
> +
> #define PHYS_OFFSET __pv_phys_offset
>
> #define __pv_stub(from,to,instr,type) \
> @@ -186,10 +189,36 @@ extern unsigned long __pv_phys_offset;
> : "=r" (to) \
> : "r" (from), "I" (type))
>
> +#define __pv_stub_mov_hi(t) \
> + __asm__ volatile("@ __pv_stub_mov\n" \
> + "1: mov %R0, %1\n" \
> + " .pushsection .pv_table,\"a\"\n" \
> + " .long 1b\n" \
> + " .popsection\n" \
> + : "=r" (t) \
> + : "I" (__PV_BITS_7_0))
> +
> +#define __pv_add_carry_stub(x, y) \
> + __asm__ volatile("@ __pv_add_carry_stub\n" \
> + "1: adds %Q0, %1, %2\n" \
> + " adc %R0, %R0, #0\n" \
> + " .pushsection .pv_table,\"a\"\n" \
> + " .long 1b\n" \
> + " .popsection\n" \
> + : "+r" (y) \
> + : "r" (x), "I" (__PV_BITS_31_24) \
The third operand i.e. __PV_BITS_31_24 is useless here.
> + : "cc")
> +
> static inline phys_addr_t __virt_to_phys(unsigned long x)
> {
> - unsigned long t;
> - __pv_stub(x, t, "add", __PV_BITS_31_24);
> + phys_addr_t t;
> +
> + if (sizeof(phys_addr_t) == 4) {
> + __pv_stub(x, t, "add", __PV_BITS_31_24);
> + } else {
> + __pv_stub_mov_hi(t);
> + __pv_add_carry_stub(x, t);
> + }
> return t;
> }
>
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..1f031dd 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
>
> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
> EXPORT_SYMBOL(__pv_phys_offset);
> +EXPORT_SYMBOL(__pv_offset);
> #endif
> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> index 2c7cc1e..90d04d7 100644
> --- a/arch/arm/kernel/head.S
> +++ b/arch/arm/kernel/head.S
> @@ -536,6 +536,14 @@ ENTRY(fixup_smp)
> ldmfd sp!, {r4 - r6, pc}
> ENDPROC(fixup_smp)
>
> +#ifdef __ARMEB__
> +#define LOW_OFFSET 0x4
> +#define HIGH_OFFSET 0x0
> +#else
> +#define LOW_OFFSET 0x0
> +#define HIGH_OFFSET 0x4
> +#endif
> +
> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>
> /* __fixup_pv_table - patch the stub instructions with the delta between
> @@ -546,17 +554,20 @@ ENDPROC(fixup_smp)
> __HEAD
> __fixup_pv_table:
> adr r0, 1f
> - ldmia r0, {r3-r5, r7}
> - sub r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
> + ldmia r0, {r3-r7}
> + mvn ip, #0
> + subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
> add r4, r4, r3 @ adjust table start address
> add r5, r5, r3 @ adjust table end address
> - add r7, r7, r3 @ adjust __pv_phys_offset address
> - str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset
> + add r6, r6, r3 @ adjust __pv_phys_offset address
> + add r7, r7, r3 @ adjust __pv_offset address
> + str r8, [r6, #LOW_OFFSET] @ save computed PHYS_OFFSET to __pv_phys_offset
> + strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits
> mov r6, r3, lsr #24 @ constant for add/sub instructions
> teq r3, r6, lsl #24 @ must be 16MiB aligned
> THUMB( it ne @ cross section branch )
> bne __error
> - str r6, [r7, #4] @ save to __pv_offset
> + str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits
> b __fixup_a_pv_table
> ENDPROC(__fixup_pv_table)
>
> @@ -565,9 +576,18 @@ ENDPROC(__fixup_pv_table)
> .long __pv_table_begin
> .long __pv_table_end
> 2: .long __pv_phys_offset
> + .long __pv_offset
>
> .text
> __fixup_a_pv_table:
> + adr r0, 3f
> + ldr r6, [r0]
> + add r6, r6, r3
> + ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word
> + ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word
> + mov r6, r6, lsr #24
> + cmn r0, #1
> + moveq r0, #0x400000 @ set bit 22, mov to mvn instruction
> #ifdef CONFIG_THUMB2_KERNEL
> lsls r6, #24
> beq 2f
> @@ -582,9 +602,15 @@ __fixup_a_pv_table:
> b 2f
> 1: add r7, r3
> ldrh ip, [r7, #2]
> - and ip, 0x8f00
> - orr ip, r6 @ mask in offset bits 31-24
> + tst ip, #0x4000
> + and ip, #0x8f00
> + orrne ip, r6 @ mask in offset bits 31-24
> + orreq ip, r0 @ mask in offset bits 7-0
> strh ip, [r7, #2]
> + ldrheq ip, [r7]
> + biceq ip, #0x20
> + orreq ip, ip, r0, lsr #16
> + strheq ip, [r7]
> 2: cmp r4, r5
> ldrcc r7, [r4], #4 @ use branch for delay slot
> bcc 1b
> @@ -593,7 +619,10 @@ __fixup_a_pv_table:
> b 2f
> 1: ldr ip, [r7, r3]
> bic ip, ip, #0x000000ff
> - orr ip, ip, r6 @ mask in offset bits 31-24
> + tst ip, #0xf00 @ check the rotation field
> + orrne ip, ip, r6 @ mask in offset bits 31-24
> + biceq ip, ip, #0x400000 @ clear bit 22
> + orreq ip, ip, r0 @ mask in offset bits 7-0
> str ip, [r7, r3]
> 2: cmp r4, r5
> ldrcc r7, [r4], #4 @ use branch for delay slot
> @@ -602,28 +631,17 @@ __fixup_a_pv_table:
> #endif
> ENDPROC(__fixup_a_pv_table)
>
> +3: .long __pv_offset
> +
> ENTRY(fixup_pv_table)
> stmfd sp!, {r4 - r7, lr}
> - ldr r2, 2f @ get address of __pv_phys_offset
> mov r3, #0 @ no offset
> mov r4, r0 @ r0 = table start
> add r5, r0, r1 @ r1 = table size
> - ldr r6, [r2, #4] @ get __pv_offset
> bl __fixup_a_pv_table
> ldmfd sp!, {r4 - r7, pc}
> ENDPROC(fixup_pv_table)
>
> - .align
> -2: .long __pv_phys_offset
> -
> - .data
> - .globl __pv_phys_offset
> - .type __pv_phys_offset, %object
> -__pv_phys_offset:
> - .long 0
> - .size __pv_phys_offset, . - __pv_phys_offset
> -__pv_offset:
> - .long 0
> #endif
>
> #include "head-common.S"
> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
> index 07314af..8356312 100644
> --- a/arch/arm/kernel/patch.c
> +++ b/arch/arm/kernel/patch.c
> @@ -8,6 +8,9 @@
>
> #include "patch.h"
>
> +u64 __pv_phys_offset __attribute__((section(".data")));
> +u64 __pv_offset __attribute__((section(".data")));
Please add a comment explaining why you force those variables out of the
.bss section. This is unlikely to be obvious to people.
In fact, is there a reason why you moved those out of head.S? You only
needed to replace the .long with .quad to match the u64 type.
I think I might have suggested moving them out if they were to be typed
with phys_addr_t, but using a fixed u64 is simpler.
Nicolas
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-03 21:17 ` [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
@ 2013-10-04 0:23 ` Nicolas Pitre
2013-10-04 15:59 ` Will Deacon
1 sibling, 0 replies; 28+ messages in thread
From: Nicolas Pitre @ 2013-10-04 0:23 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 3 Oct 2013, Santosh Shilimkar wrote:
> This patch adds a step in the init sequence, in order to recreate
> the kernel code/data page table mappings prior to full paging
> initialization. This is necessary on LPAE systems that run out of
> a physical address space outside the 4G limit. On these systems,
> this implementation provides a machine descriptor hook that allows
> the PHYS_OFFSET to be overridden in a machine specific fashion.
>
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
>
> Signed-off-by: R Sricharan <r.sricharan@ti.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
> ---
> arch/arm/include/asm/mach/arch.h | 1 +
> arch/arm/kernel/setup.c | 3 ++
> arch/arm/mm/mmu.c | 82 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 86 insertions(+)
>
> diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
> index 402a2bc..17a3fa2 100644
> --- a/arch/arm/include/asm/mach/arch.h
> +++ b/arch/arm/include/asm/mach/arch.h
> @@ -49,6 +49,7 @@ struct machine_desc {
> bool (*smp_init)(void);
> void (*fixup)(struct tag *, char **,
> struct meminfo *);
> + void (*init_meminfo)(void);
> void (*reserve)(void);/* reserve mem blocks */
> void (*map_io)(void);/* IO mapping function */
> void (*init_early)(void);
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 0e1e2b3..b9a6dac 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -73,6 +73,7 @@ __setup("fpe=", fpe_setup);
> #endif
>
> extern void paging_init(const struct machine_desc *desc);
> +extern void early_paging_init(const struct machine_desc *, struct proc_info_list *);
> extern void sanity_check_meminfo(void);
> extern enum reboot_mode reboot_mode;
> extern void setup_dma_zone(const struct machine_desc *desc);
> @@ -878,6 +879,8 @@ void __init setup_arch(char **cmdline_p)
> parse_early_param();
>
> sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
> +
> + early_paging_init(mdesc, lookup_processor_type(read_cpuid_id()));
> sanity_check_meminfo();
> arm_memblock_init(&meminfo, mdesc);
>
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index b1d17ee..47c7497 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -28,6 +28,7 @@
> #include <asm/highmem.h>
> #include <asm/system_info.h>
> #include <asm/traps.h>
> +#include <asm/procinfo.h>
>
> #include <asm/mach/arch.h>
> #include <asm/mach/map.h>
> @@ -1315,6 +1316,87 @@ static void __init map_lowmem(void)
> }
> }
>
> +#ifdef CONFIG_ARM_LPAE
> +extern void fixup_pv_table(const void *, unsigned long);
> +extern const void *__pv_table_begin, *__pv_table_end;
> +
> +/*
> + * early_paging_init() recreates boot time page table setup, allowing machines
> + * to switch over to a high (>4G) address space on LPAE systems
> + */
> +void __init early_paging_init(const struct machine_desc *mdesc,
> + struct proc_info_list *procinfo)
> +{
> + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
> + unsigned long map_start, map_end;
> + pgd_t *pgd0, *pgdk;
> + pud_t *pud0, *pudk;
> + pmd_t *pmd0, *pmdk;
> + phys_addr_t phys;
> + int i;
> +
> + /* remap kernel code and data */
> + map_start = init_mm.start_code;
> + map_end = init_mm.brk;
> +
> + /* get a handle on things... */
> + pgd0 = pgd_offset_k(0);
> + pud0 = pud_offset(pgd0, 0);
> + pmd0 = pmd_offset(pud0, 0);
> +
> + pgdk = pgd_offset_k(map_start);
> + pudk = pud_offset(pgdk, map_start);
> + pmdk = pmd_offset(pudk, map_start);
> +
> + phys = PHYS_OFFSET;
> +
> + if (mdesc->init_meminfo) {
> + mdesc->init_meminfo();
> + /* Run the patch stub to update the constants */
> + fixup_pv_table(&__pv_table_begin,
> + (&__pv_table_end - &__pv_table_begin) << 2);
> +
> + /*
> + * Cache cleaning operations for self-modifying code
> + * We should clean the entries by MVA but running a
> + * for loop over every pv_table entry pointer would
> + * just complicate the code.
> + */
> + flush_cache_louis();
> + dsb();
> + isb();
> + }
> +
> + /* remap level 1 table */
> + for (i = 0; i < PTRS_PER_PGD; i++) {
> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
> + pmd0 += PTRS_PER_PMD;
> + }
> +
> + /* remap pmds for kernel mapping */
> + phys = __pa(map_start) & PMD_MASK;
> + do {
> + *pmdk++ = __pmd(phys | pmdprot);
> + phys += PMD_SIZE;
> + } while (phys < map_end);
> +
> + flush_cache_all();
> + cpu_set_ttbr(0, __pa(pgd0));
> + cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
> + local_flush_tlb_all();
> +}
> +
> +#else
> +
> +void __init early_paging_init(const struct machine_desc *mdesc,
> + struct proc_info_list *procinfo)
> +{
> + if (mdesc->init_meminfo)
> + mdesc->init_meminfo();
> +}
> +
> +#endif
> +
> /*
> * paging_init() sets up the page tables, initialises the zone memory
> * maps, and sets up the zero page, bad page and bad page tables.
> --
> 1.7.9.5
>
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-03 21:18 ` [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations Santosh Shilimkar
@ 2013-10-04 0:25 ` Nicolas Pitre
2013-10-04 8:46 ` Russell King - ARM Linux
2013-10-04 15:52 ` Will Deacon
2 siblings, 0 replies; 28+ messages in thread
From: Nicolas Pitre @ 2013-10-04 0:25 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 3 Oct 2013, Santosh Shilimkar wrote:
> From: Sricharan R <r.sricharan@ti.com>
>
> As per the ARM ARMv7 manual, the sequence of TLB maintenance
> operations after making changes to the translation table is
> to clean the dcache first, then invalidate the TLB. With
> the current sequence we see cache corruption when
> flush_cache_all() is called after tlb_flush_all().
>
> STR rx, [Translation table entry]
> ; write new entry to the translation table
> Clean cache line [Translation table entry]
> DSB
> ; ensures visibility of the data cleaned from the D Cache
> Invalidate TLB entry by MVA (and ASID if non-global) [page address]
> Invalidate BTC
> DSB
> ; ensure completion of the Invalidate TLB operation
> ISB
> ; ensure table changes visible to instruction fetch
>
> The issue is seen only with LPAE + Thumb-2 built kernel + 64-bit patching,
> which is a little bit weird.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
>
> Signed-off-by: Sricharan R <r.sricharan@ti.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
> ---
> arch/arm/mm/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 47c7497..49cba8a 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1280,8 +1280,8 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
> * any write-allocated cache lines in the vector page are written
> * back. After this point, we can start to touch devices again.
> */
> - local_flush_tlb_all();
> flush_cache_all();
> + local_flush_tlb_all();
> }
>
> static void __init kmap_init(void)
> --
> 1.7.9.5
>
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-04 0:17 ` Nicolas Pitre
@ 2013-10-04 5:37 ` Sricharan R
2013-10-04 13:02 ` Nicolas Pitre
0 siblings, 1 reply; 28+ messages in thread
From: Sricharan R @ 2013-10-04 5:37 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Friday 04 October 2013 05:47 AM, Nicolas Pitre wrote:
> On Thu, 3 Oct 2013, Santosh Shilimkar wrote:
>
>> From: Sricharan R <r.sricharan@ti.com>
>>
>> The current phys_to_virt patching mechanism works only for 32 bit
>> physical addresses and this patch extends the idea for 64bit physical
>> addresses.
>>
>> The 64bit v2p patching mechanism patches the higher 8 bits of physical
>> address with a constant using 'mov' instruction and lower 32bits are patched
>> using 'add'. While this is correct, on those platforms where the lowmem addressable
>> physical memory spans across the 4GB boundary, a carry bit can be produced as a
>> result of the addition of the lower 32 bits. This has to be taken into account and added
>> into the upper 32 bits. The patched __pv_offset and va are added in the lower 32 bits, where
>> __pv_offset can be in two's complement form when PA_START < VA_START and that can
>> result in a false carry bit.
>>
>> e.g
>> 1) PA = 0x80000000; VA = 0xC0000000
>> __pv_offset = PA - VA = 0xC0000000 (2's complement)
>>
>> 2) PA = 0x2 80000000; VA = 0xC0000000
>> __pv_offset = PA - VA = 0x1 C0000000
>>
>> So adding __pv_offset + VA should never result in a true overflow for (1).
>> So in order to identify a true carry, __pv_offset is extended
>> to 64 bits and the upper 32 bits will have 0xffffffff if __pv_offset is
>> in 2's complement form. So 'mvn #0' is inserted instead of 'mov' while patching
>> for the same reason. Since the mov, add, sub instructions are to be patched
>> with different constants inside the same stub, the rotation field
>> of the opcode is used to differentiate between them.
>>
>> So the above examples for v2p translation becomes for VA=0xC0000000,
>> 1) PA[63:32] = 0xffffffff
>> PA[31:0] = VA + 0xC0000000 --> results in a carry
>> PA[63:32] = PA[63:32] + carry
>>
>> PA[63:0] = 0x0 80000000
>>
>> 2) PA[63:32] = 0x1
>> PA[31:0] = VA + 0xC0000000 --> results in a carry
>> PA[63:32] = PA[63:32] + carry
>>
>> PA[63:0] = 0x2 80000000
>>
>> The above ideas were suggested by Nicolas Pitre <nico@linaro.org> as
>> part of the review of first and second versions of the subject patch.
>>
>> There is no corresponding change on the phys_to_virt() side, because
>> computations on the upper 32-bits would be discarded anyway.
>>
>> Cc: Nicolas Pitre <nico@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>>
>> Signed-off-by: Sricharan R <r.sricharan@ti.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Almost there ...
>
>> ---
>> arch/arm/include/asm/memory.h | 35 +++++++++++++++++++++---
>> arch/arm/kernel/armksyms.c | 1 +
>> arch/arm/kernel/head.S | 60 ++++++++++++++++++++++++++---------------
>> arch/arm/kernel/patch.c | 3 +++
>> 4 files changed, 75 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
>> index d9b96c65..942ad84 100644
>> --- a/arch/arm/include/asm/memory.h
>> +++ b/arch/arm/include/asm/memory.h
>> @@ -172,9 +172,12 @@
>> * so that all we need to do is modify the 8-bit constant field.
>> */
>> #define __PV_BITS_31_24 0x81000000
>> +#define __PV_BITS_7_0 0x81
>>
>> extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
>> -extern unsigned long __pv_phys_offset;
>> +extern u64 __pv_phys_offset;
>> +extern u64 __pv_offset;
>> +
>> #define PHYS_OFFSET __pv_phys_offset
>>
>> #define __pv_stub(from,to,instr,type) \
>> @@ -186,10 +189,36 @@ extern unsigned long __pv_phys_offset;
>> : "=r" (to) \
>> : "r" (from), "I" (type))
>>
>> +#define __pv_stub_mov_hi(t) \
>> + __asm__ volatile("@ __pv_stub_mov\n" \
>> + "1: mov %R0, %1\n" \
>> + " .pushsection .pv_table,\"a\"\n" \
>> + " .long 1b\n" \
>> + " .popsection\n" \
>> + : "=r" (t) \
>> + : "I" (__PV_BITS_7_0))
>> +
>> +#define __pv_add_carry_stub(x, y) \
>> + __asm__ volatile("@ __pv_add_carry_stub\n" \
>> + "1: adds %Q0, %1, %2\n" \
>> + " adc %R0, %R0, #0\n" \
>> + " .pushsection .pv_table,\"a\"\n" \
>> + " .long 1b\n" \
>> + " .popsection\n" \
>> + : "+r" (y) \
>> + : "r" (x), "I" (__PV_BITS_31_24) \
> The third operand i.e. __PV_BITS_31_24 is useless here.
This is used to encode the correct rotation, and we use
this in the patching code to identify the offset to
be patched.
>> + : "cc")
>> +
>> static inline phys_addr_t __virt_to_phys(unsigned long x)
>> {
>> - unsigned long t;
>> - __pv_stub(x, t, "add", __PV_BITS_31_24);
>> + phys_addr_t t;
>> +
>> + if (sizeof(phys_addr_t) == 4) {
>> + __pv_stub(x, t, "add", __PV_BITS_31_24);
>> + } else {
>> + __pv_stub_mov_hi(t);
>> + __pv_add_carry_stub(x, t);
>> + }
>> return t;
>> }
>>
>> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
>> index 60d3b73..1f031dd 100644
>> --- a/arch/arm/kernel/armksyms.c
>> +++ b/arch/arm/kernel/armksyms.c
>> @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
>>
>> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>> EXPORT_SYMBOL(__pv_phys_offset);
>> +EXPORT_SYMBOL(__pv_offset);
>> #endif
>> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
>> index 2c7cc1e..90d04d7 100644
>> --- a/arch/arm/kernel/head.S
>> +++ b/arch/arm/kernel/head.S
>> @@ -536,6 +536,14 @@ ENTRY(fixup_smp)
>> ldmfd sp!, {r4 - r6, pc}
>> ENDPROC(fixup_smp)
>>
>> +#ifdef __ARMEB__
>> +#define LOW_OFFSET 0x4
>> +#define HIGH_OFFSET 0x0
>> +#else
>> +#define LOW_OFFSET 0x0
>> +#define HIGH_OFFSET 0x4
>> +#endif
>> +
>> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>>
>> /* __fixup_pv_table - patch the stub instructions with the delta between
>> @@ -546,17 +554,20 @@ ENDPROC(fixup_smp)
>> __HEAD
>> __fixup_pv_table:
>> adr r0, 1f
>> - ldmia r0, {r3-r5, r7}
>> - sub r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
>> + ldmia r0, {r3-r7}
>> + mvn ip, #0
>> + subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
>> add r4, r4, r3 @ adjust table start address
>> add r5, r5, r3 @ adjust table end address
>> - add r7, r7, r3 @ adjust __pv_phys_offset address
>> - str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset
>> + add r6, r6, r3 @ adjust __pv_phys_offset address
>> + add r7, r7, r3 @ adjust __pv_offset address
>> + str r8, [r6, #LOW_OFFSET] @ save computed PHYS_OFFSET to __pv_phys_offset
>> + strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits
>> mov r6, r3, lsr #24 @ constant for add/sub instructions
>> teq r3, r6, lsl #24 @ must be 16MiB aligned
>> THUMB( it ne @ cross section branch )
>> bne __error
>> - str r6, [r7, #4] @ save to __pv_offset
>> + str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits
>> b __fixup_a_pv_table
>> ENDPROC(__fixup_pv_table)
>>
>> @@ -565,9 +576,18 @@ ENDPROC(__fixup_pv_table)
>> .long __pv_table_begin
>> .long __pv_table_end
>> 2: .long __pv_phys_offset
>> + .long __pv_offset
>>
>> .text
>> __fixup_a_pv_table:
>> + adr r0, 3f
>> + ldr r6, [r0]
>> + add r6, r6, r3
>> + ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word
>> + ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word
>> + mov r6, r6, lsr #24
>> + cmn r0, #1
>> + moveq r0, #0x400000 @ set bit 22, mov to mvn instruction
>> #ifdef CONFIG_THUMB2_KERNEL
>> lsls r6, #24
>> beq 2f
>> @@ -582,9 +602,15 @@ __fixup_a_pv_table:
>> b 2f
>> 1: add r7, r3
>> ldrh ip, [r7, #2]
>> - and ip, 0x8f00
>> - orr ip, r6 @ mask in offset bits 31-24
>> + tst ip, #0x4000
>> + and ip, #0x8f00
>> + orrne ip, r6 @ mask in offset bits 31-24
>> + orreq ip, r0 @ mask in offset bits 7-0
>> strh ip, [r7, #2]
>> + ldrheq ip, [r7]
>> + biceq ip, #0x20
>> + orreq ip, ip, r0, lsr #16
>> + strheq ip, [r7]
>> 2: cmp r4, r5
>> ldrcc r7, [r4], #4 @ use branch for delay slot
>> bcc 1b
>> @@ -593,7 +619,10 @@ __fixup_a_pv_table:
>> b 2f
>> 1: ldr ip, [r7, r3]
>> bic ip, ip, #0x000000ff
>> - orr ip, ip, r6 @ mask in offset bits 31-24
>> + tst ip, #0xf00 @ check the rotation field
>> + orrne ip, ip, r6 @ mask in offset bits 31-24
>> + biceq ip, ip, #0x400000 @ clear bit 22
>> + orreq ip, ip, r0 @ mask in offset bits 7-0
>> str ip, [r7, r3]
>> 2: cmp r4, r5
>> ldrcc r7, [r4], #4 @ use branch for delay slot
>> @@ -602,28 +631,17 @@ __fixup_a_pv_table:
>> #endif
>> ENDPROC(__fixup_a_pv_table)
>>
>> +3: .long __pv_offset
>> +
>> ENTRY(fixup_pv_table)
>> stmfd sp!, {r4 - r7, lr}
>> - ldr r2, 2f @ get address of __pv_phys_offset
>> mov r3, #0 @ no offset
>> mov r4, r0 @ r0 = table start
>> add r5, r0, r1 @ r1 = table size
>> - ldr r6, [r2, #4] @ get __pv_offset
>> bl __fixup_a_pv_table
>> ldmfd sp!, {r4 - r7, pc}
>> ENDPROC(fixup_pv_table)
>>
>> - .align
>> -2: .long __pv_phys_offset
>> -
>> - .data
>> - .globl __pv_phys_offset
>> - .type __pv_phys_offset, %object
>> -__pv_phys_offset:
>> - .long 0
>> - .size __pv_phys_offset, . - __pv_phys_offset
>> -__pv_offset:
>> - .long 0
>> #endif
>>
>> #include "head-common.S"
>> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
>> index 07314af..8356312 100644
>> --- a/arch/arm/kernel/patch.c
>> +++ b/arch/arm/kernel/patch.c
>> @@ -8,6 +8,9 @@
>>
>> #include "patch.h"
>>
>> +u64 __pv_phys_offset __attribute__((section(".data")));
>> +u64 __pv_offset __attribute__((section(".data")));
> Please add a comment explaining why you force those variables out of the
> .bss section. This is unlikely to be obvious to people.
>
> In fact, is there a reason why you moved those out of head.S? You only
> needed to replace the .long with .quad to match the u64 type.
>
> I think I might have suggested moving them out if they were to be typed
> with phys_addr_t, but using a fixed u64 is simpler.
Yes, I moved it here after your comments :-). Since it is always u64,
I can move it back to head.S with .quad as well.
Regards,
Sricharan
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-03 21:18 ` [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations Santosh Shilimkar
2013-10-04 0:25 ` Nicolas Pitre
@ 2013-10-04 8:46 ` Russell King - ARM Linux
2013-10-04 13:14 ` Nicolas Pitre
2013-10-04 15:52 ` Will Deacon
2 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2013-10-04 8:46 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 03, 2013 at 05:18:00PM -0400, Santosh Shilimkar wrote:
> From: Sricharan R <r.sricharan@ti.com>
>
> As per the ARM ARMv7 manual, the sequence of TLB maintenance
> operations after making changes to the translation table is
> to clean the dcache first, then invalidate the TLB. With
> the current sequence we see cache corruption when
> flush_cache_all() is called after tlb_flush_all().
This needs testing on ARMv4 CPUs which don't have a way to flush the
cache except by reading memory - hence they need the new page table
entries to be visible to the MMU before calling flush_cache_all().
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-04 5:37 ` Sricharan R
@ 2013-10-04 13:02 ` Nicolas Pitre
2013-10-07 19:25 ` Santosh Shilimkar
0 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2013-10-04 13:02 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 4 Oct 2013, Sricharan R wrote:
> Hi,
> On Friday 04 October 2013 05:47 AM, Nicolas Pitre wrote:
> > On Thu, 3 Oct 2013, Santosh Shilimkar wrote:
> >
> >> From: Sricharan R <r.sricharan@ti.com>
> >>
> >> The current phys_to_virt patching mechanism works only for 32 bit
> >> physical addresses and this patch extends the idea for 64bit physical
> >> addresses.
> >>
> >> The 64bit v2p patching mechanism patches the higher 8 bits of physical
> >> address with a constant using 'mov' instruction and lower 32bits are patched
> >> using 'add'. While this is correct, on those platforms where the lowmem addressable
> >> physical memory spans across the 4GB boundary, a carry bit can be produced as a
> >> result of the addition of the lower 32 bits. This has to be taken into account and added
> >> into the upper 32 bits. The patched __pv_offset and va are added in the lower 32 bits, where
> >> __pv_offset can be in two's complement form when PA_START < VA_START and that can
> >> result in a false carry bit.
> >>
> >> e.g
> >> 1) PA = 0x80000000; VA = 0xC0000000
> >> __pv_offset = PA - VA = 0xC0000000 (2's complement)
> >>
> >> 2) PA = 0x2 80000000; VA = 0xC0000000
> >> __pv_offset = PA - VA = 0x1 C0000000
> >>
> >> So adding __pv_offset + VA should never result in a true overflow for (1).
> >> So in order to identify a true carry, __pv_offset is extended
> >> to 64 bits and the upper 32 bits will have 0xffffffff if __pv_offset is
> >> in 2's complement form. So 'mvn #0' is inserted instead of 'mov' while patching
> >> for the same reason. Since the mov, add, sub instructions are to be patched
> >> with different constants inside the same stub, the rotation field
> >> of the opcode is used to differentiate between them.
> >>
> >> So the above examples for v2p translation becomes for VA=0xC0000000,
> >> 1) PA[63:32] = 0xffffffff
> >> PA[31:0] = VA + 0xC0000000 --> results in a carry
> >> PA[63:32] = PA[63:32] + carry
> >>
> >> PA[63:0] = 0x0 80000000
> >>
> >> 2) PA[63:32] = 0x1
> >> PA[31:0] = VA + 0xC0000000 --> results in a carry
> >> PA[63:32] = PA[63:32] + carry
> >>
> >> PA[63:0] = 0x2 80000000
> >>
> >> The above ideas were suggested by Nicolas Pitre <nico@linaro.org> as
> >> part of the review of first and second versions of the subject patch.
> >>
> >> There is no corresponding change on the phys_to_virt() side, because
> >> computations on the upper 32-bits would be discarded anyway.
> >>
> >> Cc: Nicolas Pitre <nico@linaro.org>
> >> Cc: Russell King <linux@arm.linux.org.uk>
> >>
> >> Signed-off-by: Sricharan R <r.sricharan@ti.com>
> >> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> > Almost there ...
> >
> >> ---
> >> arch/arm/include/asm/memory.h | 35 +++++++++++++++++++++---
> >> arch/arm/kernel/armksyms.c | 1 +
> >> arch/arm/kernel/head.S | 60 ++++++++++++++++++++++++++---------------
> >> arch/arm/kernel/patch.c | 3 +++
> >> 4 files changed, 75 insertions(+), 24 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> >> index d9b96c65..942ad84 100644
> >> --- a/arch/arm/include/asm/memory.h
> >> +++ b/arch/arm/include/asm/memory.h
> >> @@ -172,9 +172,12 @@
> >> * so that all we need to do is modify the 8-bit constant field.
> >> */
> >> #define __PV_BITS_31_24 0x81000000
> >> +#define __PV_BITS_7_0 0x81
> >>
> >> extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
> >> -extern unsigned long __pv_phys_offset;
> >> +extern u64 __pv_phys_offset;
> >> +extern u64 __pv_offset;
> >> +
> >> #define PHYS_OFFSET __pv_phys_offset
> >>
> >> #define __pv_stub(from,to,instr,type) \
> >> @@ -186,10 +189,36 @@ extern unsigned long __pv_phys_offset;
> >> : "=r" (to) \
> >> : "r" (from), "I" (type))
> >>
> >> +#define __pv_stub_mov_hi(t) \
> >> + __asm__ volatile("@ __pv_stub_mov\n" \
> >> + "1: mov %R0, %1\n" \
> >> + " .pushsection .pv_table,\"a\"\n" \
> >> + " .long 1b\n" \
> >> + " .popsection\n" \
> >> + : "=r" (t) \
> >> + : "I" (__PV_BITS_7_0))
> >> +
> >> +#define __pv_add_carry_stub(x, y) \
> >> + __asm__ volatile("@ __pv_add_carry_stub\n" \
> >> + "1: adds %Q0, %1, %2\n" \
> >> + " adc %R0, %R0, #0\n" \
> >> + " .pushsection .pv_table,\"a\"\n" \
> >> + " .long 1b\n" \
> >> + " .popsection\n" \
> >> + : "+r" (y) \
> >> + : "r" (x), "I" (__PV_BITS_31_24) \
> > The third operand i.e. __PV_BITS_31_24 is useless here.
> This is used to encode the correct rotation, and we use
> this in the patching code to identify the offset to
> be patched.
Obviously! Please disregard this comment -- I was confused.
> >> + : "cc")
> >> +
> >> static inline phys_addr_t __virt_to_phys(unsigned long x)
> >> {
> >> - unsigned long t;
> >> - __pv_stub(x, t, "add", __PV_BITS_31_24);
> >> + phys_addr_t t;
> >> +
> >> + if (sizeof(phys_addr_t) == 4) {
> >> + __pv_stub(x, t, "add", __PV_BITS_31_24);
> >> + } else {
> >> + __pv_stub_mov_hi(t);
> >> + __pv_add_carry_stub(x, t);
> >> + }
> >> return t;
> >> }
> >>
> >> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> >> index 60d3b73..1f031dd 100644
> >> --- a/arch/arm/kernel/armksyms.c
> >> +++ b/arch/arm/kernel/armksyms.c
> >> @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
> >>
> >> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
> >> EXPORT_SYMBOL(__pv_phys_offset);
> >> +EXPORT_SYMBOL(__pv_offset);
> >> #endif
> >> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> >> index 2c7cc1e..90d04d7 100644
> >> --- a/arch/arm/kernel/head.S
> >> +++ b/arch/arm/kernel/head.S
> >> @@ -536,6 +536,14 @@ ENTRY(fixup_smp)
> >> ldmfd sp!, {r4 - r6, pc}
> >> ENDPROC(fixup_smp)
> >>
> >> +#ifdef __ARMEB__
> >> +#define LOW_OFFSET 0x4
> >> +#define HIGH_OFFSET 0x0
> >> +#else
> >> +#define LOW_OFFSET 0x0
> >> +#define HIGH_OFFSET 0x4
> >> +#endif
> >> +
> >> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
> >>
> >> /* __fixup_pv_table - patch the stub instructions with the delta between
> >> @@ -546,17 +554,20 @@ ENDPROC(fixup_smp)
> >> __HEAD
> >> __fixup_pv_table:
> >> adr r0, 1f
> >> - ldmia r0, {r3-r5, r7}
> >> - sub r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
> >> + ldmia r0, {r3-r7}
> >> + mvn ip, #0
> >> + subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
> >> add r4, r4, r3 @ adjust table start address
> >> add r5, r5, r3 @ adjust table end address
> >> - add r7, r7, r3 @ adjust __pv_phys_offset address
> >> - str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset
> >> + add r6, r6, r3 @ adjust __pv_phys_offset address
> >> + add r7, r7, r3 @ adjust __pv_offset address
> >> + str r8, [r6, #LOW_OFFSET] @ save computed PHYS_OFFSET to __pv_phys_offset
> >> + strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits
> >> mov r6, r3, lsr #24 @ constant for add/sub instructions
> >> teq r3, r6, lsl #24 @ must be 16MiB aligned
> >> THUMB( it ne @ cross section branch )
> >> bne __error
> >> - str r6, [r7, #4] @ save to __pv_offset
> >> + str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits
> >> b __fixup_a_pv_table
> >> ENDPROC(__fixup_pv_table)
> >>
> >> @@ -565,9 +576,18 @@ ENDPROC(__fixup_pv_table)
> >> .long __pv_table_begin
> >> .long __pv_table_end
> >> 2: .long __pv_phys_offset
> >> + .long __pv_offset
> >>
> >> .text
> >> __fixup_a_pv_table:
> >> + adr r0, 3f
> >> + ldr r6, [r0]
> >> + add r6, r6, r3
> >> + ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word
> >> + ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word
> >> + mov r6, r6, lsr #24
> >> + cmn r0, #1
> >> + moveq r0, #0x400000 @ set bit 22, mov to mvn instruction
> >> #ifdef CONFIG_THUMB2_KERNEL
> >> lsls r6, #24
> >> beq 2f
> >> @@ -582,9 +602,15 @@ __fixup_a_pv_table:
> >> b 2f
> >> 1: add r7, r3
> >> ldrh ip, [r7, #2]
> >> - and ip, 0x8f00
> >> - orr ip, r6 @ mask in offset bits 31-24
> >> + tst ip, #0x4000
> >> + and ip, #0x8f00
> >> + orrne ip, r6 @ mask in offset bits 31-24
> >> + orreq ip, r0 @ mask in offset bits 7-0
> >> strh ip, [r7, #2]
> >> + ldrheq ip, [r7]
> >> + biceq ip, #0x20
> >> + orreq ip, ip, r0, lsr #16
> >> + strheq ip, [r7]
> >> 2: cmp r4, r5
> >> ldrcc r7, [r4], #4 @ use branch for delay slot
> >> bcc 1b
> >> @@ -593,7 +619,10 @@ __fixup_a_pv_table:
> >> b 2f
> >> 1: ldr ip, [r7, r3]
> >> bic ip, ip, #0x000000ff
> >> - orr ip, ip, r6 @ mask in offset bits 31-24
> >> + tst ip, #0xf00 @ check the rotation field
> >> + orrne ip, ip, r6 @ mask in offset bits 31-24
> >> + biceq ip, ip, #0x400000 @ clear bit 22
> >> + orreq ip, ip, r0 @ mask in offset bits 7-0
> >> str ip, [r7, r3]
> >> 2: cmp r4, r5
> >> ldrcc r7, [r4], #4 @ use branch for delay slot
> >> @@ -602,28 +631,17 @@ __fixup_a_pv_table:
> >> #endif
> >> ENDPROC(__fixup_a_pv_table)
> >>
> >> +3: .long __pv_offset
> >> +
> >> ENTRY(fixup_pv_table)
> >> stmfd sp!, {r4 - r7, lr}
> >> - ldr r2, 2f @ get address of __pv_phys_offset
> >> mov r3, #0 @ no offset
> >> mov r4, r0 @ r0 = table start
> >> add r5, r0, r1 @ r1 = table size
> >> - ldr r6, [r2, #4] @ get __pv_offset
> >> bl __fixup_a_pv_table
> >> ldmfd sp!, {r4 - r7, pc}
> >> ENDPROC(fixup_pv_table)
> >>
> >> - .align
> >> -2: .long __pv_phys_offset
> >> -
> >> - .data
> >> - .globl __pv_phys_offset
> >> - .type __pv_phys_offset, %object
> >> -__pv_phys_offset:
> >> - .long 0
> >> - .size __pv_phys_offset, . - __pv_phys_offset
> >> -__pv_offset:
> >> - .long 0
> >> #endif
> >>
> >> #include "head-common.S"
> >> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
> >> index 07314af..8356312 100644
> >> --- a/arch/arm/kernel/patch.c
> >> +++ b/arch/arm/kernel/patch.c
> >> @@ -8,6 +8,9 @@
> >>
> >> #include "patch.h"
> >>
> >> +u64 __pv_phys_offset __attribute__((section(".data")));
> >> +u64 __pv_offset __attribute__((section(".data")));
> > Please add a comment explaining why you force those variables out of the
> > .bss section. This is unlikely to be obvious to people.
> >
> > In fact, is there a reason why you moved those out of head.S? You only
> > needed to replace the .long with .quad to match the u64 type.
> >
> > I think I might have suggested moving them out if they were to be typed
> > with phys_addr_t, but using a fixed u64 is simpler.
> Yes, I moved it here after your comments :-). Since it is always u64,
> I can move it to head.S with .quad as well.
The reason behind my suggestion was to use phys_addr_t for those
variables, which is easier in C code given that phys_addr_t can be 32 or
64 bits. But that makes the assembly more complicated. With a fixed
type it is not required to move them.
Once this is done, you can add:
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Nicolas
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-04 8:46 ` Russell King - ARM Linux
@ 2013-10-04 13:14 ` Nicolas Pitre
2013-10-04 13:19 ` Santosh Shilimkar
0 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2013-10-04 13:14 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 4 Oct 2013, Russell King - ARM Linux wrote:
> On Thu, Oct 03, 2013 at 05:18:00PM -0400, Santosh Shilimkar wrote:
> > From: Sricharan R <r.sricharan@ti.com>
> >
> > As per the ARM ARMv7 manual, the sequence of TLB maintenance
> > operations after making changes to the translation table is
> > to clean the dcache first, then invalidate the TLB. With
> > the current sequence we see cache corruption when
> > flush_cache_all() is called after tlb_flush_all().
>
> This needs testing on ARMv4 CPUs which don't have a way to flush the
> cache except by reading memory - hence they need the new page table
> entries to be visible to the MMU before calling flush_cache_all().
I suspect you might be one of the few individuals still having the
ability to test new kernels on ARMv4 CPUs.
Nicolas
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-04 13:14 ` Nicolas Pitre
@ 2013-10-04 13:19 ` Santosh Shilimkar
0 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-04 13:19 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 04 October 2013 09:14 AM, Nicolas Pitre wrote:
> On Fri, 4 Oct 2013, Russell King - ARM Linux wrote:
>
>> On Thu, Oct 03, 2013 at 05:18:00PM -0400, Santosh Shilimkar wrote:
>>> From: Sricharan R <r.sricharan@ti.com>
>>>
>>> As per the ARM ARMv7 manual, the sequence of TLB maintenance
>>> operations after making changes to the translation table is
>>> to clean the dcache first, then invalidate the TLB. With
>>> the current sequence we see cache corruption when
>>> flush_cache_all() is called after tlb_flush_all().
>>
>> This needs testing on ARMv4 CPUs which don't have a way to flush the
>> cache except by reading memory - hence they need the new page table
>> entries to be visible to the MMU before calling flush_cache_all().
>
> I suspect you might be one of the few individuals still having the
> ability to test new kernels on ARMv4 CPUs.
>
At least I don't have any ARMv4 based system to validate it.
Regards,
Santosh
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-03 21:18 ` [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations Santosh Shilimkar
2013-10-04 0:25 ` Nicolas Pitre
2013-10-04 8:46 ` Russell King - ARM Linux
@ 2013-10-04 15:52 ` Will Deacon
2013-10-04 16:03 ` Santosh Shilimkar
2 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2013-10-04 15:52 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 03, 2013 at 10:18:00PM +0100, Santosh Shilimkar wrote:
> From: Sricharan R <r.sricharan@ti.com>
>
> As per the ARM ARMv7 manual, the sequence of TLB maintenance
> operations after making changes to the translation table is
> to clean the dcache first, then invalidate the TLB. With
> the current sequence we see cache corruption when
> flush_cache_all() is called after tlb_flush_all().
>
> STR rx, [Translation table entry]
> ; write new entry to the translation table
> Clean cache line [Translation table entry]
> DSB
> ; ensures visibility of the data cleaned from the D Cache
> Invalidate TLB entry by MVA (and ASID if non-global) [page address]
> Invalidate BTC
> DSB
> ; ensure completion of the Invalidate TLB operation
> ISB
> ; ensure table changes visible to instruction fetch
>
> The issue is seen only with LPAE + THUMB BUILT KERNEL + 64BIT patching,
> which is a little bit weird.
NAK.
I don't buy your reasoning. All current LPAE implementations also implement
the multi-processing extensions, meaning that the cache flush isn't required
to make the PTEs visible to the table walker. The dsb from the TLB_WB flag
is sufficient, so I think you still have some debugging to do as this change
is likely masking a problem elsewhere.
On top of that, create_mapping does all the flushing you need (for the !SMP
case) when the tables are initialised, so this code doesn't need changing.
Will
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-03 21:17 ` [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
2013-10-04 0:23 ` Nicolas Pitre
@ 2013-10-04 15:59 ` Will Deacon
2013-10-04 16:12 ` Santosh Shilimkar
1 sibling, 1 reply; 28+ messages in thread
From: Will Deacon @ 2013-10-04 15:59 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 03, 2013 at 10:17:59PM +0100, Santosh Shilimkar wrote:
> This patch adds a step in the init sequence, in order to recreate
> the kernel code/data page table mappings prior to full paging
> initialization. This is necessary on LPAE systems that run out of
> a physical address space outside the 4G limit. On these systems,
> this implementation provides a machine descriptor hook that allows
> the PHYS_OFFSET to be overridden in a machine specific fashion.
[...]
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index b1d17ee..47c7497 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -28,6 +28,7 @@
> #include <asm/highmem.h>
> #include <asm/system_info.h>
> #include <asm/traps.h>
> +#include <asm/procinfo.h>
>
> #include <asm/mach/arch.h>
> #include <asm/mach/map.h>
> @@ -1315,6 +1316,87 @@ static void __init map_lowmem(void)
> }
> }
>
> +#ifdef CONFIG_ARM_LPAE
> +extern void fixup_pv_table(const void *, unsigned long);
> +extern const void *__pv_table_begin, *__pv_table_end;
> +
> +/*
> + * early_paging_init() recreates boot time page table setup, allowing machines
> + * to switch over to a high (>4G) address space on LPAE systems
> + */
> +void __init early_paging_init(const struct machine_desc *mdesc,
> + struct proc_info_list *procinfo)
> +{
> + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
> + unsigned long map_start, map_end;
> + pgd_t *pgd0, *pgdk;
> + pud_t *pud0, *pudk;
> + pmd_t *pmd0, *pmdk;
> + phys_addr_t phys;
> + int i;
> +
> + /* remap kernel code and data */
> + map_start = init_mm.start_code;
> + map_end = init_mm.brk;
> +
> + /* get a handle on things... */
> + pgd0 = pgd_offset_k(0);
> + pud0 = pud_offset(pgd0, 0);
> + pmd0 = pmd_offset(pud0, 0);
> +
> + pgdk = pgd_offset_k(map_start);
> + pudk = pud_offset(pgdk, map_start);
> + pmdk = pmd_offset(pudk, map_start);
> +
> + phys = PHYS_OFFSET;
> +
> + if (mdesc->init_meminfo) {
> + mdesc->init_meminfo();
> + /* Run the patch stub to update the constants */
> + fixup_pv_table(&__pv_table_begin,
> + (&__pv_table_end - &__pv_table_begin) << 2);
> +
> + /*
> + * Cache cleaning operations for self-modifying code
> + * We should clean the entries by MVA but running a
> + * for loop over every pv_table entry pointer would
> + * just complicate the code.
> + */
> + flush_cache_louis();
> + dsb();
> + isb();
You don't need either of these barriers.
> + }
> +
> + /* remap level 1 table */
> + for (i = 0; i < PTRS_PER_PGD; i++) {
> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
> + pmd0 += PTRS_PER_PMD;
> + }
> +
> + /* remap pmds for kernel mapping */
> + phys = __pa(map_start) & PMD_MASK;
> + do {
> + *pmdk++ = __pmd(phys | pmdprot);
> + phys += PMD_SIZE;
> + } while (phys < map_end);
> +
> + flush_cache_all();
Why are you being so heavyweight with your cacheflushing? If you're just
interested in flushing the new page tables, then use the proper accessors to
build them. The only case I think you need to flush the world is for VIVT,
which you won't have with LPAE (you could have a BUG_ON here).
> + cpu_set_ttbr(0, __pa(pgd0));
> + cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
Can you not use cpu_switch_mm with the init_mm for this?
Will
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-04 15:52 ` Will Deacon
@ 2013-10-04 16:03 ` Santosh Shilimkar
2013-10-09 18:56 ` Santosh Shilimkar
0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-04 16:03 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 04 October 2013 11:52 AM, Will Deacon wrote:
> On Thu, Oct 03, 2013 at 10:18:00PM +0100, Santosh Shilimkar wrote:
>> From: Sricharan R <r.sricharan@ti.com>
>>
>> As per the ARM ARMv7 manual, the sequence of TLB maintenance
>> operations after making changes to the translation table is
>> to clean the dcache first, then invalidate the TLB. With
>> the current sequence we see cache corruption when
>> flush_cache_all() is called after tlb_flush_all().
>>
>> STR rx, [Translation table entry]
>> ; write new entry to the translation table
>> Clean cache line [Translation table entry]
>> DSB
>> ; ensures visibility of the data cleaned from the D Cache
>> Invalidate TLB entry by MVA (and ASID if non-global) [page address]
>> Invalidate BTC
>> DSB
>> ; ensure completion of the Invalidate TLB operation
>> ISB
>> ; ensure table changes visible to instruction fetch
>>
>> The issue is seen only with LPAE + THUMB BUILT KERNEL + 64BIT patching,
>> which is a little bit weird.
>
> NAK.
>
> I don't buy your reasoning. All current LPAE implementations also implement
> the multi-processing extensions, meaning that the cache flush isn't required
> to make the PTEs visible to the table walker. The dsb from the TLB_WB flag
> is sufficient, so I think you still have some debugging to do as this change
> is likely masking a problem elsewhere.
>
> On top of that, create_mapping does all the flushing you need (for the !SMP
> case) when the tables are initialised, so this code doesn't need changing.
>
Fair enough. We will drop this patch from this series and continue to look
at the issue further. As such the patch has no hard dependency on the rest
of the series.
Regards,
Santosh
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-04 15:59 ` Will Deacon
@ 2013-10-04 16:12 ` Santosh Shilimkar
2013-10-07 19:34 ` Santosh Shilimkar
0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-04 16:12 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 04 October 2013 11:59 AM, Will Deacon wrote:
> On Thu, Oct 03, 2013 at 10:17:59PM +0100, Santosh Shilimkar wrote:
>> This patch adds a step in the init sequence, in order to recreate
>> the kernel code/data page table mappings prior to full paging
>> initialization. This is necessary on LPAE systems that run out of
>> a physical address space outside the 4G limit. On these systems,
>> this implementation provides a machine descriptor hook that allows
>> the PHYS_OFFSET to be overridden in a machine specific fashion.
>
> [...]
>
>> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
>> index b1d17ee..47c7497 100644
>> --- a/arch/arm/mm/mmu.c
>> +++ b/arch/arm/mm/mmu.c
>> @@ -28,6 +28,7 @@
>> #include <asm/highmem.h>
>> #include <asm/system_info.h>
>> #include <asm/traps.h>
>> +#include <asm/procinfo.h>
>>
>> #include <asm/mach/arch.h>
>> #include <asm/mach/map.h>
>> @@ -1315,6 +1316,87 @@ static void __init map_lowmem(void)
>> }
>> }
>>
>> +#ifdef CONFIG_ARM_LPAE
>> +extern void fixup_pv_table(const void *, unsigned long);
>> +extern const void *__pv_table_begin, *__pv_table_end;
>> +
>> +/*
>> + * early_paging_init() recreates boot time page table setup, allowing machines
>> + * to switch over to a high (>4G) address space on LPAE systems
>> + */
>> +void __init early_paging_init(const struct machine_desc *mdesc,
>> + struct proc_info_list *procinfo)
>> +{
>> + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
>> + unsigned long map_start, map_end;
>> + pgd_t *pgd0, *pgdk;
>> + pud_t *pud0, *pudk;
>> + pmd_t *pmd0, *pmdk;
>> + phys_addr_t phys;
>> + int i;
>> +
>> + /* remap kernel code and data */
>> + map_start = init_mm.start_code;
>> + map_end = init_mm.brk;
>> +
>> + /* get a handle on things... */
>> + pgd0 = pgd_offset_k(0);
>> + pud0 = pud_offset(pgd0, 0);
>> + pmd0 = pmd_offset(pud0, 0);
>> +
>> + pgdk = pgd_offset_k(map_start);
>> + pudk = pud_offset(pgdk, map_start);
>> + pmdk = pmd_offset(pudk, map_start);
>> +
>> + phys = PHYS_OFFSET;
>> +
>> + if (mdesc->init_meminfo) {
>> + mdesc->init_meminfo();
>> + /* Run the patch stub to update the constants */
>> + fixup_pv_table(&__pv_table_begin,
>> + (&__pv_table_end - &__pv_table_begin) << 2);
>> +
>> + /*
>> + * Cache cleaning operations for self-modifying code
>> + * We should clean the entries by MVA but running a
>> + * for loop over every pv_table entry pointer would
>> + * just complicate the code.
>> + */
>> + flush_cache_louis();
>> + dsb();
>> + isb();
>
> You don't need either of these barriers.
>
Agreed. Just to be clear, it's because they are already present
in flush_cache_louis(), right?
>> + }
>> +
>> + /* remap level 1 table */
>> + for (i = 0; i < PTRS_PER_PGD; i++) {
>> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
>> + pmd0 += PTRS_PER_PMD;
>> + }
>> +
>> + /* remap pmds for kernel mapping */
>> + phys = __pa(map_start) & PMD_MASK;
>> + do {
>> + *pmdk++ = __pmd(phys | pmdprot);
>> + phys += PMD_SIZE;
>> + } while (phys < map_end);
>> +
>> + flush_cache_all();
>
> Why are you being so heavyweight with your cacheflushing? If you're just
> interested in flushing the new page tables, then use the proper accessors to
> build them. The only case I think you need to flush the world is for VIVT,
> which you won't have with LPAE (you could have a BUG_ON here).
>
It was mainly to avoid all the looping MVA-based stuff, but you have a valid point.
I shall look at it and see what can be done.
>> + cpu_set_ttbr(0, __pa(pgd0));
>> + cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
>
> Can you not use cpu_switch_mm with the init_mm for this?
>
Probably yes. Will have a look at it.
Regards,
Santosh
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-04 13:02 ` Nicolas Pitre
@ 2013-10-07 19:25 ` Santosh Shilimkar
2013-10-07 19:42 ` Nicolas Pitre
0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-07 19:25 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 04 October 2013 09:02 AM, Nicolas Pitre wrote:
> On Fri, 4 Oct 2013, Sricharan R wrote:
>
>> Hi,
>> On Friday 04 October 2013 05:47 AM, Nicolas Pitre wrote:
>>> On Thu, 3 Oct 2013, Santosh Shilimkar wrote:
>>>
>>>> From: Sricharan R <r.sricharan@ti.com>
[..]
>>>> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
>>>> index 07314af..8356312 100644
>>>> --- a/arch/arm/kernel/patch.c
>>>> +++ b/arch/arm/kernel/patch.c
>>>> @@ -8,6 +8,9 @@
>>>>
>>>> #include "patch.h"
>>>>
>>>> +u64 __pv_phys_offset __attribute__((section(".data")));
>>>> +u64 __pv_offset __attribute__((section(".data")));
>>> Please add a comment explaining why you force those variables out of the
>>> .bss section. This is unlikely to be obvious to people.
>>>
>>> In fact, is there a reason why you moved those out of head.S? You only
>>> needed to replace the .long with .quad to match the u64 type.
>>>
>>> I think I might have suggested moving them out if they were to be typed
>>> with phys_addr_t, but using a fixed u64 is simpler.
>> Yes, I moved it here after your comments :-). Since it is always u64,
>> I can move it to head.S with .quad as well.
>
> The reason behind my suggestion was to use phys_addr_t for those
> variables, which is easier in C code given that phys_addr_t can be 32 or
> 64 bits. But that makes the assembly more complicated. With a fixed
> type it is not required to move them.
>
> Once this is done, you can add:
>
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
>
Updated patch below, with your review tag, for the records.
Regards,
Santosh
-->
>From 3a018dad5ac051436a0cb4951a9325047c5c152a Mon Sep 17 00:00:00 2001
From: Sricharan R <r.sricharan@ti.com>
Date: Mon, 29 Jul 2013 20:26:22 +0530
Subject: [PATCH v3 4/5] ARM: mm: Correct virt_to_phys patching for 64 bit
physical addresses
The current phys_to_virt patching mechanism works only for 32 bit
physical addresses and this patch extends the idea for 64bit physical
addresses.
The 64bit v2p patching mechanism patches the higher 8 bits of the physical
address with a constant using a 'mov' instruction, and the lower 32 bits are
patched using 'add'. While this is correct, on those platforms where the lowmem
addressable physical memory spans across the 4GB boundary, a carry bit can be
produced as a result of the addition of the lower 32 bits. This has to be taken
into account and added into the upper 32 bits. The patched __pv_offset and va are
added in the lower 32 bits, where __pv_offset can be in two's complement form when
PA_START < VA_START, and that can result in a false carry bit.
e.g
1) PA = 0x80000000; VA = 0xC0000000
__pv_offset = PA - VA = 0xC0000000 (2's complement)
2) PA = 0x2 80000000; VA = 0xC0000000
__pv_offset = PA - VA = 0x1 C0000000
So adding __pv_offset + VA should never result in a true overflow for (1).
So in order to differentiate a true carry from a false one, __pv_offset is
extended to 64 bits and the upper 32 bits will have 0xffffffff if __pv_offset
is in 2's complement form. So 'mvn #0' is inserted instead of 'mov' while
patching for the same reason. Since the mov, add and sub instructions are to
be patched with different constants inside the same stub, the rotation field
of the opcode is used to differentiate between them.
So the above examples for v2p translation becomes for VA=0xC0000000,
1) PA[63:32] = 0xffffffff
PA[31:0] = VA + 0xC0000000 --> results in a carry
PA[63:32] = PA[63:32] + carry
PA[63:0] = 0x0 80000000
2) PA[63:32] = 0x1
PA[31:0] = VA + 0xC0000000 --> results in a carry
PA[63:32] = PA[63:32] + carry
PA[63:0] = 0x2 80000000
The above ideas were suggested by Nicolas Pitre <nico@linaro.org> as
part of the review of first and second versions of the subject patch.
There is no corresponding change on the phys_to_virt() side, because
computations on the upper 32-bits would be discarded anyway.
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/memory.h | 35 +++++++++++++++++++++--
arch/arm/kernel/armksyms.c | 1 +
arch/arm/kernel/head.S | 61 ++++++++++++++++++++++++++++++-----------
3 files changed, 78 insertions(+), 19 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index d9b96c65..942ad84 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -172,9 +172,12 @@
* so that all we need to do is modify the 8-bit constant field.
*/
#define __PV_BITS_31_24 0x81000000
+#define __PV_BITS_7_0 0x81
extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
-extern unsigned long __pv_phys_offset;
+extern u64 __pv_phys_offset;
+extern u64 __pv_offset;
+
#define PHYS_OFFSET __pv_phys_offset
#define __pv_stub(from,to,instr,type) \
@@ -186,10 +189,36 @@ extern unsigned long __pv_phys_offset;
: "=r" (to) \
: "r" (from), "I" (type))
+#define __pv_stub_mov_hi(t) \
+ __asm__ volatile("@ __pv_stub_mov\n" \
+ "1: mov %R0, %1\n" \
+ " .pushsection .pv_table,\"a\"\n" \
+ " .long 1b\n" \
+ " .popsection\n" \
+ : "=r" (t) \
+ : "I" (__PV_BITS_7_0))
+
+#define __pv_add_carry_stub(x, y) \
+ __asm__ volatile("@ __pv_add_carry_stub\n" \
+ "1: adds %Q0, %1, %2\n" \
+ " adc %R0, %R0, #0\n" \
+ " .pushsection .pv_table,\"a\"\n" \
+ " .long 1b\n" \
+ " .popsection\n" \
+ : "+r" (y) \
+ : "r" (x), "I" (__PV_BITS_31_24) \
+ : "cc")
+
static inline phys_addr_t __virt_to_phys(unsigned long x)
{
- unsigned long t;
- __pv_stub(x, t, "add", __PV_BITS_31_24);
+ phys_addr_t t;
+
+ if (sizeof(phys_addr_t) == 4) {
+ __pv_stub(x, t, "add", __PV_BITS_31_24);
+ } else {
+ __pv_stub_mov_hi(t);
+ __pv_add_carry_stub(x, t);
+ }
return t;
}
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..1f031dd 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
#ifdef CONFIG_ARM_PATCH_PHYS_VIRT
EXPORT_SYMBOL(__pv_phys_offset);
+EXPORT_SYMBOL(__pv_offset);
#endif
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 2c7cc1e..69eaf84 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -536,6 +536,14 @@ ENTRY(fixup_smp)
ldmfd sp!, {r4 - r6, pc}
ENDPROC(fixup_smp)
+#ifdef __ARMEB__
+#define LOW_OFFSET 0x4
+#define HIGH_OFFSET 0x0
+#else
+#define LOW_OFFSET 0x0
+#define HIGH_OFFSET 0x4
+#endif
+
#ifdef CONFIG_ARM_PATCH_PHYS_VIRT
/* __fixup_pv_table - patch the stub instructions with the delta between
@@ -546,17 +554,20 @@ ENDPROC(fixup_smp)
__HEAD
__fixup_pv_table:
adr r0, 1f
- ldmia r0, {r3-r5, r7}
- sub r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
+ ldmia r0, {r3-r7}
+ mvn ip, #0
+ subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET
add r4, r4, r3 @ adjust table start address
add r5, r5, r3 @ adjust table end address
- add r7, r7, r3 @ adjust __pv_phys_offset address
- str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset
+ add r6, r6, r3 @ adjust __pv_phys_offset address
+ add r7, r7, r3 @ adjust __pv_offset address
+ str r8, [r6, #LOW_OFFSET] @ save computed PHYS_OFFSET to __pv_phys_offset
+ strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits
mov r6, r3, lsr #24 @ constant for add/sub instructions
teq r3, r6, lsl #24 @ must be 16MiB aligned
THUMB( it ne @ cross section branch )
bne __error
- str r6, [r7, #4] @ save to __pv_offset
+ str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits
b __fixup_a_pv_table
ENDPROC(__fixup_pv_table)
@@ -565,9 +576,18 @@ ENDPROC(__fixup_pv_table)
.long __pv_table_begin
.long __pv_table_end
2: .long __pv_phys_offset
+ .long __pv_offset
.text
__fixup_a_pv_table:
+ adr r0, 3f
+ ldr r6, [r0]
+ add r6, r6, r3
+ ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word
+ ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word
+ mov r6, r6, lsr #24
+ cmn r0, #1
+ moveq r0, #0x400000 @ set bit 22, mov to mvn instruction
#ifdef CONFIG_THUMB2_KERNEL
lsls r6, #24
beq 2f
@@ -582,9 +602,15 @@ __fixup_a_pv_table:
b 2f
1: add r7, r3
ldrh ip, [r7, #2]
- and ip, 0x8f00
- orr ip, r6 @ mask in offset bits 31-24
+ tst ip, #0x4000
+ and ip, #0x8f00
+ orrne ip, r6 @ mask in offset bits 31-24
+ orreq ip, r0 @ mask in offset bits 7-0
strh ip, [r7, #2]
+ ldrheq ip, [r7]
+ biceq ip, #0x20
+ orreq ip, ip, r0, lsr #16
+ strheq ip, [r7]
2: cmp r4, r5
ldrcc r7, [r4], #4 @ use branch for delay slot
bcc 1b
@@ -593,7 +619,10 @@ __fixup_a_pv_table:
b 2f
1: ldr ip, [r7, r3]
bic ip, ip, #0x000000ff
- orr ip, ip, r6 @ mask in offset bits 31-24
+ tst ip, #0xf00 @ check the rotation field
+ orrne ip, ip, r6 @ mask in offset bits 31-24
+ biceq ip, ip, #0x400000 @ clear bit 22
+ orreq ip, ip, r0 @ mask in offset bits 7-0
str ip, [r7, r3]
2: cmp r4, r5
ldrcc r7, [r4], #4 @ use branch for delay slot
@@ -602,28 +631,28 @@ __fixup_a_pv_table:
#endif
ENDPROC(__fixup_a_pv_table)
+3: .long __pv_offset
+
ENTRY(fixup_pv_table)
stmfd sp!, {r4 - r7, lr}
- ldr r2, 2f @ get address of __pv_phys_offset
mov r3, #0 @ no offset
mov r4, r0 @ r0 = table start
add r5, r0, r1 @ r1 = table size
- ldr r6, [r2, #4] @ get __pv_offset
bl __fixup_a_pv_table
ldmfd sp!, {r4 - r7, pc}
ENDPROC(fixup_pv_table)
- .align
-2: .long __pv_phys_offset
-
.data
.globl __pv_phys_offset
.type __pv_phys_offset, %object
__pv_phys_offset:
- .long 0
- .size __pv_phys_offset, . - __pv_phys_offset
+ .quad 0
+
+ .data
+ .globl __pv_offset
+ .type __pv_offset, %object
__pv_offset:
- .long 0
+ .quad 0
#endif
#include "head-common.S"
--
1.7.9.5
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-04 16:12 ` Santosh Shilimkar
@ 2013-10-07 19:34 ` Santosh Shilimkar
2013-10-08 10:26 ` Will Deacon
0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-07 19:34 UTC (permalink / raw)
To: linux-arm-kernel
Will,
On Friday 04 October 2013 12:12 PM, Santosh Shilimkar wrote:
> On Friday 04 October 2013 11:59 AM, Will Deacon wrote:
>> On Thu, Oct 03, 2013 at 10:17:59PM +0100, Santosh Shilimkar wrote:
>>> This patch adds a step in the init sequence, in order to recreate
>>> the kernel code/data page table mappings prior to full paging
>>> initialization. This is necessary on LPAE systems that run out of
>>> a physical address space outside the 4G limit. On these systems,
>>> this implementation provides a machine descriptor hook that allows
>>> the PHYS_OFFSET to be overridden in a machine specific fashion.
>>
>> [...]
>>
>>> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
>>> index b1d17ee..47c7497 100644
>>> --- a/arch/arm/mm/mmu.c
>>> +++ b/arch/arm/mm/mmu.c
[..]
>>> @@ -1315,6 +1316,87 @@ static void __init map_lowmem(void)
>>> }
>>> }
>>>
>>> +#ifdef CONFIG_ARM_LPAE
>>> +extern void fixup_pv_table(const void *, unsigned long);
>>> +extern const void *__pv_table_begin, *__pv_table_end;
>>> +
>>> +/*
>>> + * early_paging_init() recreates boot time page table setup, allowing machines
>>> + * to switch over to a high (>4G) address space on LPAE systems
>>> + */
>>> +void __init early_paging_init(const struct machine_desc *mdesc,
>>> + struct proc_info_list *procinfo)
>>> +{
>>> + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
>>> + unsigned long map_start, map_end;
>>> + pgd_t *pgd0, *pgdk;
>>> + pud_t *pud0, *pudk;
>>> + pmd_t *pmd0, *pmdk;
>>> + phys_addr_t phys;
>>> + int i;
>>> +
>>> + /* remap kernel code and data */
>>> + map_start = init_mm.start_code;
>>> + map_end = init_mm.brk;
>>> +
>>> + /* get a handle on things... */
>>> + pgd0 = pgd_offset_k(0);
>>> + pud0 = pud_offset(pgd0, 0);
>>> + pmd0 = pmd_offset(pud0, 0);
>>> +
>>> + pgdk = pgd_offset_k(map_start);
>>> + pudk = pud_offset(pgdk, map_start);
>>> + pmdk = pmd_offset(pudk, map_start);
>>> +
>>> + phys = PHYS_OFFSET;
>>> +
>>> + if (mdesc->init_meminfo) {
>>> + mdesc->init_meminfo();
>>> + /* Run the patch stub to update the constants */
>>> + fixup_pv_table(&__pv_table_begin,
>>> + (&__pv_table_end - &__pv_table_begin) << 2);
>>> +
>>> + /*
>>> + * Cache cleaning operations for self-modifying code
>>> + * We should clean the entries by MVA but running a
>>> + * for loop over every pv_table entry pointer would
>>> + * just complicate the code.
>>> + */
>>> + flush_cache_louis();
>>> + dsb();
>>> + isb();
>>
>> You don't need either of these barriers.
>>
> Agreed. Just to be clear, it's because they are already present
> in flush_cache_louis(), right?
>
The updated patch at the end of the email addresses your comments. Regarding
the above barriers, we dropped the dsb() but I had to retain the isb()
to commit the I-cache/BTB invalidate ops which are issued as part of
flush_cache_louis(). Off-list I was discussing with Russell whether to patch
cache-v7.S to add an isb to flush_cache_louis(), but looking at other
usages we thought of leaving the isb() in my patch itself. Without the
isb(), we see corruption on the next v2p conversion.
Let me know if you have any other concerns. I plan to prepare a branch for
RMK to pull in for the upcoming merge window.
Regards,
Santosh
>From 6c4be7594a9556d1d79503c17d0ce629abec17d7 Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Wed, 31 Jul 2013 12:44:46 -0400
Subject: [PATCH v3 5/5] ARM: mm: Recreate kernel mappings in
early_paging_init()
This patch adds a step in the init sequence, in order to recreate
the kernel code/data page table mappings prior to full paging
initialization. This is necessary on LPAE systems that run out of
a physical address space outside the 4G limit. On these systems,
this implementation provides a machine descriptor hook that allows
the PHYS_OFFSET to be overridden in a machine specific fashion.
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/mach/arch.h | 1 +
arch/arm/kernel/setup.c | 3 ++
arch/arm/mm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++
3 files changed, 93 insertions(+)
diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 402a2bc..17a3fa2 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -49,6 +49,7 @@ struct machine_desc {
bool (*smp_init)(void);
void (*fixup)(struct tag *, char **,
struct meminfo *);
+ void (*init_meminfo)(void);
void (*reserve)(void);/* reserve mem blocks */
void (*map_io)(void);/* IO mapping function */
void (*init_early)(void);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0e1e2b3..b9a6dac 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -73,6 +73,7 @@ __setup("fpe=", fpe_setup);
#endif
extern void paging_init(const struct machine_desc *desc);
+extern void early_paging_init(const struct machine_desc *, struct proc_info_list *);
extern void sanity_check_meminfo(void);
extern enum reboot_mode reboot_mode;
extern void setup_dma_zone(const struct machine_desc *desc);
@@ -878,6 +879,8 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
+
+ early_paging_init(mdesc, lookup_processor_type(read_cpuid_id()));
sanity_check_meminfo();
arm_memblock_init(&meminfo, mdesc);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b1d17ee..0751b46 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -28,6 +28,7 @@
#include <asm/highmem.h>
#include <asm/system_info.h>
#include <asm/traps.h>
+#include <asm/procinfo.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -1315,6 +1316,94 @@ static void __init map_lowmem(void)
}
}
+#ifdef CONFIG_ARM_LPAE
+extern void fixup_pv_table(const void *, unsigned long);
+extern const void *__pv_table_begin, *__pv_table_end;
+
+/*
+ * early_paging_init() recreates boot time page table setup, allowing machines
+ * to switch over to a high (>4G) address space on LPAE systems
+ */
+void __init early_paging_init(const struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+ pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
+ unsigned long map_start, map_end;
+ pgd_t *pgd0, *pgdk;
+ pud_t *pud0, *pudk, *pud_start;
+ pmd_t *pmd0, *pmdk, *pmd_start;
+ phys_addr_t phys;
+ int i;
+
+ /* remap kernel code and data */
+ map_start = init_mm.start_code;
+ map_end = init_mm.brk;
+
+ /* get a handle on things... */
+ pgd0 = pgd_offset_k(0);
+ pud_start = pud0 = pud_offset(pgd0, 0);
+ pmd0 = pmd_offset(pud0, 0);
+
+ pgdk = pgd_offset_k(map_start);
+ pudk = pud_offset(pgdk, map_start);
+ pmd_start = pmdk = pmd_offset(pudk, map_start);
+
+ phys = PHYS_OFFSET;
+
+ if (mdesc->init_meminfo) {
+ mdesc->init_meminfo();
+ /* Run the patch stub to update the constants */
+ fixup_pv_table(&__pv_table_begin,
+ (&__pv_table_end - &__pv_table_begin) << 2);
+
+ /*
+ * Cache cleaning operations for self-modifying code
+ * We should clean the entries by MVA but running a
+ * for loop over every pv_table entry pointer would
+ * just complicate the code. isb() is added to commit
+ * all the prior cp15 operations.
+ */
+ flush_cache_louis();
+ isb();
+ }
+
+ /* remap level 1 table */
+ for (i = 0; i < PTRS_PER_PGD; i++) {
+ *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
+ pmd0 += PTRS_PER_PMD;
+ }
+
+ __cpuc_flush_dcache_area(pud_start, sizeof(pud_start) * PTRS_PER_PGD);
+ outer_clean_range(virt_to_phys(pud_start), sizeof(pud_start) * PTRS_PER_PGD);
+
+ /* remap pmds for kernel mapping */
+ phys = __pa(map_start) & PMD_MASK;
+ i = 0;
+ do {
+ *pmdk++ = __pmd(phys | pmdprot);
+ phys += PMD_SIZE;
+ i++;
+ } while (phys < map_end);
+
+ __cpuc_flush_dcache_area(pmd_start, sizeof(pmd_start) * i);
+ outer_clean_range(virt_to_phys(pmd_start), sizeof(pmd_start) * i);
+
+ cpu_switch_mm(pgd0, &init_mm);
+ cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
+ local_flush_tlb_all();
+}
+
+#else
+
+void __init early_paging_init(const struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+ if (mdesc->init_meminfo)
+ mdesc->init_meminfo();
+}
+
+#endif
+
/*
* paging_init() sets up the page tables, initialises the zone memory
* maps, and sets up the zero page, bad page and bad page tables.
--
1.7.9.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-07 19:25 ` Santosh Shilimkar
@ 2013-10-07 19:42 ` Nicolas Pitre
2013-10-08 11:43 ` Sricharan R
0 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2013-10-07 19:42 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 7 Oct 2013, Santosh Shilimkar wrote:
> Update patch below with your review tag for records.
>
> Regards,
> Santosh
Micronit:
> .data
> .globl __pv_phys_offset
> .type __pv_phys_offset, %object
> __pv_phys_offset:
> - .long 0
> - .size __pv_phys_offset, . - __pv_phys_offset
> + .quad 0
> +
> + .data
> + .globl __pv_offset
> + .type __pv_offset, %object
> __pv_offset:
> - .long 0
> + .quad 0
Please keep the .size statement for __pv_phys_offset, and adding one for
__pv_offset wouldn't hurt either. And the second .data is redundant.
Nicolas
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-07 19:34 ` Santosh Shilimkar
@ 2013-10-08 10:26 ` Will Deacon
2013-10-08 17:45 ` Santosh Shilimkar
0 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2013-10-08 10:26 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Oct 07, 2013 at 08:34:41PM +0100, Santosh Shilimkar wrote:
> Will,
Hi Santosh,
> On Friday 04 October 2013 12:12 PM, Santosh Shilimkar wrote:
> > On Friday 04 October 2013 11:59 AM, Will Deacon wrote:
> >> On Thu, Oct 03, 2013 at 10:17:59PM +0100, Santosh Shilimkar wrote:
> >>> + if (mdesc->init_meminfo) {
> >>> + mdesc->init_meminfo();
> >>> + /* Run the patch stub to update the constants */
> >>> + fixup_pv_table(&__pv_table_begin,
> >>> + (&__pv_table_end - &__pv_table_begin) << 2);
> >>> +
> >>> + /*
> >>> + * Cache cleaning operations for self-modifying code
> >>> + * We should clean the entries by MVA but running a
> >>> + * for loop over every pv_table entry pointer would
> >>> + * just complicate the code.
> >>> + */
> >>> + flush_cache_louis();
> >>> + dsb();
> >>> + isb();
> >>
> >> You don't need either of these barriers.
> >>
> > Agree. Just want to be clear, it's because they are already present
> > in flush_cache_louis(), right?
> >
> Updated patch end of the email which addresses your comments. Regarding
> above barriers, we dropped the dsb() but I have to retain the isb()
> to commit the I-cache/BTB invalidate ops which are issued as part of
> flush_cache_louis(). Off-list I was discussing whether to patch cache-v7.S
> to add an isb to flush_cache_louis() with Russell but looking at other
> usages we thought of leaving the isb() in my patch itself. Without the
> isb(), we see corruption on next v2p conversion.
Ok, further comments below.
> +void __init early_paging_init(const struct machine_desc *mdesc,
> + struct proc_info_list *procinfo)
> +{
> + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
> + unsigned long map_start, map_end;
> + pgd_t *pgd0, *pgdk;
> + pud_t *pud0, *pudk, *pud_start;
> + pmd_t *pmd0, *pmdk, *pmd_start;
> + phys_addr_t phys;
> + int i;
> +
> + /* remap kernel code and data */
> + map_start = init_mm.start_code;
> + map_end = init_mm.brk;
> +
> + /* get a handle on things... */
> + pgd0 = pgd_offset_k(0);
> + pud_start = pud0 = pud_offset(pgd0, 0);
> + pmd0 = pmd_offset(pud0, 0);
> +
> + pgdk = pgd_offset_k(map_start);
> + pudk = pud_offset(pgdk, map_start);
> + pmd_start = pmdk = pmd_offset(pudk, map_start);
> +
> + phys = PHYS_OFFSET;
> +
> + if (mdesc->init_meminfo) {
> + mdesc->init_meminfo();
> + /* Run the patch stub to update the constants */
> + fixup_pv_table(&__pv_table_begin,
> + (&__pv_table_end - &__pv_table_begin) << 2);
> +
> + /*
> + * Cache cleaning operations for self-modifying code
> + * We should clean the entries by MVA but running a
> + * for loop over every pv_table entry pointer would
> + * just complicate the code. isb() is added to commit
> + * all the prior cp15 operations.
> + */
> + flush_cache_louis();
> + isb();
I see, you need the new __pv_tables to be visible for your page table
population below, right? In which case, I'm afraid I have to go back on my
original statement; you *do* need that dsb() prior to the isb() if you want
to ensure that the icache maintenance is complete and synchronised.
However, this really looks like an issue with the v7 cache flushing
routines. Why on Earth do they only guarantee completion on the D-side?
> + }
> +
> + /* remap level 1 table */
> + for (i = 0; i < PTRS_PER_PGD; i++) {
> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
> + pmd0 += PTRS_PER_PMD;
> + }
> +
> + __cpuc_flush_dcache_area(pud_start, sizeof(pud_start) * PTRS_PER_PGD);
> + outer_clean_range(virt_to_phys(pud_start), sizeof(pud_start) * PTRS_PER_PGD);
You don't need to flush these page tables if you're SMP. If you use
clean_dcache_area instead, it will do the right thing. Then again, why can't
you use pud_populate and pmd_populate for these two loops? Is there an
interaction with coherency here? (if so, why don't you need to flush the
entire cache hierarchy anyway?)
> + /* remap pmds for kernel mapping */
> + phys = __pa(map_start) & PMD_MASK;
> + i = 0;
> + do {
> + *pmdk++ = __pmd(phys | pmdprot);
> + phys += PMD_SIZE;
> + i++;
> + } while (phys < map_end);
> +
> + __cpuc_flush_dcache_area(pmd_start, sizeof(pmd_start) * i);
> + outer_clean_range(virt_to_phys(pmd_start), sizeof(pmd_start) * i);
> +
> + cpu_switch_mm(pgd0, &init_mm);
> + cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
I think you should have a local_flush_bp_all here.
> + local_flush_tlb_all();
Will
* [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses
2013-10-07 19:42 ` Nicolas Pitre
@ 2013-10-08 11:43 ` Sricharan R
0 siblings, 0 replies; 28+ messages in thread
From: Sricharan R @ 2013-10-08 11:43 UTC (permalink / raw)
To: linux-arm-kernel
On Tuesday 08 October 2013 01:12 AM, Nicolas Pitre wrote:
> On Mon, 7 Oct 2013, Santosh Shilimkar wrote:
>
>> Update patch below with your review tag for records.
>>
>> Regards,
>> Santosh
> Micronit:
>
>> .data
>> .globl __pv_phys_offset
>> .type __pv_phys_offset, %object
>> __pv_phys_offset:
>> - .long 0
>> - .size __pv_phys_offset, . - __pv_phys_offset
>> + .quad 0
>> +
>> + .data
>> + .globl __pv_offset
>> + .type __pv_offset, %object
>> __pv_offset:
>> - .long 0
>> + .quad 0
> Please keep the .size statement for __pv_phys_offset, and adding one for
> __pv_offset wouldn't hurt either. And the second .data is redundant.
>
Ok, will update this.
Regards,
Sricharan
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-08 10:26 ` Will Deacon
@ 2013-10-08 17:45 ` Santosh Shilimkar
2013-10-09 10:06 ` Will Deacon
0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-08 17:45 UTC (permalink / raw)
To: linux-arm-kernel
Hi Will,
On Tuesday 08 October 2013 06:26 AM, Will Deacon wrote:
> On Mon, Oct 07, 2013 at 08:34:41PM +0100, Santosh Shilimkar wrote:
>> Will,
>
> Hi Santosh,
>
[..]
>> +void __init early_paging_init(const struct machine_desc *mdesc,
>> + struct proc_info_list *procinfo)
>> +{
>> + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
>> + unsigned long map_start, map_end;
>> + pgd_t *pgd0, *pgdk;
>> + pud_t *pud0, *pudk, *pud_start;
>> + pmd_t *pmd0, *pmdk, *pmd_start;
>> + phys_addr_t phys;
>> + int i;
>> +
>> + /* remap kernel code and data */
>> + map_start = init_mm.start_code;
>> + map_end = init_mm.brk;
>> +
>> + /* get a handle on things... */
>> + pgd0 = pgd_offset_k(0);
>> + pud_start = pud0 = pud_offset(pgd0, 0);
>> + pmd0 = pmd_offset(pud0, 0);
>> +
>> + pgdk = pgd_offset_k(map_start);
>> + pudk = pud_offset(pgdk, map_start);
>> + pmd_start = pmdk = pmd_offset(pudk, map_start);
>> +
>> + phys = PHYS_OFFSET;
>> +
>> + if (mdesc->init_meminfo) {
>> + mdesc->init_meminfo();
>> + /* Run the patch stub to update the constants */
>> + fixup_pv_table(&__pv_table_begin,
>> + (&__pv_table_end - &__pv_table_begin) << 2);
>> +
>> + /*
>> + * Cache cleaning operations for self-modifying code
>> + * We should clean the entries by MVA but running a
>> + * for loop over every pv_table entry pointer would
>> + * just complicate the code. isb() is added to commit
>> + * all the prior cp15 operations.
>> + */
>> + flush_cache_louis();
>> + isb();
>
> I see, you need the new __pv_tables to be visible for your page table
> population below, right? In which case, I'm afraid I have to go back on my
> original statement; you *do* need that dsb() prior to the isb() if you want
> to ensure that the icache maintenance is complete and synchronised.
>
The need for dsb and isb is what the ARM ARM says, but then I got a bit
biased after your reply.
> However, this really looks like an issue with the v7 cache flushing
> routines. Why on Earth do they only guarantee completion on the D-side?
>
Indeed.
>> + }
>> +
>> + /* remap level 1 table */
>> + for (i = 0; i < PTRS_PER_PGD; i++) {
>> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
>> + pmd0 += PTRS_PER_PMD;
>> + }
>> +
>> + __cpuc_flush_dcache_area(pud_start, sizeof(pud_start) * PTRS_PER_PGD);
>> + outer_clean_range(virt_to_phys(pud_start), sizeof(pud_start) * PTRS_PER_PGD);
>
> You don't need to flush these page tables if you're SMP. If you use
> > clean_dcache_area instead, it will do the right thing. Then again, why can't
> you use pud_populate and pmd_populate for these two loops? Is there an
> interaction with coherency here? (if so, why don't you need to flush the
> entire cache hierarchy anyway?)
>
You mean ARMv7 SMP PT walkers can read from the L1 cache and hence don't need
L1 flushing. While this could be true, for some reason we don't see the same
behavior, and without the flush we are seeing the issue.
Initially we were doing entire cache flush but moved to the mva based
routines on your suggestion.
Regarding the pud_populate(), since we needed L_PGD_SWAPPER, we couldn't
use that version but updated patch uses the set_pud() which takes the flag.
And pmd_populate() can't be used either because it creates pte based
tables which is not what we want.
So the current working patch as it stands is end of the email. Do let
us know if we are missing anything for the PTW L1 allocation behavior.
Regards,
Santosh
From 832ea2ba84ad8a012ec7d4dad4d8085cca2cd598 Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Wed, 31 Jul 2013 12:44:46 -0400
Subject: [PATCH v3 5/8] ARM: mm: Recreate kernel mappings in
early_paging_init()
This patch adds a step in the init sequence, in order to recreate
the kernel code/data page table mappings prior to full paging
initialization. This is necessary on LPAE systems that run out of
a physical address space outside the 4G limit. On these systems,
this implementation provides a machine descriptor hook that allows
the PHYS_OFFSET to be overridden in a machine specific fashion.
Based on Cyril's initial patch. The pv_table needs to be patched
again after switching to higher address space.
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/include/asm/mach/arch.h | 1 +
arch/arm/kernel/setup.c | 4 ++
arch/arm/mm/mmu.c | 91 ++++++++++++++++++++++++++++++++++++++
3 files changed, 96 insertions(+)
diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 402a2bc..17a3fa2 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -49,6 +49,7 @@ struct machine_desc {
bool (*smp_init)(void);
void (*fixup)(struct tag *, char **,
struct meminfo *);
+ void (*init_meminfo)(void);
void (*reserve)(void);/* reserve mem blocks */
void (*map_io)(void);/* IO mapping function */
void (*init_early)(void);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0e1e2b3..af7b7db 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -73,6 +73,8 @@ __setup("fpe=", fpe_setup);
#endif
extern void paging_init(const struct machine_desc *desc);
+extern void early_paging_init(const struct machine_desc *,
+ struct proc_info_list *);
extern void sanity_check_meminfo(void);
extern enum reboot_mode reboot_mode;
extern void setup_dma_zone(const struct machine_desc *desc);
@@ -878,6 +880,8 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
+
+ early_paging_init(mdesc, lookup_processor_type(read_cpuid_id()));
sanity_check_meminfo();
arm_memblock_init(&meminfo, mdesc);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b1d17ee..e9e5276 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -28,6 +28,7 @@
#include <asm/highmem.h>
#include <asm/system_info.h>
#include <asm/traps.h>
+#include <asm/procinfo.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -1315,6 +1316,96 @@ static void __init map_lowmem(void)
}
}
+#ifdef CONFIG_ARM_LPAE
+extern void fixup_pv_table(const void *, unsigned long);
+extern const void *__pv_table_begin, *__pv_table_end;
+
+/*
+ * early_paging_init() recreates boot time page table setup, allowing machines
+ * to switch over to a high (>4G) address space on LPAE systems
+ */
+void __init early_paging_init(const struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+ pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
+ unsigned long map_start, map_end;
+ pgd_t *pgd0, *pgdk;
+ pud_t *pud0, *pudk, *pud_start;
+ pmd_t *pmd0, *pmdk, *pmd_start;
+ phys_addr_t phys;
+ int i;
+
+ /* remap kernel code and data */
+ map_start = init_mm.start_code;
+ map_end = init_mm.brk;
+
+ /* get a handle on things... */
+ pgd0 = pgd_offset_k(0);
+ pud_start = pud0 = pud_offset(pgd0, 0);
+ pmd0 = pmd_offset(pud0, 0);
+
+ pgdk = pgd_offset_k(map_start);
+ pudk = pud_offset(pgdk, map_start);
+ pmd_start = pmdk = pmd_offset(pudk, map_start);
+
+ if (mdesc->init_meminfo) {
+ mdesc->init_meminfo();
+ /* Run the patch stub to update the constants */
+ fixup_pv_table(&__pv_table_begin,
+ (&__pv_table_end - &__pv_table_begin) << 2);
+
+ /*
+ * Cache cleaning operations for self-modifying code
+ * We should clean the entries by MVA but running a
+ * for loop over every pv_table entry pointer would
+ * just complicate the code.
+ */
+ flush_cache_louis();
+ dsb();
+ isb();
+ }
+
+ /* remap level 1 table */
+ for (i = 0; i < PTRS_PER_PGD; pud0++, i++) {
+ set_pud(pud0,
+ __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
+ pmd0 += PTRS_PER_PMD;
+ }
+
+ __cpuc_flush_dcache_area(pud_start, sizeof(pud_start) * PTRS_PER_PGD);
+ outer_clean_range(virt_to_phys(pud_start),
+ sizeof(pud_start) * PTRS_PER_PGD);
+
+ /* remap pmds for kernel mapping */
+ phys = __pa(map_start) & PMD_MASK;
+ i = 0;
+ do {
+ *pmdk++ = __pmd(phys | pmdprot);
+ phys += PMD_SIZE;
+ i++;
+ } while (phys < map_end);
+
+ __cpuc_flush_dcache_area(pmd_start, sizeof(pmd_start) * i);
+ outer_clean_range(virt_to_phys(pmd_start),
+ sizeof(pmd_start) * i);
+
+ cpu_switch_mm(pgd0, &init_mm);
+ cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
+ local_flush_bp_all();
+ local_flush_tlb_all();
+}
+
+#else
+
+void __init early_paging_init(const struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+ if (mdesc->init_meminfo)
+ mdesc->init_meminfo();
+}
+
+#endif
+
/*
* paging_init() sets up the page tables, initialises the zone memory
* maps, and sets up the zero page, bad page and bad page tables.
--
1.7.9.5
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-08 17:45 ` Santosh Shilimkar
@ 2013-10-09 10:06 ` Will Deacon
2013-10-09 18:51 ` Santosh Shilimkar
0 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2013-10-09 10:06 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Oct 08, 2013 at 06:45:33PM +0100, Santosh Shilimkar wrote:
> On Tuesday 08 October 2013 06:26 AM, Will Deacon wrote:
> > On Mon, Oct 07, 2013 at 08:34:41PM +0100, Santosh Shilimkar wrote:
> >> + /*
> >> + * Cache cleaning operations for self-modifying code
> >> + * We should clean the entries by MVA but running a
> >> + * for loop over every pv_table entry pointer would
> >> + * just complicate the code. isb() is added to commit
> >> + * all the prior cp15 operations.
> >> + */
> >> + flush_cache_louis();
> >> + isb();
> >
> > I see, you need the new __pv_tables to be visible for your page table
> > population below, right? In which case, I'm afraid I have to go back on my
> > original statement; you *do* need that dsb() prior to the isb() if you want
> > to ensure that the icache maintenance is complete and synchronised.
> >
> The need for dsb and isb is what the ARM ARM says, but then I got a bit
> biased after your reply.
Yeah, sorry about that. I didn't originally notice that you needed the I-cache
flushing before the __pa stuff below.
> >> + }
> >> +
> >> + /* remap level 1 table */
> >> + for (i = 0; i < PTRS_PER_PGD; i++) {
> >> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
> >> + pmd0 += PTRS_PER_PMD;
> >> + }
> >> +
> >> + __cpuc_flush_dcache_area(pud_start, sizeof(pud_start) * PTRS_PER_PGD);
> >> + outer_clean_range(virt_to_phys(pud_start), sizeof(pud_start) * PTRS_PER_PGD);
> >
> > You don't need to flush these page tables if you're SMP. If you use
> > clean_dcache_area instead, it will do the right thing. Then again, why can't
> > you use pud_populate and pmd_populate for these two loops? Is there an
> > interaction with coherency here? (if so, why don't you need to flush the
> > entire cache hierarchy anyway?)
> >
> You mean ARMv7 SMP PT walkers can read from the L1 cache and hence don't need
> L1 flushing. While this could be true, for some reason we don't see the same
> behavior, and without the flush we are seeing the issue.
I would really like to know why this isn't working for you. I have a feeling
that it's related to your interesting coherency issues on keystone. For
example, if the physical address put in the ttbr doesn't match the physical
address which is mapped to the kernel page tables, then we could get
physical aliasing in the caches.
> Initially we were doing entire cache flush but moved to the mva based
> routines on your suggestion.
If the issue is related to coherency and physical aliasing, I really think
you should just flush the entire cache hierarchy. It's difficult to identify
exactly what state needs to be carried over between the old and new
mappings, but I bet it's more than just page tables.
> Regarding the pud_populate(), since we needed L_PGD_SWAPPER, we couldn't
> use that version but updated patch uses the set_pud() which takes the flag.
> And pmd_populate() can't be used either because it creates pte based
> tables which is not what we want.
Ok. It certainly looks better than it did.
Will
* [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init()
2013-10-09 10:06 ` Will Deacon
@ 2013-10-09 18:51 ` Santosh Shilimkar
0 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-09 18:51 UTC (permalink / raw)
To: linux-arm-kernel
On Wednesday 09 October 2013 06:06 AM, Will Deacon wrote:
> On Tue, Oct 08, 2013 at 06:45:33PM +0100, Santosh Shilimkar wrote:
>> On Tuesday 08 October 2013 06:26 AM, Will Deacon wrote:
>>> On Mon, Oct 07, 2013 at 08:34:41PM +0100, Santosh Shilimkar wrote:
>>>> + /*
>>>> + * Cache cleaning operations for self-modifying code
>>>> + * We should clean the entries by MVA but running a
>>>> + * for loop over every pv_table entry pointer would
>>>> + * just complicate the code. isb() is added to commit
>>>> + * all the prior cp15 operations.
>>>> + */
>>>> + flush_cache_louis();
>>>> + isb();
>>>
>>> I see, you need the new __pv_tables to be visible for your page table
>>> population below, right? In which case, I'm afraid I have to go back on my
>>> original statement; you *do* need that dsb() prior to the isb() if you want
>>> to ensure that the icache maintenance is complete and synchronised.
>>>
>> The need for dsb and isb is what the ARM ARM says, but then I got a bit
>> biased after your reply.
>
> Yeah, sorry about that. I didn't originally notice that you needed the I-cache
> flushing before the __pa stuff below.
>
No problem
>>>> + }
>>>> +
>>>> + /* remap level 1 table */
>>>> + for (i = 0; i < PTRS_PER_PGD; i++) {
>>>> + *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
>>>> + pmd0 += PTRS_PER_PMD;
>>>> + }
>>>> +
>>>> + __cpuc_flush_dcache_area(pud_start, sizeof(pud_start) * PTRS_PER_PGD);
>>>> + outer_clean_range(virt_to_phys(pud_start), sizeof(pud_start) * PTRS_PER_PGD);
>>>
>>> You don't need to flush these page tables if you're SMP. If you use
>>> clean_dcache_area instead, it will do the right thing. Then again, why can't
>>> you use pud_populate and pmd_populate for these two loops? Is there an
>>> interaction with coherency here? (if so, why don't you need to flush the
>>> entire cache hierarchy anyway?)
>>>
>> You mean ARMv7 SMP PT walkers can read from the L1 cache and hence don't need
>> L1 flushing. While this could be true, for some reason we don't see the same
>> behavior, and without the flush we are seeing the issue.
>
> I would really like to know why this isn't working for you. I have a feeling
> that it's related to your interesting coherency issues on keystone. For
> example, if the physical address put in the ttbr doesn't match the physical
> address which is mapped to the kernel page tables, then we could get
> physical aliasing in the caches.
>
It might be. we will keep debugging that.
>> Initially we were doing entire cache flush but moved to the mva based
>> routines on your suggestion.
>
> If the issue is related to coherency and physical aliasing, I really think
> you should just flush the entire cache hierarchy. It's difficult to identify
> exactly what state needs to be carried over between the old and new
> mappings, but I bet it's more than just page tables.
>
You are probably right. I will go back to the full flush to avoid any
corner case till we figure out the issue.
>> Regarding the pud_populate(), since we needed L_PGD_SWAPPER, we couldn't
>> use that version but updated patch uses the set_pud() which takes the flag.
>> And pmd_populate() can't be used either because it creates pte based
>> tables which is not what we want.
>
> Ok. It certainly looks better than it did.
>
Thanks a lot. I will refresh the patch with above update.
Regards,
Santosh
* [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations.
2013-10-04 16:03 ` Santosh Shilimkar
@ 2013-10-09 18:56 ` Santosh Shilimkar
0 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2013-10-09 18:56 UTC (permalink / raw)
To: linux-arm-kernel
Will,
On Friday 04 October 2013 12:03 PM, Santosh Shilimkar wrote:
> On Friday 04 October 2013 11:52 AM, Will Deacon wrote:
>> On Thu, Oct 03, 2013 at 10:18:00PM +0100, Santosh Shilimkar wrote:
>>> From: Sricharan R <r.sricharan@ti.com>
>>>
>>> As per the arm ARMv7 manual, the sequence of TLB maintenance
>>> operations after making changes to the translation table is
>>> to clean the dcache first, then invalidate the TLB. With
>>> the current sequence we see cache corruption when the
>>> flush_cache_all is called after tlb_flush_all.
>>>
>>> STR rx, [Translation table entry]
>>> ; write new entry to the translation table
>>> Clean cache line [Translation table entry]
>>> DSB
>>> ; ensures visibility of the data cleaned from the D Cache
>>> Invalidate TLB entry by MVA (and ASID if non-global) [page address]
>>> Invalidate BTC
>>> DSB
>>> ; ensure completion of the Invalidate TLB operation
>>> ISB
>>> ; ensure table changes visible to instruction fetch
>>>
>>> The issue is seen only with LPAE + THUMB BUILT KERNEL + 64BIT patching,
>>> which is a little bit weird.
>>
>> NAK.
>>
>> I don't buy your reasoning. All current LPAE implementations also implement
>> the multi-processing extensions, meaning that the cache flush isn't required
>> to make the PTEs visible to the table walker. The dsb from the TLB_WB flag
>> is sufficient, so I think you still have some debugging to do as this change
>> is likely masking a problem elsewhere.
>>
>> On top of that, create_mapping does all the flushing you need (for the !SMP
>> case) when the tables are initialised, so this code doesn't need changing.
>>
> Fair enough. We will drop this patch from this series and continue to look
> at the issue further. As such the patch has no hard dependency with rest of
> the series.
>
Just to update the thread, Sricharan has now tracked down this issue and
the 64-bit patching code is fixed.
Thanks for NAK ;)
Regards,
Santosh
Thread overview: 28+ messages
2013-10-03 21:17 [PATCH v3 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 1/6] ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
2013-10-03 21:17 ` [PATCH v3 4/6] ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
2013-10-04 0:17 ` Nicolas Pitre
2013-10-04 5:37 ` Sricharan R
2013-10-04 13:02 ` Nicolas Pitre
2013-10-07 19:25 ` Santosh Shilimkar
2013-10-07 19:42 ` Nicolas Pitre
2013-10-08 11:43 ` Sricharan R
2013-10-03 21:17 ` [PATCH v3 5/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
2013-10-04 0:23 ` Nicolas Pitre
2013-10-04 15:59 ` Will Deacon
2013-10-04 16:12 ` Santosh Shilimkar
2013-10-07 19:34 ` Santosh Shilimkar
2013-10-08 10:26 ` Will Deacon
2013-10-08 17:45 ` Santosh Shilimkar
2013-10-09 10:06 ` Will Deacon
2013-10-09 18:51 ` Santosh Shilimkar
2013-10-03 21:18 ` [PATCH v3 6/6] ARM: mm: Change the order of TLB/cache maintenance operations Santosh Shilimkar
2013-10-04 0:25 ` Nicolas Pitre
2013-10-04 8:46 ` Russell King - ARM Linux
2013-10-04 13:14 ` Nicolas Pitre
2013-10-04 13:19 ` Santosh Shilimkar
2013-10-04 15:52 ` Will Deacon
2013-10-04 16:03 ` Santosh Shilimkar
2013-10-09 18:56 ` Santosh Shilimkar