* [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems
@ 2013-07-31 16:44 Santosh Shilimkar
  2013-07-31 16:44 ` [PATCH v2 1/6] ARM: mm: LPAE: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

This is version 2 of the series, which addresses the comments/suggestions
we received on the earlier version. The series extends the existing
v2p runtime patching to LPAE machines which can have physical memory
beyond 4 GB. Keystone is one such ARM machine.

In testing the various modes (LPAE/non-LPAE, ARM/THUMB), we found an
issue with the patching code for THUMB2 builds, so a patch has been
added to the series to address it. I was planning to squash it into the
parent 64 bit patch but kept it separate for ease of review. We can
squash it while merging the patches.

Special thanks to Nicolas for his valuable feedback on the earlier
version.

Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>

Santosh Shilimkar (4):
  ARM: mm: LPAE: use phys_addr_t appropriately in p2v and v2p
    conversions
  ARM: mm: Introduce virt_to_idmap() with an arch hook
  ARM: mm: Move the idmap print to appropriate place in the code
  ARM: mm: Recreate kernel mappings in early_paging_init()

Sricharan R (2):
  ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical
    addresses
  ARM: mm: Update runtime patching code to THUMB2 mode

 arch/arm/include/asm/mach/arch.h |    1 +
 arch/arm/include/asm/memory.h    |   73 +++++++++++++++++++++---
 arch/arm/kernel/armksyms.c       |    1 +
 arch/arm/kernel/head.S           |  115 +++++++++++++++++++++++++++-----------
 arch/arm/kernel/setup.c          |    3 +
 arch/arm/kernel/smp.c            |    2 +-
 arch/arm/mm/idmap.c              |    8 +--
 arch/arm/mm/mmu.c                |   82 +++++++++++++++++++++++++++
 8 files changed, 237 insertions(+), 48 deletions(-)

-- 
1.7.9.5


* [PATCH v2 1/6] ARM: mm: LPAE: use phys_addr_t appropriately in p2v and v2p conversions
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
@ 2013-07-31 16:44 ` Santosh Shilimkar
  2013-07-31 16:44 ` [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Fix the remaining types used when converting back and forth between
physical and virtual addresses.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/include/asm/memory.h |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index e750a93..c133bd9 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -185,22 +185,32 @@ extern unsigned long __pv_phys_offset;
 	: "=r" (to)					\
 	: "r" (from), "I" (type))
 
-static inline unsigned long __virt_to_phys(unsigned long x)
+static inline phys_addr_t __virt_to_phys(unsigned long x)
 {
 	unsigned long t;
 	__pv_stub(x, t, "add", __PV_BITS_31_24);
 	return t;
 }
 
-static inline unsigned long __phys_to_virt(unsigned long x)
+static inline unsigned long __phys_to_virt(phys_addr_t x)
 {
 	unsigned long t;
 	__pv_stub(x, t, "sub", __PV_BITS_31_24);
 	return t;
 }
+
 #else
-#define __virt_to_phys(x)	((x) - PAGE_OFFSET + PHYS_OFFSET)
-#define __phys_to_virt(x)	((x) - PHYS_OFFSET + PAGE_OFFSET)
+
+static inline phys_addr_t __virt_to_phys(unsigned long x)
+{
+	return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
+}
+
+static inline unsigned long __phys_to_virt(phys_addr_t x)
+{
+	return x - PHYS_OFFSET + PAGE_OFFSET;
+}
+
 #endif
 #endif
 #endif /* __ASSEMBLY__ */
@@ -238,14 +248,14 @@ static inline phys_addr_t virt_to_phys(const volatile void *x)
 
 static inline void *phys_to_virt(phys_addr_t x)
 {
-	return (void *)(__phys_to_virt((unsigned long)(x)));
+	return (void *)__phys_to_virt(x);
 }
 
 /*
  * Drivers should NOT use these either.
  */
 #define __pa(x)			__virt_to_phys((unsigned long)(x))
-#define __va(x)			((void *)__phys_to_virt((unsigned long)(x)))
+#define __va(x)			((void *)__phys_to_virt((phys_addr_t)(x)))
 #define pfn_to_kaddr(pfn)	__va((pfn) << PAGE_SHIFT)
 
 /*
-- 
1.7.9.5


* [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
  2013-07-31 16:44 ` [PATCH v2 1/6] ARM: mm: LPAE: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
@ 2013-07-31 16:44 ` Santosh Shilimkar
  2013-08-03  1:53   ` Nicolas Pitre
  2013-07-31 16:44 ` [PATCH v2 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

On some PAE systems (e.g. TI Keystone), memory is above the
32-bit addressable limit, and the interconnect provides an
aliased view of parts of physical memory in the 32-bit addressable
space.  This alias is strictly for boot time usage, and is not
otherwise usable because of coherency limitations. On such systems,
the idmap mechanism needs to take this aliased mapping into account.

This patch introduces virt_to_idmap() and an arch function pointer which
can be populated by platforms that need it. The necessary idmap spots are
also updated to use the now available virt_to_idmap(). An #ifdef approach
is avoided so the code stays compatible with multi-platform builds.

Most architectures won't touch it, in which case virt_to_idmap() falls
back to the existing virt_to_phys() macro.
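
As an illustration only, a platform needing such an alias could populate
the hook roughly like this; the keystone_* names and the
KEYSTONE_LOW_PHYS_START constant are assumptions for the sketch, not
part of this patch:

/* Hypothetical platform code (sketch), assuming <asm/memory.h>. */
static phys_addr_t keystone_virt_to_idmap(unsigned long x)
{
	/* Map a kernel virtual address to the low 32-bit boot alias. */
	return (phys_addr_t)x - PAGE_OFFSET + KEYSTONE_LOW_PHYS_START;
}

static void __init keystone_init_early(void)
{
	arch_virt_to_idmap = keystone_virt_to_idmap;
}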

Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/include/asm/memory.h |   16 ++++++++++++++++
 arch/arm/kernel/smp.c         |    2 +-
 arch/arm/mm/idmap.c           |    5 +++--
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index c133bd9..d9b96c65 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -173,6 +173,7 @@
  */
 #define __PV_BITS_31_24	0x81000000
 
+extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
 extern unsigned long __pv_phys_offset;
 #define PHYS_OFFSET __pv_phys_offset
 
@@ -259,6 +260,21 @@ static inline void *phys_to_virt(phys_addr_t x)
 #define pfn_to_kaddr(pfn)	__va((pfn) << PAGE_SHIFT)
 
 /*
+ * These are for systems that have a hardware interconnect supported alias of
+ * physical memory for idmap purposes.  Most cases should leave these
+ * untouched.
+ */
+static inline phys_addr_t __virt_to_idmap(unsigned long x)
+{
+	if (arch_virt_to_idmap)
+		return arch_virt_to_idmap(x);
+	else
+		return __virt_to_phys(x);
+}
+
+#define virt_to_idmap(x)	__virt_to_idmap((unsigned long)(x))
+
+/*
  * Virtual <-> DMA view memory address translations
  * Again, these are *only* valid on the kernel direct mapped RAM
  * memory.  Use of these is *deprecated* (and that doesn't mean
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index c2b4f8f..d36523d 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -81,7 +81,7 @@ void __init smp_set_ops(struct smp_operations *ops)
 
 static unsigned long get_arch_pgd(pgd_t *pgd)
 {
-	phys_addr_t pgdir = virt_to_phys(pgd);
+	phys_addr_t pgdir = virt_to_idmap(pgd);
 	BUG_ON(pgdir & ARCH_PGD_MASK);
 	return pgdir >> ARCH_PGD_SHIFT;
 }
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 83cb3ac..c0a1e48 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -10,6 +10,7 @@
 #include <asm/system_info.h>
 
 pgd_t *idmap_pgd;
+phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
 
 #ifdef CONFIG_ARM_LPAE
 static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
@@ -67,8 +68,8 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
 	unsigned long addr, end;
 	unsigned long next;
 
-	addr = virt_to_phys(text_start);
-	end = virt_to_phys(text_end);
+	addr = virt_to_idmap(text_start);
+	end = virt_to_idmap(text_end);
 
 	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-- 
1.7.9.5


* [PATCH v2 3/6] ARM: mm: Move the idmap print to appropriate place in the code
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
  2013-07-31 16:44 ` [PATCH v2 1/6] ARM: mm: LPAE: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
  2013-07-31 16:44 ` [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
@ 2013-07-31 16:44 ` Santosh Shilimkar
  2013-08-03  1:55   ` Nicolas Pitre
  2013-07-31 16:44 ` [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Commit 9e9a367c29cebd2 ("ARM: Section based HYP idmap") moved
the address conversion inside identity_mapping_add() without the
corresponding print, which carries useful idmap information.

Move the print inside identity_mapping_add() as well to fix this.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mm/idmap.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index c0a1e48..8e0e52e 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -70,6 +70,7 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
 
 	addr = virt_to_idmap(text_start);
 	end = virt_to_idmap(text_end);
+	pr_info("Setting up static identity map for 0x%lx - 0x%lx\n", addr, end);
 
 	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
@@ -91,8 +92,6 @@ static int __init init_static_idmap(void)
 	if (!idmap_pgd)
 		return -ENOMEM;
 
-	pr_info("Setting up static identity map for 0x%p - 0x%p\n",
-		__idmap_text_start, __idmap_text_end);
 	identity_mapping_add(idmap_pgd, __idmap_text_start,
 			     __idmap_text_end, 0);
 
-- 
1.7.9.5


* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
                   ` (2 preceding siblings ...)
  2013-07-31 16:44 ` [PATCH v2 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
@ 2013-07-31 16:44 ` Santosh Shilimkar
  2013-07-31 18:31   ` Sricharan R
  2013-08-03  3:28   ` Nicolas Pitre
  2013-07-31 16:44 ` [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode Santosh Shilimkar
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

From: Sricharan R <r.sricharan@ti.com>

The current phys_to_virt patching mechanism does not work
for 64 bit physical addresses. Note that the constant used in add/sub
instructions is encoded into the last 8 bits of the opcode, so shift
the __pv_offset constant by 24 to get it into the correct place.

The v2p patching mechanism patches the higher 32 bits of the physical
address with a constant. While this is correct, on platforms where the
lowmem addressable physical memory spans across the 4GB boundary, a
carry bit can be produced as a result of the addition of the lower
32 bits. This has to be taken into account and added into the upper
32 bits. The patched __pv_offset and va are added in the lower 32 bits,
where __pv_offset can be in two's complement form when
PA_START < VA_START, and that can result in a false carry bit.

e.g. PA = 0x80000000, VA = 0xC0000000
__pv_offset = PA - VA = 0xC0000000 (2's complement)

So adding __pv_offset + VA should never result in a true overflow. In
order to differentiate a true carry from a false one, __pv_offset is
extended to 64 bits and the upper 32 bits will hold 0xffffffff when
__pv_offset is in two's complement form. We use this to insert an
'mvn #0' instead of a 'mov' while patching.

The above idea was suggested by Nicolas Pitre <nico@linaro.org> as
part of the review of the first version of this patch.

There is no corresponding change on the phys_to_virt() side, because
computations on the upper 32-bits would be discarded anyway.
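
For illustration, here is a minimal C sketch (not part of the patch) of
the arithmetic the patched adds/adc sequence performs, using the example
values above:

#include <stdint.h>

/* Sketch only: __pv_offset is 0xffffffffc0000000 for the PA/VA above. */
static uint64_t v2p_sketch(uint32_t va, uint64_t pv_offset)
{
	uint32_t lo = (uint32_t)pv_offset + va;			/* adds   */
	uint32_t carry = lo < va;				/* C flag */
	uint32_t hi = (uint32_t)(pv_offset >> 32) + carry;	/* adc #0 */

	return ((uint64_t)hi << 32) | lo;
}

/*
 * v2p_sketch(0xC0000000, 0xffffffffc0000000ULL) == 0x80000000: the false
 * carry is absorbed by the 0xffffffff (i.e. 'mvn #0') high word.
 */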

Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>

Signed-off-by: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/include/asm/memory.h |   35 +++++++++++++++++++--
 arch/arm/kernel/armksyms.c    |    1 +
 arch/arm/kernel/head.S        |   68 +++++++++++++++++++++++++++++++----------
 3 files changed, 85 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index d9b96c65..abe879d 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -174,7 +174,9 @@
 #define __PV_BITS_31_24	0x81000000
 
 extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
-extern unsigned long __pv_phys_offset;
+extern phys_addr_t __pv_phys_offset;
+extern phys_addr_t __pv_offset;
+
 #define PHYS_OFFSET __pv_phys_offset
 
 #define __pv_stub(from,to,instr,type)			\
@@ -186,10 +188,37 @@ extern unsigned long __pv_phys_offset;
 	: "=r" (to)					\
 	: "r" (from), "I" (type))
 
+#define __pv_stub_mov_hi(t)				\
+	__asm__ volatile("@ __pv_stub_mov\n"		\
+	"1:	mov	%R0, %1\n"			\
+	"	.pushsection .pv_table,\"a\"\n"		\
+	"	.long	1b\n"				\
+	"	.popsection\n"				\
+	: "=r" (t)					\
+	: "I" (__PV_BITS_31_24))
+
+#define __pv_add_carry_stub(x, y)			\
+	__asm__ volatile("@ __pv_add_carry_stub\n"	\
+	"1:	adds	%Q0, %1, %2\n"			\
+	"2:	adc	%R0, %R0, #0\n"			\
+	"	.pushsection .pv_table,\"a\"\n"		\
+	"	.long	1b\n"				\
+	"	.long	2b\n"				\
+	"	.popsection\n"				\
+	: "+r" (y)					\
+	: "r" (x), "I" (__PV_BITS_31_24)		\
+	: "cc")
+
 static inline phys_addr_t __virt_to_phys(unsigned long x)
 {
-	unsigned long t;
-	__pv_stub(x, t, "add", __PV_BITS_31_24);
+	phys_addr_t t = 0;
+
+	if (sizeof(phys_addr_t) == 4) {
+		__pv_stub(x, t, "add", __PV_BITS_31_24);
+	} else {
+		__pv_stub_mov_hi(t);
+		__pv_add_carry_stub(x, t);
+	}
 	return t;
 }
 
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..1f031dd 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
 
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
 EXPORT_SYMBOL(__pv_phys_offset);
+EXPORT_SYMBOL(__pv_offset);
 #endif
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 9cf6063..aa3b0f7 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -545,17 +545,22 @@ ENDPROC(fixup_smp)
 	__HEAD
 __fixup_pv_table:
 	adr	r0, 1f
-	ldmia	r0, {r3-r5, r7}
+	ldmia	r0, {r3-r7}
+	cmp	r0, r3
+	mvn	ip, #0
 	sub	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET
 	add	r4, r4, r3	@ adjust table start address
 	add	r5, r5, r3	@ adjust table end address
-	add	r7, r7, r3	@ adjust __pv_phys_offset address
-	str	r8, [r7]	@ save computed PHYS_OFFSET to __pv_phys_offset
+	add	r6, r6, r3	@ adjust __pv_phys_offset address
+	add	r7, r7, r3	@ adjust __pv_offset address
+	str	r8, [r6]	@ save computed PHYS_OFFSET to __pv_phys_offset
+	strcc	ip, [r7, #4]	@ save to __pv_offset high bits
 	mov	r6, r3, lsr #24	@ constant for add/sub instructions
 	teq	r3, r6, lsl #24 @ must be 16MiB aligned
 THUMB(	it	ne		@ cross section branch )
 	bne	__error
-	str	r6, [r7, #4]	@ save to __pv_offset
+	lsl	r6, r6, #24
+	str	r6, [r7]	@ save to __pv_offset low bits
 	b	__fixup_a_pv_table
 ENDPROC(__fixup_pv_table)
 
@@ -564,6 +569,7 @@ ENDPROC(__fixup_pv_table)
 	.long	__pv_table_begin
 	.long	__pv_table_end
 2:	.long	__pv_phys_offset
+	.long	__pv_offset
 
 	.text
 __fixup_a_pv_table:
@@ -589,27 +595,53 @@ __fixup_a_pv_table:
 	bcc	1b
 	bx	lr
 #else
-	b	2f
+	adr	r0, 5f
+	b	4f
 1:	ldr	ip, [r7, r3]
-	bic	ip, ip, #0x000000ff
-	orr	ip, ip, r6	@ mask in offset bits 31-24
-	str	ip, [r7, r3]
-2:	cmp	r4, r5
+	lsr	r6, ip, #20		@ extract opcode
+	and	r6, r6, #0x3e
+	cmp	r6, #0x28		@ check for 'add' instruction
+	beq	2f
+	cmp	r6, #0x24		@ check for 'sub' instruction
+	beq	2f
+	cmp	r6, #0x2a		@ check for 'adc' instruction
+	beq	4f
+	ldr	r6, [r0]
+	add	r6, r6, r3
+	ldr	r6, [r6, #4]
+	mvn	r11, #0
+	cmp	r11, r6
+	and	ip, ip, #0xf000		@ Register encoded in inst
+	orrne	ip, ip, r6
+	ldreq	r6, [r0, #0x4]		@ mvn if _pv_offset high bits is 0xffffffff
+	ldrne	r6, [r0, #0x8]		@ mov otherwise
+	bic	r6, r6, #0xff
+	bic	r6, r6, #0xf00
+	orr	ip, ip, r6
+	b	3f
+2:	ldr	r6, [r0]
+	ldr	r6, [r6, r3]
+	bic	ip, ip, #0xff
+	orr	ip, ip, r6, lsr #24	@ mask in offset bits 31-24
+3:	str	ip, [r7, r3]
+4:	cmp	r4, r5
 	ldrcc	r7, [r4], #4	@ use branch for delay slot
 	bcc	1b
 	mov	pc, lr
 #endif
 ENDPROC(__fixup_a_pv_table)
 
+5:	.long __pv_offset
+	mvn	r0, #0
+	mov	r0, #0x81000000 @ For getting the correct 4 byte encoding
+
 ENTRY(fixup_pv_table)
-	stmfd	sp!, {r4 - r7, lr}
-	ldr	r2, 2f			@ get address of __pv_phys_offset
+	stmfd	sp!, {r0, r3 - r7, r11 - r12, lr}
 	mov	r3, #0			@ no offset
 	mov	r4, r0			@ r0 = table start
 	add	r5, r0, r1		@ r1 = table size
-	ldr	r6, [r2, #4]		@ get __pv_offset
 	bl	__fixup_a_pv_table
-	ldmfd	sp!, {r4 - r7, pc}
+	ldmfd	sp!, {r0, r3 - r7, r11 - r12, pc}
 ENDPROC(fixup_pv_table)
 
 	.align
@@ -619,10 +651,14 @@ ENDPROC(fixup_pv_table)
 	.globl	__pv_phys_offset
 	.type	__pv_phys_offset, %object
 __pv_phys_offset:
-	.long	0
-	.size	__pv_phys_offset, . - __pv_phys_offset
+	.quad	0
+
+	.data
+	.globl __pv_offset
+	.type __pv_offset, %object
 __pv_offset:
-	.long	0
+	.quad   0
+
 #endif
 
 #include "head-common.S"
-- 
1.7.9.5


* [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
                   ` (3 preceding siblings ...)
  2013-07-31 16:44 ` [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
@ 2013-07-31 16:44 ` Santosh Shilimkar
  2013-08-03  3:40   ` Nicolas Pitre
  2013-07-31 16:44 ` [PATCH v2 6/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
  2013-08-03  1:52 ` [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Nicolas Pitre
  6 siblings, 1 reply; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

From: Sricharan R <r.sricharan@ti.com>

Update the runtime patching code to support Thumb2. This issue was
uncovered while testing the 64 bit patching code.

For ease of review, the patch is kept separate. If needed, it can be
merged into "ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit
physical addresses".

Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>

Signed-off-by: Sricharan R <r.sricharan@ti.com>
[santosh.shilimkar@ti.com: reduced #ifdef, updated commit log]
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/kernel/head.S |   69 ++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index aa3b0f7..a70d330 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -573,62 +573,73 @@ ENDPROC(__fixup_pv_table)
 
 	.text
 __fixup_a_pv_table:
-#ifdef CONFIG_THUMB2_KERNEL
-	lsls	r6, #24
-	beq	2f
-	clz	r7, r6
-	lsr	r6, #24
-	lsl	r6, r7
-	bic	r6, #0x0080
-	lsrs	r7, #1
-	orrcs	r6, #0x0080
-	orr	r6, r6, r7, lsl #12
-	orr	r6, #0x4000
-	b	2f
-1:	add     r7, r3
-	ldrh	ip, [r7, #2]
-	and	ip, 0x8f00
-	orr	ip, r6	@ mask in offset bits 31-24
-	strh	ip, [r7, #2]
-2:	cmp	r4, r5
-	ldrcc	r7, [r4], #4	@ use branch for delay slot
-	bcc	1b
-	bx	lr
-#else
 	adr	r0, 5f
 	b	4f
 1:	ldr	ip, [r7, r3]
+ THUMB(	1:	add	r7, r3)
+ THUMB(	ldrh	ip, [r7])
+ THUMB(	ldrh	r6, [r7, #2])
+ THUMB(	orr	ip, r6, ip, lsl #16)
+ ARM( 1: ldr	ip, [r7, r3])
 	lsr	r6, ip, #20		@ extract opcode
 	and	r6, r6, #0x3e
-	cmp	r6, #0x28		@ check for 'add' instruction
+ THUMB(	cmp	r6, #0x10)
+ ARM(	cmp	r6, #0x28)		@ check for 'add' instruction
 	beq	2f
-	cmp	r6, #0x24		@ check for 'sub' instruction
+ THUMB( cmp	r6, #0x1a)
+ ARM(	cmp	r6, #0x24)		@ check for 'sub' instruction
 	beq	2f
-	cmp	r6, #0x2a		@ check for 'adc' instruction
+ THUMB( cmp	r6, #0x14)
+ ARM(	cmp	r6, #0x2a)		@ check for 'adc' instruction
 	beq	4f
 	ldr	r6, [r0]
 	add	r6, r6, r3
 	ldr	r6, [r6, #4]
 	mvn	r11, #0
 	cmp	r11, r6
-	and	ip, ip, #0xf000		@ Register encoded in inst
+ THUMB( and	ip, ip, #0xf00)
+ ARM(	and	ip, ip, #0xf000)	@ Register encoded in inst
 	orrne	ip, ip, r6
 	ldreq	r6, [r0, #0x4]		@ mvn if _pv_offset high bits is 0xffffffff
 	ldrne	r6, [r0, #0x8]		@ mov otherwise
-	bic	r6, r6, #0xff
-	bic	r6, r6, #0xf00
+ THUMB(	mov	r11, r6, lsr #16)
+ THUMB(	mov	r6, r6, lsl #16)
+ THUMB(	orr	r6, r11, r6)
+ THUMB(	bic	r6, r6, #0x7000)
+ THUMB(	bic	r6, r6, #0xff)
+ ARM(	bic	r6, r6, #0xff)
+ ARM(	bic	r6, r6, #0xf00)
 	orr	ip, ip, r6
 	b	3f
 2:	ldr	r6, [r0]
 	ldr	r6, [r6, r3]
+#ifdef CONFIG_THUMB2_KERNEL
+	cmp	r6, #0
+	beq	6f
+	clz	r11, r6
+	lsr	r6, #24
+	lsl	r6, r11
+	bic	r6, #0x0080
+	lsrs	r11, #1
+	orrcs	r6, #0x0080
+	orr	r6, r6, r11, lsl #12
+	orr	r6, #0x4000
+6:	bic	ip, ip, #0x7000
+	bic	ip, ip, #0xff
+	orr	ip, ip, r6		@ mask in offset bits 31-24
+3:	strh	ip, [r7, #2]
+	mov	r6, ip, lsr #16
+	strh	r6, [r7]
+#else
+
 	bic	ip, ip, #0xff
 	orr	ip, ip, r6, lsr #24	@ mask in offset bits 31-24
 3:	str	ip, [r7, r3]
+#endif
 4:	cmp	r4, r5
 	ldrcc	r7, [r4], #4	@ use branch for delay slot
 	bcc	1b
 	mov	pc, lr
-#endif
 ENDPROC(__fixup_a_pv_table)
 
 5:	.long __pv_offset
-- 
1.7.9.5


* [PATCH v2 6/6] ARM: mm: Recreate kernel mappings in early_paging_init()
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
                   ` (4 preceding siblings ...)
  2013-07-31 16:44 ` [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode Santosh Shilimkar
@ 2013-07-31 16:44 ` Santosh Shilimkar
  2013-08-03  1:52 ` [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Nicolas Pitre
  6 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-07-31 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds a step in the init sequence, in order to recreate
the kernel code/data page table mappings prior to full paging
initialization.  This is necessary on LPAE systems that run from
physical address space located outside the 4G limit.  On these systems,
this implementation provides a machine descriptor hook that allows
the PHYS_OFFSET to be overridden in a machine specific fashion.

Based on Cyril's initial patch. The pv_table needs to be patched
again after switching to the higher address space.
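
As an illustration only, a machine descriptor could use the new hook
roughly as below; the keystone_* names and KEYSTONE_HIGH_PHYS_START are
assumptions for the sketch, not something this series defines:

/* Hypothetical board code (sketch). */
static void __init keystone_init_meminfo(void)
{
	/*
	 * Switch the runtime patching constants to the >4G view of RAM.
	 * early_paging_init() then re-runs fixup_pv_table() and recreates
	 * the kernel mappings with the new PHYS_OFFSET.
	 */
	__pv_phys_offset = KEYSTONE_HIGH_PHYS_START;
	__pv_offset = KEYSTONE_HIGH_PHYS_START - PAGE_OFFSET;
}

DT_MACHINE_START(KEYSTONE, "Keystone")
	.init_meminfo	= keystone_init_meminfo,
	/* ... other fields ... */
MACHINE_END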

Cc: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>

Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/include/asm/mach/arch.h |    1 +
 arch/arm/kernel/setup.c          |    3 ++
 arch/arm/mm/mmu.c                |   82 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 441efc4..8fa734f 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -49,6 +49,7 @@ struct machine_desc {
 	bool			(*smp_init)(void);
 	void			(*fixup)(struct tag *, char **,
 					 struct meminfo *);
+	void			(*init_meminfo)(void);
 	void			(*reserve)(void);/* reserve mem blocks	*/
 	void			(*map_io)(void);/* IO mapping function	*/
 	void			(*init_early)(void);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 63af9a7..a554d7e 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -72,6 +72,7 @@ static int __init fpe_setup(char *line)
 __setup("fpe=", fpe_setup);
 #endif
 
+extern void early_paging_init(struct machine_desc *, struct proc_info_list *);
 extern void paging_init(struct machine_desc *desc);
 extern void sanity_check_meminfo(void);
 extern enum reboot_mode reboot_mode;
@@ -876,6 +877,8 @@ void __init setup_arch(char **cmdline_p)
 	parse_early_param();
 
 	sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
+
+	early_paging_init(mdesc, lookup_processor_type(read_cpuid_id()));
 	sanity_check_meminfo();
 	arm_memblock_init(&meminfo, mdesc);
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4f56617..7910656 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -28,6 +28,7 @@
 #include <asm/highmem.h>
 #include <asm/system_info.h>
 #include <asm/traps.h>
+#include <asm/procinfo.h>
 
 #include <asm/mach/arch.h>
 #include <asm/mach/map.h>
@@ -1268,6 +1269,87 @@ static void __init map_lowmem(void)
 	}
 }
 
+#ifdef CONFIG_ARM_LPAE
+extern void fixup_pv_table(const void *, unsigned long);
+extern const void *__pv_table_begin, *__pv_table_end;
+
+/*
+ * early_paging_init() recreates boot time page table setup, allowing machines
+ * to switch over to a high (>4G) address space on LPAE systems
+ */
+void __init early_paging_init(struct machine_desc *mdesc,
+			      struct proc_info_list *procinfo)
+{
+	pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
+	unsigned long map_start, map_end;
+	pgd_t *pgd0, *pgdk;
+	pud_t *pud0, *pudk;
+	pmd_t *pmd0, *pmdk;
+	phys_addr_t phys;
+	int i;
+
+	/* remap kernel code and data */
+	map_start = init_mm.start_code;
+	map_end   = init_mm.brk;
+
+	/* get a handle on things... */
+	pgd0 = pgd_offset_k(0);
+	pud0 = pud_offset(pgd0, 0);
+	pmd0 = pmd_offset(pud0, 0);
+
+	pgdk = pgd_offset_k(map_start);
+	pudk = pud_offset(pgdk, map_start);
+	pmdk = pmd_offset(pudk, map_start);
+
+	phys = PHYS_OFFSET;
+
+	if (mdesc->init_meminfo) {
+		mdesc->init_meminfo();
+		/* Run the patch stub to update the constants */
+		fixup_pv_table(&__pv_table_begin,
+			(&__pv_table_end - &__pv_table_begin) << 2);
+
+		/*
+		 * Cache cleaning operations for self-modifying code
+		 * We should clean the entries by MVA but running a
+		 * for loop over every pv_table entry pointer would
+		 * just complicate the code.
+		 */
+		flush_cache_louis();
+		dsb();
+		isb();
+	}
+
+	/* remap level 1 table */
+	for (i = 0; i < PTRS_PER_PGD; i++) {
+		*pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
+		pmd0 += PTRS_PER_PMD;
+	}
+
+	/* remap pmds for kernel mapping */
+	phys = __pa(map_start) & PMD_MASK;
+	do {
+		*pmdk++ = __pmd(phys | pmdprot);
+		phys += PMD_SIZE;
+	} while (phys < map_end);
+
+	flush_cache_all();
+	cpu_set_ttbr(0, __pa(pgd0));
+	cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
+	local_flush_tlb_all();
+}
+
+#else
+
+void __init early_paging_init(struct machine_desc *mdesc,
+			      struct proc_info_list *procinfo)
+{
+	if (mdesc->init_meminfo)
+		mdesc->init_meminfo();
+}
+
+#endif
+
 /*
  * paging_init() sets up the page tables, initialises the zone memory
  * maps, and sets up the zero page, bad page and bad page tables.
-- 
1.7.9.5


* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-07-31 16:44 ` [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
@ 2013-07-31 18:31   ` Sricharan R
  2013-08-03  3:32     ` Nicolas Pitre
  2013-08-03  3:28   ` Nicolas Pitre
  1 sibling, 1 reply; 26+ messages in thread
From: Sricharan R @ 2013-07-31 18:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,
On Wednesday 31 July 2013 10:14 PM, Santosh Shilimkar wrote:
> From: Sricharan R <r.sricharan@ti.com>
>
> The current phys_to_virt patching mechanism does not work
> for 64 bit physical addressesp. Note that constant used in add/sub
> instructions is encoded in to the last 8 bits of the opcode. So shift
> the _pv_offset constant by 24 to get it in to the correct place.
>
> The v2p patching mechanism patches the higher 32bits of physical
> address with a constant. While this is correct, in those platforms
> where the lowmem addressable physical memory spawns across 4GB boundary,
> a carry bit can be produced as a result of addition of lower 32bits.
> This has to be taken in to account and added in to the upper. The patched
> __pv_offset and va are added in lower 32bits, where __pv_offset can be
> in two's complement form when PA_START < VA_START and that can result
> in a false carry bit.
>
> e.g PA = 0x80000000 VA = 0xC0000000
> __pv_offset = PA - VA = 0xC0000000 (2's complement)
>
> So adding __pv_offset + VA should never result in a true overflow. So in
> order to differentiate between a true carry, a __pv_offset is extended
> to 64bit and the upper 32bits will have 0xffffffff if __pv_offset is
> 2's complement. We are using the same to insert 'mvn #0' instead of
> 'mov' while patching.
>
> The above idea was suggested by Nicolas Pitre <nico@linaro.org> as
> part of the review of first version of the subject patch.
>
> There is no corresponding change on the phys_to_virt() side, because
> computations on the upper 32-bits would be discarded anyway.
>
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
>
> Signed-off-by: Sricharan R <r.sricharan@ti.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/include/asm/memory.h |   35 +++++++++++++++++++--
>  arch/arm/kernel/armksyms.c    |    1 +
>  arch/arm/kernel/head.S        |   68 +++++++++++++++++++++++++++++++----------
>  3 files changed, 85 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index d9b96c65..abe879d 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -174,7 +174,9 @@
>  #define __PV_BITS_31_24	0x81000000
>  
>  extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
> -extern unsigned long __pv_phys_offset;
> +extern phys_addr_t __pv_phys_offset;
> +extern phys_addr_t __pv_offset;
> +
>  #define PHYS_OFFSET __pv_phys_offset
>  
>  #define __pv_stub(from,to,instr,type)			\
> @@ -186,10 +188,37 @@ extern unsigned long __pv_phys_offset;
>  	: "=r" (to)					\
>  	: "r" (from), "I" (type))
>  
> +#define __pv_stub_mov_hi(t)				\
> +	__asm__ volatile("@ __pv_stub_mov\n"		\
> +	"1:	mov	%R0, %1\n"			\
> +	"	.pushsection .pv_table,\"a\"\n"		\
> +	"	.long	1b\n"				\
> +	"	.popsection\n"				\
> +	: "=r" (t)					\
> +	: "I" (__PV_BITS_31_24))
> +
> +#define __pv_add_carry_stub(x, y)			\
> +	__asm__ volatile("@ __pv_add_carry_stub\n"	\
> +	"1:	adds	%Q0, %1, %2\n"			\
> +	"2:	adc	%R0, %R0, #0\n"			\
> +	"	.pushsection .pv_table,\"a\"\n"		\
> +	"	.long	1b\n"				\
> +	"	.long	2b\n"				\
> +	"	.popsection\n"				\
> +	: "+r" (y)					\
> +	: "r" (x), "I" (__PV_BITS_31_24)		\
> +	: "cc")
> +
>  static inline phys_addr_t __virt_to_phys(unsigned long x)
>  {
> -	unsigned long t;
> -	__pv_stub(x, t, "add", __PV_BITS_31_24);
> +	phys_addr_t t = 0;
> +
> +	if (sizeof(phys_addr_t) == 4) {
> +		__pv_stub(x, t, "add", __PV_BITS_31_24);
> +	} else {
> +		__pv_stub_mov_hi(t);
> +		__pv_add_carry_stub(x, t);
> +	}
>  	return t;
>  }
>  
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..1f031dd 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
>  
>  #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>  EXPORT_SYMBOL(__pv_phys_offset);
> +EXPORT_SYMBOL(__pv_offset);
>  #endif
> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> index 9cf6063..aa3b0f7 100644
> --- a/arch/arm/kernel/head.S
> +++ b/arch/arm/kernel/head.S
> @@ -545,17 +545,22 @@ ENDPROC(fixup_smp)
>  	__HEAD
>  __fixup_pv_table:
>  	adr	r0, 1f
> -	ldmia	r0, {r3-r5, r7}
> +	ldmia	r0, {r3-r7}
> +	cmp	r0, r3
> +	mvn	ip, #0
>  	sub	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET
>  	add	r4, r4, r3	@ adjust table start address
>  	add	r5, r5, r3	@ adjust table end address
> -	add	r7, r7, r3	@ adjust __pv_phys_offset address
> -	str	r8, [r7]	@ save computed PHYS_OFFSET to __pv_phys_offset
> +	add	r6, r6, r3	@ adjust __pv_phys_offset address
> +	add	r7, r7, r3	@ adjust __pv_offset address
> +	str	r8, [r6]	@ save computed PHYS_OFFSET to __pv_phys_offset
> +	strcc	ip, [r7, #4]	@ save to __pv_offset high bits
>  	mov	r6, r3, lsr #24	@ constant for add/sub instructions
>  	teq	r3, r6, lsl #24 @ must be 16MiB aligned
>  THUMB(	it	ne		@ cross section branch )
>  	bne	__error
> -	str	r6, [r7, #4]	@ save to __pv_offset
> +	lsl	r6, r6, #24
> +	str	r6, [r7]	@ save to __pv_offset low bits
>  	b	__fixup_a_pv_table
>  ENDPROC(__fixup_pv_table)
>  
> @@ -564,6 +569,7 @@ ENDPROC(__fixup_pv_table)
>  	.long	__pv_table_begin
>  	.long	__pv_table_end
>  2:	.long	__pv_phys_offset
> +	.long	__pv_offset
>  
>  	.text
>  __fixup_a_pv_table:
> @@ -589,27 +595,53 @@ __fixup_a_pv_table:
>  	bcc	1b
>  	bx	lr
>  #else
> -	b	2f
> +	adr	r0, 5f
> +	b	4f
>  1:	ldr	ip, [r7, r3]
> -	bic	ip, ip, #0x000000ff
> -	orr	ip, ip, r6	@ mask in offset bits 31-24
> -	str	ip, [r7, r3]
> -2:	cmp	r4, r5
> +	lsr	r6, ip, #20		@ extract opcode
> +	and	r6, r6, #0x3e
> +	cmp	r6, #0x28		@ check for 'add' instruction
> +	beq	2f
> +	cmp	r6, #0x24		@ check for 'sub' instruction
> +	beq	2f
> +	cmp	r6, #0x2a		@ check for 'adc' instruction
> +	beq	4f
> +	ldr	r6, [r0]
> +	add	r6, r6, r3
> +	ldr	r6, [r6, #4]
> +	mvn	r11, #0
> +	cmp	r11, r6
> +	and	ip, ip, #0xf000		@ Register encoded in inst
> +	orrne	ip, ip, r6
> +	ldreq	r6, [r0, #0x4]		@ mvn if _pv_offset high bits is 0xffffffff
> +	ldrne	r6, [r0, #0x8]		@ mov otherwise
> +	bic	r6, r6, #0xff
> +	bic	r6, r6, #0xf00
> +	orr	ip, ip, r6
> +	b	3f
> +2:	ldr	r6, [r0]
> +	ldr	r6, [r6, r3]
> +	bic	ip, ip, #0xff
> +	orr	ip, ip, r6, lsr #24	@ mask in offset bits 31-24
> +3:	str	ip, [r7, r3]
> +4:	cmp	r4, r5
>  	ldrcc	r7, [r4], #4	@ use branch for delay slot
>  	bcc	1b
>  	mov	pc, lr
>  #endif
>  ENDPROC(__fixup_a_pv_table)
>  
> +5:	.long __pv_offset
> +	mvn	r0, #0
> +	mov	r0, #0x81000000 @ For getting the correct 4 byte encoding
> +
>  ENTRY(fixup_pv_table)
> -	stmfd	sp!, {r4 - r7, lr}
> -	ldr	r2, 2f			@ get address of __pv_phys_offset
> +	stmfd	sp!, {r0, r3 - r7, r11 - r12, lr}
>  	mov	r3, #0			@ no offset
>  	mov	r4, r0			@ r0 = table start
>  	add	r5, r0, r1		@ r1 = table size
> -	ldr	r6, [r2, #4]		@ get __pv_offset
>  	bl	__fixup_a_pv_table
> -	ldmfd	sp!, {r4 - r7, pc}
> +	ldmfd	sp!, {r0, r3 - r7, r11 - r12, pc}
>  ENDPROC(fixup_pv_table)
>  
>  	.align
> @@ -619,10 +651,14 @@ ENDPROC(fixup_pv_table)
>  	.globl	__pv_phys_offset
>  	.type	__pv_phys_offset, %object
>  __pv_phys_offset:
> -	.long	0
> -	.size	__pv_phys_offset, . - __pv_phys_offset
> +	.quad	0
> +
> +	.data
> +	.globl __pv_offset
> +	.type __pv_offset, %object
>  __pv_offset:
> -	.long	0
> +	.quad   0
> +
>  #endif
>  
>  #include "head-common.S"

I just had another way of implementing this without using the
opcodes: adding additional data in the stub which would identify
the previous instruction instead. This can make the patching code a
little simpler, in case using opcodes is not acceptable. Like this,

#define PATCH_ADDS      0
#define PATCH_ADDC      1

#define __pv_add_carry_stub(x, y)			\
	__asm__ volatile("@ __pv_add_carry_stub\n"	\
	"1:	adds	%Q0, %1, %2\n"			\
	"2:	adc	%R0, %R0, #0\n"			\
	"	.pushsection .pv_table,\"a\"\n"		\
	"	.long	1b\n"				\
	"       .long (" __stringify(PATCH_ADDS) ")\n"  \  
	"	.long	2b\n"				\   
	"       .long (" __stringify(PATCH_ADDC) ")\n"  \  
	"	.popsection\n"				\
	: "+r" (y)					\
	: "r" (x), "I" (__PV_BITS_31_24)		\
	: "cc")


Regards,
 Sricharan


* [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems
  2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
                   ` (5 preceding siblings ...)
  2013-07-31 16:44 ` [PATCH v2 6/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
@ 2013-08-03  1:52 ` Nicolas Pitre
  2013-08-03 19:02   ` Santosh Shilimkar
  6 siblings, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03  1:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 31 Jul 2013, Santosh Shilimkar wrote:

> This is version 2 of the series which addresses the comments/suggestions
> we received on earlier version. The series trying to extend the existing
> v2p runtime patching for LPAE machines which can have physical memory
> beyond 4 GB. Keystone is one such ARM machine.

FYI: I never received patch 1/6.

> In testing various modes(LPAE/non-LPAE, ARM/THMUB), we found an issue
> with the patching code for THUMB2 build. So there is a patch added in
> the series to address it. I was planning to squash it in the parent
> 64 bit patch but thought of keeping it separate for ease of review.
> We can squash it while merging the patches. 

Is this an issue with the current code before implementing 64 bit offset 
patching?


Nicolas


* [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook
  2013-07-31 16:44 ` [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
@ 2013-08-03  1:53   ` Nicolas Pitre
  2013-08-03 19:03     ` Santosh Shilimkar
  0 siblings, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03  1:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 31 Jul 2013, Santosh Shilimkar wrote:

> On some PAE systems (e.g. TI Keystone), memory is above the
> 32-bit addressable limit, and the interconnect provides an
> aliased view of parts of physical memory in the 32-bit addressable
> space.  This alias is strictly for boot time usage, and is not
> otherwise usable because of coherency limitations. On such systems,
> the idmap mechanism needs to take this aliased mapping into account.
> 
> This patch introduces virt_to_idmap() and a arch function pointer which
> can be populated by platform which needs it. Also populate necessary
> idmap spots with now available virt_to_idmap(). Avoided #ifdef approach
> to be compatible with multi-platform builds.
> 
> Most architecture won't touch it and in that case virt_to_idmap()
> fall-back to existing virt_to_phys() macro.
> 
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> 
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

Acked-by: Nicolas Pitre <nico@linaro.org>



> ---
>  arch/arm/include/asm/memory.h |   16 ++++++++++++++++
>  arch/arm/kernel/smp.c         |    2 +-
>  arch/arm/mm/idmap.c           |    5 +++--
>  3 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index c133bd9..d9b96c65 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -173,6 +173,7 @@
>   */
>  #define __PV_BITS_31_24	0x81000000
>  
> +extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
>  extern unsigned long __pv_phys_offset;
>  #define PHYS_OFFSET __pv_phys_offset
>  
> @@ -259,6 +260,21 @@ static inline void *phys_to_virt(phys_addr_t x)
>  #define pfn_to_kaddr(pfn)	__va((pfn) << PAGE_SHIFT)
>  
>  /*
> + * These are for systems that have a hardware interconnect supported alias of
> + * physical memory for idmap purposes.  Most cases should leave these
> + * untouched.
> + */
> +static inline phys_addr_t __virt_to_idmap(unsigned long x)
> +{
> +	if (arch_virt_to_idmap)
> +		return arch_virt_to_idmap(x);
> +	else
> +		return __virt_to_phys(x);
> +}
> +
> +#define virt_to_idmap(x)	__virt_to_idmap((unsigned long)(x))
> +
> +/*
>   * Virtual <-> DMA view memory address translations
>   * Again, these are *only* valid on the kernel direct mapped RAM
>   * memory.  Use of these is *deprecated* (and that doesn't mean
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index c2b4f8f..d36523d 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -81,7 +81,7 @@ void __init smp_set_ops(struct smp_operations *ops)
>  
>  static unsigned long get_arch_pgd(pgd_t *pgd)
>  {
> -	phys_addr_t pgdir = virt_to_phys(pgd);
> +	phys_addr_t pgdir = virt_to_idmap(pgd);
>  	BUG_ON(pgdir & ARCH_PGD_MASK);
>  	return pgdir >> ARCH_PGD_SHIFT;
>  }
> diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
> index 83cb3ac..c0a1e48 100644
> --- a/arch/arm/mm/idmap.c
> +++ b/arch/arm/mm/idmap.c
> @@ -10,6 +10,7 @@
>  #include <asm/system_info.h>
>  
>  pgd_t *idmap_pgd;
> +phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
>  
>  #ifdef CONFIG_ARM_LPAE
>  static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
> @@ -67,8 +68,8 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
>  	unsigned long addr, end;
>  	unsigned long next;
>  
> -	addr = virt_to_phys(text_start);
> -	end = virt_to_phys(text_end);
> +	addr = virt_to_idmap(text_start);
> +	end = virt_to_idmap(text_end);
>  
>  	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
>  
> -- 
> 1.7.9.5
> 


* [PATCH v2 3/6] ARM: mm: Move the idmap print to appropriate place in the code
  2013-07-31 16:44 ` [PATCH v2 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
@ 2013-08-03  1:55   ` Nicolas Pitre
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03  1:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 31 Jul 2013, Santosh Shilimkar wrote:

> Commit 9e9a367c29cebd2 {ARM: Section based HYP idmap} moved
> the address conversion inside identity_mapping_add() without
> respective print which carries useful idmap information.
> 
> Move the print as well inside identity_mapping_add() to
> fix the same.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> 
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

Acked-by: Nicolas Pitre <nico@linaro.org>


> ---
>  arch/arm/mm/idmap.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
> index c0a1e48..8e0e52e 100644
> --- a/arch/arm/mm/idmap.c
> +++ b/arch/arm/mm/idmap.c
> @@ -70,6 +70,7 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
>  
>  	addr = virt_to_idmap(text_start);
>  	end = virt_to_idmap(text_end);
> +	pr_info("Setting up static identity map for 0x%lx - 0x%lx\n", addr, end);
>  
>  	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
>  
> @@ -91,8 +92,6 @@ static int __init init_static_idmap(void)
>  	if (!idmap_pgd)
>  		return -ENOMEM;
>  
> -	pr_info("Setting up static identity map for 0x%p - 0x%p\n",
> -		__idmap_text_start, __idmap_text_end);
>  	identity_mapping_add(idmap_pgd, __idmap_text_start,
>  			     __idmap_text_end, 0);
>  
> -- 
> 1.7.9.5
> 


* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-07-31 16:44 ` [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
  2013-07-31 18:31   ` Sricharan R
@ 2013-08-03  3:28   ` Nicolas Pitre
  2013-08-03 12:47     ` Sricharan R
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03  3:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 31 Jul 2013, Santosh Shilimkar wrote:

> From: Sricharan R <r.sricharan@ti.com>
> 
> The current phys_to_virt patching mechanism does not work
> for 64 bit physical addressesp. Note that constant used in add/sub
> instructions is encoded in to the last 8 bits of the opcode. So shift
> the _pv_offset constant by 24 to get it in to the correct place.
> 
> The v2p patching mechanism patches the higher 32bits of physical
> address with a constant. While this is correct, in those platforms
> where the lowmem addressable physical memory spawns across 4GB boundary,
> a carry bit can be produced as a result of addition of lower 32bits.
> This has to be taken in to account and added in to the upper. The patched
> __pv_offset and va are added in lower 32bits, where __pv_offset can be
> in two's complement form when PA_START < VA_START and that can result
> in a false carry bit.
> 
> e.g PA = 0x80000000 VA = 0xC0000000
> __pv_offset = PA - VA = 0xC0000000 (2's complement)
> 
> So adding __pv_offset + VA should never result in a true overflow. So in
> order to differentiate between a true carry, a __pv_offset is extended
> to 64bit and the upper 32bits will have 0xffffffff if __pv_offset is
> 2's complement. We are using the same to insert 'mvn #0' instead of
> 'mov' while patching.
> 
> The above idea was suggested by Nicolas Pitre <nico@linaro.org> as
> part of the review of first version of the subject patch.
> 
> There is no corresponding change on the phys_to_virt() side, because
> computations on the upper 32-bits would be discarded anyway.
> 
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> 
> Signed-off-by: Sricharan R <r.sricharan@ti.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

There are still issues with this patch.

>  arch/arm/include/asm/memory.h |   35 +++++++++++++++++++--
>  arch/arm/kernel/armksyms.c    |    1 +
>  arch/arm/kernel/head.S        |   68 +++++++++++++++++++++++++++++++----------
>  3 files changed, 85 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index d9b96c65..abe879d 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -174,7 +174,9 @@
>  #define __PV_BITS_31_24	0x81000000
>  
>  extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
> -extern unsigned long __pv_phys_offset;
> +extern phys_addr_t __pv_phys_offset;
> +extern phys_addr_t __pv_offset;
> +
>  #define PHYS_OFFSET __pv_phys_offset
>  
>  #define __pv_stub(from,to,instr,type)			\
> @@ -186,10 +188,37 @@ extern unsigned long __pv_phys_offset;
>  	: "=r" (to)					\
>  	: "r" (from), "I" (type))
>  
> +#define __pv_stub_mov_hi(t)				\
> +	__asm__ volatile("@ __pv_stub_mov\n"		\
> +	"1:	mov	%R0, %1\n"			\
> +	"	.pushsection .pv_table,\"a\"\n"		\
> +	"	.long	1b\n"				\
> +	"	.popsection\n"				\
> +	: "=r" (t)					\
> +	: "I" (__PV_BITS_31_24))

You should not use __PV_BITS_31_24 here.

Please understand the reason for its usage.  The current code uses 
__PV_BITS_31_24 because we want to use instructions following this 
pattern:

	add	rd, rn, #0xii000000

The encoding of an immediate argument is made of an 8 bit value and a 4 
bit rotation.  So an immediate argument must always be at most 8 bits 
wide and aligned to an even bit position.  The stub value is 
__PV_BITS_31_24 so we have:

	add	rd, rn, #0x81000000

The idea behind this choice of 0x81000000 is to let the assembler 
correctly encode the rotation value into the opcode for us so we only 
have the 8 bit literal value to patch i.e. replacing the 0x81 by the 
actual pv_offset value once shifted right by 24 bits.

And that's why the physical RAM start has to be aligned to a 16MB 
boundary: because we want the difference between phys RAM start and 
PAGE_OFFSET to be represented using bits 31 to 24 only.
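
For illustration, that 8-bit-value-plus-rotation rule can be written as
a small C helper (a sketch, not code from this thread):

#include <stdint.h>

/* Decode an ARM "modified immediate": an 8 bit value rotated right by
 * twice the 4 bit rotation field. */
static uint32_t arm_decode_imm(uint32_t imm8, uint32_t rot4)
{
	uint32_t rot = (2 * rot4) & 31;

	return (imm8 >> rot) | (imm8 << ((32 - rot) & 31));
}

/*
 * arm_decode_imm(0x81, 4) == 0x81000000, so the assembler encodes
 * "add rd, rn, #0x81000000" with imm8 = 0x81 and only that 8 bit field
 * needs to be patched with pv_offset >> 24.
 */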

Now... here you want to patch a mov opcode where the value being patched 
is the high bits of a physical address.  So far we know this is likely 
to be in the low bits of the high word.  You therefore want a stub 
instruction that reads like:

	mov	rd, 0x000000ii

so the rotation field is appropriately set by the assembler.  

Therefore, using __PV_BITS_31_24 here makes no sense.

> +#define __pv_add_carry_stub(x, y)			\
> +	__asm__ volatile("@ __pv_add_carry_stub\n"	\
> +	"1:	adds	%Q0, %1, %2\n"			\
> +	"2:	adc	%R0, %R0, #0\n"			\
> +	"	.pushsection .pv_table,\"a\"\n"		\
> +	"	.long	1b\n"				\
> +	"	.long	2b\n"				\

Why are you tagging the adc instruction here?  This doesn't need to be 
patched, does it?


> +	"	.popsection\n"				\
> +	: "+r" (y)					\
> +	: "r" (x), "I" (__PV_BITS_31_24)		\
> +	: "cc")
> +
>  static inline phys_addr_t __virt_to_phys(unsigned long x)
>  {
> -	unsigned long t;
> -	__pv_stub(x, t, "add", __PV_BITS_31_24);
> +	phys_addr_t t = 0;

Why do you initialize t to 0 ?

> +	if (sizeof(phys_addr_t) == 4) {
> +		__pv_stub(x, t, "add", __PV_BITS_31_24);
> +	} else {
> +		__pv_stub_mov_hi(t);
> +		__pv_add_carry_stub(x, t);
> +	}
>  	return t;
>  }
>  
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..1f031dd 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
>  
>  #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>  EXPORT_SYMBOL(__pv_phys_offset);
> +EXPORT_SYMBOL(__pv_offset);
>  #endif
> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> index 9cf6063..aa3b0f7 100644
> --- a/arch/arm/kernel/head.S
> +++ b/arch/arm/kernel/head.S
> @@ -545,17 +545,22 @@ ENDPROC(fixup_smp)
>  	__HEAD
>  __fixup_pv_table:
>  	adr	r0, 1f
> -	ldmia	r0, {r3-r5, r7}
> +	ldmia	r0, {r3-r7}
> +	cmp	r0, r3
> +	mvn	ip, #0
>  	sub	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET

Instead of cmp followed by sub, you could simply use subs.

>  	add	r4, r4, r3	@ adjust table start address
>  	add	r5, r5, r3	@ adjust table end address
> -	add	r7, r7, r3	@ adjust __pv_phys_offset address
> -	str	r8, [r7]	@ save computed PHYS_OFFSET to __pv_phys_offset
> +	add	r6, r6, r3	@ adjust __pv_phys_offset address
> +	add	r7, r7, r3	@ adjust __pv_offset address
> +	str	r8, [r6]	@ save computed PHYS_OFFSET to __pv_phys_offset
> +	strcc	ip, [r7, #4]	@ save to __pv_offset high bits

This is not big endian safe.

>  	mov	r6, r3, lsr #24	@ constant for add/sub instructions
>  	teq	r3, r6, lsl #24 @ must be 16MiB aligned

Remember the reason for the __PV_BITS_31_24 definition above?
It is applied right here.

>  THUMB(	it	ne		@ cross section branch )
>  	bne	__error
> -	str	r6, [r7, #4]	@ save to __pv_offset
> +	lsl	r6, r6, #24

You already have the right value in r3 from above.

And, for non Thumb specific code, we prefer not to use the new mnemonics 
such as lsl in order to allow the kernel to be compilable with old 
binutils.

> +	str	r6, [r7]	@ save to __pv_offset low bits
>  	b	__fixup_a_pv_table
>  ENDPROC(__fixup_pv_table)
>  
> @@ -564,6 +569,7 @@ ENDPROC(__fixup_pv_table)
>  	.long	__pv_table_begin
>  	.long	__pv_table_end
>  2:	.long	__pv_phys_offset
> +	.long	__pv_offset
>  
>  	.text
>  __fixup_a_pv_table:
> @@ -589,27 +595,53 @@ __fixup_a_pv_table:
>  	bcc	1b
>  	bx	lr
>  #else
> -	b	2f
> +	adr	r0, 5f
> +	b	4f
>  1:	ldr	ip, [r7, r3]
> -	bic	ip, ip, #0x000000ff
> -	orr	ip, ip, r6	@ mask in offset bits 31-24
> -	str	ip, [r7, r3]
> -2:	cmp	r4, r5
> +	lsr	r6, ip, #20		@ extract opcode
> +	and	r6, r6, #0x3e
> +	cmp	r6, #0x28		@ check for 'add' instruction
> +	beq	2f
> +	cmp	r6, #0x24		@ check for 'sub' instruction
> +	beq	2f
[...]


blablabla...

Remember when I said the 0x81 immediate could be augmented with 
additional bits to determine what to patch?  Whether you use 0x81000000 
or 0x00000081 in the stub instruction, the opcode will always have 0x81 
in the least significant bits because that's where the 8 bit immediate 
bit field is located.  So instead of doing this opcode determination, 
you could simply test, say, bit 1 instead:

	ldr	ip, [r7, r3]
	tst	ip, #0x2		@ patching high or low value?
	bic	ip, ip, #0x000000ff
	orreq	ip, ip, r6      @ mask in offset bits 31-24 of low word
	orrne	ip, ip, r?	@ mask in offset bits 0-8 of high word
	str	ip, [r7, r3]

... meaning that, instead of using 0x81 for the stub value on the mov 
instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors 
for the rotation field in the opcode, while bit 1 indicates which value 
to patch in.
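
In C terms, the simplified loop body described above would look roughly
like this sketch (variable names are assumed: imm_lo holds __pv_offset
bits 31-24, imm_hi the value for the high-word mov, i.e. its low byte or
bit 22 to turn the mov into an mvn):

#include <stdint.h>

static void patch_one(uint32_t *insn, uint32_t imm_lo, uint32_t imm_hi)
{
	uint32_t val = (*insn & 0x2) ? imm_hi : imm_lo;	/* 0x83 vs 0x81 stub */

	*insn = (*insn & ~0xffu) | val;		/* clear imm8, mask value in */
}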

> +	cmp	r6, #0x2a		@ check for 'adc' instruction
> +	beq	4f
> +	ldr	r6, [r0]
> +	add	r6, r6, r3
> +	ldr	r6, [r6, #4]
> +	mvn	r11, #0
> +	cmp	r11, r6

You could have used "cmn r6, #1" instead of the above 2 instructions.

> +	and	ip, ip, #0xf000		@ Register encoded in inst
> +	orrne	ip, ip, r6
> +	ldreq	r6, [r0, #0x4]		@ mvn if _pv_offset high bits is 0xffffffff
> +	ldrne	r6, [r0, #0x8]		@ mov otherwise
> +	bic	r6, r6, #0xff
> +	bic	r6, r6, #0xf00
> +	orr	ip, ip, r6

Hmmm.... More blablablabla. I'm not even sure I understand what's going 
on here...

I'm assuming you want to patch the mov, or turn it into a 'mvn rd, #0' 
if the high value is 0xffffffff.  Instead of this complicated code, 
let's have a look at the mov and mvn opcodes:

mov:

31-28 27 26 25 24 23 22 21 20 19-16 15-12 11-0
cond  0  0  I  1  1  0  1  S  SBZ   Rd    shifter_operand

mvn:

31-28 27 26 25 24 23 22 21 20 19-16 15-12 11-0
cond  0  0  I  1  1  1  1  S  SBZ   Rd    shifter_operand

It is very convenient to notice that only bit 22 differs between the 
two. 
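
For example (a sketch; encodings follow the tables above):

/* The two encodings differ only in bit 22. */
#define ARM_MOV_R0_0	0xe3a00000u	/* mov r0, #0 */
#define ARM_MVN_R0_0	0xe3e00000u	/* mvn r0, #0 */
/* ARM_MVN_R0_0 == (ARM_MOV_R0_0 | (1u << 22)), so setting bit 22 in the
 * patched word turns the stub mov into an mvn. */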

So, you have 2 values to prepare for patching:

1) The high bits of the pv_offset low word held in r6 in the current 
   unpatched code, shifted right by 24 bits.

2) The low bits of the pv_offset high word referenced as being held in
   register r? in my example code above.  It doesn't have to be shifted 
   in this case, although the top 24 bits are expected to all be zero.
   But if the high word is equal to 0xffffffff, then we simply have to
   set r? with bit 22 which will have the effect of turning the existing
   mov into an mvn.

That's it!  No need for more complicated code.

Of course the above analysis is valid for ARM mode only.  Thumb mode is 
a bit more complicated and left as an exercise to the reader.

[...]
> @@ -619,10 +651,14 @@ ENDPROC(fixup_pv_table)
>  	.globl	__pv_phys_offset
>  	.type	__pv_phys_offset, %object
>  __pv_phys_offset:
> -	.long	0
> -	.size	__pv_phys_offset, . - __pv_phys_offset
> +	.quad	0
> +
> +	.data
> +	.globl __pv_offset
> +	.type __pv_offset, %object
>  __pv_offset:
> -	.long	0
> +	.quad   0
> +

Those should probably be moved into a C file where the proper size for 
phys_addr_t can be applied.


Nicolas


* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-07-31 18:31   ` Sricharan R
@ 2013-08-03  3:32     ` Nicolas Pitre
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03  3:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 1 Aug 2013, Sricharan R wrote:

> Just, had another way of implementing this without using the
> 'opcodes'. By adding a additional data in the stub which would identify
> the previous instruction instead of using opcodes. This can make the
> patching code little simpler.
> Incase, if using opcodes is not good. Like this,
> 
> #define PATCH_ADDS      0
> #define PATCH_ADDC      1

No.  In general the patching code doesn't need to care about the 
instruction being patched.  It only needs to know what value to patch 
in.  See my previous email.



Nicolas


* [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode
  2013-07-31 16:44 ` [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode Santosh Shilimkar
@ 2013-08-03  3:40   ` Nicolas Pitre
  2013-08-03 12:51     ` Sricharan R
  0 siblings, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03  3:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 31 Jul 2013, Santosh Shilimkar wrote:

> From: Sricharan R <r.sricharan@ti.com>
> 
> Update the runtime patching  code to support Thumb2. In testing the
> 64 bit patching code, the issue was uncovered.
> 
> For better review, the patch is kept separate. If needed it can be
> merged into "ARM: LPAE: Correct virt_to_phys patching for 64 bit
> physical addresses"
> 
> Cc: Nicolas Pitre <nico@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> 
> Signed-off-by: Sricharan R <r.sricharan@ti.com>
> [santosh.shilimkar at ti.com: reduced #ifdef, updated commit log]
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/kernel/head.S |   69 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 40 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> index aa3b0f7..a70d330 100644
> --- a/arch/arm/kernel/head.S
> +++ b/arch/arm/kernel/head.S
> @@ -573,62 +573,73 @@ ENDPROC(__fixup_pv_table)
>  
>  	.text
>  __fixup_a_pv_table:
> -#ifdef CONFIG_THUMB2_KERNEL
> -	lsls	r6, #24
> -	beq	2f
> -	clz	r7, r6
> -	lsr	r6, #24
> -	lsl	r6, r7
> -	bic	r6, #0x0080
> -	lsrs	r7, #1
> -	orrcs	r6, #0x0080
> -	orr	r6, r6, r7, lsl #12
> -	orr	r6, #0x4000
> -	b	2f
> -1:	add     r7, r3
> -	ldrh	ip, [r7, #2]
> -	and	ip, 0x8f00
> -	orr	ip, r6	@ mask in offset bits 31-24
> -	strh	ip, [r7, #2]
> -2:	cmp	r4, r5
> -	ldrcc	r7, [r4], #4	@ use branch for delay slot
> -	bcc	1b
> -	bx	lr
> -#else
>  	adr	r0, 5f
>  	b	4f
>  1:	ldr	ip, [r7, r3]
> + THUMB(	1:	add	r7, r3)
> + THUMB(	ldrh	ip, [r7])
> + THUMB(	ldrh	r6, [r7, #2])
> + THUMB(	orr	ip, r6, ip, lsl #16)
> + ARM( 1: ldr	ip, [r7, r3])
>  	lsr	r6, ip, #20		@ extract opcode

Please don't do this.

- Remember my comment about using mnemonics such as "lsr" in an 
  ARM-mode compiled kernel when old binutils are used.

> - There is rather little commonality between the patching of ARM 
  instructions vs Thumb instructions.  Especially after you take into 
  account my previous comments.  Interlacing them makes it harder 
  to follow as well in this case.


Nicolas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03  3:28   ` Nicolas Pitre
@ 2013-08-03 12:47     ` Sricharan R
  2013-08-03 14:01       ` Nicolas Pitre
  2013-08-03 14:05       ` Russell King - ARM Linux
  0 siblings, 2 replies; 26+ messages in thread
From: Sricharan R @ 2013-08-03 12:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nicolas,
  Thanks again for all the responses.

On Saturday 03 August 2013 08:58 AM, Nicolas Pitre wrote:
> On Wed, 31 Jul 2013, Santosh Shilimkar wrote:
>
>> From: Sricharan R <r.sricharan@ti.com>
>>
>> The current phys_to_virt patching mechanism does not work
>> for 64 bit physical addresses. Note that the constant used in add/sub
>> instructions is encoded into the last 8 bits of the opcode. So shift
>> the _pv_offset constant by 24 to get it into the correct place.
>>
>> The v2p patching mechanism patches the higher 32 bits of the physical
>> address with a constant. While this is correct, on those platforms
>> where the lowmem addressable physical memory spans across the 4GB boundary,
>> a carry bit can be produced as a result of the addition of the lower 32 bits.
>> This has to be taken into account and added into the upper 32 bits. The patched
>> __pv_offset and va are added in the lower 32 bits, where __pv_offset can be
>> in two's complement form when PA_START < VA_START, and that can result
>> in a false carry bit.
>>
>> e.g. PA = 0x80000000, VA = 0xC0000000
>> __pv_offset = PA - VA = 0xC0000000 (2's complement)
>>
>> So adding __pv_offset + VA should never result in a true overflow. So, in
>> order to differentiate a true carry from a false one, __pv_offset is extended
>> to 64 bits and the upper 32 bits will hold 0xffffffff if __pv_offset is
>> in 2's complement form. We use the same to insert 'mvn #0' instead of
>> 'mov' while patching.
>>
>> The above idea was suggested by Nicolas Pitre <nico@linaro.org> as
>> part of the review of first version of the subject patch.
>>
>> There is no corresponding change on the phys_to_virt() side, because
>> computations on the upper 32-bits would be discarded anyway.
>>
>> Cc: Nicolas Pitre <nico@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>>
>> Signed-off-by: Sricharan R <r.sricharan@ti.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
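
As a standalone illustration of the false carry described in that commit
message (plain C using its example values, not kernel code; the wrap of the
0xffffffff high word on the 'adc' is what makes the false carry harmless):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t va      = 0xC0000000;	/* virtual address              */
		uint32_t pv_low  = 0xC0000000;	/* PA - VA in 2's complement    */
		uint32_t pv_high = 0xFFFFFFFF;	/* sign extension of the offset */

		uint32_t lo    = va + pv_low;		/* 'adds': 0x80000000   */
		uint32_t carry = lo < va;		/* carry out of the add */
		uint32_t hi    = pv_high + carry;	/* 'adc': wraps to 0    */

		printf("phys = 0x%08x%08x\n", hi, lo);	/* 0x0000000080000000   */
		return 0;
	}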
> There are still issues with this patch.
>
>>  arch/arm/include/asm/memory.h |   35 +++++++++++++++++++--
>>  arch/arm/kernel/armksyms.c    |    1 +
>>  arch/arm/kernel/head.S        |   68 +++++++++++++++++++++++++++++++----------
>>  3 files changed, 85 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
>> index d9b96c65..abe879d 100644
>> --- a/arch/arm/include/asm/memory.h
>> +++ b/arch/arm/include/asm/memory.h
>> @@ -174,7 +174,9 @@
>>  #define __PV_BITS_31_24	0x81000000
>>  
>>  extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
>> -extern unsigned long __pv_phys_offset;
>> +extern phys_addr_t __pv_phys_offset;
>> +extern phys_addr_t __pv_offset;
>> +
>>  #define PHYS_OFFSET __pv_phys_offset
>>  
>>  #define __pv_stub(from,to,instr,type)			\
>> @@ -186,10 +188,37 @@ extern unsigned long __pv_phys_offset;
>>  	: "=r" (to)					\
>>  	: "r" (from), "I" (type))
>>  
>> +#define __pv_stub_mov_hi(t)				\
>> +	__asm__ volatile("@ __pv_stub_mov\n"		\
>> +	"1:	mov	%R0, %1\n"			\
>> +	"	.pushsection .pv_table,\"a\"\n"		\
>> +	"	.long	1b\n"				\
>> +	"	.popsection\n"				\
>> +	: "=r" (t)					\
>> +	: "I" (__PV_BITS_31_24))
> You should not use __PV_BITS_31_24 here.
>
> Please understand the reason for its usage.  The current code uses 
> __PV_BITS_31_24 because we want to use instructions following this 
> pattern:
>
> 	add	rd, rn, #0xii000000
>
> The encoding of an immediate argument is made of a 8 bit value and a 4 
> bit rotation.  So an immediate argument must always be at most 8 bit 
> wide and aligned to an even bit position.  The stub value is 
> __PV_BITS_31_24 so we have:
>
> 	add	rd, rn, #0x81000000
>
> The idea behind this choice of 0x81000000 is to let the assembler 
> correctly encode the rotation value into the opcode for us so we only 
> have the 8 bit literal value to patch i.e. replacing the 0x81 by the 
> actual pv_offset value once shifted right by 24 bits.
>
> And that's why the physical RAM start has to be aligned to a 16MB 
> boundary: because we want the difference between phys RAM start and 
> PAGE_OFFSET to be represented using bits 31 to 24 only.
>
> Now... here you want to patch a mov opcode where the value being patched 
> is the high bits of a physical address.  So far we know this is likely 
> to be in the low bits of the high word.  You therefore want a stub 
> instruction that reads like:
>
> 	mov	rd, 0x000000ii
>
> so the rotation field is appropriately set by the assembler.  
>
> Therefore, using __PV_BITS_31_24 here makes no sense.
  Hmm, I understood this and had kept the
  immediate operand for mov as __PV_BITS_7_0 (0x81)
  in my first version of the patch.

  The reason why I made this __PV_BITS_31_24 in this version
  is that, while looking into the THUMB instruction set, I saw that there is a
  16 bit encoding available for the mov rx, #immediate instruction.

  So I thought that if I used 0x81 as the immediate, the assembler could
  use the 16 bit encoding while compiling for THUMB2,
  for which I added support in the next patch. This would mean some instructions
  inside the stub would be 32 bits and some 16 bits. In order to avoid this I went
  with 0x81000000, forcing the 32 bit encoding.

  But now I realise that this 16 bit encoding is valid only when used inside
  an IT block.  So I agree with you and will change this back to __PV_BITS_7_0.
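
For reference, the reworked stub could then look roughly like this (a sketch
only, assuming __PV_BITS_7_0 is defined as 0x81 so the assembler encodes a
zero rotation for the immediate):

	#define __PV_BITS_7_0	0x81

	#define __pv_stub_mov_hi(t)				\
		__asm__ volatile("@ __pv_stub_mov\n"		\
		"1:	mov	%R0, %1\n"			\
		"	.pushsection .pv_table,\"a\"\n"		\
		"	.long	1b\n"				\
		"	.popsection\n"				\
		: "=r" (t)					\
		: "I" (__PV_BITS_7_0))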
>> +#define __pv_add_carry_stub(x, y)			\
>> +	__asm__ volatile("@ __pv_add_carry_stub\n"	\
>> +	"1:	adds	%Q0, %1, %2\n"			\
>> +	"2:	adc	%R0, %R0, #0\n"			\
>> +	"	.pushsection .pv_table,\"a\"\n"		\
>> +	"	.long	1b\n"				\
>> +	"	.long	2b\n"				\
> Why are you tagging the adc instruction here?  This doesn't need to be 
> patched, does it?
  Correct, not needed. Will remove it.
>> +	"	.popsection\n"				\
>> +	: "+r" (y)					\
>> +	: "r" (x), "I" (__PV_BITS_31_24)		\
>> +	: "cc")
>> +
>>  static inline phys_addr_t __virt_to_phys(unsigned long x)
>>  {
>> -	unsigned long t;
>> -	__pv_stub(x, t, "add", __PV_BITS_31_24);
>> +	phys_addr_t t = 0;
> Why do you initialize t to 0 ?
  ok, will fix this.
>> +	if (sizeof(phys_addr_t) == 4) {
>> +		__pv_stub(x, t, "add", __PV_BITS_31_24);
>> +	} else {
>> +		__pv_stub_mov_hi(t);
>> +		__pv_add_carry_stub(x, t);
>> +	}
>>  	return t;
>>  }
>>  
>> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
>> index 60d3b73..1f031dd 100644
>> --- a/arch/arm/kernel/armksyms.c
>> +++ b/arch/arm/kernel/armksyms.c
>> @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
>>  
>>  #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>>  EXPORT_SYMBOL(__pv_phys_offset);
>> +EXPORT_SYMBOL(__pv_offset);
>>  #endif
>> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
>> index 9cf6063..aa3b0f7 100644
>> --- a/arch/arm/kernel/head.S
>> +++ b/arch/arm/kernel/head.S
>> @@ -545,17 +545,22 @@ ENDPROC(fixup_smp)
>>  	__HEAD
>>  __fixup_pv_table:
>>  	adr	r0, 1f
>> -	ldmia	r0, {r3-r5, r7}
>> +	ldmia	r0, {r3-r7}
>> +	cmp	r0, r3
>> +	mvn	ip, #0
>>  	sub	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET
> Instead of cmp followed by sub, you could simply use subs.
 ok. will fix this.
>>  	add	r4, r4, r3	@ adjust table start address
>>  	add	r5, r5, r3	@ adjust table end address
>> -	add	r7, r7, r3	@ adjust __pv_phys_offset address
>> -	str	r8, [r7]	@ save computed PHYS_OFFSET to __pv_phys_offset
>> +	add	r6, r6, r3	@ adjust __pv_phys_offset address
>> +	add	r7, r7, r3	@ adjust __pv_offset address
>> +	str	r8, [r6]	@ save computed PHYS_OFFSET to __pv_phys_offset
>> +	strcc	ip, [r7, #4]	@ save to __pv_offset high bits
> This is not big endian safe.
 Hmm, will add a check for this and in the other applicable places.
>>  	mov	r6, r3, lsr #24	@ constant for add/sub instructions
>>  	teq	r3, r6, lsl #24 @ must be 16MiB aligned
> Remember the reason for the __PV_BITS_31_24 definition above?
> It is applied right here.
>
>>  THUMB(	it	ne		@ cross section branch )
>>  	bne	__error
>> -	str	r6, [r7, #4]	@ save to __pv_offset
>> +	lsl	r6, r6, #24
> You already have the right value in r3 from above.
>
> And, for non Thumb specific code, we prefer not to use the new mnemonics 
> such as lsl in order to allow the kernel to be compilable with old 
> binutils.
  Ok, will change this.
>> +	str	r6, [r7]	@ save to __pv_offset low bits
>>  	b	__fixup_a_pv_table
>>  ENDPROC(__fixup_pv_table)
>>  
>> @@ -564,6 +569,7 @@ ENDPROC(__fixup_pv_table)
>>  	.long	__pv_table_begin
>>  	.long	__pv_table_end
>>  2:	.long	__pv_phys_offset
>> +	.long	__pv_offset
>>  
>>  	.text
>>  __fixup_a_pv_table:
>> @@ -589,27 +595,53 @@ __fixup_a_pv_table:
>>  	bcc	1b
>>  	bx	lr
>>  #else
>> -	b	2f
>> +	adr	r0, 5f
>> +	b	4f
>>  1:	ldr	ip, [r7, r3]
>> -	bic	ip, ip, #0x000000ff
>> -	orr	ip, ip, r6	@ mask in offset bits 31-24
>> -	str	ip, [r7, r3]
>> -2:	cmp	r4, r5
>> +	lsr	r6, ip, #20		@ extract opcode
>> +	and	r6, r6, #0x3e
>> +	cmp	r6, #0x28		@ check for 'add' instruction
>> +	beq	2f
>> +	cmp	r6, #0x24		@ check for 'sub' instruction
>> +	beq	2f
> [...]
>
>
> blablabla...
>
> Remember when I said the 0x81 immediate could be augmented with 
> additional bits to determine what to patch?  Whether you use 0x81000000 
> or 0x00000081 in the stub instruction, the opcode will always have 0x81 
> in the least significant bits because that's where the 8 bit immediate 
> bit field is located.  So instead of doing this opcode determination, 
> you could simply test, say, bit 1 instead:
>
> 	ldr	ip, [r7, r3]
> 	tst	ip, #0x2		@ patching high or low value?
> 	bic	ip, ip, #0x000000ff
> 	orreq	ip, ip, r6      @ mask in offset bits 31-24 of low word
> 	orrne	ip, ip, r?	@ mask in offset bits 0-8 of high word
> 	str	ip, [r7, r3]
>
> ... meaning that, instead of using 0x81 for the stub value on the mov 
> instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors 
> for the rotation field in the opcode, while bit 1 indicates which value 
> to patch in.
  I started with this kind of augmenting of the immediate operand
  while starting V2. But the problem was, we do the runtime patching twice:

  once while booting from the low address alias space, and
  again while switching to the higher memory address. So in this
  case, the augmented bits are lost after doing the patching for
  the first time, and they are no longer valid while trying
  to patch for the second time.

  This was the reason I thought of using the opcodes, which
  are always there. Another approach is pushing some
  identifier data inside the stub, like this:
 

  #define PATCH_ADDS           0
  #define PATCH_HIGH_BITS      1


#define __pv_add_carry_stub(x, y)			\
	__asm__ volatile("@ __pv_add_carry_stub\n"	\
	"1:	adds	%Q0, %1, %2\n"			\
	"2:	adc	%R0, %R0, #0\n"			\
	"	.pushsection .pv_table,\"a\"\n"		\
	"	.long	1b\n"				\
	"       .long (" __stringify(PATCH_ADDS) ")\n"  \  
	"	.long	2b\n"				\   
	"       .long (" __stringify(PATCH_HIGH_BITS) ")\n"  \  
	"	.popsection\n"				\
	: "+r" (y)					\
	: "r" (x), "I" (__PV_BITS_31_24)		\
	: "cc")

 
   With this, the data is always inside the stub and not lost.
   I see that you commented on this approach already,
   but I am not able to think of other ways of handling this
   two-time patching.
>> +	cmp	r6, #0x2a		@ check for 'adc' instruction
>> +	beq	4f
>> +	ldr	r6, [r0]
>> +	add	r6, r6, r3
>> +	ldr	r6, [r6, #4]
>> +	mvn	r11, #0
>> +	cmp	r11, r6
> You could have used "cmn r6, #1" instead of the above 2 instructions.
 ok. Thanks, will fix this.
>> +	and	ip, ip, #0xf000		@ Register encoded in inst
>> +	orrne	ip, ip, r6
>> +	ldreq	r6, [r0, #0x4]		@ mvn if _pv_offset high bits is 0xffffffff
>> +	ldrne	r6, [r0, #0x8]		@ mov otherwise
>> +	bic	r6, r6, #0xff
>> +	bic	r6, r6, #0xf00
>> +	orr	ip, ip, r6
> Hmmm.... More blablablabla. I'm not even sure I understand what's going 
> on here...
  This was because of the __PV_BITS_31_24 which I used for mov, and
  that resulted in clearing the shifts. Anyway, now this is not needed
  at all and I will simplify this.
> I'm assuming you want to patch the mov, or turn it into a 'mvn rd, #0' 
> if the high value is 0xffffffff.  Instead of this complicated code, 
> let's have a look at the mov and mvn opcodes:
>
> mov:
>
> 31-28 27 26 25 24 23 22 21 20 19-16 15-12 11-0
> cond  0  0  I  1  1  0  1  S  SBZ   Rd    shifter_operand
>
> mvn:
>
> 31-28 27 26 25 24 23 22 21 20 19-16 15-12 11-0
> cond  0  0  I  1  1  1  1  S  SBZ   Rd    shifter_operand
>
> It is very convenient to notice that only bit 22 differs between the 
> two. 
  ok, Thanks again for this optimization.
  Will use this trick.
> So, you have 2 values to prepare for patching:
>
> 1) The high bits of the pv_offset low word held in r6 in the current 
>    unpatched code, shifted right by 24 bits.
>
> 2) The low bits of the pv_offset low word referenced as being held in
>    register r? in my example code above.  It doesn't have to be shifted 
>    in this case, although the top 24 bits are expected to all be zero.
>    But if the high word is equal to 0xffffffff, then we simply have to
>    set r? with bit 22 which will have the effect of turning the existing
>    mov into an mvn.
>
> That's it!  No need for more complicated code.
>
> Of course the above analysis is valid for ARM mode only.  Thumb mode is 
> a bit more complicated and left as an exercise to the reader.
>
> [...]
 Ok, I will rework this with all the above points. The only thing I am
 stuck on is the two-time patching part, which I explained above.

 Again, thanks for all the points.
>> @@ -619,10 +651,14 @@ ENDPROC(fixup_pv_table)
>>  	.globl	__pv_phys_offset
>>  	.type	__pv_phys_offset, %object
>>  __pv_phys_offset:
>> -	.long	0
>> -	.size	__pv_phys_offset, . - __pv_phys_offset
>> +	.quad	0
>> +
>> +	.data
>> +	.globl __pv_offset
>> +	.type __pv_offset, %object
>>  __pv_offset:
>> -	.long	0
>> +	.quad   0
>> +
> Those should probably be moved into a C file where the proper size for 
> phys_addr_t can be applied.
>
  Ok, sure. Will move this to a C file.
> Nicolas

Regards,
 Sricharan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode
  2013-08-03  3:40   ` Nicolas Pitre
@ 2013-08-03 12:51     ` Sricharan R
  0 siblings, 0 replies; 26+ messages in thread
From: Sricharan R @ 2013-08-03 12:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nicolas,

On Saturday 03 August 2013 09:10 AM, Nicolas Pitre wrote:
> On Wed, 31 Jul 2013, Santosh Shilimkar wrote:
>
>> From: Sricharan R <r.sricharan@ti.com>
>>
>> Update the runtime patching  code to support Thumb2. In testing the
>> 64 bit patching code, the issue was uncovered.
>>
>> For better review, the patch is kept separate. If needed it can be
>> merged into "ARM: LPAE: Correct virt_to_phys patching for 64 bit
>> physical addresses"
>>
>> Cc: Nicolas Pitre <nico@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>>
>> Signed-off-by: Sricharan R <r.sricharan@ti.com>
>> [santosh.shilimkar at ti.com: reduced #ifdef, updated commit log]
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
>> ---
>>  arch/arm/kernel/head.S |   69 ++++++++++++++++++++++++++++--------------------
>>  1 file changed, 40 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
>> index aa3b0f7..a70d330 100644
>> --- a/arch/arm/kernel/head.S
>> +++ b/arch/arm/kernel/head.S
>> @@ -573,62 +573,73 @@ ENDPROC(__fixup_pv_table)
>>  
>>  	.text
>>  __fixup_a_pv_table:
>> -#ifdef CONFIG_THUMB2_KERNEL
>> -	lsls	r6, #24
>> -	beq	2f
>> -	clz	r7, r6
>> -	lsr	r6, #24
>> -	lsl	r6, r7
>> -	bic	r6, #0x0080
>> -	lsrs	r7, #1
>> -	orrcs	r6, #0x0080
>> -	orr	r6, r6, r7, lsl #12
>> -	orr	r6, #0x4000
>> -	b	2f
>> -1:	add     r7, r3
>> -	ldrh	ip, [r7, #2]
>> -	and	ip, 0x8f00
>> -	orr	ip, r6	@ mask in offset bits 31-24
>> -	strh	ip, [r7, #2]
>> -2:	cmp	r4, r5
>> -	ldrcc	r7, [r4], #4	@ use branch for delay slot
>> -	bcc	1b
>> -	bx	lr
>> -#else
>>  	adr	r0, 5f
>>  	b	4f
>>  1:	ldr	ip, [r7, r3]
>> + THUMB(	1:	add	r7, r3)
>> + THUMB(	ldrh	ip, [r7])
>> + THUMB(	ldrh	r6, [r7, #2])
>> + THUMB(	orr	ip, r6, ip, lsl #16)
>> + ARM( 1: ldr	ip, [r7, r3])
>>  	lsr	r6, ip, #20		@ extract opcode
> Please don't do this.
>
> - Remember my comment about using mnemonics such as "lsr" in an 
>   ARM-mode compiled kernel when old binutils are used.
 Ok, I will remove this to be generic.
> - There is rather little commonality between the patching of ARM 
>   instructions vs Thumb instructions.  Especially after you take into 
>   account my previous comments.  Interlacing them makes it harder 
>   to follow as well in this case.
>
>
> Nicolas
 Ok, then I will keep it under #ifdef CONFIG_THUMB2_KERNEL
 as it was originally.

Regards,
 Sricharan
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 12:47     ` Sricharan R
@ 2013-08-03 14:01       ` Nicolas Pitre
  2013-08-03 19:25         ` Santosh Shilimkar
  2013-08-03 14:05       ` Russell King - ARM Linux
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-03 14:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 3 Aug 2013, Sricharan R wrote:

> On Saturday 03 August 2013 08:58 AM, Nicolas Pitre wrote:
> > ... meaning that, instead of using 0x81 for the stub value on the mov 
> > instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors 
> > for the rotation field in the opcode, while bit 1 indicates which value 
> > to patch in.
>   I started with this kind of augmenting with the immediate operand
>   while starting V2. But the problem was, we do the runtime patching twice.

Ahhh... Bummer.

>   Once while booting from low address alias space and
>   other while switching to higher memory address. So in this
>   case, the augmented bits are lost after doing the patching for
>   the first time. So the augmented bits are not valid while trying
>   to patch for second time.
> 
>   This was the reason, i thought of using opcodes which
>   is always there.

In which case you only have to look for a mov.

Better yet: just look at the rotate field in the shifter operand.  If it 
is 0 then it is the __PV_BITS_8_0 case, if it is non zero then it is the 
__PV_BITS_31_24 case.
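
In code form, the idea is roughly this (a hypothetical C model of one
ARM-mode fixup step, not the actual head.S code):

	#include <stdint.h>

	/* Pick the patch value from the rotation field (bits 11:8) of the
	 * instruction's immediate operand. */
	static uint32_t patch_one(uint32_t insn, uint32_t low_imm, uint32_t high_val)
	{
		insn &= ~0xffu;			/* clear the old 8-bit immediate        */
		if (insn & 0xf00)		/* rotation != 0: __PV_BITS_31_24 case  */
			insn |= low_imm;	/* pv_offset low word >> 24             */
		else				/* rotation == 0: mov of the high word  */
			insn |= high_val;	/* low bits of high word, or bit 22 for mvn */
		return insn;
	}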


Nicolas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 12:47     ` Sricharan R
  2013-08-03 14:01       ` Nicolas Pitre
@ 2013-08-03 14:05       ` Russell King - ARM Linux
  2013-08-03 14:09         ` Russell King - ARM Linux
  1 sibling, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2013-08-03 14:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Aug 03, 2013 at 06:17:29PM +0530, Sricharan R wrote:
>   I started with this kind of augmenting with the immediate operand
>   while starting V2. But the problem was, we do the runtime patching twice.

It might be much better to do this only once, and instead of having the
early code overwrite the page table in use, create a new page table with
all the correct page table entries in and switch to that.

This also has the added advantage that there's no longer a page table at
a known fixed address which may be useful to attackers.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 14:05       ` Russell King - ARM Linux
@ 2013-08-03 14:09         ` Russell King - ARM Linux
  2013-08-03 19:15           ` Santosh Shilimkar
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2013-08-03 14:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Aug 03, 2013 at 03:05:44PM +0100, Russell King - ARM Linux wrote:
> On Sat, Aug 03, 2013 at 06:17:29PM +0530, Sricharan R wrote:
> >   I started with this kind of augmenting with the immediate operand
> >   while starting V2. But the problem was, we do the runtime patching twice.
> 
> It might be much better to do this only once, and instead of having the
> early code overwrite the page table in use, create a new page table with
> all the correct page table entries in and switch to that.

Note: we still need to do a certain amount of modification of the existing
page table so that we _can_ perform such a switch on all our CPUs - that
is, ensuring that the region for flushing the CPU caches on processors
which need it is properly mapped.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems
  2013-08-03  1:52 ` [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Nicolas Pitre
@ 2013-08-03 19:02   ` Santosh Shilimkar
  0 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-08-03 19:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 02 August 2013 09:52 PM, Nicolas Pitre wrote:
> On Wed, 31 Jul 2013, Santosh Shilimkar wrote:
> 
>> This is version 2 of the series which addresses the comments/suggestions
>> we received on earlier version. The series trying to extend the existing
>> v2p runtime patching for LPAE machines which can have physical memory
>> beyond 4 GB. Keystone is one such ARM machine.
> 
> FYI: I never received patch 1/6.
>
Looks like since I converted the CC to ack for the 1/6, you didn't
receive it.

 
>> In testing various modes(LPAE/non-LPAE, ARM/THMUB), we found an issue
>> with the patching code for THUMB2 build. So there is a patch added in
>> the series to address it. I was planning to squash it in the parent
>> 64 bit patch but thought of keeping it separate for ease of review.
>> We can squash it while merging the patches. 
> 
> Is this an issue with the current code before implementing 64 bit offset 
> patching?
> 
The issue was found with the newly added code. I already see quite a bit of
discussion on those patches.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook
  2013-08-03  1:53   ` Nicolas Pitre
@ 2013-08-03 19:03     ` Santosh Shilimkar
  0 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-08-03 19:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 02 August 2013 09:53 PM, Nicolas Pitre wrote:
> On Wed, 31 Jul 2013, Santosh Shilimkar wrote:
> 
>> On some PAE systems (e.g. TI Keystone), memory is above the
>> 32-bit addressable limit, and the interconnect provides an
>> aliased view of parts of physical memory in the 32-bit addressable
>> space.  This alias is strictly for boot time usage, and is not
>> otherwise usable because of coherency limitations. On such systems,
>> the idmap mechanism needs to take this aliased mapping into account.
>>
>> This patch introduces virt_to_idmap() and an arch function pointer which
>> can be populated by platforms which need it. It also populates the necessary
>> idmap spots with the now available virt_to_idmap(). An #ifdef approach was
>> avoided to stay compatible with multi-platform builds.
>>
>> Most architectures won't touch it, and in that case virt_to_idmap()
>> falls back to the existing virt_to_phys() macro.
>>
>> Cc: Nicolas Pitre <nico@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> 
> Acked-by: Nicolas Pitre <nico@linaro.org>
> 
Thanks !!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 14:09         ` Russell King - ARM Linux
@ 2013-08-03 19:15           ` Santosh Shilimkar
  2013-08-09 19:37             ` Santosh Shilimkar
  0 siblings, 1 reply; 26+ messages in thread
From: Santosh Shilimkar @ 2013-08-03 19:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Saturday 03 August 2013 10:09 AM, Russell King - ARM Linux wrote:
> On Sat, Aug 03, 2013 at 03:05:44PM +0100, Russell King - ARM Linux wrote:
>> On Sat, Aug 03, 2013 at 06:17:29PM +0530, Sricharan R wrote:
>>>   I started with this kind of augmenting with the immediate operand
>>>   while starting V2. But the problem was, we do the runtime patching twice.
>>
>> It might be much better to do this only once, and instead of having the
>> early code overwrite the page table in use, create a new page table with
>> all the correct page table entries in and switch to that.
>
The twice-patching approach was obviously taken from the last discussion,
where you suggested avoiding too many changes to the existing patching
code on Cyril's proposal. And the idea was to delay the patching
until machine code initialization so that it is easy to patch and maintain.
 
> Note: we still need to do a certain amount of modification of the existing
> page table so that we _can_ perform such a switch on all our CPUs - that
> is, ensuring that the region for flushing the CPU caches on processors
> which need it is properly mapped.
> 
We probably need some more guidance on this approach. The last attempt
more or less removed the early patching code and operated directly
on the pv_offset variable to start with. And then, in the late code patching,
the stubs themselves were built to operate on pv_offsets. You didn't like
that approach, so to ensure that we follow your idea properly, some
more explanation would help.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 14:01       ` Nicolas Pitre
@ 2013-08-03 19:25         ` Santosh Shilimkar
  2013-08-04  5:32           ` Nicolas Pitre
  0 siblings, 1 reply; 26+ messages in thread
From: Santosh Shilimkar @ 2013-08-03 19:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Saturday 03 August 2013 10:01 AM, Nicolas Pitre wrote:
> On Sat, 3 Aug 2013, Sricharan R wrote:
> 
>> On Saturday 03 August 2013 08:58 AM, Nicolas Pitre wrote:
>>> ... meaning that, instead of using 0x81 for the stub value on the mov 
>>> instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors 
>>> for the rotation field in the opcode, while bit 1 indicates which value 
>>> to patch in.
>>   I started with this kind of augmenting with the immediate operand
>>   while starting V2. But the problem was, we do the runtime patching twice.
> 
> Ahhh... Bummer.
>
Sorry if it wasn't clear but I thought we discussed why patching is
done twice. This was purely based on the discussion where RMK
suggested to follow that approach to minimize code changes.
 
Looks like we need to revisit that now based on Russell's latest
comment.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 19:25         ` Santosh Shilimkar
@ 2013-08-04  5:32           ` Nicolas Pitre
  2013-08-05 14:38             ` Santosh Shilimkar
  0 siblings, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2013-08-04  5:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 3 Aug 2013, Santosh Shilimkar wrote:

> On Saturday 03 August 2013 10:01 AM, Nicolas Pitre wrote:
> > On Sat, 3 Aug 2013, Sricharan R wrote:
> > 
> >> On Saturday 03 August 2013 08:58 AM, Nicolas Pitre wrote:
> >>> ... meaning that, instead of using 0x81 for the stub value on the mov 
> >>> instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors 
> >>> for the rotation field in the opcode, while bit 1 indicates which value 
> >>> to patch in.
> >>   I started with this kind of augmenting with the immediate operand
> >>   while starting V2. But the problem was, we do the runtime patching twice.
> > 
> > Ahhh... Bummer.
> >
> Sorry if it wasn't clear but I thought we discussed why patching is
> done twice.

Yeah, I know the reasons.  I just had forgotten about the effects on the 
anchor bits.

> This was purely based on the discussion where RMK suggested to follow 
> that approach to minimize code changes.
>  
> Looks like we need to revisit that now based on Russell's latest
> comment.

Note that my comments on this particular patch are still valid and 
independent from whatever approach is used globally to deal with the 
memory alias.  I do think that the value to patch should be selected 
depending on the opcode's rotation field which makes it compatible with 
a double patching approach as well.


Nicolas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-04  5:32           ` Nicolas Pitre
@ 2013-08-05 14:38             ` Santosh Shilimkar
  0 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-08-05 14:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Sunday 04 August 2013 01:32 AM, Nicolas Pitre wrote:
> On Sat, 3 Aug 2013, Santosh Shilimkar wrote:
> 
>> On Saturday 03 August 2013 10:01 AM, Nicolas Pitre wrote:
>>> On Sat, 3 Aug 2013, Sricharan R wrote:
>>>
>>>> On Saturday 03 August 2013 08:58 AM, Nicolas Pitre wrote:
>>>>> ... meaning that, instead of using 0x81 for the stub value on the mov 
>>>>> instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors 
>>>>> for the rotation field in the opcode, while bit 1 indicates which value 
>>>>> to patch in.
>>>>   I started with this kind of augmenting with the immediate operand
>>>>   while starting V2. But the problem was, we do the runtime patching twice.
>>>
>>> Ahhh... Bummer.
>>>
>> Sorry if it wasn't clear but I thought we discussed why patching is
>> done twice.
> 
> Yeah, I know the reasons.  I just had forgotten about the effects on the 
> anchor bits.
>
I see.
 
>> This was purely based on the discussion where RMK suggested to follow 
>> that approach to minimize code changes.
>>  
>> Looks like we need to revisit that now based on Russell's latest
>> comment.
> 
> Note that my comments on this particular patch are still valid and 
> independent from whatever approach is used globally to deal with the 
> memory alias.  I do think that the value to patch should be selected 
> depending on the opcode's rotation field which makes it compatible with 
> a double patching approach as well.
> 
Completely agree.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
  2013-08-03 19:15           ` Santosh Shilimkar
@ 2013-08-09 19:37             ` Santosh Shilimkar
  0 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2013-08-09 19:37 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Saturday 03 August 2013 03:15 PM, Santosh Shilimkar wrote:
> On Saturday 03 August 2013 10:09 AM, Russell King - ARM Linux wrote:
>> On Sat, Aug 03, 2013 at 03:05:44PM +0100, Russell King - ARM Linux wrote:
>>> On Sat, Aug 03, 2013 at 06:17:29PM +0530, Sricharan R wrote:
>>>>   I started with this kind of augmenting with the immediate operand
>>>>   while starting V2. But the problem was, we do the runtime patching twice.
>>>
>>> It might be much better to do this only once, and instead of having the
>>> early code overwrite the page table in use, create a new page table with
>>> all the correct page table entries in and switch to that.
>>
> The twice-patching approach was obviously taken from the last discussion,
> where you suggested avoiding too many changes to the existing patching
> code on Cyril's proposal. And the idea was to delay the patching
> until machine code initialization so that it is easy to patch and maintain.
>  
>> Note: we still need to do a certain amount of modification of the existing
>> page table so that we _can_ perform such a switch on all our CPUs - that
>> is, ensuring that the region for flushing the CPU caches on processors
>> which need it is properly mapped.
>>
> We probably need some more guidance on this approach. The last attempt
> more or less removed the early patching code and operated directly
> on the pv_offset variable to start with. And then, in the late code patching,
> the stubs themselves were built to operate on pv_offsets. You didn't like
> that approach, so to ensure that we follow your idea properly, some
> more explanation would help.
> 
Considering the $subject patch is now more or less sorted out, I
would like to hear your idea about one-time patching.

Thanks

Regards,
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-08-09 19:37 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-31 16:44 [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Santosh Shilimkar
2013-07-31 16:44 ` [PATCH v2 1/6] ARM: mm: LPAE: use phys_addr_t appropriately in p2v and v2p conversions Santosh Shilimkar
2013-07-31 16:44 ` [PATCH v2 2/6] ARM: mm: Introduce virt_to_idmap() with an arch hook Santosh Shilimkar
2013-08-03  1:53   ` Nicolas Pitre
2013-08-03 19:03     ` Santosh Shilimkar
2013-07-31 16:44 ` [PATCH v2 3/6] ARM: mm: Move the idmap print to appropriate place in the code Santosh Shilimkar
2013-08-03  1:55   ` Nicolas Pitre
2013-07-31 16:44 ` [PATCH v2 4/6] ARM: mm: LPAE: Correct virt_to_phys patching for 64 bit physical addresses Santosh Shilimkar
2013-07-31 18:31   ` Sricharan R
2013-08-03  3:32     ` Nicolas Pitre
2013-08-03  3:28   ` Nicolas Pitre
2013-08-03 12:47     ` Sricharan R
2013-08-03 14:01       ` Nicolas Pitre
2013-08-03 19:25         ` Santosh Shilimkar
2013-08-04  5:32           ` Nicolas Pitre
2013-08-05 14:38             ` Santosh Shilimkar
2013-08-03 14:05       ` Russell King - ARM Linux
2013-08-03 14:09         ` Russell King - ARM Linux
2013-08-03 19:15           ` Santosh Shilimkar
2013-08-09 19:37             ` Santosh Shilimkar
2013-07-31 16:44 ` [PATCH v2 5/6] ARM: mm: Update runtime patching code to THUMB2 mode Santosh Shilimkar
2013-08-03  3:40   ` Nicolas Pitre
2013-08-03 12:51     ` Sricharan R
2013-07-31 16:44 ` [PATCH v2 6/6] ARM: mm: Recreate kernel mappings in early_paging_init() Santosh Shilimkar
2013-08-03  1:52 ` [PATCH v2 0/6] ARM: mm: Extend the runtime patch stub for PAE systems Nicolas Pitre
2013-08-03 19:02   ` Santosh Shilimkar
