linux-arm-kernel.lists.infradead.org archive mirror
* [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
@ 2020-09-18 10:30 Ard Biesheuvel
  2020-09-18 10:30 ` [RFC/RFT PATCH 1/6] ARM: p2v: factor out shared loop processing Ard Biesheuvel
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

This series is inspired by Zhen Lei's series [0], which updates the
ARM p2v patching code to optionally support p2v relative alignments
of as little as 64 KiB.

Reducing this alignment is necessary for some specific Huawei boards,
but it also makes the boot sequence more robust for all platforms,
especially for EFI boot, which no longer relies on the 128 MB masking
of the decompressor load address, but instead uses firmware memory
allocation routines to find a suitable spot for the decompressed
kernel.

This series is not based on Zhen Lei's code, but addresses the same
problem, and takes some feedback given in the review into account:
- use of a MOVW instruction to avoid two adds/adcs sequences when dealing
  with the carry on LPAE
- add support for Thumb2 kernels as well
- make the change unconditional - it will bit rot otherwise, and has value
  for other platforms as well.

The first four patches are general cleanup and preparatory changes.
Patch #5 implements the switch to a MOVW instruction without changing
the minimum alignment.
Patch #6 reduces the minimum alignment to 2 MiB.

Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.

Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Nicolas Pitre <nico@fluxnic.net>

[0] https://lore.kernel.org/linux-arm-kernel/20200915015204.2971-1-thunder.leizhen@huawei.com/
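
For context, the p2v patching rewrites the immediate fields of add/sub
instructions at boot so that virt = phys + pv_offset. A rough sketch of
the arithmetic, and of the constraint that forces the current 16 MiB
minimum alignment (host-side Python, purely illustrative; the helper
names are made up):

```python
MiB = 1 << 20

def p2v(phys, pv_offset):
    """The patched code effectively computes virt = phys + pv_offset
    (mod 2**32); the v2p direction is the same with a subtraction."""
    return (phys + pv_offset) & 0xFFFFFFFF

def old_stub_can_encode(pv_offset):
    """The pre-series stubs patch only the top byte (bits 31:24) of an
    add/sub immediate, so the offset must be a 16 MiB multiple."""
    return pv_offset % (16 * MiB) == 0

# QEMU-style layout: DRAM at 0x40000000, 3g/1g split:
assert p2v(0x40000000, 0x80000000) == 0xC0000000
# A 16 MiB-aligned displacement fits the old single-byte immediate:
assert old_stub_can_encode(0x80000000)
# ... a merely 2 MiB-aligned one does not, which is what this series fixes:
assert not old_stub_can_encode(2 * MiB)
```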

Ard Biesheuvel (6):
  ARM: p2v: factor out shared loop processing
  ARM: p2v: factor out BE8 handling
  ARM: p2v: drop redundant 'type' argument from __pv_stub
  ARM: p2v: use relative references in patch site arrays
  ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE
  ARM: p2v: reduce p2v alignment requirement to 2 MiB

 arch/arm/Kconfig              |   2 +-
 arch/arm/include/asm/memory.h |  58 ++++++---
 arch/arm/kernel/head.S        | 136 ++++++++++++--------
 3 files changed, 123 insertions(+), 73 deletions(-)

-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [RFC/RFT PATCH 1/6] ARM: p2v: factor out shared loop processing
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
@ 2020-09-18 10:30 ` Ard Biesheuvel
  2020-09-18 10:30 ` [RFC/RFT PATCH 2/6] ARM: p2v: factor out BE8 handling Ard Biesheuvel
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

The loop that iterates over the patch table entries is duplicated
between the Thumb2 and ARM code paths of __fixup_a_pv_table. Move it
out of the #ifdef so that both paths share a single copy of it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/kernel/head.S | 24 +++++++++-----------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index f8904227e7fd..9a0c11ac8281 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -641,7 +641,7 @@ __fixup_a_pv_table:
 #ifdef CONFIG_THUMB2_KERNEL
 	moveq	r0, #0x200000	@ set bit 21, mov to mvn instruction
 	lsls	r6, #24
-	beq	2f
+	beq	.Lnext
 	clz	r7, r6
 	lsr	r6, #24
 	lsl	r6, r7
@@ -650,8 +650,8 @@ __fixup_a_pv_table:
 	orrcs	r6, #0x0080
 	orr	r6, r6, r7, lsl #12
 	orr	r6, #0x4000
-	b	2f
-1:	add     r7, r3
+	b	.Lnext
+.Lloop:	add	r7, r3
 	ldrh	ip, [r7, #2]
 ARM_BE8(rev16	ip, ip)
 	tst	ip, #0x4000
@@ -660,25 +660,21 @@ ARM_BE8(rev16	ip, ip)
 	orreq	ip, r0	@ mask in offset bits 7-0
 ARM_BE8(rev16	ip, ip)
 	strh	ip, [r7, #2]
-	bne	2f
+	bne	.Lnext
 	ldrh	ip, [r7]
 ARM_BE8(rev16	ip, ip)
 	bic	ip, #0x20
 	orr	ip, ip, r0, lsr #16
 ARM_BE8(rev16	ip, ip)
 	strh	ip, [r7]
-2:	cmp	r4, r5
-	ldrcc	r7, [r4], #4	@ use branch for delay slot
-	bcc	1b
-	bx	lr
 #else
 #ifdef CONFIG_CPU_ENDIAN_BE8
 	moveq	r0, #0x00004000	@ set bit 22, mov to mvn instruction
 #else
 	moveq	r0, #0x400000	@ set bit 22, mov to mvn instruction
 #endif
-	b	2f
-1:	ldr	ip, [r7, r3]
+	b	.Lnext
+.Lloop:	ldr	ip, [r7, r3]
 #ifdef CONFIG_CPU_ENDIAN_BE8
 	@ in BE8, we load data in BE, but instructions still in LE
 	bic	ip, ip, #0xff000000
@@ -694,11 +690,13 @@ ARM_BE8(rev16	ip, ip)
 	orreq	ip, ip, r0	@ mask in offset bits 7-0
 #endif
 	str	ip, [r7, r3]
-2:	cmp	r4, r5
+#endif
+
+.Lnext:
+	cmp	r4, r5
 	ldrcc	r7, [r4], #4	@ use branch for delay slot
-	bcc	1b
+	bcc	.Lloop
 	ret	lr
-#endif
 ENDPROC(__fixup_a_pv_table)
 
 	.align
-- 
2.17.1



* [RFC/RFT PATCH 2/6] ARM: p2v: factor out BE8 handling
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
  2020-09-18 10:30 ` [RFC/RFT PATCH 1/6] ARM: p2v: factor out shared loop processing Ard Biesheuvel
@ 2020-09-18 10:30 ` Ard Biesheuvel
  2020-09-18 10:30 ` [RFC/RFT PATCH 3/6] ARM: p2v: drop redundant 'type' argument from __pv_stub Ard Biesheuvel
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

The ARM code path duplicates the entire instruction patching sequence
for big-endian (BE8) kernels. Capture the differences in a small set
of PV_xxx macros so that a single sequence covers both endiannesses.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/kernel/head.S | 30 +++++++++-----------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 9a0c11ac8281..c2a912121e3e 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -669,26 +669,24 @@ ARM_BE8(rev16	ip, ip)
 	strh	ip, [r7]
 #else
 #ifdef CONFIG_CPU_ENDIAN_BE8
-	moveq	r0, #0x00004000	@ set bit 22, mov to mvn instruction
+@ in BE8, we load data in BE, but instructions still in LE
+#define PV_BIT22	0x00004000
+#define PV_IMM8_MASK	0xff000000
+#define PV_ROT_MASK	0x000f0000
 #else
-	moveq	r0, #0x400000	@ set bit 22, mov to mvn instruction
+#define PV_BIT22	0x00400000
+#define PV_IMM8_MASK	0x000000ff
+#define PV_ROT_MASK	0xf00
 #endif
+
+	moveq	r0, #PV_BIT22	@ set bit 22, mov to mvn instruction
 	b	.Lnext
 .Lloop:	ldr	ip, [r7, r3]
-#ifdef CONFIG_CPU_ENDIAN_BE8
-	@ in BE8, we load data in BE, but instructions still in LE
-	bic	ip, ip, #0xff000000
-	tst	ip, #0x000f0000	@ check the rotation field
-	orrne	ip, ip, r6, lsl #24 @ mask in offset bits 31-24
-	biceq	ip, ip, #0x00004000 @ clear bit 22
-	orreq	ip, ip, r0      @ mask in offset bits 7-0
-#else
-	bic	ip, ip, #0x000000ff
-	tst	ip, #0xf00	@ check the rotation field
-	orrne	ip, ip, r6	@ mask in offset bits 31-24
-	biceq	ip, ip, #0x400000	@ clear bit 22
-	orreq	ip, ip, r0	@ mask in offset bits 7-0
-#endif
+	bic	ip, ip, #PV_IMM8_MASK
+	tst	ip, #PV_ROT_MASK		@ check the rotation field
+	orrne	ip, ip, r6 ARM_BE8(, lsl #24)	@ mask in offset bits 31-24
+	biceq	ip, ip, #PV_BIT22		@ clear bit 22
+	orreq	ip, ip, r0			@ mask in offset bits 7-0
 	str	ip, [r7, r3]
 #endif
 
-- 
2.17.1



* [RFC/RFT PATCH 3/6] ARM: p2v: drop redundant 'type' argument from __pv_stub
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
  2020-09-18 10:30 ` [RFC/RFT PATCH 1/6] ARM: p2v: factor out shared loop processing Ard Biesheuvel
  2020-09-18 10:30 ` [RFC/RFT PATCH 2/6] ARM: p2v: factor out BE8 handling Ard Biesheuvel
@ 2020-09-18 10:30 ` Ard Biesheuvel
  2020-09-18 10:31 ` [RFC/RFT PATCH 4/6] ARM: p2v: use relative references in patch site arrays Ard Biesheuvel
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

We always pass the same value for 'type' so pull it into the __pv_stub
macro itself.
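
For background (an illustration, not part of the patch): the "I" asm
constraint requires an ARM modified immediate, i.e. an 8-bit value
rotated right by an even amount, which __PV_BITS_31_24 (0x81000000)
satisfies. A quick host-side Python check, with a made-up helper name:

```python
def is_arm_modified_imm(value):
    """True if 'value' is encodable as an ARM data-processing
    immediate: an 8-bit constant rotated right by an even amount
    (what the "I" asm constraint accepts on 32-bit ARM)."""
    for rot in range(0, 32, 2):
        # rotating the value left by 'rot' undoes a right-rotation
        imm = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if imm < 256:
            return True
    return False

# __PV_BITS_31_24 is the byte 0x81 placed in bits 31:24, a valid immediate:
assert is_arm_modified_imm(0x81000000)
```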

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/memory.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 99035b5891ef..eb3c8e6e960a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -183,14 +183,14 @@ extern const void *__pv_table_begin, *__pv_table_end;
 #define PHYS_OFFSET	((phys_addr_t)__pv_phys_pfn_offset << PAGE_SHIFT)
 #define PHYS_PFN_OFFSET	(__pv_phys_pfn_offset)
 
-#define __pv_stub(from,to,instr,type)			\
+#define __pv_stub(from,to,instr)			\
 	__asm__("@ __pv_stub\n"				\
 	"1:	" instr "	%0, %1, %2\n"		\
 	"	.pushsection .pv_table,\"a\"\n"		\
 	"	.long	1b\n"				\
 	"	.popsection\n"				\
 	: "=r" (to)					\
-	: "r" (from), "I" (type))
+	: "r" (from), "I" (__PV_BITS_31_24))
 
 #define __pv_stub_mov_hi(t)				\
 	__asm__ volatile("@ __pv_stub_mov\n"		\
@@ -217,7 +217,7 @@ static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
 	phys_addr_t t;
 
 	if (sizeof(phys_addr_t) == 4) {
-		__pv_stub(x, t, "add", __PV_BITS_31_24);
+		__pv_stub(x, t, "add");
 	} else {
 		__pv_stub_mov_hi(t);
 		__pv_add_carry_stub(x, t);
@@ -235,7 +235,7 @@ static inline unsigned long __phys_to_virt(phys_addr_t x)
 	 * assembler expression receives 32 bit argument
 	 * in place where 'r' 32 bit operand is expected.
 	 */
-	__pv_stub((unsigned long) x, t, "sub", __PV_BITS_31_24);
+	__pv_stub((unsigned long) x, t, "sub");
 	return t;
 }
 
-- 
2.17.1



* [RFC/RFT PATCH 4/6] ARM: p2v: use relative references in patch site arrays
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2020-09-18 10:30 ` [RFC/RFT PATCH 3/6] ARM: p2v: drop redundant 'type' argument from __pv_stub Ard Biesheuvel
@ 2020-09-18 10:31 ` Ard Biesheuvel
  2020-09-18 10:31 ` [RFC/RFT PATCH 5/6] ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE Ard Biesheuvel
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:31 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

Free up a register in the p2v patching code by switching to relative
references, which don't require keeping the phys-to-virt displacement
live in a register.
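
A host-side sketch (illustrative Python, invented names) of why
relative references need no displacement register: each table slot
stores 'site minus slot address', and the absolute address is
recovered from the slot's own address alone:

```python
MASK32 = 0xFFFFFFFF

def build_table(sites, table_base):
    """Each 4-byte slot stores the patch-site address relative to the
    slot's own address (the '.long 1b - .' emitted by the stubs)."""
    return [site - (table_base + 4 * i) for i, site in enumerate(sites)]

def resolve(table, table_base):
    """Recover absolute site addresses from the slot addresses alone,
    with no separate phys-to-virt displacement needed."""
    return [(table_base + 4 * i + rel) & MASK32
            for i, rel in enumerate(table)]

sites = [0xC0100010, 0xC0100050]
table = build_table(sites, table_base=0xC0200000)
assert resolve(table, table_base=0xC0200000) == sites

# If the table and the patch sites move together by the same delta
# (e.g. the code runs from physical addresses early at boot), the
# relative entries still resolve correctly:
delta = 0x200000
assert resolve(table, 0xC0200000 + delta) == [s + delta for s in sites]
```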

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/memory.h |  6 +++---
 arch/arm/kernel/head.S        | 15 ++++++++-------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index eb3c8e6e960a..4121662dea5a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -187,7 +187,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
 	__asm__("@ __pv_stub\n"				\
 	"1:	" instr "	%0, %1, %2\n"		\
 	"	.pushsection .pv_table,\"a\"\n"		\
-	"	.long	1b\n"				\
+	"	.long	1b - .\n"			\
 	"	.popsection\n"				\
 	: "=r" (to)					\
 	: "r" (from), "I" (__PV_BITS_31_24))
@@ -196,7 +196,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
 	__asm__ volatile("@ __pv_stub_mov\n"		\
 	"1:	mov	%R0, %1\n"			\
 	"	.pushsection .pv_table,\"a\"\n"		\
-	"	.long	1b\n"				\
+	"	.long	1b - .\n"			\
 	"	.popsection\n"				\
 	: "=r" (t)					\
 	: "I" (__PV_BITS_7_0))
@@ -206,7 +206,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
 	"1:	adds	%Q0, %1, %2\n"			\
 	"	adc	%R0, %R0, #0\n"			\
 	"	.pushsection .pv_table,\"a\"\n"		\
-	"	.long	1b\n"				\
+	"	.long	1b - .\n"			\
 	"	.popsection\n"				\
 	: "+r" (y)					\
 	: "r" (x), "I" (__PV_BITS_31_24)		\
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index c2a912121e3e..d2bd3b258386 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -633,7 +633,7 @@ ENDPROC(__fixup_pv_table)
 __fixup_a_pv_table:
 	adr	r0, 3f
 	ldr	r6, [r0]
-	add	r6, r6, r3
+	add	r6, r6, r0
 	ldr	r0, [r6, #HIGH_OFFSET]	@ pv_offset high word
 	ldr	r6, [r6, #LOW_OFFSET]	@ pv_offset low word
 	mov	r6, r6, lsr #24
@@ -651,7 +651,8 @@ __fixup_a_pv_table:
 	orr	r6, r6, r7, lsl #12
 	orr	r6, #0x4000
 	b	.Lnext
-.Lloop:	add	r7, r3
+.Lloop:	add	r7, r4
+	add	r4, #4
 	ldrh	ip, [r7, #2]
 ARM_BE8(rev16	ip, ip)
 	tst	ip, #0x4000
@@ -681,28 +682,28 @@ ARM_BE8(rev16	ip, ip)
 
 	moveq	r0, #PV_BIT22	@ set bit 22, mov to mvn instruction
 	b	.Lnext
-.Lloop:	ldr	ip, [r7, r3]
+.Lloop:	ldr	ip, [r7, r4]
 	bic	ip, ip, #PV_IMM8_MASK
 	tst	ip, #PV_ROT_MASK		@ check the rotation field
 	orrne	ip, ip, r6 ARM_BE8(, lsl #24)	@ mask in offset bits 31-24
 	biceq	ip, ip, #PV_BIT22		@ clear bit 22
 	orreq	ip, ip, r0			@ mask in offset bits 7-0
-	str	ip, [r7, r3]
+	str	ip, [r7, r4]
+	add	r4, r4, #4
 #endif
 
 .Lnext:
 	cmp	r4, r5
-	ldrcc	r7, [r4], #4	@ use branch for delay slot
+	ldrcc	r7, [r4]	@ use branch for delay slot
 	bcc	.Lloop
 	ret	lr
 ENDPROC(__fixup_a_pv_table)
 
 	.align
-3:	.long __pv_offset
+3:	.long __pv_offset - .
 
 ENTRY(fixup_pv_table)
 	stmfd	sp!, {r4 - r7, lr}
-	mov	r3, #0			@ no offset
 	mov	r4, r0			@ r0 = table start
 	add	r5, r0, r1		@ r1 = table size
 	bl	__fixup_a_pv_table
-- 
2.17.1



* [RFC/RFT PATCH 5/6] ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2020-09-18 10:31 ` [RFC/RFT PATCH 4/6] ARM: p2v: use relative references in patch site arrays Ard Biesheuvel
@ 2020-09-18 10:31 ` Ard Biesheuvel
  2020-09-18 10:31 ` [RFC/RFT PATCH 6/6] ARM: p2v: reduce p2v alignment requirement to 2 MiB Ard Biesheuvel
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:31 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

In preparation for reducing the phys-to-virt minimum relative alignment
from 16 MiB to 2 MiB, switch to patchable sequences involving MOVW
instructions that can more easily be manipulated to carry a 12-bit
immediate. Note that the non-LPAE ARM sequence is not updated: MOVW
may not be supported on non-LPAE platforms, and the sequence itself
can be updated more easily to apply the 12 bits of displacement.

For Thumb2, which has many more versions of opcodes, switch to a sequence
that can be patched by the same patching code for both versions, and use
asm constraints and S-suffixed opcodes to force narrow encodings to be
selected.
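
The arithmetic performed by the patched LPAE sequence (MOVW, ADDS,
ADC) can be sketched as follows; this is illustrative host-side
Python with invented names, not the kernel code itself:

```python
MASK32 = 0xFFFFFFFF

def pv_add_carry(virt, off_lo, off_hi):
    """Sketch of what the MOVW/ADDS/ADC sequence computes for LPAE:
    64-bit phys = 32-bit virt + 64-bit pv_offset, with the carry out
    of the low-word addition propagated into the high word."""
    lo = virt + off_lo
    carry = lo >> 32                  # what ADDS leaves in the carry flag
    hi = (off_hi + carry) & MASK32    # what ADC computes
    return (hi << 32) | (lo & MASK32)

# Crossing a 4 GiB boundary exercises the carry:
assert pv_add_carry(0xF0000000, 0x40000000, 0x1) == 0x230000000
```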

Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/memory.h | 43 +++++++++++----
 arch/arm/kernel/head.S        | 57 +++++++++++++-------
 2 files changed, 69 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 4121662dea5a..7184a2540816 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -183,6 +183,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
 #define PHYS_OFFSET	((phys_addr_t)__pv_phys_pfn_offset << PAGE_SHIFT)
 #define PHYS_PFN_OFFSET	(__pv_phys_pfn_offset)
 
+#ifndef CONFIG_THUMB2_KERNEL
 #define __pv_stub(from,to,instr)			\
 	__asm__("@ __pv_stub\n"				\
 	"1:	" instr "	%0, %1, %2\n"		\
@@ -192,25 +193,46 @@ extern const void *__pv_table_begin, *__pv_table_end;
 	: "=r" (to)					\
 	: "r" (from), "I" (__PV_BITS_31_24))
 
-#define __pv_stub_mov_hi(t)				\
-	__asm__ volatile("@ __pv_stub_mov\n"		\
-	"1:	mov	%R0, %1\n"			\
+#define __pv_add_carry_stub(x, y)			\
+	__asm__ volatile("@ __pv_add_carry_stub\n"	\
+	"0:	movw	%R0, %2\n"			\
+	"1:	adds	%Q0, %1, %R0, lsl #24\n"	\
+	"2:	mov	%R0, %3\n"			\
+	"	adc	%R0, %R0, #0\n"			\
 	"	.pushsection .pv_table,\"a\"\n"		\
-	"	.long	1b - .\n"			\
+	"	.long	0b - ., 1b - ., 2b - .\n"	\
 	"	.popsection\n"				\
-	: "=r" (t)					\
-	: "I" (__PV_BITS_7_0))
+	: "=&r" (y)					\
+	: "r" (x), "j" (0), "I" (__PV_BITS_7_0)		\
+	: "cc")
+
+#else
+#define __pv_stub(from,to,instr)			\
+	__asm__("@ __pv_stub\n"				\
+	"0:	movw	%0, %2\n"			\
+	"	lsls	%0, #24\n"			\
+	"	" instr "s %0, %1, %0\n"		\
+	"	.pushsection .pv_table,\"a\"\n"		\
+	"	.long	0b - .\n"			\
+	"	.popsection\n"				\
+	: "=&l" (to)					\
+	: "l" (from), "j" (0)				\
+	: "cc")
 
 #define __pv_add_carry_stub(x, y)			\
 	__asm__ volatile("@ __pv_add_carry_stub\n"	\
-	"1:	adds	%Q0, %1, %2\n"			\
+	"0:	movw	%R0, %2\n"			\
+	"	lsls	%R0, #24\n"			\
+	"	adds	%Q0, %1, %R0\n"			\
+	"1:	mvn	%R0, #0\n"			\
 	"	adc	%R0, %R0, #0\n"			\
 	"	.pushsection .pv_table,\"a\"\n"		\
-	"	.long	1b - .\n"			\
+	"	.long	0b - ., 1b - .\n"		\
 	"	.popsection\n"				\
-	: "+r" (y)					\
-	: "r" (x), "I" (__PV_BITS_31_24)		\
+	: "=&l" (y)					\
+	: "l" (x), "j" (0)				\
 	: "cc")
+#endif
 
 static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
 {
@@ -219,7 +241,6 @@ static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
 	if (sizeof(phys_addr_t) == 4) {
 		__pv_stub(x, t, "add");
 	} else {
-		__pv_stub_mov_hi(t);
 		__pv_add_carry_stub(x, t);
 	}
 	return t;
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index d2bd3b258386..86cea608a5ea 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -639,43 +639,45 @@ __fixup_a_pv_table:
 	mov	r6, r6, lsr #24
 	cmn	r0, #1
 #ifdef CONFIG_THUMB2_KERNEL
-	moveq	r0, #0x200000	@ set bit 21, mov to mvn instruction
-	lsls	r6, #24
-	beq	.Lnext
-	clz	r7, r6
-	lsr	r6, #24
-	lsl	r6, r7
-	bic	r6, #0x0080
-	lsrs	r7, #1
-	orrcs	r6, #0x0080
-	orr	r6, r6, r7, lsl #12
-	orr	r6, #0x4000
+	moveq	r0, #0x200	@ bit 9, ADD to SUB instruction (T1 encoding)
 	b	.Lnext
 .Lloop:	add	r7, r4
 	add	r4, #4
+#ifdef CONFIG_ARM_LPAE
+	ldrh	ip, [r7]
+ARM_BE8(rev16	ip, ip)
+	tst	ip, #0x200	@ MOVW has bit 9 set, MVN has it clear
+	bne	0f		@ skip if MOVW
+	tst	r0, #0x200	@ need to convert MVN to MOV ?
+	bne	.Lnext
+	eor	ip, ip, #0x20	@ flick bit #5
+ARM_BE8(rev16	ip, ip)
+	strh	ip, [r7]
+	b	.Lnext
+0:
+#endif
 	ldrh	ip, [r7, #2]
 ARM_BE8(rev16	ip, ip)
-	tst	ip, #0x4000
-	and	ip, #0x8f00
-	orrne	ip, r6	@ mask in offset bits 31-24
-	orreq	ip, r0	@ mask in offset bits 7-0
+	orr	ip, r6	@ mask in offset bits 31-24
 ARM_BE8(rev16	ip, ip)
 	strh	ip, [r7, #2]
-	bne	.Lnext
-	ldrh	ip, [r7]
+	ldrh	ip, [r7, #6]
 ARM_BE8(rev16	ip, ip)
-	bic	ip, #0x20
-	orr	ip, ip, r0, lsr #16
+	eor	ip, ip, r0
 ARM_BE8(rev16	ip, ip)
-	strh	ip, [r7]
+	strh	ip, [r7, #6]
 #else
 #ifdef CONFIG_CPU_ENDIAN_BE8
 @ in BE8, we load data in BE, but instructions still in LE
+#define PV_BIT20	0x00001000
 #define PV_BIT22	0x00004000
+#define PV_BIT23_22	0x0000c000
 #define PV_IMM8_MASK	0xff000000
 #define PV_ROT_MASK	0x000f0000
 #else
+#define PV_BIT20	0x00100000
 #define PV_BIT22	0x00400000
+#define PV_BIT23_22	0x00c00000
 #define PV_IMM8_MASK	0x000000ff
 #define PV_ROT_MASK	0xf00
 #endif
@@ -683,11 +685,26 @@ ARM_BE8(rev16	ip, ip)
 	moveq	r0, #PV_BIT22	@ set bit 22, mov to mvn instruction
 	b	.Lnext
 .Lloop:	ldr	ip, [r7, r4]
+#ifdef CONFIG_ARM_LPAE
+	tst	ip, #PV_BIT23_22	@ MOVW has bit 23:22 clear, MOV/ADD/SUB have it set
+ARM_BE8(rev	ip, ip)
+	orreq	ip, ip, r6
+ARM_BE8(rev	ip, ip)
+	beq	2f
+	tst	ip, #PV_BIT20		@ ADDS has bit 20 set
+	beq	1f
+	tst	r0, #PV_BIT22		@ check whether to invert bits 23:22 (ADD -> SUB)
+	beq	.Lnext
+	eor	ip, ip, #PV_BIT23_22
+	b	2f
+1:
+#endif
 	bic	ip, ip, #PV_IMM8_MASK
 	tst	ip, #PV_ROT_MASK		@ check the rotation field
 	orrne	ip, ip, r6 ARM_BE8(, lsl #24)	@ mask in offset bits 31-24
 	biceq	ip, ip, #PV_BIT22		@ clear bit 22
 	orreq	ip, ip, r0			@ mask in offset bits 7-0
+2:
 	str	ip, [r7, r4]
 	add	r4, r4, #4
 #endif
-- 
2.17.1



* [RFC/RFT PATCH 6/6] ARM: p2v: reduce p2v alignment requirement to 2 MiB
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2020-09-18 10:31 ` [RFC/RFT PATCH 5/6] ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE Ard Biesheuvel
@ 2020-09-18 10:31 ` Ard Biesheuvel
  2020-09-18 17:25 ` [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment " Ard Biesheuvel
  2020-09-19 23:49 ` Nicolas Pitre
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 10:31 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei, Ard Biesheuvel

Update the p2v patching code so we can deal with displacements that are
not a multiple of 16 MiB but of 2 MiB, to prevent wasting of up to 14 MiB
of physical RAM when running on a platform where the start of memory is
not correctly aligned.

For the ARM code path, this simply comes down to using two add/sub
instructions instead of one for the carryless version, and patching
each of them with the correct immediate depending on the rotation
field. For the LPAE calculation, it patches the MOVW instruction with
up to 12 bits of offset.

For the Thumb2 code path, patching more than 11 bits of displacement
is somewhat cumbersome, and given that 11 bits produce a minimum
alignment of 2 MiB, which is also the granularity for LPAE block
mappings, it makes sense to stick to 2 MiB for the new p2v requirement.
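
A quick sanity check of the 11-bit observation above (host-side
Python, illustrative only):

```python
MiB = 1 << 20

# With 2 MiB granularity, the displacement's significant bits are
# bits 31:21, an 11-bit quantity, which fits comfortably in the
# immediate of a single MOVW (up to 16 bits) for patching.
GRANULE = 2 * MiB
bits_needed = 32 - (GRANULE.bit_length() - 1)
assert bits_needed == 11
# 2**11 steps of 2 MiB cover the entire 32-bit address space:
assert (1 << bits_needed) * GRANULE == 1 << 32
```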

Suggested-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/Kconfig              |  2 +-
 arch/arm/include/asm/memory.h | 15 +++++++-----
 arch/arm/kernel/head.S        | 24 +++++++++++++++-----
 3 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e00d94b16658..c4737a0e613b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -240,7 +240,7 @@ config ARM_PATCH_PHYS_VIRT
 	  kernel in system memory.
 
 	  This can only be used with non-XIP MMU kernels where the base
-	  of physical memory is at a 16MB boundary.
+	  of physical memory is at a 2MiB boundary.
 
 	  Only disable this option if you know that you do not require
 	  this feature (eg, building a kernel for a single machine) and
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 7184a2540816..5da01e7f0d8a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -173,6 +173,7 @@ extern unsigned long vectors_base;
  * so that all we need to do is modify the 8-bit constant field.
  */
 #define __PV_BITS_31_24	0x81000000
+#define __PV_BITS_23_16	0x810000
 #define __PV_BITS_7_0	0x81
 
 extern unsigned long __pv_phys_pfn_offset;
@@ -187,16 +188,18 @@ extern const void *__pv_table_begin, *__pv_table_end;
 #define __pv_stub(from,to,instr)			\
 	__asm__("@ __pv_stub\n"				\
 	"1:	" instr "	%0, %1, %2\n"		\
+	"2:	" instr "	%0, %0, %3\n"		\
 	"	.pushsection .pv_table,\"a\"\n"		\
-	"	.long	1b - .\n"			\
+	"	.long	1b - ., 2b - .\n"		\
 	"	.popsection\n"				\
-	: "=r" (to)					\
-	: "r" (from), "I" (__PV_BITS_31_24))
+	: "=&r" (to)					\
+	: "r" (from), "I" (__PV_BITS_31_24),		\
+	  "I"(__PV_BITS_23_16))
 
 #define __pv_add_carry_stub(x, y)			\
 	__asm__ volatile("@ __pv_add_carry_stub\n"	\
 	"0:	movw	%R0, %2\n"			\
-	"1:	adds	%Q0, %1, %R0, lsl #24\n"	\
+	"1:	adds	%Q0, %1, %R0, lsl #20\n"	\
 	"2:	mov	%R0, %3\n"			\
 	"	adc	%R0, %R0, #0\n"			\
 	"	.pushsection .pv_table,\"a\"\n"		\
@@ -210,7 +213,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
 #define __pv_stub(from,to,instr)			\
 	__asm__("@ __pv_stub\n"				\
 	"0:	movw	%0, %2\n"			\
-	"	lsls	%0, #24\n"			\
+	"	lsls	%0, #21\n"			\
 	"	" instr "s %0, %1, %0\n"		\
 	"	.pushsection .pv_table,\"a\"\n"		\
 	"	.long	0b - .\n"			\
@@ -222,7 +225,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
 #define __pv_add_carry_stub(x, y)			\
 	__asm__ volatile("@ __pv_add_carry_stub\n"	\
 	"0:	movw	%R0, %2\n"			\
-	"	lsls	%R0, #24\n"			\
+	"	lsls	%R0, #21\n"			\
 	"	adds	%Q0, %1, %R0\n"			\
 	"1:	mvn	%R0, #0\n"			\
 	"	adc	%R0, %R0, #0\n"			\
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 86cea608a5ea..d08d506a0ccd 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -614,8 +614,8 @@ __fixup_pv_table:
 	mov	r0, r8, lsr #PAGE_SHIFT	@ convert to PFN
 	str	r0, [r6]	@ save computed PHYS_OFFSET to __pv_phys_pfn_offset
 	strcc	ip, [r7, #HIGH_OFFSET]	@ save to __pv_offset high bits
-	mov	r6, r3, lsr #24	@ constant for add/sub instructions
-	teq	r3, r6, lsl #24 @ must be 16MiB aligned
+	mov	r6, r3, lsr #21	@ constant for add/sub instructions
+	teq	r3, r6, lsl #21 @ must be 2MiB aligned
 THUMB(	it	ne		@ cross section branch )
 	bne	__error
 	str	r3, [r7, #LOW_OFFSET]	@ save to __pv_offset low bits
@@ -636,10 +636,13 @@ __fixup_a_pv_table:
 	add	r6, r6, r0
 	ldr	r0, [r6, #HIGH_OFFSET]	@ pv_offset high word
 	ldr	r6, [r6, #LOW_OFFSET]	@ pv_offset low word
-	mov	r6, r6, lsr #24
+	mov	r6, r6, lsr #16
 	cmn	r0, #1
 #ifdef CONFIG_THUMB2_KERNEL
 	moveq	r0, #0x200	@ bit 9, ADD to SUB instruction (T1 encoding)
+	mov	r3, r6, lsr #13	@ isolate top 3 bits of displacement
+	ubfx	r6, r6, #5, #8	@ put bits 28:21 into the imm8 field
+	bfi	r6, r3, #12, #3	@ put bits 31:29 into the imm3 field
 	b	.Lnext
 .Lloop:	add	r7, r4
 	add	r4, #4
@@ -658,7 +661,7 @@ ARM_BE8(rev16	ip, ip)
 #endif
 	ldrh	ip, [r7, #2]
 ARM_BE8(rev16	ip, ip)
-	orr	ip, r6	@ mask in offset bits 31-24
+	orr	ip, r6	@ mask in offset bits 31-21
 ARM_BE8(rev16	ip, ip)
 	strh	ip, [r7, #2]
 	ldrh	ip, [r7, #6]
@@ -674,21 +677,26 @@ ARM_BE8(rev16	ip, ip)
 #define PV_BIT23_22	0x0000c000
 #define PV_IMM8_MASK	0xff000000
 #define PV_ROT_MASK	0x000f0000
+#define PV_ROT16_MASK	0x00080000
 #else
 #define PV_BIT20	0x00100000
 #define PV_BIT22	0x00400000
 #define PV_BIT23_22	0x00c00000
 #define PV_IMM8_MASK	0x000000ff
 #define PV_ROT_MASK	0xf00
+#define PV_ROT16_MASK	0x800
 #endif
 
 	moveq	r0, #PV_BIT22	@ set bit 22, mov to mvn instruction
+	and	r3, r6, #0xf
+	mov	r6, r6, lsr #8
 	b	.Lnext
 .Lloop:	ldr	ip, [r7, r4]
 #ifdef CONFIG_ARM_LPAE
 	tst	ip, #PV_BIT23_22	@ MOVW has bit 23:22 clear, MOV/ADD/SUB have it set
 ARM_BE8(rev	ip, ip)
-	orreq	ip, ip, r6
+	orreq	ip, ip, r6, lsl #4
+	orreq	ip, ip, r3, lsr #4
 ARM_BE8(rev	ip, ip)
 	beq	2f
 	tst	ip, #PV_BIT20		@ ADDS has bit 20 set
@@ -701,9 +709,13 @@ ARM_BE8(rev	ip, ip)
 #endif
 	bic	ip, ip, #PV_IMM8_MASK
 	tst	ip, #PV_ROT_MASK		@ check the rotation field
-	orrne	ip, ip, r6 ARM_BE8(, lsl #24)	@ mask in offset bits 31-24
 	biceq	ip, ip, #PV_BIT22		@ clear bit 22
 	orreq	ip, ip, r0			@ mask in offset bits 7-0
+	beq	2f
+
+	tst	ip, #PV_ROT16_MASK		@ amount of shift?
+	orreq	ip, ip, r6 ARM_BE8(, lsl #24)	@ mask in offset bits 31-24
+	orrne	ip, ip, r3 ARM_BE8(, lsl #24)	@ mask in offset bits 23-16
 2:
 	str	ip, [r7, r4]
 	add	r4, r4, #4
-- 
2.17.1



* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2020-09-18 10:31 ` [RFC/RFT PATCH 6/6] ARM: p2v: reduce p2v alignment requirement to 2 MiB Ard Biesheuvel
@ 2020-09-18 17:25 ` Ard Biesheuvel
  2020-09-19 23:49 ` Nicolas Pitre
  7 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-18 17:25 UTC (permalink / raw)
  To: Linux ARM
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Russell King,
	Santosh Shilimkar, Zhen Lei

On Fri, 18 Sep 2020 at 12:31, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> This series is inspired by Zhei Len's series [0], which updates the
> ARM p2v patching code to optionally support p2v relative alignments
> of as little as 64 KiB.
>
> Reducing this alignment is necessary for some specific Huawei boards,
> but given that reducing this minimum alignment will make the boot
> sequence more robust for all platforms, especially EFI boot, which
> no longer relies on the 128 MB masking of the decompressor load address,
> but uses firmware memory allocation routines to find a suitable spot
> for the decompressed kernel.
>
> This series is not based on Zhei Len's code, but addresses the same
> problem, and takes some feedback given in the review into account:
> - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
>   with the carry on LPAE
> - add support for Thumb2 kernels as well
> - make the change unconditional - it will bit rot otherwise, and has value
>   for other platforms as well.
>
> The first four patches are general cleanup and preparatory changes.
> Patch #5 implements the switch to a MOVW instruction without changing
> the minimum alignment.
> Patch #6 reduces the minimum alignment to 2 MiB.
>
> Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.
>
> Cc: Zhen Lei <thunder.leizhen@huawei.com>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Nicolas Pitre <nico@fluxnic.net>
>
> [0] https://lore.kernel.org/linux-arm-kernel/20200915015204.2971-1-thunder.leizhen@huawei.com/
>
> Ard Biesheuvel (6):
>   ARM: p2v: factor out shared loop processing
>   ARM: p2v: factor out BE8 handling
>   ARM: p2v: drop redundant 'type' argument from __pv_stub
>   ARM: p2v: use relative references in patch site arrays
>   ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE
>   ARM: p2v: reduce p2v alignment requirement to 2 MiB
>

Note: there is a thinko in this version of the patches, as it
unnecessarily patches ADD instructions into SUB and vice versa. Since
QEMU has its DRAM at 0x4000_0000, the translation for a 3g/1g split is
0x8000_0000, which is why it worked in my testing, but will fail on
other platforms (or with other splits, for that matter).
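
The coincidence can be made concrete with a little arithmetic
(illustrative Python; pv_offset here is just PAGE_OFFSET minus
PHYS_OFFSET, mod 2**32):

```python
MASK32 = 0xFFFFFFFF

def pv_offset(page_offset, phys_offset):
    """pv_offset = PAGE_OFFSET - PHYS_OFFSET (mod 2**32)."""
    return (page_offset - phys_offset) & MASK32

# QEMU's DRAM base with a 3g/1g split:
off = pv_offset(0xC0000000, 0x40000000)
assert off == 0x80000000
# Adding or subtracting 0x80000000 mod 2**32 yields the same result,
# which is why the ADD/SUB mixup went unnoticed in this configuration:
assert (0x40000000 + off) & MASK32 == (0x40000000 - off) & MASK32
```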

For the general discussion of the approach taken here, that does not
really matter, so I won't send a v2 until people have had some time to
have a look. However, if you want to test this code on other hardware,
please use the code at the following link instead:

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm-p2v-v2


* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2020-09-18 17:25 ` [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment " Ard Biesheuvel
@ 2020-09-19 23:49 ` Nicolas Pitre
  2020-09-20  7:50   ` Ard Biesheuvel
  2020-09-20  8:55   ` Russell King - ARM Linux admin
  7 siblings, 2 replies; 14+ messages in thread
From: Nicolas Pitre @ 2020-09-19 23:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Linus Walleij, Russell King, Santosh Shilimkar,
	Zhen Lei, linux-arm-kernel

On Fri, 18 Sep 2020, Ard Biesheuvel wrote:

> This series is inspired by Zhen Lei's series [0], which updates the
> ARM p2v patching code to optionally support p2v relative alignments
> of as little as 64 KiB.
> 
> Reducing this alignment is necessary for some specific Huawei boards,
> but it also makes the boot sequence more robust for all platforms,
> especially for EFI boot, which no longer relies on the 128 MB masking
> of the decompressor load address, but instead uses firmware memory
> allocation routines to find a suitable spot for the decompressed
> kernel.
> 
> This series is not based on Zhen Lei's code, but addresses the same
> problem, and takes some feedback given in the review into account:
> - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
>   with the carry on LPAE
> - add support for Thumb2 kernels as well
> - make the change unconditional - it will bit rot otherwise, and has value
>   for other platforms as well.
> 
> The first four patches are general cleanup and preparatory changes.
> Patch #5 implements the switch to a MOVW instruction without changing
> the minimum alignment.
> Patch #6 reduces the minimum alignment to 2 MiB.
> 
> Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.

At this point I think this really ought to be split into a file of its
own... and maybe even rewritten in C. Even though I wrote the original
code, I no longer understand it without re-investing time into it. But
in either case the whole of head.S would need to have its registers
shuffled first to move long-lived values away from r0-r3, ip and lr
to allow for standard function calls.


Nicolas


* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-19 23:49 ` Nicolas Pitre
@ 2020-09-20  7:50   ` Ard Biesheuvel
  2020-09-20  8:57     ` Russell King - ARM Linux admin
  2020-09-20  8:55   ` Russell King - ARM Linux admin
  1 sibling, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-20  7:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linux-efi, Linus Walleij, Russell King, Santosh Shilimkar,
	Zhen Lei, Linux ARM

On Sun, 20 Sep 2020 at 01:49, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Fri, 18 Sep 2020, Ard Biesheuvel wrote:
>
> > This series is inspired by Zhen Lei's series [0], which updates the
> > ARM p2v patching code to optionally support p2v relative alignments
> > of as little as 64 KiB.
> >
> > Reducing this alignment is necessary for some specific Huawei boards,
> > but it also makes the boot sequence more robust for all platforms,
> > especially for EFI boot, which no longer relies on the 128 MB masking
> > of the decompressor load address, but instead uses firmware memory
> > allocation routines to find a suitable spot for the decompressed
> > kernel.
> >
> > This series is not based on Zhen Lei's code, but addresses the same
> > problem, and takes some feedback given in the review into account:
> > - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
> >   with the carry on LPAE
> > - add support for Thumb2 kernels as well
> > - make the change unconditional - it will bit rot otherwise, and has value
> >   for other platforms as well.
> >
> > The first four patches are general cleanup and preparatory changes.
> > Patch #5 implements the switch to a MOVW instruction without changing
> > the minimum alignment.
> > Patch #6 reduces the minimum alignment to 2 MiB.
> >
> > Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.
>
> At this point I think this really ought to be split into a file of its
> own... and maybe even rewritten in C. Even though I wrote the original
> code, I no longer understand it without re-investing time into it. But
> > in either case the whole of head.S would need to have its registers
> shuffled first to move long lived values away from r0-r3,ip,lr to allow
> for standard function calls.
>

I agree with that in principle; however, running C code with a stack
while the MMU is off is slightly risky.

I have managed to simplify the code a bit more (given that some
patching was not needed to begin with), and I can add some more
comments to head.S to annotate the actions.


* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-19 23:49 ` Nicolas Pitre
  2020-09-20  7:50   ` Ard Biesheuvel
@ 2020-09-20  8:55   ` Russell King - ARM Linux admin
  1 sibling, 0 replies; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2020-09-20  8:55 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linux-efi, Linus Walleij, Santosh Shilimkar, Zhen Lei,
	Ard Biesheuvel, linux-arm-kernel

On Sat, Sep 19, 2020 at 07:49:26PM -0400, Nicolas Pitre wrote:
> On Fri, 18 Sep 2020, Ard Biesheuvel wrote:
> > This series is inspired by Zhen Lei's series [0], which updates the
> > ARM p2v patching code to optionally support p2v relative alignments
> > of as little as 64 KiB.
> > 
> > Reducing this alignment is necessary for some specific Huawei boards,
> > but it also makes the boot sequence more robust for all platforms,
> > especially for EFI boot, which no longer relies on the 128 MB masking
> > of the decompressor load address, but instead uses firmware memory
> > allocation routines to find a suitable spot for the decompressed
> > kernel.
> > 
> > This series is not based on Zhen Lei's code, but addresses the same
> > problem, and takes some feedback given in the review into account:
> > - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
> >   with the carry on LPAE
> > - add support for Thumb2 kernels as well
> > - make the change unconditional - it will bit rot otherwise, and has value
> >   for other platforms as well.
> > 
> > The first four patches are general cleanup and preparatory changes.
> > Patch #5 implements the switch to a MOVW instruction without changing
> > the minimum alignment.
> > Patch #6 reduces the minimum alignment to 2 MiB.
> > 
> > Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.
> 
> At this point I think this really ought to be split into a file of its 
> own... and maybe even rewritten in C. Even though I wrote the original 
> code, I no longer understand it without re-investing time into it. But 
> in either case the whole of head.S would need to have its registers
> shuffled first to move long lived values away from r0-r3,ip,lr to allow 
> for standard function calls.

However, that code has to run _before_ the virtual mappings are set up,
which makes C code out of the question, unless we build a separate
executable binary and then insert it into the kernel image.  So, sorry,
it's not going to be practical to rewrite it in C.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-20  7:50   ` Ard Biesheuvel
@ 2020-09-20  8:57     ` Russell King - ARM Linux admin
  2020-09-20 10:06       ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2020-09-20  8:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Santosh Shilimkar,
	Zhen Lei, Linux ARM

On Sun, Sep 20, 2020 at 09:50:30AM +0200, Ard Biesheuvel wrote:
> On Sun, 20 Sep 2020 at 01:49, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > On Fri, 18 Sep 2020, Ard Biesheuvel wrote:
> >
> > > This series is inspired by Zhen Lei's series [0], which updates the
> > > ARM p2v patching code to optionally support p2v relative alignments
> > > of as little as 64 KiB.
> > >
> > > Reducing this alignment is necessary for some specific Huawei boards,
> > > but it also makes the boot sequence more robust for all platforms,
> > > especially for EFI boot, which no longer relies on the 128 MB masking
> > > of the decompressor load address, but instead uses firmware memory
> > > allocation routines to find a suitable spot for the decompressed
> > > kernel.
> > >
> > > This series is not based on Zhen Lei's code, but addresses the same
> > > problem, and takes some feedback given in the review into account:
> > > - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
> > >   with the carry on LPAE
> > > - add support for Thumb2 kernels as well
> > > - make the change unconditional - it will bit rot otherwise, and has value
> > >   for other platforms as well.
> > >
> > > The first four patches are general cleanup and preparatory changes.
> > > Patch #5 implements the switch to a MOVW instruction without changing
> > > the minimum alignment.
> > > Patch #6 reduces the minimum alignment to 2 MiB.
> > >
> > > Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.
> >
> > At this point I think this really ought to be split into a file of its
> > own... and maybe even rewritten in C. Even though I wrote the original
> > code, I no longer understand it without re-investing time into it. But
> > in either case the whole of head.S would need to have its registers
> > shuffled first to move long lived values away from r0-r3,ip,lr to allow
> > for standard function calls.
> >
> 
> I agree with that in principle; however, running C code with a stack
> while the MMU is off is slightly risky.

It's more than "slightly".  C code has literal addresses, which are raw
virtual addresses.  These are meaningless with the MMU off.

I guess one could correct the various pointers the code would read, but
you could not directly access any variable (as that involves
dereferencing a virtual address stored in the function's literal pool.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-20  8:57     ` Russell King - ARM Linux admin
@ 2020-09-20 10:06       ` Ard Biesheuvel
  2020-09-20 15:34         ` Nicolas Pitre
  0 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2020-09-20 10:06 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: linux-efi, Nicolas Pitre, Linus Walleij, Santosh Shilimkar,
	Zhen Lei, Linux ARM

On Sun, 20 Sep 2020 at 10:57, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Sun, Sep 20, 2020 at 09:50:30AM +0200, Ard Biesheuvel wrote:
> > On Sun, 20 Sep 2020 at 01:49, Nicolas Pitre <nico@fluxnic.net> wrote:
> > >
> > > On Fri, 18 Sep 2020, Ard Biesheuvel wrote:
> > >
> > > > This series is inspired by Zhen Lei's series [0], which updates the
> > > > ARM p2v patching code to optionally support p2v relative alignments
> > > > of as little as 64 KiB.
> > > >
> > > > Reducing this alignment is necessary for some specific Huawei boards,
> > > > but it also makes the boot sequence more robust for all platforms,
> > > > especially for EFI boot, which no longer relies on the 128 MB masking
> > > > of the decompressor load address, but instead uses firmware memory
> > > > allocation routines to find a suitable spot for the decompressed
> > > > kernel.
> > > >
> > > > This series is not based on Zhen Lei's code, but addresses the same
> > > > problem, and takes some feedback given in the review into account:
> > > > - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
> > > >   with the carry on LPAE
> > > > - add support for Thumb2 kernels as well
> > > > - make the change unconditional - it will bit rot otherwise, and has value
> > > >   for other platforms as well.
> > > >
> > > > The first four patches are general cleanup and preparatory changes.
> > > > Patch #5 implements the switch to a MOVW instruction without changing
> > > > the minimum alignment.
> > > > Patch #6 reduces the minimum alignment to 2 MiB.
> > > >
> > > > Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.
> > >
> > > At this point I think this really ought to be split into a file of its
> > > own... and maybe even rewritten in C. Even though I wrote the original
> > > code, I no longer understand it without re-investing time into it. But
> > > in either case the whole of head.S would need to have its registers
> > > shuffled first to move long lived values away from r0-r3,ip,lr to allow
> > > for standard function calls.
> > >
> >
> > I agree with that in principle; however, running C code with a stack
> > while the MMU is off is slightly risky.
>
> It's more than "slightly".  C code has literal addresses, which are raw
> virtual addresses.  These are meaningless with the MMU off.
>
> I guess one could correct the various pointers the code would read, but
> you could not directly access any variable (as that involves
> dereferencing a virtual address stored in the function's literal pool.)
>

We might be able to work around that by compiling with -fPIC, and/or
by ensuring that all inputs to the routine are passed via function
parameters. But I agree that using C for this code is probably not the
right choice.

If there is no disagreement about the 2 MiB alignment, or the choice
of opcodes for the patchable sequences, I can prepare a v2 that fixes
the issues I mentioned, and has some more explanatory comments in the
patching routine.


* Re: [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB
  2020-09-20 10:06       ` Ard Biesheuvel
@ 2020-09-20 15:34         ` Nicolas Pitre
  0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Pitre @ 2020-09-20 15:34 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Linus Walleij, Russell King - ARM Linux admin,
	Santosh Shilimkar, Zhen Lei, Linux ARM

On Sun, 20 Sep 2020, Ard Biesheuvel wrote:

> On Sun, 20 Sep 2020 at 10:57, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Sun, Sep 20, 2020 at 09:50:30AM +0200, Ard Biesheuvel wrote:
> > > On Sun, 20 Sep 2020 at 01:49, Nicolas Pitre <nico@fluxnic.net> wrote:
> > > >
> > > > On Fri, 18 Sep 2020, Ard Biesheuvel wrote:
> > > >
> > > > > This series is inspired by Zhen Lei's series [0], which updates the
> > > > > ARM p2v patching code to optionally support p2v relative alignments
> > > > > of as little as 64 KiB.
> > > > >
> > > > > Reducing this alignment is necessary for some specific Huawei boards,
> > > > > but it also makes the boot sequence more robust for all platforms,
> > > > > especially for EFI boot, which no longer relies on the 128 MB masking
> > > > > of the decompressor load address, but instead uses firmware memory
> > > > > allocation routines to find a suitable spot for the decompressed
> > > > > kernel.
> > > > >
> > > > > This series is not based on Zhen Lei's code, but addresses the same
> > > > > problem, and takes some feedback given in the review into account:
> > > > > - use of a MOVW instruction to avoid two adds/adcs sequences when dealing
> > > > >   with the carry on LPAE
> > > > > - add support for Thumb2 kernels as well
> > > > > - make the change unconditional - it will bit rot otherwise, and has value
> > > > >   for other platforms as well.
> > > > >
> > > > > The first four patches are general cleanup and preparatory changes.
> > > > > Patch #5 implements the switch to a MOVW instruction without changing
> > > > > the minimum alignment.
> > > > > Patch #6 reduces the minimum alignment to 2 MiB.
> > > > >
> > > > > Tested on QEMU in ARM/!LPAE, ARM/LPAE, Thumb2/!LPAE and Thumb2/LPAE modes.
> > > >
> > > > At this point I think this really ought to be split into a file of its
> > > > own... and maybe even rewritten in C. Even though I wrote the original
> > > > code, I no longer understand it without re-investing time into it. But
> > > > in either case the whole of head.S would need to have its registers
> > > > shuffled first to move long lived values away from r0-r3,ip,lr to allow
> > > > for standard function calls.
> > > >
> > >
> > > I agree with that in principle; however, running C code with a stack
> > > while the MMU is off is slightly risky.
> >
> > It's more than "slightly".  C code has literal addresses, which are raw
> > virtual addresses.  These are meaningless with the MMU off.
> >
> > I guess one could correct the various pointers the code would read, but
> > you could not directly access any variable (as that involves
> > dereferencing a virtual address stored in the function's literal pool.)
> >
> 
> We might be able to work around that by compiling with -fPIC, and/or
> by ensuring that all inputs to the routine are passed via function
> parameters. But I agree that using C for this code is probably not the
> right choice.

Yeah... It is doable, like we do in the decompressor case, but the level 
of caution needed here would probably negate the gain from writing this 
in C.

The argument for moving it out to a file of its own still stands though.

> If there is no disagreement about the 2 MiB alignment, or the choice
> of opcodes for the patchable sequences, I can prepare a v2 that fixes
> the issues I mentioned, and has some more explanatory comments in the
> patching routine.

Yes please. Given you do have it all in your head now, it would be very 
valuable to be way more expansive with comments. Adding a comment block 
with the opcode bit definitions before the code that transforms them 
would also be nice.


Nicolas


end of thread, other threads:[~2020-09-20 15:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-18 10:30 [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment to 2 MiB Ard Biesheuvel
2020-09-18 10:30 ` [RFC/RFT PATCH 1/6] ARM: p2v: factor out shared loop processing Ard Biesheuvel
2020-09-18 10:30 ` [RFC/RFT PATCH 2/6] ARM: p2v: factor out BE8 handling Ard Biesheuvel
2020-09-18 10:30 ` [RFC/RFT PATCH 3/6] ARM: p2v: drop redundant 'type' argument from __pv_stub Ard Biesheuvel
2020-09-18 10:31 ` [RFC/RFT PATCH 4/6] ARM: p2v: use relative references in patch site arrays Ard Biesheuvel
2020-09-18 10:31 ` [RFC/RFT PATCH 5/6] ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE Ard Biesheuvel
2020-09-18 10:31 ` [RFC/RFT PATCH 6/6] ARM: p2v: reduce p2v alignment requirement to 2 MiB Ard Biesheuvel
2020-09-18 17:25 ` [RFC/RFT PATCH 0/6] ARM: p2v: reduce min alignment " Ard Biesheuvel
2020-09-19 23:49 ` Nicolas Pitre
2020-09-20  7:50   ` Ard Biesheuvel
2020-09-20  8:57     ` Russell King - ARM Linux admin
2020-09-20 10:06       ` Ard Biesheuvel
2020-09-20 15:34         ` Nicolas Pitre
2020-09-20  8:55   ` Russell King - ARM Linux admin
