* [PATCH 0/3] arm64: Usercopy refresh
@ 2022-09-28 11:58 Robin Murphy
  2022-09-28 11:58 ` [PATCH 1/3] arm64: Update copy_from_user() Robin Murphy
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Robin Murphy @ 2022-09-28 11:58 UTC (permalink / raw)
  To: will, catalin.marinas; +Cc: linux-arm-kernel, mark.rutland, kristina.martsenko

Hi all,

With the troublesome copy_in_user() now long gone, it's finally time for
some less-worse usercopy routines! Beyond just improving on the mangled
old code, this is also spurred on by Mark's ongoing work on exception
handling, off the back of which we had some ideas around making uaccess
fixups neater. Those can be explored in due course, but getting the last
of the old cruft out of the way first should help make it easier. For
now I've held off on clever macro stuff and stuck with hand-rolled
labels and explicit annotations which do bump the line count up a fair
bit, but in the hope of keeping the code itself more readable for
review.

Thanks,
Robin.


Robin Murphy (3):
  arm64: Update copy_from_user()
  arm64: Update copy_to_user()
  arm64: Garbage-collect usercopy leftovers

 arch/arm64/include/asm/asm-uaccess.h |  30 --
 arch/arm64/lib/copy_from_user.S      | 274 +++++++++++++++----
 arch/arm64/lib/copy_template.S       | 181 -------------
 arch/arm64/lib/copy_to_user.S        | 391 +++++++++++++++++++++++----
 4 files changed, 564 insertions(+), 312 deletions(-)
 delete mode 100644 arch/arm64/lib/copy_template.S

-- 
2.36.1.dirty



* [PATCH 1/3] arm64: Update copy_from_user()
  2022-09-28 11:58 [PATCH 0/3] arm64: Usercopy refresh Robin Murphy
@ 2022-09-28 11:58 ` Robin Murphy
  2022-09-28 11:58 ` [PATCH 2/3] arm64: Update copy_to_user() Robin Murphy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Robin Murphy @ 2022-09-28 11:58 UTC (permalink / raw)
  To: will, catalin.marinas; +Cc: linux-arm-kernel, mark.rutland, kristina.martsenko

Replace the old mangled-beyond-hope copy_from_user() routine with a
shiny new one based on our newest memcpy() implementation. When plumbed
into the standalone memcpy benchmark from the Arm Optimized Routines
library, this routine outperforms the old one by up to ~1.7x for small
copies, with comparable gains across a range of microarchitectures,
levelling off once sizes get up to ~2KB and general load/store
throughput starts to dominate.

Some of this is paid for by pushing more complexity into the fixup
handlers, but much of that could be recovered again with cleverer
exception records that can give more information about the original
access to the handler itself. For now though, the label sleds are at
least entertaining.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/lib/copy_from_user.S | 274 ++++++++++++++++++++++++++------
 1 file changed, 223 insertions(+), 51 deletions(-)
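
For anyone checking the fixups against the spec: the contract, as with
the old code, is that x0 on return holds the number of bytes *not*
copied. A minimal userspace C model of that convention, simulating a
source that becomes unreadable partway through (a sketch with purely
illustrative names, not kernel API):

#include <stdio.h>
#include <string.h>

/*
 * Toy stand-in for __arch_copy_from_user(): pretend the source faults
 * after 'readable' bytes, as a vanishing user page would, and return
 * the bytes NOT copied - what the fixups compute as dstend - dst.
 */
static size_t model_copy_from_user(char *to, const char *from,
				   size_t n, size_t readable)
{
	size_t done = n < readable ? n : readable;

	memcpy(to, from, done);
	return n - done;
}

int main(void)
{
	char src[8] = "abcdefg", dst[8] = { 0 };

	/* 8-byte copy faulting after 5 readable bytes -> prints 3 */
	printf("%zu\n", model_copy_from_user(dst, src, sizeof(src), 5));
	return 0;
}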

diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 34e317907524..a4b9bd73a5a8 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -1,73 +1,245 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Copyright (C) 2012 ARM Ltd.
+ * Copyright (c) 2012-2022, Arm Limited.
  */
 
 #include <linux/linkage.h>
 
 #include <asm/asm-uaccess.h>
 #include <asm/assembler.h>
-#include <asm/cache.h>
 
-/*
- * Copy from user space to a kernel buffer (alignment handled by the hardware)
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, unaligned accesses.
  *
- * Parameters:
- *	x0 - to
- *	x1 - from
- *	x2 - n
- * Returns:
- *	x0 - bytes not copied
  */
 
-	.macro ldrb1 reg, ptr, val
-	user_ldst 9998f, ldtrb, \reg, \ptr, \val
-	.endm
+#define L(label) .L ## label
 
-	.macro strb1 reg, ptr, val
-	strb \reg, [\ptr], \val
-	.endm
+#define dstin	x0
+#define src	x1
+#define count	x2
+#define dst	x3
+#define srcend	x4
+#define dstend	x5
+#define A_l	x6
+#define A_lw	w6
+#define A_h	x7
+#define B_l	x8
+#define B_lw	w8
+#define B_h	x9
+#define C_l	x10
+#define C_lw	w10
+#define C_h	x11
+#define D_l	x12
+#define D_h	x13
+#define E_l	x14
+#define E_h	x15
+#define F_l	x16
+#define F_h	x17
+#define tmp1	x14
 
-	.macro ldrh1 reg, ptr, val
-	user_ldst 9997f, ldtrh, \reg, \ptr, \val
-	.endm
+/*
+ * Derived from memcpy with various adjustments:
+ *
+ * - memmove parts are removed since user and kernel pointers won't overlap.
+ * - The main loop is scaled down to 48 bytes per iteration since the increase
+ *   in load ops changes the balance; little cores barely notice the difference,
+ *   so big cores can benefit from keeping the loop relatively short.
+ * - Similarly, preferring source rather than destination alignment works out
+ *   better on average.
+ * - The 33-128 byte cases are reworked to better balance the stores with the
+ *   doubled-up load ops, and keep a more consistent access pattern.
+ * - The 0-3 byte sequence is replaced with the one borrowed from clear_user,
+ *   since LDTRB lacks a register-offset addressing mode.
+ */
 
-	.macro strh1 reg, ptr, val
-	strh \reg, [\ptr], \val
-	.endm
+#define U_pre(x...)	USER(L(fixup_pre), x)
+#define U_dst(x...)	USER(L(fixup_dst), x)
+#define U_S1(x...)	USER(L(fixup_s1), x)
+#define U_M16(x...)	USER(L(fixup_m16), x)
+#define U_M32(x...)	USER(L(fixup_m32), x)
+#define U_M64(x...)	USER(L(fixup_m64), x)
+#define U_L32(x...)	USER(L(fixup_l32), x)
+#define U_L48(x...)	USER(L(fixup_l48), x)
+#define U_L64(x...)	USER(L(fixup_l64), x)
 
-	.macro ldr1 reg, ptr, val
-	user_ldst 9997f, ldtr, \reg, \ptr, \val
-	.endm
-
-	.macro str1 reg, ptr, val
-	str \reg, [\ptr], \val
-	.endm
-
-	.macro ldp1 reg1, reg2, ptr, val
-	user_ldp 9997f, \reg1, \reg2, \ptr, \val
-	.endm
-
-	.macro stp1 reg1, reg2, ptr, val
-	stp \reg1, \reg2, [\ptr], \val
-	.endm
-
-end	.req	x5
-srcin	.req	x15
 SYM_FUNC_START(__arch_copy_from_user)
-	add	end, x0, x2
-	mov	srcin, x1
-#include "copy_template.S"
-	mov	x0, #0				// Nothing to copy
+	add	srcend, src, count
+	add	dstend, dstin, count
+	cmp	count, 128
+	b.hi	L(copy_long)
+	cmp	count, 32
+	b.hi	L(copy32_128)
+
+	/* Small copies: 0..32 bytes.  */
+	cmp	count, 16
+	b.lo	L(copy16)
+U_pre(	ldtr	A_l, [src])
+U_pre(	ldtr	A_h, [src, 8])
+U_pre(	ldtr	D_l, [srcend, -16])
+U_pre(	ldtr	D_h, [srcend, -8])
+	stp	A_l, A_h, [dstin]
+	stp	D_l, D_h, [dstend, -16]
+	mov	x0, #0
 	ret
 
-	// Exception fixups
-9997:	cmp	dst, dstin
-	b.ne	9998f
-	// Before being absolutely sure we couldn't copy anything, try harder
-USER(9998f, ldtrb tmp1w, [srcin])
-	strb	tmp1w, [dst], #1
-9998:	sub	x0, end, dst			// bytes not copied
+	/* Copy 8-15 bytes.  */
+L(copy16):
+	tbz	count, 3, L(copy8)
+U_pre(	ldtr	A_l, [src])
+U_pre(	ldtr	A_h, [srcend, -8])
+	str	A_l, [dstin]
+	str	A_h, [dstend, -8]
+	mov	x0, #0
 	ret
+
+	.p2align 3
+	/* Copy 4-7 bytes.  */
+L(copy8):
+	tbz	count, 2, L(copy4)
+U_pre(	ldtr	A_lw, [src])
+U_pre(	ldtr	B_lw, [srcend, -4])
+	str	A_lw, [dstin]
+	str	B_lw, [dstend, -4]
+	mov	x0, #0
+	ret
+
+	/* Copy 0..3 bytes.  */
+L(copy4):
+	tbz	count, #1, L(copy1)
+U_pre(	ldtrh	A_lw, [src])
+	strh	A_lw, [dstin]
+L(copy1):
+	tbz	count, #0, L(copy0)
+U_S1(	ldtrb	A_lw, [srcend, -1])
+	strb	A_lw, [dstend, -1]
+L(copy0):
+	mov	x0, #0
+	ret
+
+	.p2align 4
+	/* Medium copies: 33..128 bytes.  */
+L(copy32_128):
+U_pre(	ldtr	A_l, [src])
+U_pre(	ldtr	A_h, [src, 8])
+U_pre(	ldtr	B_l, [src, 16])
+U_pre(	ldtr	B_h, [src, 24])
+	stp	A_l, A_h, [dstin]
+	stp	B_l, B_h, [dstin, 16]
+U_M32(	ldtr	C_l, [srcend, -32])
+U_M32(	ldtr	C_h, [srcend, -24])
+U_M32(	ldtr	D_l, [srcend, -16])
+U_M32(	ldtr	D_h, [srcend, -8])
+	cmp	count, 64
+	b.ls	L(copy64)
+U_M32(	ldtr	E_l, [src, 32])
+U_M32(	ldtr	E_h, [src, 40])
+U_M32(	ldtr	F_l, [src, 48])
+U_M32(	ldtr	F_h, [src, 56])
+	stp	E_l, E_h, [dstin, 32]
+	stp	F_l, F_h, [dstin, 48]
+U_M64(	ldtr	A_l, [srcend, -64])
+U_M64(	ldtr	A_h, [srcend, -56])
+U_M64(	ldtr	B_l, [srcend, -48])
+U_M64(	ldtr	B_h, [srcend, -40])
+	stp	A_l, A_h, [dstend, -64]
+	stp	B_l, B_h, [dstend, -48]
+L(copy64):
+	stp	C_l, C_h, [dstend, -32]
+	stp	D_l, D_h, [dstend, -16]
+	mov	x0, #0
+	ret
+
+	.p2align 4
+	/* Copy more than 128 bytes.  */
+L(copy_long):
+	/* Copy 16 bytes and then align src to 16-byte alignment.  */
+
+U_pre(	ldtr	D_l, [src])
+U_pre(	ldtr	D_h, [src, 8])
+	and	tmp1, src, 15
+	bic	src, src, 15
+	sub	dst, dstin, tmp1
+	add	count, count, tmp1	/* Count is now 16 too large.  */
+U_pre(	ldtr	A_l, [src, 16])
+U_pre(	ldtr	A_h, [src, 24])
+	stp	D_l, D_h, [dstin]
+U_M16(	ldtr	B_l, [src, 32])
+U_M16(	ldtr	B_h, [src, 40])
+U_M16(	ldtr	C_l, [src, 48])
+U_M16(	ldtr	C_h, [src, 56])
+	add	src, src, #48
+	subs	count, count, 96 + 16	/* Test and readjust count.  */
+	b.ls	L(copy48_from_end)
+
+L(loop48):
+	stp	A_l, A_h, [dst, 16]
+U_L32(	ldtr	A_l, [src, 16])
+U_L32(	ldtr	A_h, [src, 24])
+	stp	B_l, B_h, [dst, 32]
+U_L48(	ldtr	B_l, [src, 32])
+U_L48(	ldtr	B_h, [src, 40])
+	stp	C_l, C_h, [dst, 48]!
+U_dst(	ldtr	C_l, [src, 48])
+U_dst(	ldtr	C_h, [src, 56])
+	add	src, src, #48
+	subs	count, count, 48
+	b.hi	L(loop48)
+
+	/* Write the last iteration and copy 48 bytes from the end.  */
+L(copy48_from_end):
+	stp	A_l, A_h, [dst, 16]
+U_L32(	ldtr	A_l, [srcend, -48])
+U_L32(	ldtr	A_h, [srcend, -40])
+	stp	B_l, B_h, [dst, 32]
+U_L48(	ldtr	B_l, [srcend, -32])
+U_L48(	ldtr	B_h, [srcend, -24])
+	stp	C_l, C_h, [dst, 48]
+U_L64(	ldtr	C_l, [srcend, -16])
+U_L64(	ldtr	C_h, [srcend, -8])
+	stp	A_l, A_h, [dstend, -48]
+	stp	B_l, B_h, [dstend, -32]
+	stp	C_l, C_h, [dstend, -16]
+	mov	x0, #0
+	ret
+
+	/* Fixups... */
+
+	/*
+	 * Fault before anything has been written, but progress may have
+	 * been possible; realign dst and retry a single byte to confirm.
+	 */
+L(fixup_pre):
+	mov	dst, dstin
+U_dst(	ldtrb	A_lw, [src])
+	strb	A_lw, [dst], #1
+L(fixup_dst):
+	sub	x0, dstend, dst
+	ret
+
+	/* Small: Fault with 1 byte remaining, regardless of count */
+L(fixup_s1):
+	mov	x0, #1
+	ret
+
+	/* Medium: Faults after n bytes beyond dstin have been written */
+L(fixup_m64):
+	add	dstin, dstin, #32
+L(fixup_m32):
+	add	dstin, dstin, #16
+L(fixup_m16):
+	add	dst, dstin, #16
+	b	L(fixup_dst)
+
+	/* Large: Faults after n bytes beyond dst have been written */
+L(fixup_l64):
+	add	dst, dst, #16
+L(fixup_l48):
+	add	dst, dst, #16
+L(fixup_l32):
+	add	dst, dst, #32
+	b	L(fixup_dst)
+
 SYM_FUNC_END(__arch_copy_from_user)
 EXPORT_SYMBOL(__arch_copy_from_user)
-- 
2.36.1.dirty



* [PATCH 2/3] arm64: Update copy_to_user()
  2022-09-28 11:58 [PATCH 0/3] arm64: Usercopy refresh Robin Murphy
  2022-09-28 11:58 ` [PATCH 1/3] arm64: Update copy_from_user() Robin Murphy
@ 2022-09-28 11:58 ` Robin Murphy
  2022-12-05 16:09   ` Will Deacon
  2022-09-28 11:58 ` [PATCH 3/3] arm64: Garbage-collect usercopy leftovers Robin Murphy
  2022-11-07 19:08 ` [PATCH 0/3] arm64: Usercopy refresh Will Deacon
  3 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2022-09-28 11:58 UTC (permalink / raw)
  To: will, catalin.marinas; +Cc: linux-arm-kernel, mark.rutland, kristina.martsenko

As with its counterpart, replace copy_to_user() with a new and improved
version similarly derived from memcpy(). Different tradeoffs from the
base implementation are made relative to copy_from_user() to get the
most consistent results across different microarchitectures, but the
overall shape of the performance gain ends up about the same.

The exception fixups are even more comical this time around, but that's
down to now needing to reconstruct the faulting address, and cope with
overlapping stores at various points. Again, improvements to the
exception mechanism could significantly simplify things in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/lib/copy_to_user.S | 391 +++++++++++++++++++++++++++++-----
 1 file changed, 341 insertions(+), 50 deletions(-)
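
A note on the scarier-looking fixups below: wherever stores may
overlap, two separate watermarks are both known to have been written,
so the handler takes the higher of the two before computing the
remainder. Roughly, in C (a sketch of the L(fixup_tail) logic only,
with an illustrative name for the second watermark):

/*
 * 'dst' and 'dstin_done' are both lower bounds on progress; measure
 * the bytes not copied from the larger one, as the assembly does with
 * "cmp dst, dstin; csel dst, dst, dstin, hi" before the final
 * "sub x0, dstend, dst".
 */
static unsigned long fixup_remainder(unsigned long dst,
				     unsigned long dstin_done,
				     unsigned long dstend)
{
	unsigned long done = dst > dstin_done ? dst : dstin_done;

	return dstend - done;
}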

diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 802231772608..b641f00f50d6 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -1,73 +1,364 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Copyright (C) 2012 ARM Ltd.
+ * Copyright (c) 2012-2022, Arm Limited.
  */
 
 #include <linux/linkage.h>
 
 #include <asm/asm-uaccess.h>
 #include <asm/assembler.h>
-#include <asm/cache.h>
+
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, unaligned accesses.
+ *
+ */
+
+#define L(label) .L ## label
+
+#define dstin	x0
+#define src	x1
+#define count	x2
+#define dst	x3
+#define srcend	x4
+#define dstend	x5
+#define A_l	x6
+#define A_lw	w6
+#define A_h	x7
+#define B_l	x8
+#define B_lw	w8
+#define B_h	x9
+#define C_l	x10
+#define C_lw	w10
+#define C_h	x11
+#define D_l	x12
+#define D_h	x13
+#define E_l	x14
+#define E_h	x15
+#define F_l	x16
+#define F_h	x17
+#define tmp1	x14
 
 /*
- * Copy to user space from a kernel buffer (alignment handled by the hardware)
+ * Derived from memcpy with various adjustments:
  *
- * Parameters:
- *	x0 - to
- *	x1 - from
- *	x2 - n
- * Returns:
- *	x0 - bytes not copied
+ * - memmove parts are removed since user and kernel pointers won't overlap.
+ * - Contrary to copy_from_user, although big cores still aren't particularly
+ *   keen on the increased instruction count in the main loop, processing fewer
+ *   than 64 bytes per iteration here hurts little cores more.
+ * - The medium-size cases are reworked to better balance the loads with the
+ *   doubled-up store ops, avoid potential out-of-sequence faults, and preserve
+ *   the input arguments for the sake of fault handling.
+ * - The 0-3 byte sequence is replaced with the one borrowed from clear_user,
+ *   since STTRB lacks a register-offset addressing mode.
  */
-	.macro ldrb1 reg, ptr, val
-	ldrb  \reg, [\ptr], \val
-	.endm
 
-	.macro strb1 reg, ptr, val
-	user_ldst 9998f, sttrb, \reg, \ptr, \val
-	.endm
+#define U_pre(x...)	USER(L(fixup_pre), x)
+#define U_dst(x...)	USER(L(fixup_dst), x)
 
-	.macro ldrh1 reg, ptr, val
-	ldrh  \reg, [\ptr], \val
-	.endm
+#define U_S1(x...)	USER(L(fixup_s1), x)
+#define U_S4(x...)	USER(L(fixup_s4), x)
+#define U_S8(x...)	USER(L(fixup_s8), x)
+#define U_ST8(x...)	USER(L(fixup_st8), x)
+#define U_S16(x...)	USER(L(fixup_s16), x)
+#define U_M24(x...)	USER(L(fixup_m24), x)
+#define U_M32(x...)	USER(L(fixup_m32), x)
+#define U_M40(x...)	USER(L(fixup_m40), x)
+#define U_M48(x...)	USER(L(fixup_m48), x)
+#define U_M56(x...)	USER(L(fixup_m56), x)
+#define U_M64(x...)	USER(L(fixup_m64), x)
+#define U_MT8(x...)	USER(L(fixup_mt8), x)
+#define U_MT16(x...)	USER(L(fixup_mt16), x)
+#define U_MT24(x...)	USER(L(fixup_mt24), x)
+#define U_MT32(x...)	USER(L(fixup_mt32), x)
+#define U_MT40(x...)	USER(L(fixup_mt40), x)
+#define U_MT48(x...)	USER(L(fixup_mt48), x)
+#define U_MT56(x...)	USER(L(fixup_mt56), x)
 
-	.macro strh1 reg, ptr, val
-	user_ldst 9997f, sttrh, \reg, \ptr, \val
-	.endm
+#define U_L16(x...)	USER(L(fixup_l16), x)
+#define U_L24(x...)	USER(L(fixup_l24), x)
+#define U_L32(x...)	USER(L(fixup_l32), x)
+#define U_L40(x...)	USER(L(fixup_l40), x)
+#define U_L48(x...)	USER(L(fixup_l48), x)
+#define U_L56(x...)	USER(L(fixup_l56), x)
+#define U_L64(x...)	USER(L(fixup_l64), x)
+#define U_L72(x...)	USER(L(fixup_l72), x)
+#define U_LT8(x...)	USER(L(fixup_lt8), x)
+#define U_LT16(x...)	USER(L(fixup_lt16), x)
+#define U_LT24(x...)	USER(L(fixup_lt24), x)
+#define U_LT32(x...)	USER(L(fixup_lt32), x)
+#define U_LT40(x...)	USER(L(fixup_lt40), x)
+#define U_LT48(x...)	USER(L(fixup_lt48), x)
+#define U_LT56(x...)	USER(L(fixup_lt56), x)
+#define U_LT64(x...)	USER(L(fixup_lt64), x)
 
-	.macro ldr1 reg, ptr, val
-	ldr \reg, [\ptr], \val
-	.endm
-
-	.macro str1 reg, ptr, val
-	user_ldst 9997f, sttr, \reg, \ptr, \val
-	.endm
-
-	.macro ldp1 reg1, reg2, ptr, val
-	ldp \reg1, \reg2, [\ptr], \val
-	.endm
-
-	.macro stp1 reg1, reg2, ptr, val
-	user_stp 9997f, \reg1, \reg2, \ptr, \val
-	.endm
-
-end	.req	x5
-srcin	.req	x15
 SYM_FUNC_START(__arch_copy_to_user)
-	add	end, x0, x2
-	mov	srcin, x1
-#include "copy_template.S"
+	add	srcend, src, count
+	add	dstend, dstin, count
+	cmp	count, 128
+	b.hi	L(copy_long)
+	cmp	count, 32
+	b.hi	L(copy32_128)
+
+	/* Small copies: 0..32 bytes.  */
+	cmp	count, 16
+	b.lo	L(copy16)
+	ldp	A_l, A_h, [src]
+	ldp	D_l, D_h, [srcend, -16]
+U_pre(	sttr	A_l, [dstin])
+U_S8(	sttr	A_h, [dstin, 8])
+U_S16(	sttr	D_l, [dstend, -16])
+U_ST8(	sttr	D_h, [dstend, -8])
 	mov	x0, #0
 	ret
 
-	// Exception fixups
-9997:	cmp	dst, dstin
-	b.ne	9998f
-	// Before being absolutely sure we couldn't copy anything, try harder
-	ldrb	tmp1w, [srcin]
-USER(9998f, sttrb tmp1w, [dst])
-	add	dst, dst, #1
-9998:	sub	x0, end, dst			// bytes not copied
+	/* Copy 8-15 bytes.  */
+L(copy16):
+	tbz	count, 3, L(copy8)
+	ldr	A_l, [src]
+	ldr	A_h, [srcend, -8]
+U_pre(	sttr	A_l, [dstin])
+U_S8(	sttr	A_h, [dstend, -8])
+	mov	x0, #0
 	ret
+
+	.p2align 3
+	/* Copy 4-7 bytes.  */
+L(copy8):
+	tbz	count, 2, L(copy4)
+	ldr	A_lw, [src]
+	ldr	B_lw, [srcend, -4]
+U_pre(	sttr	A_lw, [dstin])
+U_S4(	sttr	B_lw, [dstend, -4])
+	mov	x0, #0
+	ret
+
+	/* Copy 0..3 bytes.  */
+L(copy4):
+	tbz	count, #1, L(copy1)
+	ldrh	A_lw, [src]
+U_pre(	sttrh	A_lw, [dstin])
+L(copy1):
+	tbz	count, #0, L(copy0)
+	ldrb	A_lw, [srcend, -1]
+U_S1(	sttrb	A_lw, [dstend, -1])
+L(copy0):
+	mov	x0, #0
+	ret
+
+	.p2align 4
+	/* Medium copies: 33..128 bytes.  */
+L(copy32_128):
+	ldp	A_l, A_h, [src]
+	ldp	B_l, B_h, [src, 16]
+U_pre(	sttr	A_l, [dstin])
+U_S8(	sttr	A_h, [dstin, 8])
+U_S16(	sttr	B_l, [dstin, 16])
+U_M24(	sttr	B_h, [dstin, 24])
+	ldp	C_l, C_h, [srcend, -32]
+	ldp	D_l, D_h, [srcend, -16]
+	cmp	count, 64
+	b.ls	L(copy64)
+	ldp	E_l, E_h, [src, 32]
+	ldp	F_l, F_h, [src, 48]
+U_M32(	sttr	E_l, [dstin, 32])
+U_M40(	sttr	E_h, [dstin, 40])
+U_M48(	sttr	F_l, [dstin, 48])
+U_M56(	sttr	F_h, [dstin, 56])
+	ldp	A_l, A_h, [srcend, -64]
+	ldp	B_l, B_h, [srcend, -48]
+U_M64(	sttr	A_l, [dstend, -64])
+U_MT56(	sttr	A_h, [dstend, -56])
+U_MT48(	sttr	B_l, [dstend, -48])
+U_MT40(	sttr	B_h, [dstend, -40])
+L(copy64):
+U_MT32(	sttr	C_l, [dstend, -32])
+U_MT24(	sttr	C_h, [dstend, -24])
+U_MT16(	sttr	D_l, [dstend, -16])
+U_MT8(	sttr	D_h, [dstend, -8])
+	mov	x0, #0
+	ret
+
+	.p2align 4
+	nop
+	nop
+	nop
+	/* Copy more than 128 bytes.  */
+L(copy_long):
+	/* Copy 16 bytes and then align dst to 16-byte alignment.  */
+
+	ldp	D_l, D_h, [src]
+	and	tmp1, dstin, 15
+	bic	dst, dstin, 15
+	sub	src, src, tmp1
+	add	count, count, tmp1	/* Count is now 16 too large.  */
+	ldp	A_l, A_h, [src, 16]
+U_pre(	sttr	D_l, [dstin])
+U_S8(	sttr	D_h, [dstin, 8])
+	ldp	B_l, B_h, [src, 32]
+	ldp	C_l, C_h, [src, 48]
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 128 + 16	/* Test and readjust count.  */
+	b.ls	L(copy64_from_end)
+
+L(loop64):
+U_L16(	sttr	A_l, [dst, 16])
+U_L24(	sttr	A_h, [dst, 24])
+	ldp	A_l, A_h, [src, 16]
+U_L32(	sttr	B_l, [dst, 32])
+U_L40(	sttr	B_h, [dst, 40])
+	ldp	B_l, B_h, [src, 32]
+U_L48(	sttr	C_l, [dst, 48])
+U_L56(	sttr	C_h, [dst, 56])
+	ldp	C_l, C_h, [src, 48]
+U_L64(	sttr	D_l, [dst, 64])
+U_L72(	sttr	D_h, [dst, 72])
+	add	dst, dst, #64
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 64
+	b.hi	L(loop64)
+
+	/* Write the last iteration and copy 64 bytes from the end.  */
+L(copy64_from_end):
+	ldp	E_l, E_h, [srcend, -64]
+U_L16(	sttr	A_l, [dst, 16])
+U_L24(	sttr	A_h, [dst, 24])
+	ldp	A_l, A_h, [srcend, -48]
+U_L32(	sttr	B_l, [dst, 32])
+U_L40(	sttr	B_h, [dst, 40])
+	ldp	B_l, B_h, [srcend, -32]
+U_L48(	sttr	C_l, [dst, 48])
+U_L56(	sttr	C_h, [dst, 56])
+	ldp	C_l, C_h, [srcend, -16]
+U_L64(	sttr	D_l, [dst, 64])
+U_L72(	sttr	D_h, [dst, 72])
+U_LT64(	sttr	E_l, [dstend, -64])
+U_LT56(	sttr	E_h, [dstend, -56])
+U_LT48(	sttr	A_l, [dstend, -48])
+U_LT40(	sttr	A_h, [dstend, -40])
+U_LT32(	sttr	B_l, [dstend, -32])
+U_LT24(	sttr	B_h, [dstend, -24])
+U_LT16(	sttr	C_l, [dstend, -16])
+U_LT8(	sttr	C_h, [dstend, -8])
+	mov	x0, #0
+	ret
+
+	/* Fixups... */
+
+	/*
+	 * Fault on the first write, but progress may have been possible;
+	 * realign dst and retry a single byte to confirm.
+	 */
+L(fixup_pre):
+	mov	dst, dstin
+U_dst(	ldtrb	A_lw, [src])
+	strb	A_lw, [dst], #1
+L(fixup_dst):
+	sub	x0, dstend, dst
+	ret
+
+	/* Small: Fault with 1 byte remaining, regardless of count */
+L(fixup_s1):
+	mov	x0, #1
+	ret
+
+	/* Small tail case: Fault 8 bytes before dstend, >=16 bytes written */
+L(fixup_st8):
+	sub	dst, dstend, #8
+	add	dstin, dstin, #16
+L(fixup_tail):
+	cmp	dst, dstin
+	csel	dst, dst, dstin, hi
+	b	L(fixup_dst)
+
+	/* Small/medium: Faults n bytes past dstin, that much written */
+L(fixup_m64):
+	add	dstin, dstin, #8
+L(fixup_m56):
+	add	dstin, dstin, #8
+L(fixup_m48):
+	add	dstin, dstin, #8
+L(fixup_m40):
+	add	dstin, dstin, #8
+L(fixup_m32):
+	add	dstin, dstin, #8
+L(fixup_m24):
+	add	dstin, dstin, #8
+L(fixup_s16):
+	add	dstin, dstin, #8
+L(fixup_s8):
+	add	dstin, dstin, #4
+L(fixup_s4):
+	add	dst, dstin, #4
+	b	L(fixup_dst)
+
+	/*
+	 * Medium tail cases: Faults n bytes before dstend, 64 or 32 bytes
+	 * past dstin written, depending on original count
+	 */
+L(fixup_mt56):
+	sub	count, count, #8
+L(fixup_mt48):
+	sub	count, count, #8
+L(fixup_mt40):
+	sub	count, count, #8
+L(fixup_mt32):
+	sub	count, count, #8
+L(fixup_mt24):
+	sub	count, count, #8
+L(fixup_mt16):
+	sub	count, count, #8
+L(fixup_mt8):
+	add	count, count, #8
+	add	dst, dstin, count
+
+	sub	tmp1, dstend, dstin
+	cmp	tmp1, #64
+	add	tmp1, dstin, #64
+	add	dstin, dstin, #32
+	csel	dstin, dstin, tmp1, ls
+	b	L(fixup_tail)
+
+	/* Large: Faults n bytes past dst, at least 16 bytes past dstin written */
+L(fixup_l72):
+	add	dst, dst, #8
+L(fixup_l64):
+	add	dst, dst, #8
+L(fixup_l56):
+	add	dst, dst, #8
+L(fixup_l48):
+	add	dst, dst, #8
+L(fixup_l40):
+	add	dst, dst, #8
+L(fixup_l32):
+	add	dst, dst, #8
+L(fixup_l24):
+	add	dst, dst, #8
+L(fixup_l16):
+	add	dst, dst, #16
+	add	dstin, dstin, #16
+	b	L(fixup_tail)
+
+	/* Large tail: Faults n bytes before dstend, 80 bytes past dst written */
+L(fixup_lt64):
+	sub	count, count, #8
+L(fixup_lt56):
+	sub	count, count, #8
+L(fixup_lt48):
+	sub	count, count, #8
+L(fixup_lt40):
+	sub	count, count, #8
+L(fixup_lt32):
+	sub	count, count, #8
+L(fixup_lt24):
+	sub	count, count, #8
+L(fixup_lt16):
+	sub	count, count, #8
+L(fixup_lt8):
+	add	count, count, #56	/* Count was also off by 64 */
+	add	dstin, dst, #80
+	add	dst, dst, count
+	b	L(fixup_tail)
+
 SYM_FUNC_END(__arch_copy_to_user)
 EXPORT_SYMBOL(__arch_copy_to_user)
-- 
2.36.1.dirty



* [PATCH 3/3] arm64: Garbage-collect usercopy leftovers
  2022-09-28 11:58 [PATCH 0/3] arm64: Usercopy refresh Robin Murphy
  2022-09-28 11:58 ` [PATCH 1/3] arm64: Update copy_from_user() Robin Murphy
  2022-09-28 11:58 ` [PATCH 2/3] arm64: Update copy_to_user() Robin Murphy
@ 2022-09-28 11:58 ` Robin Murphy
  2022-11-07 19:08 ` [PATCH 0/3] arm64: Usercopy refresh Will Deacon
  3 siblings, 0 replies; 8+ messages in thread
From: Robin Murphy @ 2022-09-28 11:58 UTC (permalink / raw)
  To: will, catalin.marinas; +Cc: linux-arm-kernel, mark.rutland, kristina.martsenko

With both usercopy routines replaced, remove the now-unused template
and supporting macros.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/include/asm/asm-uaccess.h |  30 -----
 arch/arm64/lib/copy_template.S       | 181 ---------------------------
 2 files changed, 211 deletions(-)
 delete mode 100644 arch/arm64/lib/copy_template.S
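
For context, the one annotation that survives is the plain USER() macro
visible in the hunk below, which tags a single instruction with an
exception-table entry. As an illustration, with the U_pre wrapper from
the new copy routines:

U_pre(	ldtr	A_l, [src])

expands to roughly:

9999:	ldtr	A_l, [src]
	_asm_extable_uaccess	9999b, .Lfixup_pre

so a fault on the LDTR branches straight to the fixup label, and the
multi-instruction user_ldp/user_stp/user_ldst helpers are no longer
needed.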

diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
index 75b211c98dea..c2dea9b6b9e9 100644
--- a/arch/arm64/include/asm/asm-uaccess.h
+++ b/arch/arm64/include/asm/asm-uaccess.h
@@ -62,34 +62,4 @@ alternative_else_nop_endif
 #define USER(l, x...)				\
 9999:	x;					\
 	_asm_extable_uaccess	9999b, l
-
-/*
- * Generate the assembly for LDTR/STTR with exception table entries.
- * This is complicated as there is no post-increment or pair versions of the
- * unprivileged instructions, and USER() only works for single instructions.
- */
-	.macro user_ldp l, reg1, reg2, addr, post_inc
-8888:		ldtr	\reg1, [\addr];
-8889:		ldtr	\reg2, [\addr, #8];
-		add	\addr, \addr, \post_inc;
-
-		_asm_extable_uaccess	8888b, \l;
-		_asm_extable_uaccess	8889b, \l;
-	.endm
-
-	.macro user_stp l, reg1, reg2, addr, post_inc
-8888:		sttr	\reg1, [\addr];
-8889:		sttr	\reg2, [\addr, #8];
-		add	\addr, \addr, \post_inc;
-
-		_asm_extable_uaccess	8888b,\l;
-		_asm_extable_uaccess	8889b,\l;
-	.endm
-
-	.macro user_ldst l, inst, reg, addr, post_inc
-8888:		\inst		\reg, [\addr];
-		add		\addr, \addr, \post_inc;
-
-		_asm_extable_uaccess	8888b, \l;
-	.endm
 #endif
diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S
deleted file mode 100644
index 488df234c49a..000000000000
--- a/arch/arm64/lib/copy_template.S
+++ /dev/null
@@ -1,181 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 ARM Ltd.
- * Copyright (C) 2013 Linaro.
- *
- * This code is based on glibc cortex strings work originally authored by Linaro
- * be found @
- *
- * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/
- * files/head:/src/aarch64/
- */
-
-
-/*
- * Copy a buffer from src to dest (alignment handled by the hardware)
- *
- * Parameters:
- *	x0 - dest
- *	x1 - src
- *	x2 - n
- * Returns:
- *	x0 - dest
- */
-dstin	.req	x0
-src	.req	x1
-count	.req	x2
-tmp1	.req	x3
-tmp1w	.req	w3
-tmp2	.req	x4
-tmp2w	.req	w4
-dst	.req	x6
-
-A_l	.req	x7
-A_h	.req	x8
-B_l	.req	x9
-B_h	.req	x10
-C_l	.req	x11
-C_h	.req	x12
-D_l	.req	x13
-D_h	.req	x14
-
-	mov	dst, dstin
-	cmp	count, #16
-	/*When memory length is less than 16, the accessed are not aligned.*/
-	b.lo	.Ltiny15
-
-	neg	tmp2, src
-	ands	tmp2, tmp2, #15/* Bytes to reach alignment. */
-	b.eq	.LSrcAligned
-	sub	count, count, tmp2
-	/*
-	* Copy the leading memory data from src to dst in an increasing
-	* address order.By this way,the risk of overwriting the source
-	* memory data is eliminated when the distance between src and
-	* dst is less than 16. The memory accesses here are alignment.
-	*/
-	tbz	tmp2, #0, 1f
-	ldrb1	tmp1w, src, #1
-	strb1	tmp1w, dst, #1
-1:
-	tbz	tmp2, #1, 2f
-	ldrh1	tmp1w, src, #2
-	strh1	tmp1w, dst, #2
-2:
-	tbz	tmp2, #2, 3f
-	ldr1	tmp1w, src, #4
-	str1	tmp1w, dst, #4
-3:
-	tbz	tmp2, #3, .LSrcAligned
-	ldr1	tmp1, src, #8
-	str1	tmp1, dst, #8
-
-.LSrcAligned:
-	cmp	count, #64
-	b.ge	.Lcpy_over64
-	/*
-	* Deal with small copies quickly by dropping straight into the
-	* exit block.
-	*/
-.Ltail63:
-	/*
-	* Copy up to 48 bytes of data. At this point we only need the
-	* bottom 6 bits of count to be accurate.
-	*/
-	ands	tmp1, count, #0x30
-	b.eq	.Ltiny15
-	cmp	tmp1w, #0x20
-	b.eq	1f
-	b.lt	2f
-	ldp1	A_l, A_h, src, #16
-	stp1	A_l, A_h, dst, #16
-1:
-	ldp1	A_l, A_h, src, #16
-	stp1	A_l, A_h, dst, #16
-2:
-	ldp1	A_l, A_h, src, #16
-	stp1	A_l, A_h, dst, #16
-.Ltiny15:
-	/*
-	* Prefer to break one ldp/stp into several load/store to access
-	* memory in an increasing address order,rather than to load/store 16
-	* bytes from (src-16) to (dst-16) and to backward the src to aligned
-	* address,which way is used in original cortex memcpy. If keeping
-	* the original memcpy process here, memmove need to satisfy the
-	* precondition that src address is at least 16 bytes bigger than dst
-	* address,otherwise some source data will be overwritten when memove
-	* call memcpy directly. To make memmove simpler and decouple the
-	* memcpy's dependency on memmove, withdrew the original process.
-	*/
-	tbz	count, #3, 1f
-	ldr1	tmp1, src, #8
-	str1	tmp1, dst, #8
-1:
-	tbz	count, #2, 2f
-	ldr1	tmp1w, src, #4
-	str1	tmp1w, dst, #4
-2:
-	tbz	count, #1, 3f
-	ldrh1	tmp1w, src, #2
-	strh1	tmp1w, dst, #2
-3:
-	tbz	count, #0, .Lexitfunc
-	ldrb1	tmp1w, src, #1
-	strb1	tmp1w, dst, #1
-
-	b	.Lexitfunc
-
-.Lcpy_over64:
-	subs	count, count, #128
-	b.ge	.Lcpy_body_large
-	/*
-	* Less than 128 bytes to copy, so handle 64 here and then jump
-	* to the tail.
-	*/
-	ldp1	A_l, A_h, src, #16
-	stp1	A_l, A_h, dst, #16
-	ldp1	B_l, B_h, src, #16
-	ldp1	C_l, C_h, src, #16
-	stp1	B_l, B_h, dst, #16
-	stp1	C_l, C_h, dst, #16
-	ldp1	D_l, D_h, src, #16
-	stp1	D_l, D_h, dst, #16
-
-	tst	count, #0x3f
-	b.ne	.Ltail63
-	b	.Lexitfunc
-
-	/*
-	* Critical loop.  Start at a new cache line boundary.  Assuming
-	* 64 bytes per line this ensures the entire loop is in one line.
-	*/
-	.p2align	L1_CACHE_SHIFT
-.Lcpy_body_large:
-	/* pre-get 64 bytes data. */
-	ldp1	A_l, A_h, src, #16
-	ldp1	B_l, B_h, src, #16
-	ldp1	C_l, C_h, src, #16
-	ldp1	D_l, D_h, src, #16
-1:
-	/*
-	* interlace the load of next 64 bytes data block with store of the last
-	* loaded 64 bytes data.
-	*/
-	stp1	A_l, A_h, dst, #16
-	ldp1	A_l, A_h, src, #16
-	stp1	B_l, B_h, dst, #16
-	ldp1	B_l, B_h, src, #16
-	stp1	C_l, C_h, dst, #16
-	ldp1	C_l, C_h, src, #16
-	stp1	D_l, D_h, dst, #16
-	ldp1	D_l, D_h, src, #16
-	subs	count, count, #64
-	b.ge	1b
-	stp1	A_l, A_h, dst, #16
-	stp1	B_l, B_h, dst, #16
-	stp1	C_l, C_h, dst, #16
-	stp1	D_l, D_h, dst, #16
-
-	tst	count, #0x3f
-	b.ne	.Ltail63
-.Lexitfunc:
-- 
2.36.1.dirty



* Re: [PATCH 0/3] arm64: Usercopy refresh
  2022-09-28 11:58 [PATCH 0/3] arm64: Usercopy refresh Robin Murphy
                   ` (2 preceding siblings ...)
  2022-09-28 11:58 ` [PATCH 3/3] arm64: Garbage-collect usercopy leftovers Robin Murphy
@ 2022-11-07 19:08 ` Will Deacon
  3 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2022-11-07 19:08 UTC (permalink / raw)
  To: Robin Murphy, catalin.marinas
  Cc: kernel-team, Will Deacon, linux-arm-kernel, mark.rutland,
	kristina.martsenko

On Wed, 28 Sep 2022 12:58:50 +0100, Robin Murphy wrote:
> With the troublesome copy_in_user() now long gone, it's finally time for
> some less-worse usercopy routines! Beyond just improving on the mangled
> old code, this is also spurred on by Mark's ongoing work on exception
> handling, off the back of which we had some ideas around making uaccess
> fixups neater. Those can be explored in due course, but getting the last
> of the old cruft out of the way first should help make it easier. For
> now I've held off on clever macro stuff and stuck with hand-rolled
> labels and explicit annotations which do bump the line count up a fair
> bit, but in the hope of keeping the code itself more readable for
> review.
> 
> [...]

Applied to arm64 (for-next/uaccess), thanks!

[1/3] arm64: Update copy_from_user()
      https://git.kernel.org/arm64/c/02f7a8ee7a27
[2/3] arm64: Update copy_to_user()
      https://git.kernel.org/arm64/c/c1dd028491d1
[3/3] arm64: Garbage-collect usercopy leftovers
      https://git.kernel.org/arm64/c/72a11ee6f013

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


* Re: [PATCH 2/3] arm64: Update copy_to_user()
  2022-09-28 11:58 ` [PATCH 2/3] arm64: Update copy_to_user() Robin Murphy
@ 2022-12-05 16:09   ` Will Deacon
  2022-12-05 16:46     ` Mark Rutland
  0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2022-12-05 16:09 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, linux-arm-kernel, mark.rutland, kristina.martsenko

Hi Robin,

On Wed, Sep 28, 2022 at 12:58:52PM +0100, Robin Murphy wrote:
> As with its counterpart, replace copy_to_user() with a new and improved
> version similarly derived from memcpy(). Different tradeoffs from the
> base implementation are made relative to copy_from_user() to get the
> most consistent results across different microarchitectures, but the
> overall shape of the performance gain ends up about the same.
> 
> The exception fixups are even more comical this time around, but that's
> down to now needing to reconstruct the faulting address, and cope with
> overlapping stores at various points. Again, improvements to the
> exception mechanism could significantly simplify things in future.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/lib/copy_to_user.S | 391 +++++++++++++++++++++++++++++-----
>  1 file changed, 341 insertions(+), 50 deletions(-)

Mark reported (off-list) a KASAN splat from Syzkaller on for-next/core:

  | BUG: KASAN: stack-out-of-bounds in _copy_to_iter+0x1084/0x131c

We've not had any reports of this before (or on other branches) and Mark
is unable to reproduce the problem on mainline. I suggested trying
for-next/uaccess and he confirmed that the problem has been introduced
there. Hopefully he can chime in with more details.

In the meantime, I think it's safest to drop this branch for now considering
where we are in the release cycle. I'll leave it pushed to the arm64 git,
so we can put fixes on top and try to merge it next time around instead.

One thing I _did_ notice from a quick skim of the code is:

> +	/* Fixups... */
> +
> +	/*
> +	 * Fault on the first write, but progress may have been possible;
> +	 * realign dst and retry a single byte to confirm.
> +	 */
> +L(fixup_pre):
> +	mov	dst, dstin
> +U_dst(	ldtrb	A_lw, [src])
> +	strb	A_lw, [dst], #1
> +L(fixup_dst):
> +	sub	x0, dstend, dst
> +	ret

Why is U_dst() being applied to the load for a copy_to_user() here? I
would've expected to annotate the store.

Will


* Re: [PATCH 2/3] arm64: Update copy_to_user()
  2022-12-05 16:09   ` Will Deacon
@ 2022-12-05 16:46     ` Mark Rutland
  2022-12-05 17:31       ` Robin Murphy
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2022-12-05 16:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robin Murphy, catalin.marinas, linux-arm-kernel, kristina.martsenko

On Mon, Dec 05, 2022 at 04:09:10PM +0000, Will Deacon wrote:
> Hi Robin,
> 
> On Wed, Sep 28, 2022 at 12:58:52PM +0100, Robin Murphy wrote:
> > [...]
> 
> Mark reported (off-list) a KASAN splat from Syzkaller on for-next/core:
> 
>   | BUG: KASAN: stack-out-of-bounds in _copy_to_iter+0x1084/0x131c

FWIW, I was testing with the config fragments in:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=testing/arm64-fnc-20221202

... building with:

  usekorg 12.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- defconfig testing.config syzkaller.config kasan.config

... and the full splat was:

| ==================================================================
| BUG: KASAN: stack-out-of-bounds in _copy_to_iter+0x1084/0x131c
| Read of size 8 at addr ffff800017f27cc8 by task syz-executor.1/391
| 
| CPU: 1 PID: 391 Comm: syz-executor.1 Not tainted 6.1.0-rc4-00108-gc7586d0e4d65 #1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace.part.0+0x1c8/0x1d4
|  show_stack+0x34/0x5c
|  dump_stack_lvl+0x11c/0x18c
|  print_report+0x194/0x480
|  kasan_report+0xb8/0x250
|  __asan_report_load8_noabort+0x28/0x3c
|  _copy_to_iter+0x1084/0x131c
|  copy_page_to_iter+0x128/0x840
|  pipe_to_user+0xb8/0x160
|  __splice_from_pipe+0x358/0x744
|  __do_sys_vmsplice+0x3c8/0x8a0
|  __arm64_sys_vmsplice+0x94/0xe0
|  invoke_syscall+0x8c/0x2d0
|  el0_svc_common.constprop.0+0xf4/0x300
|  do_el0_svc+0x6c/0x1d4
|  el0_svc+0x54/0x120
|  el0t_64_sync_handler+0xf4/0x120
|  el0t_64_sync+0x190/0x194
| 
| The buggy address belongs to stack of task syz-executor.1/391
|  and is located at offset 488 in frame:
|  __do_sys_vmsplice+0x0/0x8a0
| 
| This frame has 7 objects:
|  [32, 40) 'iov'
|  [64, 72) 'start'
|  [96, 136) 'iter'
|  [176, 216) 'buf'
|  [256, 312) 'sd'
|  [352, 480) 'iovstack'
|  [512, 640) 'pages'
| 
| The buggy address belongs to the virtual mapping at
|  [ffff800017f20000, ffff800017f29000) created by:
|  kernel_clone+0x1a4/0xb40
| 
| The buggy address belongs to the physical page:
| page:000000009402e755 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4a38c
| memcg:ffff000017b86e02
| flags: 0x3fffc0000000000(node=0|zone=0|lastcpupid=0xffff)
| raw: 03fffc0000000000 0000000000000000 dead000000000122 0000000000000000
| raw: 0000000000000000 0000000000000000 00000001ffffffff ffff000017b86e02
| page dumped because: kasan: bad access detected
| 
| Memory state around the buggy address:
|  ffff800017f27b80: f2 f2 00 00 00 00 00 f2 f2 f2 f2 f2 00 00 00 00
|  ffff800017f27c00: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
| >ffff800017f27c80: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00
|                                               ^
|  ffff800017f27d00: 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3
|  ffff800017f27d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
| ==================================================================

The reduced reproducer for this was:

| r0 = syz_io_uring_setup(0x6bc0, &(0x7f0000000000), &(0x7f0000ffe000/0x1000)=nil, &(0x7f0000002000/0x2000)=nil, &(0x7f0000000080), &(0x7f0000000100))
| mmap$IORING_OFF_SQES(&(0x7f0000002000/0x4000)=nil, 0x4000, 0x0, 0x12, r0, 0x10000000)
| pipe2$9p(&(0x7f0000000000)={<r1=>0xffffffffffffffff, <r2=>0xffffffffffffffff}, 0x0)
| vmsplice(r2, &(0x7f0000001540)=[{&(0x7f0000000040)="94", 0x1}, {&(0x7f0000000180)="a40cfbddd20ba9161c5596b9f8ab0268dd4bb5a16b47f86c2fcf444faa265e8d73e21b48001c57589092db934915ad208fdcdea67bac381f76ee3bcf6f25f36662e56e905acf80567bcdf60a8a4262805c26076857dc92113c0253ada9ffa5509aaa2f2733eeaab09b4cb28bff530a25c038e919a00803c705c97734473169f0dbb85e818c6a87256c7403480c44418cdfbe15f0d0747b01fd1f311951340b4ee16fdd3b3e5375a766b4240b657a0d1858f7e6c41d632d9256acfff12153f8d9dc90a3fc81f41b230b8d5516d9b855a67154e7ba9eee191152e7ab3fbced27ddf10a58fc8ff1c0652045ef3aeaf1ae8a8354abd6bcf9bee9a0f56cc1bd9c1f9050bb10e14882f12f66a74545033d9ca721f40d1de0ad9ad326b8432ac779e0fe24a8f7ce68d6c3d268d9d430dd57fcb9e1d76747d077ffc0b2157acb0d0d37ee609eb50a20e49586cad91693210ce24cea985abce4138ddf10c5334b1f353682eeceb863c5df1ca0665330030c30971b0f1d312199cb419afa33a36b1043afcc3319f632ab0ed56863f76d2eaec351b5c43b3b9140d430689bd226ed7d9cbdae94a7fbc37e4f57626701e322aaf110b1eb69d4bfdeb20de40dd031efd77a9890d51be497ff7e358905a22ebe71b0fc0566c08f32d3b9468266e6a58729b7f62c83e55716e8e927723bd8cb3fb70c90fb5dc8812818967bfe8fbbb058ad58ed21ce50fe1a55039f8d123c086ca1f78a56981b28be8a2484e52bc3d5a3b773997484076d04c646b57651cf67a4a1cd6aaacae25d926c6bb4c073009baf1e6c6f1dc4fc6b59c43ed99025e3feb446999e2a4817cbed6543e117ca1f4fb43703b78759e5762cc78d2aa935a67f2bf6229e13cda47023aabd4c4b6faba3db2a6abe0bfd2a65323df124d22dd6c7ed80bd643fa912c071dfe42e4c579fc7dd53a229b39a32cb9560c12bac3691ddb02fdf1d1ad49b1052a011619464bbae8c215041aa5b1a9007ae8863a931c05b3043e1a058c8be34849f02453a527dd775efdba1d7bdc4411c9c5adfb2f7692fb2cb6a43d5eb99d6278b64ea24916df21c73c35a726a3b03bf7c8e3de2b3b3c615d3ba714619b5151b7e8396f639112e2ac09ad167e63b18aab3a5478f7281d9ae5b0c1a0ac86cc20bf3df57075dfd0f2491c9ab5cb422dfed832dd0d406623a85a1ccf7f37270d228538328798591a5bc516629eee659f8146f577347bf0f6453f1d3aad97b660357da6160506278d146a5d720c2bcc572d9fa6e90b52444db803226249fe2657235aa3db2d7e7a8a1d5fbb7acc4dd5e1de056746af8256d32e8e936ee134cf5efb0a9eea03b7269ddfde666620be98dffd632bcb1c7e795bebabdb96b50c78d3c29d3ef235ddcd52b6c73f2aed472bff1e093756ba832c0ab853841906613f95682f0cdc37a851d57ac4f8918d77de65d5577e2c65225748f9d0e837445db3ffae17289724ab3e4a39e906b4b4661fd4c1599b30c9b9967d09fdbbc9f3727766eeb63f927ae751808026652399df01f82abf51a5293e0283cc63022c40824354bf884ff2b608fb6bbe8de8feec644708a282bf2103ab68efe46b3dcbcc20a0e178238f33c9e0a94b31cc68a472ee1e86bc99d90d8b220c91d72130f81eba41a58de44af2f03b5c0f4f9eda244e99b4ebf377483cdd7bae65de86e99e2f5dca94b8224388c3517752269eba1bd47f9c322b63dd497d6e1f195d31f857930af59b1aa937f0821728b70e8257bc4697b2dc07a1bf7c15034a4c2ab29d83257e63331d0c4c1c1c248f99dfe2862f42b99df8fb6431b7af90b62ce878c6ca3ed738d39fd66d8762b0dd401b1ed502dd0606b46a86ad532a40a0e14e67f30c413049682ec18deee5f1871ddbf7a7d945dfc69930a4642ac1e18a5cad46daa272436c6460d0fe03ee301d345af898a952f4328f5c786c1e1df1456cc8ea350d5077b5a8c08abc1e9494ce4b8bee08543a3fe22b9cb278082ae20e4430f28e0950f9aa3f45ea9f59735b02fedae3bbf6b360e8408dc0b15843666f04695c6e596b7596e7712ba9cfd53695fa2ae4f9887c84ed4bdbbd9977f4c457cc8133c957b38aefd1afd189e24fa4fb2373aab404736f62133ab1923342d654e0ca30f491ba7ceee626ab7cdcfa9f8d679bd63d69d0fb6031488442fb9e7307ae1873516764120888f38a4e456ec63e85fb5edef778d7a73636cba18a05ec1cc8fbe6b58a1f55d9d9519988e330f79357b88d92e0437d3f57715618387201fc4f716000be3a85652ec83d70abd313c6f51195d8582a40b941bbef0d632645890705433fdd0706e3d0a6850f90c94c3b7f45dfaee963705c86f25bcb85d82b11c84937b7fbd6b196c0e1dc44f2ef8c728e7aa61627e852e1d4fe978fb89785eecf51dd100c7906c597c8cdb377514ed21d82524b4acdb27e497a205cad6
b4e267a6568b0b5a92d76657f5ae16368ae1f8988e9e84c45c2119c6b88df05044c36c254b4a2d2cec514e00785b4827eaed5947f9465433b6f68e26d7ee85a5898f56bfe75745cb79b9028638c0d165d15c52e3cc28f25d7004423fcc0c3cba38ddf8bc7f3a671f27e72b1df7ebc23912e931b5b53a2fc21c221e17b1115ade67b33ce1a4600b85eb6a04c63cd72346d01abcc31f79f5367d87e509309f5204c9940bf7de850173f632459182966601403959425da461cd31dcb5e6743d829d17a3d79d748dbe019bcd88b48804aa1776f48fdcff110fdcb9d5ad765560706e469d91fc55558887b54b752b5ec9b7de8a4ba49c7fa7662f0ab86220024ebd168a6ab9af7c75fabc2835b3ed4364ee1422999a2687dbceae58b490317f7531c5566f2408c16f33935f0015776fae25260074b01b610ab5643a70592a0c8ddf7e8ddb33ef0ddfdc0b8d256da0e173ecd90fe932040e80de0d194d76bd60cf401eaa4e25662062a32b0c6d1f40576a313609ea153268c0819d15481ad7361b20f09e61f6d5c5ff55d7900b8588cf7fe003d73d2cfa447afd1b3cef7e8d0b6b910b764afb43f835ef365705161eae14f63e67b2f3c0e4cdfd8eee6a95b14c9a37c4c20dc818adb645562bce72cf7f027091174d1e6076a292f5d491459d160b8ccb363aeba6dc0af4b8330cc1304e5f1da2aab6efab5adad72d924967331b35fc1c371bbbf8cf7d78d79e48c2ca71fb5a52d9415f17a96a0b31c2f0a5d9a9fbc171c63f1d2ea106993ac8c4c3aef6cea4298c7e57cf6923edb9c27c4ffff06a389e85ed69ceec61d146d093ede6064303f36baeb45813c28bb73371bc10a5ab4aa0bb7790898f19da71e580a56c6b7605caffa3e6b404a427004c2533d7cd48bb76fa7610607d95ec6eef644992337f420bddf8a9ef1b19320bce1f325c7655246a932dfc14518a4677d43e75b64edcd16267c82ec86ceb935f3b9a44e22aebcf2e1b5f6e73fb4cec247519878d9d0fd34aaef4c27d52dd78627849e623e0d9e1d745532ec3c182287d60d957817539515ce6f3fd0a1fd241713f1816f2d0a063e636b5adcd315b4ece5fd8dbe9ca81373d55915e199c304ba9b4532f3708f49971f6b1dfd4a4686ba6ff4aa52ddd54123765551b1c3fb1e3520f753b180e673cebc71e7bac24f2aba30538a8aec8a849c2e92a89e9755391dcfe33b4ef34c3598d5e255e30cd0d7fb3ede65fff7594c06d39569d17655ec7332676b124e29d3c5b3a24aba82338388601629da385f36fbceaaa91716b32f926df488f552012fecfe610e7b7cff725bf313bbf13cf622e3354c5320a95309760c5", 0xa80}], 0x2, 0x0)
| vmsplice(r1, &(0x7f0000001280)=[{&(0x7f0000001580)="29c7dedc1fc83e848a08d1ab6a6b70a8aba7434e6ad302c1f808d3e154d9a375904104ac349cb5edcd78b744ba6b27cc8f0ad38dda013828b1dcc8ef23a0b0cc865e9a527c9a0bf17f17e433f3700591c5b6a68a430559421b3a960dfb3d8e1a6c3b4144fc1820f4c11f013aaf7591983196aaed77577d52f8423632ce590d28c9f45cb91c745f407ac8af5e75892e589c1b5c73bc0f4646400545073a34214c4d5791711951fc2212cfcabd101ec69f54209c55df4ce1f258a84f979a13d2b01af941b372c111e518022ac74083fc15c3d649dfc3abff6af7d8dbb3e3a9702e3e6a175578176b02c1fbaef7060642f498626feb394e0fa0e6cca23849726911d16c6fb0507172611edd52f28abf5ba517dff8fbae13e8f5356e6a3fe6df0294180dfa82f4c0f8f3c0f297df774e986a48d5d83c6d669722c8e2516220f5f09189459282ab34a733acb7683f7f108bc5997190d527b56efb47e6a46cb3960ba626d3435d817a1ea26959b8a9cb52465dab60efa5d2e81fa8cc07fbbc9550b86163c9d603ae03e7000c50fbb0ce4c0481008892db89dc249bfb15929533f240f1d1a0c8a88e1b836e8ecf9510d10e98f2ef740ce61c885cf00a7aec6f8eee033212cb0d0c7a78a100745453c008502c08e9dfea6456f977956e208102c8ef93a275e4a1a41cbbd964d8f42ff81c729c5742cfc37d33965b07f1b2157e5aa456bb259dcaf9baa775c6d9474e7099eb811db6edefadb4b7a264d4b5c125fe514b710efe9c32ce8ec566e3d643e7771001dc64b72d44e8607032dc2b832a25b0a74db91bf2405fb69bd32fb7ca14ab52570a3074ec20b42ffc71e5bfb8a16582e2b48d784eb09d653af90419920ed7233eb64e6a22ad31833b17ab3d3b0d826ec7d69faf0c234187d6ec20844bb627bc1c476c3756b104e62c99dcd53f1e507f188be09ada0f7595cea3b3df488506398406a7782a0cade85e9ea925dc904d6028052e31b908ff3c7c1d0f0bb49be4208634c8bdb88463f7cb9942ab8621191679f5fc459906d9b63af25e1e7ef68bf7472fbed1585eb3620219691dfc692c3dbfe9aed440dccf2fb96dd7bb6f269d2043abdcc2f3aa469b9ace6e29dfe8846b9b9c17f7a0616fc59cbb66ff7c01a9fa670f4858de60decef9bd3fbf892691e73e40477b0ca55c5463322d8b228a3168a540b029072c8e6f56819fd2ef1486156429769f253d9126901b4e37b24261c127f9948674fcc3855b6f676e5979edd9b857c348ec176d2977fc09431cb35b1ad80fac77ec7a89d9aa9ecf72e66bab174da68cfeb902e5ac1ad8504514bf2fdc45b1fc716f02c4e490168a574bfb1b11dfabd9ead6ab86cc8bc8f861cea8e294bed3651dd78bf11ee548250bc749eb9d81e9fe9fc066884af8d7ea3e5b556e1e98797e8e2022e3fd523508f0b3b2e9cce819cf8779e77937d191e9ff2ece77c530af5ebbba5d642b1fc3e8c32bfe14a1535676e9ce6d23e89b0f22c2e9e739b073857c015cf75be65a0bdbe70145f13c9cdcf4eafcfb1bb733d1b26eb7a5992e3f810f0af8403bb75b55e4430a7d2e8724cbdd7a22c1d4043530c9ee4388ff08331a16664cd9f4348a64d10f5db094ac3336f830b8428b88ec3e46ab56d089efb61c0ef08dd478de5b54288c6d42640367922a9eb84c3024adda569a8792be03217d1b791f6d31e21dff9808e779411190d2294b210d67ffde952f1c65231f5ac82d15fc98072fc1e43605346e5116aa6b5d8372411913c80b06cc71c1ee0608e4ed215550c44db4e3d5ac295cacd486a97813589d96b7b1a105155176baeb6ed29092fa9327d0d7d67ab6c58f7c2f9fca5fe117c3b999e1492fcf9d5e101e9ad61c132e6e3165ee81ddbaa9e403feb2a66fca150735ad95d9b9295ade8c3c396c5bad4d48b05e1ed09cefafd7e3e35818ee5550e7c76ff52432105469a79e7133008dd025695c6ed2a459a8d6c7f49ba44a12b7c7bb9e91eff1634d1106abfbbc7c8fd690961184462bfdcfd1e8b8c730fd971b95e929db1089e04b179eaa094318bc31cb46bb86a64dac45ae0da7222c981d40b6b193c2fd6b374a4b483a690a25589e48e493c26f4604899653fcd7b18b7601140631a7beed7921e8ca6befb917d56fedcbf3be67da78b3dd6c0653af71ecda583570678c331ac0e98c881938a6e9835a627b48c72b5c604c92cbcb19ac7b68a619f45b9fcedc846e7210ed73307cb8a3b5d87f2a6f3a97e74aad9f84d5bbd395f67c5dc56b1b16ff5221962b24feeb38ea32ffcbe1aadc0cc85825533ce22734c2e72fafb99b1d7d4fa9140877a90fae4cf38fb90cf63c527e49f1c8a9ea5b3e4851f77df566439010776820214b6fae22a2f99ccbd86593034348d7b0668bd2ac4d4456f89ab5756a17f8484008905bd1a08430f5cdeac84525ced42092def29f4bf1b7703befceae16da3d43ff6a19aae61d2
7ce7b94fbdff56da0035511e0d36d7f01b5f15b5a1f42b3f4c07c03d68ce58dfce67ae6634ccdd382e4b68dbc1df0a098bd75ff459b7028d1caeb0c478a178e67253c013a2b9d19ec62f10b90c26259def748c2836cec94f92fa81470d5dbd891a5034bd5841c7b4af7a3bed1b67de385f7e0fb85ed53928da52a0b98e2ab5f32719578c3d4d8ca0f3f7d9c82ef9cd405e0322097f8372bb6e537754f0599bdaa657f2b9aadef7373e98e6dc9482e3dbe88aa554c70191f0ddcd8588da337d2d4389e4a468ec21ed7eb3653a45cb87cb3c626b7f1d03cb0a385fdaf3b30ca1ee145a7729958c330e69e91f058cac7c4b74149369266b758cf62f1b8474e0e1d96101812b6c24abb244ffac625b83cd50e8b1c8d9ba7dbb1e92fd9e6a8f158b3abff84c24413e772b18cdb12ecff68f516dc3191840e5611eb3e1581d477a496ee4403230decbe9cc1ef0aa1beda4c15fa9f70874a7494a83e450039ef2385d19fe381de1580449e206f31bd1ba17d99c60bc6eb164174d69320679089c474e4befdaabd688c4102047a808340094a38934d327f044ced1070f9c3d7fffe3863ff61417fb263c351d7324ddd8e4361398c87eaed9eab9a63a4f0a23122a1c758acf81f91a0f9109df4c8af6d4f0900fc3ef04f9e9c6825ee42d6d76e3ed75b845e5da333804af68fed85ad3271235f7ad87d6325c1426dcca790ae3f3c4c5980d1ae7d9cb5dc577f02babe2f0198468c12fd54a9222e2a6bdc70bea9e9c8b1d9a1e7f0b090dfefb36250bbac93f636aeac45b4f92d5b0b2fb3b91d36e48c8499ebe4c7f9d7aadc2a327e1c08201f695b072933d1d223c7fbe0c059244ca4f874e475336c6740e235ab9457e00c5da38686903edbfd5834385e9c730749002e798040f12a37e415f6ba49ad23ae79650ca233fcef58d8b412cdef87c70d5897b25dcf4f6caaa266da726104549a6158165b4724c5ec9ef7bc807f46beb2691cf3de90d2bf72ba26e55a47ae74e8a0854c4f8cf4a57ef38b9e431fd34850fa69336ed46912727532352ebf73595edaab6065917d3a2bb775ac7424fafd380be00adf58a8fdd2b0b6e1b536fd77876a8a18097b03ca0ccefb655bda134bd2386b1bc4b1489a38e54d025612c36305a245649fc89fe281cd8f93395f00f6dc5e1417f353b84d52f08cf2ce44a97601ab513cdb351a5a608e03ac8c233764572e3c006c203cea941cdeec4bda877edc887a07ff0003cf1342f1cc64857bffd264104f850a5382efe5cbe5c439af6007b2f64555b1d276388ef50134d0ca819b52105351a5570adc17757601d890", 0xa81}], 0x1, 0x0)

... which blew up atop for-next/uaccess, but was fine atop v6.1-rc4.

A bisect ended with:

  c1dd028491d18d88d4ece4e5ad8473440a68a6b1 is the first bad commit

i.e.

  c1dd028491d18d88 ("arm64: Update copy_to_user()")

... which seems to agree with Will's analysis below.

> We've not had any reports of this before (or on other branches) and Mark
> is unable to reproduce the problem on mainline. I suggested trying
> for-next/uaccess and he confirmed that the problem has been introduced
> there. Hopefully he can chime in with more details.
> 
> In the meantime, I think it's safest to drop this branch for now considering
> where we are in the release cycle. I'll leave it pushed to the arm64 git,
> so we can put fixes on top and try to merge it next time around instead.
> 
> One thing I _did_ notice from a quick skim of the code is:
> 
> > +	/* Fixups... */
> > +
> > +	/*
> > +	 * Fault on the first write, but progress may have been possible;
> > +	 * realign dst and retry a single byte to confirm.
> > +	 */
> > +L(fixup_pre):
> > +	mov	dst, dstin
> > +U_dst(	ldtrb	A_lw, [src])
> > +	strb	A_lw, [dst], #1
> > +L(fixup_dst):
> > +	sub	x0, dstend, dst
> > +	ret
> 
> Why is U_dst() being applied to the load for a copy_to_user() here? I
> would've expected to annotate the store.

Ignoring the annotation, that's an LDTRB (an *unprivileged* load) feeding into
an STRB (a *privileged* store), which is definitely the wrong way around for
copy_*to*_user().

I think this is a copy-paste mixup with the code in copy_from_user(), since the
two are identical here:

# copy_from_user()
|         /*
|          * Fault before anything has been written, but progress may have
|          * been possible; realign dst and retry a single byte to confirm.
|          */
| L(fixup_pre):
|         mov     dst, dstin
| U_dst(  ldtrb   A_lw, [src])
|         strb    A_lw, [dst], #1
| L(fixup_dst):
|         sub     x0, dstend, dst
|         ret

# copy_to_user()
|         /*  
|          * Fault on the first write, but progress may have been possible;
|          * realign dst and retry a single byte to confirm.
|          */
| L(fixup_pre):
|         mov     dst, dstin
| U_dst(  ldtrb   A_lw, [src])
|         strb    A_lw, [dst], #1
| L(fixup_dst):
|         sub     x0, dstend, dst 
|         ret 
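
Presumably it was meant to mirror the old copy_to_user() fixup, i.e. a
privileged LDRB feeding an annotated unprivileged STTRB; an untested
sketch of the expected shape (STTRB has no post-index form, hence the
separate add):

| L(fixup_pre):
|         mov     dst, dstin
|         ldrb    A_lw, [src]
| U_dst(  sttrb   A_lw, [dst])
|         add     dst, dst, #1
| L(fixup_dst):
|         sub     x0, dstend, dst
|         ret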

Thanks,
Mark.


* Re: [PATCH 2/3] arm64: Update copy_to_user()
  2022-12-05 16:46     ` Mark Rutland
@ 2022-12-05 17:31       ` Robin Murphy
  0 siblings, 0 replies; 8+ messages in thread
From: Robin Murphy @ 2022-12-05 17:31 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon
  Cc: catalin.marinas, linux-arm-kernel, kristina.martsenko

On 2022-12-05 16:46, Mark Rutland wrote:
> On Mon, Dec 05, 2022 at 04:09:10PM +0000, Will Deacon wrote:
>> Hi Robin,
>>
>> On Wed, Sep 28, 2022 at 12:58:52PM +0100, Robin Murphy wrote:
>>> [...]
>>
>> Mark reported (off-list) a KASAN splat from Syzkaller on for-next/core:
>>
>>    | BUG: KASAN: stack-out-of-bounds in _copy_to_iter+0x1084/0x131c
> 
> FWIW, I was testing with the config framents in:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=testing/arm64-fnc-20221202
> 
> ... building with:
> 
>    usekorg 12.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- defconfig testing.config syzkaller.config kasan.config
> 
> ... and the full splat was:
> 
> [...]
> 
> The reduced reproducer for this was:
> 
> [...]
d6b4e267a6568b0b5a92d76657f5ae16368ae1f8988e9e84c45c2119c6b88df05044c36c254b4a2d2cec514e00785b4827eaed5947f9465433b6f68e26d7ee85a5898f56bfe75745cb79b9028638c0d165d15c52e3cc28f25d7004423fcc0c3cba38ddf8bc7f3a671f27e72b1df7ebc23912e931b5b53a2fc21c221e17b1115ade67b33ce1a4600b85eb6a04c63cd72346d01abcc31f79f5367d87e509309f5204c9940bf7de850173f632459182966601403959425da461cd31dcb5e6743d829d17a3d79d748dbe019bcd88b48804aa1776f48fdcff110fdcb9d5ad765560706e469d91fc55558887b54b752b5ec9b7de8a4ba49c7fa7662f0ab86220024ebd168a6ab9af7c75fabc2835b3ed4364ee1422999a2687dbceae58b490317f7531c5566f2408c16f33935f0015776fae25260074b01b610ab5643a70592a0c8ddf7e8ddb33ef0ddfdc0b8d256da0e173ecd90fe932040e80de0d194d76bd60cf401eaa4e25662062a32b0c6d1f40576a313609ea153268c0819d15481ad7361b20f09e61f6d5c5ff55d7900b8588cf7fe003d73d2cfa447afd1b3cef7e8d0b6b910b764afb43f835ef365705161eae14f63e67b2f3c0e4cdfd8eee6a95b14c9a37c4c20dc818adb645562bce72cf7f027091174d1e6076a292f5d491459d160b8ccb363aeba6dc0af4b8330cc1304e5f1da2aab6efab5adad72d924967331b35fc1c371bbbf8cf7d78d79e48c2ca71fb5a52d9415f17a96a0b31c2f0a5d9a9fbc171c63f1d2ea106993ac8c4c3aef6cea4298c7e57cf6923edb9c27c4ffff06a389e85ed69ceec61d146d093ede6064303f36baeb45813c28bb73371bc10a5ab4aa0bb7790898f19da71e580a56c6b7605caffa3e6b404a427004c2533d7cd48bb76fa7610607d95ec6eef644992337f420bddf8a9ef1b19320bce1f325c7655246a932dfc14518a4677d43e75b64edcd16267c82ec86ceb935f3b9a44e22aebcf2e1b5f6e73fb4cec247519878d9d0fd34aaef4c27d52dd78627849e623e0d9e1d745532ec3c182287d60d957817539515ce6f3fd0a1fd241713f1816f2d0a063e636b5adcd315b4ece5fd8dbe9ca81373d55915e199c304ba9b4532f3708f49971f6b1dfd4a4686ba6ff4aa52ddd54123765551b1c3fb1e3520f753b180e673cebc71e7bac24f2aba30538a8aec8a849c2e92a89e9755391dcfe33b4ef34c3598d5e255e30cd0d7fb3ede65fff7594c06d39569d17655ec7332676b124e29d3c5b3a24aba82338388601629da385f36fbceaaa91716b32f926df488f552012fecfe610e7b7cff725bf313bbf13cf622e3354c5320a95309760c5", 0xa80}], 0x2, 0x0)
> | vmsplice(r1, &(0x7f0000001280)=[{&(0x7f0000001580)="29c7dedc1fc83e848a08d1ab6a6b70a8aba7434e6ad302c1f808d3e154d9a375904104ac349cb5edcd78b744ba6b27cc8f0ad38dda013828b1dcc8ef23a0b0cc865e9a527c9a0bf17f17e433f3700591c5b6a68a430559421b3a960dfb3d8e1a6c3b4144fc1820f4c11f013aaf7591983196aaed77577d52f8423632ce590d28c9f45cb91c745f407ac8af5e75892e589c1b5c73bc0f4646400545073a34214c4d5791711951fc2212cfcabd101ec69f54209c55df4ce1f258a84f979a13d2b01af941b372c111e518022ac74083fc15c3d649dfc3abff6af7d8dbb3e3a9702e3e6a175578176b02c1fbaef7060642f498626feb394e0fa0e6cca23849726911d16c6fb0507172611edd52f28abf5ba517dff8fbae13e8f5356e6a3fe6df0294180dfa82f4c0f8f3c0f297df774e986a48d5d83c6d669722c8e2516220f5f09189459282ab34a733acb7683f7f108bc5997190d527b56efb47e6a46cb3960ba626d3435d817a1ea26959b8a9cb52465dab60efa5d2e81fa8cc07fbbc9550b86163c9d603ae03e7000c50fbb0ce4c0481008892db89dc249bfb15929533f240f1d1a0c8a88e1b836e8ecf9510d10e98f2ef740ce61c885cf00a7aec6f8eee033212cb0d0c7a78a100745453c008502c08e9dfea6456f977956e208102c8ef93a275e4a1a41cbbd964d8f42ff81c729c5742cfc37d33965b07f1b2157e5aa456bb259dcaf9baa775c6d9474e7099eb811db6edefadb4b7a264d4b5c125fe514b710efe9c32ce8ec566e3d643e7771001dc64b72d44e8607032dc2b832a25b0a74db91bf2405fb69bd32fb7ca14ab52570a3074ec20b42ffc71e5bfb8a16582e2b48d784eb09d653af90419920ed7233eb64e6a22ad31833b17ab3d3b0d826ec7d69faf0c234187d6ec20844bb627bc1c476c3756b104e62c99dcd53f1e507f188be09ada0f7595cea3b3df488506398406a7782a0cade85e9ea925dc904d6028052e31b908ff3c7c1d0f0bb49be4208634c8bdb88463f7cb9942ab8621191679f5fc459906d9b63af25e1e7ef68bf7472fbed1585eb3620219691dfc692c3dbfe9aed440dccf2fb96dd7bb6f269d2043abdcc2f3aa469b9ace6e29dfe8846b9b9c17f7a0616fc59cbb66ff7c01a9fa670f4858de60decef9bd3fbf892691e73e40477b0ca55c5463322d8b228a3168a540b029072c8e6f56819fd2ef1486156429769f253d9126901b4e37b24261c127f9948674fcc3855b6f676e5979edd9b857c348ec176d2977fc09431cb35b1ad80fac77ec7a89d9aa9ecf72e66bab174da68cfeb902e5ac1ad8504514bf2fdc45b1fc716f02c4e490168a574bfb1b11dfabd9ead6ab86cc8bc8f861cea8e294bed3651dd78bf11ee548250bc749eb9d81e9fe9fc066884af8d7ea3e5b556e1e98797e8e2022e3fd523508f0b3b2e9cce819cf8779e77937d191e9ff2ece77c530af5ebbba5d642b1fc3e8c32bfe14a1535676e9ce6d23e89b0f22c2e9e739b073857c015cf75be65a0bdbe70145f13c9cdcf4eafcfb1bb733d1b26eb7a5992e3f810f0af8403bb75b55e4430a7d2e8724cbdd7a22c1d4043530c9ee4388ff08331a16664cd9f4348a64d10f5db094ac3336f830b8428b88ec3e46ab56d089efb61c0ef08dd478de5b54288c6d42640367922a9eb84c3024adda569a8792be03217d1b791f6d31e21dff9808e779411190d2294b210d67ffde952f1c65231f5ac82d15fc98072fc1e43605346e5116aa6b5d8372411913c80b06cc71c1ee0608e4ed215550c44db4e3d5ac295cacd486a97813589d96b7b1a105155176baeb6ed29092fa9327d0d7d67ab6c58f7c2f9fca5fe117c3b999e1492fcf9d5e101e9ad61c132e6e3165ee81ddbaa9e403feb2a66fca150735ad95d9b9295ade8c3c396c5bad4d48b05e1ed09cefafd7e3e35818ee5550e7c76ff52432105469a79e7133008dd025695c6ed2a459a8d6c7f49ba44a12b7c7bb9e91eff1634d1106abfbbc7c8fd690961184462bfdcfd1e8b8c730fd971b95e929db1089e04b179eaa094318bc31cb46bb86a64dac45ae0da7222c981d40b6b193c2fd6b374a4b483a690a25589e48e493c26f4604899653fcd7b18b7601140631a7beed7921e8ca6befb917d56fedcbf3be67da78b3dd6c0653af71ecda583570678c331ac0e98c881938a6e9835a627b48c72b5c604c92cbcb19ac7b68a619f45b9fcedc846e7210ed73307cb8a3b5d87f2a6f3a97e74aad9f84d5bbd395f67c5dc56b1b16ff5221962b24feeb38ea32ffcbe1aadc0cc85825533ce22734c2e72fafb99b1d7d4fa9140877a90fae4cf38fb90cf63c527e49f1c8a9ea5b3e4851f77df566439010776820214b6fae22a2f99ccbd86593034348d7b0668bd2ac4d4456f89ab5756a17f8484008905bd1a08430f5cdeac84525ced42092def29f4bf1b7703befceae16da3d43ff6a19aae61
d27ce7b94fbdff56da0035511e0d36d7f01b5f15b5a1f42b3f4c07c03d68ce58dfce67ae6634ccdd382e4b68dbc1df0a098bd75ff459b7028d1caeb0c478a178e67253c013a2b9d19ec62f10b90c26259def748c2836cec94f92fa81470d5dbd891a5034bd5841c7b4af7a3bed1b67de385f7e0fb85ed53928da52a0b98e2ab5f32719578c3d4d8ca0f3f7d9c82ef9cd405e0322097f8372bb6e537754f0599bdaa657f2b9aadef7373e98e6dc9482e3dbe88aa554c70191f0ddcd8588da337d2d4389e4a468ec21ed7eb3653a45cb87cb3c626b7f1d03cb0a385fdaf3b30ca1ee145a7729958c330e69e91f058cac7c4b74149369266b758cf62f1b8474e0e1d96101812b6c24abb244ffac625b83cd50e8b1c8d9ba7dbb1e92fd9e6a8f158b3abff84c24413e772b18cdb12ecff68f516dc3191840e5611eb3e1581d477a496ee4403230decbe9cc1ef0aa1beda4c15fa9f70874a7494a83e450039ef2385d19fe381de1580449e206f31bd1ba17d99c60bc6eb164174d69320679089c474e4befdaabd688c4102047a808340094a38934d327f044ced1070f9c3d7fffe3863ff61417fb263c351d7324ddd8e4361398c87eaed9eab9a63a4f0a23122a1c758acf81f91a0f9109df4c8af6d4f0900fc3ef04f9e9c6825ee42d6d76e3ed75b845e5da333804af68fed85ad3271235f7ad87d6325c1426dcca790ae3f3c4c5980d1ae7d9cb5dc577f02babe2f0198468c12fd54a9222e2a6bdc70bea9e9c8b1d9a1e7f0b090dfefb36250bbac93f636aeac45b4f92d5b0b2fb3b91d36e48c8499ebe4c7f9d7aadc2a327e1c08201f695b072933d1d223c7fbe0c059244ca4f874e475336c6740e235ab9457e00c5da38686903edbfd5834385e9c730749002e798040f12a37e415f6ba49ad23ae79650ca233fcef58d8b412cdef87c70d5897b25dcf4f6caaa266da726104549a6158165b4724c5ec9ef7bc807f46beb2691cf3de90d2bf72ba26e55a47ae74e8a0854c4f8cf4a57ef38b9e431fd34850fa69336ed46912727532352ebf73595edaab6065917d3a2bb775ac7424fafd380be00adf58a8fdd2b0b6e1b536fd77876a8a18097b03ca0ccefb655bda134bd2386b1bc4b1489a38e54d025612c36305a245649fc89fe281cd8f93395f00f6dc5e1417f353b84d52f08cf2ce44a97601ab513cdb351a5a608e03ac8c233764572e3c006c203cea941cdeec4bda877edc887a07ff0003cf1342f1cc64857bffd264104f850a5382efe5cbe5c439af6007b2f64555b1d276388ef50134d0ca819b52105351a5570adc17757601d890", 0xa81}], 0x1, 0x0)
> 
> ... which blew up atop for-next/uaccess, but was fine atop v6.1-rc4.
> 
> A bisect ended with:
> 
>    c1dd028491d18d88d4ece4e5ad8473440a68a6b1 is the first bad commit
> 
> i.e.
> 
>    c1dd028491d18d88 ("arm64: Update copy_to_user()")
> 
> ... which seems to agree with Will's analysis below.
> 
>> We've not had any reports of this before (or on other branches) and Mark
>> is unable to reproduce the problem on mainline. I suggested trying
>> for-next/uaccess and he confirmed that the problem has been introduced
>> there. Hopefully he can chime in with more details.
>>
>> In the meantime, I think it's safest to drop this branch for now considering
>> where we are in the release cycle. I'll leave it pushed to the arm64 git,
>> so we can put fixes on top and try to merge it next time around instead.
>>
>> One thing I _did_ notice from a quick skim of the code is:
>>
>>> +	/* Fixups... */
>>> +
>>> +	/*
>>> +	 * Fault on the first write, but progress may have been possible;
>>> +	 * realign dst and retry a single byte to confirm.
>>> +	 */
>>> +L(fixup_pre):
>>> +	mov	dst, dstin
>>> +U_dst(	ldtrb	A_lw, [src])
>>> +	strb	A_lw, [dst], #1
>>> +L(fixup_dst):
>>> +	sub	x0, dstend, dst
>>> +	ret
>>
>> Why is U_dst() being applied to the load for a copy_to_user() here? I
>> would've expected to annotate the store.
> 
> Ignoring the annotation, that's an LDTRB (an *unprivileged* load) feeding into
> an STRB (a *privileged* store), which is definitely the wrong way around for
> copy_*to*_user().
> 
> I think this is a copy-paste mixup with the code in copy_from_user(), since the
> two are identical here:

Indeed, I think one of the very last things I did was to rename and 
reorder a load of the fixup labels for consistency, and it does appear 
that I inadvertently made things a bit *too* consistent here :(

[ FWIW "fixup_dst" refers to the address in the dst register being 
correctly adjusted for the final return calculation, so "U_dst()" just 
marks a fault which doesn't need any pre-processing. Prior to that I was 
starting to confuse myself over whether fixup_0 meant 0 offset or 0 bytes 
copied... ]
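
[ For illustration only: a minimal sketch of what this fixup presumably 
wants to be once the privilege levels are flipped, i.e. a plain 
(privileged) load from the kernel source feeding an *unprivileged* STTRB 
to the user destination, with the U_dst() annotation moved onto the 
store. The label, macro and register names are assumptions carried over 
from the quoted snippets, and since STTRB has no post-index form, the 
dst increment is split out into a separate add:

L(fixup_pre):
	mov	dst, dstin
	ldrb	A_lw, [src]		// privileged load from the kernel buffer
U_dst(	sttrb	A_lw, [dst])		// unprivileged store, annotated for the fixup
	add	dst, dst, #1		// STTRB has no post-index writeback
L(fixup_dst):
	sub	x0, dstend, dst
	ret
]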

Sadly I don't see how this little mishap alone could explain the KASAN 
report, though - seems like that's going to need more digging, so given 
the timescale I'm happy if you'd rather play it safe this time around.

Thanks,
Robin.

> 
> # copy_from_user()
> |         /*
> |          * Fault before anything has been written, but progress may have
> |          * been possible; realign dst and retry a single byte to confirm.
> |          */
> | L(fixup_pre):
> |         mov     dst, dstin
> | U_dst(  ldtrb   A_lw, [src])
> |         strb    A_lw, [dst], #1
> | L(fixup_dst):
> |         sub     x0, dstend, dst
> |         ret
> 
> # copy_to_user()
> |         /*
> |          * Fault on the first write, but progress may have been possible;
> |          * realign dst and retry a single byte to confirm.
> |          */
> | L(fixup_pre):
> |         mov     dst, dstin
> | U_dst(  ldtrb   A_lw, [src])
> |         strb    A_lw, [dst], #1
> | L(fixup_dst):
> |         sub     x0, dstend, dst
> |         ret
> 
> Thanks,
> Mark.