Linux-ARM-Kernel Archive on lore.kernel.org
From: Oliver Swede <oli.swede@arm.com>
To: Will Deacon <will@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 13/14] arm64: Add fixup routines for usercopy store exceptions
Date: Tue, 30 Jun 2020 19:48:21 +0000
Message-ID: <20200630194822.1082-14-oli.swede@arm.com> (raw)
In-Reply-To: <20200630194822.1082-1-oli.swede@arm.com>

This adds the fixup routines for exceptions that occur on store
operations while copying, providing the calling code with a more
accurate value for the number of bytes that failed to copy, rather
than returning the full buffer width.

The three routines for store exceptions work together to analyse
the position of the fault relative to the start or the end of the
buffer, and backtrack through the optimized memcpy algorithm to
determine whether some of the bytes have already been successfully
copied.

The store operations occur mostly in-order, with the exception of
a few copy-size ranges; this is specific to the new copy template,
which uses the latest memcpy implementation.
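
For reference, the return value feeds the usual usercopy contract:
the caller receives the number of bytes that could not be copied,
with 0 indicating complete success. A minimal sketch of how calling
code consumes this (illustrative only; ubuf/kbuf and
handle_short_copy() are hypothetical):

	size_t left = raw_copy_to_user(ubuf, kbuf, count);
	if (left)
		/* only the first (count - left) bytes reached ubuf */
		handle_short_copy(count - left);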

Signed-off-by: Oliver Swede <oli.swede@arm.com>
---
 arch/arm64/lib/copy_user_fixup.S | 217 ++++++++++++++++++++++++++++++-
 1 file changed, 215 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/lib/copy_user_fixup.S b/arch/arm64/lib/copy_user_fixup.S
index ae94b492129d..37ca3d99a02a 100644
--- a/arch/arm64/lib/copy_user_fixup.S
+++ b/arch/arm64/lib/copy_user_fixup.S
@@ -169,12 +169,225 @@ addr	.req	x15
 	 */
 	b	L(end_fixup)
 	
+/*
+ * The following three routines are reached from faults
+ * on store instructions.
+ */
 9996:
+	/*
+	 * This routine is reached from faults on store instructions
+	 * where the target address has been specified relative to the
+	 * start of the user space memory buffer, and is also not
+	 * guaranteed to be 16-byte aligned.
+	 *
+	 * For copy sizes <= 128 bytes, the stores occur in-order,
+	 * so it has copied up to (addr-dst)&~15.
+	 * For copy sizes > 128 bytes, this routine should only be
+	 * reached from a fault on the first store of the long copy,
+	 * before the algorithm aligns dst to 16B, so no bytes have
+	 * been copied at this point.
+	 */
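+
+	/*
+	 * A C-like sketch of the fixup below (illustrative only;
+	 * fault_off is a hypothetical name for addr - dst):
+	 *
+	 *	if (count <= 3)
+	 *		return count - fault_off;
+	 *	if (count <= 32)
+	 *		return count;		// report none copied
+	 *	if (count <= 128)
+	 *		return count - (fault_off & ~15);
+	 *	return count;			// none copied
+	 */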
+
+	/* Retrieve original params from stack */
+	ldp	dst, src, [sp], #16	// dst: x3, src: x1
+	ldr	count, [sp], #16	// count: x2
+	add	srcend, src, count
+	add	dstend, dst, count
+
+	/* count <= 3 -> count - (addr-dst) */
+	cmp	count, 3
+	sub	x0, addr, dst // relative fault offset in buffer
+	sub	x0, count, x0 // bytes yet to copy
+	b.le	L(end_fixup)
+	/* 3 < count <= 32 -> count (report none copied) */
+	cmp	count, 32
+	b.le	L(none_copied)
+	/* 32 < count <= 128 -> count - ((addr-dst)&~15) */
+	cmp	count, 128
+	sub	x0, addr, dst // relative fault offset
+	bic	x0, x0, 15 // bytes already copied (steps of 16B stores)
+	sub	x0, count, x0 // bytes yet to copy
+	b.le	L(end_fixup)
+	/* 128 < count -> count */
+	b	L(none_copied)
+
 9997:
+	/*
+	 * This routine is reached from faults on store instructions
+	 * where the target address has been specified relative to
+	 * the end of the user space memory buffer and is also not
+	 * guaranteed to be 16-byte aligned.
+	 *
+	 * In this scenario, the copy is close to completion and
+	 * has occurred in-order, so it is straightforward to
+	 * calculate the remaining bytes.
+	 *
+	 * The algorithm increments dst by 64B on each iteration of
+	 * its loop64() subroutine, and tmp1 stores its latest value,
+	 * which allows a threshold that the copy has reached to be
+	 * calculated.
+	 *
+	 * To account for faults on data that has already been copied
+	 * (which can occur due to the way the algorithm uses
+	 * overlapping copies within a buffer), this threshold can be
+	 * subtracted from the copy size and the result compared
+	 * against the remaining bytes after the fault offset in the
+	 * last 64B; the minimum of the two can then be taken for the
+	 * return value.
+	 */
+
+	/* Save the current (adjusted) dst for the progress calculation */
+	mov	tmp1, dst
+
+	/* Retrieve original params from stack */
+	ldp	dst, src, [sp], #16	// dst: x3, src: x1
+	ldr	count, [sp], #16	// count: x2
+	add	srcend, src, count
+	add	dstend, dst, count
+
+	/*
+	 * Overlapping buffers:
+	 * (src <= dst && dst < srcend) || (dst <= src && src < dstend)
+	 */
+	cmp	src, dst
+	ccmp	dst, srcend, #0, le
+	b.lt	L(none_copied)
+	cmp	dst, src
+	ccmp	src, dstend, #0, le
+	b.lt	L(none_copied)
+
+	/*
+	 * For copy sizes > 128 bytes, select the min of
+	 * { count-((tmp1-dst)+80), 64-((addr-(dstend-64))&~15) }
+	 */
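+	/*
+	 * Roughly, in C (illustrative only; names are hypothetical):
+	 *
+	 *	progress = saved_dst - dst;	// saved_dst is in tmp1
+	 *	rem_seq  = count - (progress + 80);
+	 *	rem_tail = 64 - ((addr - (dstend - 64)) & ~15);
+	 *	return min(rem_seq, rem_tail);
+	 */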
+	sub	tmp1, tmp1, dst // relative position of new dst
+	add	tmp1, tmp1, 80 // copied up to here
+	sub	tmp1, count, tmp1 // remaining bytes after non-overlapping section
+	sub	x0, dstend, 64
+	sub	x0, addr, x0
+	bic	x0, x0, 15 // fault offset within the final 64B of the dest. buffer
+	add	x0, dstend, x0
+	sub	x0, x0, 64
+	sub	x0, dstend, x0 // remaining bytes in final (overlapping) 64B
+	cmp	x0, tmp1
+	csel	x0, x0, tmp1, lt
+	cmp	count, 128
+	b.gt	L(end_fixup)
+
+	cmp	count, 2
+	b.le	L(none_copied)
+	cmp	count, 3 // one byte left
+	mov	x0, 1
+	b.eq	L(end_fixup)
+	cmp	count, 7 // first 4 bytes copied
+	sub	x0, count, 4
+	b.le	L(end_fixup)
+	cmp	count, 15 // first 8 bytes copied
+	sub	x0, count, 8
+	b.le	L(end_fixup)
+	cmp	count, 32 // first 16 bytes copied
+	sub	x0, count, 16
+	b.le	L(end_fixup)
+	/*
+	 * For copy size 33..64 select min of
+	 * {(32 - fault_offset), (count-32)}
+	 * as copy may overlap
+	 */
+	sub	tmp1, dstend, 32
+	sub	tmp1, addr, tmp1
+	bic	tmp1, tmp1, 15
+	mov	x0, 32
+	sub	tmp1, x0, tmp1
+	sub	x0, count, 32
+	cmp	x0, tmp1
+	csel	x0, x0, tmp1, lt
+	cmp	count, 64
+	b.le	L(end_fixup)
+	/*
+	 * For copy size 65..96 select min of
+	 * {(32 - fault_offset), (count-64)}
+	 * as copy may overlap
+	 */
+	sub	tmp1, dstend, 32
+	sub	tmp1, addr, tmp1
+	bic	tmp1, tmp1, 15
+	mov	x0, 32
+	sub	tmp1, x0, tmp1
+	sub	x0, count, 64
+	cmp	x0, tmp1
+	csel	x0, x0, tmp1, lt
+	cmp	count, 96
+	b.lt	L(end_fixup)
+	/*
+	 * For copy size 97..128 same as above, but account for
+	 * out-of-order initial stores, where no bytes have been
+	 * copied on those faults
+	 */
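+	/*
+	 * Roughly, in C (illustrative only; names are hypothetical;
+	 * count <= 128 is guaranteed here, as larger sizes returned
+	 * above):
+	 *
+	 *	rem_tail = 64 - ((addr - (dstend - 64)) & ~15);
+	 *	if (rem_tail > 32)
+	 *		return count;	// fault in an out-of-order head store
+	 *	return min(count - 64, rem_tail);
+	 */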
+	sub	tmp1, dstend, 64
+	sub	tmp1, addr, tmp1
+	bic	tmp1, tmp1, 15
+	mov	x0, 64
+	sub	tmp1, x0, tmp1
+	cmp	count, 128
+	mov	x0, 32
+	ccmp	tmp1, x0, #0, le
+	b.gt	L(none_copied) // none copied if the fault is in the first or second chunk
+	sub	x0, count, 64
+	cmp	x0, tmp1
+	csel	x0, x0, tmp1, lt
+	cmp	count, 128
+	b.le	L(end_fixup)
+
+	b	L(none_copied)
+
 9998:
-	/* Retrieve info from stack */
+	/*
+	 * This routine is reached from faults on store instructions
+	 * where the target address has been specified relative to the
+	 * start of the user space memory buffer, and is also
+	 * guaranteed to be 16-byte aligned.
+	 *
+	 * These instructions occur after the algorithm aligns dst
+	 * to 16B, which is after the very first store in a long copy.
+	 * It then continues copying from dst+16 onwards.
+	 *
+	 * This could result in the second store attempting to copy
+	 * data that has already been copied, as there would be an
+	 * overlap in the buffer if the original dst is not 16-byte
+	 * aligned. In this case at most 16 bytes have already been
+	 * copied, so the minimum of the two values must be taken.
+	 */
+
+	/* Retrieve original params from stack */
+	ldp	dst, src, [sp], #16	// dst: x3, src: x1
 	ldr	count, [sp], #16	// count: x2
-	add	sp, sp, 32
+	add	srcend, src, count
+	add	dstend, dst, count
+
+	/*
+	 * Overlapping buffers:
+	 * (src <= dst && dst < srcend) || (dst <= src && src < dstend)
+	 */
+	cmp	src, dst
+	ccmp	dst, srcend, #0, le
+	b.lt	L(none_copied)
+	cmp	dst, src
+	ccmp	src, dstend, #0, le
+	b.lt	L(none_copied)
+
+	/*
+	 * Take the min of {16, (addr&~15)-(dst&~15)} and
+	 * subtract from count to obtain the return value
+	 */
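+	/*
+	 * Roughly, in C (illustrative only):
+	 *
+	 *	if (((addr & ~15) - (dst & ~15)) > 16)
+	 *		return count - ((addr & ~15) - dst);
+	 *	return count - 16;	// only the initial chunk was copied
+	 */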
+	bic	tmp1, dst, 15 // aligned dst
+	bic	x0, addr, 15
+	sub	x0, x0, tmp1 // relative fault offset
+	cmp	x0, 16
+	bic	x0, addr, 15
+	sub	x0, x0, dst
+	sub	x0, count, x0
+	b.gt	L(end_fixup)
+	sub	x0, count, 16
+	b	L(end_fixup) // initial unaligned chunk copied
+
 L(none_copied):
 	/*
 	 * Return the initial count as the number
-- 
2.17.1



Thread overview: 16+ messages
2020-06-30 19:48 [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and string routines Oliver Swede
2020-06-30 19:48 ` [PATCH v4 01/14] arm64: Allow passing fault address to fixup handlers Oliver Swede
2020-06-30 19:48 ` [PATCH v4 02/14] arm64: kprobes: Drop open-coded exception fixup Oliver Swede
2020-06-30 19:48 ` [PATCH v4 03/14] arm64: Import latest version of Cortex Strings' memcmp Oliver Swede
2020-06-30 19:48 ` [PATCH v4 04/14] arm64: Import latest version of Cortex Strings' memmove Oliver Swede
2020-06-30 19:48 ` [PATCH v4 05/14] arm64: Import latest version of Cortex Strings' strcmp Oliver Swede
2020-06-30 19:48 ` [PATCH v4 06/14] arm64: Import latest version of Cortex Strings' strlen Oliver Swede
2020-06-30 19:48 ` [PATCH v4 07/14] arm64: Import latest version of Cortex Strings' strncmp Oliver Swede
2020-06-30 19:48 ` [PATCH v4 08/14] arm64: Import latest optimization of memcpy Oliver Swede
2020-06-30 19:48 ` [PATCH v4 09/14] arm64: Tidy up _asm_extable_faultaddr usage Oliver Swede
2020-06-30 19:48 ` [PATCH v4 10/14] arm64: Store the arguments to copy_*_user on the stack Oliver Swede
2020-06-30 19:48 ` [PATCH v4 11/14] arm64: Use additional memcpy macros and fixups Oliver Swede
2020-06-30 19:48 ` [PATCH v4 12/14] arm64: Add fixup routines for usercopy load exceptions Oliver Swede
2020-06-30 19:48 ` Oliver Swede [this message]
2020-06-30 19:48 ` [PATCH v4 14/14] arm64: Improve accuracy of fixup for UAO cases Oliver Swede
2020-07-01  8:12 ` [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and string routines Oli Swede
