From: Oliver Swede <oli.swede@arm.com>
To: Will Deacon <will@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v3 12/13] arm64: Add fixup routines for usercopy load exceptions
Date: Thu, 14 May 2020 14:32:26 +0000
Message-ID: <20200514143227.605-13-oli.swede@arm.com>
In-Reply-To: <20200514143227.605-1-oli.swede@arm.com>

This adds the fixup routines for exceptions that occur on load
operations while copying, so that the calling code is given a more
accurate value for the number of bytes that failed to copy.

The three routines for load exceptions work together to analyse
the position of the fault relative to the start or the end of the
buffer, and backtrack through the optimized memcpy algorithm to
determine whether some number of bytes has already been copied
successfully.

The new template uses out-of-order copying, and these fixup routines
are specific to the latest memcpy implementation. It is assumed there
is no requirement to follow through with the copying of data that may
reside in temporary registers when a fault occurs, as this would
greatly increase the fixup's complexity.

Signed-off-by: Oliver Swede <oli.swede@arm.com>
---
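
Note for reviewers: the value these fixups leave in x0 is what the copy
routine ultimately returns, i.e. the number of bytes that could not be
copied. A rough C sketch of how a caller consumes such a value
(illustrative only; raw_copy() and copy_sketch() are stand-in names,
not actual kernel symbols):

#include <string.h>

/* Stand-in for the arch copy routine; returns the no. bytes NOT copied. */
extern unsigned long raw_copy(void *to, const void *from, unsigned long n);

unsigned long copy_sketch(void *to, const void *from, unsigned long n)
{
	unsigned long uncopied = raw_copy(to, from, n);

	if (uncopied)
		/* Zero the destination tail that was never written. */
		memset((char *)to + (n - uncopied), 0, uncopied);

	return uncopied;
}

Over-estimating the number of uncopied bytes is therefore safe (more of
the destination gets zeroed), whereas an under-estimate would leave
uninitialised destination bytes; this is why the routines fall back to
returning the full count whenever the progress cannot be derived.
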
 arch/arm64/lib/copy_user_fixup.S | 170 ++++++++++++++++++++++++++++++-
 1 file changed, 165 insertions(+), 5 deletions(-)
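
The early-exit condition at the top of each load fixup corresponds to
the following C predicate (a sketch that mirrors the checks in the
routines; the function name is illustrative and it is not part of the
patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Nothing can have been stored yet if the copy is short (no
 * interleaving of loads and stores) or if the buffers overlap
 * (backwards copy, so no in-order progress value exists).
 */
bool nothing_copied(uintptr_t src, uintptr_t dst, size_t count)
{
	uintptr_t srcend = src + count;
	uintptr_t dstend = dst + count;

	if (count <= 128)
		return true;

	if ((src <= dst && dst < srcend) || (dst <= src && src < dstend))
		return true;

	return false;
}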

diff --git a/arch/arm64/lib/copy_user_fixup.S b/arch/arm64/lib/copy_user_fixup.S
index f878c8831b14..256a33522749 100644
--- a/arch/arm64/lib/copy_user_fixup.S
+++ b/arch/arm64/lib/copy_user_fixup.S
@@ -3,19 +3,179 @@
 addr	.req	x15
 .section .fixup,"ax"
 .align	2
+
+	/*
+	 * These fixup routines assume that it is not a requirement
+	 * to follow through with the copying of any intermediate
+	 * data in registers: this would be highly dependent on the
+	 * procedure in the copy template, which utilizes out-of-order
+	 * copying and is subject to change by future optimizations.
+	 *
+	 * The subroutine that is executed depends on the properties
+	 * of the target address in the instruction: whether it is an
+	 * address relative to the start or the end of the buffer,
+	 * and (for copy sizes larger than 128 bytes) whether or not
+	 * it is aligned to 16 bytes.
+	 */
+
+	/*
+	 * The following three routines are reached from faults on load
+	 * instructions. In each case, nothing has been copied if either:
+	 *	a) the copy size is 128 bytes or less, as the algorithm
+	 *	   does not interleave load/store instructions, so nothing
+	 *	   has been copied and the full width (srcend-src) can be
+	 *	   returned, or
+	 *	b) the copy size is larger than 128 bytes and the src and
+	 *	   dst buffers overlap, as this would result in a backwards
+	 *	   copy, but the calling code expects the return value
+	 *	   (no. bytes not copied) to be from an in-order perspective.
+	 */
 9993:
+	/*
+	 * This is reached from load instructions that are specified
+	 * relative to the start of a user space memory buffer, and
+	 * are not guaranteed to be aligned to 16B.
+	 */
+
+	/* Retrieve useful information & free the stack area */
+	ldp	dst, src, [sp], #16	// dst: x3, src: x1
+	ldr	count, [sp], #16	// count: x2
+	add	srcend, src, count
+	add	dstend, dst, count
+
+	/* Copy size <= 128 bytes */
+	cmp	count, 128
+	b.ls	L(none_copied)
+	/*
+	 * Overlapping buffers:
+	 * (src <= dst && dst < srcend) || (dst <= src && src < dstend)
+	 */
+	cmp	src, dst
+	ccmp	dst, srcend, #0, le
+	b.lt	L(none_copied)
+	cmp	dst, src
+	ccmp	src, dstend, #0, le
+	b.lt	L(none_copied)
+
+	/*
+	 * The fault occurred in a load instruction at the start of the
+	 * algorithm's subroutine for long copies, and no bytes have
+	 * been stored at this point.
+	 */
+	b	L(none_copied)
 9994:
+	/*
+	 * This is reached from load instructions that are specified
+	 * relative to the end of a user space memory buffer, and
+	 * are not guaranteed to be aligned to 16B.
+	 */
+
+	/* Store the current dst before the original is
+	 * restored */
+	mov	tmp1, dst
+
+	/* Retrieve useful information & free the stack area */
+	ldp	dst, src, [sp], #16	// dst: x3, src: x1
+	ldr	count, [sp], #16	// count: x2
+	add	srcend, src, count
+	add	dstend, dst, count
+
+	/* Copy size <= 128 bytes */
+	cmp	count, 128
+	b.ls	L(none_copied)
+	/*
+	 * Overlapping buffers:
+	 * (src <= dst && dst < srcend) || (dst <= src && src < dstend)
+	 */
+	cmp	src, dst
+	ccmp	dst, srcend, #0, le
+	b.lt	L(none_copied)
+	cmp	dst, src
+	ccmp	src, dstend, #0, le
+	b.lt	L(none_copied)
+
+	/*
+	 * In the case of an access relative to the end of
+	 * the buffer, the long copy has reached the final
+	 * 64B chunk in copy64_from_end().
+	 *
+	 * The fault address should fall in one of four 16B
+	 * blocks, each of which indicates how many bytes
+	 * have been stored from the in-order perspective.
+	 *
+	 * A number of iterations of the loop to copy 64B
+	 * chunks may have completed; tmp1 stores the
+	 * latest position of the dst pointer and this can
+	 * be used to deduce how many 16B copies have taken
+	 * place.
+	 */
+
+	/* Calculate the index of the 16B block
+	 * containing the fault address */
+	sub	x0, srcend, 64
+	cmp	addr, x0
+	b.lt	L(none_copied) // unexpected fault address
+	sub	x0, addr, x0 // relative offset of fault in buffer
+	bic	x0, x0, 15 // assume no data copied between addr and the target
+	sub	tmp1, dst, tmp1 // already copied up to dst
+	add	x0, x0, tmp1 // plus the difference between addr and srcend-64
+	sub	x0, count, x0 // no. bytes not copied
+	b	L(end_fixup)
+
 9995:
+	/*
+	 * This is reached from load instructions that are specified
+	 * relative to the start of a user space memory buffer, and
+	 * are guaranteed to be aligned to 16B.
+	 */
+
+	/* Retrieve useful information & free the stack area */
+	ldp	dst, src, [sp], #16	// dst: x3, src: x1
+	ldr	count, [sp], #16	// count: x2
+	add	srcend, src, count
+	add	dstend, dst, count
+
+	/* Copy size <= 128 bytes */
+	cmp	count, 128
+	b.ls	L(none_copied)
+
+	/*
+	 * Overlapping buffers:
+	 * (src <= dst && dst < srcend) || (dst <= src && src < dstend)
+	 */
+	cmp	src, dst
+	ccmp	dst, srcend, #0, le
+	b.lt	L(none_copied)
+	cmp	dst, src
+	ccmp	src, dstend, #0, le
+	b.lt	L(none_copied)
+
+	sub	x0, addr, src
+	lsr	x0, x0, 4 // calculate the index of the faulting 16B block
+	/* Map the index in x0 to the no. bytes already copied */
+	cmp	x0, 1
+	b.le	L(none_copied) // no stores for i=0,1
+	cmp	x0, 4
+	mov	tmp1, x0
+	sub	x0, count, 16
+	b.le	L(end_fixup) // one store (16B) for i=2,3,4
+	/* Faulted in a loop: stored up to ((i-3) * 16) - (dst % 16) */
+	sub	x0, tmp1, 3
+	lsl	x0, x0, 4
+	mov	tmp1, dst
+	and	tmp1, tmp1, 15
+	sub	x0, x0, tmp1 // no. bytes stored in order
+	sub	x0, count, x0 // no. bytes not copied
+	b	L(end_fixup)
+
 9996:
 9997:
 9998:
-	/* Retrieve useful information & free the stack area */
-	ldr	count, [sp, #16]	// x2
-	add	sp, sp, 32
+L(none_copied):
 	/*
-	 * Return the initial count as the (under-estimated) number
+	 * Return the initial count as the number
 	 * of bytes that failed to copy
 	 */
 	mov	x0, count
+L(end_fixup):
 	ret
-
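
For reference, the index-to-bytes mapping used by the fixup for aligned
loads relative to the start of the buffer (label 9995) can be sketched
in C as follows (derived from the comments in that routine; the
function name is made up and this is not part of the patch):

#include <stddef.h>
#include <stdint.h>

/* Return the no. bytes that failed to copy for the aligned-start case. */
size_t fixup_aligned_start(uintptr_t src, uintptr_t dst,
			   uintptr_t fault_addr, size_t count)
{
	size_t i = (fault_addr - src) / 16;	/* index of the faulting 16B block */
	size_t copied;

	if (i <= 1)
		copied = 0;			/* no stores yet for i = 0, 1 */
	else if (i <= 4)
		copied = 16;			/* one 16B store for i = 2, 3, 4 */
	else
		copied = (i - 3) * 16 - (dst % 16);	/* faulted inside the loop */

	return count - copied;
}
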
-- 
2.17.1

