linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
@ 2020-10-29 20:59 Peter Collingbourne
  2020-11-03 15:28 ` Mark Rutland
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Collingbourne @ 2020-10-29 20:59 UTC (permalink / raw)
  To: Mark Brown, Mark Rutland, Will Deacon, Catalin Marinas, Andrey Konovalov
  Cc: Peter Collingbourne, Linux ARM

By using outlined checks we can achieve a significant code size
improvement by moving the tag-based ASAN checks into separate
functions. Unlike the existing CONFIG_KASAN_OUTLINE mode, these
functions have a custom calling convention that preserves most
registers and is specialized to the register containing the address
and to the type of access. As a result we can eliminate the code
size and performance overhead of a standard calling convention such
as AAPCS for these functions.
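
For illustration, with this scheme the compiler emits at each
instrumented access only a call to a check function specialized to the
address register and access type, along the lines of (a rough sketch;
the exact encoding of the access info into the symbol name is a
compiler implementation detail):

	bl	__hwasan_check_x1_67043328	// check the tag of the pointer in x1
	ldr	x2, [x1]			// the instrumented access

The check function performs the shadow load and tag comparison itself
and, on a mismatch, tail-calls the runtime function added by this
patch.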

This change depends on a separate series of changes to Clang [1] to
support outlined checks in the kernel, although the change works fine
without them (we just don't get outlined checks). This is because the
flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang
changes land. The flag was introduced in the Clang 9.0 timeframe as
part of the support for outlined checks in userspace, and because our
minimum Clang version is 10.0 we can pass it unconditionally.

Outlined checks require a new runtime function with a custom calling
convention. Add this function to arch/arm64/lib.

I measured the code size of defconfig + tag-based KASAN, as well
as boot time (i.e. time to init launch) on a DragonBoard 845c with
an Android arm64 GKI kernel. The results are below:

                               code size    boot time
CONFIG_KASAN_INLINE=y before    92824064      6.18s
CONFIG_KASAN_INLINE=y after     38822400      6.65s
CONFIG_KASAN_OUTLINE=y          39215616     11.48s

We can see straight away that specialized outlined checks beat the
existing CONFIG_KASAN_OUTLINE=y on both code size and boot time
for tag-based ASAN.

As for the comparison between CONFIG_KASAN_INLINE=y before and after:
we saw similar performance numbers in userspace [2] and decided that,
since the performance overhead is minimal compared to the overhead of
tag-based ASAN itself as well as to the code size improvements, we
would simply replace the inlined checks with the specialized outlined
checks, without an option to select between them, and that is what I
have implemented in this patch. But we may make a different decision
for the kernel, such as having CONFIG_KASAN_OUTLINE=y turn on
specialized outlined checks if Clang is new enough.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
Link: [1] https://reviews.llvm.org/D90426
Link: [2] https://reviews.llvm.org/D56954
---
 arch/arm64/include/asm/asm-prototypes.h |  6 +++
 arch/arm64/include/asm/module.lds.h     | 17 +++++++-
 arch/arm64/lib/Makefile                 |  2 +
 arch/arm64/lib/kasan_sw_tags.S          | 54 +++++++++++++++++++++++++
 mm/kasan/tags.c                         |  7 ++++
 scripts/Makefile.kasan                  |  1 +
 6 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/lib/kasan_sw_tags.S

diff --git a/arch/arm64/include/asm/asm-prototypes.h b/arch/arm64/include/asm/asm-prototypes.h
index 1c9a3a0c5fa5..ec1d9655f885 100644
--- a/arch/arm64/include/asm/asm-prototypes.h
+++ b/arch/arm64/include/asm/asm-prototypes.h
@@ -23,4 +23,10 @@ long long __ashlti3(long long a, int b);
 long long __ashrti3(long long a, int b);
 long long __lshrti3(long long a, int b);
 
+/*
+ * This function uses a custom calling convention and cannot be called from C so
+ * this prototype is not entirely accurate.
+ */
+void __hwasan_tag_mismatch(unsigned long addr, unsigned long access_info);
+
 #endif /* __ASM_PROTOTYPES_H */
diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
index 691f15af788e..4a6d717f75f3 100644
--- a/arch/arm64/include/asm/module.lds.h
+++ b/arch/arm64/include/asm/module.lds.h
@@ -1,7 +1,20 @@
-#ifdef CONFIG_ARM64_MODULE_PLTS
 SECTIONS {
+#ifdef CONFIG_ARM64_MODULE_PLTS
 	.plt (NOLOAD) : { BYTE(0) }
 	.init.plt (NOLOAD) : { BYTE(0) }
 	.text.ftrace_trampoline (NOLOAD) : { BYTE(0) }
-}
 #endif
+
+#ifdef CONFIG_KASAN_SW_TAGS
+	/*
+	 * Outlined checks go into comdat-deduplicated sections named .text.hot.
+	 * Because they are in comdats they are not combined by the linker and
+	 * we otherwise end up with multiple sections with the same .text.hot
+	 * name in the .ko file. The kernel module loader warns if it sees
+	 * multiple sections with the same name so we use this sections
+	 * directive to force them into a single section and silence the
+	 * warning.
+	 */
+	.text.hot : { *(.text.hot) }
+#endif
+}
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index d31e1169d9b8..8e60d76a1b47 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -18,3 +18,5 @@ obj-$(CONFIG_CRC32) += crc32.o
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 
 obj-$(CONFIG_ARM64_MTE) += mte.o
+
+obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
diff --git a/arch/arm64/lib/kasan_sw_tags.S b/arch/arm64/lib/kasan_sw_tags.S
new file mode 100644
index 000000000000..f72f0e8e1192
--- /dev/null
+++ b/arch/arm64/lib/kasan_sw_tags.S
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Google LLC
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+/*
+ * Report a tag mismatch detected by tag-based KASAN.
+ *
+ * This function has a custom calling convention in order to minimize the sizes
+ * of the compiler-generated thunks that call it. All registers except for x16
+ * and x17 must be preserved. This includes x0 and x1 which are used by the
+ * caller to pass arguments, and x29 (the frame pointer register). In order to
+ * allow these registers to be restored the caller spills them to stack. The 256
+ * bytes of stack space allocated by the caller must be deallocated on return.
+ *
+ * This function takes care of transitioning to the standard AAPCS calling
+ * convention and calls the C function kasan_tag_mismatch to report the error.
+ *
+ * Parameters:
+ *	x0 - the fault address
+ *	x1 - an encoded description of the faulting access
+ */
+SYM_FUNC_START(__hwasan_tag_mismatch)
+	add	x29, sp, #232
+	stp	x2, x3, [sp, #16]
+	stp	x4, x5, [sp, #32]
+	stp	x6, x7, [sp, #48]
+	stp	x8, x9, [sp, #64]
+	stp	x10, x11, [sp, #80]
+	stp	x12, x13, [sp, #96]
+	stp	x14, x15, [sp, #112]
+#ifndef CONFIG_SHADOW_CALL_STACK
+	str	x18, [sp, #128]
+#endif
+	mov	x2, x30
+	bl	kasan_tag_mismatch
+	ldp	x29, x30, [sp, #232]
+#ifndef CONFIG_SHADOW_CALL_STACK
+	ldr	x18, [sp, #128]
+#endif
+	ldp	x14, x15, [sp, #112]
+	ldp	x12, x13, [sp, #96]
+	ldp	x10, x11, [sp, #80]
+	ldp	x8, x9, [sp, #64]
+	ldp	x6, x7, [sp, #48]
+	ldp	x4, x5, [sp, #32]
+	ldp	x2, x3, [sp, #16]
+	ldp	x0, x1, [sp], #256
+	ret
+SYM_FUNC_END(__hwasan_tag_mismatch)
+EXPORT_SYMBOL(__hwasan_tag_mismatch)
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index e02a36a51f42..d00613956c79 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -198,3 +198,10 @@ struct kasan_track *kasan_get_free_track(struct kmem_cache *cache,
 
 	return &alloc_meta->free_track[i];
 }
+
+void kasan_tag_mismatch(unsigned long addr, unsigned long access_info,
+			unsigned long ret_ip)
+{
+	kasan_report(addr, 1 << (access_info & 0xf), access_info & 0x10,
+		     ret_ip);
+}
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 1e000cc2e7b4..1f2cccc264d8 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -44,6 +44,7 @@ endif
 CFLAGS_KASAN := -fsanitize=kernel-hwaddress \
 		-mllvm -hwasan-instrument-stack=$(CONFIG_KASAN_STACK) \
 		-mllvm -hwasan-use-short-granules=0 \
+		-mllvm -hwasan-inline-all-checks=0 \
 		$(instrumentation_flags)
 
 endif # CONFIG_KASAN_SW_TAGS
-- 
2.29.1.341.ge80a0c044ae-goog



* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-10-29 20:59 [PATCH] kasan: arm64: support specialized outlined tag mismatch checks Peter Collingbourne
@ 2020-11-03 15:28 ` Mark Rutland
  2020-11-06  1:12   ` Peter Collingbourne
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2020-11-03 15:28 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Mark Brown, Will Deacon, Linux ARM, Andrey Konovalov

Hi Peter,

On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> Outlined checks require a new runtime function with a custom calling
> convention. Add this function to arch/arm64/lib.

[...]

> +/*
> + * Report a tag mismatch detected by tag-based KASAN.
> + *
> + * This function has a custom calling convention in order to minimize the sizes
> + * of the compiler-generated thunks that call it. All registers except for x16
> + * and x17 must be preserved. This includes x0 and x1 which are used by the
> + * caller to pass arguments, and x29 (the frame pointer register). In order to
> + * allow these registers to be restored the caller spills them to stack. The 256
> + * bytes of stack space allocated by the caller must be deallocated on return.

This is a bit difficult to follow.

Am I right in understanding that the caller has pre-allocated 256 bytes
on the stack by decrementing the SP, yet it's up to the callee to undo
that? That seems bizarre to me, and it would be much easier (and more
generally useful) for the caller to leave it to the trampoline to
allocate/free its own space.

> + *
> + * This function takes care of transitioning to the standard AAPCS calling
> + * convention and calls the C function kasan_tag_mismatch to report the error.
> + *
> + * Parameters:
> + *	x0 - the fault address
> + *	x1 - an encoded description of the faulting access
> + */
> +SYM_FUNC_START(__hwasan_tag_mismatch)

For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
and place any BTI instructions necessary manually.
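
For instance (an untested sketch, just to show the shape I mean):

| SYM_CODE_START(__hwasan_tag_mismatch)
| 	bti	c
| 	...
| SYM_CODE_END(__hwasan_tag_mismatch)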

We should probably add SYM_NONSTD_FUNC_*() helpers for trampolines, as
it looks like we didn't add BTIs into the ftrace trampolines (and module
PLTs can get there via indirect branches). I'll put together a
point-fix for ftrace.

> +	add	x29, sp, #232

Where has this offset come from? Is there already a frame record created
by the caller? If so, that should be mentioned in the description of the
calling convention. If not, I don't think this code works.

> +	stp	x2, x3, [sp, #16]
> +	stp	x4, x5, [sp, #32]
> +	stp	x6, x7, [sp, #48]
> +	stp	x8, x9, [sp, #64]
> +	stp	x10, x11, [sp, #80]
> +	stp	x12, x13, [sp, #96]
> +	stp	x14, x15, [sp, #112]

Is there a reason this starts at an offset of 16?

For offsets please use (16 * n) like we do in entry.S, e.g.

|	stp	x2, x3, [sp,  #16 * 0]
|	stp	x4, x5, [sp,  #16 * 1]
|	stp	x4, x5, [sp,  #16 * 2]

... as it makes the relationship clearer.

> +#ifndef CONFIG_SHADOW_CALL_STACK
> +	str	x18, [sp, #128]
> +#endif

This doesn't look right to me. The shadow call stack should remain
balanced across a call, so why do you need to save x18? any manipulation
within kasan_tag_mismatch should be balanced.

Did you mean to save the return address to the shadow stack, rather
than saving the shadow stack pointer?

> +	mov	x2, x30
> +	bl	kasan_tag_mismatch

We didn't preserve x0 and x1 above, so they can be clobbered here, but
the comment said we needed to preserve them.

> +	ldp	x29, x30, [sp, #232]

This doesn't look right. This function didn't save x29/x30 to the stack,
so the only way this could work is if the caller had a redundant frame
record pointing to itself.

> +#ifndef CONFIG_SHADOW_CALL_STACK
> +	ldr	x18, [sp, #128]
> +#endif

As above, this doesn't look right.

> +	ldp	x14, x15, [sp, #112]
> +	ldp	x12, x13, [sp, #96]
> +	ldp	x10, x11, [sp, #80]
> +	ldp	x8, x9, [sp, #64]
> +	ldp	x6, x7, [sp, #48]
> +	ldp	x4, x5, [sp, #32]
> +	ldp	x2, x3, [sp, #16]
> +	ldp	x0, x1, [sp], #256

As above, we didn't save x0 and x1 -- did the caller do that?

Likewise, did the caller allocate 256 bytes that we have to deallocate?
That's *really* confusing.

Thanks,
Mark.


* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-11-03 15:28 ` Mark Rutland
@ 2020-11-06  1:12   ` Peter Collingbourne
  2020-11-13 11:22     ` Mark Rutland
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Collingbourne @ 2020-11-06  1:12 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Mark Brown, Will Deacon, Linux ARM, Andrey Konovalov

On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Peter,
>
> On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > Outlined checks require a new runtime function with a custom calling
> > convention. Add this function to arch/arm64/lib.
>
> [...]
>
> > +/*
> > + * Report a tag mismatch detected by tag-based KASAN.
> > + *
> > + * This function has a custom calling convention in order to minimize the sizes
> > + * of the compiler-generated thunks that call it. All registers except for x16
> > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > + * allow these registers to be restored the caller spills them to stack. The 256
> > + * bytes of stack space allocated by the caller must be deallocated on return.
>
> This is a bit difficult to follow.
>
> Am I right in understanding that the caller has pre-allocated 256 bytes
> on the stack by decrementing the SP, yet it's up to the callee to undo
> that? That seems bizarre to me, and it would be much easier (and more
> generally useful) for the caller to leave it to the trampoline to
> allocate/free its own space.

Yes, that's how it works. The goal was to reduce code size costs,
since otherwise you would need setup and teardown code in each
trampoline together with unwind info which will not necessarily be
deduplicated by the linker. It's probably easiest to think of the
trampoline as the "head" of the tag mismatch handling function, while
the runtime part is the "tail". From that perspective everything is
balanced.

For reference here is an example of a trampoline function for the kernel:

__hwasan_check_x1_67043328:
sbfx x16, x1, #4, #52
ldrb w16, [x9, x16]
cmp x16, x1, lsr #56
b.ne .Ltmp5
.Ltmp6:
ret
.Ltmp5:
lsr x16, x1, #56
cmp x16, #255
b.eq .Ltmp6
stp x0, x1, [sp, #-256]!
stp x29, x30, [sp, #232]
mov x0, x1
mov x1, #0
b __hwasan_tag_mismatch

> > + *
> > + * This function takes care of transitioning to the standard AAPCS calling
> > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > + *
> > + * Parameters:
> > + *   x0 - the fault address
> > + *   x1 - an encoded description of the faulting access
> > + */
> > +SYM_FUNC_START(__hwasan_tag_mismatch)
>
> For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> and place any BTI instructions necessary manually.

Okay. Since the trampolines may appear in modules I guess it's
possible for the function to be called via a module PLT so a BTI
instruction is required.

If I am reading the code in arch/arm64/kernel/module-plts.c correctly
it looks like the module PLTs may only clobber x16, correct? That
works for me.
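
For reference, my reading is that each module PLT entry is roughly

adrp x16, <target>
add x16, x16, :lo12:<target>
br x16

i.e. only x16 is written -- though I may well be missing a case.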

> We should probably add SYM_NONSTD_FUNC_*() helpers for trampolines, as
> it looks like we didn't add BTIs into the ftrace trampolines (and module
> PLTs can get there via indirect branches). I'll put together a
> point-fix for ftrace.
>
> > +     add     x29, sp, #232
>
> Where has this offset come from? Is there already a frame record created
> by the caller? If so, that should be mentioned in the description of the
> calling convention. If not, I don't think this code works.

Yes, the frame record is created by the caller. The description does
mention that x29 is saved, but not x30, which is stored right after it
in the frame record. I will update the description.

> > +     stp     x2, x3, [sp, #16]
> > +     stp     x4, x5, [sp, #32]
> > +     stp     x6, x7, [sp, #48]
> > +     stp     x8, x9, [sp, #64]
> > +     stp     x10, x11, [sp, #80]
> > +     stp     x12, x13, [sp, #96]
> > +     stp     x14, x15, [sp, #112]
>
> Is there a reason this starts at an offset of 16?

Because the caller already saved x0 and x1 at offset 0.

> For offsets please use (16 * n) like we do in entry.S, e.g.
>
> |       stp     x2, x3, [sp,  #16 * 0]
> |       stp     x4, x5, [sp,  #16 * 1]
> |       stp     x4, x5, [sp,  #16 * 2]
>
> ... as it makes the relationship clearer.

Will do.

> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > +     str     x18, [sp, #128]
> > +#endif
>
> This doesn't look right to me. The shadow call stack should remain
> balanced across a call, so why do you need to save x18? any manipulation
> within kasan_tag_mismatch should be balanced.
>
> Did you mean to save the return address to the shadow stack, rather
> than saving the shadow stack pointer?

This is to handle the case where SCS is disabled. In this case, x18
becomes a temporary register for AAPCS functions, but this calling
convention requires it to be saved.

IIRC there was a proposal to always reserve x18 in the kernel
regardless of whether SCS was enabled. If that were adopted we would
be able to drop the conditional x18 save/restore here.

> > +     mov     x2, x30
> > +     bl      kasan_tag_mismatch
>
> We didn't preserve x0 and x1 above, so they can be clobbered here, but
> the comment said we needed to preserve them.

They were saved by the caller (out of necessity, because x0 and x1 are
used to pass arguments).

> > +     ldp     x29, x30, [sp, #232]
>
> This doesn't look right. This function didn't save x29/x30 to the stack,
> so the only way this could work is if the caller had a redundant frame
> record pointing to itself.

The frame record was created by the caller (before this function
updated x29, so the chain of frame records is maintained).

> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > +     ldr     x18, [sp, #128]
> > +#endif
>
> As above, this doesn't look right.

Answered above.

> > +     ldp     x14, x15, [sp, #112]
> > +     ldp     x12, x13, [sp, #96]
> > +     ldp     x10, x11, [sp, #80]
> > +     ldp     x8, x9, [sp, #64]
> > +     ldp     x6, x7, [sp, #48]
> > +     ldp     x4, x5, [sp, #32]
> > +     ldp     x2, x3, [sp, #16]
> > +     ldp     x0, x1, [sp], #256
>
> As above, we didn't save x0 and x1 -- did the caller do that?

Yes.

> Likewise, did the caller allocate 256 bytes that we have to deallocate?

Yes, as mentioned in the description above.

Peter


* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-11-06  1:12   ` Peter Collingbourne
@ 2020-11-13 11:22     ` Mark Rutland
  2020-11-20 16:48       ` Mark Rutland
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mark Rutland @ 2020-11-13 11:22 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Mark Brown, Will Deacon, Linux ARM, Andrey Konovalov

Hi Peter,

Thanks for this; the additional detail was very helpful.

The high-level idea looks fine to me, but I have some qualms with the
ABI, and I think we should rejig the compiler side rather than taking
this as-is. It also looks like there may be an issue with PLTs. I've
tried to explain those in detail below.

On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Hi Peter,
> >
> > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > Outlined checks require a new runtime function with a custom calling
> > > convention. Add this function to arch/arm64/lib.
> >
> > [...]
> >
> > > +/*
> > > + * Report a tag mismatch detected by tag-based KASAN.
> > > + *
> > > + * This function has a custom calling convention in order to minimize the sizes
> > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > + * bytes of stack space allocated by the caller must be deallocated on return.
> >
> > This is a bit difficult to follow.
> >
> > Am I right in understanding that the caller has pre-allocated 256 bytes
> > on the stack by decrementing the SP, yet it's up to the callee to undo
> > that? That seems bizarre to me, and it would be much easier (and more
> > generally useful) for the caller to leave it to the trampoline to
> > allocate/free its own space.
> 
> Yes, that's how it works. The goal was to reduce code size costs,
> since otherwise you would need setup and teardown code in each
> trampoline together with unwind info which will not necessarily be
> deduplicated by the linker. It's probably easiest to think of the
> trampoline as the "head" of the tag mismatch handling function, while
> the runtime part is the "tail". From that perspective everything is
> balanced.

I appreciate the general idea, but there are some details that I think
aren't quite right in the ABI, and I'd rather we rejig that before
settling on it, especially as IIUC this was added specifically to speed
up kernel usage, and I'd like to make sure it works well in context.

Firstly, reserving 256 bytes in the compiler-generated trampoline is
arbitrary, and I'd rather that trampoline only reserved the space it
needs (currently 32 bytes for a stack record and x0/x1), even if that
means the runtime trampoline has to shuffle x0/x1.

Secondly, I don't believe that the compiler-generated trampoline needs to
stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
generally suggests it *shouldn't* create a frame record. It doesn't
clobber the original x29/x30, so a frame record isn't necessary (even
for reliable stacktrace), and it doesn't plumb the new record into x29,
so it could leave all that work to the runtime trampoline and save an
instruction.

I think that the stacking in the compiler generated trampoline should be
limited to:

	stp x0, x1, [sp, #-16]!

... which is much like how mfentry works on most architectures, and
similar to how patchable-function-entry works on arm64.
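
Concretely, I'd expect the slow path of your example check function
(quoted below) to reduce to something like this (untested sketch), with
the frame record and all other stacking left to the runtime trampoline:

| .Ltmp5:
| 	lsr	x16, x1, #56
| 	cmp	x16, #255
| 	b.eq	.Ltmp6
| 	stp	x0, x1, [sp, #-16]!
| 	mov	x0, x1
| 	mov	x1, #0
| 	b	__hwasan_tag_mismatch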

> For reference here is an example of a trampoline function for the kernel:
> 
> __hwasan_check_x1_67043328:
> sbfx x16, x1, #4, #52
> ldrb w16, [x9, x16]
> cmp x16, x1, lsr #56
> b.ne .Ltmp5
> .Ltmp6:
> ret
> .Ltmp5:
> lsr x16, x1, #56
> cmp x16, #255
> b.eq .Ltmp6
> stp x0, x1, [sp, #-256]!
> stp x29, x30, [sp, #232]
> mov x0, x1
> mov x1, #0
> b __hwasan_tag_mismatch

Thanks for this; it was helpful in clarifying my understanding.

I note this clobbers x16 unconditionally. Is x17 free for use too?

> > > + *
> > > + * This function takes care of transitioning to the standard AAPCS calling
> > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > + *
> > > + * Parameters:
> > > + *   x0 - the fault address
> > > + *   x1 - an encoded description of the faulting access
> > > + */
> > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> >
> > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > and place any BTI instructions necessary manually.
> 
> Okay. Since the trampolines may appear in modules I guess it's
> possible for the function to be called via a module PLT so a BTI
> instruction is required.

I think so, but I am not completely certain (there are special rules for
branches from PLTs using x16/x17).

> If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> it looks like the module PLTs may only clobber x16, correct? That
> works for me.

PLTs can clobber both x16 and x17. Do callers of the compiler-generated
trampolines account for those being clobbered? I see from the example
above that the trampolines do not.

[...]

> > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > +     str     x18, [sp, #128]
> > > +#endif
> >
> > This doesn't look right to me. The shadow call stack should remain
> > balanced across a call, so why do you need to save x18? any manipulation
> > within kasan_tag_mismatch should be balanced.
> >
> > Did you mean to save the return address to the shadow stack, rather
> > than saving the shadow stack pointer?
> 
> This is to handle the case where SCS is disabled. In this case, x18
> becomes a temporary register for AAPCS functions, but this calling
> convention requires it to be saved.

My bad -- I had misread the `ifndef` as an `ifdef`, which was why I was
confused as to why you'd save x18 when SCS was enabled, which is not
what's going on.

Thanks,
Mark.


* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-11-13 11:22     ` Mark Rutland
@ 2020-11-20 16:48       ` Mark Rutland
  2020-11-20 17:45       ` Ard Biesheuvel
  2020-11-20 22:59       ` Peter Collingbourne
  2 siblings, 0 replies; 8+ messages in thread
From: Mark Rutland @ 2020-11-20 16:48 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Mark Brown, Will Deacon, Linux ARM, Andrey Konovalov

Hi Peter,

On Fri, Nov 13, 2020 at 11:22:23AM +0000, Mark Rutland wrote:
> The high-level idea looks fine to me, but I have some qualms with the
> ABI, and I think we should rejig the compiler side rather than taking
> this as-is. It also looks like there may be an issue with PLTs. I've
> tried to explain those in detail below.

Have you had a chance to consider any of this yet?

I would like to support this option if we can get the ABI details sorted
out, so if you have any thoughts on this, please let me know.

Thanks,
Mark.

> On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> > On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > > Outlined checks require a new runtime function with a custom calling
> > > > convention. Add this function to arch/arm64/lib.
> > >
> > > [...]
> > >
> > > > +/*
> > > > + * Report a tag mismatch detected by tag-based KASAN.
> > > > + *
> > > > + * This function has a custom calling convention in order to minimize the sizes
> > > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > > + * bytes of stack space allocated by the caller must be deallocated on return.
> > >
> > > This is a bit difficult to follow.
> > >
> > > Am I right in understanding that the caller has pre-allocated 256 bytes
> > > on the stack by decrementing the SP, yet it's up to the callee to undo
> > > that? That seems bizarre to me, and it would be much easier (and more
> > > generally useful) for the caller to leave it to the trampoline to
> > > allocate/free its own space.
> > 
> > Yes, that's how it works. The goal was to reduce code size costs,
> > since otherwise you would need setup and teardown code in each
> > trampoline together with unwind info which will not necessarily be
> > deduplicated by the linker. It's probably easiest to think of the
> > trampoline as the "head" of the tag mismatch handling function, while
> > the runtime part is the "tail". From that perspective everything is
> > balanced.
> 
> I appreciate the general idea, but there are some details that I think
> aren't quite right in the ABI, and I'd rather we rejig that before
> settling on it, especially as IIUC this was added specifically to speed
> up kernel usage, and I'd like to make sure it works well in context.
> 
> Firstly, reserving 256 bytes in the compiler-generated trampoline is
> arbitrary, and I'd rather that trampoline only reserved the space it
> needs (currently 32 bytes for a stack record and x0/x1), even if that
> means the runtime trampoline has to shuffle x0/x1.
> 
> Secondly, I don't believe that the compiler-generated trampoline needs to
> stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
> generally suggests it *shouldn't* create a frame record. It doesn't
> clobber the original x29/x30, so a frame record isn't necessary (even
> for reliable stacktrace), and it doesn't plumb the new record into x29,
> so it could leave all that work to the runtime trampoline and save an
> instruction.
> 
> I think that the stacking in the compiler generated trampoline should be
> limited to:
> 
> 	stp x0, x1, [sp, #-16]!
> 
> ... which is much like how mfentry works on most architectures, and
> similar to how patchable-function-entry works on arm64.
> 
> > For reference here is an example of a trampoline function for the kernel:
> > 
> > __hwasan_check_x1_67043328:
> > sbfx x16, x1, #4, #52
> > ldrb w16, [x9, x16]
> > cmp x16, x1, lsr #56
> > b.ne .Ltmp5
> > .Ltmp6:
> > ret
> > .Ltmp5:
> > lsr x16, x1, #56
> > cmp x16, #255
> > b.eq .Ltmp6
> > stp x0, x1, [sp, #-256]!
> > stp x29, x30, [sp, #232]
> > mov x0, x1
> > mov x1, #0
> > b __hwasan_tag_mismatch
> 
> Thanks for this; it was helpful in clarifying my understanding.
> 
> I note this clobbers x16 unconditionally. Is x17 free for use too?
> 
> > > > + *
> > > > + * This function takes care of transitioning to the standard AAPCS calling
> > > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > > + *
> > > > + * Parameters:
> > > > + *   x0 - the fault address
> > > > + *   x1 - an encoded description of the faulting access
> > > > + */
> > > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> > >
> > > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > > and place any BTI instructions necessary manually.
> > 
> > Okay. Since the trampolines may appear in modules I guess it's
> > possible for the function to be called via a module PLT so a BTI
> > instruction is required.
> 
> I think so, but I am not completely certain (there are special rules for
> branches from PLTs using x16/x17).
> 
> > If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> > it looks like the module PLTs may only clobber x16, correct? That
> > works for me.
> 
> PLTs can clobber both x16 and x17. Do callers of the compiler-generated
> trampolines account for those being clobbered? I see from the example
> above that the trampolines do not.
> 
> [...]
> 
> > > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > > +     str     x18, [sp, #128]
> > > > +#endif
> > >
> > > This doesn't look right to me. The shadow call stack should remain
> > > balanced across a call, so why do you need to save x18? any manipulation
> > > within kasan_tag_mismatch should be balanced.
> > >
> > > Did you mean to save the return address to the shadow stack, rather
> > > than saving the shadow stack pointer?
> > 
> > This is to handle the case where SCS is disabled. In this case, x18
> > becomes a temporary register for AAPCS functions, but this calling
> > convention requires it to be saved.
> 
> My bad -- I had misread the `ifndef` as an `ifdef`, which was why I was
> confused as to why you'd save x18 when SCS was enabled, which is not
> what's going on.
> 
> Thanks,
> Mark.
> 


* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-11-13 11:22     ` Mark Rutland
  2020-11-20 16:48       ` Mark Rutland
@ 2020-11-20 17:45       ` Ard Biesheuvel
  2020-11-20 22:59       ` Peter Collingbourne
  2 siblings, 0 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2020-11-20 17:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Catalin Marinas, Mark Brown, Andrey Konovalov,
	Peter Collingbourne, Linux ARM

On Fri, 13 Nov 2020 at 12:23, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Peter,
>
> Thanks for this; the additional detail was very helpful.
>
> The high-level idea looks fine to me, but I have some qualms with the
> ABI, and I think we should rejig the compiler side rather than taking
> this as-is. It also looks like there may be an issue with PLTs. I've
> tried to explain those in detail below.
>
> On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> > On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > > Outlined checks require a new runtime function with a custom calling
> > > > convention. Add this function to arch/arm64/lib.
> > >
> > > [...]
> > >
> > > > +/*
> > > > + * Report a tag mismatch detected by tag-based KASAN.
> > > > + *
> > > > + * This function has a custom calling convention in order to minimize the sizes
> > > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > > + * bytes of stack space allocated by the caller must be deallocated on return.
> > >
> > > This is a bit difficult to follow.
> > >
> > > Am I right in understanding that the caller has pre-allocated 256 bytes
> > > on the stack by decrementing the SP, yet it's up to the callee to undo
> > > that? That seems bizarre to me, and it would be much easier (and more
> > > generally useful) for the caller to leave it to the trampoline to
> > > allocate/free its own space.
> >
> > Yes, that's how it works. The goal was to reduce code size costs,
> > since otherwise you would need setup and teardown code in each
> > trampoline together with unwind info which will not necessarily be
> > deduplicated by the linker. It's probably easiest to think of the
> > trampoline as the "head" of the tag mismatch handling function, while
> > the runtime part is the "tail". From that perspective everything is
> > balanced.
>
> I appreciate the general idea, but there are some details that I think
> aren't quite right in the ABI, and I'd rather we rejig that before
> settling on it, especially as IIUC this was added specifically to speed
> up kernel usage, and I'd like to make sure it works well in context.
>
> Firstly, reserving 256 bytes in the compiler-generated trampoline is
> arbitrary, and I'd rather that trampoline only reserved the space it
> needs (currently 32 bytes for a stack record and x0/x1), even if that
> means the runtime trampoline has to shuffle x0/x1.
>
> Secondly, I don't believe that the compiler-generated trampoline needs to
> stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
> generally suggests it *shouldn't* create a frame record. It doesn't
> clobber the original x29/x30, so a frame record isn't necessary (even
> for reliable stacktrace), and it doesn't plumb the new record into x29,
> so it could leave all that work to the runtime trampoline and save an
> instruction.
>
> I think that the stacking in the compiler generated trampoline should be
> limited to:
>
>         stp x0, x1, [sp, #-16]!
>
> ... which is much like how mfentry works on most architectures, and
> similar to how patchable-function-entry works on arm64.
>
> > For reference here is an example of a trampoline function for the kernel:
> >
> > __hwasan_check_x1_67043328:
> > sbfx x16, x1, #4, #52
> > ldrb w16, [x9, x16]
> > cmp x16, x1, lsr #56
> > b.ne .Ltmp5
> > .Ltmp6:
> > ret
> > .Ltmp5:
> > lsr x16, x1, #56
> > cmp x16, #255
> > b.eq .Ltmp6
> > stp x0, x1, [sp, #-256]!
> > stp x29, x30, [sp, #232]
> > mov x0, x1
> > mov x1, #0
> > b __hwasan_tag_mismatch
>
> Thanks for this; it was helpful in clarifying my understanding.
>
> I note this clobbers x16 unconditionally. Is x17 free for use too?
>
> > > > + *
> > > > + * This function takes care of transitioning to the standard AAPCS calling
> > > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > > + *
> > > > + * Parameters:
> > > > + *   x0 - the fault address
> > > > + *   x1 - an encoded description of the faulting access
> > > > + */
> > > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> > >
> > > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > > and place any BTI instructions necessary manually.
> >
> > Okay. Since the trampolines may appear in modules I guess it's
> > possible for the function to be called via a module PLT so a BTI
> > instruction is required.
>
> I think so, but I am not completely certain (there are special rules for
> branches from PLTs using x16/x17).
>

Yes, you will need a BTI C here so that indirect calls via X16 are permitted.

> > If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> > it looks like the module PLTs may only clobber x16, correct? That
> > works for me.
>
> PLTs can clobber both x16 and x17. Do callers of the compiler-generated
> trampolines account for those being clobbered? I see from the example
> above that the trampolines do not.
>

Note that this may happen when the kernel is very big - when you build
allyesconfig, the linker may emit veneers for direct branches that are
outside the +/-128 MB branch range, and I don't think we can assume
anything about whether those may use x16, x17 or both.

> > > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > > +     str     x18, [sp, #128]
> > > > +#endif
> > >
> > > This doesn't look right to me. The shadow call stack should remain
> > > balanced across a call, so why do you need to save x18? any manipulation
> > > within kasan_tag_mismatch should be balanced.
> > >
> > > Did you mean to save the return address to the shadow stack, rather
> > > than saving the shadow stack pointer?
> >
> > This is to handle the case where SCS is disabled. In this case, x18
> > becomes a temporary register for AAPCS functions, but this calling
> > convention requires it to be saved.
>
> My bad -- I had misread the `ifndef` as an `ifdef`, which was why I was
> confused as to why you'd save x18 when SCS was enabled, which is not
> what's going on.
>
> Thanks,
> Mark.
>


* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-11-13 11:22     ` Mark Rutland
  2020-11-20 16:48       ` Mark Rutland
  2020-11-20 17:45       ` Ard Biesheuvel
@ 2020-11-20 22:59       ` Peter Collingbourne
  2020-11-24 19:56         ` Mark Rutland
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Collingbourne @ 2020-11-20 22:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Mark Brown, Linux ARM, Andrey Konovalov,
	Will Deacon, Evgenii Stepanov

On Fri, Nov 13, 2020 at 3:22 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Peter,
>
> Thanks for this; the additional detail was very helpful.
>
> The high-level idea looks fine to me, but I have some qualms with the
> ABI, and I think we should rejig the compiler side rather than taking
> this as-is. It also looks like there may be an issue with PLTs. I've
> tried to explain those in detail below.
>
> On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> > On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > > Outlined checks require a new runtime function with a custom calling
> > > > convention. Add this function to arch/arm64/lib.
> > >
> > > [...]
> > >
> > > > +/*
> > > > + * Report a tag mismatch detected by tag-based KASAN.
> > > > + *
> > > > + * This function has a custom calling convention in order to minimize the sizes
> > > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > > + * bytes of stack space allocated by the caller must be deallocated on return.
> > >
> > > This is a bit difficult to follow.
> > >
> > > Am I right in understanding that the caller has pre-allocated 256 bytes
> > > on the stack by decrementing the SP, yet it's up to the callee to undo
> > > that? That seems bizarre to me, and it would be much easier (and more
> > > generally useful) for the caller to leave it to the trampoline to
> > > allocate/free its own space.
> >
> > Yes, that's how it works. The goal was to reduce code size costs,
> > since otherwise you would need setup and teardown code in each
> > trampoline together with unwind info which will not necessarily be
> > deduplicated by the linker. It's probably easiest to think of the
> > trampoline as the "head" of the tag mismatch handling function, while
> > the runtime part is the "tail". From that perspective everything is
> > balanced.
>
> I appreciate the general idea, but there are some details that I think
> aren't quite right in the ABI, and I'd rather we rejig that before
> settling on it, especially as IIUC this was added specifically to speed
> up kernel usage, and I'd like to make sure it works well in context.
>
> Firstly, reserving 256 bytes in the compiler-generated trampoline is
> arbitrary, and I'd rather that trampoline only reserved the space it
> needs (currently 32 bytes for a stack record and x0/x1), even if that
> means the runtime trampoline has to shuffle x0/x1.
>
> Secondly, I don't believe that the compiler-generated trampoline needs to
> stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
> generally suggests it *shouldn't* create a frame record. It doesn't
> clobber the original x29/x30, so a frame record isn't necessary (even
> for reliable stacktrace), and it doesn't plumb the new record into x29,
> so it could leave all that work to the runtime trampoline and save an
> instruction.
>
> I think that the stacking in the compiler generated trampoline should be
> limited to:
>
>         stp x0, x1, [sp, #-16]!
>
> ... which is much like how mfentry works on most architectures, and
> similar to how patchable-function-entry works on arm64.

Hi Mark,

I think all of your suggestions may be considered useful improvements
to the ABI, and if we were redesigning it from scratch then I think it
would look a lot more like what you've proposed. However, as discussed
out-of-band, the ABI is already being used in userspace, and has been
for over a year, and I don't think any of them rise to the level of
justifying creating a new ABI, in particular because they only affect
the uncommon case where we've actually detected an error.

> > For reference here is an example of a trampoline function for the kernel:
> >
> > __hwasan_check_x1_67043328:
> > sbfx x16, x1, #4, #52
> > ldrb w16, [x9, x16]
> > cmp x16, x1, lsr #56
> > b.ne .Ltmp5
> > .Ltmp6:
> > ret
> > .Ltmp5:
> > lsr x16, x1, #56
> > cmp x16, #255
> > b.eq .Ltmp6
> > stp x0, x1, [sp, #-256]!
> > stp x29, x30, [sp, #232]
> > mov x0, x1
> > mov x1, #0
> > b __hwasan_tag_mismatch
>
> Thanks for this; it was helpful in clarifying my understanding.
>
> I note this clobbers x16 unconditionally. Is x17 free for use too?

Yes, and indeed it must be free for use because the linker is allowed
to clobber x16 and x17 across the call via a range extension thunk
(that said, I don't know what ld.bfd does, but ld.lld just clobbers
x16).

> > > > + *
> > > > + * This function takes care of transitioning to the standard AAPCS calling
> > > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > > + *
> > > > + * Parameters:
> > > > + *   x0 - the fault address
> > > > + *   x1 - an encoded description of the faulting access
> > > > + */
> > > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> > >
> > > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > > and place any BTI instructions necessary manually.
> >
> > Okay. Since the trampolines may appear in modules I guess it's
> > possible for the function to be called via a module PLT so a BTI
> > instruction is required.
>
> I think so, but I am not completely certain (there are special rules for
> branches from PLTs using x16/x17).

As also pointed out by Ard a "bti c" is required here so I've added one in v2.

> > If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> > it looks like the module PLTs may only clobber x16, correct? That
> > works for me.
>
> PLTs can clobber both x16 and x17. Do callers of the compiler-generated
> trampolines account for those being clobbered? I see from the example
> above that the trampolines do not.

Yes, the call to the check function is modelled by the compiler as
clobbering x16 and x17 (and x30 of course).

> [...]
>
> > > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > > +     str     x18, [sp, #128]
> > > > +#endif
> > >
> > > This doesn't look right to me. The shadow call stack should remain
> > > balanced across a call, so why do you need to save x18? any manipulation
> > > within kasan_tag_mismatch should be balanced.
> > >
> > > Did you mean to save the return address to the shadow stack, rather
> > > than saving the shadow stack pointer?
> >
> > This is to handle the case where SCS is disabled. In this case, x18
> > becomes a temporary register for AAPCS functions, but this calling
> > convention requires it to be saved.
>
> My bad -- I had misread the `ifndef` as an `ifdef`, which was why I was
> confused as to why you'd save x18 when SCS was enabled, which is not
> what's going on.

Ack.

Peter


* Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
  2020-11-20 22:59       ` Peter Collingbourne
@ 2020-11-24 19:56         ` Mark Rutland
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Rutland @ 2020-11-24 19:56 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Mark Brown, Linux ARM, Andrey Konovalov,
	Will Deacon, Evgenii Stepanov

On Fri, Nov 20, 2020 at 02:59:42PM -0800, Peter Collingbourne wrote:
> On Fri, Nov 13, 2020 at 3:22 AM Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Mark,
> 
> I think all of your suggestions may be considered useful improvements
> to the ABI, and if we were redesigning it from scratch then I think it
> would look a lot more like what you've proposed. However, as discussed
> out-of-band, the ABI is already being used in userspace, and has been
> for over a year, and I don't think any of them rise to the level of
> justifying creating a new ABI, in particular because they only affect
> the uncommon case where we've actually detected an error.

Sure, and apologies for the confusion here.  I had sent this email just
before that out-of-band conversation, and as mentioned there I wasn't
aware this was an existing ABI.

Given that, my main concerns are that this works correctly with PLTs
(which, as below, it should) and with reliable stacktrace (which it
appears it should, as it effectively makes a tail-call), and that we
thoroughly document the calling convention, since it is surprising.

I see you've sent a v2, and I'll follow up there.

> > > For reference here is an example of a trampoline function for the kernel:
> > >
> > > __hwasan_check_x1_67043328:
> > > sbfx x16, x1, #4, #52
> > > ldrb w16, [x9, x16]
> > > cmp x16, x1, lsr #56
> > > b.ne .Ltmp5
> > > .Ltmp6:
> > > ret
> > > .Ltmp5:
> > > lsr x16, x1, #56
> > > cmp x16, #255
> > > b.eq .Ltmp6
> > > stp x0, x1, [sp, #-256]!
> > > stp x29, x30, [sp, #232]
> > > mov x0, x1
> > > mov x1, #0
> > > b __hwasan_tag_mismatch
> >
> > Thanks for this; it was helpful in clarifying my understanding.
> >
> > I note this clobbers x16 unconditionally. Is x17 free for use too?
> 
> Yes, and indeed it must be free for use because the linker is allowed
> to clobber x16 and x17 across the call via a range extension thunk
> (that said, I don't know what ld.bfd does, but ld.lld just clobbers
> x16).

Sure, that range extension thunk or PLT was exactly what I was
concerned with, since I wasn't sure which portions of the usual AAPCS
rules were being respected here. Thanks for confirming!

Thanks,
Mark.


