From: Bertrand Marquis <Bertrand.Marquis@arm.com>
To: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
"security@xenproject.org" <security@xenproject.org>,
Paul Durrant <paul@xen.org>,
Xen-devel <xen-devel@lists.xenproject.org>,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
Andre Przywara <Andre.Przywara@arm.com>,
Julien Grall <jgrall@amazon.com>, nd <nd@arm.com>
Subject: Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
Date: Tue, 18 Aug 2020 17:06:38 +0000 [thread overview]
Message-ID: <7EAB4E0A-338C-4DCF-80A4-A426BC95C051@arm.com> (raw)
In-Reply-To: <5a406d61-293b-8dd7-44c2-d4f5dde4856f@xen.org>
> On 18 Aug 2020, at 17:43, Julien Grall <julien@xen.org> wrote:
>
>
>
> On 18/08/2020 17:35, Bertrand Marquis wrote:
>> Hi Julien,
>
> Hi Bertrand,
>
>> Somehow we stopped on this thread, and you already did most of the work, so I think we should try to finish what you started
>
> Sorry, this fell through the cracks. I have a new version for patch #1, but not yet patch #2.
No problem, this came back while I was trying to reduce my todo-list stack :-)
>
> I am still debating with myself whether the speculation barrier should be added after the SMC :).
I think we should, unless the SMC is in the context-switch path (as all the other calls should not have a performance impact).
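For illustration, the pattern in question is just a barrier straight after the call (a rough sketch, not taken from either patch):

    smc     #0      /* trap to the firmware */
    sb              /* speculation barrier; typically the SB instruction, or
                       DSB NSH + ISB on cores without the SB extension */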
>
>>> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
>>>
>>> On 17/06/2020 17:23, Julien Grall wrote:
>>>> Hi,
>>>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>
>>>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>>>> speculative accesses to memory before processing the return.
>>>>>>
>>>>>> There is no known gadget available after the RET instruction today.
>>>>>> However some of the registers (such as in check_pending_guest_serror())
>>>>>> may contain a value provided the guest.
>>>>> ^ by
>>>>>
>>>>>
>>>>>> In order to harden the code, it would be better to add a speculation
>>>>>> barrier after each RET instruction. The performance impact is meant to be
>>>>>> negligible, as the speculation barrier is not meant to be architecturally
>>>>>> executed.
>>>>>>
>>>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>>>> possible to speculate after the instruction, preventively add a
>>>>>> speculation barrier after it as well.
>>>>>>
>>>>>> This is part of the work to mitigate straight-line speculation.
>>>>>>
>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>>
>>>>> I did a compile-test on the patch too.
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> I am still unsure whether we should preventively add a speculation barrier
>>>>>> after all the RET instructions in arm*/lib/. The SMC call will be
>>>>>> taken care of in a follow-up patch.
>>>>>
>>>>> SMC is great to have but it seems to be overkill to do the ones under
>>>>> lib/.
>>>> From my understanding, the compiler will preventively add a speculation barrier after each 'ret' when the mitigations are turned on. So it feels to me we want to follow the same approach.
>>>> Obviously, we can avoid them, but I would like to have a justification for not adding them (nothing is overkill against speculation ;)).
>>>
>>> I finally found some time to look at arm*/lib in more detail. Some of the helpers can definitely be called with guest inputs.
>>>
>>> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both the 32-bit and 64-bit implementations, you will reach the end of memchr() with r2/w2 and r3/w3 (the latter containing a character from the buffer) controlled by the guest.
>>>
>>> As this is the only function in the unit, we don't know which instructions will come right after the RET. So it would be safer to add a speculation barrier there too.
>> How about adding a speculation barrier directly in the ENDPROC macro?
>
> This would unfortunately not cover all the cases because you can return in the middle of the function. I will have a look to see if we can leverage it.
I agree that it would not solve all of them, but it would cover a big part.
Another solution might be to have a RETURN macro encoded as "mov pc, lr; sb" on arm32 and "ret; sb" on arm64.
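For instance, the arm64 flavour could look like this (a rough sketch, the macro name is made up):

    .macro RETURN
        ret         /* architectural return */
        sb          /* speculation barrier, never architecturally executed */
    .endm

with the arm32 header defining the same macro around "mov pc, lr" instead of "ret".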
The patch sounds good, I just need to find a way to check whether any RET was missed, which is not easy with such a patch :-)
Bertrand
>
> Below the first draft for v2 of this patch.
>
> commit 7f7cf1558283d0d35fc22ead909e8a812da82f27
> Author: Julien Grall <jgrall@amazon.com>
> Date: Tue Jun 16 16:33:12 2020 +0100
>
> xen/arm: Place a speculation barrier following an ret instruction
>
> Some CPUs can speculate past a RET instruction and potentially perform
> speculative accesses to memory before processing the return.
>
> There is no known gadget available after the RET instruction today.
> However, some of the registers (such as in check_pending_guest_serror())
> may contain a value provided by the guest.
>
> In order to harden the code, it would be better to add a speculation
> barrier after each RET instruction. The performance impact is meant to be
> negligible, as the speculation barrier is not meant to be architecturally
> executed.
>
> The RETs in boot code (i.e. arm*/entry.S) are not followed by an SB
> because this is boot code. Therefore there is no concern, as all the
> registers are controlled by the hypervisor.
>
> Note that on arm32, the ldmia instruction will act as a return from the
> function __context_switch(). While the whitepaper doesn't suggest it is
> possible to speculate after the instruction, preventively add a
> speculation barrier after it as well.
>
> This is part of the work to mitigate straight-line speculation.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
>
> ---
>
> Changes in v2:
> - Add a speculation barrier for __arm_smccc_1_0_smc. This
> was originally part of the second patch.
> - I have now included all the arm*/lib/ assembly helpers
> because some of them are called with guest input values (e.g.
> memchr).
>
> diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
> index b228d44b190c..bd54fc43558b 100644
> --- a/xen/arch/arm/arm32/entry.S
> +++ b/xen/arch/arm/arm32/entry.S
> @@ -442,6 +442,7 @@ ENTRY(__context_switch)
>
> add r4, r1, #VCPU_arch_saved_context
> ldmia r4, {r4 - sl, fp, sp, pc} /* Load registers and return */
> + sb
>
> /*
> * Local variables:
> diff --git a/xen/arch/arm/arm32/lib/assembler.h b/xen/arch/arm/arm32/lib/assembler.h
> index 6de2638a36d5..cffdb60baa29 100644
> --- a/xen/arch/arm/arm32/lib/assembler.h
> +++ b/xen/arch/arm/arm32/lib/assembler.h
> @@ -17,6 +17,8 @@
> #ifndef __ASM_ASSEMBLER_H__
> #define __ASM_ASSEMBLER_H__
>
> +#include <asm/macros.h>
> +
> #ifndef __ASSEMBLY__
> #error "Only include this from assembly code"
> #endif
> diff --git a/xen/arch/arm/arm32/lib/findbit.S b/xen/arch/arm/arm32/lib/findbit.S
> index 00674a31efad..df86922a25bc 100644
> --- a/xen/arch/arm/arm32/lib/findbit.S
> +++ b/xen/arch/arm/arm32/lib/findbit.S
> @@ -35,6 +35,7 @@ ENTRY(_find_first_zero_bit_le)
> blo 1b
> 3: mov r0, r1 @ no free bits
> mov pc, lr
> + sb
> ENDPROC(_find_first_zero_bit_le)
>
> /*
> @@ -76,6 +77,7 @@ ENTRY(_find_first_bit_le)
> blo 1b
> 3: mov r0, r1 @ no free bits
> mov pc, lr
> + sb
> ENDPROC(_find_first_bit_le)
>
> /*
> @@ -114,6 +116,7 @@ ENTRY(_find_first_zero_bit_be)
> blo 1b
> 3: mov r0, r1 @ no free bits
> mov pc, lr
> + sb
> ENDPROC(_find_first_zero_bit_be)
>
> ENTRY(_find_next_zero_bit_be)
> @@ -148,6 +151,7 @@ ENTRY(_find_first_bit_be)
> blo 1b
> 3: mov r0, r1 @ no free bits
> mov pc, lr
> + sb
> ENDPROC(_find_first_bit_be)
>
> ENTRY(_find_next_bit_be)
> @@ -192,4 +196,5 @@ ENDPROC(_find_next_bit_be)
> cmp r1, r0 @ Clamp to maxbit
> movlo r0, r1
> mov pc, lr
> + sb
>
> diff --git a/xen/arch/arm/arm32/lib/lib1funcs.S b/xen/arch/arm/arm32/lib/lib1funcs.S
> index f1278bd6c139..ebb9b4f1babf 100644
> --- a/xen/arch/arm/arm32/lib/lib1funcs.S
> +++ b/xen/arch/arm/arm32/lib/lib1funcs.S
> @@ -217,15 +217,18 @@ UNWIND(.fnstart)
>
> mov r0, r2
> mov pc, lr
> + sb
>
> 11: moveq r0, #1
> movne r0, #0
> mov pc, lr
> + sb
>
> 12: ARM_DIV2_ORDER r1, r2
>
> mov r0, r0, lsr r2
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__udivsi3)
> @@ -245,6 +248,7 @@ UNWIND(.fnstart)
> ARM_MOD_BODY r0, r1, r2, r3
>
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__umodsi3)
> @@ -271,15 +275,18 @@ UNWIND(.fnstart)
> cmp ip, #0
> rsbmi r0, r0, #0
> mov pc, lr
> + sb
>
> 10: teq ip, r0 @ same sign ?
> rsbmi r0, r0, #0
> mov pc, lr
> + sb
>
> 11: movlo r0, #0
> moveq r0, ip, asr #31
> orreq r0, r0, #1
> mov pc, lr
> + sb
>
> 12: ARM_DIV2_ORDER r1, r2
>
> @@ -287,6 +294,7 @@ UNWIND(.fnstart)
> mov r0, r3, lsr r2
> rsbmi r0, r0, #0
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__divsi3)
> @@ -312,6 +320,7 @@ UNWIND(.fnstart)
> 10: cmp ip, #0
> rsbmi r0, r0, #0
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__modsi3)
> @@ -328,6 +337,7 @@ UNWIND(.save {r0, r1, ip, lr} )
> mul r3, r0, r2
> sub r1, r1, r3
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__aeabi_uidivmod)
> @@ -341,6 +351,7 @@ UNWIND(.save {r0, r1, ip, lr} )
> mul r3, r0, r2
> sub r1, r1, r3
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__aeabi_idivmod)
> @@ -355,6 +366,7 @@ UNWIND(.save {lr} )
> add sp, sp, #8
> ldmfd sp!, {r2, r3}
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__aeabi_uldivmod)
> @@ -369,6 +381,7 @@ UNWIND(.save {lr} )
> add sp, sp, #16
> ldmfd sp!, {r2, r3}
> mov pc, lr
> + sb
>
> UNWIND(.fnend)
> ENDPROC(__aeabi_ldivmod)
> @@ -382,5 +395,6 @@ UNWIND(.save {lr})
> bl __div0
> mov r0, #0 @ About as wrong as it could be.
> ldr pc, [sp], #8
> + sb
> UNWIND(.fnend)
> ENDPROC(Ldiv0)
> diff --git a/xen/arch/arm/arm32/lib/lshrdi3.S b/xen/arch/arm/arm32/lib/lshrdi3.S
> index 88b9655282ab..e4d275842c00 100644
> --- a/xen/arch/arm/arm32/lib/lshrdi3.S
> +++ b/xen/arch/arm/arm32/lib/lshrdi3.S
> @@ -46,6 +46,7 @@ ENTRY(__aeabi_llsr)
> THUMB( orrmi al, al, r3 )
> mov ah, ah, lsr r2
> mov pc, lr
> + sb
>
> ENDPROC(__lshrdi3)
> ENDPROC(__aeabi_llsr)
> diff --git a/xen/arch/arm/arm32/lib/memchr.S b/xen/arch/arm/arm32/lib/memchr.S
> index 7dae4ab6a6f8..e2910d381b08 100644
> --- a/xen/arch/arm/arm32/lib/memchr.S
> +++ b/xen/arch/arm/arm32/lib/memchr.S
> @@ -22,4 +22,5 @@ ENTRY(memchr)
> sub r0, r0, #1
> 2: movne r0, #0
> mov pc, lr
> + sb
> ENDPROC(memchr)
> diff --git a/xen/arch/arm/arm32/lib/memset.S b/xen/arch/arm/arm32/lib/memset.S
> index 5a1dadf767d7..9ce78d9e4695 100644
> --- a/xen/arch/arm/arm32/lib/memset.S
> +++ b/xen/arch/arm/arm32/lib/memset.S
> @@ -110,6 +110,7 @@ ENTRY(memset)
> tst r2, #1
> strneb r1, [ip], #1
> mov pc, lr
> + sb
>
> 6: subs r2, r2, #4 @ 1 do we have enough
> blt 5b @ 1 bytes to align with?
> diff --git a/xen/arch/arm/arm32/lib/strchr.S b/xen/arch/arm/arm32/lib/strchr.S
> index fa53ad8bc659..697b5b30e38b 100644
> --- a/xen/arch/arm/arm32/lib/strchr.S
> +++ b/xen/arch/arm/arm32/lib/strchr.S
> @@ -25,4 +25,5 @@ ENTRY(strchr)
> movne r0, #0
> subeq r0, r0, #1
> mov pc, lr
> + sb
> ENDPROC(strchr)
> diff --git a/xen/arch/arm/arm32/lib/strrchr.S b/xen/arch/arm/arm32/lib/strrchr.S
> index ec4d40de5e5a..9f64f23d5cb0 100644
> --- a/xen/arch/arm/arm32/lib/strrchr.S
> +++ b/xen/arch/arm/arm32/lib/strrchr.S
> @@ -22,4 +22,5 @@ ENTRY(strrchr)
> bne 1b
> mov r0, r3
> mov pc, lr
> + sb
> ENDPROC(strrchr)
> diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
> index 175ea2981e72..b37d8fe09dc2 100644
> --- a/xen/arch/arm/arm64/entry.S
> +++ b/xen/arch/arm/arm64/entry.S
> @@ -522,6 +522,7 @@ abort_guest_exit_end:
> cset x19, ne
>
> ret
> + sb
> ENDPROC(check_pending_guest_serror)
>
> /*
> @@ -583,6 +584,7 @@ ENTRY(__context_switch)
> ldr lr, [x8]
> mov sp, x9
> ret
> + sb
>
> /*
> * Local variables:
> diff --git a/xen/arch/arm/arm64/lib/assembler.h b/xen/arch/arm/arm64/lib/assembler.h
> index 3f9c0dcf5d32..8356fd5d11fd 100644
> --- a/xen/arch/arm/arm64/lib/assembler.h
> +++ b/xen/arch/arm/arm64/lib/assembler.h
> @@ -5,6 +5,8 @@
> #error "Only include this from assembly code"
> #endif
>
> +#include <asm/macros.h>
> +
> /* Only LE support so far */
> #define CPU_BE(x...)
> #define CPU_LE(x...) x
> diff --git a/xen/arch/arm/arm64/lib/clear_page.S b/xen/arch/arm/arm64/lib/clear_page.S
> index 9f8a680e1880..cf07bd00cb7e 100644
> --- a/xen/arch/arm/arm64/lib/clear_page.S
> +++ b/xen/arch/arm/arm64/lib/clear_page.S
> @@ -14,6 +14,8 @@
> * along with this program. If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <asm/macros.h>
> +
> /*
> * Clear page @dest
> *
> @@ -31,4 +33,5 @@ ENTRY(clear_page)
> tst x0, #(PAGE_SIZE - 1)
> b.ne 1b
> ret
> + sb
> ENDPROC(clear_page)
> diff --git a/xen/arch/arm/arm64/lib/memchr.S b/xen/arch/arm/arm64/lib/memchr.S
> index 81f113bb1cab..ff75f03d86bc 100644
> --- a/xen/arch/arm/arm64/lib/memchr.S
> +++ b/xen/arch/arm/arm64/lib/memchr.S
> @@ -38,6 +38,8 @@ ENTRY(memchr)
> b.ne 1b
> sub x0, x0, #1
> ret
> + sb
> 2: mov x0, #0
> ret
> + sb
> ENDPROC(memchr)
> diff --git a/xen/arch/arm/arm64/lib/memcmp.S b/xen/arch/arm/arm64/lib/memcmp.S
> index 87c2537ffeb7..2583bd404249 100644
> --- a/xen/arch/arm/arm64/lib/memcmp.S
> +++ b/xen/arch/arm/arm64/lib/memcmp.S
> @@ -160,6 +160,7 @@ CPU_LE( lsr tmp2, tmp2, tmp1 )
> 1:
> sub result, data1, data2
> ret
> + sb
>
> .Lstart_align:
> lsr limit_wd, limit, #3
> @@ -236,6 +237,7 @@ CPU_LE( rev data2, data2 )
> lsr data1, data1, #56
> sub result, data1, data2, lsr #56
> ret
> + sb
>
> .Lremain8:
> /* Limit % 8 == 0 =>. all data are equal.*/
> @@ -251,7 +253,9 @@ CPU_LE( rev data2, data2 )
> b.eq .Ltiny8proc
> sub result, data1, data2
> ret
> + sb
> .Lret0:
> mov result, #0
> ret
> + sb
> ENDPROC(memcmp)
> diff --git a/xen/arch/arm/arm64/lib/memcpy.S b/xen/arch/arm/arm64/lib/memcpy.S
> index d90d20ef3ea8..f79ba8cb9515 100644
> --- a/xen/arch/arm/arm64/lib/memcpy.S
> +++ b/xen/arch/arm/arm64/lib/memcpy.S
> @@ -142,6 +142,7 @@ ENTRY(memcpy)
>
> .Lexitfunc:
> ret
> + sb
>
> .Lcpy_over64:
> subs count, count, #128
> @@ -162,6 +163,7 @@ ENTRY(memcpy)
> tst count, #0x3f
> b.ne .Ltail63
> ret
> + sb
>
> /*
> * Critical loop. Start at a new cache line boundary. Assuming
> @@ -197,4 +199,5 @@ ENTRY(memcpy)
> tst count, #0x3f
> b.ne .Ltail63
> ret
> + sb
> ENDPROC(memcpy)
> diff --git a/xen/arch/arm/arm64/lib/memmove.S b/xen/arch/arm/arm64/lib/memmove.S
> index a49de845d066..dc0429583db7 100644
> --- a/xen/arch/arm/arm64/lib/memmove.S
> +++ b/xen/arch/arm/arm64/lib/memmove.S
> @@ -138,6 +138,7 @@ ENTRY(memmove)
>
> .Lexitfunc:
> ret
> + sb
>
> .Lcpy_over64:
> subs count, count, #128
> @@ -158,6 +159,7 @@ ENTRY(memmove)
> tst count, #0x3f
> b.ne .Ltail63
> ret
> + sb
>
> /*
> * Critical loop. Start at a new cache line boundary. Assuming
> @@ -193,4 +195,5 @@ ENTRY(memmove)
> tst count, #0x3f
> b.ne .Ltail63
> ret
> + sb
> ENDPROC(memmove)
> diff --git a/xen/arch/arm/arm64/lib/memset.S b/xen/arch/arm/arm64/lib/memset.S
> index 5bf751521b60..e052d7c937d8 100644
> --- a/xen/arch/arm/arm64/lib/memset.S
> +++ b/xen/arch/arm/arm64/lib/memset.S
> @@ -76,6 +76,7 @@ ENTRY(memset)
> strb A_lw, [dst]
> 4:
> ret
> + sb
>
> .Lover16_proc:
> /*Whether the start address is aligned with 16.*/
> @@ -120,6 +121,7 @@ ENTRY(memset)
> stp A_l, A_l, [dst, #-16] /* Repeat some/all of last store. */
> 4:
> ret
> + sb
>
> /*
> * Critical loop. Start at a new cache line boundary. Assuming
> @@ -141,6 +143,7 @@ ENTRY(memset)
> b.ne .Ltail63
> .Lexitfunc:
> ret
> + sb
>
> /*
> * For zeroing memory, check to see if we can use the ZVA feature to
> @@ -212,4 +215,5 @@ ENTRY(memset)
> ands count, count, zva_bits_x
> b.ne .Ltail_maybe_long
> ret
> + sb
> ENDPROC(memset)
> diff --git a/xen/arch/arm/arm64/lib/strchr.S b/xen/arch/arm/arm64/lib/strchr.S
> index 0506b0ff7f3a..be1d2fa2ec82 100644
> --- a/xen/arch/arm/arm64/lib/strchr.S
> +++ b/xen/arch/arm/arm64/lib/strchr.S
> @@ -17,6 +17,7 @@
> * along with this program. If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <asm/macros.h>
>
> /*
> * Find the first occurrence of a character in a string.
> @@ -37,4 +38,5 @@ ENTRY(strchr)
> cmp w2, w1
> csel x0, x0, xzr, eq
> ret
> + sb
> ENDPROC(strchr)
> diff --git a/xen/arch/arm/arm64/lib/strcmp.S b/xen/arch/arm/arm64/lib/strcmp.S
> index c6f42dd2550b..08eda29302f0 100644
> --- a/xen/arch/arm/arm64/lib/strcmp.S
> +++ b/xen/arch/arm/arm64/lib/strcmp.S
> @@ -133,6 +133,7 @@ CPU_LE( lsr tmp2, tmp2, tmp1 ) /* Shift (tmp1 & 63). */
> 1:
> sub result, data1, data2
> ret
> + sb
>
> .Lstart_align:
> ands xzr, src1, #7
> @@ -205,6 +206,7 @@ CPU_BE( cmp data1, data2 )
> CPU_BE( cset result, ne )
> CPU_BE( cneg result, result, lo )
> CPU_BE( ret )
> +CPU_BE( sb )
> CPU_BE( 1: )
> /*Re-compute the NUL-byte detection, using a byte-reversed value. */
> CPU_BE( rev tmp3, data1 )
> @@ -230,4 +232,5 @@ CPU_BE( orr syndrome, diff, has_nul )
> lsr data1, data1, #56
> sub result, data1, data2, lsr #56
> ret
> + sb
> ENDPROC(strcmp)
> diff --git a/xen/arch/arm/arm64/lib/strlen.S b/xen/arch/arm/arm64/lib/strlen.S
> index fb6aaf1a6afb..92df3c5e4418 100644
> --- a/xen/arch/arm/arm64/lib/strlen.S
> +++ b/xen/arch/arm/arm64/lib/strlen.S
> @@ -106,6 +106,7 @@ CPU_BE( bic has_nul2, tmp1, tmp2 )
> clz pos, has_nul2
> add len, len, pos, lsr #3 /* Bits to bytes. */
> ret
> + sb
>
> .Lmisaligned:
> cmp tmp1, #8
> diff --git a/xen/arch/arm/arm64/lib/strncmp.S b/xen/arch/arm/arm64/lib/strncmp.S
> index a4a0f779f56e..e13ff99fe75e 100644
> --- a/xen/arch/arm/arm64/lib/strncmp.S
> +++ b/xen/arch/arm/arm64/lib/strncmp.S
> @@ -182,6 +182,7 @@ CPU_LE( lsr tmp2, tmp2, tmp3 ) /* Shift (tmp1 & 63). */
> 1:
> sub result, data1, data2
> ret
> + sb
>
> .Lstart_align:
> lsr limit_wd, limit, #3
> @@ -264,6 +265,7 @@ CPU_BE( cmp data1, data2 )
> CPU_BE( cset result, ne )
> CPU_BE( cneg result, result, lo )
> CPU_BE( ret )
> +CPU_BE( sb )
> CPU_BE( 1: )
> /* Re-compute the NUL-byte detection, using a byte-reversed value.*/
> CPU_BE( rev tmp3, data1 )
> @@ -288,6 +290,7 @@ CPU_BE( orr syndrome, diff, has_nul )
> lsr data1, data1, #56
> sub result, data1, data2, lsr #56
> ret
> + sb
>
> .Lremain8:
> /* Limit % 8 == 0 => all bytes significant. */
> @@ -303,8 +306,10 @@ CPU_BE( orr syndrome, diff, has_nul )
> b.eq .Ltiny8proc
> sub result, data1, data2
> ret
> + sb
>
> .Lret0:
> mov result, #0
> ret
> + sb
> ENDPROC(strncmp)
> diff --git a/xen/arch/arm/arm64/lib/strnlen.S b/xen/arch/arm/arm64/lib/strnlen.S
> index 81c8e8b54ea9..c59467bf17e1 100644
> --- a/xen/arch/arm/arm64/lib/strnlen.S
> +++ b/xen/arch/arm/arm64/lib/strnlen.S
> @@ -126,6 +126,7 @@ CPU_BE( bic has_nul2, tmp1, tmp2 )
> cmp len, limit
> csel len, len, limit, ls /* Return the lower value. */
> ret
> + sb
>
> .Lmisaligned:
> /*
> @@ -168,4 +169,5 @@ CPU_LE( lsr tmp2, tmp2, tmp4 ) /* Shift (tmp1 & 63). */
> .Lhit_limit:
> mov len, limit
> ret
> + sb
> ENDPROC(strnlen)
> diff --git a/xen/arch/arm/arm64/lib/strrchr.S b/xen/arch/arm/arm64/lib/strrchr.S
> index 07059983f828..fc8b6e446071 100644
> --- a/xen/arch/arm/arm64/lib/strrchr.S
> +++ b/xen/arch/arm/arm64/lib/strrchr.S
> @@ -17,6 +17,8 @@
> * along with this program. If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <asm/macros.h>
> +
> /*
> * Find the last occurrence of a character in a string.
> *
> @@ -37,4 +39,5 @@ ENTRY(strrchr)
> b 1b
> 2: mov x0, x3
> ret
> + sb
> ENDPROC(strrchr)
> diff --git a/xen/arch/arm/arm64/smc.S b/xen/arch/arm/arm64/smc.S
> index b0752be57e8f..e0a3106dd7ec 100644
> --- a/xen/arch/arm/arm64/smc.S
> +++ b/xen/arch/arm/arm64/smc.S
> @@ -30,3 +30,4 @@ ENTRY(__arm_smccc_1_0_smc)
> stp x2, x3, [x4, #SMCCC_RES_a2]
> 1:
> ret
> + sb
>
> --
> Julien Grall
Thread overview: 20+ messages
2020-06-16 17:59 [PATCH 0/2] xen/arm: Mitigate straight-line speculation Julien Grall
2020-06-16 17:59 ` [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction Julien Grall
2020-06-16 21:24 ` Stefano Stabellini
2020-06-17 16:23 ` Julien Grall
2020-07-04 16:07 ` Julien Grall
2020-08-18 16:35 ` Bertrand Marquis
2020-08-18 16:43 ` Julien Grall
2020-08-18 17:06 ` Bertrand Marquis [this message]
2020-08-18 17:34 ` Julien Grall
2020-08-19 7:59 ` Bertrand Marquis
2020-08-19 8:02 ` Jan Beulich
2020-08-19 8:50 ` Julien Grall
2020-08-19 8:58 ` Jan Beulich
2020-08-19 9:56 ` Julien Grall
2020-06-16 17:59 ` [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call Julien Grall
2020-06-16 21:34 ` Stefano Stabellini
2020-06-16 21:57 ` Julien Grall
2020-06-16 23:16 ` Andrew Cooper
2020-06-16 23:27 ` Stefano Stabellini
2020-06-18 17:46 ` Julien Grall