xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/2] xen/arm: Mitigate straight-line speculation
@ 2020-06-16 17:59 Julien Grall
  2020-06-16 17:59 ` [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction Julien Grall
  2020-06-16 17:59 ` [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call Julien Grall
  0 siblings, 2 replies; 20+ messages in thread
From: Julien Grall @ 2020-06-16 17:59 UTC (permalink / raw)
  To: security
  Cc: sstabellini, paul, Andre.Przywara, Julien Grall,
	Bertrand.Marquis, xen-devel, Volodymyr_Babchuk

From: Julien Grall <jgrall@amazon.com>

Hi all,

Arm recently released a whitepaper about a new category of speculation
(see [1] and [2]). In short, a processor may be able to speculate past
some of the unconditional control flow instructions (e.g. eret, smc, br).

In some of these cases, the registers will contain values controlled by
the guest. While there is no known gadget afterwards, we still want to
prevent any leakage in the future.
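
As a rough illustration (arm64-flavoured, written as GCC inline assembly,
and not taken from the patches themselves), the shape of change we are
after around an SMC looks like the sketch below. The branch architecturally
skips the barrier, so it is effectively free; a CPU speculating straight
past the SMC instruction hits DSB+ISB instead of whatever code happens to
follow in memory. For plain returns (ret, eret), the barrier simply sits
after the instruction, where only mis-speculation can reach it.

    /*
     * Illustration only -- the function name is made up and the clobber
     * list is simplified.
     */
    static inline unsigned long smc_sls_hardened(unsigned long function_id)
    {
        register unsigned long x0 asm("x0") = function_id;

        asm volatile("smc #0   \n"
                     "b   1f   \n"  /* architecturally skip the barrier */
                     "dsb nsh  \n"  /* only ever reached speculatively  */
                     "isb      \n"
                     "1:       \n"
                     : "+r" (x0) : : "x1", "x2", "x3", "memory");

        return x0;
    }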

The mitigation is planned in two parts:
   1) Arm provided patches for both GCC and LLVM to add speculation barriers
   and remove problematic code sequences.
   2) Inspection of assembly code and of calls to a higher exception level
   (e.g. smc in our case).

I am still waiting on more input for 1), so this series only addresses 2)
for the moment.

Note that the ERET instruction was already addressed as part of XSA-312.

The patch series is sent directly to the mailing list as the security
team only became aware of the issues after the whitepaper had been
publicly released.

Cheers,

[1] https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability
[2] https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/downloads/straight-line-speculation

Julien Grall (2):
  xen/arm: entry: Place a speculation barrier following an ret
    instruction
  xen/arm: Mitigate straight-line speculation for SMC call

 xen/arch/arm/arm32/entry.S   |  1 +
 xen/arch/arm/arm64/entry.S   |  2 ++
 xen/arch/arm/arm64/smc.S     |  1 +
 xen/include/asm-arm/smccc.h  | 13 +++++++++++++
 xen/include/asm-arm/system.h |  8 ++++++++
 5 files changed, 25 insertions(+)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-06-16 17:59 [PATCH 0/2] xen/arm: Mitigate straight-line speculation Julien Grall
@ 2020-06-16 17:59 ` Julien Grall
  2020-06-16 21:24   ` Stefano Stabellini
  2020-06-16 17:59 ` [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call Julien Grall
  1 sibling, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-06-16 17:59 UTC (permalink / raw)
  To: security
  Cc: sstabellini, paul, Andre.Przywara, Julien Grall,
	Bertrand.Marquis, xen-devel, Volodymyr_Babchuk

From: Julien Grall <jgrall@amazon.com>

Some CPUs can speculate past a RET instruction and potentially perform
speculative accesses to memory before processing the return.

There is no known gadget available after the RET instruction today.
However some of the registers (such as in check_pending_guest_serror())
may contain a value provided the guest.

In order to harden the code, it would be better to add a speculation
barrier after each RET instruction. The performance is meant to be
negligeable as the speculation barrier is not meant to be archicturally
executed.

Note that on arm32, the ldmia instruction will act as a return from the
function __context_switch(). While the whitepaper doesn't suggest it is
possible to speculate after the instruction, add preventively a
speculation barrier after it as well.

This is part of the work to mitigate straight-line speculation.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

I am still unsure whether we preventively should add a speculation barrier
preventively after all the RET instructions in arm*/lib/. The smc call be
taken care in a follow-up patch.
---
 xen/arch/arm/arm32/entry.S | 1 +
 xen/arch/arm/arm64/entry.S | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
index b228d44b190c..bd54fc43558b 100644
--- a/xen/arch/arm/arm32/entry.S
+++ b/xen/arch/arm/arm32/entry.S
@@ -442,6 +442,7 @@ ENTRY(__context_switch)
 
         add     r4, r1, #VCPU_arch_saved_context
         ldmia   r4, {r4 - sl, fp, sp, pc}       /* Load registers and return */
+        sb
 
 /*
  * Local variables:
diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 175ea2981e72..b37d8fe09dc2 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -522,6 +522,7 @@ abort_guest_exit_end:
         cset    x19, ne
 
         ret
+        sb
 ENDPROC(check_pending_guest_serror)
 
 /*
@@ -583,6 +584,7 @@ ENTRY(__context_switch)
         ldr     lr, [x8]
         mov     sp, x9
         ret
+        sb
 
 /*
  * Local variables:
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call
  2020-06-16 17:59 [PATCH 0/2] xen/arm: Mitigate straight-line speculation Julien Grall
  2020-06-16 17:59 ` [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction Julien Grall
@ 2020-06-16 17:59 ` Julien Grall
  2020-06-16 21:34   ` Stefano Stabellini
  2020-06-18 17:46   ` Julien Grall
  1 sibling, 2 replies; 20+ messages in thread
From: Julien Grall @ 2020-06-16 17:59 UTC (permalink / raw)
  To: security
  Cc: sstabellini, paul, Andre.Przywara, Julien Grall,
	Bertrand.Marquis, xen-devel, Volodymyr_Babchuk

From: Julien Grall <jgrall@amazon.com>

SMC call will update some of registers (typically only x0) depending on
the arguments provided.

Some CPUs can speculate past a SMC instruction and potentially perform
speculative access to emrmoy using the pre-call values before executing
the SMC.

There is no known gadget available after the SMC call today. However
some of the registers may contain values from the guest and are expected
to be updated by the SMC call.

In order to harden the code, it would be better to prevent straight-line
speculation from an SMC. Architecturally executing the speculation
barrier after every SMC can be rather expensive (particularly on core
not SB). Therefore we want to mitigate it diferrently:

    * For arm_smccc_1_0_smc, we can avoid a speculation barrier right
    after the SMC instruction and instead rely on the one after eret.
    * For arm_smccc_1_1_smc, we can place a B instruction after the SMC
    instruction to skip the barrier.

Note that arm_smccc_1_0_smc version on arm32 is just an alias to
arm_smccc_1_1_smc.
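
For reference, a typical caller of these helpers looks like the hedged
sketch below (the function is illustrative only, not existing Xen code);
the point is that the arguments, and therefore the registers around the
SMC, may come straight from guest state:

    static void forward_guest_smccc(const struct cpu_user_regs *regs)
    {
        struct arm_smccc_res res;

        /* On arm64, x0-x2 were set by the guest before it trapped. */
        arm_smccc_1_1_smc(regs->x0, regs->x1, regs->x2, &res);

        /* res.a0 ... res.a3 now hold whatever the firmware returned. */
    }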

Note that no speculation barrier has been added after the SMC
instruction in arm64/entry.S. This is fine because the call is not
expected to modify any registers. So straight-line speculation doesn't
matter.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Note this hasn't been vetted by Arm but they are using the same
sort of mitigation for blr. So I am quite confident this could do the
trick.
---
 xen/arch/arm/arm64/smc.S     |  1 +
 xen/include/asm-arm/smccc.h  | 13 +++++++++++++
 xen/include/asm-arm/system.h |  8 ++++++++
 3 files changed, 22 insertions(+)

diff --git a/xen/arch/arm/arm64/smc.S b/xen/arch/arm/arm64/smc.S
index b0752be57e8f..e0a3106dd7ec 100644
--- a/xen/arch/arm/arm64/smc.S
+++ b/xen/arch/arm/arm64/smc.S
@@ -30,3 +30,4 @@ ENTRY(__arm_smccc_1_0_smc)
         stp     x2, x3, [x4, #SMCCC_RES_a2]
 1:
         ret
+        sb
diff --git a/xen/include/asm-arm/smccc.h b/xen/include/asm-arm/smccc.h
index 9d94beb3df2d..8650224923b1 100644
--- a/xen/include/asm-arm/smccc.h
+++ b/xen/include/asm-arm/smccc.h
@@ -200,11 +200,24 @@ struct arm_smccc_res {
  * We have an output list that is not necessarily used, and GCC feels
  * entitled to optimise the whole sequence away. "volatile" is what
  * makes it stick.
+ *
+ * Some of the SMC callers are passing directly values that are
+ * controlled by the guest. To mitigate against straight-line
+ * speculation, a speculation barrier is required. As it may be
+ * expensive to architecturally execute the speculation barrier, we are
+ * using a B instruction to architecturally skip it.
+ *
+ * Note that the speculation barrier is technically not necessary as the
+ * B instruction should already block straight-line speculation. But
+ * better be safe than sorry ;).
  */
 #define arm_smccc_1_1_smc(...)                                  \
     do {                                                        \
         __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \
         asm volatile("smc #0\n"                                 \
+                     "b 1f\n"                                   \
+                     ASM_SB                                     \
+                     "1:\n"                                     \
                      __constraints(__count_args(__VA_ARGS__))); \
         if ( ___res )                                           \
         *___res = (typeof(*___res)){r0, r1, r2, r3};            \
diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index 65d5c8e423d7..e33ff4e0fc39 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -33,6 +33,14 @@
 #define smp_mb__before_atomic()    smp_mb()
 #define smp_mb__after_atomic()     smp_mb()
 
+/*
+ * Speculative barrier
+ * XXX: Add support for the 'sb' instruction
+ */
+#define ASM_SB "dsb nsh \n isb \n"
+
+#define sb()    asm volatile(ASM_SB)
+
 /*
  * This is used to ensure the compiler did actually allocate the register we
  * asked it for some inline assembly sequences.  Apparently we can't trust
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-06-16 17:59 ` [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction Julien Grall
@ 2020-06-16 21:24   ` Stefano Stabellini
  2020-06-17 16:23     ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2020-06-16 21:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: sstabellini, paul, Andre.Przywara, Julien Grall,
	Bertrand.Marquis, security, xen-devel, Volodymyr_Babchuk

On Tue, 16 Jun 2020, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Some CPUs can speculate past a RET instruction and potentially perform
> speculative accesses to memory before processing the return.
> 
> There is no known gadget available after the RET instruction today.
> However some of the registers (such as in check_pending_guest_serror())
> may contain a value provided the guest.
                              ^ by


> In order to harden the code, it would be better to add a speculation
> barrier after each RET instruction. The performance is meant to be
> negligeable as the speculation barrier is not meant to be archicturally
> executed.
> 
> Note that on arm32, the ldmia instruction will act as a return from the
> function __context_switch(). While the whitepaper doesn't suggest it is
> possible to speculate after the instruction, add preventively a
> speculation barrier after it as well.
> 
> This is part of the work to mitigate straight-line speculation.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

I did a compile-test on the patch too.


> ---
> 
> I am still unsure whether we preventively should add a speculation barrier
> preventively after all the RET instructions in arm*/lib/. The smc call be
> taken care in a follow-up patch.

SMC is great to have but it seems to be overkill to do the ones under
lib/.


> ---
>  xen/arch/arm/arm32/entry.S | 1 +
>  xen/arch/arm/arm64/entry.S | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
> index b228d44b190c..bd54fc43558b 100644
> --- a/xen/arch/arm/arm32/entry.S
> +++ b/xen/arch/arm/arm32/entry.S
> @@ -442,6 +442,7 @@ ENTRY(__context_switch)
>  
>          add     r4, r1, #VCPU_arch_saved_context
>          ldmia   r4, {r4 - sl, fp, sp, pc}       /* Load registers and return */
> +        sb
>  
>  /*
>   * Local variables:
> diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
> index 175ea2981e72..b37d8fe09dc2 100644
> --- a/xen/arch/arm/arm64/entry.S
> +++ b/xen/arch/arm/arm64/entry.S
> @@ -522,6 +522,7 @@ abort_guest_exit_end:
>          cset    x19, ne
>  
>          ret
> +        sb
>  ENDPROC(check_pending_guest_serror)
>  
>  /*
> @@ -583,6 +584,7 @@ ENTRY(__context_switch)
>          ldr     lr, [x8]
>          mov     sp, x9
>          ret
> +        sb
>  
>  /*
>   * Local variables:
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call
  2020-06-16 17:59 ` [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call Julien Grall
@ 2020-06-16 21:34   ` Stefano Stabellini
  2020-06-16 21:57     ` Julien Grall
  2020-06-18 17:46   ` Julien Grall
  1 sibling, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2020-06-16 21:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: sstabellini, paul, Andre.Przywara, Julien Grall,
	Bertrand.Marquis, security, xen-devel, Volodymyr_Babchuk

On Tue, 16 Jun 2020, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> SMC call will update some of registers (typically only x0) depending on
  ^a SMC call

> the arguments provided.
> 
> Some CPUs can speculate past a SMC instruction and potentially perform
> speculative access to emrmoy using the pre-call values before executing
                        ^ memory

> the SMC.
> 
> There is no known gadget available after the SMC call today. However
> some of the registers may contain values from the guest and are expected
> to be updated by the SMC call.
> 
> In order to harden the code, it would be better to prevent straight-line
> speculation from an SMC. Architecturally executing the speculation
                   ^ a? any?


> barrier after every SMC can be rather expensive (particularly on core
> not SB). Therefore we want to mitigate it diferrently:
       ^ not SB capable?                    ^ differently


> 
>     * For arm_smccc_1_0_smc, we can avoid a speculation barrier right
>     after the SMC instruction and instead rely on the one after eret.
                                                                  ^ ret


>     * For arm_smccc_1_1_smc, we can place a B instruction after the SMC
>     instruction to skip the barrier.
> 
> Note that arm_smccc_1_0_smc version on arm32 is just an alias to
> arm_smccc_1_1_smc.
> 
> Note that no speculation barrier has been added after the SMC
> instruction in arm64/entry.S. This is fine because the call is not
> expected to modify any registers. So straight-line speculation doesn't
> matter.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

I couldn't spot any issues with the patch and I compile-tested it.


> ---
> 
> Note this hasn't been vetted by Arm but they are using the same
> sort of mitigation for blr. So I am quite confident this could do the
> trick.

Noted


> ---
>  xen/arch/arm/arm64/smc.S     |  1 +
>  xen/include/asm-arm/smccc.h  | 13 +++++++++++++
>  xen/include/asm-arm/system.h |  8 ++++++++
>  3 files changed, 22 insertions(+)
> 
> diff --git a/xen/arch/arm/arm64/smc.S b/xen/arch/arm/arm64/smc.S
> index b0752be57e8f..e0a3106dd7ec 100644
> --- a/xen/arch/arm/arm64/smc.S
> +++ b/xen/arch/arm/arm64/smc.S
> @@ -30,3 +30,4 @@ ENTRY(__arm_smccc_1_0_smc)
>          stp     x2, x3, [x4, #SMCCC_RES_a2]
>  1:
>          ret
> +        sb
> diff --git a/xen/include/asm-arm/smccc.h b/xen/include/asm-arm/smccc.h
> index 9d94beb3df2d..8650224923b1 100644
> --- a/xen/include/asm-arm/smccc.h
> +++ b/xen/include/asm-arm/smccc.h
> @@ -200,11 +200,24 @@ struct arm_smccc_res {
>   * We have an output list that is not necessarily used, and GCC feels
>   * entitled to optimise the whole sequence away. "volatile" is what
>   * makes it stick.
> + *
> + * Some of the SMC callers are passing directly values that are
> + * controlled by the guest. To mitigate against straight-line
> + * speculation, a speculation barrier is required. As it may be
> + * expensive to architecturally execute the speculation barrier, we are
> + * using a B instruction to architecturally skip it.
> + *
> + * Note that the speculation barrier is technically not necessary as the
> + * B instruction should already block straight-line speculation. But
> + * better be safe than sorry ;).

Eh eh indeed :-)

I think this would be one thing to consider removing depending on ARM's
feedback.


>   */
>  #define arm_smccc_1_1_smc(...)                                  \
>      do {                                                        \
>          __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \
>          asm volatile("smc #0\n"                                 \
> +                     "b 1f\n"                                   \
> +                     ASM_SB                                     \
> +                     "1:\n"                                     \
>                       __constraints(__count_args(__VA_ARGS__))); \
>          if ( ___res )                                           \
>          *___res = (typeof(*___res)){r0, r1, r2, r3};            \
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index 65d5c8e423d7..e33ff4e0fc39 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -33,6 +33,14 @@
>  #define smp_mb__before_atomic()    smp_mb()
>  #define smp_mb__after_atomic()     smp_mb()
>  
> +/*
> + * Speculative barrier
> + * XXX: Add support for the 'sb' instruction
> + */
> +#define ASM_SB "dsb nsh \n isb \n"

Why not ';' ? Anyway it doesn't matter.


> +#define sb()    asm volatile(ASM_SB)
> +
>  /*
>   * This is used to ensure the compiler did actually allocate the register we
>   * asked it for some inline assembly sequences.  Apparently we can't trust
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call
  2020-06-16 21:34   ` Stefano Stabellini
@ 2020-06-16 21:57     ` Julien Grall
  2020-06-16 23:16       ` Andrew Cooper
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-06-16 21:57 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Paul Durrant, Andre Przywara, Julien Grall, Bertrand Marquis,
	Xen.org security team, xen-devel, Volodymyr Babchuk

On Tue, 16 Jun 2020 at 22:34, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
> On Tue, 16 Jun 2020, Julien Grall wrote:
> > From: Julien Grall <jgrall@amazon.com>
> >
> > SMC call will update some of registers (typically only x0) depending on
>   ^a SMC call
>
> > the arguments provided.
> >
> > Some CPUs can speculate past a SMC instruction and potentially perform
> > speculative access to emrmoy using the pre-call values before executing
>                         ^ memory
>
> > the SMC.
> >
> > There is no known gadget available after the SMC call today. However
> > some of the registers may contain values from the guest and are expected
> > to be updated by the SMC call.
> >
> > In order to harden the code, it would be better to prevent straight-line
> > speculation from an SMC. Architecturally executing the speculation
>                    ^ a? any?

"any" might be better.

>
>
> > barrier after every SMC can be rather expensive (particularly on core
> > not SB). Therefore we want to mitigate it diferrently:
>        ^ not SB capable?                    ^ differently

It might be better to say "which doesn't support ARMv8.0-SB"
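
As an aside, here is a hedged sketch of what picking up the single-instruction
barrier could eventually look like for cores that do support ARMv8.0-SB. The
config symbol is invented, and a real implementation would more likely patch
this at runtime via the alternatives framework rather than at build time:

    #ifdef CONFIG_ARM64_SB   /* hypothetical option: core and assembler have FEAT_SB */
    #define ASM_SB "sb \n"
    #else
    #define ASM_SB "dsb nsh \n isb \n"
    #endif

    #define sb()    asm volatile(ASM_SB)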

> >   */
> >  #define arm_smccc_1_1_smc(...)                                  \
> >      do {                                                        \
> >          __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \
> >          asm volatile("smc #0\n"                                 \
> > +                     "b 1f\n"                                   \
> > +                     ASM_SB                                     \
> > +                     "1:\n"                                     \
> >                       __constraints(__count_args(__VA_ARGS__))); \
> >          if ( ___res )                                           \
> >          *___res = (typeof(*___res)){r0, r1, r2, r3};            \
> > diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> > index 65d5c8e423d7..e33ff4e0fc39 100644
> > --- a/xen/include/asm-arm/system.h
> > +++ b/xen/include/asm-arm/system.h
> > @@ -33,6 +33,14 @@
> >  #define smp_mb__before_atomic()    smp_mb()
> >  #define smp_mb__after_atomic()     smp_mb()
> >
> > +/*
> > + * Speculative barrier
> > + * XXX: Add support for the 'sb' instruction
> > + */
> > +#define ASM_SB "dsb nsh \n isb \n"
>
> Why not ';' ? Anyway it doesn't matter.

Per [1] and [2], a semicolon is not portable as some assemblers may
treat anything after it as a comment.

This reminds me that I have been using semicolons in entry.S. I
should probably have a look to avoid them.
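
To make the difference concrete (sketch only; both variants assemble fine
with GNU as, but only the first is safe everywhere):

    /* What the patch uses: '\n' is understood by every assembler. */
    #define ASM_SB_NEWLINES   "dsb nsh \n isb \n"

    /*
     * On assemblers that treat ';' as a comment delimiter, everything
     * after the first ';' would be silently dropped, losing the isb.
     */
    #define ASM_SB_SEMICOLONS "dsb nsh; isb"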

Cheers,

[1] https://developer.arm.com/docs/dui0801/d/structure-of-assembly-language-modules/syntax-of-source-lines-in-assembly-language
[2] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#AssemblerTemplate


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call
  2020-06-16 21:57     ` Julien Grall
@ 2020-06-16 23:16       ` Andrew Cooper
  2020-06-16 23:27         ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2020-06-16 23:16 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Paul Durrant, Andre Przywara, Julien Grall, Bertrand Marquis,
	Xen.org security team, xen-devel, Volodymyr Babchuk

On 16/06/2020 22:57, Julien Grall wrote:
> On Tue, 16 Jun 2020 at 22:34, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> SMC call will update some of registers (typically only x0) depending on
>>   ^a SMC call

An SMC call.

>>
>>> the arguments provided.
>>>
>>> Some CPUs can speculate past a SMC instruction and potentially perform
>>> speculative access to emrmoy using the pre-call values before executing
>>                         ^ memory
>>
>>> the SMC.
>>>
>>> There is no known gadget available after the SMC call today. However
>>> some of the registers may contain values from the guest and are expected
>>> to be updated by the SMC call.
>>>
>>> In order to harden the code, it would be better to prevent straight-line
>>> speculation from an SMC. Architecturally executing the speculation
>>                    ^ a? any?
> "any" might be better.

"an SMC" is correct, but "any" is also fine.

'a' vs 'an' is based on the sound of the following.  S in "S-M-C" as an
abbreviation starts with an 'e' vowel sound, unlike 's' in secure, so
the correct grammar is "an SMC" and "a secure monitor call".

~Andrew


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call
  2020-06-16 23:16       ` Andrew Cooper
@ 2020-06-16 23:27         ` Stefano Stabellini
  0 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2020-06-16 23:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Paul Durrant, Andre Przywara, Julien Grall,
	Bertrand Marquis, Xen.org security team, xen-devel,
	Volodymyr Babchuk, Julien Grall

On Wed, 17 Jun 2020, Andrew Cooper wrote:
> On 16/06/2020 22:57, Julien Grall wrote:
> > On Tue, 16 Jun 2020 at 22:34, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >> On Tue, 16 Jun 2020, Julien Grall wrote:
> >>> From: Julien Grall <jgrall@amazon.com>
> >>>
> >>> SMC call will update some of registers (typically only x0) depending on
> >>   ^a SMC call
> 
> An SMC call.
> 
> >>
> >>> the arguments provided.
> >>>
> >>> Some CPUs can speculate past a SMC instruction and potentially perform
> >>> speculative access to emrmoy using the pre-call values before executing
> >>                         ^ memory
> >>
> >>> the SMC.
> >>>
> >>> There is no known gadget available after the SMC call today. However
> >>> some of the registers may contain values from the guest and are expected
> >>> to be updated by the SMC call.
> >>>
> >>> In order to harden the code, it would be better to prevent straight-line
> >>> speculation from an SMC. Architecturally executing the speculation
> >>                    ^ a? any?
> > "any" might be better.
> 
> "an SMC" is correct, but "any" is also fine.
> 
> 'a' vs 'an' is based on the sound of the following.  S in "S-M-C" as an
> abbreviation starts with an 'e' vowel sound, unlike 's' in secure, so
> the correct grammar is "an SMC" and "a secure monitor call".

LOL! English sometimes... damn.

Anyway, many thanks for the correction :-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-06-16 21:24   ` Stefano Stabellini
@ 2020-06-17 16:23     ` Julien Grall
  2020-07-04 16:07       ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-06-17 16:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: paul, Andre.Przywara, Julien Grall, Bertrand.Marquis, security,
	xen-devel, Volodymyr_Babchuk

Hi,

On 16/06/2020 22:24, Stefano Stabellini wrote:
> On Tue, 16 Jun 2020, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Some CPUs can speculate past a RET instruction and potentially perform
>> speculative accesses to memory before processing the return.
>>
>> There is no known gadget available after the RET instruction today.
>> However some of the registers (such as in check_pending_guest_serror())
>> may contain a value provided the guest.
>                                ^ by
> 
> 
>> In order to harden the code, it would be better to add a speculation
>> barrier after each RET instruction. The performance is meant to be
>> negligeable as the speculation barrier is not meant to be archicturally
>> executed.
>>
>> Note that on arm32, the ldmia instruction will act as a return from the
>> function __context_switch(). While the whitepaper doesn't suggest it is
>> possible to speculate after the instruction, add preventively a
>> speculation barrier after it as well.
>>
>> This is part of the work to mitigate straight-line speculation.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> I did a compile-test on the patch too.
> 
> 
>> ---
>>
>> I am still unsure whether we preventively should add a speculation barrier
>> preventively after all the RET instructions in arm*/lib/. The smc call be
>> taken care in a follow-up patch.
> 
> SMC is great to have but it seems to be overkill to do the ones under
> lib/.
 From my understanding, the compiler will add a speculation barrier
preventively after each 'ret' when the mitigations are turned on. So it
feels to me we want to follow the same approach.

Obviously, we can avoid them but I would like to have a justification
for not adding them (nothing is overkill against speculation ;)).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call
  2020-06-16 17:59 ` [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call Julien Grall
  2020-06-16 21:34   ` Stefano Stabellini
@ 2020-06-18 17:46   ` Julien Grall
  1 sibling, 0 replies; 20+ messages in thread
From: Julien Grall @ 2020-06-18 17:46 UTC (permalink / raw)
  To: security
  Cc: sstabellini, paul, Andre.Przywara, Julien Grall,
	Bertrand.Marquis, xen-devel, Volodymyr_Babchuk



On 16/06/2020 18:59, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> SMC call will update some of registers (typically only x0) depending on
> the arguments provided.
> 
> Some CPUs can speculate past a SMC instruction and potentially perform
> speculative access to emrmoy using the pre-call values before executing
> the SMC.
> 
> There is no known gadget available after the SMC call today. However
> some of the registers may contain values from the guest and are expected
> to be updated by the SMC call.
> 
> In order to harden the code, it would be better to prevent straight-line
> speculation from an SMC. Architecturally executing the speculation
> barrier after every SMC can be rather expensive (particularly on core
> not SB). Therefore we want to mitigate it diferrently:
> 
>      * For arm_smccc_1_0_smc, we can avoid a speculation barrier right
>      after the SMC instruction and instead rely on the one after eret.
>      * For arm_smccc_1_1_smc, we can place a B instruction after the SMC
>      instruction to skip the barrier.
> 
> Note that arm_smccc_1_0_smc version on arm32 is just an alias to
> arm_smccc_1_1_smc.
> 
> Note that no speculation barrier has been added after the SMC
> instruction in arm64/entry.S. This is fine because the call is not
> expected to modify any registers. So straight-line speculation doesn't
> matter.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
> 
> Note this hasn't been vetted by Arm but they are using the same
> sort of mitigation for blr. So I am quite confident this could do the
> trick.

Actually, there is some uncertainty on whether this may introduce issues
with other sorts of speculation. As there is no known reveal gadget after
the SMC call and this is only about prevention, I will withdraw this patch
for the time being.

Patch #1 is still valid though.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-06-17 16:23     ` Julien Grall
@ 2020-07-04 16:07       ` Julien Grall
  2020-08-18 16:35         ` Bertrand Marquis
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-07-04 16:07 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: paul, Andre.Przywara, Julien Grall, Bertrand.Marquis, security,
	xen-devel, Volodymyr_Babchuk

On 17/06/2020 17:23, Julien Grall wrote:
> Hi,
> 
> On 16/06/2020 22:24, Stefano Stabellini wrote:
>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> Some CPUs can speculate past a RET instruction and potentially perform
>>> speculative accesses to memory before processing the return.
>>>
>>> There is no known gadget available after the RET instruction today.
>>> However some of the registers (such as in check_pending_guest_serror())
>>> may contain a value provided the guest.
>>                                ^ by
>>
>>
>>> In order to harden the code, it would be better to add a speculation
>>> barrier after each RET instruction. The performance is meant to be
>>> negligeable as the speculation barrier is not meant to be archicturally
>>> executed.
>>>
>>> Note that on arm32, the ldmia instruction will act as a return from the
>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>> possible to speculate after the instruction, add preventively a
>>> speculation barrier after it as well.
>>>
>>> This is part of the work to mitigate straight-line speculation.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>
>> I did a compile-test on the patch too.
>>
>>
>>> ---
>>>
>>> I am still unsure whether we preventively should add a speculation 
>>> barrier
>>> preventively after all the RET instructions in arm*/lib/. The smc 
>>> call be
>>> taken care in a follow-up patch.
>>
>> SMC is great to have but it seems to be overkill to do the ones under
>> lib/.
>  From my understanding, the compiler will add a speculation barrier 
> preventively after each 'ret' when the mitigation are turned on.So it 
> feels to me we want to follow the same approach.
> 
> Obviously, we can avoid them but I would like to have a justification 
> for not adding them (nothing is overkilled against speculation ;)).

I finally found some time to look at arm*/lib in more detail. Some of
the helpers can definitely be called with guest inputs.

For instance, memchr() is called from hypfs_get_path_user() with the 3rd
argument controlled by the guest. In both the 32-bit and 64-bit
implementations, you will reach the end of memchr() with
r2/w2 and r3/w3 (the latter contains a character from the buffer) controlled
by the guest.

As this is the only function in the unit, we don't know what instructions
will come right after the RET. So it would be safer to add a speculation
barrier there too.

Note that hypfs is currently only accessible by Dom0. Yet, I still think
we should try to harden any code if we can :).
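
To illustrate the kind of call path I mean (a rough sketch only -- this is
not the actual hypfs code and the helper name is made up):

    /*
     * The only point of this sketch is that the length reaching memchr()
     * -- and therefore r2/w2 when memchr() returns -- derives from
     * guest-provided data.
     */
    static int check_path_terminated(const char *buf, unsigned long guest_len)
    {
        /* guest_len ultimately comes from the guest's hypercall arguments. */
        if ( memchr(buf, '\0', guest_len) == NULL )
            return -EINVAL;

        return 0;
    }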

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-07-04 16:07       ` Julien Grall
@ 2020-08-18 16:35         ` Bertrand Marquis
  2020-08-18 16:43           ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Bertrand Marquis @ 2020-08-18 16:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, security, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall, nd

Hi Julien,

Somehow we stopped on this thread, and you already did most of the work, so I think we should try to finish what you started.


> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
> 
> On 17/06/2020 17:23, Julien Grall wrote:
>> Hi,
>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>> 
>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>> speculative accesses to memory before processing the return.
>>>> 
>>>> There is no known gadget available after the RET instruction today.
>>>> However some of the registers (such as in check_pending_guest_serror())
>>>> may contain a value provided the guest.
>>>                                ^ by
>>> 
>>> 
>>>> In order to harden the code, it would be better to add a speculation
>>>> barrier after each RET instruction. The performance is meant to be
>>>> negligeable as the speculation barrier is not meant to be archicturally
>>>> executed.
>>>> 
>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>> possible to speculate after the instruction, add preventively a
>>>> speculation barrier after it as well.
>>>> 
>>>> This is part of the work to mitigate straight-line speculation.
>>>> 
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>> 
>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>> 
>>> I did a compile-test on the patch too.
>>> 
>>> 
>>>> ---
>>>> 
>>>> I am still unsure whether we preventively should add a speculation barrier
>>>> preventively after all the RET instructions in arm*/lib/. The smc call be
>>>> taken care in a follow-up patch.
>>> 
>>> SMC is great to have but it seems to be overkill to do the ones under
>>> lib/.
>> From my understanding, the compiler will add a speculation barrier preventively after each 'ret' when the mitigation are turned on.So it feels to me we want to follow the same approach.
>> Obviously, we can avoid them but I would like to have a justification for not adding them (nothing is overkilled against speculation ;)).
> 
> I finally found some time to look at arm*/lib in more details. Some of the helpers can definitely be called with guest inputs.
> 
> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both 32-bit and 64-bit implementation, you will reach the end of the function memchr() with r2/w2 and r3/w3 (it contains a character from the buffer) controlled by the guest.
> 
> As this is the only function in the unit, we don't know what will be the instructions right after RET. So it would be safer to add a speculation barrier there too.

How about adding a speculation barrier directly in the ENDPROC macro?

> 
> Note that hypfs is currently only accessibly by Dom0. Yet, I still think we should try to harden any code if we can :).

Agree with that.

Cheers (and sorry for the delay)
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-18 16:35         ` Bertrand Marquis
@ 2020-08-18 16:43           ` Julien Grall
  2020-08-18 17:06             ` Bertrand Marquis
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-08-18 16:43 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, security, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall, nd



On 18/08/2020 17:35, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

> 
> Somehow we stopped on this thread and you did already most of the work so I think we should try to finish what you started

Sorry this fell through the cracks. I have a new version for patch #1,
but not yet patch #2.

I am still debating with myself where the speculation barrier should be 
added after the SMC :).

> 
> 
>> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
>>
>> On 17/06/2020 17:23, Julien Grall wrote:
>>> Hi,
>>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>>> speculative accesses to memory before processing the return.
>>>>>
>>>>> There is no known gadget available after the RET instruction today.
>>>>> However some of the registers (such as in check_pending_guest_serror())
>>>>> may contain a value provided the guest.
>>>>                                 ^ by
>>>>
>>>>
>>>>> In order to harden the code, it would be better to add a speculation
>>>>> barrier after each RET instruction. The performance is meant to be
>>>>> negligeable as the speculation barrier is not meant to be archicturally
>>>>> executed.
>>>>>
>>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>>> possible to speculate after the instruction, add preventively a
>>>>> speculation barrier after it as well.
>>>>>
>>>>> This is part of the work to mitigate straight-line speculation.
>>>>>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>
>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>
>>>> I did a compile-test on the patch too.
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> I am still unsure whether we preventively should add a speculation barrier
>>>>> preventively after all the RET instructions in arm*/lib/. The smc call be
>>>>> taken care in a follow-up patch.
>>>>
>>>> SMC is great to have but it seems to be overkill to do the ones under
>>>> lib/.
>>>  From my understanding, the compiler will add a speculation barrier preventively after each 'ret' when the mitigation are turned on.So it feels to me we want to follow the same approach.
>>> Obviously, we can avoid them but I would like to have a justification for not adding them (nothing is overkilled against speculation ;)).
>>
>> I finally found some time to look at arm*/lib in more details. Some of the helpers can definitely be called with guest inputs.
>>
>> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both 32-bit and 64-bit implementation, you will reach the end of the function memchr() with r2/w2 and r3/w3 (it contains a character from the buffer) controlled by the guest.
>>
>> As this is the only function in the unit, we don't know what will be the instructions right after RET. So it would be safer to add a speculation barrier there too.
> 
> How about adding a speculation barrier directly in the ENDPROC macro ?

This would unfortunately not cover all the cases because you can return 
in the middle of the function. I will have a look to see if we can 
leverage it.

Below is the first draft of v2 of this patch.

commit 7f7cf1558283d0d35fc22ead909e8a812da82f27
Author: Julien Grall <jgrall@amazon.com>
Date:   Tue Jun 16 16:33:12 2020 +0100

     xen/arm: Place a speculation barrier following an ret instruction

     Some CPUs can speculate past a RET instruction and potentially perform
     speculative accesses to memory before processing the return.

     There is no known gadget available after the RET instruction today.
     However some of the registers (such as in check_pending_guest_serror())
     may contain a value provided by the guest.

     In order to harden the code, it would be better to add a speculation
     barrier after each RET instruction. The performance impact is meant to be
     negligible as the speculation barrier is not meant to be architecturally
     executed.

     The RET instructions in boot code (i.e. arm*/entry.S) are not followed by an SB
     because this is boot code. Therefore there is no concern as all the
     registers are controlled by the hypervisor.

     Note that on arm32, the ldmia instruction will act as a return from the
     function __context_switch(). While the whitepaper doesn't suggest it is
     possible to speculate after the instruction, add preventively a
     speculation barrier after it as well.

     This is part of the work to mitigate straight-line speculation.

     Signed-off-by: Julien Grall <jgrall@amazon.com>

     ---

         Changes in v2:
             - Add a speculation barrier for the __arm_smccc_1_0_smc. This
             was originally part of the second patch.
             - I have now included all the arm*/lib/ assembly helpers
             because some of them are called with guest input values (e.g.
             memchr).

diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
index b228d44b190c..bd54fc43558b 100644
--- a/xen/arch/arm/arm32/entry.S
+++ b/xen/arch/arm/arm32/entry.S
@@ -442,6 +442,7 @@ ENTRY(__context_switch)

          add     r4, r1, #VCPU_arch_saved_context
          ldmia   r4, {r4 - sl, fp, sp, pc}       /* Load registers and return */
+        sb

  /*
   * Local variables:
diff --git a/xen/arch/arm/arm32/lib/assembler.h b/xen/arch/arm/arm32/lib/assembler.h
index 6de2638a36d5..cffdb60baa29 100644
--- a/xen/arch/arm/arm32/lib/assembler.h
+++ b/xen/arch/arm/arm32/lib/assembler.h
@@ -17,6 +17,8 @@
  #ifndef __ASM_ASSEMBLER_H__
  #define __ASM_ASSEMBLER_H__

+#include <asm/macros.h>
+
  #ifndef __ASSEMBLY__
  #error "Only include this from assembly code"
  #endif
diff --git a/xen/arch/arm/arm32/lib/findbit.S b/xen/arch/arm/arm32/lib/findbit.S
index 00674a31efad..df86922a25bc 100644
--- a/xen/arch/arm/arm32/lib/findbit.S
+++ b/xen/arch/arm/arm32/lib/findbit.S
@@ -35,6 +35,7 @@ ENTRY(_find_first_zero_bit_le)
  		blo	1b
  3:		mov	r0, r1			@ no free bits
  		mov	pc, lr
+		sb
  ENDPROC(_find_first_zero_bit_le)

  /*
@@ -76,6 +77,7 @@ ENTRY(_find_first_bit_le)
  		blo	1b
  3:		mov	r0, r1			@ no free bits
  		mov	pc, lr
+		sb
  ENDPROC(_find_first_bit_le)

  /*
@@ -114,6 +116,7 @@ ENTRY(_find_first_zero_bit_be)
  		blo	1b
  3:		mov	r0, r1			@ no free bits
  		mov	pc, lr
+		sb
  ENDPROC(_find_first_zero_bit_be)

  ENTRY(_find_next_zero_bit_be)
@@ -148,6 +151,7 @@ ENTRY(_find_first_bit_be)
  		blo	1b
  3:		mov	r0, r1			@ no free bits
  		mov	pc, lr
+		sb
  ENDPROC(_find_first_bit_be)

  ENTRY(_find_next_bit_be)
@@ -192,4 +196,5 @@ ENDPROC(_find_next_bit_be)
  		cmp	r1, r0			@ Clamp to maxbit
  		movlo	r0, r1
  		mov	pc, lr
+		sb

diff --git a/xen/arch/arm/arm32/lib/lib1funcs.S b/xen/arch/arm/arm32/lib/lib1funcs.S
index f1278bd6c139..ebb9b4f1babf 100644
--- a/xen/arch/arm/arm32/lib/lib1funcs.S
+++ b/xen/arch/arm/arm32/lib/lib1funcs.S
@@ -217,15 +217,18 @@ UNWIND(.fnstart)

  	mov	r0, r2
  	mov	pc, lr
+	sb

  11:	moveq	r0, #1
  	movne	r0, #0
  	mov	pc, lr
+	sb

  12:	ARM_DIV2_ORDER r1, r2

  	mov	r0, r0, lsr r2
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__udivsi3)
@@ -245,6 +248,7 @@ UNWIND(.fnstart)
  	ARM_MOD_BODY r0, r1, r2, r3

  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__umodsi3)
@@ -271,15 +275,18 @@ UNWIND(.fnstart)
  	cmp	ip, #0
  	rsbmi	r0, r0, #0
  	mov	pc, lr
+	sb

  10:	teq	ip, r0				@ same sign ?
  	rsbmi	r0, r0, #0
  	mov	pc, lr
+	sb

  11:	movlo	r0, #0
  	moveq	r0, ip, asr #31
  	orreq	r0, r0, #1
  	mov	pc, lr
+	sb

  12:	ARM_DIV2_ORDER r1, r2

@@ -287,6 +294,7 @@ UNWIND(.fnstart)
  	mov	r0, r3, lsr r2
  	rsbmi	r0, r0, #0
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__divsi3)
@@ -312,6 +320,7 @@ UNWIND(.fnstart)
  10:	cmp	ip, #0
  	rsbmi	r0, r0, #0
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__modsi3)
@@ -328,6 +337,7 @@ UNWIND(.save {r0, r1, ip, lr}	)
  	mul	r3, r0, r2
  	sub	r1, r1, r3
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__aeabi_uidivmod)
@@ -341,6 +351,7 @@ UNWIND(.save {r0, r1, ip, lr}	)
  	mul	r3, r0, r2
  	sub	r1, r1, r3
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__aeabi_idivmod)
@@ -355,6 +366,7 @@ UNWIND(.save {lr}	)
  	add sp, sp, #8
  	ldmfd sp!, {r2, r3}
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__aeabi_uldivmod)
@@ -369,6 +381,7 @@ UNWIND(.save {lr}	)
  	add sp, sp, #16
  	ldmfd	sp!, {r2, r3}
  	mov	pc, lr
+	sb

  UNWIND(.fnend)
  ENDPROC(__aeabi_ldivmod)
@@ -382,5 +395,6 @@ UNWIND(.save {lr})
  	bl	__div0
  	mov	r0, #0			@ About as wrong as it could be.
  	ldr	pc, [sp], #8
+	sb
  UNWIND(.fnend)
  ENDPROC(Ldiv0)
diff --git a/xen/arch/arm/arm32/lib/lshrdi3.S b/xen/arch/arm/arm32/lib/lshrdi3.S
index 88b9655282ab..e4d275842c00 100644
--- a/xen/arch/arm/arm32/lib/lshrdi3.S
+++ b/xen/arch/arm/arm32/lib/lshrdi3.S
@@ -46,6 +46,7 @@ ENTRY(__aeabi_llsr)
   THUMB(	orrmi	al, al, r3		)
  	mov	ah, ah, lsr r2
  	mov	pc, lr
+	sb

  ENDPROC(__lshrdi3)
  ENDPROC(__aeabi_llsr)
diff --git a/xen/arch/arm/arm32/lib/memchr.S b/xen/arch/arm/arm32/lib/memchr.S
index 7dae4ab6a6f8..e2910d381b08 100644
--- a/xen/arch/arm/arm32/lib/memchr.S
+++ b/xen/arch/arm/arm32/lib/memchr.S
@@ -22,4 +22,5 @@ ENTRY(memchr)
  	sub	r0, r0, #1
  2:	movne	r0, #0
  	mov	pc, lr
+	sb
  ENDPROC(memchr)
diff --git a/xen/arch/arm/arm32/lib/memset.S b/xen/arch/arm/arm32/lib/memset.S
index 5a1dadf767d7..9ce78d9e4695 100644
--- a/xen/arch/arm/arm32/lib/memset.S
+++ b/xen/arch/arm/arm32/lib/memset.S
@@ -110,6 +110,7 @@ ENTRY(memset)
  	tst	r2, #1
  	strneb	r1, [ip], #1
  	mov	pc, lr
+	sb

  6:	subs	r2, r2, #4		@ 1 do we have enough
  	blt	5b			@ 1 bytes to align with?
diff --git a/xen/arch/arm/arm32/lib/strchr.S b/xen/arch/arm/arm32/lib/strchr.S
index fa53ad8bc659..697b5b30e38b 100644
--- a/xen/arch/arm/arm32/lib/strchr.S
+++ b/xen/arch/arm/arm32/lib/strchr.S
@@ -25,4 +25,5 @@ ENTRY(strchr)
  		movne	r0, #0
  		subeq	r0, r0, #1
  		mov	pc, lr
+		sb
  ENDPROC(strchr)
diff --git a/xen/arch/arm/arm32/lib/strrchr.S b/xen/arch/arm/arm32/lib/strrchr.S
index ec4d40de5e5a..9f64f23d5cb0 100644
--- a/xen/arch/arm/arm32/lib/strrchr.S
+++ b/xen/arch/arm/arm32/lib/strrchr.S
@@ -22,4 +22,5 @@ ENTRY(strrchr)
  		bne	1b
  		mov	r0, r3
  		mov	pc, lr
+		sb
  ENDPROC(strrchr)
diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 175ea2981e72..b37d8fe09dc2 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -522,6 +522,7 @@ abort_guest_exit_end:
          cset    x19, ne

          ret
+        sb
  ENDPROC(check_pending_guest_serror)

  /*
@@ -583,6 +584,7 @@ ENTRY(__context_switch)
          ldr     lr, [x8]
          mov     sp, x9
          ret
+        sb

  /*
   * Local variables:
diff --git a/xen/arch/arm/arm64/lib/assembler.h b/xen/arch/arm/arm64/lib/assembler.h
index 3f9c0dcf5d32..8356fd5d11fd 100644
--- a/xen/arch/arm/arm64/lib/assembler.h
+++ b/xen/arch/arm/arm64/lib/assembler.h
@@ -5,6 +5,8 @@
  #error "Only include this from assembly code"
  #endif

+#include <asm/macros.h>
+
  /* Only LE support so far */
  #define CPU_BE(x...)
  #define CPU_LE(x...) x
diff --git a/xen/arch/arm/arm64/lib/clear_page.S b/xen/arch/arm/arm64/lib/clear_page.S
index 9f8a680e1880..cf07bd00cb7e 100644
--- a/xen/arch/arm/arm64/lib/clear_page.S
+++ b/xen/arch/arm/arm64/lib/clear_page.S
@@ -14,6 +14,8 @@
   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
   */

+#include <asm/macros.h>
+
  /*
   * Clear page @dest
   *
@@ -31,4 +33,5 @@ ENTRY(clear_page)
  	tst	x0, #(PAGE_SIZE - 1)
  	b.ne	1b
  	ret
+	sb
  ENDPROC(clear_page)
diff --git a/xen/arch/arm/arm64/lib/memchr.S b/xen/arch/arm/arm64/lib/memchr.S
index 81f113bb1cab..ff75f03d86bc 100644
--- a/xen/arch/arm/arm64/lib/memchr.S
+++ b/xen/arch/arm/arm64/lib/memchr.S
@@ -38,6 +38,8 @@ ENTRY(memchr)
  	b.ne	1b
  	sub	x0, x0, #1
  	ret
+	sb
  2:	mov	x0, #0
  	ret
+	sb
  ENDPROC(memchr)
diff --git a/xen/arch/arm/arm64/lib/memcmp.S b/xen/arch/arm/arm64/lib/memcmp.S
index 87c2537ffeb7..2583bd404249 100644
--- a/xen/arch/arm/arm64/lib/memcmp.S
+++ b/xen/arch/arm/arm64/lib/memcmp.S
@@ -160,6 +160,7 @@ CPU_LE( lsr	tmp2, tmp2, tmp1 )
  1:
  	sub	result, data1, data2
  	ret
+	sb

  .Lstart_align:
  	lsr	limit_wd, limit, #3
@@ -236,6 +237,7 @@ CPU_LE( rev	data2, data2 )
  	lsr	data1, data1, #56
  	sub	result, data1, data2, lsr #56
  	ret
+	sb

  .Lremain8:
  	/* Limit % 8 == 0 =>. all data are equal.*/
@@ -251,7 +253,9 @@ CPU_LE( rev	data2, data2 )
  	b.eq	.Ltiny8proc
  	sub	result, data1, data2
  	ret
+	sb
  .Lret0:
  	mov	result, #0
  	ret
+	sb
  ENDPROC(memcmp)
diff --git a/xen/arch/arm/arm64/lib/memcpy.S b/xen/arch/arm/arm64/lib/memcpy.S
index d90d20ef3ea8..f79ba8cb9515 100644
--- a/xen/arch/arm/arm64/lib/memcpy.S
+++ b/xen/arch/arm/arm64/lib/memcpy.S
@@ -142,6 +142,7 @@ ENTRY(memcpy)

  .Lexitfunc:
  	ret
+	sb

  .Lcpy_over64:
  	subs	count, count, #128
@@ -162,6 +163,7 @@ ENTRY(memcpy)
  	tst	count, #0x3f
  	b.ne	.Ltail63
  	ret
+	sb

  	/*
  	* Critical loop.  Start at a new cache line boundary.  Assuming
@@ -197,4 +199,5 @@ ENTRY(memcpy)
  	tst	count, #0x3f
  	b.ne	.Ltail63
  	ret
+	sb
  ENDPROC(memcpy)
diff --git a/xen/arch/arm/arm64/lib/memmove.S b/xen/arch/arm/arm64/lib/memmove.S
index a49de845d066..dc0429583db7 100644
--- a/xen/arch/arm/arm64/lib/memmove.S
+++ b/xen/arch/arm/arm64/lib/memmove.S
@@ -138,6 +138,7 @@ ENTRY(memmove)

  .Lexitfunc:
  	ret
+	sb

  .Lcpy_over64:
  	subs	count, count, #128
@@ -158,6 +159,7 @@ ENTRY(memmove)
  	tst	count, #0x3f
  	b.ne	.Ltail63
  	ret
+	sb

  	/*
  	* Critical loop. Start at a new cache line boundary. Assuming
@@ -193,4 +195,5 @@ ENTRY(memmove)
  	tst	count, #0x3f
  	b.ne	.Ltail63
  	ret
+	sb
  ENDPROC(memmove)
diff --git a/xen/arch/arm/arm64/lib/memset.S b/xen/arch/arm/arm64/lib/memset.S
index 5bf751521b60..e052d7c937d8 100644
--- a/xen/arch/arm/arm64/lib/memset.S
+++ b/xen/arch/arm/arm64/lib/memset.S
@@ -76,6 +76,7 @@ ENTRY(memset)
  	strb	A_lw, [dst]
  4:
  	ret
+	sb

  .Lover16_proc:
  	/*Whether  the start address is aligned with 16.*/
@@ -120,6 +121,7 @@ ENTRY(memset)
  	stp	A_l, A_l, [dst, #-16]	/* Repeat some/all of last store. */
  4:
  	ret
+	sb

  	/*
  	* Critical loop. Start at a new cache line boundary. Assuming
@@ -141,6 +143,7 @@ ENTRY(memset)
  	b.ne	.Ltail63
  .Lexitfunc:
  	ret
+	sb

  	/*
  	* For zeroing memory, check to see if we can use the ZVA feature to
@@ -212,4 +215,5 @@ ENTRY(memset)
  	ands	count, count, zva_bits_x
  	b.ne	.Ltail_maybe_long
  	ret
+	sb
  ENDPROC(memset)
diff --git a/xen/arch/arm/arm64/lib/strchr.S b/xen/arch/arm/arm64/lib/strchr.S
index 0506b0ff7f3a..be1d2fa2ec82 100644
--- a/xen/arch/arm/arm64/lib/strchr.S
+++ b/xen/arch/arm/arm64/lib/strchr.S
@@ -17,6 +17,7 @@
   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
   */

+#include <asm/macros.h>

  /*
   * Find the first occurrence of a character in a string.
@@ -37,4 +38,5 @@ ENTRY(strchr)
  	cmp	w2, w1
  	csel	x0, x0, xzr, eq
  	ret
+	sb
  ENDPROC(strchr)
diff --git a/xen/arch/arm/arm64/lib/strcmp.S b/xen/arch/arm/arm64/lib/strcmp.S
index c6f42dd2550b..08eda29302f0 100644
--- a/xen/arch/arm/arm64/lib/strcmp.S
+++ b/xen/arch/arm/arm64/lib/strcmp.S
@@ -133,6 +133,7 @@ CPU_LE( lsr	tmp2, tmp2, tmp1 )	/* Shift (tmp1 & 63).  */
  1:
  	sub	result, data1, data2
  	ret
+	sb

  .Lstart_align:
  	ands	xzr, src1, #7
@@ -205,6 +206,7 @@ CPU_BE( cmp	data1, data2 )
  CPU_BE( cset	result, ne )
  CPU_BE( cneg	result, result, lo )
  CPU_BE( ret )
+CPU_BE( sb )
  CPU_BE( 1: )
  	/*Re-compute the NUL-byte detection, using a byte-reversed value. */
  CPU_BE(	rev	tmp3, data1 )
@@ -230,4 +232,5 @@ CPU_BE(	orr	syndrome, diff, has_nul )
  	lsr	data1, data1, #56
  	sub	result, data1, data2, lsr #56
  	ret
+	sb
  ENDPROC(strcmp)
diff --git a/xen/arch/arm/arm64/lib/strlen.S b/xen/arch/arm/arm64/lib/strlen.S
index fb6aaf1a6afb..92df3c5e4418 100644
--- a/xen/arch/arm/arm64/lib/strlen.S
+++ b/xen/arch/arm/arm64/lib/strlen.S
@@ -106,6 +106,7 @@ CPU_BE( bic	has_nul2, tmp1, tmp2 )
  	clz	pos, has_nul2
  	add	len, len, pos, lsr #3		/* Bits to bytes.  */
  	ret
+	sb

  .Lmisaligned:
  	cmp	tmp1, #8
diff --git a/xen/arch/arm/arm64/lib/strncmp.S b/xen/arch/arm/arm64/lib/strncmp.S
index a4a0f779f56e..e13ff99fe75e 100644
--- a/xen/arch/arm/arm64/lib/strncmp.S
+++ b/xen/arch/arm/arm64/lib/strncmp.S
@@ -182,6 +182,7 @@ CPU_LE( lsr	tmp2, tmp2, tmp3 )	/* Shift (tmp1 & 63).  */
  1:
  	sub	result, data1, data2
  	ret
+	sb

  .Lstart_align:
  	lsr	limit_wd, limit, #3
@@ -264,6 +265,7 @@ CPU_BE( cmp	data1, data2 )
  CPU_BE( cset	result, ne )
  CPU_BE( cneg	result, result, lo )
  CPU_BE( ret )
+CPU_BE( sb )
  CPU_BE( 1: )
  	/* Re-compute the NUL-byte detection, using a byte-reversed value.*/
  CPU_BE( rev	tmp3, data1 )
@@ -288,6 +290,7 @@ CPU_BE( orr	syndrome, diff, has_nul )
  	lsr	data1, data1, #56
  	sub	result, data1, data2, lsr #56
  	ret
+	sb

  .Lremain8:
  	/* Limit % 8 == 0 => all bytes significant.  */
@@ -303,8 +306,10 @@ CPU_BE( orr	syndrome, diff, has_nul )
  	b.eq	.Ltiny8proc
  	sub	result, data1, data2
  	ret
+	sb

  .Lret0:
  	mov	result, #0
  	ret
+	sb
  ENDPROC(strncmp)
diff --git a/xen/arch/arm/arm64/lib/strnlen.S b/xen/arch/arm/arm64/lib/strnlen.S
index 81c8e8b54ea9..c59467bf17e1 100644
--- a/xen/arch/arm/arm64/lib/strnlen.S
+++ b/xen/arch/arm/arm64/lib/strnlen.S
@@ -126,6 +126,7 @@ CPU_BE( bic	has_nul2, tmp1, tmp2 )
  	cmp	len, limit
  	csel	len, len, limit, ls     /* Return the lower value.  */
  	ret
+	sb

  .Lmisaligned:
  	/*
@@ -168,4 +169,5 @@ CPU_LE( lsr	tmp2, tmp2, tmp4 )	/* Shift (tmp1 & 63).  */
  .Lhit_limit:
  	mov	len, limit
  	ret
+	sb
  ENDPROC(strnlen)
diff --git a/xen/arch/arm/arm64/lib/strrchr.S b/xen/arch/arm/arm64/lib/strrchr.S
index 07059983f828..fc8b6e446071 100644
--- a/xen/arch/arm/arm64/lib/strrchr.S
+++ b/xen/arch/arm/arm64/lib/strrchr.S
@@ -17,6 +17,8 @@
   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
   */

+#include <asm/macros.h>
+
  /*
   * Find the last occurrence of a character in a string.
   *
@@ -37,4 +39,5 @@ ENTRY(strrchr)
  	b	1b
  2:	mov	x0, x3
  	ret
+	sb
  ENDPROC(strrchr)
diff --git a/xen/arch/arm/arm64/smc.S b/xen/arch/arm/arm64/smc.S
index b0752be57e8f..e0a3106dd7ec 100644
--- a/xen/arch/arm/arm64/smc.S
+++ b/xen/arch/arm/arm64/smc.S
@@ -30,3 +30,4 @@ ENTRY(__arm_smccc_1_0_smc)
          stp     x2, x3, [x4, #SMCCC_RES_a2]
  1:
          ret
+        sb

-- 
Julien Grall


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-18 16:43           ` Julien Grall
@ 2020-08-18 17:06             ` Bertrand Marquis
  2020-08-18 17:34               ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Bertrand Marquis @ 2020-08-18 17:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, security, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall, nd



> On 18 Aug 2020, at 17:43, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 18/08/2020 17:35, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>> Somehow we stopped on this thread and you did already most of the work so I think we should try to finish what you started
> 
> Sorry this fell-through the cracks. I have a new version for patch #1, but not yet patch #2.

No problem, this came back while I was trying to reduce my todo-list stack :-)

> 
> I am still debating with myself where the speculation barrier should be added after the SMC :).

I think that we should, unless the SMC is in the context switch path (adding it after all the other calls should not have a performance impact).

> 
>>> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
>>> 
>>> On 17/06/2020 17:23, Julien Grall wrote:
>>>> Hi,
>>>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>> 
>>>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>>>> speculative accesses to memory before processing the return.
>>>>>> 
>>>>>> There is no known gadget available after the RET instruction today.
>>>>>> However some of the registers (such as in check_pending_guest_serror())
>>>>>> may contain a value provided the guest.
>>>>>                                ^ by
>>>>> 
>>>>> 
>>>>>> In order to harden the code, it would be better to add a speculation
>>>>>> barrier after each RET instruction. The performance is meant to be
>>>>>> negligeable as the speculation barrier is not meant to be archicturally
>>>>>> executed.
>>>>>> 
>>>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>>>> possible to speculate after the instruction, add preventively a
>>>>>> speculation barrier after it as well.
>>>>>> 
>>>>>> This is part of the work to mitigate straight-line speculation.
>>>>>> 
>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>> 
>>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>> 
>>>>> I did a compile-test on the patch too.
>>>>> 
>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> I am still unsure whether we preventively should add a speculation barrier
>>>>>> preventively after all the RET instructions in arm*/lib/. The smc call be
>>>>>> taken care in a follow-up patch.
>>>>> 
>>>>> SMC is great to have but it seems to be overkill to do the ones under
>>>>> lib/.
>>>> From my understanding, the compiler will add a speculation barrier preventively after each 'ret' when the mitigation are turned on.So it feels to me we want to follow the same approach.
>>>> Obviously, we can avoid them but I would like to have a justification for not adding them (nothing is overkilled against speculation ;)).
>>> 
>>> I finally found some time to look at arm*/lib in more details. Some of the helpers can definitely be called with guest inputs.
>>> 
>>> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both 32-bit and 64-bit implementation, you will reach the end of the function memchr() with r2/w2 and r3/w3 (it contains a character from the buffer) controlled by the guest.
>>> 
>>> As this is the only function in the unit, we don't know what will be the instructions right after RET. So it would be safer to add a speculation barrier there too.
>> How about adding a speculation barrier directly in the ENDPROC macro ?
> 
> This would unfortunately not cover all the cases because you can return in the middle of the function. I will have a look to see if we can leverage it.

I agree that it would not solve all of them, but a big part would be covered by it.
Another solution might be to have a RETURN macro encoded as "mov pc, lr; sb" (arm32) and "ret; sb" (arm64).
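A minimal sketch of that idea (the name "ret_sb" is made up here, and it assumes the sb macro from asm/macros.h is visible), with one definition per architecture:

    arm64:
            .macro  ret_sb
            ret
            sb
            .endm

    arm32:
            .macro  ret_sb
            mov     pc, lr
            sb
            .endm

Callers would then use ret_sb instead of a bare "ret" or "mov pc, lr", which also makes any remaining bare return easy to grep for.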

The patch sounds good, I just need to find a way to analyse whether you missed a ret or not, which is not easy with such a patch :-)

Bertrand

> 
> Below the first draft for v2 of this patch.
> 
> commit 7f7cf1558283d0d35fc22ead909e8a812da82f27
> Author: Julien Grall <jgrall@amazon.com>
> Date:   Tue Jun 16 16:33:12 2020 +0100
> 
>    xen/arm: Place a speculation barrier following an ret instruction
> 
>    Some CPUs can speculate past a RET instruction and potentially perform
>    speculative accesses to memory before processing the return.
> 
>    There is no known gadget available after the RET instruction today.
>    However some of the registers (such as in check_pending_guest_serror())
>    may contain a value provided by the guest.
> 
>    In order to harden the code, it would be better to add a speculation
>    barrier after each RET instruction. The performance impact is meant to be
>    negligible as the speculation barrier is not meant to be architecturally
>    executed.
> 
>    The RET in boot code (i.e arm*/entry.S) are not followed by an SB
>    because this is boot code. Therefore there is no concern as all the
>    registers are controlled by the hypervisor.
> 
>    Note that on arm32, the ldmia instruction will act as a return from the
>    function __context_switch(). While the whitepaper doesn't suggest it is
>    possible to speculate after the instruction, add preventively a
>    speculation barrier after it as well.
> 
>    This is part of the work to mitigate straight-line speculation.
> 
>    Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
>    ---
> 
>        Changes in v2:
>            - Add an a speculation barrier for the __arm_smccc_1_0_smc. This
>            was originally part of the second patch.
>            - I have now included all the arm*/lib/ assembly helpers
>            because some of them are called with guest input value (e.g
>            memchr).
> 
> diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
> index b228d44b190c..bd54fc43558b 100644
> --- a/xen/arch/arm/arm32/entry.S
> +++ b/xen/arch/arm/arm32/entry.S
> @@ -442,6 +442,7 @@ ENTRY(__context_switch)
> 
>         add     r4, r1, #VCPU_arch_saved_context
>         ldmia   r4, {r4 - sl, fp, sp, pc}       /* Load registers and return */
> +        sb
> 
> /*
>  * Local variables:
> diff --git a/xen/arch/arm/arm32/lib/assembler.h b/xen/arch/arm/arm32/lib/assembler.h
> index 6de2638a36d5..cffdb60baa29 100644
> --- a/xen/arch/arm/arm32/lib/assembler.h
> +++ b/xen/arch/arm/arm32/lib/assembler.h
> @@ -17,6 +17,8 @@
> #ifndef __ASM_ASSEMBLER_H__
> #define __ASM_ASSEMBLER_H__
> 
> +#include <asm/macros.h>
> +
> #ifndef __ASSEMBLY__
> #error "Only include this from assembly code"
> #endif
> diff --git a/xen/arch/arm/arm32/lib/findbit.S b/xen/arch/arm/arm32/lib/findbit.S
> index 00674a31efad..df86922a25bc 100644
> --- a/xen/arch/arm/arm32/lib/findbit.S
> +++ b/xen/arch/arm/arm32/lib/findbit.S
> @@ -35,6 +35,7 @@ ENTRY(_find_first_zero_bit_le)
> 		blo	1b
> 3:		mov	r0, r1			@ no free bits
> 		mov	pc, lr
> +		sb
> ENDPROC(_find_first_zero_bit_le)
> 
> /*
> @@ -76,6 +77,7 @@ ENTRY(_find_first_bit_le)
> 		blo	1b
> 3:		mov	r0, r1			@ no free bits
> 		mov	pc, lr
> +		sb
> ENDPROC(_find_first_bit_le)
> 
> /*
> @@ -114,6 +116,7 @@ ENTRY(_find_first_zero_bit_be)
> 		blo	1b
> 3:		mov	r0, r1			@ no free bits
> 		mov	pc, lr
> +		sb
> ENDPROC(_find_first_zero_bit_be)
> 
> ENTRY(_find_next_zero_bit_be)
> @@ -148,6 +151,7 @@ ENTRY(_find_first_bit_be)
> 		blo	1b
> 3:		mov	r0, r1			@ no free bits
> 		mov	pc, lr
> +		sb
> ENDPROC(_find_first_bit_be)
> 
> ENTRY(_find_next_bit_be)
> @@ -192,4 +196,5 @@ ENDPROC(_find_next_bit_be)
> 		cmp	r1, r0			@ Clamp to maxbit
> 		movlo	r0, r1
> 		mov	pc, lr
> +		sb
> 
> diff --git a/xen/arch/arm/arm32/lib/lib1funcs.S b/xen/arch/arm/arm32/lib/lib1funcs.S
> index f1278bd6c139..ebb9b4f1babf 100644
> --- a/xen/arch/arm/arm32/lib/lib1funcs.S
> +++ b/xen/arch/arm/arm32/lib/lib1funcs.S
> @@ -217,15 +217,18 @@ UNWIND(.fnstart)
> 
> 	mov	r0, r2
> 	mov	pc, lr
> +	sb
> 
> 11:	moveq	r0, #1
> 	movne	r0, #0
> 	mov	pc, lr
> +	sb
> 
> 12:	ARM_DIV2_ORDER r1, r2
> 
> 	mov	r0, r0, lsr r2
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__udivsi3)
> @@ -245,6 +248,7 @@ UNWIND(.fnstart)
> 	ARM_MOD_BODY r0, r1, r2, r3
> 
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__umodsi3)
> @@ -271,15 +275,18 @@ UNWIND(.fnstart)
> 	cmp	ip, #0
> 	rsbmi	r0, r0, #0
> 	mov	pc, lr
> +	sb
> 
> 10:	teq	ip, r0				@ same sign ?
> 	rsbmi	r0, r0, #0
> 	mov	pc, lr
> +	sb
> 
> 11:	movlo	r0, #0
> 	moveq	r0, ip, asr #31
> 	orreq	r0, r0, #1
> 	mov	pc, lr
> +	sb
> 
> 12:	ARM_DIV2_ORDER r1, r2
> 
> @@ -287,6 +294,7 @@ UNWIND(.fnstart)
> 	mov	r0, r3, lsr r2
> 	rsbmi	r0, r0, #0
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__divsi3)
> @@ -312,6 +320,7 @@ UNWIND(.fnstart)
> 10:	cmp	ip, #0
> 	rsbmi	r0, r0, #0
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__modsi3)
> @@ -328,6 +337,7 @@ UNWIND(.save {r0, r1, ip, lr}	)
> 	mul	r3, r0, r2
> 	sub	r1, r1, r3
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__aeabi_uidivmod)
> @@ -341,6 +351,7 @@ UNWIND(.save {r0, r1, ip, lr}	)
> 	mul	r3, r0, r2
> 	sub	r1, r1, r3
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__aeabi_idivmod)
> @@ -355,6 +366,7 @@ UNWIND(.save {lr}	)
> 	add sp, sp, #8
> 	ldmfd sp!, {r2, r3}
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__aeabi_uldivmod)
> @@ -369,6 +381,7 @@ UNWIND(.save {lr}	)
> 	add sp, sp, #16
> 	ldmfd	sp!, {r2, r3}
> 	mov	pc, lr
> +	sb
> 
> UNWIND(.fnend)
> ENDPROC(__aeabi_ldivmod)
> @@ -382,5 +395,6 @@ UNWIND(.save {lr})
> 	bl	__div0
> 	mov	r0, #0			@ About as wrong as it could be.
> 	ldr	pc, [sp], #8
> +	sb
> UNWIND(.fnend)
> ENDPROC(Ldiv0)
> diff --git a/xen/arch/arm/arm32/lib/lshrdi3.S b/xen/arch/arm/arm32/lib/lshrdi3.S
> index 88b9655282ab..e4d275842c00 100644
> --- a/xen/arch/arm/arm32/lib/lshrdi3.S
> +++ b/xen/arch/arm/arm32/lib/lshrdi3.S
> @@ -46,6 +46,7 @@ ENTRY(__aeabi_llsr)
>  THUMB(	orrmi	al, al, r3		)
> 	mov	ah, ah, lsr r2
> 	mov	pc, lr
> +	sb
> 
> ENDPROC(__lshrdi3)
> ENDPROC(__aeabi_llsr)
> diff --git a/xen/arch/arm/arm32/lib/memchr.S b/xen/arch/arm/arm32/lib/memchr.S
> index 7dae4ab6a6f8..e2910d381b08 100644
> --- a/xen/arch/arm/arm32/lib/memchr.S
> +++ b/xen/arch/arm/arm32/lib/memchr.S
> @@ -22,4 +22,5 @@ ENTRY(memchr)
> 	sub	r0, r0, #1
> 2:	movne	r0, #0
> 	mov	pc, lr
> +	sb
> ENDPROC(memchr)
> diff --git a/xen/arch/arm/arm32/lib/memset.S b/xen/arch/arm/arm32/lib/memset.S
> index 5a1dadf767d7..9ce78d9e4695 100644
> --- a/xen/arch/arm/arm32/lib/memset.S
> +++ b/xen/arch/arm/arm32/lib/memset.S
> @@ -110,6 +110,7 @@ ENTRY(memset)
> 	tst	r2, #1
> 	strneb	r1, [ip], #1
> 	mov	pc, lr
> +	sb
> 
> 6:	subs	r2, r2, #4		@ 1 do we have enough
> 	blt	5b			@ 1 bytes to align with?
> diff --git a/xen/arch/arm/arm32/lib/strchr.S b/xen/arch/arm/arm32/lib/strchr.S
> index fa53ad8bc659..697b5b30e38b 100644
> --- a/xen/arch/arm/arm32/lib/strchr.S
> +++ b/xen/arch/arm/arm32/lib/strchr.S
> @@ -25,4 +25,5 @@ ENTRY(strchr)
> 		movne	r0, #0
> 		subeq	r0, r0, #1
> 		mov	pc, lr
> +		sb
> ENDPROC(strchr)
> diff --git a/xen/arch/arm/arm32/lib/strrchr.S b/xen/arch/arm/arm32/lib/strrchr.S
> index ec4d40de5e5a..9f64f23d5cb0 100644
> --- a/xen/arch/arm/arm32/lib/strrchr.S
> +++ b/xen/arch/arm/arm32/lib/strrchr.S
> @@ -22,4 +22,5 @@ ENTRY(strrchr)
> 		bne	1b
> 		mov	r0, r3
> 		mov	pc, lr
> +		sb
> ENDPROC(strrchr)
> diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
> index 175ea2981e72..b37d8fe09dc2 100644
> --- a/xen/arch/arm/arm64/entry.S
> +++ b/xen/arch/arm/arm64/entry.S
> @@ -522,6 +522,7 @@ abort_guest_exit_end:
>         cset    x19, ne
> 
>         ret
> +        sb
> ENDPROC(check_pending_guest_serror)
> 
> /*
> @@ -583,6 +584,7 @@ ENTRY(__context_switch)
>         ldr     lr, [x8]
>         mov     sp, x9
>         ret
> +        sb
> 
> /*
>  * Local variables:
> diff --git a/xen/arch/arm/arm64/lib/assembler.h b/xen/arch/arm/arm64/lib/assembler.h
> index 3f9c0dcf5d32..8356fd5d11fd 100644
> --- a/xen/arch/arm/arm64/lib/assembler.h
> +++ b/xen/arch/arm/arm64/lib/assembler.h
> @@ -5,6 +5,8 @@
> #error "Only include this from assembly code"
> #endif
> 
> +#include <asm/macros.h>
> +
> /* Only LE support so far */
> #define CPU_BE(x...)
> #define CPU_LE(x...) x
> diff --git a/xen/arch/arm/arm64/lib/clear_page.S b/xen/arch/arm/arm64/lib/clear_page.S
> index 9f8a680e1880..cf07bd00cb7e 100644
> --- a/xen/arch/arm/arm64/lib/clear_page.S
> +++ b/xen/arch/arm/arm64/lib/clear_page.S
> @@ -14,6 +14,8 @@
>  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>  */
> 
> +#include <asm/macros.h>
> +
> /*
>  * Clear page @dest
>  *
> @@ -31,4 +33,5 @@ ENTRY(clear_page)
> 	tst	x0, #(PAGE_SIZE - 1)
> 	b.ne	1b
> 	ret
> +	sb
> ENDPROC(clear_page)
> diff --git a/xen/arch/arm/arm64/lib/memchr.S b/xen/arch/arm/arm64/lib/memchr.S
> index 81f113bb1cab..ff75f03d86bc 100644
> --- a/xen/arch/arm/arm64/lib/memchr.S
> +++ b/xen/arch/arm/arm64/lib/memchr.S
> @@ -38,6 +38,8 @@ ENTRY(memchr)
> 	b.ne	1b
> 	sub	x0, x0, #1
> 	ret
> +	sb
> 2:	mov	x0, #0
> 	ret
> +	sb
> ENDPROC(memchr)
> diff --git a/xen/arch/arm/arm64/lib/memcmp.S b/xen/arch/arm/arm64/lib/memcmp.S
> index 87c2537ffeb7..2583bd404249 100644
> --- a/xen/arch/arm/arm64/lib/memcmp.S
> +++ b/xen/arch/arm/arm64/lib/memcmp.S
> @@ -160,6 +160,7 @@ CPU_LE( lsr	tmp2, tmp2, tmp1 )
> 1:
> 	sub	result, data1, data2
> 	ret
> +	sb
> 
> .Lstart_align:
> 	lsr	limit_wd, limit, #3
> @@ -236,6 +237,7 @@ CPU_LE( rev	data2, data2 )
> 	lsr	data1, data1, #56
> 	sub	result, data1, data2, lsr #56
> 	ret
> +	sb
> 
> .Lremain8:
> 	/* Limit % 8 == 0 =>. all data are equal.*/
> @@ -251,7 +253,9 @@ CPU_LE( rev	data2, data2 )
> 	b.eq	.Ltiny8proc
> 	sub	result, data1, data2
> 	ret
> +	sb
> .Lret0:
> 	mov	result, #0
> 	ret
> +	sb
> ENDPROC(memcmp)
> diff --git a/xen/arch/arm/arm64/lib/memcpy.S b/xen/arch/arm/arm64/lib/memcpy.S
> index d90d20ef3ea8..f79ba8cb9515 100644
> --- a/xen/arch/arm/arm64/lib/memcpy.S
> +++ b/xen/arch/arm/arm64/lib/memcpy.S
> @@ -142,6 +142,7 @@ ENTRY(memcpy)
> 
> .Lexitfunc:
> 	ret
> +	sb
> 
> .Lcpy_over64:
> 	subs	count, count, #128
> @@ -162,6 +163,7 @@ ENTRY(memcpy)
> 	tst	count, #0x3f
> 	b.ne	.Ltail63
> 	ret
> +	sb
> 
> 	/*
> 	* Critical loop.  Start at a new cache line boundary.  Assuming
> @@ -197,4 +199,5 @@ ENTRY(memcpy)
> 	tst	count, #0x3f
> 	b.ne	.Ltail63
> 	ret
> +	sb
> ENDPROC(memcpy)
> diff --git a/xen/arch/arm/arm64/lib/memmove.S b/xen/arch/arm/arm64/lib/memmove.S
> index a49de845d066..dc0429583db7 100644
> --- a/xen/arch/arm/arm64/lib/memmove.S
> +++ b/xen/arch/arm/arm64/lib/memmove.S
> @@ -138,6 +138,7 @@ ENTRY(memmove)
> 
> .Lexitfunc:
> 	ret
> +	sb
> 
> .Lcpy_over64:
> 	subs	count, count, #128
> @@ -158,6 +159,7 @@ ENTRY(memmove)
> 	tst	count, #0x3f
> 	b.ne	.Ltail63
> 	ret
> +	sb
> 
> 	/*
> 	* Critical loop. Start at a new cache line boundary. Assuming
> @@ -193,4 +195,5 @@ ENTRY(memmove)
> 	tst	count, #0x3f
> 	b.ne	.Ltail63
> 	ret
> +	sb
> ENDPROC(memmove)
> diff --git a/xen/arch/arm/arm64/lib/memset.S b/xen/arch/arm/arm64/lib/memset.S
> index 5bf751521b60..e052d7c937d8 100644
> --- a/xen/arch/arm/arm64/lib/memset.S
> +++ b/xen/arch/arm/arm64/lib/memset.S
> @@ -76,6 +76,7 @@ ENTRY(memset)
> 	strb	A_lw, [dst]
> 4:
> 	ret
> +	sb
> 
> .Lover16_proc:
> 	/*Whether  the start address is aligned with 16.*/
> @@ -120,6 +121,7 @@ ENTRY(memset)
> 	stp	A_l, A_l, [dst, #-16]	/* Repeat some/all of last store. */
> 4:
> 	ret
> +	sb
> 
> 	/*
> 	* Critical loop. Start at a new cache line boundary. Assuming
> @@ -141,6 +143,7 @@ ENTRY(memset)
> 	b.ne	.Ltail63
> .Lexitfunc:
> 	ret
> +	sb
> 
> 	/*
> 	* For zeroing memory, check to see if we can use the ZVA feature to
> @@ -212,4 +215,5 @@ ENTRY(memset)
> 	ands	count, count, zva_bits_x
> 	b.ne	.Ltail_maybe_long
> 	ret
> +	sb
> ENDPROC(memset)
> diff --git a/xen/arch/arm/arm64/lib/strchr.S b/xen/arch/arm/arm64/lib/strchr.S
> index 0506b0ff7f3a..be1d2fa2ec82 100644
> --- a/xen/arch/arm/arm64/lib/strchr.S
> +++ b/xen/arch/arm/arm64/lib/strchr.S
> @@ -17,6 +17,7 @@
>  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>  */
> 
> +#include <asm/macros.h>
> 
> /*
>  * Find the first occurrence of a character in a string.
> @@ -37,4 +38,5 @@ ENTRY(strchr)
> 	cmp	w2, w1
> 	csel	x0, x0, xzr, eq
> 	ret
> +	sb
> ENDPROC(strchr)
> diff --git a/xen/arch/arm/arm64/lib/strcmp.S b/xen/arch/arm/arm64/lib/strcmp.S
> index c6f42dd2550b..08eda29302f0 100644
> --- a/xen/arch/arm/arm64/lib/strcmp.S
> +++ b/xen/arch/arm/arm64/lib/strcmp.S
> @@ -133,6 +133,7 @@ CPU_LE( lsr	tmp2, tmp2, tmp1 )	/* Shift (tmp1 & 63).  */
> 1:
> 	sub	result, data1, data2
> 	ret
> +	sb
> 
> .Lstart_align:
> 	ands	xzr, src1, #7
> @@ -205,6 +206,7 @@ CPU_BE( cmp	data1, data2 )
> CPU_BE( cset	result, ne )
> CPU_BE( cneg	result, result, lo )
> CPU_BE( ret )
> +CPU_BE( sb )
> CPU_BE( 1: )
> 	/*Re-compute the NUL-byte detection, using a byte-reversed value. */
> CPU_BE(	rev	tmp3, data1 )
> @@ -230,4 +232,5 @@ CPU_BE(	orr	syndrome, diff, has_nul )
> 	lsr	data1, data1, #56
> 	sub	result, data1, data2, lsr #56
> 	ret
> +	sb
> ENDPROC(strcmp)
> diff --git a/xen/arch/arm/arm64/lib/strlen.S b/xen/arch/arm/arm64/lib/strlen.S
> index fb6aaf1a6afb..92df3c5e4418 100644
> --- a/xen/arch/arm/arm64/lib/strlen.S
> +++ b/xen/arch/arm/arm64/lib/strlen.S
> @@ -106,6 +106,7 @@ CPU_BE( bic	has_nul2, tmp1, tmp2 )
> 	clz	pos, has_nul2
> 	add	len, len, pos, lsr #3		/* Bits to bytes.  */
> 	ret
> +	sb
> 
> .Lmisaligned:
> 	cmp	tmp1, #8
> diff --git a/xen/arch/arm/arm64/lib/strncmp.S b/xen/arch/arm/arm64/lib/strncmp.S
> index a4a0f779f56e..e13ff99fe75e 100644
> --- a/xen/arch/arm/arm64/lib/strncmp.S
> +++ b/xen/arch/arm/arm64/lib/strncmp.S
> @@ -182,6 +182,7 @@ CPU_LE( lsr	tmp2, tmp2, tmp3 )	/* Shift (tmp1 & 63).  */
> 1:
> 	sub	result, data1, data2
> 	ret
> +	sb
> 
> .Lstart_align:
> 	lsr	limit_wd, limit, #3
> @@ -264,6 +265,7 @@ CPU_BE( cmp	data1, data2 )
> CPU_BE( cset	result, ne )
> CPU_BE( cneg	result, result, lo )
> CPU_BE( ret )
> +CPU_BE( sb )
> CPU_BE( 1: )
> 	/* Re-compute the NUL-byte detection, using a byte-reversed value.*/
> CPU_BE( rev	tmp3, data1 )
> @@ -288,6 +290,7 @@ CPU_BE( orr	syndrome, diff, has_nul )
> 	lsr	data1, data1, #56
> 	sub	result, data1, data2, lsr #56
> 	ret
> +	sb
> 
> .Lremain8:
> 	/* Limit % 8 == 0 => all bytes significant.  */
> @@ -303,8 +306,10 @@ CPU_BE( orr	syndrome, diff, has_nul )
> 	b.eq	.Ltiny8proc
> 	sub	result, data1, data2
> 	ret
> +	sb
> 
> .Lret0:
> 	mov	result, #0
> 	ret
> +	sb
> ENDPROC(strncmp)
> diff --git a/xen/arch/arm/arm64/lib/strnlen.S b/xen/arch/arm/arm64/lib/strnlen.S
> index 81c8e8b54ea9..c59467bf17e1 100644
> --- a/xen/arch/arm/arm64/lib/strnlen.S
> +++ b/xen/arch/arm/arm64/lib/strnlen.S
> @@ -126,6 +126,7 @@ CPU_BE( bic	has_nul2, tmp1, tmp2 )
> 	cmp	len, limit
> 	csel	len, len, limit, ls     /* Return the lower value.  */
> 	ret
> +	sb
> 
> .Lmisaligned:
> 	/*
> @@ -168,4 +169,5 @@ CPU_LE( lsr	tmp2, tmp2, tmp4 )	/* Shift (tmp1 & 63).  */
> .Lhit_limit:
> 	mov	len, limit
> 	ret
> +	sb
> ENDPROC(strnlen)
> diff --git a/xen/arch/arm/arm64/lib/strrchr.S b/xen/arch/arm/arm64/lib/strrchr.S
> index 07059983f828..fc8b6e446071 100644
> --- a/xen/arch/arm/arm64/lib/strrchr.S
> +++ b/xen/arch/arm/arm64/lib/strrchr.S
> @@ -17,6 +17,8 @@
>  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>  */
> 
> +#include <asm/macros.h>
> +
> /*
>  * Find the last occurrence of a character in a string.
>  *
> @@ -37,4 +39,5 @@ ENTRY(strrchr)
> 	b	1b
> 2:	mov	x0, x3
> 	ret
> +	sb
> ENDPROC(strrchr)
> diff --git a/xen/arch/arm/arm64/smc.S b/xen/arch/arm/arm64/smc.S
> index b0752be57e8f..e0a3106dd7ec 100644
> --- a/xen/arch/arm/arm64/smc.S
> +++ b/xen/arch/arm/arm64/smc.S
> @@ -30,3 +30,4 @@ ENTRY(__arm_smccc_1_0_smc)
>         stp     x2, x3, [x4, #SMCCC_RES_a2]
> 1:
>         ret
> +        sb
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-18 17:06             ` Bertrand Marquis
@ 2020-08-18 17:34               ` Julien Grall
  2020-08-19  7:59                 ` Bertrand Marquis
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-08-18 17:34 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, security, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall, nd

Hi Bertrand,

On 18/08/2020 18:06, Bertrand Marquis wrote:
> 
> 
>> On 18 Aug 2020, at 17:43, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 18/08/2020 17:35, Bertrand Marquis wrote:
>>> Hi Julien,
>>
>> Hi Bertrand,
>>
>>> Somehow we stopped on this thread and you did already most of the work so I think we should try to finish what you started
>>
>> Sorry this fell-through the cracks. I have a new version for patch #1, but not yet patch #2.
> 
> No problem this came back while trying to reduce my todolist stack :-)
> 
>>
>> I am still debating with myself where the speculation barrier should be added after the SMC :).
> 
> I think that we should unless the SMC is in the context switch path (as all other calls should not have a performance impact).
I will introduce *_unsafe() helpers that will not contain the speculation 
barrier. They could then be used in the places where we think the barrier 
is unnecessary.
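As a rough illustration only (names and shape are hypothetical, not the final interface), the split could boil down to a pair of wrappers where only the default one carries the barrier:

            /* default: hardened SMC call, barrier right after it */
            .macro  smccc_call_safe
            smc     #0
            sb
            .endm

            /* _unsafe flavour: no barrier, reserved for audited
               performance-critical call sites */
            .macro  smccc_call_unsafe
            smc     #0
            .endm

The real helpers would more likely be function-level variants (e.g. an _unsafe suffix on the SMCCC entry points), but the principle is the same.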

> 
>>
>>>> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
>>>>
>>>> On 17/06/2020 17:23, Julien Grall wrote:
>>>>> Hi,
>>>>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>>>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>>
>>>>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>>>>> speculative accesses to memory before processing the return.
>>>>>>>
>>>>>>> There is no known gadget available after the RET instruction today.
>>>>>>> However some of the registers (such as in check_pending_guest_serror())
>>>>>>> may contain a value provided the guest.
>>>>>>                                 ^ by
>>>>>>
>>>>>>
>>>>>>> In order to harden the code, it would be better to add a speculation
>>>>>>> barrier after each RET instruction. The performance is meant to be
>>>>>>> negligeable as the speculation barrier is not meant to be archicturally
>>>>>>> executed.
>>>>>>>
>>>>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>>>>> possible to speculate after the instruction, add preventively a
>>>>>>> speculation barrier after it as well.
>>>>>>>
>>>>>>> This is part of the work to mitigate straight-line speculation.
>>>>>>>
>>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>>
>>>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>>>
>>>>>> I did a compile-test on the patch too.
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> I am still unsure whether we preventively should add a speculation barrier
>>>>>>> preventively after all the RET instructions in arm*/lib/. The smc call be
>>>>>>> taken care in a follow-up patch.
>>>>>>
>>>>>> SMC is great to have but it seems to be overkill to do the ones under
>>>>>> lib/.
>>>>>  From my understanding, the compiler will add a speculation barrier preventively after each 'ret' when the mitigation are turned on.So it feels to me we want to follow the same approach.
>>>>> Obviously, we can avoid them but I would like to have a justification for not adding them (nothing is overkilled against speculation ;)).
>>>>
>>>> I finally found some time to look at arm*/lib in more details. Some of the helpers can definitely be called with guest inputs.
>>>>
>>>> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both 32-bit and 64-bit implementation, you will reach the end of the function memchr() with r2/w2 and r3/w3 (it contains a character from the buffer) controlled by the guest.
>>>>
>>>> As this is the only function in the unit, we don't know what will be the instructions right after RET. So it would be safer to add a speculation barrier there too.
>>> How about adding a speculation barrier directly in the ENDPROC macro ?
>>
>> This would unfortunately not cover all the cases because you can return in the middle of the function. I will have a look to see if we can leverage it.
> 
> I agree that it would not solve all of them but a big part would be solved by it.
> An other solution might be to have a RETURN macro encoded as "mov pc,lr; sb" and "ret; sb”.

This is a bit messy on Arm32 because not all the returns use "mov pc, lr". 
Anyway, I will explore the two approaches.
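For the ENDPROC route, a minimal sketch (building on top of the existing ENDPROC rather than assuming its exact definition, and with a made-up name) could be:

            /* like ENDPROC, but emit a speculation barrier right after
               the (usually final) return instruction of the function */
            #define ENDPROC_SB(name)        \
                    sb;                     \
                    ENDPROC(name)

As said above, this only covers the return at the end of the function; returns in the middle would still need an explicit sb or a RETURN-style macro.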

> 
> The patch sounds good, i just need to find a way to analyse if you missed a ret or not which is not easy with such a patch :-)

I did struggle to find all the instances. The directory lib/ is actually 
quite difficult to go through on 32-bit because there are multiple ways to 
implement a return.
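For reference, the 32-bit return idioms visible in the hunks of this patch alone include:

            mov     pc, lr                       @ plain return
            ldr     pc, [sp], #8                 @ return via a load from the stack (Ldiv0)
            ldmia   r4, {r4 - sl, fp, sp, pc}    @ load-multiple ending in pc (__context_switch)

and ARM code in general can also return with "bx lr", so a simple grep for a single pattern is not enough.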

Finding a way to reduce the number of manual speculation barriers would 
definitely be helpful. I will try to revise the patch during this week.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-18 17:34               ` Julien Grall
@ 2020-08-19  7:59                 ` Bertrand Marquis
  2020-08-19  8:02                   ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Bertrand Marquis @ 2020-08-19  7:59 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, security, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall, nd



> On 18 Aug 2020, at 18:34, Julien Grall <julien@xen.org> wrote:
> 
> Hi Bertrand,
> 
> On 18/08/2020 18:06, Bertrand Marquis wrote:
>>> On 18 Aug 2020, at 17:43, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 18/08/2020 17:35, Bertrand Marquis wrote:
>>>> Hi Julien,
>>> 
>>> Hi Bertrand,
>>> 
>>>> Somehow we stopped on this thread and you did already most of the work so I think we should try to finish what you started
>>> 
>>> Sorry this fell-through the cracks. I have a new version for patch #1, but not yet patch #2.
>> No problem this came back while trying to reduce my todolist stack :-)
>>> 
>>> I am still debating with myself where the speculation barrier should be added after the SMC :).
>> I think that we should unless the SMC is in the context switch path (as all other calls should not have a performance impact).
> I will introduce *_unsafe() helpers that will not contain the speculation barrier. It could then be used in place where we think the barrier is unnecessary.

Great idea, this will make it clear when reading the code, reducing the need for documentation.

> 
>>> 
>>>>> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
>>>>> 
>>>>> On 17/06/2020 17:23, Julien Grall wrote:
>>>>>> Hi,
>>>>>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>>>>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>>> 
>>>>>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>>>>>> speculative accesses to memory before processing the return.
>>>>>>>> 
>>>>>>>> There is no known gadget available after the RET instruction today.
>>>>>>>> However some of the registers (such as in check_pending_guest_serror())
>>>>>>>> may contain a value provided the guest.
>>>>>>>                                ^ by
>>>>>>> 
>>>>>>> 
>>>>>>>> In order to harden the code, it would be better to add a speculation
>>>>>>>> barrier after each RET instruction. The performance is meant to be
>>>>>>>> negligeable as the speculation barrier is not meant to be archicturally
>>>>>>>> executed.
>>>>>>>> 
>>>>>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>>>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>>>>>> possible to speculate after the instruction, add preventively a
>>>>>>>> speculation barrier after it as well.
>>>>>>>> 
>>>>>>>> This is part of the work to mitigate straight-line speculation.
>>>>>>>> 
>>>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>>> 
>>>>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>>>> 
>>>>>>> I did a compile-test on the patch too.
>>>>>>> 
>>>>>>> 
>>>>>>>> ---
>>>>>>>> 
>>>>>>>> I am still unsure whether we preventively should add a speculation barrier
>>>>>>>> preventively after all the RET instructions in arm*/lib/. The smc call be
>>>>>>>> taken care in a follow-up patch.
>>>>>>> 
>>>>>>> SMC is great to have but it seems to be overkill to do the ones under
>>>>>>> lib/.
>>>>>> From my understanding, the compiler will add a speculation barrier preventively after each 'ret' when the mitigation are turned on.So it feels to me we want to follow the same approach.
>>>>>> Obviously, we can avoid them but I would like to have a justification for not adding them (nothing is overkilled against speculation ;)).
>>>>> 
>>>>> I finally found some time to look at arm*/lib in more details. Some of the helpers can definitely be called with guest inputs.
>>>>> 
>>>>> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both 32-bit and 64-bit implementation, you will reach the end of the function memchr() with r2/w2 and r3/w3 (it contains a character from the buffer) controlled by the guest.
>>>>> 
>>>>> As this is the only function in the unit, we don't know what will be the instructions right after RET. So it would be safer to add a speculation barrier there too.
>>>> How about adding a speculation barrier directly in the ENDPROC macro ?
>>> 
>>> This would unfortunately not cover all the cases because you can return in the middle of the function. I will have a look to see if we can leverage it.
>> I agree that it would not solve all of them but a big part would be solved by it.
>> An other solution might be to have a RETURN macro encoded as "mov pc,lr; sb" and "ret; sb”.
> 
> This is a bit messy on Arm32 because not all the return are using "mov pc,lr".  Anyway, I will explore the two approaches.

ack.

> 
>> The patch sounds good, i just need to find a way to analyse if you missed a ret or not which is not easy with such a patch :-)
> 
> I did struggle to find all the instances. The directory lib/ is actually quite difficult to go through on 32-bit because they are multiple way to
> implement a return.

Some parts of the assembly code might also benefit from using a few branches instead of mid-function returns to make the code clearer, but this might impact
performance a bit.
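Purely as an illustration of that shape (a sketch only, loosely based on the findbit/memchr style code in this patch), two exit paths could share a single barrier-protected return:

            1:      mov     r0, #0          @ not found
                    b       .Lreturn
            2:      mov     r0, r1          @ no free bits
            .Lreturn:
                    mov     pc, lr
                    sb

At the cost of an extra branch, a single sb per function would then be enough.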

> 
> Finding a way to reduce manual speculation barrier would definitely be helpful. I will try to revise the patch during this week.

OK, on my side I will list the returns to be able to review the final patch more easily.

Cheers
Bertrand


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-19  7:59                 ` Bertrand Marquis
@ 2020-08-19  8:02                   ` Jan Beulich
  2020-08-19  8:50                     ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2020-08-19  8:02 UTC (permalink / raw)
  To: Bertrand Marquis, Julien Grall
  Cc: Stefano Stabellini, Paul Durrant, Xen-devel, Volodymyr Babchuk,
	Andre Przywara, Julien Grall, nd

On 19.08.2020 09:59, Bertrand Marquis wrote:
>> On 18 Aug 2020, at 18:34, Julien Grall <julien@xen.org> wrote:

Btw - is there any need for this thread to be cross posted to both
xen-devel@ and security@? (I've dropped the latter here.)

Jan

>> On 18/08/2020 18:06, Bertrand Marquis wrote:
>>>> On 18 Aug 2020, at 17:43, Julien Grall <julien@xen.org> wrote:
>>>>
>>>>
>>>>
>>>> On 18/08/2020 17:35, Bertrand Marquis wrote:
>>>>> Hi Julien,
>>>>
>>>> Hi Bertrand,
>>>>
>>>>> Somehow we stopped on this thread and you did already most of the work so I think we should try to finish what you started
>>>>
>>>> Sorry this fell-through the cracks. I have a new version for patch #1, but not yet patch #2.
>>> No problem this came back while trying to reduce my todolist stack :-)
>>>>
>>>> I am still debating with myself where the speculation barrier should be added after the SMC :).
>>> I think that we should unless the SMC is in the context switch path (as all other calls should not have a performance impact).
>> I will introduce *_unsafe() helpers that will not contain the speculation barrier. It could then be used in place where we think the barrier is unnecessary.
> 
> great idea, this will make it clear by reading the code reducing the need of documentation.
> 
>>
>>>>
>>>>>> On 4 Jul 2020, at 17:07, Julien Grall <julien@xen.org> wrote:
>>>>>>
>>>>>> On 17/06/2020 17:23, Julien Grall wrote:
>>>>>>> Hi,
>>>>>>> On 16/06/2020 22:24, Stefano Stabellini wrote:
>>>>>>>> On Tue, 16 Jun 2020, Julien Grall wrote:
>>>>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>>>>
>>>>>>>>> Some CPUs can speculate past a RET instruction and potentially perform
>>>>>>>>> speculative accesses to memory before processing the return.
>>>>>>>>>
>>>>>>>>> There is no known gadget available after the RET instruction today.
>>>>>>>>> However some of the registers (such as in check_pending_guest_serror())
>>>>>>>>> may contain a value provided the guest.
>>>>>>>>                                ^ by
>>>>>>>>
>>>>>>>>
>>>>>>>>> In order to harden the code, it would be better to add a speculation
>>>>>>>>> barrier after each RET instruction. The performance is meant to be
>>>>>>>>> negligeable as the speculation barrier is not meant to be archicturally
>>>>>>>>> executed.
>>>>>>>>>
>>>>>>>>> Note that on arm32, the ldmia instruction will act as a return from the
>>>>>>>>> function __context_switch(). While the whitepaper doesn't suggest it is
>>>>>>>>> possible to speculate after the instruction, add preventively a
>>>>>>>>> speculation barrier after it as well.
>>>>>>>>>
>>>>>>>>> This is part of the work to mitigate straight-line speculation.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>>>>
>>>>>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>>>>>
>>>>>>>> I did a compile-test on the patch too.
>>>>>>>>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> I am still unsure whether we preventively should add a speculation barrier
>>>>>>>>> preventively after all the RET instructions in arm*/lib/. The smc call be
>>>>>>>>> taken care in a follow-up patch.
>>>>>>>>
>>>>>>>> SMC is great to have but it seems to be overkill to do the ones under
>>>>>>>> lib/.
>>>>>>> From my understanding, the compiler will add a speculation barrier preventively after each 'ret' when the mitigation are turned on.So it feels to me we want to follow the same approach.
>>>>>>> Obviously, we can avoid them but I would like to have a justification for not adding them (nothing is overkilled against speculation ;)).
>>>>>>
>>>>>> I finally found some time to look at arm*/lib in more details. Some of the helpers can definitely be called with guest inputs.
>>>>>>
>>>>>> For instance, memchr() is called from hypfs_get_path_user() with the 3rd argument controlled by the guest. In both 32-bit and 64-bit implementation, you will reach the end of the function memchr() with r2/w2 and r3/w3 (it contains a character from the buffer) controlled by the guest.
>>>>>>
>>>>>> As this is the only function in the unit, we don't know what will be the instructions right after RET. So it would be safer to add a speculation barrier there too.
>>>>> How about adding a speculation barrier directly in the ENDPROC macro ?
>>>>
>>>> This would unfortunately not cover all the cases because you can return in the middle of the function. I will have a look to see if we can leverage it.
>>> I agree that it would not solve all of them but a big part would be solved by it.
>>> An other solution might be to have a RETURN macro encoded as "mov pc,lr; sb" and "ret; sb”.
>>
>> This is a bit messy on Arm32 because not all the return are using "mov pc,lr".  Anyway, I will explore the two approaches.
> 
> ack.
> 
>>
>>> The patch sounds good, i just need to find a way to analyse if you missed a ret or not which is not easy with such a patch :-)
>>
>> I did struggle to find all the instances. The directory lib/ is actually quite difficult to go through on 32-bit because they are multiple way to
>> implement a return.
> 
> some part of the assembler code might benefit from a few branch instead of middle return to make the code clearer also but this might impact
> a bit the performances.
> 
>>
>> Finding a way to reduce manual speculation barrier would definitely be helpful. I will try to revise the patch during this week.
> 
> ok i will on my side list the returns to be able to review the final patch more easily.
> 
> Cheers
> Bertrand
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-19  8:02                   ` Jan Beulich
@ 2020-08-19  8:50                     ` Julien Grall
  2020-08-19  8:58                       ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2020-08-19  8:50 UTC (permalink / raw)
  To: Jan Beulich, Bertrand Marquis
  Cc: Stefano Stabellini, Paul Durrant, Xen-devel, Volodymyr Babchuk,
	Andre Przywara, Julien Grall, security@xenproject.org

(Adding back security)

Hi Jan,

On 19/08/2020 09:02, Jan Beulich wrote:
> On 19.08.2020 09:59, Bertrand Marquis wrote:
>>> On 18 Aug 2020, at 18:34, Julien Grall <julien@xen.org> wrote:
> 
> Btw - is there any need for this thread to be cross posted to both
> xen-devel@ and security@? (I've dropped the latter here.)

 From the cover letter:

"The patch series is directly sent on the mailing list as the
security team has been aware of the issues after the whitepaper was
publicly released."

This is technically still a security issue, except it is discussed in 
the open as it is a zero-day for us. An XSA will have to be issued in 
due course. Hence security@ is added to keep track of the conversation.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-19  8:50                     ` Julien Grall
@ 2020-08-19  8:58                       ` Jan Beulich
  2020-08-19  9:56                         ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2020-08-19  8:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: Bertrand Marquis, Stefano Stabellini, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall

On 19.08.2020 10:50, Julien Grall wrote:
> On 19/08/2020 09:02, Jan Beulich wrote:
>> On 19.08.2020 09:59, Bertrand Marquis wrote:
>>>> On 18 Aug 2020, at 18:34, Julien Grall <julien@xen.org> wrote:
>>
>> Btw - is there any need for this thread to be cross posted to both
>> xen-devel@ and security@? (I've dropped the latter here.)
> 
>  From the cover letter:
> 
> "The patch series is directly sent on the mailing list as the
> security team has been aware of the issues after the whitepaper was
> publicly released."
> 
> This is technically still a security issue except this is discussed in 
> the open as it is a zero day for us. An XSA will have to be issued in 
> due course. Hence why security@ is added to keep track of the conversation.

I thought cross-posting is generally considered bad practice. I can't
see what extra "keeping track of the conversation" gets added by CCing
security@: Everything will already be recorded in the list archives of
xen-devel.

For some background on my original question: the cross-posting confuses
the rules I have set up in my mail client - the mail gets moved back
and forth between the two distinct folders for each of the lists. I
haven't yet been able to figure out a non-clumsy way to avoid this.
The mail client we used until about a year ago did not have any issue
with the same scenario.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction
  2020-08-19  8:58                       ` Jan Beulich
@ 2020-08-19  9:56                         ` Julien Grall
  0 siblings, 0 replies; 20+ messages in thread
From: Julien Grall @ 2020-08-19  9:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Bertrand Marquis, Stefano Stabellini, Paul Durrant, Xen-devel,
	Volodymyr Babchuk, Andre Przywara, Julien Grall

On 19/08/2020 09:58, Jan Beulich wrote:
> On 19.08.2020 10:50, Julien Grall wrote:
>> On 19/08/2020 09:02, Jan Beulich wrote:
>>> On 19.08.2020 09:59, Bertrand Marquis wrote:
>>>>> On 18 Aug 2020, at 18:34, Julien Grall <julien@xen.org> wrote:
>>>
>>> Btw - is there any need for this thread to be cross posted to both
>>> xen-devel@ and security@? (I've dropped the latter here.)
>>
>>   From the cover letter:
>>
>> "The patch series is directly sent on the mailing list as the
>> security team has been aware of the issues after the whitepaper was
>> publicly released."
>>
>> This is technically still a security issue except this is discussed in
>> the open as it is a zero day for us. An XSA will have to be issued in
>> due course. Hence why security@ is added to keep track of the conversation.
> 
> I thought cross-posting is generally considered bad practice. I can't
> see what extra "keeping track of the conversation" gets added by CCing
> security@: Everything will already be recorded in the list archives of
> xen-devel.

The "keep track of the conversation" wasn't in the sense of recording, 
but of being aware that there is a pending 0-day discussion and taking 
action once the discussion has settled.

> 
> For some background of my original question: The cross posting confuses
> the rules I have set up in my mail client - the mail gets moved back
> and forth between the two distinct folders for each of the lists. I
> haven't been able to figure a non-clumsy way yet to avoid this
> happening. The mail client we used to use until about a year ago did
> not have any issue with the same scenario.

I am sorry to hear your e-mail client is not capable of doing basic 
filtering. However, this is not the first place where we do that (see 
Linux or QEMU patches).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-08-19  9:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-16 17:59 [PATCH 0/2] xen/arm: Mitigate straight-line speculation Julien Grall
2020-06-16 17:59 ` [PATCH 1/2] xen/arm: entry: Place a speculation barrier following an ret instruction Julien Grall
2020-06-16 21:24   ` Stefano Stabellini
2020-06-17 16:23     ` Julien Grall
2020-07-04 16:07       ` Julien Grall
2020-08-18 16:35         ` Bertrand Marquis
2020-08-18 16:43           ` Julien Grall
2020-08-18 17:06             ` Bertrand Marquis
2020-08-18 17:34               ` Julien Grall
2020-08-19  7:59                 ` Bertrand Marquis
2020-08-19  8:02                   ` Jan Beulich
2020-08-19  8:50                     ` Julien Grall
2020-08-19  8:58                       ` Jan Beulich
2020-08-19  9:56                         ` Julien Grall
2020-06-16 17:59 ` [PATCH 2/2] xen/arm: Mitigate straight-line speculation for SMC call Julien Grall
2020-06-16 21:34   ` Stefano Stabellini
2020-06-16 21:57     ` Julien Grall
2020-06-16 23:16       ` Andrew Cooper
2020-06-16 23:27         ` Stefano Stabellini
2020-06-18 17:46   ` Julien Grall
