All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
@ 2022-03-08 14:01 Andrew Cooper
  2022-03-08 14:37 ` Jan Beulich
  2022-03-10  7:30 ` Doebel, Bjoern
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-03-08 14:01 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Bjoern Doebel, Michael Kurth, Martin Pohlack

For livepatching, we need to look at a potentially clobbered function and
determine whether it used to have an ENDBR64 instruction.

Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
the was_endbr64() predicate.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Bjoern Doebel <doebel@amazon.de>
CC: Michael Kurth <mku@amazon.de>
CC: Martin Pohlack <mpohlack@amazon.de>

Bjoern: For the livepatching code, I think you want:

  if ( is_endbr64(...) || was_endbr64(...) )
      needed += ENDBR64_LEN;
---
 xen/arch/x86/alternative.c       | 10 +++++++++-
 xen/arch/x86/include/asm/endbr.h | 12 ++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index d41eeef1bcaf..ffb1b1d960c8 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -362,7 +362,15 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
             if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
                 continue;
 
-            add_nops(ptr, ENDBR64_LEN);
+            /*
+             * Can't use add_nops() here.  ENDBR64_POISON is specifically
+             * different to NOP4 so it can be spotted after the fact.
+             *
+             * All CET-capable hardware uses P6 NOPS (no need to plumb through
+             * ideal_nops), and doesn't require a branch to synchronise the
+             * instruction stream.
+             */
+            memcpy(ptr, ENDBR64_POISON, ENDBR64_LEN);
             clobbered++;
         }
 
diff --git a/xen/arch/x86/include/asm/endbr.h b/xen/arch/x86/include/asm/endbr.h
index 6090afeb0bd8..5e1e55cb467d 100644
--- a/xen/arch/x86/include/asm/endbr.h
+++ b/xen/arch/x86/include/asm/endbr.h
@@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
     *(uint32_t *)ptr = gen_endbr64();
 }
 
+/*
+ * After clobbering ENDBR64, we may need to confirm that the site used to
+ * contain an ENDBR64 instruction.  Use an encoding which isn't the default
+ * P6_NOP4.
+ */
+#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
+
+static inline bool was_endbr64(const void *ptr)
+{
+    return *(const uint32_t *)ptr == 0x001f0f66;
+}
+
 #endif /* XEN_ASM_ENDBR_H */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-08 14:01 [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber Andrew Cooper
@ 2022-03-08 14:37 ` Jan Beulich
  2022-03-08 15:19   ` Andrew Cooper
  2022-03-10  7:30 ` Doebel, Bjoern
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2022-03-08 14:37 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Bjoern Doebel, Michael Kurth, Martin Pohlack, Xen-devel

On 08.03.2022 15:01, Andrew Cooper wrote:
> For livepatching, we need to look at a potentially clobbered function and
> determine whether it used to have an ENDBR64 instruction.
> 
> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
> the was_endbr64() predicate.

Did you consider using ENDBR32 for this purpose? I'm worried that
the pattern you picked is still too close to what might be used
(down the road) by a tool chain. If ENDBR32 isn't suitable for
some reason, how about "nop %sp" or "nopw (%rsp)" (and then maybe
even "data16" substituted by rex, cs, ds, es, or ss)?

One neat thing about ENDBR32 would be that you wouldn't even
need memcpy() - you'd merely flip the low main opcode bit.

> Bjoern: For the livepatching code, I think you want:
> 
>   if ( is_endbr64(...) || was_endbr64(...) )
>       needed += ENDBR64_LEN;

Looks like I didn't fully understand the problem then from your
initial description. The adjustment here (and the one needed in
Björn's patch) is to compensate for the advancing of the
targets of altcalls past the ENDBR?

> --- a/xen/arch/x86/include/asm/endbr.h
> +++ b/xen/arch/x86/include/asm/endbr.h
> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>      *(uint32_t *)ptr = gen_endbr64();
>  }
>  
> +/*
> + * After clobbering ENDBR64, we may need to confirm that the site used to
> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
> + * P6_NOP4.
> + */
> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */

In case this remains as is - did you mean "opsz" instead of "osp"?
But this really is "nopw (%rax)" anyway.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-08 14:37 ` Jan Beulich
@ 2022-03-08 15:19   ` Andrew Cooper
  2022-03-08 15:36     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2022-03-08 15:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monne, Wei Liu, Bjoern Doebel, Michael Kurth,
	Martin Pohlack, Xen-devel

On 08/03/2022 14:37, Jan Beulich wrote:
> On 08.03.2022 15:01, Andrew Cooper wrote:
>> For livepatching, we need to look at a potentially clobbered function and
>> determine whether it used to have an ENDBR64 instruction.
>>
>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
>> the was_endbr64() predicate.
> Did you consider using ENDBR32 for this purpose?

No, and no because that's very short sighted.  Even 4 non-nops would be
better than ENDBR32, because they wouldn't create actually-usable
codepaths in corner cases we care to exclude.

> I'm worried that
> the pattern you picked is still too close to what might be used
> (down the road) by a tool chain.

This is what Linux are doing too, but no - I'm not worried.  The
encoding isn't the only protection; toolchains also have no reason to
put a nop4 at the head of functions; nop5 is the common one to find.

> One neat thing about ENDBR32 would be that you wouldn't even
> need memcpy() - you'd merely flip the low main opcode bit.

Not relevant.  You're taking the SMC pipeline hit for any sized write,
and a single movl is far less cryptic.

>> Bjoern: For the livepatching code, I think you want:
>>
>>   if ( is_endbr64(...) || was_endbr64(...) )
>>       needed += ENDBR64_LEN;
> Looks like I didn't fully understand the problem then from your
> initial description. The adjustment here (and the one needed in
> Björn's patch) is to compensate for the advancing of the
> targets of altcalls past the ENDBR?

No.  Consider this scenario:

callee:
    endbr64
    ...

altcall:
    call *foo(%rip)

During boot, we rewrite altcall to be `call callee+4` and turn endbr64
into nops, so it now looks like:

callee:
    nop4
    ...

altcall:
    call callee+4

Then we want to livepatch callee to callee_new, so we get

callee_new:
    endbr64
    ...

in the livepatch itself.

Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).

The livepatch logic calling is_endbr(callee) doesn't work because it's
now a nop4, which is why it needs a was_endbr64(callee) too.

>
>> --- a/xen/arch/x86/include/asm/endbr.h
>> +++ b/xen/arch/x86/include/asm/endbr.h
>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>      *(uint32_t *)ptr = gen_endbr64();
>>  }
>>  
>> +/*
>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>> + * P6_NOP4.
>> + */
>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
> In case this remains as is - did you mean "opsz" instead of "osp"?
> But this really is "nopw (%rax)" anyway.

Oh, osp is the nasm name.  I'll switch to nopw.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-08 15:19   ` Andrew Cooper
@ 2022-03-08 15:36     ` Jan Beulich
  2022-03-08 16:03       ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2022-03-08 15:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monne, Wei Liu, Bjoern Doebel, Michael Kurth,
	Martin Pohlack, Xen-devel

On 08.03.2022 16:19, Andrew Cooper wrote:
> On 08/03/2022 14:37, Jan Beulich wrote:
>> On 08.03.2022 15:01, Andrew Cooper wrote:
>>> For livepatching, we need to look at a potentially clobbered function and
>>> determine whether it used to have an ENDBR64 instruction.
>>>
>>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
>>> the was_endbr64() predicate.
>> Did you consider using ENDBR32 for this purpose?
> 
> No, and no because that's very short sighted.  Even 4 non-nops would be
> better than ENDBR32, because they wouldn't create actually-usable
> codepaths in corner cases we care to exclude.

Well - I thought of ENDBR32 because elsewhere you said we don't
need to be worried about any byte sequence resembling that insn,
since for it to become "usable" an attacker would first need to
have managed to manufacture a 32-bit ring0 code segment. Now you
say effectively the opposite.

>> I'm worried that
>> the pattern you picked is still too close to what might be used
>> (down the road) by a tool chain.
> 
> This is what Linux are doing too, but no - I'm not worried.  The
> encoding isn't the only protection; toolchains also have no reason to
> put a nop4 at the head of functions; nop5 is the common one to find.

Well, okay - let's hope for the best then.

>>> Bjoern: For the livepatching code, I think you want:
>>>
>>>   if ( is_endbr64(...) || was_endbr64(...) )
>>>       needed += ENDBR64_LEN;
>> Looks like I didn't fully understand the problem then from your
>> initial description. The adjustment here (and the one needed in
>> Björn's patch) is to compensate for the advancing of the
>> targets of altcalls past the ENDBR?
> 
> No.  Consider this scenario:
> 
> callee:
>     endbr64
>     ...
> 
> altcall:
>     call *foo(%rip)
> 
> During boot, we rewrite altcall to be `call callee+4` and turn endbr64
> into nops, so it now looks like:
> 
> callee:
>     nop4
>     ...
> 
> altcall:
>     call callee+4
> 
> Then we want to livepatch callee to callee_new, so we get
> 
> callee_new:
>     endbr64
>     ...
> 
> in the livepatch itself.
> 
> Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).
> 
> The livepatch logic calling is_endbr(callee) doesn't work because it's
> now a nop4, which is why it needs a was_endbr64(callee) too.

Sounds like exactly what I was thinking of. Perhaps my description
wasn't sufficiently clear / unambiguous then.

>>> --- a/xen/arch/x86/include/asm/endbr.h
>>> +++ b/xen/arch/x86/include/asm/endbr.h
>>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>>      *(uint32_t *)ptr = gen_endbr64();
>>>  }
>>>  
>>> +/*
>>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>>> + * P6_NOP4.
>>> + */
>>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>> In case this remains as is - did you mean "opsz" instead of "osp"?
>> But this really is "nopw (%rax)" anyway.
> 
> Oh, osp is the nasm name.  I'll switch to nopw.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-08 15:36     ` Jan Beulich
@ 2022-03-08 16:03       ` Andrew Cooper
  2022-03-10 18:42         ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2022-03-08 16:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monne, Wei Liu, Bjoern Doebel, Michael Kurth,
	Martin Pohlack, Xen-devel

On 08/03/2022 15:36, Jan Beulich wrote:
> On 08.03.2022 16:19, Andrew Cooper wrote:
>> On 08/03/2022 14:37, Jan Beulich wrote:
>>> On 08.03.2022 15:01, Andrew Cooper wrote:
>>>> For livepatching, we need to look at a potentially clobbered function and
>>>> determine whether it used to have an ENDBR64 instruction.
>>>>
>>>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
>>>> the was_endbr64() predicate.
>>> Did you consider using ENDBR32 for this purpose?
>> No, and no because that's very short sighted.  Even 4 non-nops would be
>> better than ENDBR32, because they wouldn't create actually-usable
>> codepaths in corner cases we care to exclude.
> Well - I thought of ENDBR32 because elsewhere you said we don't
> need to be worried about any byte sequence resembling that insn,
> since for it to become "usable" an attacker would first need to
> have managed to manufacture a 32-bit ring0 code segment. Now you
> say effectively the opposite.

We've got ~0 risk of having any embedded ENDBR32's because we never
refer to the opcode, and therefore adding 2x 0.7s delay to scan the
binary on build is a waste.  If the check were free, it would be a
different matter.

At any point, if we were to introduce references to ENDBR32, we'd want
to start scanning for embedded sequences.

But at no point do we want to intentionally remove our defence in depth
created by both having no CS32 code segment, and no ENDBR32 instructions.

>
>>> I'm worried that
>>> the pattern you picked is still too close to what might be used
>>> (down the road) by a tool chain.
>> This is what Linux are doing too, but no - I'm not worried.  The
>> encoding isn't the only protection; toolchains also have no reason to
>> put a nop4 at the head of functions; nop5 is the common one to find.
> Well, okay - let's hope for the best then.
>
>>>> Bjoern: For the livepatching code, I think you want:
>>>>
>>>>   if ( is_endbr64(...) || was_endbr64(...) )
>>>>       needed += ENDBR64_LEN;
>>> Looks like I didn't fully understand the problem then from your
>>> initial description. The adjustment here (and the one needed in
>>> Björn's patch) is to compensate for the advancing of the
>>> targets of altcalls past the ENDBR?
>> No.  Consider this scenario:
>>
>> callee:
>>     endbr64
>>     ...
>>
>> altcall:
>>     call *foo(%rip)
>>
>> During boot, we rewrite altcall to be `call callee+4` and turn endbr64
>> into nops, so it now looks like:
>>
>> callee:
>>     nop4
>>     ...
>>
>> altcall:
>>     call callee+4
>>
>> Then we want to livepatch callee to callee_new, so we get
>>
>> callee_new:
>>     endbr64
>>     ...
>>
>> in the livepatch itself.
>>
>> Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).
>>
>> The livepatch logic calling is_endbr(callee) doesn't work because it's
>> now a nop4, which is why it needs a was_endbr64(callee) too.
> Sounds like exactly what I was thinking of. Perhaps my description
> wasn't sufficiently clear / unambiguous then.

Ah yes - I think I did misinterpret what you wrote.  I hope everything
is clear now.

>
>>>> --- a/xen/arch/x86/include/asm/endbr.h
>>>> +++ b/xen/arch/x86/include/asm/endbr.h
>>>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>>>      *(uint32_t *)ptr = gen_endbr64();
>>>>  }
>>>>  
>>>> +/*
>>>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>>>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>>>> + * P6_NOP4.
>>>> + */
>>>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>>> In case this remains as is - did you mean "opsz" instead of "osp"?
>>> But this really is "nopw (%rax)" anyway.
>> Oh, osp is the nasm name.  I'll switch to nopw.
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-08 14:01 [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber Andrew Cooper
  2022-03-08 14:37 ` Jan Beulich
@ 2022-03-10  7:30 ` Doebel, Bjoern
  1 sibling, 0 replies; 11+ messages in thread
From: Doebel, Bjoern @ 2022-03-10  7:30 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné, Wei Liu



On 08.03.22 15:01, Andrew Cooper wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> For livepatching, we need to look at a potentially clobbered function and
> determine whether it used to have an ENDBR64 instruction.
> 
> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
> the was_endbr64() predicate.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Bjoern Doebel <doebel@amazon.de>
> CC: Michael Kurth <mku@amazon.de>
> CC: Martin Pohlack <mpohlack@amazon.de>
> 
> Bjoern: For the livepatching code, I think you want:
> 
>    if ( is_endbr64(...) || was_endbr64(...) )
>        needed += ENDBR64_LEN;
> ---
>   xen/arch/x86/alternative.c       | 10 +++++++++-
>   xen/arch/x86/include/asm/endbr.h | 12 ++++++++++++
>   2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
> index d41eeef1bcaf..ffb1b1d960c8 100644
> --- a/xen/arch/x86/alternative.c
> +++ b/xen/arch/x86/alternative.c
> @@ -362,7 +362,15 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>               if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
>                   continue;
> 
> -            add_nops(ptr, ENDBR64_LEN);
> +            /*
> +             * Can't use add_nops() here.  ENDBR64_POISON is specifically
> +             * different to NOP4 so it can be spotted after the fact.
> +             *
> +             * All CET-capable hardware uses P6 NOPS (no need to plumb through
> +             * ideal_nops), and doesn't require a branch to synchronise the
> +             * instruction stream.
> +             */
> +            memcpy(ptr, ENDBR64_POISON, ENDBR64_LEN);
>               clobbered++;
>           }
> 
> diff --git a/xen/arch/x86/include/asm/endbr.h b/xen/arch/x86/include/asm/endbr.h
> index 6090afeb0bd8..5e1e55cb467d 100644
> --- a/xen/arch/x86/include/asm/endbr.h
> +++ b/xen/arch/x86/include/asm/endbr.h
> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>       *(uint32_t *)ptr = gen_endbr64();
>   }
> 
> +/*
> + * After clobbering ENDBR64, we may need to confirm that the site used to
> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
> + * P6_NOP4.
> + */
> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
> +
> +static inline bool was_endbr64(const void *ptr)
> +{
> +    return *(const uint32_t *)ptr == 0x001f0f66;
> +}
> +
>   #endif /* XEN_ASM_ENDBR_H */
> --
> 2.11.0

Reviewed-by: Bjoern Doebel <doebel@amazon.de>



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-08 16:03       ` Andrew Cooper
@ 2022-03-10 18:42         ` Andrew Cooper
  2022-03-11  7:18           ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2022-03-10 18:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monne, Wei Liu, Bjoern Doebel, Michael Kurth,
	Martin Pohlack, Xen-devel

On 08/03/2022 16:03, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/include/asm/endbr.h
>>>>> +++ b/xen/arch/x86/include/asm/endbr.h
>>>>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>>>>      *(uint32_t *)ptr = gen_endbr64();
>>>>>  }
>>>>>  
>>>>> +/*
>>>>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>>>>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>>>>> + * P6_NOP4.
>>>>> + */
>>>>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>>>> In case this remains as is - did you mean "opsz" instead of "osp"?
>>>> But this really is "nopw (%rax)" anyway.
>>> Oh, osp is the nasm name.  I'll switch to nopw.
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Thanks.

It does occur to me that we can extend check-endbr.sh for this.

diff --git a/xen/arch/x86/indirect-thunk.S b/xen/arch/x86/indirect-thunk.S
index 7cc22da0ef93..3baaf7ab4983 100644
--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -38,6 +38,7 @@
         .section .text.__x86_indirect_thunk_\reg, "ax", @progbits
 
 ENTRY(__x86_indirect_thunk_\reg)
+       nopw (%rax)
         ALTERNATIVE_2 __stringify(IND_THUNK_RETPOLINE \reg),              \
         __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
         __stringify(IND_THUNK_JMP \reg),    X86_FEATURE_IND_THUNK_JMP
diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
index 9799c451a18d..652ac8d0b983 100755
--- a/xen/tools/check-endbr.sh
+++ b/xen/tools/check-endbr.sh
@@ -67,7 +67,7 @@ eval $(${OBJDUMP} -j .text $1 -h |
 ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
 if $perl_re
 then
-    LC_ALL=C grep -aobP '\363\17\36\372' $TEXT_BIN
+    LC_ALL=C grep -aobP '\363\17\36\372|\x66\x0f\x1f\x00' $TEXT_BIN
 else
     grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN
 fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}'
> $ALL

yields:

check-endbr.sh xen-syms Fail: Found 15 embedded endbr64 instructions
0xffff82d040377f00: __x86_indirect_thunk_rax at
/local/xen.git/xen/arch/x86/indirect-thunk.S:55
0xffff82d040377f20: __x86_indirect_thunk_rcx at ??:?
0xffff82d040377f40: __x86_indirect_thunk_rdx at ??:?
0xffff82d040377f60: __x86_indirect_thunk_rbx at ??:?
0xffff82d040377f80: __x86_indirect_thunk_rbp at ??:?
0xffff82d040377fa0: __x86_indirect_thunk_rsi at ??:?
0xffff82d040377fc0: __x86_indirect_thunk_rdi at ??:?
0xffff82d040377fe0: __x86_indirect_thunk_r8 at ??:?
0xffff82d040378000: __x86_indirect_thunk_r9 at ??:?
0xffff82d040378020: __x86_indirect_thunk_r10 at ??:?
0xffff82d040378040: __x86_indirect_thunk_r11 at ??:?
0xffff82d040378060: __x86_indirect_thunk_r12 at ??:?
0xffff82d040378080: __x86_indirect_thunk_r13 at ??:?
0xffff82d0403780a0: __x86_indirect_thunk_r14 at ??:?
0xffff82d0403780c0: __x86_indirect_thunk_r15 at ??:?
...
check-endbr.sh xen.efi Fail: Found 15 embedded endbr64 instructions
0xffff82d040377f00: ?? at /local/xen.git/xen/arch/x86/indirect-thunk.S:55
0xffff82d040377f20: ?? at head.o:?
0xffff82d040377f40: ?? at head.o:?
0xffff82d040377f60: ?? at head.o:?
0xffff82d040377f80: ?? at head.o:?
0xffff82d040377fa0: ?? at head.o:?
0xffff82d040377fc0: ?? at head.o:?
0xffff82d040377fe0: ?? at head.o:?
0xffff82d040378000: ?? at head.o:?
0xffff82d040378020: ?? at head.o:?
0xffff82d040378040: ?? at head.o:?
0xffff82d040378060: ?? at head.o:?
0xffff82d040378080: ?? at head.o:?
0xffff82d0403780a0: ?? at head.o:?
0xffff82d0403780c0: ?? at head.o:?

Obviously the changes to check-endbr want cleaning up, but I think it's
entirely within scope to check for ENDBR64_POISON too, and we can do it
without adding an extra pass.  Would you be happier with this check added?

But we also have some clear errors with debug symbols.  It's perhaps not
terribly surprising that irp/endr only gets file/line for the first
instance, and at least ELF manage to get the function name right, but
EFI is a mess and manages to get the wrong file.  Any idea how to get
rather less nonsense out of the debug symbols?

~Andrew

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-10 18:42         ` Andrew Cooper
@ 2022-03-11  7:18           ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2022-03-11  7:18 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monne, Wei Liu, Bjoern Doebel, Michael Kurth,
	Martin Pohlack, Xen-devel

On 10.03.2022 19:42, Andrew Cooper wrote:
> On 08/03/2022 16:03, Andrew Cooper wrote:
>>>>>> --- a/xen/arch/x86/include/asm/endbr.h
>>>>>> +++ b/xen/arch/x86/include/asm/endbr.h
>>>>>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>>>>>      *(uint32_t *)ptr = gen_endbr64();
>>>>>>  }
>>>>>>  
>>>>>> +/*
>>>>>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>>>>>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>>>>>> + * P6_NOP4.
>>>>>> + */
>>>>>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>>>>> In case this remains as is - did you mean "opsz" instead of "osp"?
>>>>> But this really is "nopw (%rax)" anyway.
>>>> Oh, osp is the nasm name.  I'll switch to nopw.
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> Thanks.
> 
> It does occur to me that we can extend check-endbr.sh for this.
> 
> diff --git a/xen/arch/x86/indirect-thunk.S b/xen/arch/x86/indirect-thunk.S
> index 7cc22da0ef93..3baaf7ab4983 100644
> --- a/xen/arch/x86/indirect-thunk.S
> +++ b/xen/arch/x86/indirect-thunk.S
> @@ -38,6 +38,7 @@
>          .section .text.__x86_indirect_thunk_\reg, "ax", @progbits
>  
>  ENTRY(__x86_indirect_thunk_\reg)
> +       nopw (%rax)
>          ALTERNATIVE_2 __stringify(IND_THUNK_RETPOLINE \reg),              \
>          __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
>          __stringify(IND_THUNK_JMP \reg),    X86_FEATURE_IND_THUNK_JMP
> diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
> index 9799c451a18d..652ac8d0b983 100755
> --- a/xen/tools/check-endbr.sh
> +++ b/xen/tools/check-endbr.sh
> @@ -67,7 +67,7 @@ eval $(${OBJDUMP} -j .text $1 -h |
>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>  if $perl_re
>  then
> -    LC_ALL=C grep -aobP '\363\17\36\372' $TEXT_BIN
> +    LC_ALL=C grep -aobP '\363\17\36\372|\x66\x0f\x1f\x00' $TEXT_BIN
>  else
>      grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN
>  fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}'
>> $ALL
> 
> yields:
> 
> check-endbr.sh xen-syms Fail: Found 15 embedded endbr64 instructions
> 0xffff82d040377f00: __x86_indirect_thunk_rax at
> /local/xen.git/xen/arch/x86/indirect-thunk.S:55
> 0xffff82d040377f20: __x86_indirect_thunk_rcx at ??:?
> 0xffff82d040377f40: __x86_indirect_thunk_rdx at ??:?
> 0xffff82d040377f60: __x86_indirect_thunk_rbx at ??:?
> 0xffff82d040377f80: __x86_indirect_thunk_rbp at ??:?
> 0xffff82d040377fa0: __x86_indirect_thunk_rsi at ??:?
> 0xffff82d040377fc0: __x86_indirect_thunk_rdi at ??:?
> 0xffff82d040377fe0: __x86_indirect_thunk_r8 at ??:?
> 0xffff82d040378000: __x86_indirect_thunk_r9 at ??:?
> 0xffff82d040378020: __x86_indirect_thunk_r10 at ??:?
> 0xffff82d040378040: __x86_indirect_thunk_r11 at ??:?
> 0xffff82d040378060: __x86_indirect_thunk_r12 at ??:?
> 0xffff82d040378080: __x86_indirect_thunk_r13 at ??:?
> 0xffff82d0403780a0: __x86_indirect_thunk_r14 at ??:?
> 0xffff82d0403780c0: __x86_indirect_thunk_r15 at ??:?
> ...
> check-endbr.sh xen.efi Fail: Found 15 embedded endbr64 instructions
> 0xffff82d040377f00: ?? at /local/xen.git/xen/arch/x86/indirect-thunk.S:55
> 0xffff82d040377f20: ?? at head.o:?
> 0xffff82d040377f40: ?? at head.o:?
> 0xffff82d040377f60: ?? at head.o:?
> 0xffff82d040377f80: ?? at head.o:?
> 0xffff82d040377fa0: ?? at head.o:?
> 0xffff82d040377fc0: ?? at head.o:?
> 0xffff82d040377fe0: ?? at head.o:?
> 0xffff82d040378000: ?? at head.o:?
> 0xffff82d040378020: ?? at head.o:?
> 0xffff82d040378040: ?? at head.o:?
> 0xffff82d040378060: ?? at head.o:?
> 0xffff82d040378080: ?? at head.o:?
> 0xffff82d0403780a0: ?? at head.o:?
> 0xffff82d0403780c0: ?? at head.o:?
> 
> Obviously the changes to check-endbr want cleaning up, but I think it's
> entirely within scope to check for ENDBR64_POISON too, and we can do it
> without adding an extra pass.  Would you be happier with this check added?

Yes, this would feel better. Thanks for having continued to think
about it.

> But we also have some clear errors with debug symbols.  It's perhaps not
> terribly surprising that irp/endr only gets file/line for the first
> instance,

I have to admit I would expect it to at least figure the file. But
there's no .debug_line contents at all for ..._rcx .. ..._r15.

> and at least ELF manage to get the function name right, but
> EFI is a mess and manages to get the wrong file.  Any idea how to get
> rather less nonsense out of the debug symbols?

A random example with a symbol from a C file works here, at least.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-17 10:43 ` Jan Beulich
@ 2022-03-17 12:07   ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-03-17 12:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monne, Wei Liu, Bjoern Doebel, Michael Kurth,
	Martin Pohlack, Xen-devel

On 17/03/2022 10:43, Jan Beulich wrote:
> On 17.03.2022 11:02, Andrew Cooper wrote:
>> For livepatching, we need to look at a potentially clobbered function and
>> determine whether it used to have an ENDBR64 instruction.
>>
>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and extend
>> check-endbr.sh to look for it.
>>
>> The choice of nop has some complicated consequences.  nopw (%rax) has a ModRM
>> byte of 0, which the Bourne compatible shells unconditionally strip from
>> parameters, meaning that we can't pass it to `grep -aob`.
> Urgh. But as per my earlier comments I'm happier with ...
>
>> Therefore, use nopw (%rcx) so the ModRM byte becomes 1.
> ... a non-zero ModR/M byte anyway.
>
>> This then demonstrates another bug.  Under perl regexes, \1 thru \9 are
>> subpattern matches, and not octal escapes.  Switch the `grep -P` runes to use
>> hex escapes instead.
>>
>> The build time check then requires that the endbr64 poison have the same
>> treatment as endbr64 to avoid placing the byte pattern in immediate operands.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
>> Jan: As you had the buggy grep, can you please confirm that hex escapes work.
> (Build-)Tested-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>
> When working out the workaround, I actually did test with hex, but
> then switched to octal to make easily visible that the two patterns
> actually match. I also did wonder about octal and sub-pattern
> matching conflicting, but the grep I used definitely didn't have
> an issue there. Hence I assume grep behavior changed at some point;
> I wonder how they mean to have octal expressed now.

$ LC_ALL=C grep -aobP '\363\17\36\372|\146\17\37\1' text.bin
grep: reference to non-existent subpattern

> https://perldoc.perl.org/perlre specifically outlines how the
> conflict is dealt with - assuming you have observed grep to misbehave,
> I wonder whether they've accumulated a bug. (The doc also makes clear
> that such references aren't limited to single digit numbers; you may
> want to adjust your description in this regard.)

That part of the doc does say that the dynamic interpretation is only
for \10 and higher, so I don't think this is a bug.  \1 use to
exclusively mean the first capture group, not an octal character, and
this behaviour remains.

> Depending on how exactly your grep behaves, switching to always-three-
> digit octal escapes may be an alternative to retain the property of
> making obvious the similarity between the two pattern representations.

\01 and \001 do both work properly, but the non-ambiguous forms are \o1
or \o001.

Overall, I think it's better to stay with the hex escapes, because
they're also non-ambiguous.  The mix of octal and hex is irritating, but
the comments are very clear about what we're searching for.


And on that note, I realise we can also scan for endbr32 in exactly the
same way for no extra cost.  I'll fold that in too, seeing as the
discussion has come up before, and post a v3.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
  2022-03-17 10:02 Andrew Cooper
@ 2022-03-17 10:43 ` Jan Beulich
  2022-03-17 12:07   ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2022-03-17 10:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Bjoern Doebel, Michael Kurth, Martin Pohlack, Xen-devel

On 17.03.2022 11:02, Andrew Cooper wrote:
> For livepatching, we need to look at a potentially clobbered function and
> determine whether it used to have an ENDBR64 instruction.
> 
> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and extend
> check-endbr.sh to look for it.
> 
> The choice of nop has some complicated consequences.  nopw (%rax) has a ModRM
> byte of 0, which the Bourne compatible shells unconditionally strip from
> parameters, meaning that we can't pass it to `grep -aob`.

Urgh. But as per my earlier comments I'm happier with ...

> Therefore, use nopw (%rcx) so the ModRM byte becomes 1.

... a non-zero ModR/M byte anyway.

> This then demonstrates another bug.  Under perl regexes, \1 thru \9 are
> subpattern matches, and not octal escapes.  Switch the `grep -P` runes to use
> hex escapes instead.
> 
> The build time check then requires that the endbr64 poison have the same
> treatment as endbr64 to avoid placing the byte pattern in immediate operands.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> Jan: As you had the buggy grep, can you please confirm that hex escapes work.

(Build-)Tested-by: Jan Beulich <jbeulich@suse.com>

When working out the workaround, I actually did test with hex, but
then switched to octal to make easily visible that the two patterns
actually match. I also did wonder about octal and sub-pattern
matching conflicting, but the grep I used definitely didn't have
an issue there. Hence I assume grep behavior changed at some point;
I wonder how they mean to have octal expressed now.
https://perldoc.perl.org/perlre specifically outlines how the
conflict is dealt with - assuming you have observed grep to misbehave,
I wonder whether they've accumulated a bug. (The doc also makes clear
that such references aren't limited to single digit numbers; you may
want to adjust your description in this regard.)

Depending on how exactly your grep behaves, switching to always-three-
digit octal escapes may be an alternative to retain the property of
making obvious the similarity between the two pattern representations.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
@ 2022-03-17 10:02 Andrew Cooper
  2022-03-17 10:43 ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2022-03-17 10:02 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Bjoern Doebel, Michael Kurth, Martin Pohlack

For livepatching, we need to look at a potentially clobbered function and
determine whether it used to have an ENDBR64 instruction.

Use a non-default 4-byte P6 long nop, not emitted by toolchains, and extend
check-endbr.sh to look for it.

The choice of nop has some complicated consequences.  nopw (%rax) has a ModRM
byte of 0, which the Bourne compatible shells unconditionally strip from
parameters, meaning that we can't pass it to `grep -aob`.

Therefore, use nopw (%rcx) so the ModRM byte becomes 1.

This then demonstrates another bug.  Under perl regexes, \1 thru \9 are
subpattern matches, and not octal escapes.  Switch the `grep -P` runes to use
hex escapes instead.

The build time check then requires that the endbr64 poison have the same
treatment as endbr64 to avoid placing the byte pattern in immediate operands.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Bjoern Doebel <doebel@amazon.de>
CC: Michael Kurth <mku@amazon.de>
CC: Martin Pohlack <mpohlack@amazon.de>

v2:
 * Check for the poison byte pattern in check-endbr.sh
 * Use nopw (%rcx) to work around shell NUL (mis)features
 * Use hex escapes to work around Perl subpattern matches

Jan: As you had the buggy grep, can you please confirm that hex escapes work.
---
 xen/arch/x86/alternative.c       |  2 +-
 xen/arch/x86/include/asm/endbr.h | 26 ++++++++++++++++++++++++++
 xen/tools/check-endbr.sh         | 12 +++++++-----
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index d41eeef1bcaf..0c6fc7b4fb0c 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -362,7 +362,7 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
             if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
                 continue;
 
-            add_nops(ptr, ENDBR64_LEN);
+            place_endbr64_poison(ptr);
             clobbered++;
         }
 
diff --git a/xen/arch/x86/include/asm/endbr.h b/xen/arch/x86/include/asm/endbr.h
index 6090afeb0bd8..d946fac13130 100644
--- a/xen/arch/x86/include/asm/endbr.h
+++ b/xen/arch/x86/include/asm/endbr.h
@@ -52,4 +52,30 @@ static inline void place_endbr64(void *ptr)
     *(uint32_t *)ptr = gen_endbr64();
 }
 
+/*
+ * After clobbering ENDBR64, we may need to confirm that the site used to
+ * contain an ENDBR64 instruction.  Use an encoding which isn't the default
+ * P6_NOP4.  Specifically, nopw (%rcx)
+ */
+static inline uint32_t __attribute_const__ gen_endbr64_poison(void)
+{
+    uint32_t res;
+
+    asm ( "mov $~0x011f0f66, %[res]\n\t"
+          "not %[res]\n\t"
+          : [res] "=&r" (res) );
+
+    return res;
+}
+
+static inline bool is_endbr64_poison(const void *ptr)
+{
+    return *(const uint32_t *)ptr == gen_endbr64_poison();
+}
+
+static inline void place_endbr64_poison(void *ptr)
+{
+    *(uint32_t *)ptr = gen_endbr64_poison();
+}
+
 #endif /* XEN_ASM_ENDBR_H */
diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
index 9799c451a18d..126a2a14d44e 100755
--- a/xen/tools/check-endbr.sh
+++ b/xen/tools/check-endbr.sh
@@ -27,7 +27,7 @@ echo "X" | grep -aob "X" -q 2>/dev/null ||
 # Check whether grep supports Perl regexps. Older GNU grep doesn't reliably
 # find binary patterns otherwise.
 perl_re=true
-echo "X" | grep -aobP "\130" -q 2>/dev/null || perl_re=false
+echo "X" | grep -aobP "\x58" -q 2>/dev/null || perl_re=false
 
 #
 # First, look for all the valid endbr64 instructions.
@@ -45,13 +45,15 @@ echo "X" | grep -aobP "\130" -q 2>/dev/null || perl_re=false
 ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
 
 #
-# Second, look for any endbr64 byte sequence
+# Second, look for any endbr64 or nop4 poison byte sequences
 # This has a couple of complications:
 #
 # 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
 #    the grep offset to be from the start of .text.
 #
 # 2) dash's printf doesn't understand hex escapes, hence the use of octal.
+#    `grep -P` on the other hand can interpret hex escapes, and must use them
+#    to avoid \1 thru \9 being interpreted as subpatterns matches.
 #
 # 3) AWK can't add 64bit integers, because internally all numbers are doubles.
 #    When the upper bits are set, the exponents worth of precision is lost in
@@ -67,9 +69,9 @@ eval $(${OBJDUMP} -j .text $1 -h |
 ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
 if $perl_re
 then
-    LC_ALL=C grep -aobP '\363\17\36\372' $TEXT_BIN
+    LC_ALL=C grep -aobP '\xf3\x0f\x1e\xfa|\x66\x0f\x1f\x01' $TEXT_BIN
 else
-    grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN
+    grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\146\17\37\1')" $TEXT_BIN
 fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
 
 # Wait for $VALID to become complete
@@ -90,6 +92,6 @@ nr_bad=$(wc -l < $BAD)
 [ "$nr_bad" -eq 0 ] && exit 0
 
 # Failure
-echo "$MSG_PFX Fail: Found ${nr_bad} embedded endbr64 instructions" >&2
+echo "$MSG_PFX Fail: Found ${nr_bad} embedded endbr64 or poison instructions" >&2
 ${ADDR2LINE} -afip -e $1 < $BAD >&2
 exit 1
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-03-17 12:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-08 14:01 [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber Andrew Cooper
2022-03-08 14:37 ` Jan Beulich
2022-03-08 15:19   ` Andrew Cooper
2022-03-08 15:36     ` Jan Beulich
2022-03-08 16:03       ` Andrew Cooper
2022-03-10 18:42         ` Andrew Cooper
2022-03-11  7:18           ` Jan Beulich
2022-03-10  7:30 ` Doebel, Bjoern
2022-03-17 10:02 Andrew Cooper
2022-03-17 10:43 ` Jan Beulich
2022-03-17 12:07   ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.