xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
@ 2020-10-26  9:40 Jan Beulich
  2020-10-26  9:41 ` Jan Beulich
  2020-10-28 21:31 ` Andrew Cooper
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Beulich @ 2020-10-26  9:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

In the case that no 64-bit SYSCALL callback is registered, the guest
will be crashed when 64-bit userspace executes a SYSCALL instruction,
which would be a userspace => kernel DoS.  Similarly for 32-bit
userspace when no 32-bit SYSCALL callback was registered either.

This has been the case ever since the introduction of 64bit PV support,
but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
yield #GP/#UD in userspace before the callback is registered, and are
therefore safe by default.

This change does constitute a change in the PV ABI, for the corner case
of a PV guest kernel not registering a 64-bit callback (which has to be
considered a defacto requirement of the unwritten PV ABI, considering
there is no PV equivalent of EFER.SCE).

It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
SYSENTER (safe by default, until explicitly enabled).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
---
v3:
 * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
   callbacks", to allow the uncontroversial part of that change to go
   in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
   over just the REX prefix too much trickery even for an unlikely to be
   taken code path?)

v2:
 * Drop unnecessary instruction suffixes
 * Don't truncate #UD entrypoint to 32 bits

--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -33,11 +33,27 @@ ENTRY(switch_to_kernel)
         cmoveq VCPU_syscall32_addr(%rbx),%rax
         testq %rax,%rax
         cmovzq VCPU_syscall_addr(%rbx),%rax
-        movq  %rax,TRAPBOUNCE_eip(%rdx)
         /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
         btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
         setc  %cl
         leal  (,%rcx,TBF_INTERRUPT),%ecx
+
+        test  %rax, %rax
+UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
+        mov   VCPU_trap_ctxt(%rbx), %rdi
+        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
+        cmpw  $FLAT_USER_CS32, UREGS_cs(%rsp)
+        je    0f
+        rex64                           # subl => subq
+0:
+        subl  $2, UREGS_rip(%rsp)
+        mov   X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_eip(%rdi), %rax
+        testb $4, X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_flags(%rdi)
+        setnz %cl
+        lea   TBF_EXCEPTION(, %rcx, TBF_INTERRUPT), %ecx
+UNLIKELY_END(syscall_no_callback)
+
+        movq  %rax, TRAPBOUNCE_eip(%rdx)
         movb  %cl,TRAPBOUNCE_flags(%rdx)
         call  create_bounce_frame
         andl  $~X86_EFLAGS_DF,UREGS_eflags(%rsp)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
  2020-10-26  9:40 [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks Jan Beulich
@ 2020-10-26  9:41 ` Jan Beulich
  2020-10-28 21:31 ` Andrew Cooper
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2020-10-26  9:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

On 26.10.2020 10:40, Jan Beulich wrote:

And of course this should have

From: Andrew Cooper <andrew.cooper3@citrix.com>

right here, sorry.

Jan

> In the case that no 64-bit SYSCALL callback is registered, the guest
> will be crashed when 64-bit userspace executes a SYSCALL instruction,
> which would be a userspace => kernel DoS.  Similarly for 32-bit
> userspace when no 32-bit SYSCALL callback was registered either.
> 
> This has been the case ever since the introduction of 64bit PV support,
> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
> yield #GP/#UD in userspace before the callback is registered, and are
> therefore safe by default.
> 
> This change does constitute a change in the PV ABI, for the corner case
> of a PV guest kernel not registering a 64-bit callback (which has to be
> considered a defacto requirement of the unwritten PV ABI, considering
> there is no PV equivalent of EFER.SCE).
> 
> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled).
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <JBeulich@suse.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
  2020-10-26  9:40 [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks Jan Beulich
  2020-10-26  9:41 ` Jan Beulich
@ 2020-10-28 21:31 ` Andrew Cooper
  2020-10-29  8:53   ` Jan Beulich
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2020-10-28 21:31 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 26/10/2020 09:40, Jan Beulich wrote:
> In the case that no 64-bit SYSCALL callback is registered, the guest
> will be crashed when 64-bit userspace executes a SYSCALL instruction,
> which would be a userspace => kernel DoS.  Similarly for 32-bit
> userspace when no 32-bit SYSCALL callback was registered either.
>
> This has been the case ever since the introduction of 64bit PV support,
> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
> yield #GP/#UD in userspace before the callback is registered, and are
> therefore safe by default.
>
> This change does constitute a change in the PV ABI, for the corner case
> of a PV guest kernel not registering a 64-bit callback (which has to be
> considered a defacto requirement of the unwritten PV ABI, considering
> there is no PV equivalent of EFER.SCE).
>
> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled).
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <JBeulich@suse.com>
> ---
> v3:
>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>    callbacks", to allow the uncontroversial part of that change to go
>    in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>    over just the REX prefix too much trickery even for an unlikely to be
>    taken code path?)

I find this submission confusion seeing as my v3 is already suitably
acked and ready to commit.  What I haven't had yet enough free time to
do so.

> v2:
>  * Drop unnecessary instruction suffixes
>  * Don't truncate #UD entrypoint to 32 bits
>
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -33,11 +33,27 @@ ENTRY(switch_to_kernel)
>          cmoveq VCPU_syscall32_addr(%rbx),%rax
>          testq %rax,%rax
>          cmovzq VCPU_syscall_addr(%rbx),%rax
> -        movq  %rax,TRAPBOUNCE_eip(%rdx)
>          /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
>          btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
>          setc  %cl
>          leal  (,%rcx,TBF_INTERRUPT),%ecx
> +
> +        test  %rax, %rax
> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
> +        mov   VCPU_trap_ctxt(%rbx), %rdi
> +        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
> +        cmpw  $FLAT_USER_CS32, UREGS_cs(%rsp)
> +        je    0f
> +        rex64                           # subl => subq
> +0:
> +        subl  $2, UREGS_rip(%rsp)

There was deliberately not a 32bit sub here (see below).

As to the construct, I'm having a hard time deciding whether this is an
excellent idea, or far too clever for its own good.

Some basic perf testing shows that there is a visible difference in
execution time of these two paths, which means there is some
optimisation being missed in the frontend for the 32bit case.  However,
the difference is tiny in the grand scheme of things (about 0.4%
difference in aggregate time to execute a loop of this pattern with a
fixed number of iterations.)

However, the 32bit case isn't actually interesting here.  A guest can't
execute a SYSCALL instruction on/across the 4G->0 boundary because the
M2P is mapped NX up to the 4G boundary, so we can never reach this point
with %eip < 2.

Therefore, the 64bit-only form is the appropriate one to use, which
solves any question of cleverness, or potential decode stalls it causes.

~Andrew


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
  2020-10-28 21:31 ` Andrew Cooper
@ 2020-10-29  8:53   ` Jan Beulich
  2020-10-29 13:05     ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2020-10-29  8:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 28.10.2020 22:31, Andrew Cooper wrote:
> On 26/10/2020 09:40, Jan Beulich wrote:
>> In the case that no 64-bit SYSCALL callback is registered, the guest
>> will be crashed when 64-bit userspace executes a SYSCALL instruction,
>> which would be a userspace => kernel DoS.  Similarly for 32-bit
>> userspace when no 32-bit SYSCALL callback was registered either.
>>
>> This has been the case ever since the introduction of 64bit PV support,
>> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
>> yield #GP/#UD in userspace before the callback is registered, and are
>> therefore safe by default.
>>
>> This change does constitute a change in the PV ABI, for the corner case
>> of a PV guest kernel not registering a 64-bit callback (which has to be
>> considered a defacto requirement of the unwritten PV ABI, considering
>> there is no PV equivalent of EFER.SCE).
>>
>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
>> SYSENTER (safe by default, until explicitly enabled).
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <JBeulich@suse.com>
>> ---
>> v3:
>>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>>    callbacks", to allow the uncontroversial part of that change to go
>>    in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>>    over just the REX prefix too much trickery even for an unlikely to be
>>    taken code path?)
> 
> I find this submission confusion seeing as my v3 is already suitably
> acked and ready to commit.  What I haven't had yet enough free time to
> do so.

My objection to the other half stands, and hence, I'm afraid, stands
in the way of your patch getting committed. Aiui Roger's ack doesn't
invalidate my objection, sorry.

>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -33,11 +33,27 @@ ENTRY(switch_to_kernel)
>>          cmoveq VCPU_syscall32_addr(%rbx),%rax
>>          testq %rax,%rax
>>          cmovzq VCPU_syscall_addr(%rbx),%rax
>> -        movq  %rax,TRAPBOUNCE_eip(%rdx)
>>          /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
>>          btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
>>          setc  %cl
>>          leal  (,%rcx,TBF_INTERRUPT),%ecx
>> +
>> +        test  %rax, %rax
>> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
>> +        mov   VCPU_trap_ctxt(%rbx), %rdi
>> +        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
>> +        cmpw  $FLAT_USER_CS32, UREGS_cs(%rsp)
>> +        je    0f
>> +        rex64                           # subl => subq
>> +0:
>> +        subl  $2, UREGS_rip(%rsp)
> 
> There was deliberately not a 32bit sub here (see below).

Funny you should say this when what I've taken as input (v2; I
don't think I had seen a v3, or else I would have called this
one v4) had "subl", not "subq". Roger's comment was regarding
a "mov" with a 32-bit register destination, not this "sub". If
you had also noticed and fixed this one, without you having
posted v3 (or without me having seen the posting) I couldn't
have known.

> As to the construct, I'm having a hard time deciding whether this is an
> excellent idea, or far too clever for its own good.
> 
> Some basic perf testing shows that there is a visible difference in
> execution time of these two paths, which means there is some
> optimisation being missed in the frontend for the 32bit case.  However,
> the difference is tiny in the grand scheme of things (about 0.4%
> difference in aggregate time to execute a loop of this pattern with a
> fixed number of iterations.)
> 
> However, the 32bit case isn't actually interesting here.  A guest can't
> execute a SYSCALL instruction on/across the 4G->0 boundary because the
> M2P is mapped NX up to the 4G boundary, so we can never reach this point
> with %eip < 2.
> 
> Therefore, the 64bit-only form is the appropriate one to use, which
> solves any question of cleverness, or potential decode stalls it causes.

Good point.

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
  2020-10-29  8:53   ` Jan Beulich
@ 2020-10-29 13:05     ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2020-10-29 13:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, xen-devel, Roger Pau Monné

On 29.10.2020 09:53, Jan Beulich wrote:
> On 28.10.2020 22:31, Andrew Cooper wrote:
>> On 26/10/2020 09:40, Jan Beulich wrote:
>>> In the case that no 64-bit SYSCALL callback is registered, the guest
>>> will be crashed when 64-bit userspace executes a SYSCALL instruction,
>>> which would be a userspace => kernel DoS.  Similarly for 32-bit
>>> userspace when no 32-bit SYSCALL callback was registered either.
>>>
>>> This has been the case ever since the introduction of 64bit PV support,
>>> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
>>> yield #GP/#UD in userspace before the callback is registered, and are
>>> therefore safe by default.
>>>
>>> This change does constitute a change in the PV ABI, for the corner case
>>> of a PV guest kernel not registering a 64-bit callback (which has to be
>>> considered a defacto requirement of the unwritten PV ABI, considering
>>> there is no PV equivalent of EFER.SCE).
>>>
>>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
>>> SYSENTER (safe by default, until explicitly enabled).
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Jan Beulich <JBeulich@suse.com>
>>> ---
>>> v3:
>>>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>>>    callbacks", to allow the uncontroversial part of that change to go
>>>    in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>>>    over just the REX prefix too much trickery even for an unlikely to be
>>>    taken code path?)
>>
>> I find this submission confusion seeing as my v3 is already suitably
>> acked and ready to commit.  What I haven't had yet enough free time to
>> do so.
> 
> My objection to the other half stands, and hence, I'm afraid, stands
> in the way of your patch getting committed. Aiui Roger's ack doesn't
> invalidate my objection, sorry.

Actually I realize now that earlier I said I'm okay with this going in
if Roger acks it. I take it that Roger was aware of the controversy
when providing the ack. Therefore I'd like to take back what I've said
above, and your supposed v3 ought to be okay to get committed.

As to backporting - I'd surely be taking the split off part here, but
I have to admit I'm hesitant to take the full change. IOW splitting
the two changes might still be a helpful thing.

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-10-29 13:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-26  9:40 [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks Jan Beulich
2020-10-26  9:41 ` Jan Beulich
2020-10-28 21:31 ` Andrew Cooper
2020-10-29  8:53   ` Jan Beulich
2020-10-29 13:05     ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).