All of lore.kernel.org
 help / color / mirror / Atom feed
* Getting rid of invalid SYSCALL RSP under Xen?
@ 2015-07-23 16:49 Andy Lutomirski
  2015-07-26 19:34 ` Andrew Cooper
  2015-07-26 19:34 ` Andrew Cooper
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-23 16:49 UTC (permalink / raw)
  To: X86 ML, Boris Ostrovsky, Andrew Cooper, linux-kernel,
	Borislav Petkov, Steven Rostedt

Hi-

In entry_64.S, we have:

ENTRY(entry_SYSCALL_64)
    /*
     * Interrupts are off on entry.
     * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
     * it is too small to ever cause noticeable irq latency.
     */
    SWAPGS_UNSAFE_STACK
    /*
     * A hypervisor implementation might want to use a label
     * after the swapgs, so that it can do the swapgs
     * for the guest and jump here on syscall.
     */
GLOBAL(entry_SYSCALL_64_after_swapgs)

    movq    %rsp, PER_CPU_VAR(rsp_scratch)
    movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

It would be really, really nice if Xen entered the SYSCALL path
*after* the mov to %rsp.

Similarly, we have:

    movq    RSP(%rsp), %rsp
    /* big comment */
    USERGS_SYSRET64

It would be really nice if we could just mov to %rsp, swapgs, and
sysret, without worrying that the sysret is actually a jump on Xen.

I suspect that making Xen stop using these code paths would actually
be a simplification.  On SYSCALL entry, Xen lands in
xen_syscall_target (AFAICT) and clearly has rsp pointing somewhere
valid.  Xen obligingly shoves the user RSP into the hardware RSP
register and jumps into the entry code.

Is that stuff running on the normal kernel stack?  If so, can we just
enter later on:

    pushq    %r11                /* pt_regs->flags */
    pushq    $__USER_CS            /* pt_regs->cs */
    pushq    %rcx                /* pt_regs->ip */

<-- Xen enters here

    pushq    %rax                /* pt_regs->orig_ax */
    pushq    %rdi                /* pt_regs->di */
    pushq    %rsi                /* pt_regs->si */
    pushq    %rdx                /* pt_regs->dx */

For SYSRET, I think the way to go is to force Xen to always use the
syscall slow path.  Instead, Xen could hook into
syscall_return_via_sysret or even right before the opportunistic
sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.

Would this work?

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-23 16:49 Getting rid of invalid SYSCALL RSP under Xen? Andy Lutomirski
@ 2015-07-26 19:34 ` Andrew Cooper
  2015-07-26 22:08   ` Andy Lutomirski
  2015-07-26 22:08   ` Andy Lutomirski
  2015-07-26 19:34 ` Andrew Cooper
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-07-26 19:34 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Boris Ostrovsky, linux-kernel,
	Borislav Petkov, Steven Rostedt, xen-devel

On 23/07/2015 17:49, Andy Lutomirski wrote:
> Hi-

Hi.  Apologies for the delay.  I have been out of the office for a few days.

>
> In entry_64.S, we have:
>
> ENTRY(entry_SYSCALL_64)
>     /*
>      * Interrupts are off on entry.
>      * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
>      * it is too small to ever cause noticeable irq latency.
>      */
>     SWAPGS_UNSAFE_STACK
>     /*
>      * A hypervisor implementation might want to use a label
>      * after the swapgs, so that it can do the swapgs
>      * for the guest and jump here on syscall.
>      */
> GLOBAL(entry_SYSCALL_64_after_swapgs)
>
>     movq    %rsp, PER_CPU_VAR(rsp_scratch)
>     movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> It would be really, really nice if Xen entered the SYSCALL path
> *after* the mov to %rsp.
>
> Similarly, we have:
>
>     movq    RSP(%rsp), %rsp
>     /* big comment */
>     USERGS_SYSRET64
>
> It would be really nice if we could just mov to %rsp, swapgs, and
> sysret, without worrying that the sysret is actually a jump on Xen.
>
> I suspect that making Xen stop using these code paths would actually
> be a simplification.  On SYSCALL entry, Xen lands in
> xen_syscall_target (AFAICT) and clearly has rsp pointing somewhere
> valid.  Xen obligingly shoves the user RSP into the hardware RSP
> register and jumps into the entry code.
>
> Is that stuff running on the normal kernel stack?

Yes. The Xen ABI takes what is essentially tss->esp0 and uses that stack
for all "switch to kernel" actions, including syscall and sysenter.

>   If so, can we just
> enter later on:
>
>     pushq    %r11                /* pt_regs->flags */
>     pushq    $__USER_CS            /* pt_regs->cs */
>     pushq    %rcx                /* pt_regs->ip */
>
> <-- Xen enters here
>
>     pushq    %rax                /* pt_regs->orig_ax */
>     pushq    %rdi                /* pt_regs->di */
>     pushq    %rsi                /* pt_regs->si */
>     pushq    %rdx                /* pt_regs->dx */

This looks plausible, and indeed preferable to the current doublestep
with undo_xen_syscall.

One slight complication is that xen_enable_syscall() will want to
special case register_callback() to not set CALLBACKF_mask_events, as
the entry point is now after re-enabling interrupts.

>
> For SYSRET, I think the way to go is to force Xen to always use the
> syscall slow path.  Instead, Xen could hook into
> syscall_return_via_sysret or even right before the opportunistic
> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>
> Would this work?

None of the opportunistic sysret stuff makes sense under Xen.  The path
will inevitably end up in xen_iret making a hypercall.  Short circuiting
all of this seems like a good idea, especially if it allows for the
removal of the UERGS_SYSRET.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-23 16:49 Getting rid of invalid SYSCALL RSP under Xen? Andy Lutomirski
  2015-07-26 19:34 ` Andrew Cooper
@ 2015-07-26 19:34 ` Andrew Cooper
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-07-26 19:34 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, Boris Ostrovsky, linux-kernel,
	Borislav Petkov, Steven Rostedt, xen-devel

On 23/07/2015 17:49, Andy Lutomirski wrote:
> Hi-

Hi.  Apologies for the delay.  I have been out of the office for a few days.

>
> In entry_64.S, we have:
>
> ENTRY(entry_SYSCALL_64)
>     /*
>      * Interrupts are off on entry.
>      * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
>      * it is too small to ever cause noticeable irq latency.
>      */
>     SWAPGS_UNSAFE_STACK
>     /*
>      * A hypervisor implementation might want to use a label
>      * after the swapgs, so that it can do the swapgs
>      * for the guest and jump here on syscall.
>      */
> GLOBAL(entry_SYSCALL_64_after_swapgs)
>
>     movq    %rsp, PER_CPU_VAR(rsp_scratch)
>     movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> It would be really, really nice if Xen entered the SYSCALL path
> *after* the mov to %rsp.
>
> Similarly, we have:
>
>     movq    RSP(%rsp), %rsp
>     /* big comment */
>     USERGS_SYSRET64
>
> It would be really nice if we could just mov to %rsp, swapgs, and
> sysret, without worrying that the sysret is actually a jump on Xen.
>
> I suspect that making Xen stop using these code paths would actually
> be a simplification.  On SYSCALL entry, Xen lands in
> xen_syscall_target (AFAICT) and clearly has rsp pointing somewhere
> valid.  Xen obligingly shoves the user RSP into the hardware RSP
> register and jumps into the entry code.
>
> Is that stuff running on the normal kernel stack?

Yes. The Xen ABI takes what is essentially tss->esp0 and uses that stack
for all "switch to kernel" actions, including syscall and sysenter.

>   If so, can we just
> enter later on:
>
>     pushq    %r11                /* pt_regs->flags */
>     pushq    $__USER_CS            /* pt_regs->cs */
>     pushq    %rcx                /* pt_regs->ip */
>
> <-- Xen enters here
>
>     pushq    %rax                /* pt_regs->orig_ax */
>     pushq    %rdi                /* pt_regs->di */
>     pushq    %rsi                /* pt_regs->si */
>     pushq    %rdx                /* pt_regs->dx */

This looks plausible, and indeed preferable to the current doublestep
with undo_xen_syscall.

One slight complication is that xen_enable_syscall() will want to
special case register_callback() to not set CALLBACKF_mask_events, as
the entry point is now after re-enabling interrupts.

>
> For SYSRET, I think the way to go is to force Xen to always use the
> syscall slow path.  Instead, Xen could hook into
> syscall_return_via_sysret or even right before the opportunistic
> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>
> Would this work?

None of the opportunistic sysret stuff makes sense under Xen.  The path
will inevitably end up in xen_iret making a hypercall.  Short circuiting
all of this seems like a good idea, especially if it allows for the
removal of the UERGS_SYSRET.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 19:34 ` Andrew Cooper
  2015-07-26 22:08   ` Andy Lutomirski
@ 2015-07-26 22:08   ` Andy Lutomirski
  2015-07-26 23:05       ` Andrew Cooper
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-26 22:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: X86 ML, Boris Ostrovsky, linux-kernel, Borislav Petkov,
	Steven Rostedt, xen-devel

On Sun, Jul 26, 2015 at 12:34 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 23/07/2015 17:49, Andy Lutomirski wrote:
>> Hi-
>
> Hi.  Apologies for the delay.  I have been out of the office for a few days.
>
>>
>> In entry_64.S, we have:
>>
>> ENTRY(entry_SYSCALL_64)
>>     /*
>>      * Interrupts are off on entry.
>>      * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
>>      * it is too small to ever cause noticeable irq latency.
>>      */
>>     SWAPGS_UNSAFE_STACK
>>     /*
>>      * A hypervisor implementation might want to use a label
>>      * after the swapgs, so that it can do the swapgs
>>      * for the guest and jump here on syscall.
>>      */
>> GLOBAL(entry_SYSCALL_64_after_swapgs)
>>
>>     movq    %rsp, PER_CPU_VAR(rsp_scratch)
>>     movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>>
>> It would be really, really nice if Xen entered the SYSCALL path
>> *after* the mov to %rsp.
>>
>> Similarly, we have:
>>
>>     movq    RSP(%rsp), %rsp
>>     /* big comment */
>>     USERGS_SYSRET64
>>
>> It would be really nice if we could just mov to %rsp, swapgs, and
>> sysret, without worrying that the sysret is actually a jump on Xen.
>>
>> I suspect that making Xen stop using these code paths would actually
>> be a simplification.  On SYSCALL entry, Xen lands in
>> xen_syscall_target (AFAICT) and clearly has rsp pointing somewhere
>> valid.  Xen obligingly shoves the user RSP into the hardware RSP
>> register and jumps into the entry code.
>>
>> Is that stuff running on the normal kernel stack?
>
> Yes. The Xen ABI takes what is essentially tss->esp0 and uses that stack
> for all "switch to kernel" actions, including syscall and sysenter.
>
>>   If so, can we just
>> enter later on:
>>
>>     pushq    %r11                /* pt_regs->flags */
>>     pushq    $__USER_CS            /* pt_regs->cs */
>>     pushq    %rcx                /* pt_regs->ip */
>>
>> <-- Xen enters here
>>
>>     pushq    %rax                /* pt_regs->orig_ax */
>>     pushq    %rdi                /* pt_regs->di */
>>     pushq    %rsi                /* pt_regs->si */
>>     pushq    %rdx                /* pt_regs->dx */
>
> This looks plausible, and indeed preferable to the current doublestep
> with undo_xen_syscall.
>
> One slight complication is that xen_enable_syscall() will want to
> special case register_callback() to not set CALLBACKF_mask_events, as
> the entry point is now after re-enabling interrupts.

I wouldn't do that.  Let's just move the ENABLE_INTERRUPTS a few
instructions later even on native -- I want to do that anyway.

>
>>
>> For SYSRET, I think the way to go is to force Xen to always use the
>> syscall slow path.  Instead, Xen could hook into
>> syscall_return_via_sysret or even right before the opportunistic
>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>
>> Would this work?
>
> None of the opportunistic sysret stuff makes sense under Xen.  The path
> will inevitably end up in xen_iret making a hypercall.  Short circuiting
> all of this seems like a good idea, especially if it allows for the
> removal of the UERGS_SYSRET.

Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
should have its own opportunistic VGCF_IN_SYSCALL logic.

Hmm, maybe some of this would be easier to think about if, rather than
having a paravirt op, we could have:

ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN

Or just IF_XEN("jmp ...");

As a practical matter, x86_64 has native and Xen -- I don't think
there's any other paravirt platform that needs the asm hooks.

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 19:34 ` Andrew Cooper
@ 2015-07-26 22:08   ` Andy Lutomirski
  2015-07-26 22:08   ` Andy Lutomirski
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-26 22:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: X86 ML, linux-kernel, Steven Rostedt, xen-devel, Borislav Petkov,
	Boris Ostrovsky

On Sun, Jul 26, 2015 at 12:34 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 23/07/2015 17:49, Andy Lutomirski wrote:
>> Hi-
>
> Hi.  Apologies for the delay.  I have been out of the office for a few days.
>
>>
>> In entry_64.S, we have:
>>
>> ENTRY(entry_SYSCALL_64)
>>     /*
>>      * Interrupts are off on entry.
>>      * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
>>      * it is too small to ever cause noticeable irq latency.
>>      */
>>     SWAPGS_UNSAFE_STACK
>>     /*
>>      * A hypervisor implementation might want to use a label
>>      * after the swapgs, so that it can do the swapgs
>>      * for the guest and jump here on syscall.
>>      */
>> GLOBAL(entry_SYSCALL_64_after_swapgs)
>>
>>     movq    %rsp, PER_CPU_VAR(rsp_scratch)
>>     movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>>
>> It would be really, really nice if Xen entered the SYSCALL path
>> *after* the mov to %rsp.
>>
>> Similarly, we have:
>>
>>     movq    RSP(%rsp), %rsp
>>     /* big comment */
>>     USERGS_SYSRET64
>>
>> It would be really nice if we could just mov to %rsp, swapgs, and
>> sysret, without worrying that the sysret is actually a jump on Xen.
>>
>> I suspect that making Xen stop using these code paths would actually
>> be a simplification.  On SYSCALL entry, Xen lands in
>> xen_syscall_target (AFAICT) and clearly has rsp pointing somewhere
>> valid.  Xen obligingly shoves the user RSP into the hardware RSP
>> register and jumps into the entry code.
>>
>> Is that stuff running on the normal kernel stack?
>
> Yes. The Xen ABI takes what is essentially tss->esp0 and uses that stack
> for all "switch to kernel" actions, including syscall and sysenter.
>
>>   If so, can we just
>> enter later on:
>>
>>     pushq    %r11                /* pt_regs->flags */
>>     pushq    $__USER_CS            /* pt_regs->cs */
>>     pushq    %rcx                /* pt_regs->ip */
>>
>> <-- Xen enters here
>>
>>     pushq    %rax                /* pt_regs->orig_ax */
>>     pushq    %rdi                /* pt_regs->di */
>>     pushq    %rsi                /* pt_regs->si */
>>     pushq    %rdx                /* pt_regs->dx */
>
> This looks plausible, and indeed preferable to the current doublestep
> with undo_xen_syscall.
>
> One slight complication is that xen_enable_syscall() will want to
> special case register_callback() to not set CALLBACKF_mask_events, as
> the entry point is now after re-enabling interrupts.

I wouldn't do that.  Let's just move the ENABLE_INTERRUPTS a few
instructions later even on native -- I want to do that anyway.

>
>>
>> For SYSRET, I think the way to go is to force Xen to always use the
>> syscall slow path.  Instead, Xen could hook into
>> syscall_return_via_sysret or even right before the opportunistic
>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>
>> Would this work?
>
> None of the opportunistic sysret stuff makes sense under Xen.  The path
> will inevitably end up in xen_iret making a hypercall.  Short circuiting
> all of this seems like a good idea, especially if it allows for the
> removal of the UERGS_SYSRET.

Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
should have its own opportunistic VGCF_IN_SYSCALL logic.

Hmm, maybe some of this would be easier to think about if, rather than
having a paravirt op, we could have:

ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN

Or just IF_XEN("jmp ...");

As a practical matter, x86_64 has native and Xen -- I don't think
there's any other paravirt platform that needs the asm hooks.

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 22:08   ` Andy Lutomirski
@ 2015-07-26 23:05       ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-07-26 23:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Boris Ostrovsky, linux-kernel, Borislav Petkov,
	Steven Rostedt, xen-devel

On 26/07/2015 23:08, Andy Lutomirski wrote:
>
>>>   If so, can we just
>>> enter later on:
>>>
>>>     pushq    %r11                /* pt_regs->flags */
>>>     pushq    $__USER_CS            /* pt_regs->cs */
>>>     pushq    %rcx                /* pt_regs->ip */
>>>
>>> <-- Xen enters here
>>>
>>>     pushq    %rax                /* pt_regs->orig_ax */
>>>     pushq    %rdi                /* pt_regs->di */
>>>     pushq    %rsi                /* pt_regs->si */
>>>     pushq    %rdx                /* pt_regs->dx */
>> This looks plausible, and indeed preferable to the current doublestep
>> with undo_xen_syscall.
>>
>> One slight complication is that xen_enable_syscall() will want to
>> special case register_callback() to not set CALLBACKF_mask_events, as
>> the entry point is now after re-enabling interrupts.
> I wouldn't do that.  Let's just move the ENABLE_INTERRUPTS a few
> instructions later even on native -- I want to do that anyway.

That would also work.

>
>>> For SYSRET, I think the way to go is to force Xen to always use the
>>> syscall slow path.  Instead, Xen could hook into
>>> syscall_return_via_sysret or even right before the opportunistic
>>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>>
>>> Would this work?
>> None of the opportunistic sysret stuff makes sense under Xen.  The path
>> will inevitably end up in xen_iret making a hypercall.  Short circuiting
>> all of this seems like a good idea, especially if it allows for the
>> removal of the UERGS_SYSRET.
> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
> should have its own opportunistic VGCF_IN_SYSCALL logic.

VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
as the hypercall itself is implemented using syscall.  As the extra
r11/rcx (and rax for that matter) are unconditionally saved in the
hypercall stub, I can't see anything Linux could usefully do,
opportunistically speaking.

>
> Hmm, maybe some of this would be easier to think about if, rather than
> having a paravirt op, we could have:
>
> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>
> Or just IF_XEN("jmp ...");
>
> As a practical matter, x86_64 has native and Xen -- I don't think
> there's any other paravirt platform that needs the asm hooks.

It would certainly seem so.  A careful use of IF_XEN() or two would make
the code far clearer to read, and to drop the hooks.

~Andrew


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
@ 2015-07-26 23:05       ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-07-26 23:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Steven Rostedt, xen-devel, Borislav Petkov,
	Boris Ostrovsky

On 26/07/2015 23:08, Andy Lutomirski wrote:
>
>>>   If so, can we just
>>> enter later on:
>>>
>>>     pushq    %r11                /* pt_regs->flags */
>>>     pushq    $__USER_CS            /* pt_regs->cs */
>>>     pushq    %rcx                /* pt_regs->ip */
>>>
>>> <-- Xen enters here
>>>
>>>     pushq    %rax                /* pt_regs->orig_ax */
>>>     pushq    %rdi                /* pt_regs->di */
>>>     pushq    %rsi                /* pt_regs->si */
>>>     pushq    %rdx                /* pt_regs->dx */
>> This looks plausible, and indeed preferable to the current doublestep
>> with undo_xen_syscall.
>>
>> One slight complication is that xen_enable_syscall() will want to
>> special case register_callback() to not set CALLBACKF_mask_events, as
>> the entry point is now after re-enabling interrupts.
> I wouldn't do that.  Let's just move the ENABLE_INTERRUPTS a few
> instructions later even on native -- I want to do that anyway.

That would also work.

>
>>> For SYSRET, I think the way to go is to force Xen to always use the
>>> syscall slow path.  Instead, Xen could hook into
>>> syscall_return_via_sysret or even right before the opportunistic
>>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>>
>>> Would this work?
>> None of the opportunistic sysret stuff makes sense under Xen.  The path
>> will inevitably end up in xen_iret making a hypercall.  Short circuiting
>> all of this seems like a good idea, especially if it allows for the
>> removal of the UERGS_SYSRET.
> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
> should have its own opportunistic VGCF_IN_SYSCALL logic.

VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
as the hypercall itself is implemented using syscall.  As the extra
r11/rcx (and rax for that matter) are unconditionally saved in the
hypercall stub, I can't see anything Linux could usefully do,
opportunistically speaking.

>
> Hmm, maybe some of this would be easier to think about if, rather than
> having a paravirt op, we could have:
>
> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>
> Or just IF_XEN("jmp ...");
>
> As a practical matter, x86_64 has native and Xen -- I don't think
> there's any other paravirt platform that needs the asm hooks.

It would certainly seem so.  A careful use of IF_XEN() or two would make
the code far clearer to read, and to drop the hooks.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 23:05       ` Andrew Cooper
  (?)
  (?)
@ 2015-07-26 23:27       ` Andy Lutomirski
  2015-07-27 13:52         ` Andrew Cooper
  2015-07-27 13:52         ` Andrew Cooper
  -1 siblings, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-26 23:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: X86 ML, Boris Ostrovsky, linux-kernel, Borislav Petkov,
	Steven Rostedt, xen-devel

On Sun, Jul 26, 2015 at 4:05 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 26/07/2015 23:08, Andy Lutomirski wrote:
>>
>>>>   If so, can we just
>>>> enter later on:
>>>>
>>>>     pushq    %r11                /* pt_regs->flags */
>>>>     pushq    $__USER_CS            /* pt_regs->cs */
>>>>     pushq    %rcx                /* pt_regs->ip */
>>>>
>>>> <-- Xen enters here
>>>>
>>>>     pushq    %rax                /* pt_regs->orig_ax */
>>>>     pushq    %rdi                /* pt_regs->di */
>>>>     pushq    %rsi                /* pt_regs->si */
>>>>     pushq    %rdx                /* pt_regs->dx */
>>> This looks plausible, and indeed preferable to the current doublestep
>>> with undo_xen_syscall.
>>>
>>> One slight complication is that xen_enable_syscall() will want to
>>> special case register_callback() to not set CALLBACKF_mask_events, as
>>> the entry point is now after re-enabling interrupts.
>> I wouldn't do that.  Let's just move the ENABLE_INTERRUPTS a few
>> instructions later even on native -- I want to do that anyway.
>
> That would also work.
>
>>
>>>> For SYSRET, I think the way to go is to force Xen to always use the
>>>> syscall slow path.  Instead, Xen could hook into
>>>> syscall_return_via_sysret or even right before the opportunistic
>>>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>>>
>>>> Would this work?
>>> None of the opportunistic sysret stuff makes sense under Xen.  The path
>>> will inevitably end up in xen_iret making a hypercall.  Short circuiting
>>> all of this seems like a good idea, especially if it allows for the
>>> removal of the UERGS_SYSRET.
>> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
>> should have its own opportunistic VGCF_IN_SYSCALL logic.
>
> VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
> as the hypercall itself is implemented using syscall.  As the extra
> r11/rcx (and rax for that matter) are unconditionally saved in the
> hypercall stub, I can't see anything Linux could usefully do,
> opportunistically speaking.

Xen does:

/* %rbx: struct vcpu, interrupts disabled */
restore_all_guest:
        ASSERT_INTERRUPTS_DISABLED
        RESTORE_ALL
        testw $TRAP_syscall,4(%rsp)
        jz    iret_exit_to_guest

        /* Don't use SYSRET path if the return address is not canonical. */
        movq  8(%rsp),%rcx
        sarq  $47,%rcx
        incl  %ecx
        cmpl  $1,%ecx
        ja    .Lforce_iret

        cmpw  $FLAT_USER_CS32,16(%rsp)# CS
        movq  8(%rsp),%rcx            # RIP
        movq  24(%rsp),%r11           # RFLAGS
        movq  32(%rsp),%rsp           # RSP
        je    1f
        sysretq
1:      sysretl

That's essentially the same thing as opportunistic sysret.  If Linux
stops setting VGCF_in_syscall, though, I think we'll bypass that code,
which will hurt performance.  Whether this should be fixed in the
hypervisor or in the guest kernel hooks, I don't know, but it would be
easy to have a very simple xen_opportunistic_sysret path that checks
rcx==rip and r11==rflags and, if so, sets VGCF_in_syscall.

>
>>
>> Hmm, maybe some of this would be easier to think about if, rather than
>> having a paravirt op, we could have:
>>
>> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>>
>> Or just IF_XEN("jmp ...");
>>
>> As a practical matter, x86_64 has native and Xen -- I don't think
>> there's any other paravirt platform that needs the asm hooks.
>
> It would certainly seem so.  A careful use of IF_XEN() or two would make
> the code far clearer to read, and to drop the hooks.
>

Want to add an IF_XEN macro?

I'm about to send patches for the SYSCALL bit.

--Andy

> ~Andrew
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 23:05       ` Andrew Cooper
  (?)
@ 2015-07-26 23:27       ` Andy Lutomirski
  -1 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-26 23:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: X86 ML, linux-kernel, Steven Rostedt, xen-devel, Borislav Petkov,
	Boris Ostrovsky

On Sun, Jul 26, 2015 at 4:05 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 26/07/2015 23:08, Andy Lutomirski wrote:
>>
>>>>   If so, can we just
>>>> enter later on:
>>>>
>>>>     pushq    %r11                /* pt_regs->flags */
>>>>     pushq    $__USER_CS            /* pt_regs->cs */
>>>>     pushq    %rcx                /* pt_regs->ip */
>>>>
>>>> <-- Xen enters here
>>>>
>>>>     pushq    %rax                /* pt_regs->orig_ax */
>>>>     pushq    %rdi                /* pt_regs->di */
>>>>     pushq    %rsi                /* pt_regs->si */
>>>>     pushq    %rdx                /* pt_regs->dx */
>>> This looks plausible, and indeed preferable to the current doublestep
>>> with undo_xen_syscall.
>>>
>>> One slight complication is that xen_enable_syscall() will want to
>>> special case register_callback() to not set CALLBACKF_mask_events, as
>>> the entry point is now after re-enabling interrupts.
>> I wouldn't do that.  Let's just move the ENABLE_INTERRUPTS a few
>> instructions later even on native -- I want to do that anyway.
>
> That would also work.
>
>>
>>>> For SYSRET, I think the way to go is to force Xen to always use the
>>>> syscall slow path.  Instead, Xen could hook into
>>>> syscall_return_via_sysret or even right before the opportunistic
>>>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>>>
>>>> Would this work?
>>> None of the opportunistic sysret stuff makes sense under Xen.  The path
>>> will inevitably end up in xen_iret making a hypercall.  Short circuiting
>>> all of this seems like a good idea, especially if it allows for the
>>> removal of the UERGS_SYSRET.
>> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
>> should have its own opportunistic VGCF_IN_SYSCALL logic.
>
> VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
> as the hypercall itself is implemented using syscall.  As the extra
> r11/rcx (and rax for that matter) are unconditionally saved in the
> hypercall stub, I can't see anything Linux could usefully do,
> opportunistically speaking.

Xen does:

/* %rbx: struct vcpu, interrupts disabled */
restore_all_guest:
        ASSERT_INTERRUPTS_DISABLED
        RESTORE_ALL
        testw $TRAP_syscall,4(%rsp)
        jz    iret_exit_to_guest

        /* Don't use SYSRET path if the return address is not canonical. */
        movq  8(%rsp),%rcx
        sarq  $47,%rcx
        incl  %ecx
        cmpl  $1,%ecx
        ja    .Lforce_iret

        cmpw  $FLAT_USER_CS32,16(%rsp)# CS
        movq  8(%rsp),%rcx            # RIP
        movq  24(%rsp),%r11           # RFLAGS
        movq  32(%rsp),%rsp           # RSP
        je    1f
        sysretq
1:      sysretl

That's essentially the same thing as opportunistic sysret.  If Linux
stops setting VGCF_in_syscall, though, I think we'll bypass that code,
which will hurt performance.  Whether this should be fixed in the
hypervisor or in the guest kernel hooks, I don't know, but it would be
easy to have a very simple xen_opportunistic_sysret path that checks
rcx==rip and r11==rflags and, if so, sets VGCF_in_syscall.

>
>>
>> Hmm, maybe some of this would be easier to think about if, rather than
>> having a paravirt op, we could have:
>>
>> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>>
>> Or just IF_XEN("jmp ...");
>>
>> As a practical matter, x86_64 has native and Xen -- I don't think
>> there's any other paravirt platform that needs the asm hooks.
>
> It would certainly seem so.  A careful use of IF_XEN() or two would make
> the code far clearer to read, and to drop the hooks.
>

Want to add an IF_XEN macro?

I'm about to send patches for the SYSCALL bit.

--Andy

> ~Andrew
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 23:27       ` Andy Lutomirski
  2015-07-27 13:52         ` Andrew Cooper
@ 2015-07-27 13:52         ` Andrew Cooper
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-07-27 13:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Boris Ostrovsky, linux-kernel, Borislav Petkov,
	Steven Rostedt, xen-devel

On 27/07/15 00:27, Andy Lutomirski wrote:
>
>>>>> For SYSRET, I think the way to go is to force Xen to always use the
>>>>> syscall slow path.  Instead, Xen could hook into
>>>>> syscall_return_via_sysret or even right before the opportunistic
>>>>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>>>>
>>>>> Would this work?
>>>> None of the opportunistic sysret stuff makes sense under Xen.  The path
>>>> will inevitably end up in xen_iret making a hypercall.  Short circuiting
>>>> all of this seems like a good idea, especially if it allows for the
>>>> removal of the UERGS_SYSRET.
>>> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
>>> should have its own opportunistic VGCF_IN_SYSCALL logic.
>> VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
>> as the hypercall itself is implemented using syscall.  As the extra
>> r11/rcx (and rax for that matter) are unconditionally saved in the
>> hypercall stub, I can't see anything Linux could usefully do,
>> opportunistically speaking.
> Xen does:
>
> /* %rbx: struct vcpu, interrupts disabled */
> restore_all_guest:
>         ASSERT_INTERRUPTS_DISABLED
>         RESTORE_ALL
>         testw $TRAP_syscall,4(%rsp)
>         jz    iret_exit_to_guest
>
>         /* Don't use SYSRET path if the return address is not canonical. */
>         movq  8(%rsp),%rcx
>         sarq  $47,%rcx
>         incl  %ecx
>         cmpl  $1,%ecx
>         ja    .Lforce_iret
>
>         cmpw  $FLAT_USER_CS32,16(%rsp)# CS
>         movq  8(%rsp),%rcx            # RIP
>         movq  24(%rsp),%r11           # RFLAGS
>         movq  32(%rsp),%rsp           # RSP
>         je    1f
>         sysretq
> 1:      sysretl
>
> That's essentially the same thing as opportunistic sysret.  If Linux
> stops setting VGCF_in_syscall, though, I think we'll bypass that code,
> which will hurt performance.  Whether this should be fixed in the
> hypervisor or in the guest kernel hooks, I don't know, but it would be
> easy to have a very simple xen_opportunistic_sysret path that checks
> rcx==rip and r11==rflags and, if so, sets VGCF_in_syscall.

I see your point.  I didn't intend to suggest that Linux should stop
setting VGCF_in_syscall, as it is the only entity which knows whether it
is safe to clobber rcx/r11 in user context.

Having said this, Xen could certainly do its own opportunistic sysret
calculations as well.  There are a number of issues in the Xen sysret
code which I plan to fix in due course, and I will see about making this
adjustment.

>
>>> Hmm, maybe some of this would be easier to think about if, rather than
>>> having a paravirt op, we could have:
>>>
>>> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>>>
>>> Or just IF_XEN("jmp ...");
>>>
>>> As a practical matter, x86_64 has native and Xen -- I don't think
>>> there's any other paravirt platform that needs the asm hooks.
>> It would certainly seem so.  A careful use of IF_XEN() or two would make
>> the code far clearer to read, and to drop the hooks.
>>
> Want to add an IF_XEN macro?

I currently have a blocker bug against the impending Xen 4.6 release
which is higher on my todo list, but I will look into this as soon as I can.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Getting rid of invalid SYSCALL RSP under Xen?
  2015-07-26 23:27       ` Andy Lutomirski
@ 2015-07-27 13:52         ` Andrew Cooper
  2015-07-27 13:52         ` Andrew Cooper
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-07-27 13:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Steven Rostedt, xen-devel, Borislav Petkov,
	Boris Ostrovsky

On 27/07/15 00:27, Andy Lutomirski wrote:
>
>>>>> For SYSRET, I think the way to go is to force Xen to always use the
>>>>> syscall slow path.  Instead, Xen could hook into
>>>>> syscall_return_via_sysret or even right before the opportunistic
>>>>> sysret stuff.  Then we could remove the USERGS_SYSRET hooks entirely.
>>>>>
>>>>> Would this work?
>>>> None of the opportunistic sysret stuff makes sense under Xen.  The path
>>>> will inevitably end up in xen_iret making a hypercall.  Short circuiting
>>>> all of this seems like a good idea, especially if it allows for the
>>>> removal of the UERGS_SYSRET.
>>> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL?  Maybe Xen
>>> should have its own opportunistic VGCF_IN_SYSCALL logic.
>> VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
>> as the hypercall itself is implemented using syscall.  As the extra
>> r11/rcx (and rax for that matter) are unconditionally saved in the
>> hypercall stub, I can't see anything Linux could usefully do,
>> opportunistically speaking.
> Xen does:
>
> /* %rbx: struct vcpu, interrupts disabled */
> restore_all_guest:
>         ASSERT_INTERRUPTS_DISABLED
>         RESTORE_ALL
>         testw $TRAP_syscall,4(%rsp)
>         jz    iret_exit_to_guest
>
>         /* Don't use SYSRET path if the return address is not canonical. */
>         movq  8(%rsp),%rcx
>         sarq  $47,%rcx
>         incl  %ecx
>         cmpl  $1,%ecx
>         ja    .Lforce_iret
>
>         cmpw  $FLAT_USER_CS32,16(%rsp)# CS
>         movq  8(%rsp),%rcx            # RIP
>         movq  24(%rsp),%r11           # RFLAGS
>         movq  32(%rsp),%rsp           # RSP
>         je    1f
>         sysretq
> 1:      sysretl
>
> That's essentially the same thing as opportunistic sysret.  If Linux
> stops setting VGCF_in_syscall, though, I think we'll bypass that code,
> which will hurt performance.  Whether this should be fixed in the
> hypervisor or in the guest kernel hooks, I don't know, but it would be
> easy to have a very simple xen_opportunistic_sysret path that checks
> rcx==rip and r11==rflags and, if so, sets VGCF_in_syscall.

I see your point.  I didn't intend to suggest that Linux should stop
setting VGCF_in_syscall, as it is the only entity which knows whether it
is safe to clobber rcx/r11 in user context.

Having said this, Xen could certainly do its own opportunistic sysret
calculations as well.  There are a number of issues in the Xen sysret
code which I plan to fix in due course, and I will see about making this
adjustment.

>
>>> Hmm, maybe some of this would be easier to think about if, rather than
>>> having a paravirt op, we could have:
>>>
>>> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>>>
>>> Or just IF_XEN("jmp ...");
>>>
>>> As a practical matter, x86_64 has native and Xen -- I don't think
>>> there's any other paravirt platform that needs the asm hooks.
>> It would certainly seem so.  A careful use of IF_XEN() or two would make
>> the code far clearer to read, and to drop the hooks.
>>
> Want to add an IF_XEN macro?

I currently have a blocker bug against the impending Xen 4.6 release
which is higher on my todo list, but I will look into this as soon as I can.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-07-27 13:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-23 16:49 Getting rid of invalid SYSCALL RSP under Xen? Andy Lutomirski
2015-07-26 19:34 ` Andrew Cooper
2015-07-26 22:08   ` Andy Lutomirski
2015-07-26 22:08   ` Andy Lutomirski
2015-07-26 23:05     ` Andrew Cooper
2015-07-26 23:05       ` Andrew Cooper
2015-07-26 23:27       ` Andy Lutomirski
2015-07-26 23:27       ` Andy Lutomirski
2015-07-27 13:52         ` Andrew Cooper
2015-07-27 13:52         ` Andrew Cooper
2015-07-26 19:34 ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.