* [PATCH] xen/x86: Adjust stack pointer in xen_sysexit @ 2015-11-13 23:18 Boris Ostrovsky 2015-11-13 23:26 ` Andy Lutomirski 2015-11-13 23:26 ` Andy Lutomirski 0 siblings, 2 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-13 23:18 UTC (permalink / raw) To: konrad.wilk, david.vrabel; +Cc: xen-devel, linux-kernel, luto, boris.ostrovsky After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack frame that is passed to xen_sysexit is no longer a "standard" one (i.e. it's not pt_regs). We need to adjust it so that subsequent xen_iret can use it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- Alternatively, we could return 0 from do_fast_syscall_32() if paravirt_enabled() is true since Xen PV guests will end up using xen_iret one way or the other. And then we won't need xen_sysexit at all. arch/x86/xen/xen-asm_32.S | 23 ++++++++++++++++------- 1 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S index fd92a64..c70ec37 100644 --- a/arch/x86/xen/xen-asm_32.S +++ b/arch/x86/xen/xen-asm_32.S @@ -36,15 +36,24 @@ check_events: /* * We can't use sysexit directly, because we're not running in ring0. - * But we can easily fake it up using iret. Assuming xen_sysexit is - * jumped to with a standard stack frame, we can just strip it back to - * a standard iret frame and use iret. + * But we can easily fake it up using iret. + * We came here from the opportunistic SYSEXIT path in entry_SYSENTER_32 + * which left the stack looking like this: + * $__USER_DS + * %ecx + * eflags + * $__USER_CS + * %eip + * %eax + * %gs + * %fs + * %es + * %ds <-- %esp + * + * so we need to adjust it to look like a standard iret frame */ ENTRY(xen_sysexit) - movl PT_EAX(%esp), %eax /* Shouldn't be necessary? */ - orl $X86_EFLAGS_IF, PT_EFLAGS(%esp) - lea PT_EIP(%esp), %esp - + add $5*4, %esp jmp xen_iret ENDPROC(xen_sysexit) -- 1.7.1 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-13 23:18 [PATCH] xen/x86: Adjust stack pointer in xen_sysexit Boris Ostrovsky @ 2015-11-13 23:26 ` Andy Lutomirski 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-13 23:26 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-13 23:26 UTC (permalink / raw) To: Boris Ostrovsky Cc: Konrad Rzeszutek Wilk, David Vrabel, xen-devel, linux-kernel, Andrew Lutomirski On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c > ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack > frame that is passed to xen_sysexit is no longer a "standard" one (i.e. > it's not pt_regs). > > We need to adjust it so that subsequent xen_iret can use it. I'm wondering if this should be more straightforward: movq %rsp, %rdi call do_fast_syscall_32 testl %eax, %eax jz .Lsyscall_32_done /* Opportunistic SYSRET */ sysret32_from_system_call: XEN_DO_SYSRET32 where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a variant of Xen's iret path that knows that the fast path is okay. --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-13 23:26 ` Andy Lutomirski @ 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-14 1:23 ` Boris Ostrovsky 1 sibling, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-14 1:23 UTC (permalink / raw) To: Andy Lutomirski; +Cc: linux-kernel, xen-devel, David Vrabel, Andrew Lutomirski On 11/13/2015 06:26 PM, Andy Lutomirski wrote: > On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: >> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c >> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >> it's not pt_regs). >> >> We need to adjust it so that subsequent xen_iret can use it. > I'm wondering if this should be more straightforward: > > movq %rsp, %rdi > call do_fast_syscall_32 > testl %eax, %eax > jz .Lsyscall_32_done > > /* Opportunistic SYSRET */ > sysret32_from_system_call: > XEN_DO_SYSRET32 > > where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a > variant of Xen's iret path that knows that the fast path is okay. This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too. As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-13 23:26 ` Andy Lutomirski 2015-11-14 1:23 ` Boris Ostrovsky @ 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-15 18:02 ` Andy Lutomirski 2015-11-15 18:02 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-14 1:23 UTC (permalink / raw) To: Andy Lutomirski Cc: Konrad Rzeszutek Wilk, David Vrabel, xen-devel, linux-kernel, Andrew Lutomirski On 11/13/2015 06:26 PM, Andy Lutomirski wrote: > On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: >> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c >> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >> it's not pt_regs). >> >> We need to adjust it so that subsequent xen_iret can use it. > I'm wondering if this should be more straightforward: > > movq %rsp, %rdi > call do_fast_syscall_32 > testl %eax, %eax > jz .Lsyscall_32_done > > /* Opportunistic SYSRET */ > sysret32_from_system_call: > XEN_DO_SYSRET32 > > where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a > variant of Xen's iret path that knows that the fast path is okay. This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too. As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-14 1:23 ` Boris Ostrovsky @ 2015-11-15 18:02 ` Andy Lutomirski 2015-11-15 18:02 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-15 18:02 UTC (permalink / raw) To: Boris Ostrovsky; +Cc: David Vrabel, linux-kernel, xen-devel On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: > > > > On 11/13/2015 06:26 PM, Andy Lutomirski wrote: >> >> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >>> >>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c >>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >>> it's not pt_regs). >>> >>> We need to adjust it so that subsequent xen_iret can use it. >> >> I'm wondering if this should be more straightforward: >> >> movq %rsp, %rdi >> call do_fast_syscall_32 >> testl %eax, %eax >> jz .Lsyscall_32_done >> >> /* Opportunistic SYSRET */ >> sysret32_from_system_call: >> XEN_DO_SYSRET32 >> >> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a >> variant of Xen's iret path that knows that the fast path is okay. > > > > This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too. In 4.4, it's almost identical (which was part of the point of this whole series). We use sysret32 instead of sysexit, but the underlying structure is the same: munge the stack frame and register state appropriately to use the fast return instruction in question and then execute it. In both cases, the only real difference from the IRET path is that we're willing to lose the values of some subset of cx, dx, and (on 64-bit kernels) r11. > > As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think. Agreed, unless... Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could be easier to follow and be less code at the same time. Frankly, following the control flow from asm through the pre-paravirt-patching and post-paravirt-patching variants and into the final targets is getting a little bit old, and ALTERNATIVE is crystal clear in comparison (and has all the interesting info inline with the rest of the asm). Of course, it doesn't work early in boot, but that's fine for anything involving user/kernel switches. --Andy > > -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-15 18:02 ` Andy Lutomirski @ 2015-11-15 18:02 ` Andy Lutomirski 2015-11-16 16:25 ` Boris Ostrovsky 2015-11-16 16:25 ` Boris Ostrovsky 1 sibling, 2 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-15 18:02 UTC (permalink / raw) To: Boris Ostrovsky Cc: linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: > > > > On 11/13/2015 06:26 PM, Andy Lutomirski wrote: >> >> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >>> >>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c >>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >>> it's not pt_regs). >>> >>> We need to adjust it so that subsequent xen_iret can use it. >> >> I'm wondering if this should be more straightforward: >> >> movq %rsp, %rdi >> call do_fast_syscall_32 >> testl %eax, %eax >> jz .Lsyscall_32_done >> >> /* Opportunistic SYSRET */ >> sysret32_from_system_call: >> XEN_DO_SYSRET32 >> >> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a >> variant of Xen's iret path that knows that the fast path is okay. > > > > This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too. In 4.4, it's almost identical (which was part of the point of this whole series). We use sysret32 instead of sysexit, but the underlying structure is the same: munge the stack frame and register state appropriately to use the fast return instruction in question and then execute it. In both cases, the only real difference from the IRET path is that we're willing to lose the values of some subset of cx, dx, and (on 64-bit kernels) r11. > > As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think. Agreed, unless... Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could be easier to follow and be less code at the same time. Frankly, following the control flow from asm through the pre-paravirt-patching and post-paravirt-patching variants and into the final targets is getting a little bit old, and ALTERNATIVE is crystal clear in comparison (and has all the interesting info inline with the rest of the asm). Of course, it doesn't work early in boot, but that's fine for anything involving user/kernel switches. --Andy > > -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-15 18:02 ` Andy Lutomirski @ 2015-11-16 16:25 ` Boris Ostrovsky 2015-11-16 16:25 ` Boris Ostrovsky 1 sibling, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-16 16:25 UTC (permalink / raw) To: Andy Lutomirski; +Cc: David Vrabel, linux-kernel, xen-devel On 11/15/2015 01:02 PM, Andy Lutomirski wrote: > On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >> >> >> On 11/13/2015 06:26 PM, Andy Lutomirski wrote: >>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky >>> <boris.ostrovsky@oracle.com> wrote: >>>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c >>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >>>> it's not pt_regs). >>>> >>>> We need to adjust it so that subsequent xen_iret can use it. >>> I'm wondering if this should be more straightforward: >>> >>> movq %rsp, %rdi >>> call do_fast_syscall_32 >>> testl %eax, %eax >>> jz .Lsyscall_32_done >>> >>> /* Opportunistic SYSRET */ >>> sysret32_from_system_call: >>> XEN_DO_SYSRET32 >>> >>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a >>> variant of Xen's iret path that knows that the fast path is okay. >> >> >> This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too. > In 4.4, it's almost identical (which was part of the point of this > whole series). We use sysret32 instead of sysexit, but the underlying > structure is the same: munge the stack frame and register state > appropriately to use the fast return instruction in question and then > execute it. In both cases, the only real difference from the IRET > path is that we're willing to lose the values of some subset of cx, > dx, and (on 64-bit kernels) r11. So it turned out that for compat mode we don't need to do anything since xen_sysret32 doesn't assume any stack format (or, rather, it assumes that it can't be used) and builds the IRET frame itself. > >> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think. > Agreed, unless... > > Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could > be easier to follow and be less code at the same time. Frankly, > following the control flow from asm through the pre-paravirt-patching > and post-paravirt-patching variants and into the final targets is > getting a little bit old, and ALTERNATIVE is crystal clear in > comparison (and has all the interesting info inline with the rest of > the asm). Of course, it doesn't work early in boot, but that's fine > for anything involving user/kernel switches. We don't currently have a Xen-specific CPU feature. We could, in principle, add it but we can't replace all of current paravirt patching with a single feature since PVH guests use a subset of existing pv ops (and in the future it may become even more fine-grained). And I don't think we should go ALTERNATIVE route for one set of features and keep pv ops for the rest --- it should be either one or the other. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-15 18:02 ` Andy Lutomirski 2015-11-16 16:25 ` Boris Ostrovsky @ 2015-11-16 16:25 ` Boris Ostrovsky 2015-11-16 19:03 ` Andy Lutomirski 2015-11-16 19:03 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-16 16:25 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On 11/15/2015 01:02 PM, Andy Lutomirski wrote: > On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >> >> >> On 11/13/2015 06:26 PM, Andy Lutomirski wrote: >>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky >>> <boris.ostrovsky@oracle.com> wrote: >>>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c >>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >>>> it's not pt_regs). >>>> >>>> We need to adjust it so that subsequent xen_iret can use it. >>> I'm wondering if this should be more straightforward: >>> >>> movq %rsp, %rdi >>> call do_fast_syscall_32 >>> testl %eax, %eax >>> jz .Lsyscall_32_done >>> >>> /* Opportunistic SYSRET */ >>> sysret32_from_system_call: >>> XEN_DO_SYSRET32 >>> >>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a >>> variant of Xen's iret path that knows that the fast path is okay. >> >> >> This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too. > In 4.4, it's almost identical (which was part of the point of this > whole series). We use sysret32 instead of sysexit, but the underlying > structure is the same: munge the stack frame and register state > appropriately to use the fast return instruction in question and then > execute it. In both cases, the only real difference from the IRET > path is that we're willing to lose the values of some subset of cx, > dx, and (on 64-bit kernels) r11. So it turned out that for compat mode we don't need to do anything since xen_sysret32 doesn't assume any stack format (or, rather, it assumes that it can't be used) and builds the IRET frame itself. > >> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think. > Agreed, unless... > > Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could > be easier to follow and be less code at the same time. Frankly, > following the control flow from asm through the pre-paravirt-patching > and post-paravirt-patching variants and into the final targets is > getting a little bit old, and ALTERNATIVE is crystal clear in > comparison (and has all the interesting info inline with the rest of > the asm). Of course, it doesn't work early in boot, but that's fine > for anything involving user/kernel switches. We don't currently have a Xen-specific CPU feature. We could, in principle, add it but we can't replace all of current paravirt patching with a single feature since PVH guests use a subset of existing pv ops (and in the future it may become even more fine-grained). And I don't think we should go ALTERNATIVE route for one set of features and keep pv ops for the rest --- it should be either one or the other. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 16:25 ` Boris Ostrovsky @ 2015-11-16 19:03 ` Andy Lutomirski 2015-11-16 19:59 ` Borislav Petkov ` (3 more replies) 2015-11-16 19:03 ` Andy Lutomirski 1 sibling, 4 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 19:03 UTC (permalink / raw) To: Boris Ostrovsky Cc: linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk, Borislav Petkov On Mon, Nov 16, 2015 at 8:25 AM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 11/15/2015 01:02 PM, Andy Lutomirski wrote: >> >> On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> >> wrote: >>> >>> >>> >>> On 11/13/2015 06:26 PM, Andy Lutomirski wrote: >>>> >>>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky >>>> <boris.ostrovsky@oracle.com> wrote: >>>>> >>>>> After 32-bit syscall rewrite, and specifically after commit >>>>> 5f310f739b4c >>>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >>>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >>>>> it's not pt_regs). >>>>> >>>>> We need to adjust it so that subsequent xen_iret can use it. >>>> >>>> I'm wondering if this should be more straightforward: >>>> >>>> movq %rsp, %rdi >>>> call do_fast_syscall_32 >>>> testl %eax, %eax >>>> jz .Lsyscall_32_done >>>> >>>> /* Opportunistic SYSRET */ >>>> sysret32_from_system_call: >>>> XEN_DO_SYSRET32 >>>> >>>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a >>>> variant of Xen's iret path that knows that the fast path is okay. >>> >>> >>> >>> This patch is for 32-bit kernel. I actually haven't looked at compat code >>> (probably because our tests don't try that), I need to do that too. >> >> In 4.4, it's almost identical (which was part of the point of this >> whole series). We use sysret32 instead of sysexit, but the underlying >> structure is the same: munge the stack frame and register state >> appropriately to use the fast return instruction in question and then >> execute it. In both cases, the only real difference from the IRET >> path is that we're willing to lose the values of some subset of cx, >> dx, and (on 64-bit kernels) r11. > > > > So it turned out that for compat mode we don't need to do anything since > xen_sysret32 doesn't assume any stack format (or, rather, it assumes that it > can't be used) and builds the IRET frame itself. > It's still a waste of effort, though. Also, I'd eventually like the number of places in Xen code in which rsp/esp is invalid to be exactly zero, and this approach makes this harder or even impossible. > >> >>> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for >>> baremetal otherwise current paravirt op will use native_usergs_sysret32 (for >>> compat code). Which means a new pv_op, I think. >> >> Agreed, unless... >> >> Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could >> be easier to follow and be less code at the same time. Frankly, >> following the control flow from asm through the pre-paravirt-patching >> and post-paravirt-patching variants and into the final targets is >> getting a little bit old, and ALTERNATIVE is crystal clear in >> comparison (and has all the interesting info inline with the rest of >> the asm). Of course, it doesn't work early in boot, but that's fine >> for anything involving user/kernel switches. > > > > We don't currently have a Xen-specific CPU feature. We could, in principle, > add it but we can't replace all of current paravirt patching with a single > feature since PVH guests use a subset of existing pv ops (and in the future > it may become even more fine-grained). > > And I don't think we should go ALTERNATIVE route for one set of features and > keep pv ops for the rest --- it should be either one or the other. Does PVH hook into the entry asm code at all? I thought it was just boot code and drivers. In any case, someone needs to do some serious review and cleanup on the whole paravirt op mess. We have a bunch of paravirt ops that serve little purpose. The paravirt infrastructure is a bit weird, too: it seems to effectively have four states for each patch site. There's: 1. The initial state, which is unoptimized and works on native. Presumably any of these that happen early also need to work, if slowly, on Xen. 2. The Xen state without text patching. I'm not actually sure why this exists at all. Are there pvops that need to switch too early for us to patch the text? 3. The native patched state. This is supposedly optimal, but it results in a few more NOPs than are really needed. 4. The Xen patched state. Alternatives have only two states, and the code is much easier to understand. Also, alternatives avoid things like: ... SWAPGS ... The reader surely doesn't remember that this isn't guaranteed to be a swapgs instruction on native. Using: ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and much clearer. We could hide *that* behind a macro and no one would be confused. (Well, they'd be confused by the fact that Xen PV handles gsbase very differently from native, but that has nothing to do with the macro.) I think we could convert piecemeal, and I wonder if this new patch for 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good starting point. Borislav, what do you think? Would you be okay with adding a Xen PV pseudofeature? --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 19:03 ` Andy Lutomirski @ 2015-11-16 19:59 ` Borislav Petkov 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 19:59 ` Borislav Petkov ` (2 subsequent siblings) 3 siblings, 2 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-16 19:59 UTC (permalink / raw) To: Andy Lutomirski Cc: Boris Ostrovsky, linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote: > ... > The reader surely doesn't remember that this isn't guaranteed to be a > swapgs instruction on native. Using: > > ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV > > would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and > much clearer. We could hide *that* behind a macro and no one would be > confused. (Well, they'd be confused by the fact that Xen PV handles > gsbase very differently from native, but that has nothing to do with > the macro.) > > I think we could convert piecemeal, and I wonder if this new patch for > 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good > starting point. Borislav, what do you think? Would you be okay with > adding a Xen PV pseudofeature? AFAICT, I'd prefer this becomes rather a jump label which gets enabled on xen. Especially if a single pseudofeature might not be enough, apprently... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 19:59 ` Borislav Petkov @ 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 20:11 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 20:11 UTC (permalink / raw) To: Borislav Petkov; +Cc: Boris Ostrovsky, David Vrabel, linux-kernel, xen-devel On Mon, Nov 16, 2015 at 11:59 AM, Borislav Petkov <bp@alien8.de> wrote: > On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote: > >> ... >> The reader surely doesn't remember that this isn't guaranteed to be a >> swapgs instruction on native. Using: >> >> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV >> >> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and >> much clearer. We could hide *that* behind a macro and no one would be >> confused. (Well, they'd be confused by the fact that Xen PV handles >> gsbase very differently from native, but that has nothing to do with >> the macro.) >> >> I think we could convert piecemeal, and I wonder if this new patch for >> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good >> starting point. Borislav, what do you think? Would you be okay with >> adding a Xen PV pseudofeature? > > AFAICT, I'd prefer this becomes rather a jump label which gets enabled > on xen. Especially if a single pseudofeature might not be enough, > apprently... Except it's not a jump. (Also, the alternatives infrastructure is IMO much nicer than the jump label infrastructure.) Taking SWAPGS as an example, the semantics we need are: - On native, do swapgs. This *can't* be a call due to RSP issues. - On Xen PV, swapgs will work, but it's emulated. We'd rather just nop it out. In principle, we could static jump over it on Xen, but that also involves forcing the jump label to be built on old GCC versions, which PeterZ objected to the last time I asked. If it would make you feel better, it could be X86_BUG_XENPV :-p Are there really multiple feature bits for this stuff? I'd like to imagine that the entry code is all either Xen PV or native/PVH/PVHVM -- i.e. I assumed that PVH works like native for all entries. --Andy tip #101: Trim your mails when you reply. -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 19:59 ` Borislav Petkov 2015-11-16 20:11 ` Andy Lutomirski @ 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:22 ` Borislav Petkov 1 sibling, 2 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 20:11 UTC (permalink / raw) To: Borislav Petkov Cc: Boris Ostrovsky, linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On Mon, Nov 16, 2015 at 11:59 AM, Borislav Petkov <bp@alien8.de> wrote: > On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote: > >> ... >> The reader surely doesn't remember that this isn't guaranteed to be a >> swapgs instruction on native. Using: >> >> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV >> >> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and >> much clearer. We could hide *that* behind a macro and no one would be >> confused. (Well, they'd be confused by the fact that Xen PV handles >> gsbase very differently from native, but that has nothing to do with >> the macro.) >> >> I think we could convert piecemeal, and I wonder if this new patch for >> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good >> starting point. Borislav, what do you think? Would you be okay with >> adding a Xen PV pseudofeature? > > AFAICT, I'd prefer this becomes rather a jump label which gets enabled > on xen. Especially if a single pseudofeature might not be enough, > apprently... Except it's not a jump. (Also, the alternatives infrastructure is IMO much nicer than the jump label infrastructure.) Taking SWAPGS as an example, the semantics we need are: - On native, do swapgs. This *can't* be a call due to RSP issues. - On Xen PV, swapgs will work, but it's emulated. We'd rather just nop it out. In principle, we could static jump over it on Xen, but that also involves forcing the jump label to be built on old GCC versions, which PeterZ objected to the last time I asked. If it would make you feel better, it could be X86_BUG_XENPV :-p Are there really multiple feature bits for this stuff? I'd like to imagine that the entry code is all either Xen PV or native/PVH/PVHVM -- i.e. I assumed that PVH works like native for all entries. --Andy tip #101: Trim your mails when you reply. -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:11 ` Andy Lutomirski @ 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:22 ` Borislav Petkov 1 sibling, 0 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-16 20:22 UTC (permalink / raw) To: Andy Lutomirski, H. Peter Anvin Cc: Boris Ostrovsky, David Vrabel, linux-kernel, xen-devel On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: > On Mon, Nov 16, 2015 at 11:59 AM, Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote: > > > >> ... > >> The reader surely doesn't remember that this isn't guaranteed to be a > >> swapgs instruction on native. Using: > >> > >> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV > >> > >> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and > >> much clearer. We could hide *that* behind a macro and no one would be > >> confused. (Well, they'd be confused by the fact that Xen PV handles > >> gsbase very differently from native, but that has nothing to do with > >> the macro.) > >> > >> I think we could convert piecemeal, and I wonder if this new patch for > >> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good > >> starting point. Borislav, what do you think? Would you be okay with > >> adding a Xen PV pseudofeature? > > > > AFAICT, I'd prefer this becomes rather a jump label which gets enabled > > on xen. Especially if a single pseudofeature might not be enough, > > apprently... > > Except it's not a jump. (Also, the alternatives infrastructure is IMO > much nicer than the jump label infrastructure.) > > Taking SWAPGS as an example, the semantics we need are: > > - On native, do swapgs. This *can't* be a call due to RSP issues. > - On Xen PV, swapgs will work, but it's emulated. We'd rather just nop it out. Huh, so what's wrong with a jump: jmp 1f swapgs 1: > In principle, we could static jump over it on Xen, but that also > involves forcing the jump label to be built on old GCC versions, which > PeterZ objected to the last time I asked. :-\ > If it would make you feel better, it could be X86_BUG_XENPV :-p That doesn't matter - I just don't want to open the flood gates on pseudo feature bits. hpa, what do you think? > Are there really multiple feature bits for this stuff? I'd like to > imagine that the entry code is all either Xen PV or native/PVH/PVHVM > -- i.e. I assumed that PVH works like native for all entries. I just reacted to Boris' statement: "We don't currently have a Xen-specific CPU feature. We could, in principle, add it but we can't replace all of current paravirt patching with a single feature since PVH guests use a subset of existing pv ops (and in the future it may become even more fine-grained)." -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 20:22 ` Borislav Petkov @ 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:48 ` Boris Ostrovsky ` (3 more replies) 1 sibling, 4 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-16 20:22 UTC (permalink / raw) To: Andy Lutomirski, H. Peter Anvin Cc: Boris Ostrovsky, linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: > On Mon, Nov 16, 2015 at 11:59 AM, Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote: > > > >> ... > >> The reader surely doesn't remember that this isn't guaranteed to be a > >> swapgs instruction on native. Using: > >> > >> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV > >> > >> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and > >> much clearer. We could hide *that* behind a macro and no one would be > >> confused. (Well, they'd be confused by the fact that Xen PV handles > >> gsbase very differently from native, but that has nothing to do with > >> the macro.) > >> > >> I think we could convert piecemeal, and I wonder if this new patch for > >> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good > >> starting point. Borislav, what do you think? Would you be okay with > >> adding a Xen PV pseudofeature? > > > > AFAICT, I'd prefer this becomes rather a jump label which gets enabled > > on xen. Especially if a single pseudofeature might not be enough, > > apprently... > > Except it's not a jump. (Also, the alternatives infrastructure is IMO > much nicer than the jump label infrastructure.) > > Taking SWAPGS as an example, the semantics we need are: > > - On native, do swapgs. This *can't* be a call due to RSP issues. > - On Xen PV, swapgs will work, but it's emulated. We'd rather just nop it out. Huh, so what's wrong with a jump: jmp 1f swapgs 1: > In principle, we could static jump over it on Xen, but that also > involves forcing the jump label to be built on old GCC versions, which > PeterZ objected to the last time I asked. :-\ > If it would make you feel better, it could be X86_BUG_XENPV :-p That doesn't matter - I just don't want to open the flood gates on pseudo feature bits. hpa, what do you think? > Are there really multiple feature bits for this stuff? I'd like to > imagine that the entry code is all either Xen PV or native/PVH/PVHVM > -- i.e. I assumed that PVH works like native for all entries. I just reacted to Boris' statement: "We don't currently have a Xen-specific CPU feature. We could, in principle, add it but we can't replace all of current paravirt patching with a single feature since PVH guests use a subset of existing pv ops (and in the future it may become even more fine-grained)." -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:22 ` Borislav Petkov @ 2015-11-16 20:48 ` Boris Ostrovsky 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 20:48 ` Boris Ostrovsky ` (2 subsequent siblings) 3 siblings, 2 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-16 20:48 UTC (permalink / raw) To: Borislav Petkov, Andy Lutomirski, H. Peter Anvin Cc: linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On 11/16/2015 03:22 PM, Borislav Petkov wrote: > On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: > >> Are there really multiple feature bits for this stuff? I'd like to >> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >> -- i.e. I assumed that PVH works like native for all entries. Almost. For PVH we will have a small stub to set up bootparams and such but then we jump to startup_{32|64} code. > I just reacted to Boris' statement: > > "We don't currently have a Xen-specific CPU feature. We could, in > principle, add it but we can't replace all of current paravirt patching > with a single feature since PVH guests use a subset of existing pv ops > (and in the future it may become even more fine-grained)." Actually, nevermind this --- I was thinking about APIC ops and they are not pv ops. Note though that there are other users of pv ops --- lguest and looks like KVM (for one op) use them too. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:48 ` Boris Ostrovsky @ 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 21:00 ` Borislav Petkov ` (2 more replies) 2015-11-16 20:50 ` Andy Lutomirski 1 sibling, 3 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 20:50 UTC (permalink / raw) To: Boris Ostrovsky Cc: Borislav Petkov, H. Peter Anvin, linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 11/16/2015 03:22 PM, Borislav Petkov wrote: >> >> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: >> >>> Are there really multiple feature bits for this stuff? I'd like to >>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >>> -- i.e. I assumed that PVH works like native for all entries. > > > > Almost. For PVH we will have a small stub to set up bootparams and such but > then we jump to startup_{32|64} code. > > >> I just reacted to Boris' statement: >> >> "We don't currently have a Xen-specific CPU feature. We could, in >> principle, add it but we can't replace all of current paravirt patching >> with a single feature since PVH guests use a subset of existing pv ops >> (and in the future it may become even more fine-grained)." > > > Actually, nevermind this --- I was thinking about APIC ops and they are not > pv ops. > > Note though that there are other users of pv ops --- lguest and looks like > KVM (for one op) use them too. > Honestly, I think we should just delete lguest some time soon. And KVM uses this stuff so lightly that we shouldn't need all of the pvop stuff. (In fact, I'm slowly working on removing KVM_GUEST's dependency on PARAVIRT.) --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:50 ` Andy Lutomirski @ 2015-11-16 21:00 ` Borislav Petkov 2015-11-16 21:00 ` Borislav Petkov 2015-11-16 21:03 ` Konrad Rzeszutek Wilk 2 siblings, 0 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-16 21:00 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, xen-devel, David Vrabel, H. Peter Anvin, Boris Ostrovsky On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: > (In fact, I'm slowly working on removing KVM_GUEST's dependency on > PARAVIRT.) Good! The hope that we'll be able to remove paravirt one day is != 0 again. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 21:00 ` Borislav Petkov @ 2015-11-16 21:00 ` Borislav Petkov 2015-11-16 21:03 ` Konrad Rzeszutek Wilk 2 siblings, 0 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-16 21:00 UTC (permalink / raw) To: Andy Lutomirski Cc: Boris Ostrovsky, H. Peter Anvin, linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: > (In fact, I'm slowly working on removing KVM_GUEST's dependency on > PARAVIRT.) Good! The hope that we'll be able to remove paravirt one day is != 0 again. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 21:00 ` Borislav Petkov 2015-11-16 21:00 ` Borislav Petkov @ 2015-11-16 21:03 ` Konrad Rzeszutek Wilk 2015-11-16 21:04 ` Andy Lutomirski 2015-11-16 21:04 ` Andy Lutomirski 2 siblings, 2 replies; 51+ messages in thread From: Konrad Rzeszutek Wilk @ 2015-11-16 21:03 UTC (permalink / raw) To: Andy Lutomirski, Joao Martins Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin, Boris Ostrovsky On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: > On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: > > On 11/16/2015 03:22 PM, Borislav Petkov wrote: > >> > >> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: > >> > >>> Are there really multiple feature bits for this stuff? I'd like to > >>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM > >>> -- i.e. I assumed that PVH works like native for all entries. > > > > > > > > Almost. For PVH we will have a small stub to set up bootparams and such but > > then we jump to startup_{32|64} code. > > > > > >> I just reacted to Boris' statement: > >> > >> "We don't currently have a Xen-specific CPU feature. We could, in > >> principle, add it but we can't replace all of current paravirt patching > >> with a single feature since PVH guests use a subset of existing pv ops > >> (and in the future it may become even more fine-grained)." > > > > > > Actually, nevermind this --- I was thinking about APIC ops and they are not > > pv ops. > > > > Note though that there are other users of pv ops --- lguest and looks like > > KVM (for one op) use them too. > > > > Honestly, I think we should just delete lguest some time soon. And > KVM uses this stuff so lightly that we shouldn't need all of the pvop > stuff. (In fact, I'm slowly working on removing KVM_GUEST's > dependency on PARAVIRT.) Even for the pvclock? (sorry for stealing this thread on this subject). ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 21:03 ` Konrad Rzeszutek Wilk @ 2015-11-16 21:04 ` Andy Lutomirski 2015-11-17 10:53 ` Joao Martins 2015-11-17 10:53 ` Joao Martins 2015-11-16 21:04 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 21:04 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Joao Martins, Boris Ostrovsky, Borislav Petkov, H. Peter Anvin, linux-kernel, xen-devel, David Vrabel On Mon, Nov 16, 2015 at 1:03 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: >> On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >> > On 11/16/2015 03:22 PM, Borislav Petkov wrote: >> >> >> >> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: >> >> >> >>> Are there really multiple feature bits for this stuff? I'd like to >> >>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >> >>> -- i.e. I assumed that PVH works like native for all entries. >> > >> > >> > >> > Almost. For PVH we will have a small stub to set up bootparams and such but >> > then we jump to startup_{32|64} code. >> > >> > >> >> I just reacted to Boris' statement: >> >> >> >> "We don't currently have a Xen-specific CPU feature. We could, in >> >> principle, add it but we can't replace all of current paravirt patching >> >> with a single feature since PVH guests use a subset of existing pv ops >> >> (and in the future it may become even more fine-grained)." >> > >> > >> > Actually, nevermind this --- I was thinking about APIC ops and they are not >> > pv ops. >> > >> > Note though that there are other users of pv ops --- lguest and looks like >> > KVM (for one op) use them too. >> > >> >> Honestly, I think we should just delete lguest some time soon. And >> KVM uses this stuff so lightly that we shouldn't need all of the pvop >> stuff. (In fact, I'm slowly working on removing KVM_GUEST's >> dependency on PARAVIRT.) > > Even for the pvclock? > > (sorry for stealing this thread on this subject). I don't think that pvclock uses any of the paravirt infrastructure. It's just another clock source AFAIK. --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 21:04 ` Andy Lutomirski @ 2015-11-17 10:53 ` Joao Martins 2015-11-17 10:53 ` Joao Martins 1 sibling, 0 replies; 51+ messages in thread From: Joao Martins @ 2015-11-17 10:53 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin, Boris Ostrovsky On 11/16/2015 09:04 PM, Andy Lutomirski wrote: > On Mon, Nov 16, 2015 at 1:03 PM, Konrad Rzeszutek Wilk > <konrad.wilk@oracle.com> wrote: >> On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: >>> On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky >>> <boris.ostrovsky@oracle.com> wrote: >>>> On 11/16/2015 03:22 PM, Borislav Petkov wrote: >>>>> >>>>> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: >>>>> >>>>>> Are there really multiple feature bits for this stuff? I'd like to >>>>>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >>>>>> -- i.e. I assumed that PVH works like native for all entries. >>>> >>>> >>>> >>>> Almost. For PVH we will have a small stub to set up bootparams and such but >>>> then we jump to startup_{32|64} code. >>>> >>>> >>>>> I just reacted to Boris' statement: >>>>> >>>>> "We don't currently have a Xen-specific CPU feature. We could, in >>>>> principle, add it but we can't replace all of current paravirt patching >>>>> with a single feature since PVH guests use a subset of existing pv ops >>>>> (and in the future it may become even more fine-grained)." >>>> >>>> >>>> Actually, nevermind this --- I was thinking about APIC ops and they are not >>>> pv ops. >>>> >>>> Note though that there are other users of pv ops --- lguest and looks like >>>> KVM (for one op) use them too. >>>> >>> >>> Honestly, I think we should just delete lguest some time soon. And >>> KVM uses this stuff so lightly that we shouldn't need all of the pvop >>> stuff. (In fact, I'm slowly working on removing KVM_GUEST's >>> dependency on PARAVIRT.) >> >> Even for the pvclock? >> >> (sorry for stealing this thread on this subject). > > I don't think that pvclock uses any of the paravirt infrastructure. > It's just another clock source AFAIK. > Yeah, but pv_time_ops is still used on both Xen and KVM. Even though it looks that on KVM some of it's used only when pvclock isn't marked as stable (i.e. no PVCLOCK_TSC_STABLE_BIT). Joao > --Andy > ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 21:04 ` Andy Lutomirski 2015-11-17 10:53 ` Joao Martins @ 2015-11-17 10:53 ` Joao Martins 1 sibling, 0 replies; 51+ messages in thread From: Joao Martins @ 2015-11-17 10:53 UTC (permalink / raw) To: Andy Lutomirski Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Borislav Petkov, H. Peter Anvin, linux-kernel, xen-devel, David Vrabel On 11/16/2015 09:04 PM, Andy Lutomirski wrote: > On Mon, Nov 16, 2015 at 1:03 PM, Konrad Rzeszutek Wilk > <konrad.wilk@oracle.com> wrote: >> On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: >>> On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky >>> <boris.ostrovsky@oracle.com> wrote: >>>> On 11/16/2015 03:22 PM, Borislav Petkov wrote: >>>>> >>>>> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: >>>>> >>>>>> Are there really multiple feature bits for this stuff? I'd like to >>>>>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >>>>>> -- i.e. I assumed that PVH works like native for all entries. >>>> >>>> >>>> >>>> Almost. For PVH we will have a small stub to set up bootparams and such but >>>> then we jump to startup_{32|64} code. >>>> >>>> >>>>> I just reacted to Boris' statement: >>>>> >>>>> "We don't currently have a Xen-specific CPU feature. We could, in >>>>> principle, add it but we can't replace all of current paravirt patching >>>>> with a single feature since PVH guests use a subset of existing pv ops >>>>> (and in the future it may become even more fine-grained)." >>>> >>>> >>>> Actually, nevermind this --- I was thinking about APIC ops and they are not >>>> pv ops. >>>> >>>> Note though that there are other users of pv ops --- lguest and looks like >>>> KVM (for one op) use them too. >>>> >>> >>> Honestly, I think we should just delete lguest some time soon. And >>> KVM uses this stuff so lightly that we shouldn't need all of the pvop >>> stuff. (In fact, I'm slowly working on removing KVM_GUEST's >>> dependency on PARAVIRT.) >> >> Even for the pvclock? >> >> (sorry for stealing this thread on this subject). > > I don't think that pvclock uses any of the paravirt infrastructure. > It's just another clock source AFAIK. > Yeah, but pv_time_ops is still used on both Xen and KVM. Even though it looks that on KVM some of it's used only when pvclock isn't marked as stable (i.e. no PVCLOCK_TSC_STABLE_BIT). Joao > --Andy > ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 21:03 ` Konrad Rzeszutek Wilk 2015-11-16 21:04 ` Andy Lutomirski @ 2015-11-16 21:04 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 21:04 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin, Joao Martins, Boris Ostrovsky On Mon, Nov 16, 2015 at 1:03 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote: >> On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >> > On 11/16/2015 03:22 PM, Borislav Petkov wrote: >> >> >> >> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: >> >> >> >>> Are there really multiple feature bits for this stuff? I'd like to >> >>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >> >>> -- i.e. I assumed that PVH works like native for all entries. >> > >> > >> > >> > Almost. For PVH we will have a small stub to set up bootparams and such but >> > then we jump to startup_{32|64} code. >> > >> > >> >> I just reacted to Boris' statement: >> >> >> >> "We don't currently have a Xen-specific CPU feature. We could, in >> >> principle, add it but we can't replace all of current paravirt patching >> >> with a single feature since PVH guests use a subset of existing pv ops >> >> (and in the future it may become even more fine-grained)." >> > >> > >> > Actually, nevermind this --- I was thinking about APIC ops and they are not >> > pv ops. >> > >> > Note though that there are other users of pv ops --- lguest and looks like >> > KVM (for one op) use them too. >> > >> >> Honestly, I think we should just delete lguest some time soon. And >> KVM uses this stuff so lightly that we shouldn't need all of the pvop >> stuff. (In fact, I'm slowly working on removing KVM_GUEST's >> dependency on PARAVIRT.) > > Even for the pvclock? > > (sorry for stealing this thread on this subject). I don't think that pvclock uses any of the paravirt infrastructure. It's just another clock source AFAIK. --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:48 ` Boris Ostrovsky 2015-11-16 20:50 ` Andy Lutomirski @ 2015-11-16 20:50 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 20:50 UTC (permalink / raw) To: Boris Ostrovsky Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 11/16/2015 03:22 PM, Borislav Petkov wrote: >> >> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: >> >>> Are there really multiple feature bits for this stuff? I'd like to >>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >>> -- i.e. I assumed that PVH works like native for all entries. > > > > Almost. For PVH we will have a small stub to set up bootparams and such but > then we jump to startup_{32|64} code. > > >> I just reacted to Boris' statement: >> >> "We don't currently have a Xen-specific CPU feature. We could, in >> principle, add it but we can't replace all of current paravirt patching >> with a single feature since PVH guests use a subset of existing pv ops >> (and in the future it may become even more fine-grained)." > > > Actually, nevermind this --- I was thinking about APIC ops and they are not > pv ops. > > Note though that there are other users of pv ops --- lguest and looks like > KVM (for one op) use them too. > Honestly, I think we should just delete lguest some time soon. And KVM uses this stuff so lightly that we shouldn't need all of the pvop stuff. (In fact, I'm slowly working on removing KVM_GUEST's dependency on PARAVIRT.) --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:48 ` Boris Ostrovsky @ 2015-11-16 20:48 ` Boris Ostrovsky 2015-11-16 21:55 ` H. Peter Anvin 2015-11-16 21:55 ` H. Peter Anvin 3 siblings, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-16 20:48 UTC (permalink / raw) To: Borislav Petkov, Andy Lutomirski, H. Peter Anvin Cc: David Vrabel, linux-kernel, xen-devel On 11/16/2015 03:22 PM, Borislav Petkov wrote: > On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote: > >> Are there really multiple feature bits for this stuff? I'd like to >> imagine that the entry code is all either Xen PV or native/PVH/PVHVM >> -- i.e. I assumed that PVH works like native for all entries. Almost. For PVH we will have a small stub to set up bootparams and such but then we jump to startup_{32|64} code. > I just reacted to Boris' statement: > > "We don't currently have a Xen-specific CPU feature. We could, in > principle, add it but we can't replace all of current paravirt patching > with a single feature since PVH guests use a subset of existing pv ops > (and in the future it may become even more fine-grained)." Actually, nevermind this --- I was thinking about APIC ops and they are not pv ops. Note though that there are other users of pv ops --- lguest and looks like KVM (for one op) use them too. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:48 ` Boris Ostrovsky 2015-11-16 20:48 ` Boris Ostrovsky @ 2015-11-16 21:55 ` H. Peter Anvin 2015-11-16 21:55 ` H. Peter Anvin 3 siblings, 0 replies; 51+ messages in thread From: H. Peter Anvin @ 2015-11-16 21:55 UTC (permalink / raw) To: Borislav Petkov, Andy Lutomirski Cc: Boris Ostrovsky, David Vrabel, linux-kernel, xen-devel On 11/16/15 12:22, Borislav Petkov wrote: > > Huh, so what's wrong with a jump: > > jmp 1f > swapgs > 1: > What is the point of that jump? >> If it would make you feel better, it could be X86_BUG_XENPV :-p > > That doesn't matter - I just don't want to open the flood gates on > pseudo feature bits. > > hpa, what do you think? Pseudo feature bits are fine, we already have plenty of them. They make sense as they let us reuse a lot of infrastructure. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 20:22 ` Borislav Petkov ` (2 preceding siblings ...) 2015-11-16 21:55 ` H. Peter Anvin @ 2015-11-16 21:55 ` H. Peter Anvin 2015-11-17 14:40 ` Boris Ostrovsky 2015-11-17 14:40 ` Boris Ostrovsky 3 siblings, 2 replies; 51+ messages in thread From: H. Peter Anvin @ 2015-11-16 21:55 UTC (permalink / raw) To: Borislav Petkov, Andy Lutomirski Cc: Boris Ostrovsky, linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On 11/16/15 12:22, Borislav Petkov wrote: > > Huh, so what's wrong with a jump: > > jmp 1f > swapgs > 1: > What is the point of that jump? >> If it would make you feel better, it could be X86_BUG_XENPV :-p > > That doesn't matter - I just don't want to open the flood gates on > pseudo feature bits. > > hpa, what do you think? Pseudo feature bits are fine, we already have plenty of them. They make sense as they let us reuse a lot of infrastructure. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 21:55 ` H. Peter Anvin @ 2015-11-17 14:40 ` Boris Ostrovsky 2015-11-17 14:40 ` Boris Ostrovsky 1 sibling, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-17 14:40 UTC (permalink / raw) To: H. Peter Anvin, Borislav Petkov, Andy Lutomirski Cc: David Vrabel, linux-kernel, xen-devel On 11/16/2015 04:55 PM, H. Peter Anvin wrote: > On 11/16/15 12:22, Borislav Petkov wrote: >> Huh, so what's wrong with a jump: >> >> jmp 1f >> swapgs >> 1: >> > What is the point of that jump? > >>> If it would make you feel better, it could be X86_BUG_XENPV :-p >> That doesn't matter - I just don't want to open the flood gates on >> pseudo feature bits. >> >> hpa, what do you think? > Pseudo feature bits are fine, we already have plenty of them. They make > sense as they let us reuse a lot of infrastructure. So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 3eb572e..c43df7b 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -308,7 +308,8 @@ sysenter_past_esp: movl %esp, %eax call do_fast_syscall_32 - testl %eax, %eax + /* PV guests always use IRET path */ + ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done", X86_FEATURE_PV jz .Lsyscall_32_done /* Opportunistic SYSEXIT */ diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index c320183..2d1bc82 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -121,7 +121,7 @@ sysenter_flags_fixed: movq %rsp, %rdi call do_fast_syscall_32 - testl %eax, %eax + ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done", X86_FEATURE_PV jz .Lsyscall_32_done jmp sysret32_from_system_call @@ -200,7 +200,8 @@ ENTRY(entry_SYSCALL_compat) movq %rsp, %rdi call do_fast_syscall_32 - testl %eax, %eax + /* PV guests always use IRET path */ + ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done", X86_FEATURE_PV jz .Lsyscall_32_done /* Opportunistic SYSRET */ diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index e4f8010..723327b 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -216,6 +216,7 @@ #define X86_FEATURE_PAUSEFILTER ( 8*32+13) /* AMD filtered pause intercept */ #define X86_FEATURE_PFTHRESHOLD ( 8*32+14) /* AMD pause filter threshold */ #define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +#define X86_FEATURE_PV ( 8*32+16) /* Paravirtual guest */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 21:55 ` H. Peter Anvin 2015-11-17 14:40 ` Boris Ostrovsky @ 2015-11-17 14:40 ` Boris Ostrovsky 2015-11-17 18:49 ` Andy Lutomirski 2015-11-17 18:49 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-17 14:40 UTC (permalink / raw) To: H. Peter Anvin, Borislav Petkov, Andy Lutomirski Cc: linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk On 11/16/2015 04:55 PM, H. Peter Anvin wrote: > On 11/16/15 12:22, Borislav Petkov wrote: >> Huh, so what's wrong with a jump: >> >> jmp 1f >> swapgs >> 1: >> > What is the point of that jump? > >>> If it would make you feel better, it could be X86_BUG_XENPV :-p >> That doesn't matter - I just don't want to open the flood gates on >> pseudo feature bits. >> >> hpa, what do you think? > Pseudo feature bits are fine, we already have plenty of them. They make > sense as they let us reuse a lot of infrastructure. So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 3eb572e..c43df7b 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -308,7 +308,8 @@ sysenter_past_esp: movl %esp, %eax call do_fast_syscall_32 - testl %eax, %eax + /* PV guests always use IRET path */ + ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done", X86_FEATURE_PV jz .Lsyscall_32_done /* Opportunistic SYSEXIT */ diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index c320183..2d1bc82 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -121,7 +121,7 @@ sysenter_flags_fixed: movq %rsp, %rdi call do_fast_syscall_32 - testl %eax, %eax + ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done", X86_FEATURE_PV jz .Lsyscall_32_done jmp sysret32_from_system_call @@ -200,7 +200,8 @@ ENTRY(entry_SYSCALL_compat) movq %rsp, %rdi call do_fast_syscall_32 - testl %eax, %eax + /* PV guests always use IRET path */ + ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done", X86_FEATURE_PV jz .Lsyscall_32_done /* Opportunistic SYSRET */ diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index e4f8010..723327b 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -216,6 +216,7 @@ #define X86_FEATURE_PAUSEFILTER ( 8*32+13) /* AMD filtered pause intercept */ #define X86_FEATURE_PFTHRESHOLD ( 8*32+14) /* AMD pause filter threshold */ #define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +#define X86_FEATURE_PV ( 8*32+16) /* Paravirtual guest */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 14:40 ` Boris Ostrovsky @ 2015-11-17 18:49 ` Andy Lutomirski 2015-11-17 19:12 ` Andrew Cooper 2015-11-17 19:12 ` [Xen-devel] " Andrew Cooper 2015-11-17 18:49 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-17 18:49 UTC (permalink / raw) To: Boris Ostrovsky Cc: xen-devel, Konrad Rzeszutek Wilk, David Vrabel, Borislav Petkov, linux-kernel, H. Peter Anvin On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: > > On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >> >> On 11/16/15 12:22, Borislav Petkov wrote: >>> >>> Huh, so what's wrong with a jump: >>> >>> jmp 1f >>> swapgs >>> 1: >>> >> What is the point of that jump? >> >>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>> >>> That doesn't matter - I just don't want to open the flood gates on >>> pseudo feature bits. >>> >>> hpa, what do you think? >> >> Pseudo feature bits are fine, we already have plenty of them. They make >> sense as they let us reuse a lot of infrastructure. > > > > So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) > Looks good to me. Does Xen have any sysexit/sysret32 equivalent to return to 32-bit user mode? If so, it could be worth trying to wire it up by patching the jz instead of the test instruction. Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to too many people. --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 18:49 ` Andy Lutomirski @ 2015-11-17 19:12 ` Andrew Cooper 2015-11-17 19:12 ` [Xen-devel] " Andrew Cooper 1 sibling, 0 replies; 51+ messages in thread From: Andrew Cooper @ 2015-11-17 19:12 UTC (permalink / raw) To: Andy Lutomirski, Boris Ostrovsky Cc: David Vrabel, Borislav Petkov, linux-kernel, H. Peter Anvin, xen-devel On 17/11/15 18:49, Andy Lutomirski wrote: > On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>> On 11/16/15 12:22, Borislav Petkov wrote: >>>> Huh, so what's wrong with a jump: >>>> >>>> jmp 1f >>>> swapgs >>>> 1: >>>> >>> What is the point of that jump? >>> >>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>> That doesn't matter - I just don't want to open the flood gates on >>>> pseudo feature bits. >>>> >>>> hpa, what do you think? >>> Pseudo feature bits are fine, we already have plenty of them. They make >>> sense as they let us reuse a lot of infrastructure. >> >> >> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >> > Looks good to me. Does Xen have any sysexit/sysret32 equivalent to > return to 32-bit user mode? If so, it could be worth trying to wire > it up by patching the jz instead of the test instruction. >From the guests point of view, there is only hypercall_iret. > > Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to > too many people. I agree - PV on its own is too generic. An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an unambiguous, although rather longer. ~Andrew ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 18:49 ` Andy Lutomirski 2015-11-17 19:12 ` Andrew Cooper @ 2015-11-17 19:12 ` Andrew Cooper 2015-11-17 19:16 ` Andy Lutomirski 2015-11-17 19:16 ` Andy Lutomirski 1 sibling, 2 replies; 51+ messages in thread From: Andrew Cooper @ 2015-11-17 19:12 UTC (permalink / raw) To: Andy Lutomirski, Boris Ostrovsky Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On 17/11/15 18:49, Andy Lutomirski wrote: > On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>> On 11/16/15 12:22, Borislav Petkov wrote: >>>> Huh, so what's wrong with a jump: >>>> >>>> jmp 1f >>>> swapgs >>>> 1: >>>> >>> What is the point of that jump? >>> >>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>> That doesn't matter - I just don't want to open the flood gates on >>>> pseudo feature bits. >>>> >>>> hpa, what do you think? >>> Pseudo feature bits are fine, we already have plenty of them. They make >>> sense as they let us reuse a lot of infrastructure. >> >> >> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >> > Looks good to me. Does Xen have any sysexit/sysret32 equivalent to > return to 32-bit user mode? If so, it could be worth trying to wire > it up by patching the jz instead of the test instruction. >From the guests point of view, there is only hypercall_iret. > > Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to > too many people. I agree - PV on its own is too generic. An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an unambiguous, although rather longer. ~Andrew ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:12 ` [Xen-devel] " Andrew Cooper @ 2015-11-17 19:16 ` Andy Lutomirski 2015-11-17 19:21 ` Borislav Petkov ` (5 more replies) 2015-11-17 19:16 ` Andy Lutomirski 1 sibling, 6 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-17 19:16 UTC (permalink / raw) To: Andrew Cooper Cc: Boris Ostrovsky, linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 17/11/15 18:49, Andy Lutomirski wrote: >> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>>> On 11/16/15 12:22, Borislav Petkov wrote: >>>>> Huh, so what's wrong with a jump: >>>>> >>>>> jmp 1f >>>>> swapgs >>>>> 1: >>>>> >>>> What is the point of that jump? >>>> >>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>>> That doesn't matter - I just don't want to open the flood gates on >>>>> pseudo feature bits. >>>>> >>>>> hpa, what do you think? >>>> Pseudo feature bits are fine, we already have plenty of them. They make >>>> sense as they let us reuse a lot of infrastructure. >>> >>> >>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >>> >> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >> return to 32-bit user mode? If so, it could be worth trying to wire >> it up by patching the jz instead of the test instruction. > > From the guests point of view, there is only hypercall_iret. Doesn't hypercall_iret have flags that ask for different behavior, though? (VG_syscall or whatever for the 64-bit case?) > >> >> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to >> too many people. > > I agree - PV on its own is too generic. > > An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an > unambiguous, although rather longer. Works for me, too, although seeing "xen_pv_host" in the Linux cpu features would be very strange indeed :) --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:16 ` Andy Lutomirski @ 2015-11-17 19:21 ` Borislav Petkov 2015-11-17 19:21 ` [Xen-devel] " Borislav Petkov ` (4 subsequent siblings) 5 siblings, 0 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-17 19:21 UTC (permalink / raw) To: Andy Lutomirski Cc: Andrew Cooper, linux-kernel, xen-devel, David Vrabel, H. Peter Anvin, Boris Ostrovsky On Tue, Nov 17, 2015 at 11:16:11AM -0800, Andy Lutomirski wrote: > Works for me, too, although seeing "xen_pv_host" in the Linux cpu > features would be very strange indeed :) That feature would be, of course, *not* in /proc/cpuinfo. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:16 ` Andy Lutomirski 2015-11-17 19:21 ` Borislav Petkov @ 2015-11-17 19:21 ` Borislav Petkov 2015-11-17 19:29 ` Andrew Cooper ` (3 subsequent siblings) 5 siblings, 0 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-17 19:21 UTC (permalink / raw) To: Andy Lutomirski Cc: Andrew Cooper, Boris Ostrovsky, linux-kernel, xen-devel, David Vrabel, H. Peter Anvin On Tue, Nov 17, 2015 at 11:16:11AM -0800, Andy Lutomirski wrote: > Works for me, too, although seeing "xen_pv_host" in the Linux cpu > features would be very strange indeed :) That feature would be, of course, *not* in /proc/cpuinfo. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:16 ` Andy Lutomirski 2015-11-17 19:21 ` Borislav Petkov 2015-11-17 19:21 ` [Xen-devel] " Borislav Petkov @ 2015-11-17 19:29 ` Andrew Cooper 2015-11-17 19:36 ` Andy Lutomirski 2015-11-17 19:36 ` Andy Lutomirski 2015-11-17 19:29 ` Andrew Cooper ` (2 subsequent siblings) 5 siblings, 2 replies; 51+ messages in thread From: Andrew Cooper @ 2015-11-17 19:29 UTC (permalink / raw) To: Andy Lutomirski Cc: Boris Ostrovsky, linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On 17/11/15 19:16, Andy Lutomirski wrote: > On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper > <andrew.cooper3@citrix.com> wrote: >> On 17/11/15 18:49, Andy Lutomirski wrote: >>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>>>> On 11/16/15 12:22, Borislav Petkov wrote: >>>>>> Huh, so what's wrong with a jump: >>>>>> >>>>>> jmp 1f >>>>>> swapgs >>>>>> 1: >>>>>> >>>>> What is the point of that jump? >>>>> >>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>>>> That doesn't matter - I just don't want to open the flood gates on >>>>>> pseudo feature bits. >>>>>> >>>>>> hpa, what do you think? >>>>> Pseudo feature bits are fine, we already have plenty of them. They make >>>>> sense as they let us reuse a lot of infrastructure. >>>> >>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >>>> >>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>> return to 32-bit user mode? If so, it could be worth trying to wire >>> it up by patching the jz instead of the test instruction. >> From the guests point of view, there is only hypercall_iret. > Doesn't hypercall_iret have flags that ask for different behavior, > though? (VG_syscall or whatever for the 64-bit case?) The one and only flag is VGCF_in_syscall Xen has its own logic for choosing between sysretq/sysretl if VGCF_in_syscall, but will end up on an iret path in all other circumstances. There is definitely some room for optimisation here, but in in some copious free time before that, I want to see about brining most of our asm code up into C. The vast majority of it doesn't need to be written in asm. > >>> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to >>> too many people. >> I agree - PV on its own is too generic. >> >> An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an >> unambiguous, although rather longer. > Works for me, too, although seeing "xen_pv_host" in the Linux cpu > features would be very strange indeed :) This makes me wonders whether the `insmod xen` project has managed to gain any traction ;) ~Andrew ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:29 ` Andrew Cooper @ 2015-11-17 19:36 ` Andy Lutomirski 2015-11-17 19:36 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-17 19:36 UTC (permalink / raw) To: Andrew Cooper Cc: Boris Ostrovsky, linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On Tue, Nov 17, 2015 at 11:29 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 17/11/15 19:16, Andy Lutomirski wrote: >> On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper >> <andrew.cooper3@citrix.com> wrote: >>> On 17/11/15 18:49, Andy Lutomirski wrote: >>>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >>>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>>>>> On 11/16/15 12:22, Borislav Petkov wrote: >>>>>>> Huh, so what's wrong with a jump: >>>>>>> >>>>>>> jmp 1f >>>>>>> swapgs >>>>>>> 1: >>>>>>> >>>>>> What is the point of that jump? >>>>>> >>>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>>>>> That doesn't matter - I just don't want to open the flood gates on >>>>>>> pseudo feature bits. >>>>>>> >>>>>>> hpa, what do you think? >>>>>> Pseudo feature bits are fine, we already have plenty of them. They make >>>>>> sense as they let us reuse a lot of infrastructure. >>>>> >>>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >>>>> >>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>>> return to 32-bit user mode? If so, it could be worth trying to wire >>>> it up by patching the jz instead of the test instruction. >>> From the guests point of view, there is only hypercall_iret. >> Doesn't hypercall_iret have flags that ask for different behavior, >> though? (VG_syscall or whatever for the 64-bit case?) > > The one and only flag is VGCF_in_syscall > > Xen has its own logic for choosing between sysretq/sysretl if > VGCF_in_syscall, but will end up on an iret path in all other > circumstances. In that case, a nicer version of this patch could preserve the new sysret-or-iret decision (on 64-bit kernels in the compat path) and use it to set VGCF_in_syscall. This might work considerably better now than it ever would have, since Linux now tries to use sysret32 on *all* 64-bit CPUs, and the way it's structured for compat is just a flag (the testl %eax,%eax thing) that indicates that sysret32 is okay. Anyway, that can be a followup patch. --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:29 ` Andrew Cooper 2015-11-17 19:36 ` Andy Lutomirski @ 2015-11-17 19:36 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-17 19:36 UTC (permalink / raw) To: Andrew Cooper Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin, Boris Ostrovsky On Tue, Nov 17, 2015 at 11:29 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 17/11/15 19:16, Andy Lutomirski wrote: >> On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper >> <andrew.cooper3@citrix.com> wrote: >>> On 17/11/15 18:49, Andy Lutomirski wrote: >>>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >>>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>>>>> On 11/16/15 12:22, Borislav Petkov wrote: >>>>>>> Huh, so what's wrong with a jump: >>>>>>> >>>>>>> jmp 1f >>>>>>> swapgs >>>>>>> 1: >>>>>>> >>>>>> What is the point of that jump? >>>>>> >>>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>>>>> That doesn't matter - I just don't want to open the flood gates on >>>>>>> pseudo feature bits. >>>>>>> >>>>>>> hpa, what do you think? >>>>>> Pseudo feature bits are fine, we already have plenty of them. They make >>>>>> sense as they let us reuse a lot of infrastructure. >>>>> >>>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >>>>> >>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>>> return to 32-bit user mode? If so, it could be worth trying to wire >>>> it up by patching the jz instead of the test instruction. >>> From the guests point of view, there is only hypercall_iret. >> Doesn't hypercall_iret have flags that ask for different behavior, >> though? (VG_syscall or whatever for the 64-bit case?) > > The one and only flag is VGCF_in_syscall > > Xen has its own logic for choosing between sysretq/sysretl if > VGCF_in_syscall, but will end up on an iret path in all other > circumstances. In that case, a nicer version of this patch could preserve the new sysret-or-iret decision (on 64-bit kernels in the compat path) and use it to set VGCF_in_syscall. This might work considerably better now than it ever would have, since Linux now tries to use sysret32 on *all* 64-bit CPUs, and the way it's structured for compat is just a flag (the testl %eax,%eax thing) that indicates that sysret32 is okay. Anyway, that can be a followup patch. --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:16 ` Andy Lutomirski ` (2 preceding siblings ...) 2015-11-17 19:29 ` Andrew Cooper @ 2015-11-17 19:29 ` Andrew Cooper 2015-11-17 19:37 ` [Xen-devel] " Boris Ostrovsky 2015-11-17 19:37 ` Boris Ostrovsky 5 siblings, 0 replies; 51+ messages in thread From: Andrew Cooper @ 2015-11-17 19:29 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin, Boris Ostrovsky On 17/11/15 19:16, Andy Lutomirski wrote: > On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper > <andrew.cooper3@citrix.com> wrote: >> On 17/11/15 18:49, Andy Lutomirski wrote: >>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>>>> On 11/16/15 12:22, Borislav Petkov wrote: >>>>>> Huh, so what's wrong with a jump: >>>>>> >>>>>> jmp 1f >>>>>> swapgs >>>>>> 1: >>>>>> >>>>> What is the point of that jump? >>>>> >>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>>>> That doesn't matter - I just don't want to open the flood gates on >>>>>> pseudo feature bits. >>>>>> >>>>>> hpa, what do you think? >>>>> Pseudo feature bits are fine, we already have plenty of them. They make >>>>> sense as they let us reuse a lot of infrastructure. >>>> >>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >>>> >>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>> return to 32-bit user mode? If so, it could be worth trying to wire >>> it up by patching the jz instead of the test instruction. >> From the guests point of view, there is only hypercall_iret. > Doesn't hypercall_iret have flags that ask for different behavior, > though? (VG_syscall or whatever for the 64-bit case?) The one and only flag is VGCF_in_syscall Xen has its own logic for choosing between sysretq/sysretl if VGCF_in_syscall, but will end up on an iret path in all other circumstances. There is definitely some room for optimisation here, but in in some copious free time before that, I want to see about brining most of our asm code up into C. The vast majority of it doesn't need to be written in asm. > >>> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to >>> too many people. >> I agree - PV on its own is too generic. >> >> An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an >> unambiguous, although rather longer. > Works for me, too, although seeing "xen_pv_host" in the Linux cpu > features would be very strange indeed :) This makes me wonders whether the `insmod xen` project has managed to gain any traction ;) ~Andrew ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:16 ` Andy Lutomirski ` (3 preceding siblings ...) 2015-11-17 19:29 ` Andrew Cooper @ 2015-11-17 19:37 ` Boris Ostrovsky 2015-11-17 19:38 ` Boris Ostrovsky 2015-11-17 19:38 ` [Xen-devel] " Boris Ostrovsky 2015-11-17 19:37 ` Boris Ostrovsky 5 siblings, 2 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-17 19:37 UTC (permalink / raw) To: Andy Lutomirski, Andrew Cooper Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On 11/17/2015 02:16 PM, Andy Lutomirski wrote: >>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>> return to 32-bit user mode? If so, it could be worth trying to wire >>> it up by patching the jz instead of the test instruction. We can actually make patching a little bit more efficient by replacing the test instruction with 'xor %eax,%eax'. That way we won't need any 'nop's. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:37 ` [Xen-devel] " Boris Ostrovsky @ 2015-11-17 19:38 ` Boris Ostrovsky 2015-11-17 19:38 ` [Xen-devel] " Boris Ostrovsky 1 sibling, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-17 19:38 UTC (permalink / raw) To: Andy Lutomirski, Andrew Cooper Cc: David Vrabel, Borislav Petkov, linux-kernel, H. Peter Anvin, xen-devel On 11/17/2015 02:37 PM, Boris Ostrovsky wrote: > On 11/17/2015 02:16 PM, Andy Lutomirski wrote: >>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>>> return to 32-bit user mode? If so, it could be worth trying to wire >>>> it up by patching the jz instead of the test instruction. > > We can actually make patching a little bit more efficient by replacing > the test instruction with 'xor %eax,%eax'. That way we won't need any > 'nop's. Nevermind that, we are looking at flags, not register. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:37 ` [Xen-devel] " Boris Ostrovsky 2015-11-17 19:38 ` Boris Ostrovsky @ 2015-11-17 19:38 ` Boris Ostrovsky 1 sibling, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-17 19:38 UTC (permalink / raw) To: Andy Lutomirski, Andrew Cooper Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On 11/17/2015 02:37 PM, Boris Ostrovsky wrote: > On 11/17/2015 02:16 PM, Andy Lutomirski wrote: >>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>>> return to 32-bit user mode? If so, it could be worth trying to wire >>>> it up by patching the jz instead of the test instruction. > > We can actually make patching a little bit more efficient by replacing > the test instruction with 'xor %eax,%eax'. That way we won't need any > 'nop's. Nevermind that, we are looking at flags, not register. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:16 ` Andy Lutomirski ` (4 preceding siblings ...) 2015-11-17 19:37 ` [Xen-devel] " Boris Ostrovsky @ 2015-11-17 19:37 ` Boris Ostrovsky 5 siblings, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-17 19:37 UTC (permalink / raw) To: Andy Lutomirski, Andrew Cooper Cc: David Vrabel, Borislav Petkov, linux-kernel, H. Peter Anvin, xen-devel On 11/17/2015 02:16 PM, Andy Lutomirski wrote: >>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >>> return to 32-bit user mode? If so, it could be worth trying to wire >>> it up by patching the jz instead of the test instruction. We can actually make patching a little bit more efficient by replacing the test instruction with 'xor %eax,%eax'. That way we won't need any 'nop's. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 19:12 ` [Xen-devel] " Andrew Cooper 2015-11-17 19:16 ` Andy Lutomirski @ 2015-11-17 19:16 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-17 19:16 UTC (permalink / raw) To: Andrew Cooper Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin, Boris Ostrovsky On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 17/11/15 18:49, Andy Lutomirski wrote: >> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: >>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >>>> On 11/16/15 12:22, Borislav Petkov wrote: >>>>> Huh, so what's wrong with a jump: >>>>> >>>>> jmp 1f >>>>> swapgs >>>>> 1: >>>>> >>>> What is the point of that jump? >>>> >>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>>>> That doesn't matter - I just don't want to open the flood gates on >>>>> pseudo feature bits. >>>>> >>>>> hpa, what do you think? >>>> Pseudo feature bits are fine, we already have plenty of them. They make >>>> sense as they let us reuse a lot of infrastructure. >>> >>> >>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) >>> >> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to >> return to 32-bit user mode? If so, it could be worth trying to wire >> it up by patching the jz instead of the test instruction. > > From the guests point of view, there is only hypercall_iret. Doesn't hypercall_iret have flags that ask for different behavior, though? (VG_syscall or whatever for the 64-bit case?) > >> >> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to >> too many people. > > I agree - PV on its own is too generic. > > An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an > unambiguous, although rather longer. Works for me, too, although seeing "xen_pv_host" in the Linux cpu features would be very strange indeed :) --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-17 14:40 ` Boris Ostrovsky 2015-11-17 18:49 ` Andy Lutomirski @ 2015-11-17 18:49 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-17 18:49 UTC (permalink / raw) To: Boris Ostrovsky Cc: linux-kernel, xen-devel, Borislav Petkov, David Vrabel, H. Peter Anvin On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> wrote: > > On 11/16/2015 04:55 PM, H. Peter Anvin wrote: >> >> On 11/16/15 12:22, Borislav Petkov wrote: >>> >>> Huh, so what's wrong with a jump: >>> >>> jmp 1f >>> swapgs >>> 1: >>> >> What is the point of that jump? >> >>>> If it would make you feel better, it could be X86_BUG_XENPV :-p >>> >>> That doesn't matter - I just don't want to open the flood gates on >>> pseudo feature bits. >>> >>> hpa, what do you think? >> >> Pseudo feature bits are fine, we already have plenty of them. They make >> sense as they let us reuse a lot of infrastructure. > > > > So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them) > Looks good to me. Does Xen have any sysexit/sysret32 equivalent to return to 32-bit user mode? If so, it could be worth trying to wire it up by patching the jz instead of the test instruction. Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to too many people. --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 19:03 ` Andy Lutomirski 2015-11-16 19:59 ` Borislav Petkov @ 2015-11-16 19:59 ` Borislav Petkov 2015-11-16 20:31 ` Boris Ostrovsky 2015-11-16 20:31 ` Boris Ostrovsky 3 siblings, 0 replies; 51+ messages in thread From: Borislav Petkov @ 2015-11-16 19:59 UTC (permalink / raw) To: Andy Lutomirski; +Cc: Boris Ostrovsky, David Vrabel, linux-kernel, xen-devel On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote: > ... > The reader surely doesn't remember that this isn't guaranteed to be a > swapgs instruction on native. Using: > > ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV > > would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and > much clearer. We could hide *that* behind a macro and no one would be > confused. (Well, they'd be confused by the fact that Xen PV handles > gsbase very differently from native, but that has nothing to do with > the macro.) > > I think we could convert piecemeal, and I wonder if this new patch for > 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good > starting point. Borislav, what do you think? Would you be okay with > adding a Xen PV pseudofeature? AFAICT, I'd prefer this becomes rather a jump label which gets enabled on xen. Especially if a single pseudofeature might not be enough, apprently... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 19:03 ` Andy Lutomirski 2015-11-16 19:59 ` Borislav Petkov 2015-11-16 19:59 ` Borislav Petkov @ 2015-11-16 20:31 ` Boris Ostrovsky 2015-11-16 20:31 ` Boris Ostrovsky 3 siblings, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-16 20:31 UTC (permalink / raw) To: Andy Lutomirski; +Cc: David Vrabel, Borislav Petkov, linux-kernel, xen-devel On 11/16/2015 02:03 PM, Andy Lutomirski wrote: > It's still a waste of effort, though. Also, I'd eventually like the > number of places in Xen code in which rsp/esp is invalid to be exactly > zero, and this approach makes this harder or even impossible. That's what PVH is going to do. > Does PVH hook into the entry asm code at all? I thought it was just > boot code and drivers. Not the current version --- it starts with xen_start_kernel(). But we are currently changing it and my plan is to have a small stub executed initially (to set bootparams and such) and then jump to startup_{32|64}(). > > In any case, someone needs to do some serious review and cleanup on > the whole paravirt op mess. We have a bunch of paravirt ops that > serve little purpose. > > The paravirt infrastructure is a bit weird, too: it seems to > effectively have four states for each patch site. There's: > > 1. The initial state, which is unoptimized and works on native. > Presumably any of these that happen early also need to work, if > slowly, on Xen. Not on PV (and as of today, on PVH) --- we start directly from xen_start_kernel(). I.e. from step 2. > > 2. The Xen state without text patching. I'm not actually sure why > this exists at all. Are there pvops that need to switch too early for > us to patch the text? I don't think so. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 19:03 ` Andy Lutomirski ` (2 preceding siblings ...) 2015-11-16 20:31 ` Boris Ostrovsky @ 2015-11-16 20:31 ` Boris Ostrovsky 3 siblings, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-16 20:31 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, xen-devel, David Vrabel, Konrad Rzeszutek Wilk, Borislav Petkov On 11/16/2015 02:03 PM, Andy Lutomirski wrote: > It's still a waste of effort, though. Also, I'd eventually like the > number of places in Xen code in which rsp/esp is invalid to be exactly > zero, and this approach makes this harder or even impossible. That's what PVH is going to do. > Does PVH hook into the entry asm code at all? I thought it was just > boot code and drivers. Not the current version --- it starts with xen_start_kernel(). But we are currently changing it and my plan is to have a small stub executed initially (to set bootparams and such) and then jump to startup_{32|64}(). > > In any case, someone needs to do some serious review and cleanup on > the whole paravirt op mess. We have a bunch of paravirt ops that > serve little purpose. > > The paravirt infrastructure is a bit weird, too: it seems to > effectively have four states for each patch site. There's: > > 1. The initial state, which is unoptimized and works on native. > Presumably any of these that happen early also need to work, if > slowly, on Xen. Not on PV (and as of today, on PVH) --- we start directly from xen_start_kernel(). I.e. from step 2. > > 2. The Xen state without text patching. I'm not actually sure why > this exists at all. Are there pvops that need to switch too early for > us to patch the text? I don't think so. -boris ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-16 16:25 ` Boris Ostrovsky 2015-11-16 19:03 ` Andy Lutomirski @ 2015-11-16 19:03 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-16 19:03 UTC (permalink / raw) To: Boris Ostrovsky; +Cc: David Vrabel, Borislav Petkov, linux-kernel, xen-devel On Mon, Nov 16, 2015 at 8:25 AM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 11/15/2015 01:02 PM, Andy Lutomirski wrote: >> >> On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <boris.ostrovsky@oracle.com> >> wrote: >>> >>> >>> >>> On 11/13/2015 06:26 PM, Andy Lutomirski wrote: >>>> >>>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky >>>> <boris.ostrovsky@oracle.com> wrote: >>>>> >>>>> After 32-bit syscall rewrite, and specifically after commit >>>>> 5f310f739b4c >>>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack >>>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e. >>>>> it's not pt_regs). >>>>> >>>>> We need to adjust it so that subsequent xen_iret can use it. >>>> >>>> I'm wondering if this should be more straightforward: >>>> >>>> movq %rsp, %rdi >>>> call do_fast_syscall_32 >>>> testl %eax, %eax >>>> jz .Lsyscall_32_done >>>> >>>> /* Opportunistic SYSRET */ >>>> sysret32_from_system_call: >>>> XEN_DO_SYSRET32 >>>> >>>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a >>>> variant of Xen's iret path that knows that the fast path is okay. >>> >>> >>> >>> This patch is for 32-bit kernel. I actually haven't looked at compat code >>> (probably because our tests don't try that), I need to do that too. >> >> In 4.4, it's almost identical (which was part of the point of this >> whole series). We use sysret32 instead of sysexit, but the underlying >> structure is the same: munge the stack frame and register state >> appropriately to use the fast return instruction in question and then >> execute it. In both cases, the only real difference from the IRET >> path is that we're willing to lose the values of some subset of cx, >> dx, and (on 64-bit kernels) r11. > > > > So it turned out that for compat mode we don't need to do anything since > xen_sysret32 doesn't assume any stack format (or, rather, it assumes that it > can't be used) and builds the IRET frame itself. > It's still a waste of effort, though. Also, I'd eventually like the number of places in Xen code in which rsp/esp is invalid to be exactly zero, and this approach makes this harder or even impossible. > >> >>> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for >>> baremetal otherwise current paravirt op will use native_usergs_sysret32 (for >>> compat code). Which means a new pv_op, I think. >> >> Agreed, unless... >> >> Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could >> be easier to follow and be less code at the same time. Frankly, >> following the control flow from asm through the pre-paravirt-patching >> and post-paravirt-patching variants and into the final targets is >> getting a little bit old, and ALTERNATIVE is crystal clear in >> comparison (and has all the interesting info inline with the rest of >> the asm). Of course, it doesn't work early in boot, but that's fine >> for anything involving user/kernel switches. > > > > We don't currently have a Xen-specific CPU feature. We could, in principle, > add it but we can't replace all of current paravirt patching with a single > feature since PVH guests use a subset of existing pv ops (and in the future > it may become even more fine-grained). > > And I don't think we should go ALTERNATIVE route for one set of features and > keep pv ops for the rest --- it should be either one or the other. Does PVH hook into the entry asm code at all? I thought it was just boot code and drivers. In any case, someone needs to do some serious review and cleanup on the whole paravirt op mess. We have a bunch of paravirt ops that serve little purpose. The paravirt infrastructure is a bit weird, too: it seems to effectively have four states for each patch site. There's: 1. The initial state, which is unoptimized and works on native. Presumably any of these that happen early also need to work, if slowly, on Xen. 2. The Xen state without text patching. I'm not actually sure why this exists at all. Are there pvops that need to switch too early for us to patch the text? 3. The native patched state. This is supposedly optimal, but it results in a few more NOPs than are really needed. 4. The Xen patched state. Alternatives have only two states, and the code is much easier to understand. Also, alternatives avoid things like: ... SWAPGS ... The reader surely doesn't remember that this isn't guaranteed to be a swapgs instruction on native. Using: ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and much clearer. We could hide *that* behind a macro and no one would be confused. (Well, they'd be confused by the fact that Xen PV handles gsbase very differently from native, but that has nothing to do with the macro.) I think we could convert piecemeal, and I wonder if this new patch for 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good starting point. Borislav, what do you think? Would you be okay with adding a Xen PV pseudofeature? --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit 2015-11-13 23:18 [PATCH] xen/x86: Adjust stack pointer in xen_sysexit Boris Ostrovsky 2015-11-13 23:26 ` Andy Lutomirski @ 2015-11-13 23:26 ` Andy Lutomirski 1 sibling, 0 replies; 51+ messages in thread From: Andy Lutomirski @ 2015-11-13 23:26 UTC (permalink / raw) To: Boris Ostrovsky; +Cc: linux-kernel, xen-devel, David Vrabel, Andrew Lutomirski On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c > ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack > frame that is passed to xen_sysexit is no longer a "standard" one (i.e. > it's not pt_regs). > > We need to adjust it so that subsequent xen_iret can use it. I'm wondering if this should be more straightforward: movq %rsp, %rdi call do_fast_syscall_32 testl %eax, %eax jz .Lsyscall_32_done /* Opportunistic SYSRET */ sysret32_from_system_call: XEN_DO_SYSRET32 where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a variant of Xen's iret path that knows that the fast path is okay. --Andy ^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH] xen/x86: Adjust stack pointer in xen_sysexit @ 2015-11-13 23:18 Boris Ostrovsky 0 siblings, 0 replies; 51+ messages in thread From: Boris Ostrovsky @ 2015-11-13 23:18 UTC (permalink / raw) To: konrad.wilk, david.vrabel; +Cc: boris.ostrovsky, linux-kernel, luto, xen-devel After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack frame that is passed to xen_sysexit is no longer a "standard" one (i.e. it's not pt_regs). We need to adjust it so that subsequent xen_iret can use it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- Alternatively, we could return 0 from do_fast_syscall_32() if paravirt_enabled() is true since Xen PV guests will end up using xen_iret one way or the other. And then we won't need xen_sysexit at all. arch/x86/xen/xen-asm_32.S | 23 ++++++++++++++++------- 1 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S index fd92a64..c70ec37 100644 --- a/arch/x86/xen/xen-asm_32.S +++ b/arch/x86/xen/xen-asm_32.S @@ -36,15 +36,24 @@ check_events: /* * We can't use sysexit directly, because we're not running in ring0. - * But we can easily fake it up using iret. Assuming xen_sysexit is - * jumped to with a standard stack frame, we can just strip it back to - * a standard iret frame and use iret. + * But we can easily fake it up using iret. + * We came here from the opportunistic SYSEXIT path in entry_SYSENTER_32 + * which left the stack looking like this: + * $__USER_DS + * %ecx + * eflags + * $__USER_CS + * %eip + * %eax + * %gs + * %fs + * %es + * %ds <-- %esp + * + * so we need to adjust it to look like a standard iret frame */ ENTRY(xen_sysexit) - movl PT_EAX(%esp), %eax /* Shouldn't be necessary? */ - orl $X86_EFLAGS_IF, PT_EFLAGS(%esp) - lea PT_EIP(%esp), %esp - + add $5*4, %esp jmp xen_iret ENDPROC(xen_sysexit) -- 1.7.1 ^ permalink raw reply related [flat|nested] 51+ messages in thread
end of thread, other threads:[~2015-11-17 19:38 UTC | newest] Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-11-13 23:18 [PATCH] xen/x86: Adjust stack pointer in xen_sysexit Boris Ostrovsky 2015-11-13 23:26 ` Andy Lutomirski 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-14 1:23 ` Boris Ostrovsky 2015-11-15 18:02 ` Andy Lutomirski 2015-11-15 18:02 ` Andy Lutomirski 2015-11-16 16:25 ` Boris Ostrovsky 2015-11-16 16:25 ` Boris Ostrovsky 2015-11-16 19:03 ` Andy Lutomirski 2015-11-16 19:59 ` Borislav Petkov 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 20:11 ` Andy Lutomirski 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:22 ` Borislav Petkov 2015-11-16 20:48 ` Boris Ostrovsky 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 21:00 ` Borislav Petkov 2015-11-16 21:00 ` Borislav Petkov 2015-11-16 21:03 ` Konrad Rzeszutek Wilk 2015-11-16 21:04 ` Andy Lutomirski 2015-11-17 10:53 ` Joao Martins 2015-11-17 10:53 ` Joao Martins 2015-11-16 21:04 ` Andy Lutomirski 2015-11-16 20:50 ` Andy Lutomirski 2015-11-16 20:48 ` Boris Ostrovsky 2015-11-16 21:55 ` H. Peter Anvin 2015-11-16 21:55 ` H. Peter Anvin 2015-11-17 14:40 ` Boris Ostrovsky 2015-11-17 14:40 ` Boris Ostrovsky 2015-11-17 18:49 ` Andy Lutomirski 2015-11-17 19:12 ` Andrew Cooper 2015-11-17 19:12 ` [Xen-devel] " Andrew Cooper 2015-11-17 19:16 ` Andy Lutomirski 2015-11-17 19:21 ` Borislav Petkov 2015-11-17 19:21 ` [Xen-devel] " Borislav Petkov 2015-11-17 19:29 ` Andrew Cooper 2015-11-17 19:36 ` Andy Lutomirski 2015-11-17 19:36 ` Andy Lutomirski 2015-11-17 19:29 ` Andrew Cooper 2015-11-17 19:37 ` [Xen-devel] " Boris Ostrovsky 2015-11-17 19:38 ` Boris Ostrovsky 2015-11-17 19:38 ` [Xen-devel] " Boris Ostrovsky 2015-11-17 19:37 ` Boris Ostrovsky 2015-11-17 19:16 ` Andy Lutomirski 2015-11-17 18:49 ` Andy Lutomirski 2015-11-16 19:59 ` Borislav Petkov 2015-11-16 20:31 ` Boris Ostrovsky 2015-11-16 20:31 ` Boris Ostrovsky 2015-11-16 19:03 ` Andy Lutomirski 2015-11-13 23:26 ` Andy Lutomirski 2015-11-13 23:18 Boris Ostrovsky
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.