xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/4] x86: accommodate 32-bit PV guests with SMAP/SMEP handling
@ 2016-03-04 11:08 Jan Beulich
  2016-03-04 11:27 ` [PATCH 1/4] x86/alternatives: correct near branch check Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-04 11:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

As was previously explained[1], SMAP (and, with less relevance, also
SMEP) is not compatible with 32-bit PV guests which aren't aware of /
prepared to be run with that feature enabled. Andrew's original
approach either sacrificed architectural correctness to make 32-bit
guests work again, or also disabled SMAP for not insignificant
portions of hypervisor code, in both cases with the workaround mode
controllable via a command line option.

This alternative approach disables SMAP and SMEP only while running
32-bit PV guest code, plus for the few hypervisor instructions executed
right after entering hypervisor context or right before leaving it
again.
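
Purely as an illustration of the idea (a standalone sketch, not the
actual change - the real work is in the assembly entry/exit paths
touched by patch 2, and CR4 is modelled by a plain variable here):

#include <stdio.h>

#define X86_CR4_SMEP (1UL << 20)
#define X86_CR4_SMAP (1UL << 21)

/* Stand-in for the real CR4 register. */
static unsigned long fake_cr4 = X86_CR4_SMEP | X86_CR4_SMAP;

/* Done just before IRET-ing back to 32-bit PV guest context. */
static void exit_to_pv32_guest(void)
{
    fake_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
}

/* Done early on the hypervisor entry paths coming from such a guest. */
static void enter_from_pv32_guest(void)
{
    fake_cr4 |= X86_CR4_SMEP | X86_CR4_SMAP;
}

int main(void)
{
    exit_to_pv32_guest();
    printf("32-bit PV guest runs with CR4=%#lx\n", fake_cr4);
    enter_from_pv32_guest();
    printf("Xen itself runs with CR4=%#lx\n", fake_cr4);
    return 0;
}

The real series additionally has to cope with hardware lacking one or
both of the features and with NMI/#MC nesting, which is what the
alternatives patching and the extra checks in patch 2 are about.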

The 4th patch is really unrelated, except that it doesn't apply
cleanly without the earlier ones, and that the opportunity for it was
noticed while putting together the 2nd one.

1: alternatives: correct near branch check
2: suppress SMAP and SMEP while running 32-bit PV guest code
3: use optimal NOPs to fill the SMAP/SMEP placeholders
4: use 32-bit loads for 32-bit PV guest state reload

Signed-off-by: Jan Beulich <jbeulich@suse.com>

[1] http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg03888.html



* [PATCH 1/4] x86/alternatives: correct near branch check
  2016-03-04 11:08 [PATCH 0/4] x86: accommodate 32-bit PV guests with SMAP/SMEP handling Jan Beulich
@ 2016-03-04 11:27 ` Jan Beulich
  2016-03-07 15:43   ` Andrew Cooper
  2016-03-04 11:27 ` [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-04 11:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu


Make sure the near JMP/CALL check doesn't consume uninitialized
data, not even in a benign way. And relax the length check at once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -174,7 +174,7 @@ static void __init apply_alternatives(st
         memcpy(insnbuf, replacement, a->replacementlen);
 
         /* 0xe8/0xe9 are relative branches; fix the offset. */
-        if ( (*insnbuf & 0xfe) == 0xe8 && a->replacementlen == 5 )
+        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
             *(s32 *)(insnbuf + 1) += replacement - instr;
 
         add_nops(insnbuf + a->replacementlen,
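
As an aside, a small standalone illustration (hypothetical names, not
the Xen code itself) of why the order matters: with the length check
first, insnbuf[0] is never examined for a zero-length replacement, so
no uninitialized byte gets read, not even benignly.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mimics the relevant fragment of apply_alternatives(); sketch only. */
static void fixup_branch(uint8_t *insnbuf, unsigned int replacementlen,
                         const uint8_t *replacement, const uint8_t *instr)
{
    /* 0xe8/0xe9 are near CALL/JMP with a rel32 operand. */
    if ( replacementlen >= 5 && (insnbuf[0] & 0xfe) == 0xe8 )
    {
        int32_t rel;

        memcpy(&rel, insnbuf + 1, sizeof(rel));
        rel += (int32_t)(replacement - instr);
        memcpy(insnbuf + 1, &rel, sizeof(rel));
    }
}

int main(void)
{
    static uint8_t text[64];                /* pretend .text section */
    const uint8_t *instr = text;            /* original (patch) site */
    const uint8_t *replacement = text + 32; /* replacement code */
    uint8_t insnbuf[16] = { 0xe8, 0x10, 0x00, 0x00, 0x00 }; /* call +0x10 */
    int32_t rel;

    fixup_branch(insnbuf, 5, replacement, instr);
    memcpy(&rel, insnbuf + 1, sizeof(rel));
    printf("adjusted rel32: %d\n", (int)rel); /* 16 + 32 = 48 */
    return 0;
}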





* [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-04 11:08 [PATCH 0/4] x86: accommodate 32-bit PV guests with SMAP/SMEP handling Jan Beulich
  2016-03-04 11:27 ` [PATCH 1/4] x86/alternatives: correct near branch check Jan Beulich
@ 2016-03-04 11:27 ` Jan Beulich
  2016-03-07 16:59   ` Andrew Cooper
  2016-03-09  8:09   ` Wu, Feng
  2016-03-04 11:28 ` [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-04 11:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu


Since such guests' kernel code runs in ring 1, their memory accesses,
at the paging layer, are supervisor mode ones, and hence subject to
SMAP/SMEP checks. Such guests cannot be expected to be aware of those
two features though (and so far we also don't expose the respective
feature flags), and hence may suffer page faults they cannot deal with.

While the placement of the re-enabling slightly weakens the intended
protection, it was selected such that 64-bit paths would remain
unaffected where possible. At the expense of a further performance hit
the re-enabling could be put right next to the CLACs.

Note that this introduces a number of extra TLB flushes - CR4.SMEP
transitioning from 0 to 1 always causes a flush, and it transitioning
from 1 to 0 may also do.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
 static bool_t __initdata opt_smap = 1;
 boolean_param("smap", opt_smap);
 
+unsigned long __read_mostly cr4_smep_smap_mask;
+
 /* Boot dom0 in pvh mode */
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
@@ -1335,6 +1337,8 @@ void __init noreturn __start_xen(unsigne
     if ( cpu_has_smap )
         set_in_cr4(X86_CR4_SMAP);
 
+    cr4_smep_smap_mask = mmu_cr4_features & (X86_CR4_SMEP | X86_CR4_SMAP);
+
     if ( cpu_has_fsgsbase )
         set_in_cr4(X86_CR4_FSGSBASE);
 
@@ -1471,7 +1475,10 @@ void __init noreturn __start_xen(unsigne
      * copy_from_user().
      */
     if ( cpu_has_smap )
+    {
+        cr4_smep_smap_mask &= ~X86_CR4_SMAP;
         write_cr4(read_cr4() & ~X86_CR4_SMAP);
+    }
 
     printk("%sNX (Execute Disable) protection %sactive\n",
            cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
@@ -1488,7 +1495,10 @@ void __init noreturn __start_xen(unsigne
         panic("Could not set up DOM0 guest OS");
 
     if ( cpu_has_smap )
+    {
         write_cr4(read_cr4() | X86_CR4_SMAP);
+        cr4_smep_smap_mask |= X86_CR4_SMAP;
+    }
 
     /* Scrub RAM that is still free and so may go to an unprivileged domain. */
     scrub_heap_pages();
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -16,14 +16,16 @@ ENTRY(compat_hypercall)
         ASM_CLAC
         pushq $0
         SAVE_VOLATILE type=TRAP_syscall compat=1
+        SMEP_SMAP_RESTORE
 
         cmpb  $0,untrusted_msi(%rip)
 UNLIKELY_START(ne, msi_check)
         movl  $HYPERCALL_VECTOR,%edi
         call  check_for_unexpected_msi
-        LOAD_C_CLOBBERED
+        LOAD_C_CLOBBERED compat=1 ax=0
 UNLIKELY_END(msi_check)
 
+        movl  UREGS_rax(%rsp),%eax
         GET_CURRENT(%rbx)
 
         cmpl  $NR_hypercalls,%eax
@@ -33,7 +35,6 @@ UNLIKELY_END(msi_check)
         pushq UREGS_rbx(%rsp); pushq %rcx; pushq %rdx; pushq %rsi; pushq %rdi
         pushq UREGS_rbp+5*8(%rsp)
         leaq  compat_hypercall_args_table(%rip),%r10
-        movl  %eax,%eax
         movl  $6,%ecx
         subb  (%r10,%rax,1),%cl
         movq  %rsp,%rdi
@@ -48,7 +49,6 @@ UNLIKELY_END(msi_check)
 #define SHADOW_BYTES 16 /* Shadow EIP + shadow hypercall # */
 #else
         /* Relocate argument registers and zero-extend to 64 bits. */
-        movl  %eax,%eax              /* Hypercall #  */
         xchgl %ecx,%esi              /* Arg 2, Arg 4 */
         movl  %edx,%edx              /* Arg 3        */
         movl  %edi,%r8d              /* Arg 5        */
@@ -174,10 +174,43 @@ compat_bad_hypercall:
 /* %rbx: struct vcpu, interrupts disabled */
 ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
+.Lcr4_orig:
+        ASM_NOP3 /* mov   %cr4, %rax */
+        ASM_NOP6 /* and   $..., %rax */
+        ASM_NOP3 /* mov   %rax, %cr4 */
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_alt:
+        mov   %cr4, %rax
+        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
+        mov   %rax, %cr4
+.Lcr4_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        .popsection
         RESTORE_ALL adj=8 compat=1
 .Lft0:  iretq
         _ASM_PRE_EXTABLE(.Lft0, handle_exception)
 
+/* This mustn't modify registers other than %rax. */
+ENTRY(cr4_smep_smap_restore)
+        mov   %cr4, %rax
+        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
+        jnz   0f
+        or    cr4_smep_smap_mask(%rip), %rax
+        mov   %rax, %cr4
+        ret
+0:
+        and   cr4_smep_smap_mask(%rip), %eax
+        cmp   cr4_smep_smap_mask(%rip), %eax
+        je    1f
+        BUG
+1:
+        xor   %eax, %eax
+        ret
+
 /* %rdx: trap_bounce, %rbx: struct vcpu */
 ENTRY(compat_post_handle_exception)
         testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
@@ -190,6 +223,7 @@ ENTRY(compat_post_handle_exception)
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
         sti
+        SMEP_SMAP_RESTORE
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
         pushq %r11
@@ -225,6 +259,7 @@ UNLIKELY_END(compat_syscall_gpf)
         jmp   .Lcompat_bounce_exception
 
 ENTRY(compat_sysenter)
+        SMEP_SMAP_RESTORE
         movq  VCPU_trap_ctxt(%rbx),%rcx
         cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
         movzwl VCPU_sysenter_sel(%rbx),%eax
@@ -238,6 +273,7 @@ ENTRY(compat_sysenter)
         jmp   compat_test_all_events
 
 ENTRY(compat_int80_direct_trap)
+        SMEP_SMAP_RESTORE
         call  compat_create_bounce_frame
         jmp   compat_test_all_events
 
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
 
 ENTRY(common_interrupt)
         SAVE_ALL CLAC
+        SMEP_SMAP_RESTORE
         movq %rsp,%rdi
         callq do_IRQ
         jmp ret_from_intr
@@ -454,13 +455,64 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 handle_exception_saved:
+        GET_CURRENT(%rbx)
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
-        sti
+
+.Lsmep_smap_orig:
+        jmp   0f
+        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
+        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
+        .else
+        // worst case: rex + opcode + modrm + 4-byte displacement
+        .skip (1 + 1 + 1 + 4) - 2, 0xcc
+        .endif
+        .pushsection .altinstr_replacement, "ax"
+.Lsmep_smap_alt:
+        mov   VCPU_domain(%rbx),%rax
+.Lsmep_smap_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
+                             X86_FEATURE_SMEP, \
+                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
+                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
+        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
+                             X86_FEATURE_SMAP, \
+                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
+                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
+        .popsection
+
+        testb $3,UREGS_cs(%rsp)
+        jz    0f
+        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
+        je    0f
+        call  cr4_smep_smap_restore
+        /*
+         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
+         * compat_restore_all_guest and it actually returning to guest
+         * context, in which case the guest would run with the two features
+         * enabled. The only bad that can happen from this is a kernel mode
+         * #PF which the guest doesn't expect. Rather than trying to make the
+         * NMI/#MC exit path honor the intended CR4 setting, simply check
+         * whether the wrong CR4 was in use when the #PF occurred, and exit
+         * back to the guest (which will in turn clear the two CR4 bits) to
+         * re-execute the instruction. If we get back here, the CR4 bits
+         * should then be found clear (unless another NMI/#MC occurred at
+         * exactly the right time), and we'll continue processing the
+         * exception as normal.
+         */
+        test  %rax,%rax
+        jnz   0f
+        mov   $PFEC_page_present,%al
+        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
+        jne   0f
+        xor   UREGS_error_code(%rsp),%eax
+        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
+        jz    compat_test_all_events
+0:      sti
 1:      movq  %rsp,%rdi
         movzbl UREGS_entry_vector(%rsp),%eax
         leaq  exception_table(%rip),%rdx
-        GET_CURRENT(%rbx)
         PERFC_INCR(exceptions, %rax, %rbx)
         callq *(%rdx,%rax,8)
         testb $3,UREGS_cs(%rsp)
@@ -592,6 +644,7 @@ handle_ist_exception:
         SAVE_ALL CLAC
         testb $3,UREGS_cs(%rsp)
         jz    1f
+        SMEP_SMAP_RESTORE
         /* Interrupted guest context. Copy the context to stack bottom. */
         GET_CPUINFO_FIELD(guest_cpu_user_regs,%rdi)
         movq  %rsp,%rsi
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -209,6 +209,16 @@ void ret_from_intr(void);
 
 #define ASM_STAC ASM_AC(STAC)
 #define ASM_CLAC ASM_AC(CLAC)
+
+#define SMEP_SMAP_RESTORE                                              \
+        667: ASM_NOP5;                                                 \
+        .pushsection .altinstr_replacement, "ax";                      \
+        668: call cr4_smep_smap_restore;                               \
+        .section .altinstructions, "a";                                \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;       \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;       \
+        .popsection
+
 #else
 static always_inline void clac(void)
 {
@@ -308,14 +318,18 @@ static always_inline void stac(void)
  *
  * For the way it is used in RESTORE_ALL, this macro must preserve EFLAGS.ZF.
  */
-.macro LOAD_C_CLOBBERED compat=0
+.macro LOAD_C_CLOBBERED compat=0 ax=1
 .if !\compat
         movq  UREGS_r11(%rsp),%r11
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.endif
+.if \ax
         movq  UREGS_rax(%rsp),%rax
+.endif
+.elseif \ax
+        movl  UREGS_rax(%rsp),%eax
+.endif
         movq  UREGS_rcx(%rsp),%rcx
         movq  UREGS_rdx(%rsp),%rdx
         movq  UREGS_rsi(%rsp),%rsi
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -134,12 +134,12 @@
 #define TF_kernel_mode         (1<<_TF_kernel_mode)
 
 /* #PF error code values. */
-#define PFEC_page_present   (1U<<0)
-#define PFEC_write_access   (1U<<1)
-#define PFEC_user_mode      (1U<<2)
-#define PFEC_reserved_bit   (1U<<3)
-#define PFEC_insn_fetch     (1U<<4)
-#define PFEC_prot_key       (1U<<5)
+#define PFEC_page_present   (_AC(1,U) << 0)
+#define PFEC_write_access   (_AC(1,U) << 1)
+#define PFEC_user_mode      (_AC(1,U) << 2)
+#define PFEC_reserved_bit   (_AC(1,U) << 3)
+#define PFEC_insn_fetch     (_AC(1,U) << 4)
+#define PFEC_prot_key       (_AC(1,U) << 5)
 /* Internally used only flags. */
 #define PFEC_page_paged     (1U<<16)
 #define PFEC_page_shared    (1U<<17)
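
For reference, roughly what the new cr4_smep_smap_restore helper does,
rendered as a standalone C model (illustrative only - CR4 and the mask
are plain variables here, and the BUG becomes a message):

#include <stdio.h>

#define X86_CR4_PAE  (1UL << 5)
#define X86_CR4_SMEP (1UL << 20)
#define X86_CR4_SMAP (1UL << 21)

static unsigned long fake_cr4;           /* models the CR4 register */
static unsigned long cr4_smep_smap_mask; /* bits Xen wants set */

/*
 * Returns the old CR4 value if the bits had to be turned back on (the
 * normal case when entering from 32-bit PV guest context), or 0 if
 * they were already in place.  A partially set mask would indicate an
 * inconsistency - the BUG in the assembly version.
 */
static unsigned long cr4_smep_smap_restore_model(void)
{
    unsigned long cr4 = fake_cr4;

    if ( !(cr4 & (X86_CR4_SMEP | X86_CR4_SMAP)) )
    {
        fake_cr4 = cr4 | cr4_smep_smap_mask;
        return cr4;
    }

    if ( (cr4 & cr4_smep_smap_mask) != cr4_smep_smap_mask )
        fprintf(stderr, "BUG: inconsistent CR4 %#lx\n", cr4);

    return 0;
}

int main(void)
{
    cr4_smep_smap_mask = X86_CR4_SMEP | X86_CR4_SMAP;
    fake_cr4 = X86_CR4_PAE;   /* SMEP/SMAP clear, as left for the guest */
    printf("first call returns  %#lx\n", cr4_smep_smap_restore_model());
    printf("second call returns %#lx\n", cr4_smep_smap_restore_model());
    return 0;
}

The nonzero-vs-zero return value is what the "test %rax,%rax" in
handle_exception keys off of.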




* [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders
  2016-03-04 11:08 [PATCH 0/4] x86: accommodate 32-bit PV guests with SMAP/SMEP handling Jan Beulich
  2016-03-04 11:27 ` [PATCH 1/4] x86/alternatives: correct near branch check Jan Beulich
  2016-03-04 11:27 ` [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code Jan Beulich
@ 2016-03-04 11:28 ` Jan Beulich
  2016-03-07 17:43   ` Andrew Cooper
  2016-03-04 11:29 ` [PATCH 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
  2016-03-10  9:44 ` [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
  4 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-04 11:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu


Alternatives patching code picks the most suitable NOPs for the
running system, so simply use it to replace the pre-populated ones.

Use an arbitrary, always available feature to key off from.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -185,6 +185,7 @@ ENTRY(compat_restore_all_guest)
         mov   %rax, %cr4
 .Lcr4_alt_end:
         .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_orig, X86_FEATURE_LM, 12, 0
         altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
                              (.Lcr4_alt_end - .Lcr4_alt)
         altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -204,6 +204,7 @@ void ret_from_intr(void);
         662: __ASM_##op;                                               \
         .popsection;                                                   \
         .pushsection .altinstructions, "a";                            \
+        altinstruction_entry 661b, 661b, X86_FEATURE_LM, 3, 0;         \
         altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3;       \
         .popsection
 
@@ -215,6 +216,7 @@ void ret_from_intr(void);
         .pushsection .altinstr_replacement, "ax";                      \
         668: call cr4_smep_smap_restore;                               \
         .section .altinstructions, "a";                                \
+        altinstruction_entry 667b, 667b, X86_FEATURE_LM, 5, 0;         \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;       \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;       \
         .popsection
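
As a standalone sketch of the fill behaviour being relied on here (not
the Xen implementation - the hypervisor picks a NOP table suited to the
running CPU; the encodings below are just the common long-NOP forms):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const uint8_t nop1[] = { 0x90 };
static const uint8_t nop2[] = { 0x66, 0x90 };
static const uint8_t nop3[] = { 0x0f, 0x1f, 0x00 };
static const uint8_t nop4[] = { 0x0f, 0x1f, 0x40, 0x00 };
static const uint8_t nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const uint8_t *const nops[] = { NULL, nop1, nop2, nop3, nop4, nop5 };

/* Fill a patch site with the "best" NOPs from the table above. */
static void add_nops(uint8_t *buf, unsigned int len)
{
    while ( len )
    {
        unsigned int n = len > 5 ? 5 : len; /* longest entry carried here */

        memcpy(buf, nops[n], n);
        buf += n;
        len -= n;
    }
}

int main(void)
{
    uint8_t site[12]; /* e.g. the 12-byte CR4 placeholder in patch 2 */
    unsigned int i;

    add_nops(site, sizeof(site));
    for ( i = 0; i < sizeof(site); i++ )
        printf("%02x ", site[i]);
    printf("\n");
    return 0;
}

With a zero-length replacement keyed off an always-set feature, the
patching pass simply rewrites the placeholder with such NOPs.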





* [PATCH 4/4] x86: use 32-bit loads for 32-bit PV guest state reload
  2016-03-04 11:08 [PATCH 0/4] x86: accommodate 32-bit PV guests with SMAP/SMEP handling Jan Beulich
                   ` (2 preceding siblings ...)
  2016-03-04 11:28 ` [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders Jan Beulich
@ 2016-03-04 11:29 ` Jan Beulich
  2016-03-07 17:45   ` Andrew Cooper
  2016-03-10  9:44 ` [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
  4 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-04 11:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser


This is slightly more efficient than loading 64-bit quantities.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -313,6 +313,13 @@ static always_inline void stac(void)
 987:
 .endm
 
+#define LOAD_ONE_REG(reg, compat) \
+.if !(compat); \
+        movq  UREGS_r##reg(%rsp),%r##reg; \
+.else; \
+        movl  UREGS_r##reg(%rsp),%e##reg; \
+.endif
+
 /*
  * Reload registers not preserved by C code from frame.
  *
@@ -326,16 +333,14 @@ static always_inline void stac(void)
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.if \ax
-        movq  UREGS_rax(%rsp),%rax
 .endif
-.elseif \ax
-        movl  UREGS_rax(%rsp),%eax
+.if \ax
+        LOAD_ONE_REG(ax, \compat)
 .endif
-        movq  UREGS_rcx(%rsp),%rcx
-        movq  UREGS_rdx(%rsp),%rdx
-        movq  UREGS_rsi(%rsp),%rsi
-        movq  UREGS_rdi(%rsp),%rdi
+        LOAD_ONE_REG(cx, \compat)
+        LOAD_ONE_REG(dx, \compat)
+        LOAD_ONE_REG(si, \compat)
+        LOAD_ONE_REG(di, \compat)
 .endm
 
 /*
@@ -372,8 +377,9 @@ static always_inline void stac(void)
         .subsection 0
 #endif
 .endif
-987:    movq  UREGS_rbp(%rsp),%rbp
-        movq  UREGS_rbx(%rsp),%rbx
+987:
+        LOAD_ONE_REG(bp, \compat)
+        LOAD_ONE_REG(bx, \compat)
         subq  $-(UREGS_error_code-UREGS_r15+\adj), %rsp
 .endm
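
The movl forms suffice because, in 64-bit mode, writing a 32-bit
register zero-extends into the full 64-bit register.  A quick
standalone check (GCC/Clang extended asm, x86-64 only; purely
illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t r = 0xffffffffffffffffULL;
    uint32_t v = 0x12345678;

    /* A 32-bit mov writes the low half and clears the upper 32 bits. */
    asm volatile ( "movl %1, %k0" : "+r" (r) : "r" (v) );
    printf("%#llx\n", (unsigned long long)r); /* prints 0x12345678 */
    return 0;
}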
 





* Re: [PATCH 1/4] x86/alternatives: correct near branch check
  2016-03-04 11:27 ` [PATCH 1/4] x86/alternatives: correct near branch check Jan Beulich
@ 2016-03-07 15:43   ` Andrew Cooper
  2016-03-07 15:56     ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-03-07 15:43 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Feng Wu

On 04/03/16 11:27, Jan Beulich wrote:
> Make sure the near JMP/CALL check doesn't consume uninitialized
> data, not even in a benign way. And relax the length check at once.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/alternative.c
> +++ b/xen/arch/x86/alternative.c
> @@ -174,7 +174,7 @@ static void __init apply_alternatives(st
>          memcpy(insnbuf, replacement, a->replacementlen);
>  
>          /* 0xe8/0xe9 are relative branches; fix the offset. */
> -        if ( (*insnbuf & 0xfe) == 0xe8 && a->replacementlen == 5 )
> +        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
>              *(s32 *)(insnbuf + 1) += replacement - instr;
>  
>          add_nops(insnbuf + a->replacementlen,
>
>
>

Swapping the order is definitely a good thing.

However, relaxing the length check seems less so.  `E8 rel32` or `E9
rel32` encodings are strictly 5 bytes long.

There are complications with the `67 E{8,9} rel16` encodings, but those
are not catered for anyway, and the manual warns about undefined
behaviour if used in long mode.

What is your use case for relaxing the check?  IMO, if it isn't exactly 5
bytes long, there is some corruption somewhere and the relocation
shouldn't happen.

~Andrew


* Re: [PATCH 1/4] x86/alternatives: correct near branch check
  2016-03-07 15:43   ` Andrew Cooper
@ 2016-03-07 15:56     ` Jan Beulich
  2016-03-07 16:11       ` Andrew Cooper
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-07 15:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser, Feng Wu

>>> On 07.03.16 at 16:43, <andrew.cooper3@citrix.com> wrote:
> On 04/03/16 11:27, Jan Beulich wrote:
>> Make sure the near JMP/CALL check doesn't consume uninitialized
>> data, not even in a benign way. And relax the length check at once.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/alternative.c
>> +++ b/xen/arch/x86/alternative.c
>> @@ -174,7 +174,7 @@ static void __init apply_alternatives(st
>>          memcpy(insnbuf, replacement, a->replacementlen);
>>  
>>          /* 0xe8/0xe9 are relative branches; fix the offset. */
>> -        if ( (*insnbuf & 0xfe) == 0xe8 && a->replacementlen == 5 )
>> +        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
>>              *(s32 *)(insnbuf + 1) += replacement - instr;
>>  
>>          add_nops(insnbuf + a->replacementlen,
>>
>>
>>
> 
> Swapping the order is definitely a good thing.
> 
> However, relaxing the length check seems less so.  `E8 rel32` or `E9
> rel32` encodings are strictly 5 bytes long.
> 
> There are complications with the `67 E{8,9} rel16` encodings, but those
> are not catered for anyway, and the manual warns about undefined
> behaviour if used in long mode.
> 
> What is your use case for relaxing the check?  IMO, if it isn't exactly 5
> bytes long, there is some corruption somewhere and the relocation
> shouldn't happen.

The relaxation is solely because at least CALL could validly
be followed by further instructions.

Jan



* Re: [PATCH 1/4] x86/alternatives: correct near branch check
  2016-03-07 15:56     ` Jan Beulich
@ 2016-03-07 16:11       ` Andrew Cooper
  2016-03-07 16:21         ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-03-07 16:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu

On 07/03/16 15:56, Jan Beulich wrote:
>>>> On 07.03.16 at 16:43, <andrew.cooper3@citrix.com> wrote:
>> On 04/03/16 11:27, Jan Beulich wrote:
>>> Make sure the near JMP/CALL check doesn't consume uninitialized
>>> data, not even in a benign way. And relax the length check at once.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/alternative.c
>>> +++ b/xen/arch/x86/alternative.c
>>> @@ -174,7 +174,7 @@ static void __init apply_alternatives(st
>>>          memcpy(insnbuf, replacement, a->replacementlen);
>>>  
>>>          /* 0xe8/0xe9 are relative branches; fix the offset. */
>>> -        if ( (*insnbuf & 0xfe) == 0xe8 && a->replacementlen == 5 )
>>> +        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
>>>              *(s32 *)(insnbuf + 1) += replacement - instr;
>>>  
>>>          add_nops(insnbuf + a->replacementlen,
>>>
>>>
>>>
>> Swapping the order is definitely a good thing.
>>
>> However, relaxing the length check seems less so.  `E8 rel32` or `E9
>> rel32` encodings are strictly 5 bytes long.
>>
>> There are complications with the `67 E{8,9} rel16` encodings, but those
>> are not catered for anyway, and the manual warns about undefined
>> behaviour if used in long mode.
>>
>> What is your use case for relaxing the check?  IMO, if it isn't exactly 5
>> bytes long, there is some corruption somewhere and the relocation
>> shouldn't happen.
> The relaxation is solely because at least CALL could validly
> be followed by further instructions.

But without scanning the entire replacement buffer, there might be other
relocations needing to happen.

That would require decoding the instructions, which is an extreme faff. 
It would be better to leave it currently as-is to effectively disallow
mixing a jmp/call replacement with other code, to avoid the subtle
failure of a second relocation not taking effect.

~Andrew


* Re: [PATCH 1/4] x86/alternatives: correct near branch check
  2016-03-07 16:11       ` Andrew Cooper
@ 2016-03-07 16:21         ` Jan Beulich
  2016-03-08 17:33           ` Andrew Cooper
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-07 16:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser, Feng Wu

>>> On 07.03.16 at 17:11, <andrew.cooper3@citrix.com> wrote:
> On 07/03/16 15:56, Jan Beulich wrote:
>>>>> On 07.03.16 at 16:43, <andrew.cooper3@citrix.com> wrote:
>>> On 04/03/16 11:27, Jan Beulich wrote:
>>>> Make sure the near JMP/CALL check doesn't consume uninitialized
>>>> data, not even in a benign way. And relax the length check at once.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/xen/arch/x86/alternative.c
>>>> +++ b/xen/arch/x86/alternative.c
>>>> @@ -174,7 +174,7 @@ static void __init apply_alternatives(st
>>>>          memcpy(insnbuf, replacement, a->replacementlen);
>>>>  
>>>>          /* 0xe8/0xe9 are relative branches; fix the offset. */
>>>> -        if ( (*insnbuf & 0xfe) == 0xe8 && a->replacementlen == 5 )
>>>> +        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
>>>>              *(s32 *)(insnbuf + 1) += replacement - instr;
>>>>  
>>>>          add_nops(insnbuf + a->replacementlen,
>>>>
>>>>
>>>>
>>> Swapping the order is definitely a good thing.
>>>
>>> However, relaxing the length check seems less so.  `E8 rel32` or `E9
>>> rel32` encodings are strictly 5 bytes long.
>>>
>>> There are complications with the `67 E{8,9} rel16` encodings, but those
>>> are not catered for anyway, and the manual warns about undefined
>>> behaviour if used in long mode.
>>>
>>> What is your use case for relaxing the check?  IMO, if it isn't exactly 5
>>> bytes long, there is some corruption somewhere and the relocation
>>> shouldn't happen.
>> The relaxation is solely because at least CALL could validly
>> be followed by further instructions.
> 
> But without scanning the entire replacement buffer, there might be other
> relocations needing to happen.
> 
> That would require decoding the instructions, which is an extreme faff. 
> It would be better to leave it currently as-is to effectively disallow
> mixing a jmp/call replacement with other code, to avoid the subtle
> failure of a second relocation not taking effect

Well, such a missing further fixup would be noticed immediately by
someone trying it (unless the patched code path never gets executed),
whereas a simple adjustment to register state would seem quite
reasonable to follow a call. While right now the subsequent
patches don't depend on this being >= or ==, I think it was wrong
to be == from the beginning.

Plus - there are endless other possibilities of instructions needing
fixups (most notably such with RIP-relative memory operands),
none of which are even remotely reasonable to deal with here.
I.e. even in the absence of a CALL/JMP the same issue would
exist anyway, which is why I'm not overly concerned about those.
All we want is a specific special case to be treated correctly.

Jan



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-04 11:27 ` [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code Jan Beulich
@ 2016-03-07 16:59   ` Andrew Cooper
  2016-03-08  7:57     ` Jan Beulich
  2016-03-09  8:09   ` Wu, Feng
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-03-07 16:59 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Feng Wu, Keir Fraser

On 04/03/16 11:27, Jan Beulich wrote:
> Since such guests' kernel code runs in ring 1, their memory accesses,
> at the paging layer, are supervisor mode ones, and hence subject to
> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> two features though (and so far we also don't expose the respective
> feature flags), and hence may suffer page faults they cannot deal with.
>
> While the placement of the re-enabling slightly weakens the intended
> protection, it was selected such that 64-bit paths would remain
> unaffected where possible. At the expense of a further performance hit
> the re-enabling could be put right next to the CLACs.
>
> Note that this introduces a number of extra TLB flushes - CR4.SMEP
> transitioning from 0 to 1 always causes a flush, and it transitioning
> from 1 to 0 may also do.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
>  static bool_t __initdata opt_smap = 1;
>  boolean_param("smap", opt_smap);
>  
> +unsigned long __read_mostly cr4_smep_smap_mask;

Are we liable to gain any other cr4 features which would want to be
included in this?  Might it be wise to choose a slightly more generic
name such as cr4_pv32_mask?

>  #define SHADOW_BYTES 16 /* Shadow EIP + shadow hypercall # */
>  #else
>          /* Relocate argument registers and zero-extend to 64 bits. */
> -        movl  %eax,%eax              /* Hypercall #  */
>          xchgl %ecx,%esi              /* Arg 2, Arg 4 */
>          movl  %edx,%edx              /* Arg 3        */
>          movl  %edi,%r8d              /* Arg 5        */
> @@ -174,10 +174,43 @@ compat_bad_hypercall:
>  /* %rbx: struct vcpu, interrupts disabled */
>  ENTRY(compat_restore_all_guest)
>          ASSERT_INTERRUPTS_DISABLED
> +.Lcr4_orig:
> +        ASM_NOP3 /* mov   %cr4, %rax */
> +        ASM_NOP6 /* and   $..., %rax */
> +        ASM_NOP3 /* mov   %rax, %cr4 */
> +        .pushsection .altinstr_replacement, "ax"
> +.Lcr4_alt:
> +        mov   %cr4, %rax
> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
> +        mov   %rax, %cr4
> +.Lcr4_alt_end:
> +        .section .altinstructions, "a"
> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
> +                             (.Lcr4_alt_end - .Lcr4_alt)
> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
> +                             (.Lcr4_alt_end - .Lcr4_alt)

These 12's look as if they should be (.Lcr4_alt - .Lcr4_orig).

> +        .popsection
>          RESTORE_ALL adj=8 compat=1
>  .Lft0:  iretq
>          _ASM_PRE_EXTABLE(.Lft0, handle_exception)
>  
> +/* This mustn't modify registers other than %rax. */
> +ENTRY(cr4_smep_smap_restore)
> +        mov   %cr4, %rax
> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
> +        jnz   0f
> +        or    cr4_smep_smap_mask(%rip), %rax
> +        mov   %rax, %cr4
> +        ret
> +0:
> +        and   cr4_smep_smap_mask(%rip), %eax
> +        cmp   cr4_smep_smap_mask(%rip), %eax
> +        je    1f
> +        BUG

What is the purpose of this bugcheck? It looks like it is catching a
mismatch of masked options, but I am not completely sure.

For all other ASM level BUG's, I put a short comment on the same line,
to aid people who hit the bug.

> +1:
> +        xor   %eax, %eax
> +        ret
> +
>  /* %rdx: trap_bounce, %rbx: struct vcpu */
>  ENTRY(compat_post_handle_exception)
>          testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
> @@ -190,6 +223,7 @@ ENTRY(compat_post_handle_exception)
>  /* See lstar_enter for entry register state. */
>  ENTRY(cstar_enter)
>          sti
> +        SMEP_SMAP_RESTORE
>          movq  8(%rsp),%rax /* Restore %rax. */
>          movq  $FLAT_KERNEL_SS,8(%rsp)
>          pushq %r11
> @@ -225,6 +259,7 @@ UNLIKELY_END(compat_syscall_gpf)
>          jmp   .Lcompat_bounce_exception
>  
>  ENTRY(compat_sysenter)
> +        SMEP_SMAP_RESTORE
>          movq  VCPU_trap_ctxt(%rbx),%rcx
>          cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
>          movzwl VCPU_sysenter_sel(%rbx),%eax
> @@ -238,6 +273,7 @@ ENTRY(compat_sysenter)
>          jmp   compat_test_all_events
>  
>  ENTRY(compat_int80_direct_trap)
> +        SMEP_SMAP_RESTORE
>          call  compat_create_bounce_frame
>          jmp   compat_test_all_events
>  
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
>  
>  ENTRY(common_interrupt)
>          SAVE_ALL CLAC
> +        SMEP_SMAP_RESTORE
>          movq %rsp,%rdi
>          callq do_IRQ
>          jmp ret_from_intr
> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>  GLOBAL(handle_exception)
>          SAVE_ALL CLAC
>  handle_exception_saved:
> +        GET_CURRENT(%rbx)
>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>          jz    exception_with_ints_disabled
> -        sti
> +
> +.Lsmep_smap_orig:
> +        jmp   0f
> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
> +        .else
> +        // worst case: rex + opcode + modrm + 4-byte displacement
> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
> +        .endif

Which bug is this?  How does it manifest?  More generally, what is this
alternative trying to achieve?

> +        .pushsection .altinstr_replacement, "ax"
> +.Lsmep_smap_alt:
> +        mov   VCPU_domain(%rbx),%rax
> +.Lsmep_smap_alt_end:
> +        .section .altinstructions, "a"
> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> +                             X86_FEATURE_SMEP, \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> +                             X86_FEATURE_SMAP, \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> +        .popsection
> +
> +        testb $3,UREGS_cs(%rsp)
> +        jz    0f
> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)

This comparison is wrong on hardware lacking SMEP and SMAP, as the "mov
VCPU_domain(%rbx),%rax" won't have happened.

> +        je    0f
> +        call  cr4_smep_smap_restore
> +        /*
> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
> +         * compat_restore_all_guest and it actually returning to guest
> +         * context, in which case the guest would run with the two features
> +         * enabled. The only bad that can happen from this is a kernel mode
> +         * #PF which the guest doesn't expect. Rather than trying to make the
> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
> +         * back to the guest (which will in turn clear the two CR4 bits) to
> +         * re-execute the instruction. If we get back here, the CR4 bits
> +         * should then be found clear (unless another NMI/#MC occurred at
> +         * exactly the right time), and we'll continue processing the
> +         * exception as normal.
> +         */
> +        test  %rax,%rax
> +        jnz   0f
> +        mov   $PFEC_page_present,%al
> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
> +        jne   0f
> +        xor   UREGS_error_code(%rsp),%eax
> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
> +        jz    compat_test_all_events
> +0:      sti

It's code like this which makes me even more certain that we have far
too much code written in assembly which doesn't need to be.  Maybe not
this specific sample, but it has taken me 15 minutes and a pad of paper
to try and work out how this conditional works, and I am still not
certain it's correct.  In particular, PFEC_prot_key looks like it could
fool the test into believing a non-smap/smep fault was a smap/smep
fault.

Can you at least provide some C in a comment with the intended
conditional, to aid clarity?

~Andrew
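
For reference, a standalone C rendering of the test the assembly
appears to perform (one possible reading, illustrative only; constants
as in the patch):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PFEC_page_present  (1u << 0)
#define PFEC_write_access  (1u << 1)
#define PFEC_user_mode     (1u << 2)
#define PFEC_insn_fetch    (1u << 4)
#define TRAP_page_fault    14

/*
 * True for a #PF whose error code has page_present set and nothing
 * else besides (optionally) write_access/insn_fetch - i.e. the
 * signature of a supervisor-mode access blocked only by SMEP/SMAP.
 */
static bool looks_like_smep_smap_fault(unsigned int vector,
                                       uint32_t error_code)
{
    return vector == TRAP_page_fault &&
           !((error_code ^ PFEC_page_present) &
             ~(PFEC_write_access | PFEC_insn_fetch));
}

int main(void)
{
    /* Present, supervisor-mode write: matches. */
    printf("%d\n", looks_like_smep_smap_fault(TRAP_page_fault,
                       PFEC_page_present | PFEC_write_access));
    /* User-mode fault: does not match. */
    printf("%d\n", looks_like_smep_smap_fault(TRAP_page_fault,
                       PFEC_page_present | PFEC_user_mode));
    return 0;
}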


* Re: [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders
  2016-03-04 11:28 ` [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders Jan Beulich
@ 2016-03-07 17:43   ` Andrew Cooper
  2016-03-08  8:02     ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-03-07 17:43 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Feng Wu, Keir Fraser

On 04/03/16 11:28, Jan Beulich wrote:
> Alternatives patching code picks the most suitable NOPs for the
> running system, so simply use it to replace the pre-populated ones.
>
> Use an arbitrary, always available feature to key off from.

I would be tempted to introduce X86_FEATURE_ALWAYS as an alias of
X86_FEATURE_LM, or even a new synthetic feature.  The choice of LM is
explained in the commit message, but will be non-obvious to people
reading the code.

~Andrew


* Re: [PATCH 4/4] x86: use 32-bit loads for 32-bit PV guest state reload
  2016-03-04 11:29 ` [PATCH 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
@ 2016-03-07 17:45   ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-03-07 17:45 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser

On 04/03/16 11:29, Jan Beulich wrote:
> This is slightly more efficient than loading 64-bit quantities.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-07 16:59   ` Andrew Cooper
@ 2016-03-08  7:57     ` Jan Beulich
  2016-03-09  8:09       ` Wu, Feng
  2016-03-09 11:19       ` Andrew Cooper
  0 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-08  7:57 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser, Feng Wu

>>> On 07.03.16 at 17:59, <andrew.cooper3@citrix.com> wrote:
> On 04/03/16 11:27, Jan Beulich wrote:
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
>>  static bool_t __initdata opt_smap = 1;
>>  boolean_param("smap", opt_smap);
>>  
>> +unsigned long __read_mostly cr4_smep_smap_mask;
> 
> Are we liable to gain any other cr4 features which would want to be
> included in this?  Might it be wise to chose a slightly more generic
> name such as cr4_pv32_mask ?

Ah, that's a good name suggestion - I didn't like the "smep_smap"
thing from the beginning, but couldn't think of a better variant.

>> @@ -174,10 +174,43 @@ compat_bad_hypercall:
>>  /* %rbx: struct vcpu, interrupts disabled */
>>  ENTRY(compat_restore_all_guest)
>>          ASSERT_INTERRUPTS_DISABLED
>> +.Lcr4_orig:
>> +        ASM_NOP3 /* mov   %cr4, %rax */
>> +        ASM_NOP6 /* and   $..., %rax */
>> +        ASM_NOP3 /* mov   %rax, %cr4 */
>> +        .pushsection .altinstr_replacement, "ax"
>> +.Lcr4_alt:
>> +        mov   %cr4, %rax
>> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
>> +        mov   %rax, %cr4
>> +.Lcr4_alt_end:
>> +        .section .altinstructions, "a"
>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
>> +                             (.Lcr4_alt_end - .Lcr4_alt)
> 
> These 12's look as if they should be (.Lcr4_alt - .Lcr4_orig).

Well, the NOPs that get put there make 12 (= 3 + 6 + 3) a
pretty obvious (shorter and hence more readable) option. But
yes, if you're of the strong opinion that we should use the
longer alternative, I can switch these around.

>> +/* This mustn't modify registers other than %rax. */
>> +ENTRY(cr4_smep_smap_restore)
>> +        mov   %cr4, %rax
>> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
>> +        jnz   0f
>> +        or    cr4_smep_smap_mask(%rip), %rax
>> +        mov   %rax, %cr4
>> +        ret
>> +0:
>> +        and   cr4_smep_smap_mask(%rip), %eax
>> +        cmp   cr4_smep_smap_mask(%rip), %eax
>> +        je    1f
>> +        BUG
> 
> What is the purpose of this bugcheck? It looks like it is catching a
> mismatch of masked options, but I am not completely sure.

This aims at detecting that some of the CR4 bits which are
expected to be set really aren't (other than the case when all
of the ones of interest here are clear).
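
In (purely illustrative) C the intent is roughly:

    /* Sketch only - the real routine is the assembly above, which in
     * particular must not clobber anything besides %rax. */
    unsigned long cr4 = read_cr4();

    if ( !(cr4 & (X86_CR4_SMEP | X86_CR4_SMAP)) )
        write_cr4(cr4 | cr4_smep_smap_mask);     /* both clear - restore */
    else if ( (cr4 & cr4_smep_smap_mask) != cr4_smep_smap_mask )
        BUG();                                   /* an expected bit is clear */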

> For all other ASM level BUG's, I put a short comment on the same line,
> to aid people who hit the bug.

Will do. Question: Should this check perhaps become !NDEBUG
dependent?

>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>  GLOBAL(handle_exception)
>>          SAVE_ALL CLAC
>>  handle_exception_saved:
>> +        GET_CURRENT(%rbx)
>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>          jz    exception_with_ints_disabled
>> -        sti
>> +
>> +.Lsmep_smap_orig:
>> +        jmp   0f
>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
>> +        .else
>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>> +        .endif
> 
> Which bug is this?  How does it manifest.  More generally, what is this
> alternative trying to achieve?

The .org gets a warning (.Lsmep_smap_orig supposedly being
undefined, and hence getting assumed to be zero) followed by
an error (attempt to move the current location backwards). The
fix https://sourceware.org/ml/binutils/2016-03/msg00030.html
is pending approval.

>> +        .pushsection .altinstr_replacement, "ax"
>> +.Lsmep_smap_alt:
>> +        mov   VCPU_domain(%rbx),%rax
>> +.Lsmep_smap_alt_end:
>> +        .section .altinstructions, "a"
>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>> +                             X86_FEATURE_SMEP, \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>> +                             X86_FEATURE_SMAP, \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>> +        .popsection
>> +
>> +        testb $3,UREGS_cs(%rsp)
>> +        jz    0f
>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
> 
> This comparison is wrong on hardware lacking SMEP and SMAP, as the "mov
> VCPU_domain(%rbx),%rax" won't have happened.

That mov indeed won't have happened, but the original instruction
is a branch past all of this code, so the above is correct (and I did
test on older hardware).

>> +        je    0f
>> +        call  cr4_smep_smap_restore
>> +        /*
>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
>> +         * compat_restore_all_guest and it actually returning to guest
>> +         * context, in which case the guest would run with the two features
>> +         * enabled. The only bad that can happen from this is a kernel mode
>> +         * #PF which the guest doesn't expect. Rather than trying to make the
>> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
>> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
>> +         * back to the guest (which will in turn clear the two CR4 bits) to
>> +         * re-execute the instruction. If we get back here, the CR4 bits
>> +         * should then be found clear (unless another NMI/#MC occurred at
>> +         * exactly the right time), and we'll continue processing the
>> +         * exception as normal.
>> +         */
>> +        test  %rax,%rax
>> +        jnz   0f
>> +        mov   $PFEC_page_present,%al
>> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
>> +        jne   0f
>> +        xor   UREGS_error_code(%rsp),%eax
>> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
>> +        jz    compat_test_all_events
>> +0:      sti
> 
> It's code like this which makes me even more certain that we have far too
> much code written in assembly which doesn't need to be.  Maybe not this
> specific sample, but it has taken me 15 minutes and a pad of paper to
> try and work out how this conditional works, and I am still not certain
> it's correct.  In particular, PFEC_prot_key looks like it could fool the
> test into believing a non-smap/smep fault was a smap/smep fault.

Not sure how you come to think of PFEC_prot_key here: That's
a bit which can be set only together with PFEC_user_mode, yet
we care about kernel mode faults only here.
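
Just as a worked illustration (this is not code from the patch): an
unexpected bit such as PFEC_prot_key would make the check reject the fault
rather than be fooled by it, since after the XOR the error code may contain
nothing beyond the write/insn-fetch bits:

    /* Worked example, using the architectural #PF error code bits
     * P=1, W=2, U=4, RSVD=8, I=16, PK=32: */
    unsigned int ec = 0x01 /* P */ | 0x20 /* PK, hypothetically set */;
    unsigned int t  = ec ^ 0x01;  /* xor with PFEC_page_present -> 0x20 */
    /* t & ~(0x02 | 0x10) is non-zero, so the JZ is not taken and the
     * fault gets processed as a normal exception. */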

> Can you at least provide some C in a comment with the intended
> conditional, to aid clarity?

Sure, if you think that helps beyond the (I think) pretty extensive
comment:

+        test  %rax,%rax
+        jnz   0f
+        /*
+         * The below effectively is
+         * if ( regs->entry_vector == TRAP_page_fault &&
+         *      (regs->error_code & PFEC_page_present) &&
+         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
+         *     goto compat_test_all_events;
+         */
+        mov   $PFEC_page_present,%al
+        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
+        jne   0f
+        xor   UREGS_error_code(%rsp),%eax
+        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
+        jz    compat_test_all_events
+0:

Jan


* Re: [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders
  2016-03-07 17:43   ` Andrew Cooper
@ 2016-03-08  8:02     ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-08  8:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser, Feng Wu

>>> On 07.03.16 at 18:43, <andrew.cooper3@citrix.com> wrote:
> On 04/03/16 11:28, Jan Beulich wrote:
>> Alternatives patching code picks the most suitable NOPs for the
>> running system, so simply use it to replace the pre-populated ones.
>>
>> Use an arbitrary, always available feature to key off from.
> 
> I would be tempted to introduce X86_FEATURE_ALWAYS as an alias of
> X86_FEATURE_LM, or even a new synthetic feature.  The choice of LM is
> explained in the commit message, but will be non-obvious to people
> reading the code.

Okay, as an alias.

Jan



* Re: [PATCH 1/4] x86/alternatives: correct near branch check
  2016-03-07 16:21         ` Jan Beulich
@ 2016-03-08 17:33           ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-03-08 17:33 UTC (permalink / raw)
  To: xen-devel

On 07/03/16 16:21, Jan Beulich wrote:
>>>> On 07.03.16 at 17:11, <andrew.cooper3@citrix.com> wrote:
>> On 07/03/16 15:56, Jan Beulich wrote:
>>>>>> On 07.03.16 at 16:43, <andrew.cooper3@citrix.com> wrote:
>>>> On 04/03/16 11:27, Jan Beulich wrote:
>>>>> Make sure the near JMP/CALL check doesn't consume uninitialized
>>>>> data, not even in a benign way. And relax the length check at once.
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> --- a/xen/arch/x86/alternative.c
>>>>> +++ b/xen/arch/x86/alternative.c
>>>>> @@ -174,7 +174,7 @@ static void __init apply_alternatives(st
>>>>>          memcpy(insnbuf, replacement, a->replacementlen);
>>>>>  
>>>>>          /* 0xe8/0xe9 are relative branches; fix the offset. */
>>>>> -        if ( (*insnbuf & 0xfe) == 0xe8 && a->replacementlen == 5 )
>>>>> +        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
>>>>>              *(s32 *)(insnbuf + 1) += replacement - instr;
>>>>>  
>>>>>          add_nops(insnbuf + a->replacementlen,
>>>>>
>>>>>
>>>>>
>>>> Swapping the order is definitely a good thing.
>>>>
>>>> However, relaxing the length check seems less so.  `E8 rel32` or `E9
>>>> rel32` encodings are strictly 5 bytes long.
>>>>
>>>> There are complications with the `67 E{8,9} rel16` encodings, but those
>>>> are not catered for anyway, and the manual warns about undefined
>>>> behaviour if used in long mode.
>>>>
>>>> What is your usecase for relaxing the check?  IMO, if it isn't exactly 5
>>>> bytes long, there is some corruption somewhere and the relocation
>>>> should't happen.
>>> The relaxation is solely because at least CALL could validly
>>> be followed by further instructions.
>> But without scanning the entire replacement buffer, there might be other
>> relocations needing to happen.
>>
>> That would require decoding the instructions, which is an extreme faff. 
>> It would be better to leave it currently as-is to effectively disallow
>> mixing a jmp/call replacement with other code, to avoid the subtle
>> failure of a second relocation not taking effect
> Well, such missing further fixup would be noticed immediately by
> someone trying (unless the patch code path never gets executed).
> Whereas a simply adjustment to register state would seem quite
> reasonable to follow a call. While right now the subsequent
> patches don't depend on this being >= or ==, I think it was wrong
> to be == from the beginning.
>
> Plus - there are endless other possibilities of instructions needing
> fixups (most notably such with RIP-relative memory operands),
> none of which are even remotely reasonable to deal with here.
> I.e. namely in the absence of a CALL/JMP the same issue would
> exist anyway, which is why I'm not overly concerned of those.
> All we want is a specific special case to be treated correctly.

Fair enough.  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-04 11:27 ` [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code Jan Beulich
  2016-03-07 16:59   ` Andrew Cooper
@ 2016-03-09  8:09   ` Wu, Feng
  2016-03-09 10:45     ` Andrew Cooper
  2016-03-09 14:07     ` Jan Beulich
  1 sibling, 2 replies; 67+ messages in thread
From: Wu, Feng @ 2016-03-09  8:09 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Keir Fraser, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, March 4, 2016 7:28 PM
> To: xen-devel <xen-devel@lists.xenproject.org>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Wu, Feng
> <feng.wu@intel.com>; Keir Fraser <keir@xen.org>
> Subject: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
> guest code
> 
> Since such guests' kernel code runs in ring 1, their memory accesses,
> at the paging layer, are supervisor mode ones, and hence subject to
> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> two features though (and so far we also don't expose the respective
> feature flags), and hence may suffer page faults they cannot deal with.
> 
> While the placement of the re-enabling slightly weakens the intended
> protection, it was selected such that 64-bit paths would remain
> unaffected where possible. At the expense of a further performance hit
> the re-enabling could be put right next to the CLACs.
> 
> Note that this introduces a number of extra TLB flushes - CR4.SMEP
> transitioning from 0 to 1 always causes a flush, and it transitioning
> from 1 to 0 may also do.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
>  static bool_t __initdata opt_smap = 1;
>  boolean_param("smap", opt_smap);
> 
> +unsigned long __read_mostly cr4_smep_smap_mask;
> +
>  /* Boot dom0 in pvh mode */
>  static bool_t __initdata opt_dom0pvh;
>  boolean_param("dom0pvh", opt_dom0pvh);
> @@ -1335,6 +1337,8 @@ void __init noreturn __start_xen(unsigne
>      if ( cpu_has_smap )
>          set_in_cr4(X86_CR4_SMAP);
> 
> +    cr4_smep_smap_mask = mmu_cr4_features & (X86_CR4_SMEP |
> X86_CR4_SMAP);
> +
>      if ( cpu_has_fsgsbase )
>          set_in_cr4(X86_CR4_FSGSBASE);
> 
> @@ -1471,7 +1475,10 @@ void __init noreturn __start_xen(unsigne
>       * copy_from_user().
>       */
>      if ( cpu_has_smap )
> +    {
> +        cr4_smep_smap_mask &= ~X86_CR4_SMAP;

You change ' cr4_smep_smap_mask ' here ...

>          write_cr4(read_cr4() & ~X86_CR4_SMAP);
> +    }
> 
>      printk("%sNX (Execute Disable) protection %sactive\n",
>             cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
> @@ -1488,7 +1495,10 @@ void __init noreturn __start_xen(unsigne
>          panic("Could not set up DOM0 guest OS");
> 
>      if ( cpu_has_smap )
> +    {
>          write_cr4(read_cr4() | X86_CR4_SMAP);
> +        cr4_smep_smap_mask |= X86_CR4_SMAP;

... and here. I wonder whether it is because Domain 0 can actually start
running between the two places? Otherwise I don't think the changes in the
above two places are needed, right?
> 
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
> 
>  ENTRY(common_interrupt)
>          SAVE_ALL CLAC
> +        SMEP_SMAP_RESTORE
>          movq %rsp,%rdi
>          callq do_IRQ
>          jmp ret_from_intr
> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>  GLOBAL(handle_exception)
>          SAVE_ALL CLAC
>  handle_exception_saved:
> +        GET_CURRENT(%rbx)
>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>          jz    exception_with_ints_disabled
> -        sti
> +
> +.Lsmep_smap_orig:
> +        jmp   0f
> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
> +        .else
> +        // worst case: rex + opcode + modrm + 4-byte displacement
> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
> +        .endif
> +        .pushsection .altinstr_replacement, "ax"
> +.Lsmep_smap_alt:
> +        mov   VCPU_domain(%rbx),%rax
> +.Lsmep_smap_alt_end:
> +        .section .altinstructions, "a"
> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> +                             X86_FEATURE_SMEP, \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> +                             X86_FEATURE_SMAP, \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> +        .popsection
> +
> +        testb $3,UREGS_cs(%rsp)
> +        jz    0f
> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
> +        je    0f
> +        call  cr4_smep_smap_restore
> +        /*
> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in

Do you mean "before" when you typed "between" above?

> +         * compat_restore_all_guest and it actually returning to guest
> +         * context, in which case the guest would run with the two features
> +         * enabled. The only bad that can happen from this is a kernel mode
> +         * #PF which the guest doesn't expect. Rather than trying to make the
> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
> +         * back to the guest (which will in turn clear the two CR4 bits) to
> +         * re-execute the instruction. If we get back here, the CR4 bits
> +         * should then be found clear (unless another NMI/#MC occurred at
> +         * exactly the right time), and we'll continue processing the
> +         * exception as normal.
> +         */

As Andrew mentioned in another mail, this scenario is a little tricky; could you
please give a more detailed description of how the NMI/#MC affects the
execution flow, maybe with some code in the explanation? Thanks a lot!

Thanks,
Feng



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-08  7:57     ` Jan Beulich
@ 2016-03-09  8:09       ` Wu, Feng
  2016-03-09 14:09         ` Jan Beulich
  2016-03-09 11:19       ` Andrew Cooper
  1 sibling, 1 reply; 67+ messages in thread
From: Wu, Feng @ 2016-03-09  8:09 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, Keir Fraser, Wu, Feng

> >> +/* This mustn't modify registers other than %rax. */
> >> +ENTRY(cr4_smep_smap_restore)
> >> +        mov   %cr4, %rax
> >> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
> >> +        jnz   0f

If we clear them in every place where we return to a 32-bit PV guest,
the X86_CR4_SMEP and X86_CR4_SMAP bits should be clear in CR4, right?
If that is the case, we cannot jump to 0f.
However, like what you mentioned in the comments below:

+        /*
+         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
+         * compat_restore_all_guest and it actually returning to guest
+         * context, in which case the guest would run with the two features
+         * enabled. The only bad that can happen from this is a kernel mode
+         * #PF which the guest doesn't expect. Rather than trying to make the
+         * NMI/#MC exit path honor the intended CR4 setting, simply check
+         * whether the wrong CR4 was in use when the #PF occurred, and exit
+         * back to the guest (which will in turn clear the two CR4 bits) to
+         * re-execute the instruction. If we get back here, the CR4 bits
+         * should then be found clear (unless another NMI/#MC occurred at
+         * exactly the right time), and we'll continue processing the
+         * exception as normal.
+         */

That means, if an NMI or #MC happens at exactly the right time, the guest
can be running with SMAP/SMEP set in CR4, and only in this case can we
reach '0f'.  Is my understanding correct? Thanks!


> >> +        or    cr4_smep_smap_mask(%rip), %rax
> >> +        mov   %rax, %cr4
> >> +        ret
> >> +0:
> >> +        and   cr4_smep_smap_mask(%rip), %eax
> >> +        cmp   cr4_smep_smap_mask(%rip), %eax
> >> +        je    1f
> >> +        BUG
> >
> > What is the purpose of this bugcheck? It looks like it is catching a
> > mismatch of masked options, but I am not completely sure.
> 
> This aims at detecting that some of the CR4 bits which are
> expected to be set really aren't (other than the case when all
> of the ones of interest here are clear).

The code in the 0f section considers the following case as not a bug;
do you think this can really happen?

Let's suppose SMAP is bit 0 and SMEP is bit 1; then cr4_smep_smap_mask is
01 or 10 and %eax is 11. In this case it means Xen sets both SMAP and SMEP,
but when the guest is running only one feature is set. In your patch, SMAP
and SMEP are set/cleared at the same time, so I am not sure this can happen.

BTW, I think it is worth adding some comments to the 0f section.

Thanks,
Feng




* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09  8:09   ` Wu, Feng
@ 2016-03-09 10:45     ` Andrew Cooper
  2016-03-09 12:27       ` Wu, Feng
  2016-03-09 14:03       ` Jan Beulich
  2016-03-09 14:07     ` Jan Beulich
  1 sibling, 2 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-03-09 10:45 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich, xen-devel; +Cc: Keir Fraser

On 09/03/16 08:09, Wu, Feng wrote:

>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
>>
>>  ENTRY(common_interrupt)
>>          SAVE_ALL CLAC
>> +        SMEP_SMAP_RESTORE
>>          movq %rsp,%rdi
>>          callq do_IRQ
>>          jmp ret_from_intr
>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>  GLOBAL(handle_exception)
>>          SAVE_ALL CLAC
>>  handle_exception_saved:
>> +        GET_CURRENT(%rbx)
>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>          jz    exception_with_ints_disabled
>> -        sti
>> +
>> +.Lsmep_smap_orig:
>> +        jmp   0f
>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
>> +        .else
>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>> +        .endif
>> +        .pushsection .altinstr_replacement, "ax"
>> +.Lsmep_smap_alt:
>> +        mov   VCPU_domain(%rbx),%rax
>> +.Lsmep_smap_alt_end:
>> +        .section .altinstructions, "a"
>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>> +                             X86_FEATURE_SMEP, \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>> +                             X86_FEATURE_SMAP, \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>> +        .popsection
>> +
>> +        testb $3,UREGS_cs(%rsp)
>> +        jz    0f
>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>> +        je    0f
>> +        call  cr4_smep_smap_restore
>> +        /*
>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
> Do you mean "before" when you typed "between" above?

The meaning is "between (clearing CR4.SMEP and CR4.SMAP in
compat_restore_all_guest) and (it actually returning to guest)"

Nested lists in English are a source of confusion, even to native speakers.

~Andrew

>> +         * compat_restore_all_guest and it actually returning to guest
>> +         * context, in which case the guest would run with the two features
>> +         * enabled. The only bad that can happen from this is a kernel mode
>> +         * #PF which the guest doesn't expect. Rather than trying to make the
>> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
>> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
>> +         * back to the guest (which will in turn clear the two CR4 bits) to
>> +         * re-execute the instruction. If we get back here, the CR4 bits
>> +         * should then be found clear (unless another NMI/#MC occurred at
>> +         * exactly the right time), and we'll continue processing the
>> +         * exception as normal.
>> +         */




* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-08  7:57     ` Jan Beulich
  2016-03-09  8:09       ` Wu, Feng
@ 2016-03-09 11:19       ` Andrew Cooper
  2016-03-09 14:28         ` Jan Beulich
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-03-09 11:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu

On 08/03/16 07:57, Jan Beulich wrote:
>
>>> @@ -174,10 +174,43 @@ compat_bad_hypercall:
>>>  /* %rbx: struct vcpu, interrupts disabled */
>>>  ENTRY(compat_restore_all_guest)
>>>          ASSERT_INTERRUPTS_DISABLED
>>> +.Lcr4_orig:
>>> +        ASM_NOP3 /* mov   %cr4, %rax */
>>> +        ASM_NOP6 /* and   $..., %rax */
>>> +        ASM_NOP3 /* mov   %rax, %cr4 */
>>> +        .pushsection .altinstr_replacement, "ax"
>>> +.Lcr4_alt:
>>> +        mov   %cr4, %rax
>>> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
>>> +        mov   %rax, %cr4
>>> +.Lcr4_alt_end:
>>> +        .section .altinstructions, "a"
>>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
>>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
>>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>> These 12's look as if they should be (.Lcr4_alt - .Lcr4_orig).
> Well, the NOPs that get put there make 12 (= 3 + 6 + 3) a
> pretty obvious (shorter and hence more readable) option. But
> yes, if you're of the strong opinion that we should use the
> longer alternative, I can switch these around.

I have to admit that I prefer the Linux ALTERNATIVE macro for assembly,
which takes care of the calculations like this.  It is slightly
unfortunate that it generally requires its assembly blocks in strings,
and is unsuitable for larger blocks.  Perhaps we can see about a
variant in due course.

>
>>> +/* This mustn't modify registers other than %rax. */
>>> +ENTRY(cr4_smep_smap_restore)
>>> +        mov   %cr4, %rax
>>> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
>>> +        jnz   0f
>>> +        or    cr4_smep_smap_mask(%rip), %rax
>>> +        mov   %rax, %cr4
>>> +        ret
>>> +0:
>>> +        and   cr4_smep_smap_mask(%rip), %eax
>>> +        cmp   cr4_smep_smap_mask(%rip), %eax
>>> +        je    1f
>>> +        BUG
>> What is the purpose of this bugcheck? It looks like it is catching a
>> mismatch of masked options, but I am not completely sure.
> This aims at detecting that some of the CR4 bits which are
> expected to be set really aren't (other than the case when all
> of the ones of interest here are clear).
>
>> For all other ASM level BUG's, I put a short comment on the same line,
>> to aid people who hit the bug.
> Will do. Question: Should this check perhaps become !NDEBUG
> dependent?

It probably should do.

>
>>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>>  GLOBAL(handle_exception)
>>>          SAVE_ALL CLAC
>>>  handle_exception_saved:
>>> +        GET_CURRENT(%rbx)
>>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>>          jz    exception_with_ints_disabled
>>> -        sti
>>> +
>>> +.Lsmep_smap_orig:
>>> +        jmp   0f
>>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
>>> +        .else
>>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>>> +        .endif
>> Which bug is this?  How does it manifest.  More generally, what is this
>> alternative trying to achieve?
> The .org gets a warning (.Lsmep_smap_orig supposedly being
> undefined, and hence getting assumed to be zero) followed by
> an error (attempt to move the current location backwards). The
> fix https://sourceware.org/ml/binutils/2016-03/msg00030.html
> is pending approval.

I presume this is down to the documented restriction about crossing
sections.  i.e. there is no .Lsmep_smap_orig in .altinstr_replacement,
where it found the first two symbols.

>>> +        .pushsection .altinstr_replacement, "ax"
>>> +.Lsmep_smap_alt:
>>> +        mov   VCPU_domain(%rbx),%rax
>>> +.Lsmep_smap_alt_end:
>>> +        .section .altinstructions, "a"
>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>> +                             X86_FEATURE_SMEP, \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>> +                             X86_FEATURE_SMAP, \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>> +        .popsection
>>> +
>>> +        testb $3,UREGS_cs(%rsp)
>>> +        jz    0f
>>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>> This comparison is wrong on hardware lacking SMEP and SMAP, as the "mov
>> VCPU_domain(%rbx),%rax" won't have happened.
> That mov indeed won't have happened, but the original instruction
> is a branch past all of this code, so the above is correct (and I did
> test on older hardware).

Oh, so it won't.  It is moderately subtle that this entire code block is
logically contained in the alternative.

It would be far clearer, and would work around your .org bug, if this were
a single alternative which patched the jump into a nop.

At the very least, a label of .Lcr3_pv32_fixup_done would be an
improvement over 0.

>
>>> +        je    0f
>>> +        call  cr4_smep_smap_restore
>>> +        /*
>>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
>>> +         * compat_restore_all_guest and it actually returning to guest
>>> +         * context, in which case the guest would run with the two features
>>> +         * enabled. The only bad that can happen from this is a kernel mode
>>> +         * #PF which the guest doesn't expect. Rather than trying to make the
>>> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
>>> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
>>> +         * back to the guest (which will in turn clear the two CR4 bits) to
>>> +         * re-execute the instruction. If we get back here, the CR4 bits
>>> +         * should then be found clear (unless another NMI/#MC occurred at
>>> +         * exactly the right time), and we'll continue processing the
>>> +         * exception as normal.
>>> +         */
>>> +        test  %rax,%rax
>>> +        jnz   0f
>>> +        mov   $PFEC_page_present,%al
>>> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
>>> +        jne   0f
>>> +        xor   UREGS_error_code(%rsp),%eax
>>> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
>>> +        jz    compat_test_all_events
>>> +0:      sti
>> It's code like this which makes me even more certain that we have far too
>> much code written in assembly which doesn't need to be.  Maybe not this
>> specific sample, but it has taken me 15 minutes and a pad of paper to
>> try and work out how this conditional works, and I am still not certain
>> it's correct.  In particular, PFEC_prot_key looks like it could fool the
>> test into believing a non-smap/smep fault was a smap/smep fault.
> Not sure how you come to think of PFEC_prot_key here: That's
> a bit which can be set only together with PFEC_user_mode, yet
> we care about kernel mode faults only here.

I would not make that assumption.  Assumptions about the valid set of
#PF flags are precisely the reason that older Linux falls into an
infinite loop when encountering a SMAP pagefault, rather than a clean crash.

~Andrew

>
>> Can you at least provide some C in a comment with the intended
>> conditional, to aid clarity?
> Sure, if you think that helps beyond the (I think) pretty extensive
> comment:
>
> +        test  %rax,%rax
> +        jnz   0f
> +        /*
> +         * The below effectively is
> +         * if ( regs->entry_vector == TRAP_page_fault &&
> +         *      (regs->error_code & PFEC_page_present) &&
> +         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
> +         *     goto compat_test_all_events;
> +         */
> +        mov   $PFEC_page_present,%al
> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
> +        jne   0f
> +        xor   UREGS_error_code(%rsp),%eax
> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
> +        jz    compat_test_all_events
> +0:
>
> Jan



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 10:45     ` Andrew Cooper
@ 2016-03-09 12:27       ` Wu, Feng
  2016-03-09 12:33         ` Andrew Cooper
  2016-03-09 14:03       ` Jan Beulich
  1 sibling, 1 reply; 67+ messages in thread
From: Wu, Feng @ 2016-03-09 12:27 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Wu, Feng, Keir Fraser



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Wednesday, March 9, 2016 6:46 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>; xen-
> devel <xen-devel@lists.xenproject.org>
> Cc: Keir Fraser <keir@xen.org>
> Subject: Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
> guest code
> 
> On 09/03/16 08:09, Wu, Feng wrote:
> 
> >> --- a/xen/arch/x86/x86_64/entry.S
> >> +++ b/xen/arch/x86/x86_64/entry.S
> >> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
> >>
> >>  ENTRY(common_interrupt)
> >>          SAVE_ALL CLAC
> >> +        SMEP_SMAP_RESTORE
> >>          movq %rsp,%rdi
> >>          callq do_IRQ
> >>          jmp ret_from_intr
> >> @@ -454,13 +455,64 @@ ENTRY(page_fault)
> >>  GLOBAL(handle_exception)
> >>          SAVE_ALL CLAC
> >>  handle_exception_saved:
> >> +        GET_CURRENT(%rbx)
> >>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
> >>          jz    exception_with_ints_disabled
> >> -        sti
> >> +
> >> +.Lsmep_smap_orig:
> >> +        jmp   0f
> >> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
> >> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt),
> 0xcc
> >> +        .else
> >> +        // worst case: rex + opcode + modrm + 4-byte displacement
> >> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
> >> +        .endif
> >> +        .pushsection .altinstr_replacement, "ax"
> >> +.Lsmep_smap_alt:
> >> +        mov   VCPU_domain(%rbx),%rax
> >> +.Lsmep_smap_alt_end:
> >> +        .section .altinstructions, "a"
> >> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> >> +                             X86_FEATURE_SMEP, \
> >> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> >> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> >> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> >> +                             X86_FEATURE_SMAP, \
> >> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> >> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> >> +        .popsection
> >> +
> >> +        testb $3,UREGS_cs(%rsp)
> >> +        jz    0f
> >> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
> >> +        je    0f
> >> +        call  cr4_smep_smap_restore
> >> +        /*
> >> +         * An NMI or #MC may occur between clearing CR4.SMEP and
> CR4.SMAP in
> > Do you mean "before" when you typed "between" above?
> 
> The meaning is "between (clearing CR4.SMEP and CR4.SMAP in
> compat_restore_all_guest) and (it actually returning to guest)"
> 
> Nested lists in English are a source of confusion, even to native speakers.

Oh, thanks for the clarification! Do you know how "An NMI or #MC may occur
between clearing CR4.SMEP and CR4.SMAP in compat_restore_all_guest and
it actually returning to guest context, in which case the guest would run with
the two features enabled. " can happen? Especially how the guest can run
with the two features enabled? 

Thanks,
Feng

> 
> ~Andrew
> 
> >> +         * compat_restore_all_guest and it actually returning to guest
> >> +         * context, in which case the guest would run with the two features
> >> +         * enabled. The only bad that can happen from this is a kernel mode
> >> +         * #PF which the guest doesn't expect. Rather than trying to make the
> >> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
> >> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
> >> +         * back to the guest (which will in turn clear the two CR4 bits) to
> >> +         * re-execute the instruction. If we get back here, the CR4 bits
> >> +         * should then be found clear (unless another NMI/#MC occurred at
> >> +         * exactly the right time), and we'll continue processing the
> >> +         * exception as normal.
> >> +         */
> 



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 12:27       ` Wu, Feng
@ 2016-03-09 12:33         ` Andrew Cooper
  2016-03-09 12:36           ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-03-09 12:33 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich, xen-devel; +Cc: Keir Fraser

On 09/03/16 12:27, Wu, Feng wrote:
>
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Wednesday, March 9, 2016 6:46 PM
>> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>; xen-
>> devel <xen-devel@lists.xenproject.org>
>> Cc: Keir Fraser <keir@xen.org>
>> Subject: Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
>> guest code
>>
>> On 09/03/16 08:09, Wu, Feng wrote:
>>
>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
>>>>
>>>>  ENTRY(common_interrupt)
>>>>          SAVE_ALL CLAC
>>>> +        SMEP_SMAP_RESTORE
>>>>          movq %rsp,%rdi
>>>>          callq do_IRQ
>>>>          jmp ret_from_intr
>>>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>>>  GLOBAL(handle_exception)
>>>>          SAVE_ALL CLAC
>>>>  handle_exception_saved:
>>>> +        GET_CURRENT(%rbx)
>>>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>>>          jz    exception_with_ints_disabled
>>>> -        sti
>>>> +
>>>> +.Lsmep_smap_orig:
>>>> +        jmp   0f
>>>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>>>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt),
>> 0xcc
>>>> +        .else
>>>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>>>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>>>> +        .endif
>>>> +        .pushsection .altinstr_replacement, "ax"
>>>> +.Lsmep_smap_alt:
>>>> +        mov   VCPU_domain(%rbx),%rax
>>>> +.Lsmep_smap_alt_end:
>>>> +        .section .altinstructions, "a"
>>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>>> +                             X86_FEATURE_SMEP, \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>>> +                             X86_FEATURE_SMAP, \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>>> +        .popsection
>>>> +
>>>> +        testb $3,UREGS_cs(%rsp)
>>>> +        jz    0f
>>>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>>>> +        je    0f
>>>> +        call  cr4_smep_smap_restore
>>>> +        /*
>>>> +         * An NMI or #MC may occur between clearing CR4.SMEP and
>> CR4.SMAP in
>>> Do you mean "before" when you typed "between" above?
>> The meaning is "between (clearing CR4.SMEP and CR4.SMAP in
>> compat_restore_all_guest) and (it actually returning to guest)"
>>
>> Nested lists in English are a source of confusion, even to native speakers.
> Oh, thanks for the clarification! Do you know how "An NMI or #MC may occur
> between clearing CR4.SMEP and CR4.SMAP in compat_restore_all_guest and
> it actually returning to guest context, in which case the guest would run with
> the two features enabled. " can happen? Especially how the guest can run
> with the two features enabled?

NMIs and MCEs can occur at any point, even if interrupts are disabled.

The bad situation is this sequence:

* Xen is returning to the guest and disables CR4.SMEP/SMAP
* NMI occurs while still in Xen
* NMI exit path sees it is returning to Xen and re-enabled CR4.SMEP/SMAP
* Xen ends up returning to guest with CR4.SMEP/SMAP enabled.

~Andrew


* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 12:33         ` Andrew Cooper
@ 2016-03-09 12:36           ` Jan Beulich
  2016-03-09 12:54             ` Wu, Feng
  2016-03-09 13:35             ` Wu, Feng
  0 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-09 12:36 UTC (permalink / raw)
  To: andrew.cooper3, feng.wu, xen-devel; +Cc: keir

>>>> Andrew Cooper <andrew.cooper3@citrix.com> 03/09/16 1:33 PM >>>
>On 09/03/16 12:27, Wu, Feng wrote:
>> Oh, thanks for the clarification! Do you know how "An NMI or #MC may occur
>> between clearing CR4.SMEP and CR4.SMAP in compat_restore_all_guest and
>> it actually returning to guest context, in which case the guest would run with
>> the two features enabled. " can happen? Especially how the guest can run
>> with the two features enabled?
>
>NMIs and MCEs can occur at any point, even if interrupts are disabled.
>
>The bad situation is this sequence:
>
>* Xen is returning to the guest and disables CR4.SMEP/SMAP
>* NMI occurs while still in Xen
>* NMI exit path sees it is returning to Xen and re-enabled CR4.SMEP/SMAP

Well, almost: Re-enabling happens on the NMI entry path. The NMI exit
path would, seeing it's returning to Xen context, simply not disable them
again.

Jan

>* Xen ends up returning to guest with CR4.SMEP/SMAP enabled.
>
>~Andrew




* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 12:36           ` Jan Beulich
@ 2016-03-09 12:54             ` Wu, Feng
  2016-03-09 13:35             ` Wu, Feng
  1 sibling, 0 replies; 67+ messages in thread
From: Wu, Feng @ 2016-03-09 12:54 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3, xen-devel; +Cc: Wu, Feng, keir



> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: Wednesday, March 9, 2016 8:37 PM
> To: andrew.cooper3@citrix.com; Wu, Feng <feng.wu@intel.com>; xen-
> devel@lists.xenproject.org
> Cc: keir@xen.org
> Subject: Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
> guest code
> 
> >>>> Andrew Cooper <andrew.cooper3@citrix.com> 03/09/16 1:33 PM >>>
> >On 09/03/16 12:27, Wu, Feng wrote:
> >> Oh, thanks for the clarification! Do you know how "An NMI or #MC may
> occur
> >> between clearing CR4.SMEP and CR4.SMAP in compat_restore_all_guest and
> >> it actually returning to guest context, in which case the guest would run with
> >> the two features enabled. " can happen? Especially how the guest can run
> >> with the two features enabled?
> >
> >NMIs and MCEs can occur at any point, even if interrupts are disabled.
> >
> >The bad situation is this sequence:
> >
> >* Xen is returning to the guest and disables CR4.SMEP/SMAP
> >* NMI occurs while still in Xen
> >* NMI exit path sees it is returning to Xen and re-enabled CR4.SMEP/SMAP
> 
> Well, almost: Re-enabling happens on the NMI entry path. The NMI exit
> path would, seeing it's returning to Xen context, simply not disable them
> again.

Oh, this is the point I ignored. Thanks for the clarification.

Thanks,
Feng

> 
> Jan
> 
> >* Xen ends up returning to guest with CR4.SMEP/SMAP enabled.
> >
> >~Andrew
> 



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 12:36           ` Jan Beulich
  2016-03-09 12:54             ` Wu, Feng
@ 2016-03-09 13:35             ` Wu, Feng
  2016-03-09 13:42               ` Andrew Cooper
  1 sibling, 1 reply; 67+ messages in thread
From: Wu, Feng @ 2016-03-09 13:35 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3, xen-devel; +Cc: Wu, Feng, keir



> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: Wednesday, March 9, 2016 8:37 PM
> To: andrew.cooper3@citrix.com; Wu, Feng <feng.wu@intel.com>; xen-
> devel@lists.xenproject.org
> Cc: keir@xen.org
> Subject: Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
> guest code
> 
> >>>> Andrew Cooper <andrew.cooper3@citrix.com> 03/09/16 1:33 PM >>>
> >On 09/03/16 12:27, Wu, Feng wrote:
> >> Oh, thanks for the clarification! Do you know how "An NMI or #MC may
> occur
> >> between clearing CR4.SMEP and CR4.SMAP in compat_restore_all_guest and
> >> it actually returning to guest context, in which case the guest would run with
> >> the two features enabled. " can happen? Especially how the guest can run
> >> with the two features enabled?
> >
> >NMIs and MCEs can occur at any point, even if interrupts are disabled.
> >
> >The bad situation is this sequence:
> >
> >* Xen is returning to the guest and disables CR4.SMEP/SMAP
> >* NMI occurs while still in Xen
> >* NMI exit path sees it is returning to Xen and re-enabled CR4.SMEP/SMAP
> 
> Well, almost: Re-enabling happens on the NMI entry path. The NMI exit
> path would, seeing it's returning to Xen context, simply not disable them
> again.

Thinking about this again: in this case, when the NMI happens, we are in
Xen context (CPL in cs is 0), so the CPL of the saved cs on the stack is 0,
right?  Why do we re-enable CR4.SMEP/SMAP in this case? I mean, do we only
need to enable SMEP/SMAP when coming from a 32-bit PV guest (CPL of cs is 1)?

Thanks,
Feng

> 
> Jan
> 
> >* Xen ends up returning to guest with CR4.SMEP/SMAP enabled.
> >
> >~Andrew
> 



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 13:35             ` Wu, Feng
@ 2016-03-09 13:42               ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-03-09 13:42 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich, xen-devel; +Cc: keir

On 09/03/16 13:35, Wu, Feng wrote:
>
>> -----Original Message-----
>> From: Jan Beulich [mailto:jbeulich@suse.com]
>> Sent: Wednesday, March 9, 2016 8:37 PM
>> To: andrew.cooper3@citrix.com; Wu, Feng <feng.wu@intel.com>; xen-
>> devel@lists.xenproject.org
>> Cc: keir@xen.org
>> Subject: Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
>> guest code
>>
>>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 03/09/16 1:33 PM >>>
>>> On 09/03/16 12:27, Wu, Feng wrote:
>>>> Oh, thanks for the clarification! Do you know how "An NMI or #MC may
>> occur
>>>> between clearing CR4.SMEP and CR4.SMAP in compat_restore_all_guest and
>>>> it actually returning to guest context, in which case the guest would run with
>>>> the two features enabled. " can happen? Especially how the guest can run
>>>> with the two features enabled?
>>> NMIs and MCEs can occur at any point, even if interrupts are disabled.
>>>
>>> The bad situation is this sequence:
>>>
>>> * Xen is returning to the guest and disables CR4.SMEP/SMAP
>>> * NMI occurs while still in Xen
>>> * NMI exit path sees it is returning to Xen and re-enabled CR4.SMEP/SMAP
>> Well, almost: Re-enabling happens on the NMI entry path. The NMI exit
>> path would, seeing it's returning to Xen context, simply not disable them
>> again.
> Thinking about this again, in this case, when the NMI happens, we are in
> Xen context (CPL in cs is 0), so the CPL of the saved cs in stack is 0,right?
> why do we re-enable CR4.SMEP/SMAP in this case? I mean do we only
> need to enable SMEP/SMAP when coming from 32bit pv guest (CPL of cs is 1) ?

We always want Xen to be running with SMEP/SMAP enabled.  Therefore the
safer and simpler option is to always enable them if we observe them
disabled.

Interrupting a 32bit PV guest might end up seeing a cpl of 1 or 3, and
peeking into the active struct domain to check is_pv32_domain would be a
larger overhead on the entry paths.
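
I.e. the entry paths would otherwise need something like the following
(a sketch only; the helper names are for illustration, not a proposed
change):

    /* Only restore SMEP/SMAP when we interrupted 32-bit PV guest code: */
    if ( guest_mode(regs) && is_pv_32bit_domain(current->domain) )
        cr4_pv32_restore();

That is a read of current plus a dereference of the domain structure on
every hot entry path, versus the cheap check on CR4 alone ("if either bit
is observed clear, set them both again").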

~Andrew


* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 10:45     ` Andrew Cooper
  2016-03-09 12:27       ` Wu, Feng
@ 2016-03-09 14:03       ` Jan Beulich
  1 sibling, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-09 14:03 UTC (permalink / raw)
  To: Andrew Cooper, Feng Wu, xen-devel; +Cc: Keir Fraser

>>> On 09.03.16 at 11:45, <andrew.cooper3@citrix.com> wrote:
> On 09/03/16 08:09, Wu, Feng wrote:
> 
>>> --- a/xen/arch/x86/x86_64/entry.S
>>> +++ b/xen/arch/x86/x86_64/entry.S
>>> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
>>>
>>>  ENTRY(common_interrupt)
>>>          SAVE_ALL CLAC
>>> +        SMEP_SMAP_RESTORE
>>>          movq %rsp,%rdi
>>>          callq do_IRQ
>>>          jmp ret_from_intr
>>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>>  GLOBAL(handle_exception)
>>>          SAVE_ALL CLAC
>>>  handle_exception_saved:
>>> +        GET_CURRENT(%rbx)
>>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>>          jz    exception_with_ints_disabled
>>> -        sti
>>> +
>>> +.Lsmep_smap_orig:
>>> +        jmp   0f
>>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 
> 0xcc
>>> +        .else
>>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>>> +        .endif
>>> +        .pushsection .altinstr_replacement, "ax"
>>> +.Lsmep_smap_alt:
>>> +        mov   VCPU_domain(%rbx),%rax
>>> +.Lsmep_smap_alt_end:
>>> +        .section .altinstructions, "a"
>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>> +                             X86_FEATURE_SMEP, \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>> +                             X86_FEATURE_SMAP, \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>> +        .popsection
>>> +
>>> +        testb $3,UREGS_cs(%rsp)
>>> +        jz    0f
>>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>>> +        je    0f
>>> +        call  cr4_smep_smap_restore
>>> +        /*
>>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP 
> in
>> Do you mean "before" when you typed "between" above?
> 
> The meaning is "between (clearing CR4.SMEP and CR4.SMAP in
> compat_restore_all_guest) and (it actually returning to guest)"
> 
> Nested lists in English are a source of confusion, even to native speakers.

I've switched the first "and" to / to avoid the ambiguity.

Jan



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09  8:09   ` Wu, Feng
  2016-03-09 10:45     ` Andrew Cooper
@ 2016-03-09 14:07     ` Jan Beulich
  1 sibling, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-09 14:07 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 09.03.16 at 09:09, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, March 4, 2016 7:28 PM
>> @@ -1471,7 +1475,10 @@ void __init noreturn __start_xen(unsigne
>>       * copy_from_user().
>>       */
>>      if ( cpu_has_smap )
>> +    {
>> +        cr4_smep_smap_mask &= ~X86_CR4_SMAP;
> 
> You change ' cr4_smep_smap_mask ' here ...
> 
>>          write_cr4(read_cr4() & ~X86_CR4_SMAP);
>> +    }
>> 
>>      printk("%sNX (Execute Disable) protection %sactive\n",
>>             cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
>> @@ -1488,7 +1495,10 @@ void __init noreturn __start_xen(unsigne
>>          panic("Could not set up DOM0 guest OS");
>> 
>>      if ( cpu_has_smap )
>> +    {
>>          write_cr4(read_cr4() | X86_CR4_SMAP);
>> +        cr4_smep_smap_mask |= X86_CR4_SMAP;
> 
> ... and here. I am wonder whether it is because Domain 0 can actually start
> running between the two place? Or I don't think the changes in the above
> two places is needed. right?

They are very definitely needed, to avoid hitting the BUG in
cr4_pv32_restore (cr4_smep_smap_restore in this patch
version) on every interrupt that occurs while Dom0 is being
constructed.
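
To put it another way, the invariant those paths rely on is roughly
(sketch, not actual code):

    /* At any interrupt/exception entry, either neither bit is set in CR4
     * or everything named in cr4_smep_smap_mask is set; otherwise the
     * BUG in cr4_smep_smap_restore triggers. */
    ASSERT(!(read_cr4() & (X86_CR4_SMEP | X86_CR4_SMAP)) ||
           (read_cr4() & cr4_smep_smap_mask) == cr4_smep_smap_mask);

So while CR4.SMAP is intentionally kept clear for Dom0 construction, the
mask has to drop that bit as well, and regain it afterwards.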

Jan



* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09  8:09       ` Wu, Feng
@ 2016-03-09 14:09         ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-09 14:09 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 09.03.16 at 09:09, <feng.wu@intel.com> wrote:
>> >> +/* This mustn't modify registers other than %rax. */
>> >> +ENTRY(cr4_smep_smap_restore)
>> >> +        mov   %cr4, %rax
>> >> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
>> >> +        jnz   0f
> 
> If we clear every place where we are back to 32bit pv guest,
> X86_CR4_SMEP and X86_CR4_SMAP bit should be clear
> in CR4, right?  If that is the case, we cannot jump to 0f.

I think Andrew's reply to (I think) a later mail of yours already
answered this, but just in case: We unconditionally come here
on paths that _may_ be used when entering Xen out of 32-bit
PV guest context. I.e. we do not know which state the two
flags are in.
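
For reference, the helper's logic as a rough C sketch (the real routine is
assembly, reads %cr4 directly, and must preserve all registers other than
%rax; the _c suffix here just marks it as an illustration):

    static unsigned long cr4_smep_smap_restore_c(void)
    {
        unsigned long cr4 = read_cr4();

        /* At least one bit set: nothing cleared them (or an NMI/#MC raced
         * the clearing in compat_restore_all_guest) - report that back. */
        if ( cr4 & (X86_CR4_SMEP | X86_CR4_SMAP) )
            return 0;

        /* Both clear: we interrupted 32-bit PV guest context - re-enable
         * the features for the duration of hypervisor execution. */
        cr4 |= cr4_smep_smap_mask;
        write_cr4(cr4);

        return cr4; /* non-zero: a restore actually took place */
    }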

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
  2016-03-09 11:19       ` Andrew Cooper
@ 2016-03-09 14:28         ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-09 14:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser, Feng Wu

>>> On 09.03.16 at 12:19, <andrew.cooper3@citrix.com> wrote:
> On 08/03/16 07:57, Jan Beulich wrote:
>>>> @@ -174,10 +174,43 @@ compat_bad_hypercall:
>>>>  /* %rbx: struct vcpu, interrupts disabled */
>>>>  ENTRY(compat_restore_all_guest)
>>>>          ASSERT_INTERRUPTS_DISABLED
>>>> +.Lcr4_orig:
>>>> +        ASM_NOP3 /* mov   %cr4, %rax */
>>>> +        ASM_NOP6 /* and   $..., %rax */
>>>> +        ASM_NOP3 /* mov   %rax, %cr4 */
>>>> +        .pushsection .altinstr_replacement, "ax"
>>>> +.Lcr4_alt:
>>>> +        mov   %cr4, %rax
>>>> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
>>>> +        mov   %rax, %cr4
>>>> +.Lcr4_alt_end:
>>>> +        .section .altinstructions, "a"
>>>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
>>>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>>>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
>>>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>>> These 12's look as if they should be (.Lcr4_alt - .Lcr4_orig).
>> Well, the NOPs that get put there make 12 (= 3 + 6 + 3) a
>> pretty obvious (shorter and hence more readable) option. But
>> yes, if you're of the strong opinion that we should use the
>> longer alternative, I can switch these around.
> 
> I have to admit that I prefer the Linux ALTERNATIVE macro for assembly,
> which takes care of the calculations like this.  It is slightly
> unfortunate that it generally requires its assembly blocks in strings,
> and is unsuitable for larger blocks.  Perhaps we can see about a
> variant in due course.

"In due course" to me means subsequently - is that the meaning you
imply here too?

But what's interesting about this suggestion: Their macro uses
.skip instead of .org, which means I should be able to replace the
ugly gas bug workaround by simply using .skip. I'll give that a try.

>>>> +        .pushsection .altinstr_replacement, "ax"
>>>> +.Lsmep_smap_alt:
>>>> +        mov   VCPU_domain(%rbx),%rax
>>>> +.Lsmep_smap_alt_end:
>>>> +        .section .altinstructions, "a"
>>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>>> +                             X86_FEATURE_SMEP, \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>>> +                             X86_FEATURE_SMAP, \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>>> +        .popsection
>>>> +
>>>> +        testb $3,UREGS_cs(%rsp)
>>>> +        jz    0f
>>>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>>> This comparison is wrong on hardware lacking SMEP and SMAP, as the "mov
>>> VCPU_domain(%rbx),%rax" won't have happened.
>> That mov indeed won't have happened, but the original instruction
>> is a branch past all of this code, so the above is correct (and I did
>> test on older hardware).
> 
> Oh, so it won't.  It is moderately subtle that this entire code block is
> logically contained in the alternative.
> 
> It would be far clearer, and work around your org bug, if this were a
> single alternative which patched the jump into a nop.

I specifically wanted to avoid needlessly patching in a NOP there,
given that going forward we expect the majority of systems to have
that patching done anyway.

> At the very least, a label of .Lcr3_pv32_fixup_done would be an
> improvement over 0.

Agreed (albeit I prefer it to be named .Lcr4_pv32_done).

>>>> +        je    0f
>>>> +        call  cr4_smep_smap_restore
>>>> +        /*
>>>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
>>>> +         * compat_restore_all_guest and it actually returning to guest
>>>> +         * context, in which case the guest would run with the two features
>>>> +         * enabled. The only bad that can happen from this is a kernel mode
>>>> +         * #PF which the guest doesn't expect. Rather than trying to make the
>>>> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
>>>> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
>>>> +         * back to the guest (which will in turn clear the two CR4 bits) to
>>>> +         * re-execute the instruction. If we get back here, the CR4 bits
>>>> +         * should then be found clear (unless another NMI/#MC occurred at
>>>> +         * exactly the right time), and we'll continue processing the
>>>> +         * exception as normal.
>>>> +         */
>>>> +        test  %rax,%rax
>>>> +        jnz   0f
>>>> +        mov   $PFEC_page_present,%al
>>>> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
>>>> +        jne   0f
>>>> +        xor   UREGS_error_code(%rsp),%eax
>>>> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
>>>> +        jz    compat_test_all_events
>>>> +0:      sti
>>> It's code like this which makes me even more certain that we have far too
>>> much code written in assembly which doesn't need to be.  Maybe not this
>>> specific sample, but it has taken me 15 minutes and a pad of paper to
>>> try and work out how this conditional works, and I am still not certain
>>> it's correct.  In particular, PFEC_prot_key looks like it could fool the
>>> test into believing a non-smap/smep fault was a smap/smep fault.
>> Not sure how you come to think of PFEC_prot_key here: That's
>> a bit which can be set only together with PFEC_user_mode, yet
>> we care about kernel mode faults only here.
> 
> I would not make that assumption.  Assumptions about the valid set of
> #PF flags are precisely the reason that older Linux falls into an
> infinite loop when encountering a SMAP pagefault, rather than a clean crash.

We have to make _some_ assumption here on the bits which
so far have no meaning. Whichever route we go, we can't
exclude that, for things to remain correct as new features get
added, we may need to adjust the mask. As to the example you
give for comparison - that's apples and oranges as long as Xen
isn't meant to run as a PV guest on some (other) hypervisor.
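
To spell it out, the condition the assembly implements, as a rough C
sketch (field names taken from the patch's own comments; any further
error code bit - PFEC_user_mode, and with it PFEC_prot_key, or
PFEC_reserved_bit - makes it false, so only a plain kernel mode,
present-page fault is sent back to the guest):

    static bool_t maybe_smep_smap_fault(const struct cpu_user_regs *regs)
    {
        return regs->entry_vector == TRAP_page_fault &&
               !((regs->error_code ^ PFEC_page_present) &
                 ~(PFEC_write_access | PFEC_insn_fetch));
    }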

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling
@ 2016-03-10  9:44 ` Jan Beulich
  2016-03-10  9:53   ` [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
                     ` (3 more replies)
  0 siblings, 4 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-10  9:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

As was previously explained[1], SMAP (and with less relevance also
SMEP) is not compatible with 32-bit PV guests which aren't aware/
prepared to be run with that feature enabled. Andrew's original
approach either sacrificed architectural correctness to make
32-bit guests work again, or disabled SMAP also for not
insignificant portions of hypervisor code, in both cases allowing the
workaround mode to be controlled via a command line option.

This alternative approach disables SMEP and SMAP only while
running 32-bit PV guest code plus a few hypervisor instructions
early after entering hypervisor context or later before leaving it.

The 3rd patch really is unrelated except for not applying cleanly
without the earlier ones, and the potential having been noticed
while putting together the 1st one.

1: suppress SMEP and SMAP while running 32-bit PV guest code
2: use optimal NOPs to fill the SMEP/SMAP placeholders
3: use 32-bit loads for 32-bit PV guest state reload

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Various changes to patches 1 and 2 - see there.

[1] http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg03888.html


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-10  9:44 ` [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
@ 2016-03-10  9:53   ` Jan Beulich
  2016-05-13 15:48     ` Andrew Cooper
  2016-03-10  9:54   ` [PATCH v2 2/3] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-10  9:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 12192 bytes --]

Since such guests' kernel code runs in ring 1, their memory accesses,
at the paging layer, are supervisor mode ones, and hence subject to
SMAP/SMEP checks. Such guests cannot be expected to be aware of those
two features though (and so far we also don't expose the respective
feature flags), and hence may suffer page faults they cannot deal with.
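
A much simplified sketch of the paging rule at work (hedged - the full
architectural conditions are in the SDM; implicit supervisor accesses,
for example, are blocked regardless of EFLAGS.AC):

    /* A 32-bit PV kernel runs at CPL 1, so all of its accesses count as
     * supervisor ones here, and its own user pages are, of course,
     * user-accessible - hence the unexpected faults. */
    static bool_t smap_blocks_access(unsigned int cpl, bool_t eflags_ac,
                                     bool_t page_user_accessible)
    {
        return cpl < 3 && page_user_accessible && !eflags_ac;
    }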

While the placement of the re-enabling slightly weakens the intended
protection, it was selected such that 64-bit paths would remain
unaffected where possible. At the expense of a further performance hit
the re-enabling could be put right next to the CLACs.

Note that this introduces a number of extra TLB flushes - CR4.SMEP
transitioning from 0 to 1 always causes a flush, and it transitioning
from 1 to 0 may also do.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Use more generic symbol/label names. Comment the BUG in assembly
    code and restrict it to debug builds. Add C equivalent to #PF
    re-execution condition in a comment. Use .skip instead of .org in
    handle_exception to avoid gas bug (and its slightly ugly
    workaround). Use a properly named label instead of a numeric one
    in handle_exception.

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
 static bool_t __initdata opt_smap = 1;
 boolean_param("smap", opt_smap);
 
+unsigned long __read_mostly cr4_pv32_mask;
+
 /* Boot dom0 in pvh mode */
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
@@ -1335,6 +1337,8 @@ void __init noreturn __start_xen(unsigne
     if ( cpu_has_smap )
         set_in_cr4(X86_CR4_SMAP);
 
+    cr4_pv32_mask = mmu_cr4_features & (X86_CR4_SMEP | X86_CR4_SMAP);
+
     if ( cpu_has_fsgsbase )
         set_in_cr4(X86_CR4_FSGSBASE);
 
@@ -1471,7 +1475,10 @@ void __init noreturn __start_xen(unsigne
      * copy_from_user().
      */
     if ( cpu_has_smap )
+    {
+        cr4_pv32_mask &= ~X86_CR4_SMAP;
         write_cr4(read_cr4() & ~X86_CR4_SMAP);
+    }
 
     printk("%sNX (Execute Disable) protection %sactive\n",
            cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
@@ -1488,7 +1495,10 @@ void __init noreturn __start_xen(unsigne
         panic("Could not set up DOM0 guest OS");
 
     if ( cpu_has_smap )
+    {
         write_cr4(read_cr4() | X86_CR4_SMAP);
+        cr4_pv32_mask |= X86_CR4_SMAP;
+    }
 
     /* Scrub RAM that is still free and so may go to an unprivileged domain. */
     scrub_heap_pages();
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -16,14 +16,16 @@ ENTRY(compat_hypercall)
         ASM_CLAC
         pushq $0
         SAVE_VOLATILE type=TRAP_syscall compat=1
+        CR4_PV32_RESTORE
 
         cmpb  $0,untrusted_msi(%rip)
 UNLIKELY_START(ne, msi_check)
         movl  $HYPERCALL_VECTOR,%edi
         call  check_for_unexpected_msi
-        LOAD_C_CLOBBERED
+        LOAD_C_CLOBBERED compat=1 ax=0
 UNLIKELY_END(msi_check)
 
+        movl  UREGS_rax(%rsp),%eax
         GET_CURRENT(%rbx)
 
         cmpl  $NR_hypercalls,%eax
@@ -33,7 +35,6 @@ UNLIKELY_END(msi_check)
         pushq UREGS_rbx(%rsp); pushq %rcx; pushq %rdx; pushq %rsi; pushq %rdi
         pushq UREGS_rbp+5*8(%rsp)
         leaq  compat_hypercall_args_table(%rip),%r10
-        movl  %eax,%eax
         movl  $6,%ecx
         subb  (%r10,%rax,1),%cl
         movq  %rsp,%rdi
@@ -48,7 +49,6 @@ UNLIKELY_END(msi_check)
 #define SHADOW_BYTES 16 /* Shadow EIP + shadow hypercall # */
 #else
         /* Relocate argument registers and zero-extend to 64 bits. */
-        movl  %eax,%eax              /* Hypercall #  */
         xchgl %ecx,%esi              /* Arg 2, Arg 4 */
         movl  %edx,%edx              /* Arg 3        */
         movl  %edi,%r8d              /* Arg 5        */
@@ -174,10 +174,46 @@ compat_bad_hypercall:
 /* %rbx: struct vcpu, interrupts disabled */
 ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
+.Lcr4_orig:
+        ASM_NOP3 /* mov   %cr4, %rax */
+        ASM_NOP6 /* and   $..., %rax */
+        ASM_NOP3 /* mov   %rax, %cr4 */
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_alt:
+        mov   %cr4, %rax
+        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
+        mov   %rax, %cr4
+.Lcr4_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        .popsection
         RESTORE_ALL adj=8 compat=1
 .Lft0:  iretq
         _ASM_PRE_EXTABLE(.Lft0, handle_exception)
 
+/* This mustn't modify registers other than %rax. */
+ENTRY(cr4_pv32_restore)
+        mov   %cr4, %rax
+        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
+        jnz   0f
+        or    cr4_pv32_mask(%rip), %rax
+        mov   %rax, %cr4
+        ret
+0:
+#ifndef NDEBUG
+        /* Check that _all_ of the bits intended to be set actually are. */
+        and   cr4_pv32_mask(%rip), %eax
+        cmp   cr4_pv32_mask(%rip), %eax
+        je    1f
+        BUG
+1:
+#endif
+        xor   %eax, %eax
+        ret
+
 /* %rdx: trap_bounce, %rbx: struct vcpu */
 ENTRY(compat_post_handle_exception)
         testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
@@ -190,6 +226,7 @@ ENTRY(compat_post_handle_exception)
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
         sti
+        CR4_PV32_RESTORE
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
         pushq %r11
@@ -225,6 +262,7 @@ UNLIKELY_END(compat_syscall_gpf)
         jmp   .Lcompat_bounce_exception
 
 ENTRY(compat_sysenter)
+        CR4_PV32_RESTORE
         movq  VCPU_trap_ctxt(%rbx),%rcx
         cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
         movzwl VCPU_sysenter_sel(%rbx),%eax
@@ -238,6 +276,7 @@ ENTRY(compat_sysenter)
         jmp   compat_test_all_events
 
 ENTRY(compat_int80_direct_trap)
+        CR4_PV32_RESTORE
         call  compat_create_bounce_frame
         jmp   compat_test_all_events
 
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
 
 ENTRY(common_interrupt)
         SAVE_ALL CLAC
+        CR4_PV32_RESTORE
         movq %rsp,%rdi
         callq do_IRQ
         jmp ret_from_intr
@@ -454,13 +455,67 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 handle_exception_saved:
+        GET_CURRENT(%rbx)
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
+
+.Lcr4_pv32_orig:
+        jmp   .Lcr4_pv32_done
+        .skip (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt) - (. - .Lcr4_pv32_orig), 0xcc
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_pv32_alt:
+        mov   VCPU_domain(%rbx),%rax
+.Lcr4_pv32_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
+                             X86_FEATURE_SMEP, \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
+        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
+                             X86_FEATURE_SMAP, \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
+        .popsection
+
+        testb $3,UREGS_cs(%rsp)
+        jz    .Lcr4_pv32_done
+        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
+        je    .Lcr4_pv32_done
+        call  cr4_pv32_restore
+        /*
+         * An NMI or #MC may occur between clearing CR4.SMEP / CR4.SMAP in
+         * compat_restore_all_guest and it actually returning to guest
+         * context, in which case the guest would run with the two features
+         * enabled. The only bad that can happen from this is a kernel mode
+         * #PF which the guest doesn't expect. Rather than trying to make the
+         * NMI/#MC exit path honor the intended CR4 setting, simply check
+         * whether the wrong CR4 was in use when the #PF occurred, and exit
+         * back to the guest (which will in turn clear the two CR4 bits) to
+         * re-execute the instruction. If we get back here, the CR4 bits
+         * should then be found clear (unless another NMI/#MC occurred at
+         * exactly the right time), and we'll continue processing the
+         * exception as normal.
+         */
+        test  %rax,%rax
+        jnz   .Lcr4_pv32_done
+        /*
+         * The below effectively is
+         * if ( regs->entry_vector == TRAP_page_fault &&
+         *      (regs->error_code & PFEC_page_present) &&
+         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
+         *     goto compat_test_all_events;
+         */
+        mov   $PFEC_page_present,%al
+        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
+        jne   .Lcr4_pv32_done
+        xor   UREGS_error_code(%rsp),%eax
+        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
+        jz    compat_test_all_events
+.Lcr4_pv32_done:
         sti
 1:      movq  %rsp,%rdi
         movzbl UREGS_entry_vector(%rsp),%eax
         leaq  exception_table(%rip),%rdx
-        GET_CURRENT(%rbx)
         PERFC_INCR(exceptions, %rax, %rbx)
         callq *(%rdx,%rax,8)
         testb $3,UREGS_cs(%rsp)
@@ -592,6 +647,7 @@ handle_ist_exception:
         SAVE_ALL CLAC
         testb $3,UREGS_cs(%rsp)
         jz    1f
+        CR4_PV32_RESTORE
         /* Interrupted guest context. Copy the context to stack bottom. */
         GET_CPUINFO_FIELD(guest_cpu_user_regs,%rdi)
         movq  %rsp,%rsi
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -209,6 +209,16 @@ void ret_from_intr(void);
 
 #define ASM_STAC ASM_AC(STAC)
 #define ASM_CLAC ASM_AC(CLAC)
+
+#define CR4_PV32_RESTORE                                           \
+        667: ASM_NOP5;                                             \
+        .pushsection .altinstr_replacement, "ax";                  \
+        668: call cr4_pv32_restore;                                \
+        .section .altinstructions, "a";                            \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;   \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;   \
+        .popsection
+
 #else
 static always_inline void clac(void)
 {
@@ -308,14 +318,18 @@ static always_inline void stac(void)
  *
  * For the way it is used in RESTORE_ALL, this macro must preserve EFLAGS.ZF.
  */
-.macro LOAD_C_CLOBBERED compat=0
+.macro LOAD_C_CLOBBERED compat=0 ax=1
 .if !\compat
         movq  UREGS_r11(%rsp),%r11
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.endif
+.if \ax
         movq  UREGS_rax(%rsp),%rax
+.endif
+.elseif \ax
+        movl  UREGS_rax(%rsp),%eax
+.endif
         movq  UREGS_rcx(%rsp),%rcx
         movq  UREGS_rdx(%rsp),%rdx
         movq  UREGS_rsi(%rsp),%rsi
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -134,12 +134,12 @@
 #define TF_kernel_mode         (1<<_TF_kernel_mode)
 
 /* #PF error code values. */
-#define PFEC_page_present   (1U<<0)
-#define PFEC_write_access   (1U<<1)
-#define PFEC_user_mode      (1U<<2)
-#define PFEC_reserved_bit   (1U<<3)
-#define PFEC_insn_fetch     (1U<<4)
-#define PFEC_prot_key       (1U<<5)
+#define PFEC_page_present   (_AC(1,U) << 0)
+#define PFEC_write_access   (_AC(1,U) << 1)
+#define PFEC_user_mode      (_AC(1,U) << 2)
+#define PFEC_reserved_bit   (_AC(1,U) << 3)
+#define PFEC_insn_fetch     (_AC(1,U) << 4)
+#define PFEC_prot_key       (_AC(1,U) << 5)
 /* Internally used only flags. */
 #define PFEC_page_paged     (1U<<16)
 #define PFEC_page_shared    (1U<<17)



[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 2/3] x86: use optimal NOPs to fill the SMEP/SMAP placeholders
  2016-03-10  9:44 ` [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
  2016-03-10  9:53   ` [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
@ 2016-03-10  9:54   ` Jan Beulich
  2016-05-13 15:49     ` Andrew Cooper
  2016-03-10  9:55   ` [PATCH v2 3/3] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
       [not found]   ` <56E9A0DB02000078000DD54C@prv-mh.provo.novell.com>
  3 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-10  9:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 2305 bytes --]

Alternatives patching code picks the most suitable NOPs for the
running system, so simply use it to replace the pre-populated ones.

Use an arbitrary, always available feature to key off from, but
hide this behind the new X86_FEATURE_ALWAYS.
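
The effect, as a rough C sketch (not the patcher's actual code; add_nops()
is the existing helper emitting the NOP forms best suited to the running
CPU, and MAX_PATCH_LEN is just an assumed local bound):

    /*
     * A replacement of length zero keyed off an always-present feature
     * means: copy nothing, then pad the whole original site - i.e. the
     * hand-written ASM_NOPn filler gets rewritten with ideal NOPs even
     * on hardware lacking SMEP/SMAP.
     */
    static void fill_site_with_ideal_nops(uint8_t *site, unsigned int len)
    {
        uint8_t buf[MAX_PATCH_LEN];

        add_nops(buf, len);      /* CPU-appropriate NOP sequences */
        memcpy(site, buf, len);  /* text patching details omitted */
    }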

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Introduce and use X86_FEATURE_ALWAYS.

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -185,6 +185,7 @@ ENTRY(compat_restore_all_guest)
         mov   %rax, %cr4
 .Lcr4_alt_end:
         .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_orig, X86_FEATURE_ALWAYS, 12, 0
         altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
                              (.Lcr4_alt_end - .Lcr4_alt)
         altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -204,6 +204,7 @@ void ret_from_intr(void);
         662: __ASM_##op;                                               \
         .popsection;                                                   \
         .pushsection .altinstructions, "a";                            \
+        altinstruction_entry 661b, 661b, X86_FEATURE_ALWAYS, 3, 0;     \
         altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3;       \
         .popsection
 
@@ -215,6 +216,7 @@ void ret_from_intr(void);
         .pushsection .altinstr_replacement, "ax";                  \
         668: call cr4_pv32_restore;                                \
         .section .altinstructions, "a";                            \
+        altinstruction_entry 667b, 667b, X86_FEATURE_ALWAYS, 5, 0; \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;   \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;   \
         .popsection
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -162,6 +162,9 @@
 #define cpufeat_bit(idx)	((idx) % 32)
 #define cpufeat_mask(idx)	(_AC(1, U) << cpufeat_bit(idx))
 
+/* An alias of a feature we know is always going to be present. */
+#define X86_FEATURE_ALWAYS      X86_FEATURE_LM
+
 #if !defined(__ASSEMBLY__) && !defined(X86_FEATURES_ONLY)
 #include <xen/bitops.h>
 




[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 3/3] x86: use 32-bit loads for 32-bit PV guest state reload
  2016-03-10  9:44 ` [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
  2016-03-10  9:53   ` [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
  2016-03-10  9:54   ` [PATCH v2 2/3] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
@ 2016-03-10  9:55   ` Jan Beulich
       [not found]   ` <56E9A0DB02000078000DD54C@prv-mh.provo.novell.com>
  3 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-10  9:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

This is slightly more efficient than loading 64-bit quantities.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -313,6 +313,13 @@ static always_inline void stac(void)
 987:
 .endm
 
+#define LOAD_ONE_REG(reg, compat) \
+.if !(compat); \
+        movq  UREGS_r##reg(%rsp),%r##reg; \
+.else; \
+        movl  UREGS_r##reg(%rsp),%e##reg; \
+.endif
+
 /*
  * Reload registers not preserved by C code from frame.
  *
@@ -326,16 +333,14 @@ static always_inline void stac(void)
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.if \ax
-        movq  UREGS_rax(%rsp),%rax
 .endif
-.elseif \ax
-        movl  UREGS_rax(%rsp),%eax
+.if \ax
+        LOAD_ONE_REG(ax, \compat)
 .endif
-        movq  UREGS_rcx(%rsp),%rcx
-        movq  UREGS_rdx(%rsp),%rdx
-        movq  UREGS_rsi(%rsp),%rsi
-        movq  UREGS_rdi(%rsp),%rdi
+        LOAD_ONE_REG(cx, \compat)
+        LOAD_ONE_REG(dx, \compat)
+        LOAD_ONE_REG(si, \compat)
+        LOAD_ONE_REG(di, \compat)
 .endm
 
 /*
@@ -372,8 +377,9 @@ static always_inline void stac(void)
         .subsection 0
 #endif
 .endif
-987:    movq  UREGS_rbp(%rsp),%rbp
-        movq  UREGS_rbx(%rsp),%rbx
+987:
+        LOAD_ONE_REG(bp, \compat)
+        LOAD_ONE_REG(bx, \compat)
         subq  $-(UREGS_error_code-UREGS_r15+\adj), %rsp
 .endm
 




[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling
       [not found]   ` <56E9A0DB02000078000DD54C@prv-mh.provo.novell.com>
@ 2016-03-17  7:50     ` Jan Beulich
  2016-03-17  8:02       ` [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info Jan Beulich
                         ` (7 more replies)
  0 siblings, 8 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-17  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

As has been explained previously[1], SMAP (and with less relevance
also SMEP) is not compatible with 32-bit PV guests which aren't
aware/prepared to be run with that feature enabled. Andrew's
original approach either sacrificed architectural correctness for
making 32-bit guests work again, or disabled SMAP also for not
insignificant portions of hypervisor code, both by allowing to control
the workaround mode via command line option.

This alternative approach disables SMEP and SMAP only while
running 32-bit PV guest code plus a few hypervisor instructions
early after entering hypervisor context from or late before exiting
to such guests. Those few instructions (in assembly source) are of
course much easier to prove not to perform undue memory
accesses than code paths reaching deep into C sources.
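
As a rough C sketch of the scheme (not the actual implementation, which
is the assembly in patch 2, and simplified - the exit side additionally
checks which mode it is about to return to):

    /* Before exiting to a 32-bit PV guest: drop the two bits. */
    static void cr4_pv32_drop(void)
    {
        write_cr4(read_cr4() & ~(X86_CR4_SMEP | X86_CR4_SMAP));
    }

    /* On any entry path that may interrupt such a guest: add them back
     * (cr4_pv32_mask holds whichever of the two bits are in use). */
    static void cr4_pv32_add_back(void)
    {
        unsigned long cr4 = read_cr4();

        if ( !(cr4 & (X86_CR4_SMEP | X86_CR4_SMAP)) )
            write_cr4(cr4 | cr4_pv32_mask);
    }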

The 4th patch really is unrelated except for not applying cleanly
without the earlier ones, and the potential having been noticed
while putting together the 2nd one.

1: move cached CR4 value to struct cpu_info
2: suppress SMEP and SMAP while running 32-bit PV guest code
3: use optimal NOPs to fill the SMEP/SMAP placeholders
4: use 32-bit loads for 32-bit PV guest state reload

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New patch 1, as a prereq to changes in patch 2 (previously
     1). The primary reason for this are performance issues that
     have been found by Andrew with the previous version.
v2: Various changes to patches 1 and 2 - see there.

[1] http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg03888.html


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
@ 2016-03-17  8:02       ` Jan Beulich
  2016-03-17 16:20         ` Andrew Cooper
  2016-03-17  8:03       ` [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
                         ` (6 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-17  8:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

This not only eases using the cached value in assembly code, but also
improves the generated code resulting from such reads in C.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -64,7 +64,6 @@
 #include <asm/psr.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
-DEFINE_PER_CPU(unsigned long, cr4);
 
 static void default_idle(void);
 void (*pm_idle) (void) __read_mostly = default_idle;
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -645,7 +645,7 @@ void __init noreturn __start_xen(unsigne
     parse_video_info();
 
     rdmsrl(MSR_EFER, this_cpu(efer));
-    asm volatile ( "mov %%cr4,%0" : "=r" (this_cpu(cr4)) );
+    asm volatile ( "mov %%cr4,%0" : "=r" (get_cpu_info()->cr4) );
 
     /* We initialise the serial devices very early so we can get debugging. */
     ns16550.io_base = 0x3f8;
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -41,8 +41,8 @@ struct cpu_info {
     unsigned int processor_id;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
+    unsigned long cr4;
     /* get_stack_bottom() must be 16-byte aligned */
-    unsigned long __pad_for_stack_bottom;
 };
 
 static inline struct cpu_info *get_cpu_info(void)
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -328,8 +328,6 @@ static inline unsigned long read_cr2(voi
     return cr2;
 }
 
-DECLARE_PER_CPU(unsigned long, cr4);
-
 static inline void raw_write_cr4(unsigned long val)
 {
     asm volatile ( "mov %0,%%cr4" : : "r" (val) );
@@ -337,12 +335,12 @@ static inline void raw_write_cr4(unsigne
 
 static inline unsigned long read_cr4(void)
 {
-    return this_cpu(cr4);
+    return get_cpu_info()->cr4;
 }
 
 static inline void write_cr4(unsigned long val)
 {
-    this_cpu(cr4) = val;
+    get_cpu_info()->cr4 = val;
     raw_write_cr4(val);
 }
 




[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
  2016-03-17  8:02       ` [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info Jan Beulich
@ 2016-03-17  8:03       ` Jan Beulich
  2016-03-25 18:01         ` Konrad Rzeszutek Wilk
  2016-05-13 15:58         ` Andrew Cooper
  2016-03-17  8:03       ` [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
                         ` (5 subsequent siblings)
  7 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-17  8:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 13444 bytes --]

Since such guests' kernel code runs in ring 1, their memory accesses,
at the paging layer, are supervisor mode ones, and hence subject to
SMAP/SMEP checks. Such guests cannot be expected to be aware of those
two features though (and so far we also don't expose the respective
feature flags), and hence may suffer page faults they cannot deal with.

While the placement of the re-enabling slightly weakens the intended
protection, it was selected such that 64-bit paths would remain
unaffected where possible. At the expense of a further performance hit
the re-enabling could be put right next to the CLACs.

Note that this introduces a number of extra TLB flushes - CR4.SMEP
transitioning from 0 to 1 always causes a flush, and it transitioning
from 1 to 0 may also do.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Move up CR4_PV32_RESTORE in handle_ist_exception. Don't clear SMEP/
    SMAP upon exiting to guest user mode. Read current CR4 value from
    memory cache even in assembly code.
v2: Use more generic symbol/label names. Comment the BUG in assembly
    code and restrict it to debug builds. Add C equivalent to #PF
    re-execution condition in a comment. Use .skip instead of .org in
    handle_exception to avoid gas bug (and its slightly ugly
    workaround). Use a properly named label instead of a numeric one
    in handle_exception.

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
 static bool_t __initdata opt_smap = 1;
 boolean_param("smap", opt_smap);
 
+unsigned long __read_mostly cr4_pv32_mask;
+
 /* Boot dom0 in pvh mode */
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
@@ -1364,6 +1366,8 @@ void __init noreturn __start_xen(unsigne
     if ( cpu_has_smap )
         set_in_cr4(X86_CR4_SMAP);
 
+    cr4_pv32_mask = mmu_cr4_features & (X86_CR4_SMEP | X86_CR4_SMAP);
+
     if ( cpu_has_fsgsbase )
         set_in_cr4(X86_CR4_FSGSBASE);
 
@@ -1500,7 +1504,10 @@ void __init noreturn __start_xen(unsigne
      * copy_from_user().
      */
     if ( cpu_has_smap )
+    {
+        cr4_pv32_mask &= ~X86_CR4_SMAP;
         write_cr4(read_cr4() & ~X86_CR4_SMAP);
+    }
 
     printk("%sNX (Execute Disable) protection %sactive\n",
            cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
@@ -1517,7 +1524,10 @@ void __init noreturn __start_xen(unsigne
         panic("Could not set up DOM0 guest OS");
 
     if ( cpu_has_smap )
+    {
         write_cr4(read_cr4() | X86_CR4_SMAP);
+        cr4_pv32_mask |= X86_CR4_SMAP;
+    }
 
     /* Scrub RAM that is still free and so may go to an unprivileged domain. */
     scrub_heap_pages();
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -135,6 +135,7 @@ void __dummy__(void)
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
+    OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
     BLANK();
 
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -16,14 +16,16 @@ ENTRY(compat_hypercall)
         ASM_CLAC
         pushq $0
         SAVE_VOLATILE type=TRAP_syscall compat=1
+        CR4_PV32_RESTORE
 
         cmpb  $0,untrusted_msi(%rip)
 UNLIKELY_START(ne, msi_check)
         movl  $HYPERCALL_VECTOR,%edi
         call  check_for_unexpected_msi
-        LOAD_C_CLOBBERED
+        LOAD_C_CLOBBERED compat=1 ax=0
 UNLIKELY_END(msi_check)
 
+        movl  UREGS_rax(%rsp),%eax
         GET_CURRENT(%rbx)
 
         cmpl  $NR_hypercalls,%eax
@@ -33,7 +35,6 @@ UNLIKELY_END(msi_check)
         pushq UREGS_rbx(%rsp); pushq %rcx; pushq %rdx; pushq %rsi; pushq %rdi
         pushq UREGS_rbp+5*8(%rsp)
         leaq  compat_hypercall_args_table(%rip),%r10
-        movl  %eax,%eax
         movl  $6,%ecx
         subb  (%r10,%rax,1),%cl
         movq  %rsp,%rdi
@@ -48,7 +49,6 @@ UNLIKELY_END(msi_check)
 #define SHADOW_BYTES 16 /* Shadow EIP + shadow hypercall # */
 #else
         /* Relocate argument registers and zero-extend to 64 bits. */
-        movl  %eax,%eax              /* Hypercall #  */
         xchgl %ecx,%esi              /* Arg 2, Arg 4 */
         movl  %edx,%edx              /* Arg 3        */
         movl  %edi,%r8d              /* Arg 5        */
@@ -174,10 +174,61 @@ compat_bad_hypercall:
 /* %rbx: struct vcpu, interrupts disabled */
 ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
+.Lcr4_orig:
+        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
+        ASM_NOP2 /* jpe   .Lcr4_alt_end */
+        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
+        ASM_NOP6 /* and   $..., %rax */
+        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
+        ASM_NOP3 /* mov   %rax, %cr4 */
+.Lcr4_orig_end:
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_alt:
+        testb $3,UREGS_cs(%rsp)
+        jpe   .Lcr4_alt_end
+        mov   CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
+        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
+        mov   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
+        mov   %rax, %cr4
+.Lcr4_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
+                             (.Lcr4_orig_end - .Lcr4_orig), \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, \
+                             (.Lcr4_orig_end - .Lcr4_orig), \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        .popsection
         RESTORE_ALL adj=8 compat=1
 .Lft0:  iretq
         _ASM_PRE_EXTABLE(.Lft0, handle_exception)
 
+/* This mustn't modify registers other than %rax. */
+ENTRY(cr4_pv32_restore)
+        push  %rdx
+        GET_CPUINFO_FIELD(cr4, %rdx)
+        mov   (%rdx), %rax
+        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
+        jnz   0f
+        or    cr4_pv32_mask(%rip), %rax
+        mov   %rax, %cr4
+        mov   %rax, (%rdx)
+        pop   %rdx
+        ret
+0:
+#ifndef NDEBUG
+        /* Check that _all_ of the bits intended to be set actually are. */
+        mov   %cr4, %rax
+        and   cr4_pv32_mask(%rip), %eax
+        cmp   cr4_pv32_mask(%rip), %eax
+        je    1f
+        BUG
+1:
+#endif
+        pop   %rdx
+        xor   %eax, %eax
+        ret
+
 /* %rdx: trap_bounce, %rbx: struct vcpu */
 ENTRY(compat_post_handle_exception)
         testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
@@ -190,6 +241,7 @@ ENTRY(compat_post_handle_exception)
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
         sti
+        CR4_PV32_RESTORE
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
         pushq %r11
@@ -225,6 +277,7 @@ UNLIKELY_END(compat_syscall_gpf)
         jmp   .Lcompat_bounce_exception
 
 ENTRY(compat_sysenter)
+        CR4_PV32_RESTORE
         movq  VCPU_trap_ctxt(%rbx),%rcx
         cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
         movzwl VCPU_sysenter_sel(%rbx),%eax
@@ -238,6 +291,7 @@ ENTRY(compat_sysenter)
         jmp   compat_test_all_events
 
 ENTRY(compat_int80_direct_trap)
+        CR4_PV32_RESTORE
         call  compat_create_bounce_frame
         jmp   compat_test_all_events
 
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
 
 ENTRY(common_interrupt)
         SAVE_ALL CLAC
+        CR4_PV32_RESTORE
         movq %rsp,%rdi
         callq do_IRQ
         jmp ret_from_intr
@@ -454,13 +455,67 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 handle_exception_saved:
+        GET_CURRENT(%rbx)
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
+
+.Lcr4_pv32_orig:
+        jmp   .Lcr4_pv32_done
+        .skip (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt) - (. - .Lcr4_pv32_orig), 0xcc
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_pv32_alt:
+        mov   VCPU_domain(%rbx),%rax
+.Lcr4_pv32_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
+                             X86_FEATURE_SMEP, \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
+        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
+                             X86_FEATURE_SMAP, \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
+        .popsection
+
+        testb $3,UREGS_cs(%rsp)
+        jz    .Lcr4_pv32_done
+        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
+        je    .Lcr4_pv32_done
+        call  cr4_pv32_restore
+        /*
+         * An NMI or #MC may occur between clearing CR4.SMEP / CR4.SMAP in
+         * compat_restore_all_guest and it actually returning to guest
+         * context, in which case the guest would run with the two features
+         * enabled. The only bad that can happen from this is a kernel mode
+         * #PF which the guest doesn't expect. Rather than trying to make the
+         * NMI/#MC exit path honor the intended CR4 setting, simply check
+         * whether the wrong CR4 was in use when the #PF occurred, and exit
+         * back to the guest (which will in turn clear the two CR4 bits) to
+         * re-execute the instruction. If we get back here, the CR4 bits
+         * should then be found clear (unless another NMI/#MC occurred at
+         * exactly the right time), and we'll continue processing the
+         * exception as normal.
+         */
+        test  %rax,%rax
+        jnz   .Lcr4_pv32_done
+        /*
+         * The below effectively is
+         * if ( regs->entry_vector == TRAP_page_fault &&
+         *      (regs->error_code & PFEC_page_present) &&
+         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
+         *     goto compat_test_all_events;
+         */
+        mov   $PFEC_page_present,%al
+        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
+        jne   .Lcr4_pv32_done
+        xor   UREGS_error_code(%rsp),%eax
+        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
+        jz    compat_test_all_events
+.Lcr4_pv32_done:
         sti
 1:      movq  %rsp,%rdi
         movzbl UREGS_entry_vector(%rsp),%eax
         leaq  exception_table(%rip),%rdx
-        GET_CURRENT(%rbx)
         PERFC_INCR(exceptions, %rax, %rbx)
         callq *(%rdx,%rax,8)
         testb $3,UREGS_cs(%rsp)
@@ -590,6 +645,7 @@ ENTRY(nmi)
         movl  $TRAP_nmi,4(%rsp)
 handle_ist_exception:
         SAVE_ALL CLAC
+        CR4_PV32_RESTORE
         testb $3,UREGS_cs(%rsp)
         jz    1f
         /* Interrupted guest context. Copy the context to stack bottom. */
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -209,6 +209,16 @@ void ret_from_intr(void);
 
 #define ASM_STAC ASM_AC(STAC)
 #define ASM_CLAC ASM_AC(CLAC)
+
+#define CR4_PV32_RESTORE                                           \
+        667: ASM_NOP5;                                             \
+        .pushsection .altinstr_replacement, "ax";                  \
+        668: call cr4_pv32_restore;                                \
+        .section .altinstructions, "a";                            \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;   \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;   \
+        .popsection
+
 #else
 static always_inline void clac(void)
 {
@@ -308,14 +318,18 @@ static always_inline void stac(void)
  *
  * For the way it is used in RESTORE_ALL, this macro must preserve EFLAGS.ZF.
  */
-.macro LOAD_C_CLOBBERED compat=0
+.macro LOAD_C_CLOBBERED compat=0 ax=1
 .if !\compat
         movq  UREGS_r11(%rsp),%r11
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.endif
+.if \ax
         movq  UREGS_rax(%rsp),%rax
+.endif
+.elseif \ax
+        movl  UREGS_rax(%rsp),%eax
+.endif
         movq  UREGS_rcx(%rsp),%rcx
         movq  UREGS_rdx(%rsp),%rdx
         movq  UREGS_rsi(%rsp),%rsi
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -134,12 +134,12 @@
 #define TF_kernel_mode         (1<<_TF_kernel_mode)
 
 /* #PF error code values. */
-#define PFEC_page_present   (1U<<0)
-#define PFEC_write_access   (1U<<1)
-#define PFEC_user_mode      (1U<<2)
-#define PFEC_reserved_bit   (1U<<3)
-#define PFEC_insn_fetch     (1U<<4)
-#define PFEC_prot_key       (1U<<5)
+#define PFEC_page_present   (_AC(1,U) << 0)
+#define PFEC_write_access   (_AC(1,U) << 1)
+#define PFEC_user_mode      (_AC(1,U) << 2)
+#define PFEC_reserved_bit   (_AC(1,U) << 3)
+#define PFEC_insn_fetch     (_AC(1,U) << 4)
+#define PFEC_prot_key       (_AC(1,U) << 5)
 /* Internally used only flags. */
 #define PFEC_page_paged     (1U<<16)
 #define PFEC_page_shared    (1U<<17)



[-- Attachment #2: x86-32on64-suppress-SMEP-SMAP.patch --]
[-- Type: text/plain, Size: 13506 bytes --]

x86: suppress SMEP and SMAP while running 32-bit PV guest code

Since such guests' kernel code runs in ring 1, their memory accesses,
at the paging layer, are supervisor mode ones, and hence subject to
SMAP/SMEP checks. Such guests cannot be expected to be aware of those
two features though (and so far we also don't expose the respective
feature flags), and hence may suffer page faults they cannot deal with.

While the placement of the re-enabling slightly weakens the intended
protection, it was selected such that 64-bit paths would remain
unaffected where possible. At the expense of a further performance hit
the re-enabling could be put right next to the CLACs.

Note that this introduces a number of extra TLB flushes - CR4.SMEP
transitioning from 0 to 1 always causes a flush, and it transitioning
from 1 to 0 may also do.
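
In rough C terms the exit side amounts to the following (a sketch only,
using the cached CR4 field and the cr4_pv32_mask introduced below; the
authoritative logic is the assembly added to compat_restore_all_guest):

    /* Sketch, not hypervisor code: drop SMEP/SMAP before resuming a
     * 32-bit PV kernel, whose ring-1 accesses are supervisor ones.  */
    static void leave_to_pv32_kernel(struct cpu_info *ci)
    {
        ci->cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
        write_cr4(ci->cr4);
    }

The entry paths then undo this via CR4_PV32_RESTORE / cr4_pv32_restore,
which OR cr4_pv32_mask back in when the two bits are found to be clear
in the cached value.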

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Move up CR4_PV32_RESTORE in handle_ist_exception. Don't clear SMEP/
    SMAP upon exiting to guest user mode. Read current CR4 value from
    memory cache even in assembly code.
v2: Use more generic symbol/label names. Comment the BUG in assembly
    code and restrict it to debug builds. Add C equivalent to #PF
    re-execution condition in a comment. Use .skip instead of .org in
    handle_exception to avoid gas bug (and its slightly ugly
    workaround). Use a properly named label instead of a numeric one
    in handle_exception.

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
 static bool_t __initdata opt_smap = 1;
 boolean_param("smap", opt_smap);
 
+unsigned long __read_mostly cr4_pv32_mask;
+
 /* Boot dom0 in pvh mode */
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
@@ -1364,6 +1366,8 @@ void __init noreturn __start_xen(unsigne
     if ( cpu_has_smap )
         set_in_cr4(X86_CR4_SMAP);
 
+    cr4_pv32_mask = mmu_cr4_features & (X86_CR4_SMEP | X86_CR4_SMAP);
+
     if ( cpu_has_fsgsbase )
         set_in_cr4(X86_CR4_FSGSBASE);
 
@@ -1500,7 +1504,10 @@ void __init noreturn __start_xen(unsigne
      * copy_from_user().
      */
     if ( cpu_has_smap )
+    {
+        cr4_pv32_mask &= ~X86_CR4_SMAP;
         write_cr4(read_cr4() & ~X86_CR4_SMAP);
+    }
 
     printk("%sNX (Execute Disable) protection %sactive\n",
            cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
@@ -1517,7 +1524,10 @@ void __init noreturn __start_xen(unsigne
         panic("Could not set up DOM0 guest OS");
 
     if ( cpu_has_smap )
+    {
         write_cr4(read_cr4() | X86_CR4_SMAP);
+        cr4_pv32_mask |= X86_CR4_SMAP;
+    }
 
     /* Scrub RAM that is still free and so may go to an unprivileged domain. */
     scrub_heap_pages();
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -135,6 +135,7 @@ void __dummy__(void)
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
+    OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
     BLANK();
 
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -16,14 +16,16 @@ ENTRY(compat_hypercall)
         ASM_CLAC
         pushq $0
         SAVE_VOLATILE type=TRAP_syscall compat=1
+        CR4_PV32_RESTORE
 
         cmpb  $0,untrusted_msi(%rip)
 UNLIKELY_START(ne, msi_check)
         movl  $HYPERCALL_VECTOR,%edi
         call  check_for_unexpected_msi
-        LOAD_C_CLOBBERED
+        LOAD_C_CLOBBERED compat=1 ax=0
 UNLIKELY_END(msi_check)
 
+        movl  UREGS_rax(%rsp),%eax
         GET_CURRENT(%rbx)
 
         cmpl  $NR_hypercalls,%eax
@@ -33,7 +35,6 @@ UNLIKELY_END(msi_check)
         pushq UREGS_rbx(%rsp); pushq %rcx; pushq %rdx; pushq %rsi; pushq %rdi
         pushq UREGS_rbp+5*8(%rsp)
         leaq  compat_hypercall_args_table(%rip),%r10
-        movl  %eax,%eax
         movl  $6,%ecx
         subb  (%r10,%rax,1),%cl
         movq  %rsp,%rdi
@@ -48,7 +49,6 @@ UNLIKELY_END(msi_check)
 #define SHADOW_BYTES 16 /* Shadow EIP + shadow hypercall # */
 #else
         /* Relocate argument registers and zero-extend to 64 bits. */
-        movl  %eax,%eax              /* Hypercall #  */
         xchgl %ecx,%esi              /* Arg 2, Arg 4 */
         movl  %edx,%edx              /* Arg 3        */
         movl  %edi,%r8d              /* Arg 5        */
@@ -174,10 +174,61 @@ compat_bad_hypercall:
 /* %rbx: struct vcpu, interrupts disabled */
 ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
+.Lcr4_orig:
+        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
+        ASM_NOP2 /* jpe   .Lcr4_alt_end */
+        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
+        ASM_NOP6 /* and   $..., %rax */
+        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
+        ASM_NOP3 /* mov   %rax, %cr4 */
+.Lcr4_orig_end:
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_alt:
+        testb $3,UREGS_cs(%rsp)
+        jpe   .Lcr4_alt_end
+        mov   CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
+        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
+        mov   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
+        mov   %rax, %cr4
+.Lcr4_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
+                             (.Lcr4_orig_end - .Lcr4_orig), \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, \
+                             (.Lcr4_orig_end - .Lcr4_orig), \
+                             (.Lcr4_alt_end - .Lcr4_alt)
+        .popsection
         RESTORE_ALL adj=8 compat=1
 .Lft0:  iretq
         _ASM_PRE_EXTABLE(.Lft0, handle_exception)
 
+/* This mustn't modify registers other than %rax. */
+ENTRY(cr4_pv32_restore)
+        push  %rdx
+        GET_CPUINFO_FIELD(cr4, %rdx)
+        mov   (%rdx), %rax
+        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
+        jnz   0f
+        or    cr4_pv32_mask(%rip), %rax
+        mov   %rax, %cr4
+        mov   %rax, (%rdx)
+        pop   %rdx
+        ret
+0:
+#ifndef NDEBUG
+        /* Check that _all_ of the bits intended to be set actually are. */
+        mov   %cr4, %rax
+        and   cr4_pv32_mask(%rip), %eax
+        cmp   cr4_pv32_mask(%rip), %eax
+        je    1f
+        BUG
+1:
+#endif
+        pop   %rdx
+        xor   %eax, %eax
+        ret
+
 /* %rdx: trap_bounce, %rbx: struct vcpu */
 ENTRY(compat_post_handle_exception)
         testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
@@ -190,6 +241,7 @@ ENTRY(compat_post_handle_exception)
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
         sti
+        CR4_PV32_RESTORE
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
         pushq %r11
@@ -225,6 +277,7 @@ UNLIKELY_END(compat_syscall_gpf)
         jmp   .Lcompat_bounce_exception
 
 ENTRY(compat_sysenter)
+        CR4_PV32_RESTORE
         movq  VCPU_trap_ctxt(%rbx),%rcx
         cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
         movzwl VCPU_sysenter_sel(%rbx),%eax
@@ -238,6 +291,7 @@ ENTRY(compat_sysenter)
         jmp   compat_test_all_events
 
 ENTRY(compat_int80_direct_trap)
+        CR4_PV32_RESTORE
         call  compat_create_bounce_frame
         jmp   compat_test_all_events
 
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
 
 ENTRY(common_interrupt)
         SAVE_ALL CLAC
+        CR4_PV32_RESTORE
         movq %rsp,%rdi
         callq do_IRQ
         jmp ret_from_intr
@@ -454,13 +455,67 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 handle_exception_saved:
+        GET_CURRENT(%rbx)
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
+
+.Lcr4_pv32_orig:
+        jmp   .Lcr4_pv32_done
+        .skip (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt) - (. - .Lcr4_pv32_orig), 0xcc
+        .pushsection .altinstr_replacement, "ax"
+.Lcr4_pv32_alt:
+        mov   VCPU_domain(%rbx),%rax
+.Lcr4_pv32_alt_end:
+        .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
+                             X86_FEATURE_SMEP, \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
+        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
+                             X86_FEATURE_SMAP, \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
+                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
+        .popsection
+
+        testb $3,UREGS_cs(%rsp)
+        jz    .Lcr4_pv32_done
+        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
+        je    .Lcr4_pv32_done
+        call  cr4_pv32_restore
+        /*
+         * An NMI or #MC may occur between clearing CR4.SMEP / CR4.SMAP in
+         * compat_restore_all_guest and it actually returning to guest
+         * context, in which case the guest would run with the two features
+         * enabled. The only bad that can happen from this is a kernel mode
+         * #PF which the guest doesn't expect. Rather than trying to make the
+         * NMI/#MC exit path honor the intended CR4 setting, simply check
+         * whether the wrong CR4 was in use when the #PF occurred, and exit
+         * back to the guest (which will in turn clear the two CR4 bits) to
+         * re-execute the instruction. If we get back here, the CR4 bits
+         * should then be found clear (unless another NMI/#MC occurred at
+         * exactly the right time), and we'll continue processing the
+         * exception as normal.
+         */
+        test  %rax,%rax
+        jnz   .Lcr4_pv32_done
+        /*
+         * The below effectively is
+         * if ( regs->entry_vector == TRAP_page_fault &&
+         *      (regs->error_code & PFEC_page_present) &&
+         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
+         *     goto compat_test_all_events;
+         */
+        mov   $PFEC_page_present,%al
+        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
+        jne   .Lcr4_pv32_done
+        xor   UREGS_error_code(%rsp),%eax
+        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
+        jz    compat_test_all_events
+.Lcr4_pv32_done:
         sti
 1:      movq  %rsp,%rdi
         movzbl UREGS_entry_vector(%rsp),%eax
         leaq  exception_table(%rip),%rdx
-        GET_CURRENT(%rbx)
         PERFC_INCR(exceptions, %rax, %rbx)
         callq *(%rdx,%rax,8)
         testb $3,UREGS_cs(%rsp)
@@ -590,6 +645,7 @@ ENTRY(nmi)
         movl  $TRAP_nmi,4(%rsp)
 handle_ist_exception:
         SAVE_ALL CLAC
+        CR4_PV32_RESTORE
         testb $3,UREGS_cs(%rsp)
         jz    1f
         /* Interrupted guest context. Copy the context to stack bottom. */
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -209,6 +209,16 @@ void ret_from_intr(void);
 
 #define ASM_STAC ASM_AC(STAC)
 #define ASM_CLAC ASM_AC(CLAC)
+
+#define CR4_PV32_RESTORE                                           \
+        667: ASM_NOP5;                                             \
+        .pushsection .altinstr_replacement, "ax";                  \
+        668: call cr4_pv32_restore;                                \
+        .section .altinstructions, "a";                            \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;   \
+        altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;   \
+        .popsection
+
 #else
 static always_inline void clac(void)
 {
@@ -308,14 +318,18 @@ static always_inline void stac(void)
  *
  * For the way it is used in RESTORE_ALL, this macro must preserve EFLAGS.ZF.
  */
-.macro LOAD_C_CLOBBERED compat=0
+.macro LOAD_C_CLOBBERED compat=0 ax=1
 .if !\compat
         movq  UREGS_r11(%rsp),%r11
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.endif
+.if \ax
         movq  UREGS_rax(%rsp),%rax
+.endif
+.elseif \ax
+        movl  UREGS_rax(%rsp),%eax
+.endif
         movq  UREGS_rcx(%rsp),%rcx
         movq  UREGS_rdx(%rsp),%rdx
         movq  UREGS_rsi(%rsp),%rsi
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -134,12 +134,12 @@
 #define TF_kernel_mode         (1<<_TF_kernel_mode)
 
 /* #PF error code values. */
-#define PFEC_page_present   (1U<<0)
-#define PFEC_write_access   (1U<<1)
-#define PFEC_user_mode      (1U<<2)
-#define PFEC_reserved_bit   (1U<<3)
-#define PFEC_insn_fetch     (1U<<4)
-#define PFEC_prot_key       (1U<<5)
+#define PFEC_page_present   (_AC(1,U) << 0)
+#define PFEC_write_access   (_AC(1,U) << 1)
+#define PFEC_user_mode      (_AC(1,U) << 2)
+#define PFEC_reserved_bit   (_AC(1,U) << 3)
+#define PFEC_insn_fetch     (_AC(1,U) << 4)
+#define PFEC_prot_key       (_AC(1,U) << 5)
 /* Internally used only flags. */
 #define PFEC_page_paged     (1U<<16)
 #define PFEC_page_shared    (1U<<17)

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
  2016-03-17  8:02       ` [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info Jan Beulich
  2016-03-17  8:03       ` [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
@ 2016-03-17  8:03       ` Jan Beulich
  2016-05-13 15:57         ` Andrew Cooper
  2016-03-17  8:04       ` [PATCH v3 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
                         ` (4 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-17  8:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 2875 bytes --]

Alternatives patching code picks the most suitable NOPs for the
running system, so simply use it to replace the pre-populated ones.

Use an arbitrary, always available feature to key off from, but
hide this behind the new X86_FEATURE_ALWAYS.
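
To picture the effect: for an entry whose feature test passes, the
patcher emits the (here empty) replacement and refills the remainder of
the original site with the best NOP encoding for the running CPU. A toy
model of that sizing decision (illustration only, not hypervisor code):

    /* How many bytes end up as freshly chosen NOPs at a patched site. */
    static unsigned int nop_refill(unsigned int instrlen,
                                   unsigned int replacementlen,
                                   int feature_present)
    {
        if ( !feature_present )   /* can't happen for X86_FEATURE_ALWAYS */
            return 0;             /* site keeps its pre-populated bytes  */
        return instrlen - replacementlen;
    }

So an entry keyed off an always-present feature, with a replacement
length of zero, simply causes the covered placeholder bytes to be
rewritten with whatever NOP form the CPU prefers.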

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-base.
v2: Introduce and use X86_FEATURE_ALWAYS.

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -175,12 +175,7 @@ compat_bad_hypercall:
 ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
 .Lcr4_orig:
-        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
-        ASM_NOP2 /* jpe   .Lcr4_alt_end */
-        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
-        ASM_NOP6 /* and   $..., %rax */
-        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
-        ASM_NOP3 /* mov   %rax, %cr4 */
+        .skip (.Lcr4_alt_end - .Lcr4_alt) - (. - .Lcr4_orig), 0x90
 .Lcr4_orig_end:
         .pushsection .altinstr_replacement, "ax"
 .Lcr4_alt:
@@ -192,6 +187,7 @@ ENTRY(compat_restore_all_guest)
         mov   %rax, %cr4
 .Lcr4_alt_end:
         .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_orig, X86_FEATURE_ALWAYS, 12, 0
         altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
                              (.Lcr4_orig_end - .Lcr4_orig), \
                              (.Lcr4_alt_end - .Lcr4_alt)
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -204,6 +204,7 @@ void ret_from_intr(void);
         662: __ASM_##op;                                               \
         .popsection;                                                   \
         .pushsection .altinstructions, "a";                            \
+        altinstruction_entry 661b, 661b, X86_FEATURE_ALWAYS, 3, 0;     \
         altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3;       \
         .popsection
 
@@ -215,6 +216,7 @@ void ret_from_intr(void);
         .pushsection .altinstr_replacement, "ax";                  \
         668: call cr4_pv32_restore;                                \
         .section .altinstructions, "a";                            \
+        altinstruction_entry 667b, 667b, X86_FEATURE_ALWAYS, 5, 0; \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;   \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;   \
         .popsection
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -162,6 +162,9 @@
 #define cpufeat_bit(idx)	((idx) % 32)
 #define cpufeat_mask(idx)	(_AC(1, U) << cpufeat_bit(idx))
 
+/* An alias of a feature we know is always going to be present. */
+#define X86_FEATURE_ALWAYS      X86_FEATURE_LM
+
 #if !defined(__ASSEMBLY__) && !defined(X86_FEATURES_ONLY)
 #include <xen/bitops.h>
 




[-- Attachment #2: x86-SMEP-SMAP-NOPs.patch --]
[-- Type: text/plain, Size: 2929 bytes --]

x86: use optimal NOPs to fill the SMEP/SMAP placeholders

Alternatives patching code picks the most suitable NOPs for the
running system, so simply use it to replace the pre-populated ones.

Use an arbitrary, always available feature to key off from, but
hide this behind the new X86_FEATURE_ALWAYS.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-base.
v2: Introduce and use X86_FEATURE_ALWAYS.

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -175,12 +175,7 @@ compat_bad_hypercall:
 ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
 .Lcr4_orig:
-        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
-        ASM_NOP2 /* jpe   .Lcr4_alt_end */
-        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
-        ASM_NOP6 /* and   $..., %rax */
-        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
-        ASM_NOP3 /* mov   %rax, %cr4 */
+        .skip (.Lcr4_alt_end - .Lcr4_alt) - (. - .Lcr4_orig), 0x90
 .Lcr4_orig_end:
         .pushsection .altinstr_replacement, "ax"
 .Lcr4_alt:
@@ -192,6 +187,7 @@ ENTRY(compat_restore_all_guest)
         mov   %rax, %cr4
 .Lcr4_alt_end:
         .section .altinstructions, "a"
+        altinstruction_entry .Lcr4_orig, .Lcr4_orig, X86_FEATURE_ALWAYS, 12, 0
         altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
                              (.Lcr4_orig_end - .Lcr4_orig), \
                              (.Lcr4_alt_end - .Lcr4_alt)
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -204,6 +204,7 @@ void ret_from_intr(void);
         662: __ASM_##op;                                               \
         .popsection;                                                   \
         .pushsection .altinstructions, "a";                            \
+        altinstruction_entry 661b, 661b, X86_FEATURE_ALWAYS, 3, 0;     \
         altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3;       \
         .popsection
 
@@ -215,6 +216,7 @@ void ret_from_intr(void);
         .pushsection .altinstr_replacement, "ax";                  \
         668: call cr4_pv32_restore;                                \
         .section .altinstructions, "a";                            \
+        altinstruction_entry 667b, 667b, X86_FEATURE_ALWAYS, 5, 0; \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMEP, 5, 5;   \
         altinstruction_entry 667b, 668b, X86_FEATURE_SMAP, 5, 5;   \
         .popsection
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -162,6 +162,9 @@
 #define cpufeat_bit(idx)	((idx) % 32)
 #define cpufeat_mask(idx)	(_AC(1, U) << cpufeat_bit(idx))
 
+/* An alias of a feature we know is always going to be present. */
+#define X86_FEATURE_ALWAYS      X86_FEATURE_LM
+
 #if !defined(__ASSEMBLY__) && !defined(X86_FEATURES_ONLY)
 #include <xen/bitops.h>
 

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 4/4] x86: use 32-bit loads for 32-bit PV guest state reload
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
                         ` (2 preceding siblings ...)
  2016-03-17  8:03       ` [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
@ 2016-03-17  8:04       ` Jan Beulich
  2016-03-25 18:02         ` Konrad Rzeszutek Wilk
  2016-03-17 16:14       ` [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses Jan Beulich
                         ` (3 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-17  8:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

This is slightly more efficient than loading 64-bit quantities.
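
(The gain is presumably mostly in the encoding - as an illustration,
not part of the patch proper: for the legacy registers the 32-bit form
needs no REX prefix, e.g.

    movq  UREGS_rcx(%rsp),%rcx   # REX.W + opcode + modrm/sib/disp
    movl  UREGS_rcx(%rsp),%ecx   # same encoding minus the REX.W byte

and a 32-bit load still zero-extends its destination, which is exactly
what reloading 32-bit PV guest state wants.)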

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -313,6 +313,13 @@ static always_inline void stac(void)
 987:
 .endm
 
+#define LOAD_ONE_REG(reg, compat) \
+.if !(compat); \
+        movq  UREGS_r##reg(%rsp),%r##reg; \
+.else; \
+        movl  UREGS_r##reg(%rsp),%e##reg; \
+.endif
+
 /*
  * Reload registers not preserved by C code from frame.
  *
@@ -326,16 +333,14 @@ static always_inline void stac(void)
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.if \ax
-        movq  UREGS_rax(%rsp),%rax
 .endif
-.elseif \ax
-        movl  UREGS_rax(%rsp),%eax
+.if \ax
+        LOAD_ONE_REG(ax, \compat)
 .endif
-        movq  UREGS_rcx(%rsp),%rcx
-        movq  UREGS_rdx(%rsp),%rdx
-        movq  UREGS_rsi(%rsp),%rsi
-        movq  UREGS_rdi(%rsp),%rdi
+        LOAD_ONE_REG(cx, \compat)
+        LOAD_ONE_REG(dx, \compat)
+        LOAD_ONE_REG(si, \compat)
+        LOAD_ONE_REG(di, \compat)
 .endm
 
 /*
@@ -372,8 +377,9 @@ static always_inline void stac(void)
         .subsection 0
 #endif
 .endif
-987:    movq  UREGS_rbp(%rsp),%rbp
-        movq  UREGS_rbx(%rsp),%rbx
+987:
+        LOAD_ONE_REG(bp, \compat)
+        LOAD_ONE_REG(bx, \compat)
         subq  $-(UREGS_error_code-UREGS_r15+\adj), %rsp
 .endm
 




[-- Attachment #2: x86-32on64-load-low.patch --]
[-- Type: text/plain, Size: 1585 bytes --]

x86: use 32-bit loads for 32-bit PV guest state reload

This is slightly more efficient than loading 64-bit quantities.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -313,6 +313,13 @@ static always_inline void stac(void)
 987:
 .endm
 
+#define LOAD_ONE_REG(reg, compat) \
+.if !(compat); \
+        movq  UREGS_r##reg(%rsp),%r##reg; \
+.else; \
+        movl  UREGS_r##reg(%rsp),%e##reg; \
+.endif
+
 /*
  * Reload registers not preserved by C code from frame.
  *
@@ -326,16 +333,14 @@ static always_inline void stac(void)
         movq  UREGS_r10(%rsp),%r10
         movq  UREGS_r9(%rsp),%r9
         movq  UREGS_r8(%rsp),%r8
-.if \ax
-        movq  UREGS_rax(%rsp),%rax
 .endif
-.elseif \ax
-        movl  UREGS_rax(%rsp),%eax
+.if \ax
+        LOAD_ONE_REG(ax, \compat)
 .endif
-        movq  UREGS_rcx(%rsp),%rcx
-        movq  UREGS_rdx(%rsp),%rdx
-        movq  UREGS_rsi(%rsp),%rsi
-        movq  UREGS_rdi(%rsp),%rdi
+        LOAD_ONE_REG(cx, \compat)
+        LOAD_ONE_REG(dx, \compat)
+        LOAD_ONE_REG(si, \compat)
+        LOAD_ONE_REG(di, \compat)
 .endm
 
 /*
@@ -372,8 +377,9 @@ static always_inline void stac(void)
         .subsection 0
 #endif
 .endif
-987:    movq  UREGS_rbp(%rsp),%rbp
-        movq  UREGS_rbx(%rsp),%rbx
+987:
+        LOAD_ONE_REG(bp, \compat)
+        LOAD_ONE_REG(bx, \compat)
         subq  $-(UREGS_error_code-UREGS_r15+\adj), %rsp
 .endm
 

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
                         ` (3 preceding siblings ...)
  2016-03-17  8:04       ` [PATCH v3 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
@ 2016-03-17 16:14       ` Jan Beulich
  2016-03-25 18:47         ` Konrad Rzeszutek Wilk
  2016-05-13 16:11         ` Andrew Cooper
  2016-05-03 13:58       ` Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
                         ` (2 subsequent siblings)
  7 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-17 16:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Feng Wu

[-- Attachment #1: Type: text/plain, Size: 6993 bytes --]

Instead of addressing these fields via the base of the stack (which
uniformly requires 4-byte displacements), address them from the end
(which for everything other than guest_cpu_user_regs requires just
1-byte ones). This yields a code size reduction somewhere between 8k
and 12k in my builds.
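
To see where the savings come from, compare the two displacement forms
(the structure offsets below are made up for illustration - the real
ones come from asm-offsets - but the orders of magnitude are
representative):

    #include <stdio.h>

    #define STACK_SIZE      (8 * 4096) /* hypothetical: 8 pages of 4k */
    #define CPUINFO_sizeof  0x70       /* hypothetical cpu_info size  */
    #define CPUINFO_cr4     0x58       /* hypothetical field offset   */

    int main(void)
    {
        /* old scheme: displacement from the AND-masked stack base */
        long from_base = STACK_SIZE - CPUINFO_sizeof + CPUINFO_cr4;
        /* new scheme: displacement from the OR-rounded stack end  */
        long from_end  = 1 - CPUINFO_sizeof + CPUINFO_cr4;

        /* Displacements in [-128, 127] encode in one byte, else four. */
        printf("from base: %ld -> disp32\n", from_base); /* 32744 */
        printf("from end:  %ld -> disp8\n",  from_end);  /* -23   */
        return 0;
    }

Shaving three displacement bytes off each such cpu_info access is what
adds up to the quoted 8k-12k.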

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Note that just like patch 4 of the series this also isn't directly
related to the SMEP/SMAP issue, but is again just a result of things
realized while doing that work, and again depends on the earlier
patches to apply cleanly.

--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -31,7 +31,7 @@
 #define CLGI   .byte 0x0F,0x01,0xDD
 
 ENTRY(svm_asm_do_resume)
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
 .Lsvm_do_resume:
         call svm_intr_assist
         mov  %rsp,%rdi
@@ -97,7 +97,7 @@ UNLIKELY_END(svm_trace)
 
         VMRUN
 
-        GET_CURRENT(%rax)
+        GET_CURRENT(ax)
         push %rdi
         push %rsi
         push %rdx
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -40,7 +40,7 @@ ENTRY(vmx_asm_vmexit_handler)
         push %r10
         push %r11
         push %rbx
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         push %rbp
         push %r12
         push %r13
@@ -113,7 +113,7 @@ UNLIKELY_END(realmode)
         BUG  /* vmx_vmentry_failure() shouldn't return. */
 
 ENTRY(vmx_asm_do_vmentry)
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         jmp  .Lvmx_do_vmentry
 
 .Lvmx_goto_emulator:
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -26,7 +26,7 @@ UNLIKELY_START(ne, msi_check)
 UNLIKELY_END(msi_check)
 
         movl  UREGS_rax(%rsp),%eax
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
 
         cmpl  $NR_hypercalls,%eax
         jae   compat_bad_hypercall
@@ -202,7 +202,7 @@ ENTRY(compat_restore_all_guest)
 /* This mustn't modify registers other than %rax. */
 ENTRY(cr4_pv32_restore)
         push  %rdx
-        GET_CPUINFO_FIELD(cr4, %rdx)
+        GET_CPUINFO_FIELD(cr4, dx)
         mov   (%rdx), %rax
         test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
         jnz   0f
@@ -245,7 +245,7 @@ ENTRY(cstar_enter)
         pushq %rcx
         pushq $0
         SAVE_VOLATILE TRAP_syscall
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         movq  VCPU_domain(%rbx),%rcx
         cmpb  $0,DOMAIN_is_32bit_pv(%rcx)
         je    switch_to_kernel
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -97,7 +97,7 @@ ENTRY(lstar_enter)
         pushq %rcx
         pushq $0
         SAVE_VOLATILE TRAP_syscall
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
         jz    switch_to_kernel
 
@@ -246,7 +246,7 @@ GLOBAL(sysenter_eflags_saved)
         pushq $0 /* null rip */
         pushq $0
         SAVE_VOLATILE TRAP_syscall
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         cmpb  $0,VCPU_sysenter_disables_events(%rbx)
         movq  VCPU_sysenter_addr(%rbx),%rax
         setne %cl
@@ -288,7 +288,7 @@ UNLIKELY_START(ne, msi_check)
         call  check_for_unexpected_msi
 UNLIKELY_END(msi_check)
 
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
 
         /* Check that the callback is non-null. */
         leaq  VCPU_int80_bounce(%rbx),%rdx
@@ -420,10 +420,10 @@ domain_crash_page_fault:
         call  show_page_walk
 ENTRY(dom_crash_sync_extable)
         # Get out of the guest-save area of the stack.
-        GET_STACK_BASE(%rax)
+        GET_STACK_END(ax)
         leaq  STACK_CPUINFO_FIELD(guest_cpu_user_regs)(%rax),%rsp
         # create_bounce_frame() temporarily clobbers CS.RPL. Fix up.
-        __GET_CURRENT(%rax)
+        __GET_CURRENT(ax)
         movq  VCPU_domain(%rax),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
         setz  %al
@@ -441,7 +441,7 @@ ENTRY(common_interrupt)
 
 /* No special register assumptions. */
 ENTRY(ret_from_intr)
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
         movq  VCPU_domain(%rbx),%rax
@@ -455,7 +455,7 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 handle_exception_saved:
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
 
@@ -649,7 +649,7 @@ handle_ist_exception:
         testb $3,UREGS_cs(%rsp)
         jz    1f
         /* Interrupted guest context. Copy the context to stack bottom. */
-        GET_CPUINFO_FIELD(guest_cpu_user_regs,%rdi)
+        GET_CPUINFO_FIELD(guest_cpu_user_regs,di)
         movq  %rsp,%rsi
         movl  $UREGS_kernel_sizeof/8,%ecx
         movq  %rdi,%rsp
@@ -664,7 +664,7 @@ handle_ist_exception:
         /* We want to get straight to the IRET on the NMI exit path. */
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         /* Send an IPI to ourselves to cover for the lack of event checking. */
         movl  VCPU_processor(%rbx),%eax
         shll  $IRQSTAT_shift,%eax
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -127,19 +127,19 @@ void ret_from_intr(void);
         UNLIKELY_DONE(mp, tag);   \
         __UNLIKELY_END(tag)
 
-#define STACK_CPUINFO_FIELD(field) (STACK_SIZE-CPUINFO_sizeof+CPUINFO_##field)
-#define GET_STACK_BASE(reg)                       \
-        movq $~(STACK_SIZE-1),reg;                \
-        andq %rsp,reg
+#define STACK_CPUINFO_FIELD(field) (1 - CPUINFO_sizeof + CPUINFO_##field)
+#define GET_STACK_END(reg)                        \
+        movl $STACK_SIZE-1, %e##reg;              \
+        orq  %rsp, %r##reg
 
 #define GET_CPUINFO_FIELD(field, reg)             \
-        GET_STACK_BASE(reg);                      \
-        addq $STACK_CPUINFO_FIELD(field),reg
+        GET_STACK_END(reg);                       \
+        addq $STACK_CPUINFO_FIELD(field), %r##reg
 
 #define __GET_CURRENT(reg)                        \
-        movq STACK_CPUINFO_FIELD(current_vcpu)(reg),reg
+        movq STACK_CPUINFO_FIELD(current_vcpu)(%r##reg), %r##reg
 #define GET_CURRENT(reg)                          \
-        GET_STACK_BASE(reg);                      \
+        GET_STACK_END(reg);                       \
         __GET_CURRENT(reg)
 
 #ifndef NDEBUG
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -55,7 +55,7 @@ static inline struct cpu_info *get_cpu_i
     register unsigned long sp asm("rsp");
 #endif
 
-    return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
+    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)



[-- Attachment #2: x86-gci-use-or.patch --]
[-- Type: text/plain, Size: 7049 bytes --]

x86: reduce code size of struct cpu_info member accesses

Instead of addressing these fields via the base of the stack (which
uniformly requires 4-byte displacements), address them from the end
(which for everything other than guest_cpu_user_regs requires just
1-byte ones). This yields a code size reduction somewhere between 8k
and 12k in my builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Note that just like patch 4 of the series this also isn't directly
related to the SMEP/SMAP issue, but is again just a result of things
realized while doing that work, and again depends on the earlier
patches to apply cleanly.

--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -31,7 +31,7 @@
 #define CLGI   .byte 0x0F,0x01,0xDD
 
 ENTRY(svm_asm_do_resume)
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
 .Lsvm_do_resume:
         call svm_intr_assist
         mov  %rsp,%rdi
@@ -97,7 +97,7 @@ UNLIKELY_END(svm_trace)
 
         VMRUN
 
-        GET_CURRENT(%rax)
+        GET_CURRENT(ax)
         push %rdi
         push %rsi
         push %rdx
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -40,7 +40,7 @@ ENTRY(vmx_asm_vmexit_handler)
         push %r10
         push %r11
         push %rbx
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         push %rbp
         push %r12
         push %r13
@@ -113,7 +113,7 @@ UNLIKELY_END(realmode)
         BUG  /* vmx_vmentry_failure() shouldn't return. */
 
 ENTRY(vmx_asm_do_vmentry)
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         jmp  .Lvmx_do_vmentry
 
 .Lvmx_goto_emulator:
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -26,7 +26,7 @@ UNLIKELY_START(ne, msi_check)
 UNLIKELY_END(msi_check)
 
         movl  UREGS_rax(%rsp),%eax
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
 
         cmpl  $NR_hypercalls,%eax
         jae   compat_bad_hypercall
@@ -202,7 +202,7 @@ ENTRY(compat_restore_all_guest)
 /* This mustn't modify registers other than %rax. */
 ENTRY(cr4_pv32_restore)
         push  %rdx
-        GET_CPUINFO_FIELD(cr4, %rdx)
+        GET_CPUINFO_FIELD(cr4, dx)
         mov   (%rdx), %rax
         test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
         jnz   0f
@@ -245,7 +245,7 @@ ENTRY(cstar_enter)
         pushq %rcx
         pushq $0
         SAVE_VOLATILE TRAP_syscall
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         movq  VCPU_domain(%rbx),%rcx
         cmpb  $0,DOMAIN_is_32bit_pv(%rcx)
         je    switch_to_kernel
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -97,7 +97,7 @@ ENTRY(lstar_enter)
         pushq %rcx
         pushq $0
         SAVE_VOLATILE TRAP_syscall
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
         jz    switch_to_kernel
 
@@ -246,7 +246,7 @@ GLOBAL(sysenter_eflags_saved)
         pushq $0 /* null rip */
         pushq $0
         SAVE_VOLATILE TRAP_syscall
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         cmpb  $0,VCPU_sysenter_disables_events(%rbx)
         movq  VCPU_sysenter_addr(%rbx),%rax
         setne %cl
@@ -288,7 +288,7 @@ UNLIKELY_START(ne, msi_check)
         call  check_for_unexpected_msi
 UNLIKELY_END(msi_check)
 
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
 
         /* Check that the callback is non-null. */
         leaq  VCPU_int80_bounce(%rbx),%rdx
@@ -420,10 +420,10 @@ domain_crash_page_fault:
         call  show_page_walk
 ENTRY(dom_crash_sync_extable)
         # Get out of the guest-save area of the stack.
-        GET_STACK_BASE(%rax)
+        GET_STACK_END(ax)
         leaq  STACK_CPUINFO_FIELD(guest_cpu_user_regs)(%rax),%rsp
         # create_bounce_frame() temporarily clobbers CS.RPL. Fix up.
-        __GET_CURRENT(%rax)
+        __GET_CURRENT(ax)
         movq  VCPU_domain(%rax),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
         setz  %al
@@ -441,7 +441,7 @@ ENTRY(common_interrupt)
 
 /* No special register assumptions. */
 ENTRY(ret_from_intr)
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
         movq  VCPU_domain(%rbx),%rax
@@ -455,7 +455,7 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 handle_exception_saved:
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
 
@@ -649,7 +649,7 @@ handle_ist_exception:
         testb $3,UREGS_cs(%rsp)
         jz    1f
         /* Interrupted guest context. Copy the context to stack bottom. */
-        GET_CPUINFO_FIELD(guest_cpu_user_regs,%rdi)
+        GET_CPUINFO_FIELD(guest_cpu_user_regs,di)
         movq  %rsp,%rsi
         movl  $UREGS_kernel_sizeof/8,%ecx
         movq  %rdi,%rsp
@@ -664,7 +664,7 @@ handle_ist_exception:
         /* We want to get straight to the IRET on the NMI exit path. */
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        GET_CURRENT(%rbx)
+        GET_CURRENT(bx)
         /* Send an IPI to ourselves to cover for the lack of event checking. */
         movl  VCPU_processor(%rbx),%eax
         shll  $IRQSTAT_shift,%eax
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -127,19 +127,19 @@ void ret_from_intr(void);
         UNLIKELY_DONE(mp, tag);   \
         __UNLIKELY_END(tag)
 
-#define STACK_CPUINFO_FIELD(field) (STACK_SIZE-CPUINFO_sizeof+CPUINFO_##field)
-#define GET_STACK_BASE(reg)                       \
-        movq $~(STACK_SIZE-1),reg;                \
-        andq %rsp,reg
+#define STACK_CPUINFO_FIELD(field) (1 - CPUINFO_sizeof + CPUINFO_##field)
+#define GET_STACK_END(reg)                        \
+        movl $STACK_SIZE-1, %e##reg;              \
+        orq  %rsp, %r##reg
 
 #define GET_CPUINFO_FIELD(field, reg)             \
-        GET_STACK_BASE(reg);                      \
-        addq $STACK_CPUINFO_FIELD(field),reg
+        GET_STACK_END(reg);                       \
+        addq $STACK_CPUINFO_FIELD(field), %r##reg
 
 #define __GET_CURRENT(reg)                        \
-        movq STACK_CPUINFO_FIELD(current_vcpu)(reg),reg
+        movq STACK_CPUINFO_FIELD(current_vcpu)(%r##reg), %r##reg
 #define GET_CURRENT(reg)                          \
-        GET_STACK_BASE(reg);                      \
+        GET_STACK_END(reg);                       \
         __GET_CURRENT(reg)
 
 #ifndef NDEBUG
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -55,7 +55,7 @@ static inline struct cpu_info *get_cpu_i
     register unsigned long sp asm("rsp");
 #endif
 
-    return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
+    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info
  2016-03-17  8:02       ` [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info Jan Beulich
@ 2016-03-17 16:20         ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-03-17 16:20 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Feng Wu

On 17/03/16 08:02, Jan Beulich wrote:
> This not only eases using the cached value in assembly code, but also
> improves the generated code resulting from such reads in C.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-17  8:03       ` [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
@ 2016-03-25 18:01         ` Konrad Rzeszutek Wilk
  2016-03-29  6:55           ` Jan Beulich
  2016-05-13 15:58         ` Andrew Cooper
  1 sibling, 1 reply; 67+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-25 18:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu, Andrew Cooper

> @@ -174,10 +174,61 @@ compat_bad_hypercall:
>  /* %rbx: struct vcpu, interrupts disabled */
>  ENTRY(compat_restore_all_guest)
>          ASSERT_INTERRUPTS_DISABLED
> +.Lcr4_orig:
> +        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
> +        ASM_NOP2 /* jpe   .Lcr4_alt_end */
> +        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
> +        ASM_NOP6 /* and   $..., %rax */
> +        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
> +        ASM_NOP3 /* mov   %rax, %cr4 */
> +.Lcr4_orig_end:
> +        .pushsection .altinstr_replacement, "ax"
> +.Lcr4_alt:
> +        testb $3,UREGS_cs(%rsp)
> +        jpe   .Lcr4_alt_end

This would jump if the last operation left an even number of bits set.
And 'testb' is an 'and' operation, which would give us '011' (for $3).

Why not just depend on ZF? Other places that test UREGS_cs() appear to
use that?

> +        mov   CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
> +        mov   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
> +        mov   %rax, %cr4
> +.Lcr4_alt_end:
> +        .section .altinstructions, "a"
> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
> +                             (.Lcr4_orig_end - .Lcr4_orig), \
> +                             (.Lcr4_alt_end - .Lcr4_alt)
> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, \
> +                             (.Lcr4_orig_end - .Lcr4_orig), \
> +                             (.Lcr4_alt_end - .Lcr4_alt)
> +        .popsection
>          RESTORE_ALL adj=8 compat=1
>  .Lft0:  iretq
>          _ASM_PRE_EXTABLE(.Lft0, handle_exception)
>  
> +/* This mustn't modify registers other than %rax. */
> +ENTRY(cr4_pv32_restore)
> +        push  %rdx
> +        GET_CPUINFO_FIELD(cr4, %rdx)
> +        mov   (%rdx), %rax
> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
> +        jnz   0f
> +        or    cr4_pv32_mask(%rip), %rax
> +        mov   %rax, %cr4
> +        mov   %rax, (%rdx)

Here you leave %rax with the cr4_pv32_mask value, but:

> +        pop   %rdx
> +        ret
> +0:
> +#ifndef NDEBUG
> +        /* Check that _all_ of the bits intended to be set actually are. */
> +        mov   %cr4, %rax
> +        and   cr4_pv32_mask(%rip), %eax
> +        cmp   cr4_pv32_mask(%rip), %eax
> +        je    1f
> +        BUG
> +1:
> +#endif
> +        pop   %rdx
> +        xor   %eax, %eax

.. Here you clear it. Any particular reason?

> +        ret
> +
>  /* %rdx: trap_bounce, %rbx: struct vcpu */
>  ENTRY(compat_post_handle_exception)
>          testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
.. snip..
> -.macro LOAD_C_CLOBBERED compat=0
> +.macro LOAD_C_CLOBBERED compat=0 ax=1
>  .if !\compat
>          movq  UREGS_r11(%rsp),%r11
>          movq  UREGS_r10(%rsp),%r10
>          movq  UREGS_r9(%rsp),%r9
>          movq  UREGS_r8(%rsp),%r8
> -.endif
> +.if \ax
>          movq  UREGS_rax(%rsp),%rax
> +.endif

Why the .endif here considering you are doing an:

> +.elseif \ax

an else if here?
> +        movl  UREGS_rax(%rsp),%eax
> +.endif

Actually, why two 'if ax' checks?

Or am I reading this incorrectly?


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 4/4] x86: use 32-bit loads for 32-bit PV guest state reload
  2016-03-17  8:04       ` [PATCH v3 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
@ 2016-03-25 18:02         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 67+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-25 18:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu, Andrew Cooper

On Thu, Mar 17, 2016 at 02:04:25AM -0600, Jan Beulich wrote:
> This is slightly more efficient than loading 64-bit quantities.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses
  2016-03-17 16:14       ` [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses Jan Beulich
@ 2016-03-25 18:47         ` Konrad Rzeszutek Wilk
  2016-03-29  6:59           ` Jan Beulich
  2016-05-13 16:11         ` Andrew Cooper
  1 sibling, 1 reply; 67+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-25 18:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu, Andrew Cooper

On Thu, Mar 17, 2016 at 10:14:22AM -0600, Jan Beulich wrote:

Something is off with your patch. This is 5/4 :-)

> Instead of addressing these fields via the base of the stack (which
> uniformly requires 4-byte displacements), address them from the end
> (which for everything other than guest_cpu_user_regs requires just
> 1-byte ones). This yields a code size reduction somewhere between 8k
> and 12k in my builds.

Also you made the macro a bit different - the %r is removed.

Particular reason? 

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Note that just like patch 4 of the series this also isn't directly
> related to the SMEP/SMAP issue, but is again just a result of things
> realized while doing that work, and again depends on the earlier
> patches to apply cleanly.
> 

.. snip..
> --- a/xen/include/asm-x86/asm_defns.h
> +++ b/xen/include/asm-x86/asm_defns.h
> @@ -127,19 +127,19 @@ void ret_from_intr(void);
>          UNLIKELY_DONE(mp, tag);   \
>          __UNLIKELY_END(tag)
>  
> -#define STACK_CPUINFO_FIELD(field) (STACK_SIZE-CPUINFO_sizeof+CPUINFO_##field)
> -#define GET_STACK_BASE(reg)                       \
> -        movq $~(STACK_SIZE-1),reg;                \
> -        andq %rsp,reg
> +#define STACK_CPUINFO_FIELD(field) (1 - CPUINFO_sizeof + CPUINFO_##field)
> +#define GET_STACK_END(reg)                        \
> +        movl $STACK_SIZE-1, %e##reg;              \
> +        orq  %rsp, %r##reg
>  
>  #define GET_CPUINFO_FIELD(field, reg)             \
> -        GET_STACK_BASE(reg);                      \
> -        addq $STACK_CPUINFO_FIELD(field),reg
> +        GET_STACK_END(reg);                       \
> +        addq $STACK_CPUINFO_FIELD(field), %r##reg

Not subq? The GET_STACK_END gets us .. [edit: I missed the change to
STACK_CPUINFO_FIELD the first time].


Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-25 18:01         ` Konrad Rzeszutek Wilk
@ 2016-03-29  6:55           ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-29  6:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Feng Wu, xen-devel

>>> On 25.03.16 at 19:01, <konrad.wilk@oracle.com> wrote:
>>  @@ -174,10 +174,61 @@ compat_bad_hypercall:
>>  /* %rbx: struct vcpu, interrupts disabled */
>>  ENTRY(compat_restore_all_guest)
>>          ASSERT_INTERRUPTS_DISABLED
>> +.Lcr4_orig:
>> +        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
>> +        ASM_NOP2 /* jpe   .Lcr4_alt_end */
>> +        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
>> +        ASM_NOP6 /* and   $..., %rax */
>> +        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
>> +        ASM_NOP3 /* mov   %rax, %cr4 */
>> +.Lcr4_orig_end:
>> +        .pushsection .altinstr_replacement, "ax"
>> +.Lcr4_alt:
>> +        testb $3,UREGS_cs(%rsp)
>> +        jpe   .Lcr4_alt_end
> 
> This would jump if the last operation left an even number of bits set.
> And 'testb' is an 'and' operation, which would give us '011' (for $3).
> 
> Why not just depend on ZF? Other places that test UREGS_cs() appear to
> use that?

Because we _want_ to skip the following code when the outer context is
guest ring 3. See also the v3 part of the revision log.
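
To spell the flag logic out (a toy illustration, not code from the
patch): PF reflects whether the low byte of the testb result has an
even number of bits set, so

    /* Hypothetical helper; only RPL 1 (32-bit PV kernel) and RPL 3
     * (guest user) can reach this point.                            */
    static int cr4_clearing_skipped(unsigned int cs)
    {
        unsigned int rpl = cs & 3; /* what "testb $3,UREGS_cs" computes   */
        return rpl == 3;           /* 0b11: two bits -> PF=1 -> jpe taken */
    }

ZF alone (set only for RPL 0) could not separate ring 1 from ring 3
here, and the clearing is meant to be skipped only when returning to
guest user mode.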

>> +/* This mustn't modify registers other than %rax. */
>> +ENTRY(cr4_pv32_restore)
>> +        push  %rdx
>> +        GET_CPUINFO_FIELD(cr4, %rdx)
>> +        mov   (%rdx), %rax
>> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
>> +        jnz   0f
>> +        or    cr4_pv32_mask(%rip), %rax
>> +        mov   %rax, %cr4
>> +        mov   %rax, (%rdx)
> 
> Here you leave %rax with the cr4_pv32_mask value, but:
> 
>> +        pop   %rdx
>> +        ret
>> +0:
>> +#ifndef NDEBUG
>> +        /* Check that _all_ of the bits intended to be set actually are. */
>> +        mov   %cr4, %rax
>> +        and   cr4_pv32_mask(%rip), %eax
>> +        cmp   cr4_pv32_mask(%rip), %eax
>> +        je    1f
>> +        BUG
>> +1:
>> +#endif
>> +        pop   %rdx
>> +        xor   %eax, %eax
> 
> .. Here you clear it. Any particular reason?
> 
>> +        ret

Of course - see handle_exception, where this return value gets
checked (in the first case above we just care for there to be any
non-zero value in %rax).
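
In C terms the convention is roughly the following - an illustrative sketch
only, with the actual mov to %cr4 elided, cr4_pv32_mask passed in as a
parameter instead of being read from memory, and the mask also standing in
for the X86_CR4_SMEP|X86_CR4_SMAP test:

    /* Return 0 if SMEP/SMAP were already enabled in the cached CR4, else
     * re-enable them and return the (necessarily non-zero) updated value.
     * The caller only distinguishes zero from non-zero. */
    unsigned long cr4_pv32_restore_sketch(unsigned long *cached_cr4,
                                          unsigned long cr4_pv32_mask)
    {
        if ( *cached_cr4 & cr4_pv32_mask )
            return 0;                       /* nothing to do */

        *cached_cr4 |= cr4_pv32_mask;
        /* the mov to %cr4 would happen here */
        return *cached_cr4;
    }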

>> -.macro LOAD_C_CLOBBERED compat=0
>> +.macro LOAD_C_CLOBBERED compat=0 ax=1
>>  .if !\compat
>>          movq  UREGS_r11(%rsp),%r11
>>          movq  UREGS_r10(%rsp),%r10
>>          movq  UREGS_r9(%rsp),%r9
>>          movq  UREGS_r8(%rsp),%r8
>> -.endif
>> +.if \ax
>>          movq  UREGS_rax(%rsp),%rax
>> +.endif
> 
> Why the .endif here considering you are doing an:
> 
>> +.elseif \ax
> 
> an else if here?
>> +        movl  UREGS_rax(%rsp),%eax
>> +.endif
> 
> Actually, why two 'if ax' checks?
> 
> Or am I reading this incorrectly?

After the change the macro first deals with the "native" case (which
requires looking at \ax) and then the "compat" one (which also requires
evaluating \ax).
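
That is, the two checks sit in different branches of the outer conditional:
one nested .if inside the !compat part, plus the .elseif of the outer .if.
As C-style pseudocode, purely to show the nesting (the bodies are just the
register loads quoted above):

    /* Sketch of LOAD_C_CLOBBERED's conditional structure - not real code. */
    void load_c_clobbered_sketch(int compat, int ax)
    {
        if ( !compat )
        {
            /* movq reloads of %r11, %r10, %r9, %r8 */
            if ( ax )
                ;           /* movq UREGS_rax(%rsp), %rax */
        }
        else if ( ax )
        {
            ;               /* movl UREGS_rax(%rsp), %eax (32-bit load) */
        }
        /* whatever further reloads the macro does are not shown here */
    }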

Jan



* Re: [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses
  2016-03-25 18:47         ` Konrad Rzeszutek Wilk
@ 2016-03-29  6:59           ` Jan Beulich
  2016-03-30 14:28             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-03-29  6:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Feng Wu, xen-devel

>>> On 25.03.16 at 19:47, <konrad.wilk@oracle.com> wrote:
> On Thu, Mar 17, 2016 at 10:14:22AM -0600, Jan Beulich wrote:
> 
> Something is off with your patch. This is 5/4 :-)

Well, yes - this got added later on top of the previously sent series,
to make the dependency obvious.

>> Instead of addressing these fields via the base of the stack (which
>> uniformly requires 4-byte displacements), address them from the end
>> (which for everything other than guest_cpu_user_regs requires just
>> 1-byte ones). This yields a code size reduction somewhere between 8k
>> and 12k in my builds.
> 
> Also you made the macro a bit different - the %r is removed.
> 
> Particular reason? 

This is an integral part of the change, so the macro can derive
e.g. both %eax and %rax from the passed argument

Jan



* Re: [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses
  2016-03-29  6:59           ` Jan Beulich
@ 2016-03-30 14:28             ` Konrad Rzeszutek Wilk
  2016-03-30 14:42               ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-30 14:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, Feng Wu, xen-devel

On Tue, Mar 29, 2016 at 12:59:24AM -0600, Jan Beulich wrote:
> >>> On 25.03.16 at 19:47, <konrad.wilk@oracle.com> wrote:
> > On Thu, Mar 17, 2016 at 10:14:22AM -0600, Jan Beulich wrote:
> > 
> > Something is off with your patch. This is 5/4 :-)
> 
> Well, yes - this got added later on top of the previously sent series,
> to make the dependency obvious.
> 
> >> Instead of addressing these fields via the base of the stack (which
> >> uniformly requires 4-byte displacements), address them from the end
> >> (which for everything other than guest_cpu_user_regs requires just
> >> 1-byte ones). This yields a code size reduction somewhere between 8k
> >> and 12k in my builds.
> > 
> > Also you made the macro a bit different - the %r is removed.
> > 
> > Particular reason? 
> 
> This is an integral part of the change, so the macro can derive
> e.g. both %eax and %rax from the passed argument

Could you pls include that explanation in the commit description..

And with that you can put Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

on the patch.


* Re: [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses
  2016-03-30 14:28             ` Konrad Rzeszutek Wilk
@ 2016-03-30 14:42               ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-03-30 14:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Feng Wu, xen-devel

>>> On 30.03.16 at 16:28, <konrad.wilk@oracle.com> wrote:
> On Tue, Mar 29, 2016 at 12:59:24AM -0600, Jan Beulich wrote:
>> >>> On 25.03.16 at 19:47, <konrad.wilk@oracle.com> wrote:
>> > On Thu, Mar 17, 2016 at 10:14:22AM -0600, Jan Beulich wrote:
>> > 
>> > Something is off with your patch. This is 5/4 :-)
>> 
>> Well, yes - this got added later on top of the previously sent series,
>> to make the dependency obvious.
>> 
>> >> Instead of addressing these fields via the base of the stack (which
>> >> uniformly requires 4-byte displacements), address them from the end
>> >> (which for everything other than guest_cpu_user_regs requires just
>> >> 1-byte ones). This yields a code size reduction somewhere between 8k
>> >> and 12k in my builds.
>> > 
>> > Also you made the macro a bit different - the %r is removed.
>> > 
>> > Particular reason? 
>> 
>> This is an integral part of the change, so the macro can derive
>> e.g. both %eax and %rax from the passed argument
> 
> Could you pls include that explanation in the commit description..
> 
> And with that you can put Reviewed-by: Konrad Rzeszutek Wilk 
> <konrad.wilk@oracle.com>
> 
> on the patch.

Well, I've got one of these already, without having been asked to
do any adjustments. Did I mis-interpret
<20160325184701.GE20741@char.us.oracle.com>?
As to adding something like this to the commit message: I have
a hard time seeing how this would belong there. The fact _that_
this is being done is very obvious from the patch itself, and once
the reader has spotted this, the reason _why_ it is needed
should become immediately obvious too. I'm all for explaining
technically difficult or hard-to-see things, but I'm against
needless bloat.

Jan



* Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
                         ` (4 preceding siblings ...)
  2016-03-17 16:14       ` [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses Jan Beulich
@ 2016-05-03 13:58       ` Jan Beulich
  2016-05-03 14:10         ` Andrew Cooper
                           ` (2 more replies)
  2016-05-13 17:02       ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Wei Liu
  2016-06-21  6:19       ` Wu, Feng
  7 siblings, 3 replies; 67+ messages in thread
From: Jan Beulich @ 2016-05-03 13:58 UTC (permalink / raw)
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel, Feng Wu

>>> On 17.03.16 at 09:03,  wrote:
> Since such guests' kernel code runs in ring 1, their memory accesses,
> at the paging layer, are supervisor mode ones, and hence subject to
> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> two features though (and so far we also don't expose the respective
> feature flags), and hence may suffer page faults they cannot deal with.
> 
> While the placement of the re-enabling slightly weakens the intended
> protection, it was selected such that 64-bit paths would remain
> unaffected where possible. At the expense of a further performance hit
> the re-enabling could be put right next to the CLACs.
> 
> Note that this introduces a number of extra TLB flushes - CR4.SMEP
> transitioning from 0 to 1 always causes a flush, and it transitioning
> from 1 to 0 may also do.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

So I think we need to take some decision here, and I'm afraid
Andrew and I won't be able to settle this between us. He's
validly concerned about the performance impact this got proven
to have (for 32-bit PV guests), yet I continue to think that correct
behavior is more relevant than performance. Hence I think we
should bite the bullet and take the change. For those who value
performance more than security, they can always disable the use
of SMEP and SMAP via command line option.

Of course I'm also concerned that Intel, who did introduce the
functional regression in the first place, so far didn't participate at
all in finding an acceptable solution to the problem at hand...

Jan



* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-03 13:58       ` Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
@ 2016-05-03 14:10         ` Andrew Cooper
  2016-05-03 14:25           ` Jan Beulich
  2016-05-04  3:07         ` Wu, Feng
  2016-05-13 15:21         ` Wei Liu
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-05-03 14:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, xen-devel, Feng Wu

On 03/05/16 14:58, Jan Beulich wrote:
>>>> On 17.03.16 at 09:03,  wrote:
>> Since such guests' kernel code runs in ring 1, their memory accesses,
>> at the paging layer, are supervisor mode ones, and hence subject to
>> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
>> two features though (and so far we also don't expose the respective
>> feature flags), and hence may suffer page faults they cannot deal with.
>>
>> While the placement of the re-enabling slightly weakens the intended
>> protection, it was selected such that 64-bit paths would remain
>> unaffected where possible. At the expense of a further performance hit
>> the re-enabling could be put right next to the CLACs.
>>
>> Note that this introduces a number of extra TLB flushes - CR4.SMEP
>> transitioning from 0 to 1 always causes a flush, and it transitioning
>> from 1 to 0 may also do.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> So I think we need to take some decision here, and I'm afraid
> Andrew and I won't be able to settle this between us. He's
> validly concerned about the performance impact this got proven
> to have (for 32-bit PV guests), yet I continue to think that correct
> behavior is more relevant than performance. Hence I think we
> should bite the bullet and take the change. For those who value
> performance more than security, they can always disable the use
> of SMEP and SMAP via command line option.
>
> Of course I'm also concerned that Intel, who did introduce the
> functional regression in the first place, so far didn't participate at
> all in finding an acceptable solution to the problem at hand...

What I am even more concerned with is that the supposedly-optimised
version which tried to omit cr4 writes whenever possible caused a higher
performance overhead than the version which rewrote cr4 all the time.

As it stands, v3 of the series is even worse for performance than v2.
That alone means that v3 is not in a suitable state to be accepted, as
there is some hidden bug in it.

~Andrew


* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-03 14:10         ` Andrew Cooper
@ 2016-05-03 14:25           ` Jan Beulich
  2016-05-04 10:03             ` Andrew Cooper
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-05-03 14:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	IanJackson, xen-devel, Feng Wu

>>> On 03.05.16 at 16:10, <andrew.cooper3@citrix.com> wrote:
> On 03/05/16 14:58, Jan Beulich wrote:
>>>>> On 17.03.16 at 09:03,  wrote:
>>> Since such guests' kernel code runs in ring 1, their memory accesses,
>>> at the paging layer, are supervisor mode ones, and hence subject to
>>> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
>>> two features though (and so far we also don't expose the respective
>>> feature flags), and hence may suffer page faults they cannot deal with.
>>>
>>> While the placement of the re-enabling slightly weakens the intended
>>> protection, it was selected such that 64-bit paths would remain
>>> unaffected where possible. At the expense of a further performance hit
>>> the re-enabling could be put right next to the CLACs.
>>>
>>> Note that this introduces a number of extra TLB flushes - CR4.SMEP
>>> transitioning from 0 to 1 always causes a flush, and it transitioning
>>> from 1 to 0 may also do.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> So I think we need to take some decision here, and I'm afraid
>> Andrew and I won't be able to settle this between us. He's
>> validly concerned about the performance impact this got proven
>> to have (for 32-bit PV guests), yet I continue to think that correct
>> behavior is more relevant than performance. Hence I think we
>> should bite the bullet and take the change. For those who value
>> performance more than security, they can always disable the use
>> of SMEP and SMAP via command line option.
>>
>> Of course I'm also concerned that Intel, who did introduce the
>> functional regression in the first place, so far didn't participate at
>> all in finding an acceptable solution to the problem at hand...
> 
> What I am even more concerned with is that the supposedly-optimised
> version which tried to omit cr4 writes whenever possible caused a higher
> performance overhead than the version which rewrote cr4 all the time.
> 
> As it stands, v3 of the series is even worse for performance than v2.
> That alone means that v3 is not in a suitable state to be accepted, as
> there is some hidden bug in it.

For one, we could take v2 (albeit that would feel odd).

And then, from v3 showing worse performance (which only you
have seen the numbers for, and [hopefully] know what deviations
in them really mean) it does not follow that v3 is buggy. That's one
possibility, sure (albeit hard to explain, as functionally everything
is working fine for both you and me, afaik), but not the only one.
Another may be that hardware processes CR4 writes faster when
they happen in close succession to CR4 reads. And I'm sure more
could be come up with.

And finally, I'm not tied to this patch set. I'd be fine with some
other fix to restore correct behavior. What I'm not fine with is the
limbo state this is being left in, with me all the time having to revive
the discussion.

Jan



* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-03 13:58       ` Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
  2016-05-03 14:10         ` Andrew Cooper
@ 2016-05-04  3:07         ` Wu, Feng
  2016-05-13 15:21         ` Wei Liu
  2 siblings, 0 replies; 67+ messages in thread
From: Wu, Feng @ 2016-05-04  3:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, May 3, 2016 9:59 PM
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@eu.citrix.com>; Ian
> Jackson <Ian.Jackson@eu.citrix.com>; Wu, Feng <feng.wu@intel.com>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel <xen-devel@lists.xenproject.org>;
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Tim Deegan <tim@xen.org>
> Subject: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit
> PV guest code
> 
> >>> On 17.03.16 at 09:03,  wrote:
> > Since such guests' kernel code runs in ring 1, their memory accesses,
> > at the paging layer, are supervisor mode ones, and hence subject to
> > SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> > two features though (and so far we also don't expose the respective
> > feature flags), and hence may suffer page faults they cannot deal with.
> >
> > While the placement of the re-enabling slightly weakens the intended
> > protection, it was selected such that 64-bit paths would remain
> > unaffected where possible. At the expense of a further performance hit
> > the re-enabling could be put right next to the CLACs.
> >
> > Note that this introduces a number of extra TLB flushes - CR4.SMEP
> > transitioning from 0 to 1 always causes a flush, and it transitioning
> > from 1 to 0 may also do.
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> So I think we need to take some decision here, and I'm afraid
> Andrew and I won't be able to settle this between us. He's
> validly concerned about the performance impact this got proven
> to have (for 32-bit PV guests), yet I continue to think that correct
> behavior is more relevant than performance. Hence I think we
> should bite the bullet and take the change. For those who value
> performance more than security, they can always disable the use
> of SMEP and SMAP via command line option.

I think this is a proper solution for those who care more about
performance. And BTW, Andrew, is there any detailed data which can
show the exact performance overhead introduced by this patch?

Thanks,
Feng

> 
> Of course I'm also concerned that Intel, who did introduce the
> functional regression in the first place, so far didn't participate at
> all in finding an acceptable solution to the problem at hand...
> 
> Jan



* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-03 14:25           ` Jan Beulich
@ 2016-05-04 10:03             ` Andrew Cooper
  2016-05-04 13:35               ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-05-04 10:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	IanJackson, xen-devel, Feng Wu

On 03/05/16 15:25, Jan Beulich wrote:
>>>> On 03.05.16 at 16:10, <andrew.cooper3@citrix.com> wrote:
>> On 03/05/16 14:58, Jan Beulich wrote:
>>>>>> On 17.03.16 at 09:03,  wrote:
>>>> Since such guests' kernel code runs in ring 1, their memory accesses,
>>>> at the paging layer, are supervisor mode ones, and hence subject to
>>>> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
>>>> two features though (and so far we also don't expose the respective
>>>> feature flags), and hence may suffer page faults they cannot deal with.
>>>>
>>>> While the placement of the re-enabling slightly weakens the intended
>>>> protection, it was selected such that 64-bit paths would remain
>>>> unaffected where possible. At the expense of a further performance hit
>>>> the re-enabling could be put right next to the CLACs.
>>>>
>>>> Note that this introduces a number of extra TLB flushes - CR4.SMEP
>>>> transitioning from 0 to 1 always causes a flush, and it transitioning
>>>> from 1 to 0 may also do.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> So I think we need to take some decision here, and I'm afraid
>>> Andrew and I won't be able to settle this between us. He's
>>> validly concerned about the performance impact this got proven
>>> to have (for 32-bit PV guests), yet I continue to think that correct
>>> behavior is more relevant than performance. Hence I think we
>>> should bite the bullet and take the change. For those who value
>>> performance more than security, they can always disable the use
>>> of SMEP and SMAP via command line option.
>>>
>>> Of course I'm also concerned that Intel, who did introduce the
>>> functional regression in the first place, so far didn't participate at
>>> all in finding an acceptable solution to the problem at hand...
>> What I am even more concerned with is that the supposedly-optimised
>> version which tried to omit cr4 writes whenever possible caused a higher
>> performance overhead than the version which rewrote cr4 all the time.
>>
>> As it stands, v3 of the series is even worse for performance than v2.
>> That alone means that v3 is not in a suitable state to be accepted, as
>> there is some hidden bug in it.
> For one, we could take v2 (albeit that would feel odd).
>
> And then, from v3 showing worse performance (which only you
> have seen the numbers for, and [hopefully] know what deviations
> in them really mean)

The highlights for both v2 and v3 are on xen-devel.

>  it does not follow that v3 is buggy.

It very much does follow.

>  That's one
> possibility, sure (albeit hard to explain, as functionally everything
> is working fine for both you and me, afaik), but not the only one.
> Another may be that hardware processes CR4 writes faster when
> they happen in close succession to CR4 reads. And I'm sure more
> could be come up with.

v3 has logic intended to omit cr4 changes while in the context of 32bit
PV guest userspace.  This in turn was intended to increase performance
by reducing the quantity of TLB flushes.

Even if hardware has logic to speed up CR4 writes when following reads,
the complete omission of the writes in the first place should
necessarily be faster still.

v3 performs ~20% worse compared to v2, which indicates that one of the
logical steps is wrong.

>
> And finally, I'm not tied to this patch set. I'd be fine with some
> other fix to restore correct behavior. What I'm not fine with is the
> limbo state this is being left in, with me all the time having to revive
> the discussion.

I agree that context switching cr4 is the only way of allowing 32bit PV
guests to function without impacting other functionality.  After all,
the PV guest is doing precisely what SMAP is intended to catch.

I also accept that this will come with a performance overhead, but
frankly, the performance regression on these specific patches is
massive.  I am very concerned that the first thing I will have to do in
XenServer is revert them.

I don't like this situation, but blindly ploughing ahead given these
issues is also a problem.

~Andrew


* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-04 10:03             ` Andrew Cooper
@ 2016-05-04 13:35               ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-05-04 13:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	IanJackson, xen-devel, FengWu

>>> On 04.05.16 at 12:03, <andrew.cooper3@citrix.com> wrote:
> v3 has logic intended to omit cr4 changes while in the context of 32bit
> PV guest userspace.  This in turn was intended to increase performance
> by reducing the quantity of TLB flushes.
> 
> Even if hardware has logic to speed up CR4 writes when following reads,
> the complete omission of the writes in the first place should
> necessarily be faster still.
> 
> v3 performs ~20% worse compared to v2, which indicates that one of the
> logical steps is wrong.

Yet no-one can spot it?

May I suggest re-measuring v2 and v3 on an otherwise identical
baseline (which I'm relatively sure was not the case for the previous
measurements)? Furthermore the delta between v2 and v3 was
larger than the exit-to-guest-user-mode adjustment. Hence another
thing to consider is some kind of bisection between the two versions, at
least at the granularity of the three distinct changes done.

And then, looking at the interdiff, I note that it is quite important to
know whether you measured a debug or a release build. If it was a
debug one, the change to the debugging portion of cr4_pv32_restore
makes it so that the CR4 read is not only not avoided there, but that
read now happens in close succession to a prior write, which afaik
may incur further stalls.

>> And finally, I'm not tied to this patch set. I'd be fine with some
>> other fix to restore correct behavior. What I'm not fine with is the
>> limbo state this is being left in, with me all the time having to revive
>> the discussion.
> 
> I agree that context switching cr4 is the only way of allowing 32bit PV
> guests to function without impacting other functionality.

See below - what makes you assume this would come at a lower
price?

>  After all,
> the PV guest is doing precisely what SMAP is intended to catch.

I don't understand this part, especially when - from the previous
sentence - the context is "32-bit PV guest".

> I also accept that this will come with a performance overhead, but
> frankly, the performance regression on these specific patches is
> massive.

Yet it has to be assumed that any other approach (e.g. the
context switching one you mention) wouldn't be any better.

>  I am very concerned that the first thing I will have to do in
> XenServer is revert them.

That shouldn't be a primary reason to not make forward progress
in the community project. Or else I could equally well say it has to
go in because otherwise we will have to carry the patch locally in
our product trees.

> I don't like this situation, but blindly ploughing ahead given these
> issues is also a problem.

Re-emphasizing that there is no known or apparent functional
bug with this patch, figuring out the performance aspects (beyond
what I've suggested above) is where I think we would need Intel's
help. Yet you've seen Feng's reply, which was a result of us
pushing for their input. And once again - as long as there's no
functional bug, I think we either need to make progress in identifying
the reason (which I don't see much I can do about), or - also taking into
account for how long the thing has been broken in the tree - simply
commit what is there and deal with the performance fallout later.
Leaving functionally broken code in the tree for such an extended
(and otherwise unbounded) period of time is a no-go imo.

Jan



* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-03 13:58       ` Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
  2016-05-03 14:10         ` Andrew Cooper
  2016-05-04  3:07         ` Wu, Feng
@ 2016-05-13 15:21         ` Wei Liu
  2016-05-13 15:30           ` Jan Beulich
  2 siblings, 1 reply; 67+ messages in thread
From: Wei Liu @ 2016-05-13 15:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel, Feng Wu

On Tue, May 03, 2016 at 07:58:58AM -0600, Jan Beulich wrote:
> >>> On 17.03.16 at 09:03,  wrote:
> > Since such guests' kernel code runs in ring 1, their memory accesses,
> > at the paging layer, are supervisor mode ones, and hence subject to
> > SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> > two features though (and so far we also don't expose the respective
> > feature flags), and hence may suffer page faults they cannot deal with.
> > 
> > While the placement of the re-enabling slightly weakens the intended
> > protection, it was selected such that 64-bit paths would remain
> > unaffected where possible. At the expense of a further performance hit
> > the re-enabling could be put right next to the CLACs.
> > 
> > Note that this introduces a number of extra TLB flushes - CR4.SMEP
> > transitioning from 0 to 1 always causes a flush, and it transitioning
> > from 1 to 0 may also do.
> > 
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> So I think we need to take some decision here, and I'm afraid
> Andrew and I won't be able to settle this between us. He's
> validly concerned about the performance impact this got proven
> to have (for 32-bit PV guests), yet I continue to think that correct
> behavior is more relevant than performance. Hence I think we
> should bite the bullet and take the change. For those who value
> performance more than security, they can always disable the use
> of SMEP and SMAP via command line option.
> 
> Of course I'm also concerned that Intel, who did introduce the
> functional regression in the first place, so far didn't participate at
> all in finding an acceptable solution to the problem at hand...
> 

So this thread has not produced a conclusion. What do we need to do
about this issue?

Do we have a set of patches that make things behave correctly
(regardless of its performance impact)?

Wei.

> Jan
> 


* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-13 15:21         ` Wei Liu
@ 2016-05-13 15:30           ` Jan Beulich
  2016-05-13 15:33             ` Wei Liu
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-05-13 15:30 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, Feng Wu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

>>> On 13.05.16 at 17:21, <wei.liu2@citrix.com> wrote:
> On Tue, May 03, 2016 at 07:58:58AM -0600, Jan Beulich wrote:
>> >>> On 17.03.16 at 09:03,  wrote:
>> > Since such guests' kernel code runs in ring 1, their memory accesses,
>> > at the paging layer, are supervisor mode ones, and hence subject to
>> > SMAP/SMEP checks. Such guests cannot be expected to be aware of those
>> > two features though (and so far we also don't expose the respective
>> > feature flags), and hence may suffer page faults they cannot deal with.
>> > 
>> > While the placement of the re-enabling slightly weakens the intended
>> > protection, it was selected such that 64-bit paths would remain
>> > unaffected where possible. At the expense of a further performance hit
>> > the re-enabling could be put right next to the CLACs.
>> > 
>> > Note that this introduces a number of extra TLB flushes - CR4.SMEP
>> > transitioning from 0 to 1 always causes a flush, and it transitioning
>> > from 1 to 0 may also do.
>> > 
>> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> 
>> So I think we need to take some decision here, and I'm afraid
>> Andrew and I won't be able to settle this between us. He's
>> validly concerned about the performance impact this got proven
>> to have (for 32-bit PV guests), yet I continue to think that correct
>> behavior is more relevant than performance. Hence I think we
>> should bite the bullet and take the change. For those who value
>> performance more than security, they can always disable the use
>> of SMEP and SMAP via command line option.
>> 
>> Of course I'm also concerned that Intel, who did introduce the
>> functional regression in the first place, so far didn't participate at
>> all in finding an acceptable solution to the problem at hand...
>> 
> 
> So this thread has not produced a conclusion. What do we need to do
> about this issue?

Andrew privately agreed to no longer object, but of course
subject to him doing (another) proper review.

> Do we have a set of patches that make things behave correctly
> (regardless of its performance impact)?

Yes, this one.

Jan



* Re: Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-05-13 15:30           ` Jan Beulich
@ 2016-05-13 15:33             ` Wei Liu
  0 siblings, 0 replies; 67+ messages in thread
From: Wei Liu @ 2016-05-13 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel, Feng Wu

On Fri, May 13, 2016 at 09:30:23AM -0600, Jan Beulich wrote:
> >>> On 13.05.16 at 17:21, <wei.liu2@citrix.com> wrote:
> > On Tue, May 03, 2016 at 07:58:58AM -0600, Jan Beulich wrote:
> >> >>> On 17.03.16 at 09:03,  wrote:
> >> > Since such guests' kernel code runs in ring 1, their memory accesses,
> >> > at the paging layer, are supervisor mode ones, and hence subject to
> >> > SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> >> > two features though (and so far we also don't expose the respective
> >> > feature flags), and hence may suffer page faults they cannot deal with.
> >> > 
> >> > While the placement of the re-enabling slightly weakens the intended
> >> > protection, it was selected such that 64-bit paths would remain
> >> > unaffected where possible. At the expense of a further performance hit
> >> > the re-enabling could be put right next to the CLACs.
> >> > 
> >> > Note that this introduces a number of extra TLB flushes - CR4.SMEP
> >> > transitioning from 0 to 1 always causes a flush, and it transitioning
> >> > from 1 to 0 may also do.
> >> > 
> >> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> 
> >> So I think we need to take some decision here, and I'm afraid
> >> Andrew and I won't be able to settle this between us. He's
> >> validly concerned about the performance impact this got proven
> >> to have (for 32-bit PV guests), yet I continue to think that correct
> >> behavior is more relevant than performance. Hence I think we
> >> should bite the bullet and take the change. For those who value
> >> performance more than security, they can always disable the use
> >> of SMEP and SMAP via command line option.
> >> 
> >> Of course I'm also concerned that Intel, who did introduce the
> >> functional regression in the first place, so far didn't participate at
> >> all in finding an acceptable solution to the problem at hand...
> >> 
> > 
> > So this thread has not produced a conclusion. What do we need to do
> > about this issue?
> 
> Andrew privately agreed to no longer object, but of course
> subject to him doing (another) proper review.
> 
> > Do we have a set of patches that make things behave correctly
> > (regardless of its performance impact)?
> 
> Yes, this one.
> 

OK. I will wait for him to review this series.

Wei.

> Jan
> 


* Re: [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-10  9:53   ` [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
@ 2016-05-13 15:48     ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 15:48 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Feng Wu

On 10/03/16 09:53, Jan Beulich wrote:
> Since such guests' kernel code runs in ring 1, their memory accesses,
> at the paging layer, are supervisor mode ones, and hence subject to
> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> two features though (and so far we also don't expose the respective
> feature flags), and hence may suffer page faults they cannot deal with.
>
> While the placement of the re-enabling slightly weakens the intended
> protection, it was selected such that 64-bit paths would remain
> unaffected where possible. At the expense of a further performance hit
> the re-enabling could be put right next to the CLACs.
>
> Note that this introduces a number of extra TLB flushes - CR4.SMEP
> transitioning from 0 to 1 always causes a flush, and it transitioning
> from 1 to 0 may also do.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 2/3] x86: use optimal NOPs to fill the SMEP/SMAP placeholders
  2016-03-10  9:54   ` [PATCH v2 2/3] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
@ 2016-05-13 15:49     ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 15:49 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Feng Wu, Keir Fraser

On 10/03/16 09:54, Jan Beulich wrote:
> Alternatives patching code picks the most suitable NOPs for the
> running system, so simply use it to replace the pre-populated ones.
>
> Use an arbitrary, always available feature to key off from, but
> hide this behind the new X86_FEATURE_ALWAYS.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders
  2016-03-17  8:03       ` [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
@ 2016-05-13 15:57         ` Andrew Cooper
  2016-05-13 16:06           ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 15:57 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Feng Wu

On 17/03/16 08:03, Jan Beulich wrote:
> Alternatives patching code picks the most suitable NOPs for the
> running system, so simply use it to replace the pre-populated ones.
>
> Use an arbitrary, always available feature to key off from, but
> hide this behind the new X86_FEATURE_ALWAYS.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3: Re-base.
> v2: Introduce and use X86_FEATURE_ALWAYS.
>
> --- a/xen/arch/x86/x86_64/compat/entry.S
> +++ b/xen/arch/x86/x86_64/compat/entry.S
> @@ -175,12 +175,7 @@ compat_bad_hypercall:
>  ENTRY(compat_restore_all_guest)
>          ASSERT_INTERRUPTS_DISABLED
>  .Lcr4_orig:
> -        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
> -        ASM_NOP2 /* jpe   .Lcr4_alt_end */
> -        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
> -        ASM_NOP6 /* and   $..., %rax */
> -        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
> -        ASM_NOP3 /* mov   %rax, %cr4 */
> +        .skip (.Lcr4_alt_end - .Lcr4_alt) - (. - .Lcr4_orig), 0x90
>  .Lcr4_orig_end:
>          .pushsection .altinstr_replacement, "ax"
>  .Lcr4_alt:

This hunk should live in patch 2.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code
  2016-03-17  8:03       ` [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
  2016-03-25 18:01         ` Konrad Rzeszutek Wilk
@ 2016-05-13 15:58         ` Andrew Cooper
  1 sibling, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 15:58 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Feng Wu

On 17/03/16 08:03, Jan Beulich wrote:
> Since such guests' kernel code runs in ring 1, their memory accesses,
> at the paging layer, are supervisor mode ones, and hence subject to
> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> two features though (and so far we also don't expose the respective
> feature flags), and hence may suffer page faults they cannot deal with.
>
> While the placement of the re-enabling slightly weakens the intended
> protection, it was selected such that 64-bit paths would remain
> unaffected where possible. At the expense of a further performance hit
> the re-enabling could be put right next to the CLACs.
>
> Note that this introduces a number of extra TLB flushes - CR4.SMEP
> transitioning from 0 to 1 always causes a flush, and it transitioning
> from 1 to 0 may also do.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Sorry - I reviewed the v3 code, and replied to the v2 email.

For clarity's sake, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders
  2016-05-13 15:57         ` Andrew Cooper
@ 2016-05-13 16:06           ` Jan Beulich
  2016-05-13 16:09             ` Andrew Cooper
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2016-05-13 16:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser, Feng Wu

>>> On 13.05.16 at 17:57, <andrew.cooper3@citrix.com> wrote:
> On 17/03/16 08:03, Jan Beulich wrote:
>> Alternatives patching code picks the most suitable NOPs for the
>> running system, so simply use it to replace the pre-populated ones.
>>
>> Use an arbitrary, always available feature to key off from, but
>> hide this behind the new X86_FEATURE_ALWAYS.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v3: Re-base.
>> v2: Introduce and use X86_FEATURE_ALWAYS.
>>
>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -175,12 +175,7 @@ compat_bad_hypercall:
>>  ENTRY(compat_restore_all_guest)
>>          ASSERT_INTERRUPTS_DISABLED
>>  .Lcr4_orig:
>> -        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
>> -        ASM_NOP2 /* jpe   .Lcr4_alt_end */
>> -        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
>> -        ASM_NOP6 /* and   $..., %rax */
>> -        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
>> -        ASM_NOP3 /* mov   %rax, %cr4 */
>> +        .skip (.Lcr4_alt_end - .Lcr4_alt) - (. - .Lcr4_orig), 0x90
>>  .Lcr4_orig_end:
>>          .pushsection .altinstr_replacement, "ax"
>>  .Lcr4_alt:
> 
> This hunk should live in patch 2.

No. In patch 2 we want to leverage multi-byte NOPs. Here, knowing
they're going to be replaced anyway, we are fine with using the
simpler .fill (producing many single byte ones).

> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Does this stand nevertheless?

Jan



* Re: [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders
  2016-05-13 16:06           ` Jan Beulich
@ 2016-05-13 16:09             ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 16:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu

On 13/05/16 17:06, Jan Beulich wrote:
>>>> On 13.05.16 at 17:57, <andrew.cooper3@citrix.com> wrote:
>> On 17/03/16 08:03, Jan Beulich wrote:
>>> Alternatives patching code picks the most suitable NOPs for the
>>> running system, so simply use it to replace the pre-populated ones.
>>>
>>> Use an arbitrary, always available feature to key off from, but
>>> hide this behind the new X86_FEATURE_ALWAYS.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> v3: Re-base.
>>> v2: Introduce and use X86_FEATURE_ALWAYS.
>>>
>>> --- a/xen/arch/x86/x86_64/compat/entry.S
>>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>>> @@ -175,12 +175,7 @@ compat_bad_hypercall:
>>>  ENTRY(compat_restore_all_guest)
>>>          ASSERT_INTERRUPTS_DISABLED
>>>  .Lcr4_orig:
>>> -        ASM_NOP8 /* testb $3,UREGS_cs(%rsp) */
>>> -        ASM_NOP2 /* jpe   .Lcr4_alt_end */
>>> -        ASM_NOP8 /* mov   CPUINFO_cr4...(%rsp), %rax */
>>> -        ASM_NOP6 /* and   $..., %rax */
>>> -        ASM_NOP8 /* mov   %rax, CPUINFO_cr4...(%rsp) */
>>> -        ASM_NOP3 /* mov   %rax, %cr4 */
>>> +        .skip (.Lcr4_alt_end - .Lcr4_alt) - (. - .Lcr4_orig), 0x90
>>>  .Lcr4_orig_end:
>>>          .pushsection .altinstr_replacement, "ax"
>>>  .Lcr4_alt:
>> This hunk should live in patch 2.
> No. In patch 2 we want to leverage multi-byte NOPs. Here, knowing
> they're going to be replaced anyway, we are fine with using the
> simpler .fill (producing many single byte ones).
>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Does this stand nevertheless?

Yes.

~Andrew


* Re: [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses
  2016-03-17 16:14       ` [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses Jan Beulich
  2016-03-25 18:47         ` Konrad Rzeszutek Wilk
@ 2016-05-13 16:11         ` Andrew Cooper
  1 sibling, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 16:11 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Feng Wu

On 17/03/16 16:14, Jan Beulich wrote:
> Instead of addressing these fields via the base of the stack (which
> uniformly requires 4-byte displacements), address them from the end
> (which for everything other than guest_cpu_user_regs requires just
> 1-byte ones). This yields a code size reduction somewhere between 8k
> and 12k in my builds.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>


* Re: [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
                         ` (5 preceding siblings ...)
  2016-05-03 13:58       ` Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
@ 2016-05-13 17:02       ` Wei Liu
  2016-05-13 17:21         ` Andrew Cooper
  2016-06-21  6:19       ` Wu, Feng
  7 siblings, 1 reply; 67+ messages in thread
From: Wei Liu @ 2016-05-13 17:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Wei Liu, Feng Wu, Andrew Cooper

On Thu, Mar 17, 2016 at 01:50:39AM -0600, Jan Beulich wrote:
> As has been explained previously[1], SMAP (and with less relevance
> also SMEP) is not compatible with 32-bit PV guests which aren't
> aware/prepared to be run with that feature enabled. Andrew's
> original approach either sacrificed architectural correctness for
> making 32-bit guests work again, or disabled SMAP also for not
> insignificant portions of hypervisor code, both by allowing to control
> the workaround mode via command line option.
> 
> This alternative approach disables SMEP and SMAP only while
> running 32-bit PV guest code plus a few hypervisor instructions
> early after entering hypervisor context from or late before exiting
> to such guests. Those few instructions (in assembly source) are of
> course much easier to prove not to perform undue memory
> accesses than code paths reaching deep into C sources.
> 
> The 4th patch really is unrelated except for not applying cleanly
> without the earlier ones, and the potential having been noticed
> while putting together the 2nd one.
> 
> 1: move cached CR4 value to struct cpu_info
> 2: suppress SMEP and SMAP while running 32-bit PV guest code
> 3: use optimal NOPs to fill the SMEP/SMAP placeholders
> 4: use 32-bit loads for 32-bit PV guest state reload
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Release-acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling
  2016-05-13 17:02       ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Wei Liu
@ 2016-05-13 17:21         ` Andrew Cooper
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2016-05-13 17:21 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich; +Cc: xen-devel, Keir Fraser, Feng Wu

On 13/05/16 18:02, Wei Liu wrote:
> On Thu, Mar 17, 2016 at 01:50:39AM -0600, Jan Beulich wrote:
>> As has been explained previously[1], SMAP (and with less relevance
>> also SMEP) is not compatible with 32-bit PV guests which aren't
>> aware/prepared to be run with that feature enabled. Andrew's
>> original approach either sacrificed architectural correctness for
>> making 32-bit guests work again, or disabled SMAP also for not
>> insignificant portions of hypervisor code, both by allowing to control
>> the workaround mode via command line option.
>>
>> This alternative approach disables SMEP and SMAP only while
>> running 32-bit PV guest code plus a few hypervisor instructions
>> early after entering hypervisor context from or late before exiting
>> to such guests. Those few instructions (in assembly source) are of
>> course much easier to prove not to perform undue memory
>> accesses than code paths reaching deep into C sources.
>>
>> The 4th patch really is unrelated except for not applying cleanly
>> without the earlier ones, and the potential having been noticed
>> while putting together the 2nd one.
>>
>> 1: move cached CR4 value to struct cpu_info
>> 2: suppress SMEP and SMAP while running 32-bit PV guest code
>> 3: use optimal NOPs to fill the SMEP/SMAP placeholders
>> 4: use 32-bit loads for 32-bit PV guest state reload
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Release-acked-by: Wei Liu <wei.liu2@citrix.com>

And applied.

~Andrew


* Re: [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling
  2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
                         ` (6 preceding siblings ...)
  2016-05-13 17:02       ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Wei Liu
@ 2016-06-21  6:19       ` Wu, Feng
  2016-06-21  7:17         ` Jan Beulich
  7 siblings, 1 reply; 67+ messages in thread
From: Wu, Feng @ 2016-06-21  6:19 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Keir Fraser, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, March 17, 2016 3:51 PM
> To: xen-devel <xen-devel@lists.xenproject.org>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Wu, Feng
> <feng.wu@intel.com>; Keir Fraser <keir@xen.org>
> Subject: [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP
> handling
> 
> As has been explained previously[1], SMAP (and with less relevance
> also SMEP) is not compatible with 32-bit PV guests which aren't
> aware/prepared to be run with that feature enabled. Andrew's
> original approach either sacrificed architectural correctness for
> making 32-bit guests work again, or disabled SMAP also for not
> insignificant portions of hypervisor code, both by allowing to control
> the workaround mode via command line option.

Hi Jan, do you mind sharing more about Andrew's original approach?
And to solve this issue can we just disable SMEP/SMAP for Xen itself,
hence only HVM will benefit from this feature?

Thanks,
Feng

> 
> This alternative approach disables SMEP and SMAP only while
> running 32-bit PV guest code plus a few hypervisor instructions
> early after entering hypervisor context from or late before exiting
> to such guests. Those few instructions (in assembly source) are of
> course much easier to prove not to perform undue memory
> accesses than code paths reaching deep into C sources.
> 
> The 4th patch really is unrelated except for not applying cleanly
> without the earlier ones, and the potential having been noticed
> while putting together the 2nd one.
> 
> 1: move cached CR4 value to struct cpu_info
> 2: suppress SMEP and SMAP while running 32-bit PV guest code
> 3: use optimal NOPs to fill the SMEP/SMAP placeholders
> 4: use 32-bit loads for 32-bit PV guest state reload
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3: New patch 1, as a prereq to changes in patch 2 (previously
>      1). The primary reason for this are performance issues that
>      have been found by Andrew with the previous version.
> v2: Various changes to patches 1 and 2 - see there.
> 
> [1] http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg03888.html



* Re: [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling
  2016-06-21  6:19       ` Wu, Feng
@ 2016-06-21  7:17         ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2016-06-21  7:17 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 21.06.16 at 08:19, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, March 17, 2016 3:51 PM
>> 
>> As has been explained previously[1], SMAP (and with less relevance
>> also SMEP) is not compatible with 32-bit PV guests which aren't
>> aware/prepared to be run with that feature enabled. Andrew's
>> original approach either sacrificed architectural correctness for
>> making 32-bit guests work again, or disabled SMAP also for not
>> insignificant portions of hypervisor code, both by allowing to control
>> the workaround mode via command line option.
> 
> Hi Jan, do you mind sharing more about Andrew's original approach?
> And to solve this issue can we just disable SMEP/SMAP for Xen itself,
> hence only HVM will benefit from this feature?

Did you look at the link still visible below? If you did, please be more
precise about the details you need.

>> This alternative approach disables SMEP and SMAP only while
>> running 32-bit PV guest code plus a few hypervisor instructions
>> early after entering hypervisor context from or late before exiting
>> to such guests. Those few instructions (in assembly source) are of
>> course much easier to prove not to perform undue memory
>> accesses than code paths reaching deep into C sources.
>> 
>> The 4th patch really is unrelated except for not applying cleanly
>> without the earlier ones, and the potential having been noticed
>> while putting together the 2nd one.
>> 
>> 1: move cached CR4 value to struct cpu_info
>> 2: suppress SMEP and SMAP while running 32-bit PV guest code
>> 3: use optimal NOPs to fill the SMEP/SMAP placeholders
>> 4: use 32-bit loads for 32-bit PV guest state reload
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v3: New patch 1, as a prereq to changes in patch 2 (previously
>>      1). The primary reason for this are performance issues that
>>      have been found by Andrew with the previous version.
>> v2: Various changes to patches 1 and 2 - see there.
>> 
>> [1] http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg03888.html 





end of thread, other threads:[~2016-06-21  7:17 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-04 11:08 [PATCH 0/4] x86: accommodate 32-bit PV guests with SMAP/SMEP handling Jan Beulich
2016-03-04 11:27 ` [PATCH 1/4] x86/alternatives: correct near branch check Jan Beulich
2016-03-07 15:43   ` Andrew Cooper
2016-03-07 15:56     ` Jan Beulich
2016-03-07 16:11       ` Andrew Cooper
2016-03-07 16:21         ` Jan Beulich
2016-03-08 17:33           ` Andrew Cooper
2016-03-04 11:27 ` [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code Jan Beulich
2016-03-07 16:59   ` Andrew Cooper
2016-03-08  7:57     ` Jan Beulich
2016-03-09  8:09       ` Wu, Feng
2016-03-09 14:09         ` Jan Beulich
2016-03-09 11:19       ` Andrew Cooper
2016-03-09 14:28         ` Jan Beulich
2016-03-09  8:09   ` Wu, Feng
2016-03-09 10:45     ` Andrew Cooper
2016-03-09 12:27       ` Wu, Feng
2016-03-09 12:33         ` Andrew Cooper
2016-03-09 12:36           ` Jan Beulich
2016-03-09 12:54             ` Wu, Feng
2016-03-09 13:35             ` Wu, Feng
2016-03-09 13:42               ` Andrew Cooper
2016-03-09 14:03       ` Jan Beulich
2016-03-09 14:07     ` Jan Beulich
2016-03-04 11:28 ` [PATCH 3/4] x86: use optimal NOPs to fill the SMAP/SMEP placeholders Jan Beulich
2016-03-07 17:43   ` Andrew Cooper
2016-03-08  8:02     ` Jan Beulich
2016-03-04 11:29 ` [PATCH 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
2016-03-07 17:45   ` Andrew Cooper
2016-03-10  9:44 ` [PATCH v2 0/3] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
2016-03-10  9:53   ` [PATCH v2 1/3] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
2016-05-13 15:48     ` Andrew Cooper
2016-03-10  9:54   ` [PATCH v2 2/3] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
2016-05-13 15:49     ` Andrew Cooper
2016-03-10  9:55   ` [PATCH v2 3/3] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
     [not found]   ` <56E9A0DB02000078000DD54C@prv-mh.provo.novell.com>
2016-03-17  7:50     ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Jan Beulich
2016-03-17  8:02       ` [PATCH v3 1/4] x86: move cached CR4 value to struct cpu_info Jan Beulich
2016-03-17 16:20         ` Andrew Cooper
2016-03-17  8:03       ` [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
2016-03-25 18:01         ` Konrad Rzeszutek Wilk
2016-03-29  6:55           ` Jan Beulich
2016-05-13 15:58         ` Andrew Cooper
2016-03-17  8:03       ` [PATCH v3 3/4] x86: use optimal NOPs to fill the SMEP/SMAP placeholders Jan Beulich
2016-05-13 15:57         ` Andrew Cooper
2016-05-13 16:06           ` Jan Beulich
2016-05-13 16:09             ` Andrew Cooper
2016-03-17  8:04       ` [PATCH v3 4/4] x86: use 32-bit loads for 32-bit PV guest state reload Jan Beulich
2016-03-25 18:02         ` Konrad Rzeszutek Wilk
2016-03-17 16:14       ` [PATCH v3 5/4] x86: reduce code size of struct cpu_info member accesses Jan Beulich
2016-03-25 18:47         ` Konrad Rzeszutek Wilk
2016-03-29  6:59           ` Jan Beulich
2016-03-30 14:28             ` Konrad Rzeszutek Wilk
2016-03-30 14:42               ` Jan Beulich
2016-05-13 16:11         ` Andrew Cooper
2016-05-03 13:58       ` Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code Jan Beulich
2016-05-03 14:10         ` Andrew Cooper
2016-05-03 14:25           ` Jan Beulich
2016-05-04 10:03             ` Andrew Cooper
2016-05-04 13:35               ` Jan Beulich
2016-05-04  3:07         ` Wu, Feng
2016-05-13 15:21         ` Wei Liu
2016-05-13 15:30           ` Jan Beulich
2016-05-13 15:33             ` Wei Liu
2016-05-13 17:02       ` [PATCH v3 0/4] x86: accommodate 32-bit PV guests with SMEP/SMAP handling Wei Liu
2016-05-13 17:21         ` Andrew Cooper
2016-06-21  6:19       ` Wu, Feng
2016-06-21  7:17         ` Jan Beulich
