xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems
@ 2019-11-01 20:24 Andrew Cooper
  2019-11-01 20:25 ` [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Andrew Cooper @ 2019-11-01 20:24 UTC (permalink / raw)
  To: Xen-devel
  Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

I decided to dust off my early CPUID adjustments work in an effort to get
patch 3 in a sensible state for 4.13.  Ever so slightly RFC for 4.13 given
where we are in the release, but this is fairly self contained.

Andrew Cooper (2):
  x86/boot: Remove cached CPUID data from the trampoline
  x86/boot: Cache cpu_has_hypervisor very early on boot

Sergey Dyasli (1):
  x86/e820: fix 640k - 1M region reservation logic

 xen/arch/x86/apic.c             |  2 +-
 xen/arch/x86/boot/head.S        | 13 +++++++++++--
 xen/arch/x86/boot/trampoline.S  | 13 +++++--------
 xen/arch/x86/boot/wakeup.S      | 13 ++-----------
 xen/arch/x86/cpu/common.c       |  3 ---
 xen/arch/x86/cpu/intel.c        |  1 +
 xen/arch/x86/e820.c             |  4 ++--
 xen/arch/x86/efi/efi-boot.h     | 12 ++++++++----
 xen/arch/x86/guest/xen.c        |  6 +-----
 xen/arch/x86/mm.c               |  3 +--
 xen/include/asm-x86/processor.h |  2 +-
 11 files changed, 33 insertions(+), 39 deletions(-)

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-01 20:24 [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Andrew Cooper
@ 2019-11-01 20:25 ` Andrew Cooper
  2019-11-04 13:25   ` Jan Beulich
  2019-11-01 20:25 ` [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-01 20:25 UTC (permalink / raw)
  To: Xen-devel
  Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

We have a cached cpuid_ext_features in the trampoline which is kept in sync by
various pieces of boot logic.  This is complicated, and all it is actually
used for is to derive whether NX is safe to use.

Replace it with a canned value to load into EFER.

trampoline_setup() and efi_arch_cpu() now tweak trampoline_efer at the point
that they are stashing the main copy of CPUID data.  Similarly,
early_init_intel() needs to tweak if it has re-enabled the use of NX.

This simplifies the AP boot and S3 resume paths by using trampoline_efer
directly, rather than locally turning FEATURE_NX into EFER_NX.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Sergey Dyasli <sergey.dyasli@citrix.com>
CC: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/boot/head.S        |  9 +++++++--
 xen/arch/x86/boot/trampoline.S  | 13 +++++--------
 xen/arch/x86/boot/wakeup.S      | 13 ++-----------
 xen/arch/x86/cpu/common.c       |  3 ---
 xen/arch/x86/cpu/intel.c        |  1 +
 xen/arch/x86/efi/efi-boot.h     |  8 +++++---
 xen/include/asm-x86/processor.h |  2 +-
 7 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index a1564b520b..77309e3c82 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -640,8 +640,13 @@ trampoline_setup:
         jbe     1f
         mov     $0x80000001,%eax
         cpuid
-1:      mov     %edx,sym_fs(cpuid_ext_features)
-        mov     %edx,sym_fs(boot_cpu_data)+CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM)
+1:      mov     %edx, sym_fs(boot_cpu_data) + CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM)
+
+        /* Check for NX. Adjust EFER setting if available. */
+        bt      $cpufeat_bit(X86_FEATURE_NX), %edx
+        jnc     1f
+        orb     $EFER_NX >> 8, 1 + sym_esi(trampoline_efer)
+1:
 
         /* Check for availability of long mode. */
         bt      $cpufeat_bit(X86_FEATURE_LM),%edx
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 870ec79a2d..26584493bb 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -88,8 +88,9 @@ trampoline_gdt:
 GLOBAL(trampoline_misc_enable_off)
         .quad   0
 
-GLOBAL(cpuid_ext_features)
-        .long   0
+/* EFER OR-mask for boot paths.  This gets adjusted with NX when available. */
+GLOBAL(trampoline_efer)
+        .long   EFER_LME | EFER_SCE
 
 GLOBAL(trampoline_xen_phys_start)
         .long   0
@@ -132,14 +133,10 @@ trampoline_protmode_entry:
 1:
 
         /* Set up EFER (Extended Feature Enable Register). */
-        mov     bootsym_rel(cpuid_ext_features,4,%edi)
         movl    $MSR_EFER,%ecx
         rdmsr
-        or      $EFER_LME|EFER_SCE,%eax   /* Long Mode + SYSCALL/SYSRET */
-        bt      $cpufeat_bit(X86_FEATURE_NX),%edi /* No Execute? */
-        jnc     1f
-        btsl    $_EFER_NX,%eax  /* No Execute     */
-1:      wrmsr
+        or      bootsym_rel(trampoline_efer, 4, %eax)
+        wrmsr
 
         mov     $(X86_CR0_PG | X86_CR0_AM | X86_CR0_WP | X86_CR0_NE |\
                   X86_CR0_ET | X86_CR0_MP | X86_CR0_PE), %eax
diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index 25ec2fa32b..fc47721f43 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -131,20 +131,11 @@ wakeup_32:
         wrmsr
 1:
 
-        /* Will cpuid feature change after resume? */
         /* Set up EFER (Extended Feature Enable Register). */
-        mov     bootsym_rel(cpuid_ext_features,4,%edi)
-        test    $0x20100800,%edi /* SYSCALL/SYSRET, No Execute, Long Mode? */
-        jz      .Lskip_eferw
         movl    $MSR_EFER,%ecx
         rdmsr
-        btsl    $_EFER_LME,%eax /* Long Mode      */
-        btsl    $_EFER_SCE,%eax /* SYSCALL/SYSRET */
-        btl     $20,%edi        /* No Execute?    */
-        jnc     1f
-        btsl    $_EFER_NX,%eax  /* No Execute     */
-1:      wrmsr
-.Lskip_eferw:
+        or      bootsym_rel(trampoline_efer, 4, %eax)
+        wrmsr
 
         wbinvd
 
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 6c6bd63301..e5ad17d8d9 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -391,9 +391,6 @@ static void generic_identify(struct cpuinfo_x86 *c)
 		cpuid(0x80000001, &tmp, &tmp,
 		      &c->x86_capability[cpufeat_word(X86_FEATURE_LAHF_LM)],
 		      &c->x86_capability[cpufeat_word(X86_FEATURE_SYSCALL)]);
-	if (c == &boot_cpu_data)
-		bootsym(cpuid_ext_features) =
-			c->x86_capability[cpufeat_word(X86_FEATURE_NX)];
 
 	if (c->extended_cpuid_level >= 0x80000004)
 		get_model_name(c); /* Default name */
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 5356a6ae10..4d7324e4d0 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (disable) {
 		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
 		bootsym(trampoline_misc_enable_off) |= disable;
+		bootsym(trampoline_efer) |= EFER_NX;
 	}
 
 	if (disable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID)
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 940ce12706..cde193a771 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -238,7 +238,7 @@ static void __init noreturn efi_arch_post_exit_boot(void)
     asm volatile("pushq $0\n\tpopfq");
     rdmsrl(MSR_EFER, efer);
     efer |= EFER_SCE;
-    if ( cpuid_ext_features & cpufeat_mask(X86_FEATURE_NX) )
+    if ( cpu_has_nx )
         efer |= EFER_NX;
     wrmsrl(MSR_EFER, efer);
     write_cr0(X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
@@ -640,9 +640,11 @@ static void __init efi_arch_cpu(void)
 
     if ( (eax >> 16) == 0x8000 && eax > 0x80000000 )
     {
-        cpuid_ext_features = cpuid_edx(0x80000001);
         boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_SYSCALL)]
-            = cpuid_ext_features;
+            = cpuid_edx(0x80000001);
+
+        if ( cpu_has_nx )
+            trampoline_efer |= EFER_NX;
     }
 }
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index b686156ea0..45d8f5117e 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -151,7 +151,7 @@ extern void ctxt_switch_levelling(const struct vcpu *next);
 extern void (*ctxt_switch_masking)(const struct vcpu *next);
 
 extern bool_t opt_cpu_info;
-extern u32 cpuid_ext_features;
+extern u32 trampoline_efer;
 extern u64 trampoline_misc_enable_off;
 
 /* Maximum width of physical addresses supported by the hardware. */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
  2019-11-01 20:24 [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Andrew Cooper
  2019-11-01 20:25 ` [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline Andrew Cooper
@ 2019-11-01 20:25 ` Andrew Cooper
  2019-11-04 13:32   ` Jan Beulich
  2019-11-05 11:08   ` Sergey Dyasli
  2019-11-01 20:25 ` [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic Andrew Cooper
  2019-11-20  5:20 ` [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Jürgen Groß
  3 siblings, 2 replies; 22+ messages in thread
From: Andrew Cooper @ 2019-11-01 20:25 UTC (permalink / raw)
  To: Xen-devel
  Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

We cache Long Mode and No Execute early on boot, so take the opportunity to
cache HYPERVISOR early as well.

Replace opencoded early access to the feature bit.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Sergey Dyasli <sergey.dyasli@citrix.com>
CC: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/apic.c         | 2 +-
 xen/arch/x86/boot/head.S    | 4 ++++
 xen/arch/x86/efi/efi-boot.h | 6 ++++--
 xen/arch/x86/guest/xen.c    | 6 +-----
 xen/arch/x86/mm.c           | 3 +--
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index a5f7b05d5a..a8ee18636f 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1156,7 +1156,7 @@ static void __init check_deadline_errata(void)
     const struct x86_cpu_id *m;
     unsigned int rev;
 
-    if ( boot_cpu_has(X86_FEATURE_HYPERVISOR) )
+    if ( cpu_has_hypervisor )
         return;
 
     m = x86_match_cpu(deadline_match);
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 77309e3c82..8d0ffbd1b0 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -630,6 +630,10 @@ trampoline_setup:
 
 1:
         /* Interrogate CPU extended features via CPUID. */
+        mov     $1, %eax
+        cpuid
+        mov     %ecx, sym_fs(boot_cpu_data) + CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR)
+
         mov     $0x80000000,%eax
         cpuid
         shld    $16,%eax,%ecx
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index cde193a771..232972eedf 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -637,11 +637,13 @@ static void __init efi_arch_handle_module(struct file *file, const CHAR16 *name,
 static void __init efi_arch_cpu(void)
 {
     uint32_t eax = cpuid_eax(0x80000000);
+    uint32_t *caps = boot_cpu_data.x86_capability;
+
+    caps[cpufeat_word(X86_FEATURE_HYPERVISOR)] = cpuid_ecx(1);
 
     if ( (eax >> 16) == 0x8000 && eax > 0x80000000 )
     {
-        boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_SYSCALL)]
-            = cpuid_edx(0x80000001);
+        caps[cpufeat_word(X86_FEATURE_SYSCALL)] = cpuid_edx(0x80000001);
 
         if ( cpu_has_nx )
             trampoline_efer |= EFER_NX;
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 7b7a5badab..a329e7c886 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -69,11 +69,7 @@ static void __init find_xen_leaves(void)
 
 void __init probe_hypervisor(void)
 {
-    if ( xen_guest )
-        return;
-
-    /* Too early to use cpu_has_hypervisor */
-    if ( !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) )
+    if ( xen_guest || !cpu_has_hypervisor )
         return;
 
     find_xen_leaves();
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 57f22775ac..bd8182f40f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6112,8 +6112,7 @@ const struct platform_bad_page *__init get_platform_badpages(unsigned int *array
     case 0x000506e0: /* errata SKL167 / SKW159 */
     case 0x000806e0: /* erratum KBL??? */
     case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
-        *array_size = (cpuid_eax(0) >= 7 &&
-                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
+        *array_size = (cpuid_eax(0) >= 7 && !cpu_has_hypervisor &&
                        (cpuid_count_ebx(7, 0) & cpufeat_mask(X86_FEATURE_HLE)));
         return &hle_bad_page;
     }
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic
  2019-11-01 20:24 [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Andrew Cooper
  2019-11-01 20:25 ` [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline Andrew Cooper
  2019-11-01 20:25 ` [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot Andrew Cooper
@ 2019-11-01 20:25 ` Andrew Cooper
  2019-11-04 13:35   ` Jan Beulich
  2019-11-20  5:20 ` [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Jürgen Groß
  3 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-01 20:25 UTC (permalink / raw)
  To: Xen-devel
  Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Jan Beulich, Roger Pau Monné

From: Sergey Dyasli <sergey.dyasli@citrix.com>

Converting a guest from PV to PV-in-PVH makes the guest to have 384k
less memory, which may confuse guest's balloon driver. This happens
because Xen unconditionally reserves 640k - 1M region in E820 despite
the fact that it's really a usable RAM in PVH boot mode.

Fix this by skipping region type change in virtualised environments,
trusting whatever memory map our hypervisor has provided.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Sergey Dyasli <sergey.dyasli@citrix.com>
CC: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/e820.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 8e8a2c4e1b..082f9928a1 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -318,9 +318,9 @@ static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map)
 
         /*
          * Some BIOSes claim RAM in the 640k - 1M region.
-         * Not right. Fix it up.
+         * Not right. Fix it up, but only when running on bare metal.
          */
-        if (type == E820_RAM) {
+        if (!cpu_has_hypervisor && type == E820_RAM) {
             if (start < 0x100000ULL && end > 0xA0000ULL) {
                 if (start < 0xA0000ULL)
                     add_memory_region(start, 0xA0000ULL-start, type);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-01 20:25 ` [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline Andrew Cooper
@ 2019-11-04 13:25   ` Jan Beulich
  2019-11-04 14:59     ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-04 13:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 01.11.2019 21:25, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>  	if (disable) {
>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>  		bootsym(trampoline_misc_enable_off) |= disable;
> +		bootsym(trampoline_efer) |= EFER_NX;
>  	}

I'm fine with all other changes here, just this one concerns me:
Before your change we latch a value into trampoline_misc_enable_off
just to be used for subsequent IA32_MISC_ENABLE writes we do. This
means that, on a hypervisor (like Xen itself) simply discarding
guest writes to the MSR (which is "fine" with the described usage
of ours up to now), with your change we'd now end up trying to set
EFER.NX when the feature may not actually be enabled in
IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
I.e. don't we need to read back the MSR value here, and verify
we actually managed to clear the bit before actually OR-ing in
EFER_NX?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
  2019-11-01 20:25 ` [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot Andrew Cooper
@ 2019-11-04 13:32   ` Jan Beulich
  2019-11-04 15:31     ` Andrew Cooper
  2019-11-05 11:08   ` Sergey Dyasli
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-04 13:32 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 01.11.2019 21:25, Andrew Cooper wrote:
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -630,6 +630,10 @@ trampoline_setup:
>  
>  1:
>          /* Interrogate CPU extended features via CPUID. */
> +        mov     $1, %eax
> +        cpuid
> +        mov     %ecx, sym_fs(boot_cpu_data) + CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR)

I understand the ECX output is all we need right now. But wouldn't
it be better to then store EDX as well (and similarly ECX of
0x80000001 output)?

Also, along the lines of a question back to Sergey on his
standalone patch, wouldn't it be better to take the opportunity
and check here that CPUID leaf 1 is actually valid?

Of course the question on the (intended) effect of
"cpuid=no-hypervisor" also remains. As said before, I think this
should be honored by all of our code that possibly can (i.e. at
least everything following command line parsing).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic
  2019-11-01 20:25 ` [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic Andrew Cooper
@ 2019-11-04 13:35   ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2019-11-04 13:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Xen-devel, Roger Pau Monné, Wei Liu, Sergey Dyasli

On 01.11.2019 21:25, Andrew Cooper wrote:
> From: Sergey Dyasli <sergey.dyasli@citrix.com>
> 
> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
> less memory, which may confuse guest's balloon driver. This happens
> because Xen unconditionally reserves 640k - 1M region in E820 despite
> the fact that it's really a usable RAM in PVH boot mode.
> 
> Fix this by skipping region type change in virtualised environments,
> trusting whatever memory map our hypervisor has provided.
> 
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-04 13:25   ` Jan Beulich
@ 2019-11-04 14:59     ` Andrew Cooper
  2019-11-04 15:03       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-04 14:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 04/11/2019 13:25, Jan Beulich wrote:
> On 01.11.2019 21:25, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/intel.c
>> +++ b/xen/arch/x86/cpu/intel.c
>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>  	if (disable) {
>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>  		bootsym(trampoline_misc_enable_off) |= disable;
>> +		bootsym(trampoline_efer) |= EFER_NX;
>>  	}
> I'm fine with all other changes here, just this one concerns me:
> Before your change we latch a value into trampoline_misc_enable_off
> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
> means that, on a hypervisor (like Xen itself) simply discarding
> guest writes to the MSR (which is "fine" with the described usage
> of ours up to now), with your change we'd now end up trying to set
> EFER.NX when the feature may not actually be enabled in
> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
> I.e. don't we need to read back the MSR value here, and verify
> we actually managed to clear the bit before actually OR-ing in
> EFER_NX?

If this is a problem in practice, execution won't continue beyond the
next if() condition just out of context, which set EFER.NX on the BSP
with an unguarded WRMSR.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-04 14:59     ` Andrew Cooper
@ 2019-11-04 15:03       ` Jan Beulich
  2019-11-04 15:22         ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-04 15:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 04.11.2019 15:59, Andrew Cooper wrote:
> On 04/11/2019 13:25, Jan Beulich wrote:
>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpu/intel.c
>>> +++ b/xen/arch/x86/cpu/intel.c
>>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>>  	if (disable) {
>>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>>  		bootsym(trampoline_misc_enable_off) |= disable;
>>> +		bootsym(trampoline_efer) |= EFER_NX;
>>>  	}
>> I'm fine with all other changes here, just this one concerns me:
>> Before your change we latch a value into trampoline_misc_enable_off
>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
>> means that, on a hypervisor (like Xen itself) simply discarding
>> guest writes to the MSR (which is "fine" with the described usage
>> of ours up to now), with your change we'd now end up trying to set
>> EFER.NX when the feature may not actually be enabled in
>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
>> I.e. don't we need to read back the MSR value here, and verify
>> we actually managed to clear the bit before actually OR-ing in
>> EFER_NX?
> 
> If this is a problem in practice, execution won't continue beyond the
> next if() condition just out of context, which set EFER.NX on the BSP
> with an unguarded WRMSR.

And how is this any good? This kind of crash is exactly what I'm
asking to avoid.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-04 15:03       ` Jan Beulich
@ 2019-11-04 15:22         ` Andrew Cooper
  2019-11-04 15:31           ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-04 15:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 04/11/2019 15:03, Jan Beulich wrote:
> On 04.11.2019 15:59, Andrew Cooper wrote:
>> On 04/11/2019 13:25, Jan Beulich wrote:
>>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/cpu/intel.c
>>>> +++ b/xen/arch/x86/cpu/intel.c
>>>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>>>  	if (disable) {
>>>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>>>  		bootsym(trampoline_misc_enable_off) |= disable;
>>>> +		bootsym(trampoline_efer) |= EFER_NX;
>>>>  	}
>>> I'm fine with all other changes here, just this one concerns me:
>>> Before your change we latch a value into trampoline_misc_enable_off
>>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
>>> means that, on a hypervisor (like Xen itself) simply discarding
>>> guest writes to the MSR (which is "fine" with the described usage
>>> of ours up to now), with your change we'd now end up trying to set
>>> EFER.NX when the feature may not actually be enabled in
>>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
>>> I.e. don't we need to read back the MSR value here, and verify
>>> we actually managed to clear the bit before actually OR-ing in
>>> EFER_NX?
>> If this is a problem in practice, execution won't continue beyond the
>> next if() condition just out of context, which set EFER.NX on the BSP
>> with an unguarded WRMSR.
> And how is this any good? This kind of crash is exactly what I'm
> asking to avoid.

What is the point of working around a theoretical edge case of broken
nesting under Xen which demonstrably doesn't exist in practice?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
  2019-11-04 13:32   ` Jan Beulich
@ 2019-11-04 15:31     ` Andrew Cooper
  2019-11-04 15:41       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-04 15:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 04/11/2019 13:32, Jan Beulich wrote:
> On 01.11.2019 21:25, Andrew Cooper wrote:
>> --- a/xen/arch/x86/boot/head.S
>> +++ b/xen/arch/x86/boot/head.S
>> @@ -630,6 +630,10 @@ trampoline_setup:
>>  
>>  1:
>>          /* Interrogate CPU extended features via CPUID. */
>> +        mov     $1, %eax
>> +        cpuid
>> +        mov     %ecx, sym_fs(boot_cpu_data) + CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR)
> I understand the ECX output is all we need right now. But wouldn't
> it be better to then store EDX as well (and similarly ECX of
> 0x80000001 output)?

No - I don't think so.

I did debate applying an and mask for the features we only intend to be
usable this early, to avoid getting buggy code which checks for
unexpected features this early.

> Also, along the lines of a question back to Sergey on his
> standalone patch, wouldn't it be better to take the opportunity
> and check here that CPUID leaf 1 is actually valid?

There is no such thing as a 64-bit capable CPU without leaf 1.

> Of course the question on the (intended) effect of
> "cpuid=no-hypervisor" also remains. As said before, I think this
> should be honored by all of our code that possibly can (i.e. at
> least everything following command line parsing).

There is no chance of making that work in a sane way - we use
cpu_has_hypervisor for quite a few things before the command line gets
parsed.

If you're booting under a hypervisor, your hypervisor ought to have a
way to turn that off.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-04 15:22         ` Andrew Cooper
@ 2019-11-04 15:31           ` Jan Beulich
  2019-11-12 16:09             ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-04 15:31 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 04.11.2019 16:22, Andrew Cooper wrote:
> On 04/11/2019 15:03, Jan Beulich wrote:
>> On 04.11.2019 15:59, Andrew Cooper wrote:
>>> On 04/11/2019 13:25, Jan Beulich wrote:
>>>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/cpu/intel.c
>>>>> +++ b/xen/arch/x86/cpu/intel.c
>>>>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>>>>  	if (disable) {
>>>>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>>>>  		bootsym(trampoline_misc_enable_off) |= disable;
>>>>> +		bootsym(trampoline_efer) |= EFER_NX;
>>>>>  	}
>>>> I'm fine with all other changes here, just this one concerns me:
>>>> Before your change we latch a value into trampoline_misc_enable_off
>>>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
>>>> means that, on a hypervisor (like Xen itself) simply discarding
>>>> guest writes to the MSR (which is "fine" with the described usage
>>>> of ours up to now), with your change we'd now end up trying to set
>>>> EFER.NX when the feature may not actually be enabled in
>>>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
>>>> I.e. don't we need to read back the MSR value here, and verify
>>>> we actually managed to clear the bit before actually OR-ing in
>>>> EFER_NX?
>>> If this is a problem in practice, execution won't continue beyond the
>>> next if() condition just out of context, which set EFER.NX on the BSP
>>> with an unguarded WRMSR.
>> And how is this any good? This kind of crash is exactly what I'm
>> asking to avoid.
> 
> What is the point of working around a theoretical edge case of broken
> nesting under Xen which demonstrably doesn't exist in practice?

Well, so far nothing was said about this not being an actual problem.
I simply don't know whether hardware would refuse such an EFER write.
If it does, it would be appropriate for hypervisors to also refuse
it. I.e. if we don't do so right now, correcting the behavior would
trip the code here.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
  2019-11-04 15:31     ` Andrew Cooper
@ 2019-11-04 15:41       ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2019-11-04 15:41 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 04.11.2019 16:31, Andrew Cooper wrote:
> On 04/11/2019 13:32, Jan Beulich wrote:
>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/boot/head.S
>>> +++ b/xen/arch/x86/boot/head.S
>>> @@ -630,6 +630,10 @@ trampoline_setup:
>>>  
>>>  1:
>>>          /* Interrogate CPU extended features via CPUID. */
>>> +        mov     $1, %eax
>>> +        cpuid
>>> +        mov     %ecx, sym_fs(boot_cpu_data) + CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR)
>> I understand the ECX output is all we need right now. But wouldn't
>> it be better to then store EDX as well (and similarly ECX of
>> 0x80000001 output)?
> 
> No - I don't think so.
> 
> I did debate applying an and mask for the features we only intend to be
> usable this early, to avoid getting buggy code which checks for
> unexpected features this early.

Indeed doing so would seem a good reason to not also store the EDX
value here.

>> Also, along the lines of a question back to Sergey on his
>> standalone patch, wouldn't it be better to take the opportunity
>> and check here that CPUID leaf 1 is actually valid?
> 
> There is no such thing as a 64-bit capable CPU without leaf 1.

About anything can be constructed under a hypervisor. But well, I
guess I'll stop mumbling on this aspect.

>> Of course the question on the (intended) effect of
>> "cpuid=no-hypervisor" also remains. As said before, I think this
>> should be honored by all of our code that possibly can (i.e. at
>> least everything following command line parsing).
> 
> There is no chance of making that work in a sane way - we use
> cpu_has_hypervisor for quite a few things before the command line gets
> parsed.

You said so the other day, but iirc when checking I wasn't able to
identify any such case, let alone "quite a few".

Anyway, feel free to add
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
  2019-11-01 20:25 ` [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot Andrew Cooper
  2019-11-04 13:32   ` Jan Beulich
@ 2019-11-05 11:08   ` Sergey Dyasli
  1 sibling, 0 replies; 22+ messages in thread
From: Sergey Dyasli @ 2019-11-05 11:08 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Juergen Gross, sergey.dyasli@citrix.com >> Sergey Dyasli,
	Wei Liu, Jan Beulich, Roger Pau Monné

On 01/11/2019 20:25, Andrew Cooper wrote:
> We cache Long Mode and No Execute early on boot, so take the opportunity to
> cache HYPERVISOR early as well.

Either:
1. the description needs clarifying that the whole 1c featureset is
   being stored, or
2. a mask should be applied to store only the hypervisor bit
   (this would be a safer option)

> Replace opencoded early access to the feature bit.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

--
Thanks,
Sergey

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-04 15:31           ` Jan Beulich
@ 2019-11-12 16:09             ` Andrew Cooper
  2019-11-12 17:15               ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-12 16:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Xen-devel, Sergey Dyasli, Wei Liu, Roger Pau Monné

On 04/11/2019 15:31, Jan Beulich wrote:
> On 04.11.2019 16:22, Andrew Cooper wrote:
>> On 04/11/2019 15:03, Jan Beulich wrote:
>>> On 04.11.2019 15:59, Andrew Cooper wrote:
>>>> On 04/11/2019 13:25, Jan Beulich wrote:
>>>>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>>>>> --- a/xen/arch/x86/cpu/intel.c
>>>>>> +++ b/xen/arch/x86/cpu/intel.c
>>>>>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>>>>>  	if (disable) {
>>>>>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>>>>>  		bootsym(trampoline_misc_enable_off) |= disable;
>>>>>> +		bootsym(trampoline_efer) |= EFER_NX;
>>>>>>  	}
>>>>> I'm fine with all other changes here, just this one concerns me:
>>>>> Before your change we latch a value into trampoline_misc_enable_off
>>>>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
>>>>> means that, on a hypervisor (like Xen itself) simply discarding
>>>>> guest writes to the MSR (which is "fine" with the described usage
>>>>> of ours up to now), with your change we'd now end up trying to set
>>>>> EFER.NX when the feature may not actually be enabled in
>>>>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
>>>>> I.e. don't we need to read back the MSR value here, and verify
>>>>> we actually managed to clear the bit before actually OR-ing in
>>>>> EFER_NX?
>>>> If this is a problem in practice, execution won't continue beyond the
>>>> next if() condition just out of context, which set EFER.NX on the BSP
>>>> with an unguarded WRMSR.
>>> And how is this any good? This kind of crash is exactly what I'm
>>> asking to avoid.
>> What is the point of working around a theoretical edge case of broken
>> nesting under Xen which demonstrably doesn't exist in practice?
> Well, so far nothing was said about this not being an actual problem.

Its not an actual problem.  If it were, we would have had crash reports.

> I simply don't know whether hardware would refuse such an EFER write.

I've just experimented - writing EFER.NX takes a #GP fault when
MISC_ENABLE.XD is set.

> If it does, it would be appropriate for hypervisors to also refuse
> it. I.e. if we don't do so right now, correcting the behavior would
> trip the code here.

MISC_ENABLES.XD is architectural on any Intel system which enumerates
NX, and if the bit is set, it can be cleared.  (Although the semantics
described in the SDM are broken.  It is only available if NX is
enumerated, which is obfuscated by setting XD).

However, no hypervisor is going to bother virtualising this
functionality.  Either configure the VM with NX or without.  (KVM for
example doesn't virtualise MISC_ENABLES at all.)

There is one corner case on out-of-support versions of Xen (which don't
clear XD themselves) where XD would leak through and be ignored, after
which Xen will take a #GP fault trying to set EFER.NX, but I am still
firmly of the opinion that it is not worth putting in a workaround for
an obsolete issue which doesn't exist in practice.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-12 16:09             ` Andrew Cooper
@ 2019-11-12 17:15               ` Jan Beulich
  2019-11-13 13:22                 ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-12 17:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 12.11.2019 17:09, Andrew Cooper wrote:
> On 04/11/2019 15:31, Jan Beulich wrote:
>> On 04.11.2019 16:22, Andrew Cooper wrote:
>>> On 04/11/2019 15:03, Jan Beulich wrote:
>>>> On 04.11.2019 15:59, Andrew Cooper wrote:
>>>>> On 04/11/2019 13:25, Jan Beulich wrote:
>>>>>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>>>>>> --- a/xen/arch/x86/cpu/intel.c
>>>>>>> +++ b/xen/arch/x86/cpu/intel.c
>>>>>>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>>>>>>  	if (disable) {
>>>>>>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>>>>>>  		bootsym(trampoline_misc_enable_off) |= disable;
>>>>>>> +		bootsym(trampoline_efer) |= EFER_NX;
>>>>>>>  	}
>>>>>> I'm fine with all other changes here, just this one concerns me:
>>>>>> Before your change we latch a value into trampoline_misc_enable_off
>>>>>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
>>>>>> means that, on a hypervisor (like Xen itself) simply discarding
>>>>>> guest writes to the MSR (which is "fine" with the described usage
>>>>>> of ours up to now), with your change we'd now end up trying to set
>>>>>> EFER.NX when the feature may not actually be enabled in
>>>>>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
>>>>>> I.e. don't we need to read back the MSR value here, and verify
>>>>>> we actually managed to clear the bit before actually OR-ing in
>>>>>> EFER_NX?
>>>>> If this is a problem in practice, execution won't continue beyond the
>>>>> next if() condition just out of context, which set EFER.NX on the BSP
>>>>> with an unguarded WRMSR.
>>>> And how is this any good? This kind of crash is exactly what I'm
>>>> asking to avoid.
>>> What is the point of working around a theoretical edge case of broken
>>> nesting under Xen which demonstrably doesn't exist in practice?
>> Well, so far nothing was said about this not being an actual problem.
> 
> Its not an actual problem.  If it were, we would have had crash reports.
> 
>> I simply don't know whether hardware would refuse such an EFER write.
> 
> I've just experimented - writing EFER.NX takes a #GP fault when
> MISC_ENABLE.XD is set.
> 
>> If it does, it would be appropriate for hypervisors to also refuse
>> it. I.e. if we don't do so right now, correcting the behavior would
>> trip the code here.
> 
> MISC_ENABLES.XD is architectural on any Intel system which enumerates
> NX, and if the bit is set, it can be cleared.  (Although the semantics
> described in the SDM are broken.  It is only available if NX is
> enumerated, which is obfuscated by setting XD).
> 
> However, no hypervisor is going to bother virtualising this
> functionality.  Either configure the VM with NX or without.  (KVM for
> example doesn't virtualise MISC_ENABLES at all.)

I'm sorry, but I still don't follow: You say "if the bit is set, it
can be cleared", which is clearly not in line with our current guest
MSR write handling. It just so happens that we have no command line
option allowing to suppress the clearing of XD. If we had, according
to your findings above we'd run into a #GP upon trying to set NX.
How can you easily exclude another hypervisor actually doing so (and
nobody having run into the issue simply because the option is rarely
used)?

Btw - all would be fine if the code in question was guarded by an
NX feature check, but as you say that's not possible because XD set
forces NX clear. However, our setting of EFER.NX could be guarded
this way, as we _expect_ XD to be clear at that point.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-12 17:15               ` Jan Beulich
@ 2019-11-13 13:22                 ` Andrew Cooper
  2019-11-13 13:29                   ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-13 13:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 12/11/2019 17:15, Jan Beulich wrote:
> On 12.11.2019 17:09, Andrew Cooper wrote:
>> On 04/11/2019 15:31, Jan Beulich wrote:
>>> On 04.11.2019 16:22, Andrew Cooper wrote:
>>>> On 04/11/2019 15:03, Jan Beulich wrote:
>>>>> On 04.11.2019 15:59, Andrew Cooper wrote:
>>>>>> On 04/11/2019 13:25, Jan Beulich wrote:
>>>>>>> On 01.11.2019 21:25, Andrew Cooper wrote:
>>>>>>>> --- a/xen/arch/x86/cpu/intel.c
>>>>>>>> +++ b/xen/arch/x86/cpu/intel.c
>>>>>>>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>>>>>>>>  	if (disable) {
>>>>>>>>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>>>>>>>>  		bootsym(trampoline_misc_enable_off) |= disable;
>>>>>>>> +		bootsym(trampoline_efer) |= EFER_NX;
>>>>>>>>  	}
>>>>>>> I'm fine with all other changes here, just this one concerns me:
>>>>>>> Before your change we latch a value into trampoline_misc_enable_off
>>>>>>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This
>>>>>>> means that, on a hypervisor (like Xen itself) simply discarding
>>>>>>> guest writes to the MSR (which is "fine" with the described usage
>>>>>>> of ours up to now), with your change we'd now end up trying to set
>>>>>>> EFER.NX when the feature may not actually be enabled in
>>>>>>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
>>>>>>> I.e. don't we need to read back the MSR value here, and verify
>>>>>>> we actually managed to clear the bit before actually OR-ing in
>>>>>>> EFER_NX?
>>>>>> If this is a problem in practice, execution won't continue beyond the
>>>>>> next if() condition just out of context, which set EFER.NX on the BSP
>>>>>> with an unguarded WRMSR.
>>>>> And how is this any good? This kind of crash is exactly what I'm
>>>>> asking to avoid.
>>>> What is the point of working around a theoretical edge case of broken
>>>> nesting under Xen which demonstrably doesn't exist in practice?
>>> Well, so far nothing was said about this not being an actual problem.
>> Its not an actual problem.  If it were, we would have had crash reports.
>>
>>> I simply don't know whether hardware would refuse such an EFER write.
>> I've just experimented - writing EFER.NX takes a #GP fault when
>> MISC_ENABLE.XD is set.
>>
>>> If it does, it would be appropriate for hypervisors to also refuse
>>> it. I.e. if we don't do so right now, correcting the behavior would
>>> trip the code here.
>> MISC_ENABLES.XD is architectural on any Intel system which enumerates
>> NX, and if the bit is set, it can be cleared.  (Although the semantics
>> described in the SDM are broken.  It is only available if NX is
>> enumerated, which is obfuscated by setting XD).
>>
>> However, no hypervisor is going to bother virtualising this
>> functionality.  Either configure the VM with NX or without.  (KVM for
>> example doesn't virtualise MISC_ENABLES at all.)
> I'm sorry, but I still don't follow: You say "if the bit is set, it
> can be cleared", which is clearly not in line with our current guest
> MSR write handling.

Yes - Xen's MSR handing is broken, but you snipped that part of my reply.

> It just so happens that we have no command line
> option allowing to suppress the clearing of XD.

Nor does Linux.  As to the other hypervisors...

> If we had, according
> to your findings above we'd run into a #GP upon trying to set NX.

Yes.

> How can you easily exclude another hypervisor actually doing so (and
> nobody having run into the issue simply because the option is rarely
> used)?

... observe that they require NX support as a prerequisite to install. 
You will not find a system with XD set these days.

> Btw - all would be fine if the code in question was guarded by an
> NX feature check, but as you say that's not possible because XD set
> forces NX clear. However, our setting of EFER.NX could be guarded
> this way, as we _expect_ XD to be clear at that point.

XD was clearly never designed for the OS to find and turn off, but NX
functionality is simply too important to let misconfigured firmware get
in the way of using.


The long and the short of it is that this patch does not change Xen's
behaviour WRT poorly virtualised XD.

I am not convinced the behaviour is worth changing, and I don't have
time for this scope creep.  If you want to submit a fix then go ahead,
but patch 3 of this is important to get in for 4.13

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-13 13:22                 ` Andrew Cooper
@ 2019-11-13 13:29                   ` Jan Beulich
  2019-11-19 15:15                     ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-13 13:29 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 13.11.2019 14:22, Andrew Cooper wrote:
> I am not convinced the behaviour is worth changing, and I don't have
> time for this scope creep.

There's no scope creep here at all. All I'm asking for is that you
don't blindly OR in EFER_NX into trampoline_efer, but rather check
that it will be possible to successfully set it after the
MISC_ENABLE write (by reading back the value, or by reading
CPUID[0x80000001].NX again).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-13 13:29                   ` Jan Beulich
@ 2019-11-19 15:15                     ` Andrew Cooper
  2019-11-19 16:44                       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2019-11-19 15:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 13/11/2019 13:29, Jan Beulich wrote:
> On 13.11.2019 14:22, Andrew Cooper wrote:
>> I am not convinced the behaviour is worth changing, and I don't have
>> time for this scope creep.
> There's no scope creep here at all.

Yes - it really is scope creep.

This patch does not change the behaviour of Xen in the case of poor
virtualisation of the bit.  Xen will still crash either way.

I have explained, repeatedly now, why I am not inclined to fix this. It
is a bug which doesn't exist in practice.

You are welcome to fix this yourself, in separate change to match the
separate scope, when you are not blocking a 4.13 fix with your request.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-19 15:15                     ` Andrew Cooper
@ 2019-11-19 16:44                       ` Jan Beulich
  2019-11-21 15:23                         ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2019-11-19 16:44 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 19.11.2019 16:15, Andrew Cooper wrote:
> On 13/11/2019 13:29, Jan Beulich wrote:
>> On 13.11.2019 14:22, Andrew Cooper wrote:
>>> I am not convinced the behaviour is worth changing, and I don't have
>>> time for this scope creep.
>> There's no scope creep here at all.
> 
> Yes - it really is scope creep.
> 
> This patch does not change the behaviour of Xen in the case of poor
> virtualisation of the bit.  Xen will still crash either way.

So I have to apologize. What I didn't notice is

	if (disable & MSR_IA32_MISC_ENABLE_XD_DISABLE) {
		write_efer(read_efer() | EFER_NX);
		printk(KERN_INFO
		       "re-enabled NX (Execute Disable) protection\n");
	}

in early_init_intel(). I simply didn't expect we'd already have
such a blind EFER write. I therefore agree now that this is a
pre-existing bug that you don't make any worse.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> I have explained, repeatedly now, why I am not inclined to fix this. It
> is a bug which doesn't exist in practice.

I should have been looking more closely; the lack of sufficient
context did misguide me.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems
  2019-11-01 20:24 [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Andrew Cooper
                   ` (2 preceding siblings ...)
  2019-11-01 20:25 ` [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic Andrew Cooper
@ 2019-11-20  5:20 ` Jürgen Groß
  3 siblings, 0 replies; 22+ messages in thread
From: Jürgen Groß @ 2019-11-20  5:20 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Sergey Dyasli, Wei Liu, Jan Beulich, Roger Pau Monné

On 01.11.19 21:24, Andrew Cooper wrote:
> I decided to dust off my early CPUID adjustments work in an effort to get
> patch 3 in a sensible state for 4.13.  Ever so slightly RFC for 4.13 given
> where we are in the release, but this is fairly self contained.
> 
> Andrew Cooper (2):
>    x86/boot: Remove cached CPUID data from the trampoline
>    x86/boot: Cache cpu_has_hypervisor very early on boot
> 
> Sergey Dyasli (1):
>    x86/e820: fix 640k - 1M region reservation logic
> 
>   xen/arch/x86/apic.c             |  2 +-
>   xen/arch/x86/boot/head.S        | 13 +++++++++++--
>   xen/arch/x86/boot/trampoline.S  | 13 +++++--------
>   xen/arch/x86/boot/wakeup.S      | 13 ++-----------
>   xen/arch/x86/cpu/common.c       |  3 ---
>   xen/arch/x86/cpu/intel.c        |  1 +
>   xen/arch/x86/e820.c             |  4 ++--
>   xen/arch/x86/efi/efi-boot.h     | 12 ++++++++----
>   xen/arch/x86/guest/xen.c        |  6 +-----
>   xen/arch/x86/mm.c               |  3 +--
>   xen/include/asm-x86/processor.h |  2 +-
>   11 files changed, 33 insertions(+), 39 deletions(-)
> 

For the series:

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
  2019-11-19 16:44                       ` Jan Beulich
@ 2019-11-21 15:23                         ` Andrew Cooper
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2019-11-21 15:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Sergey Dyasli, Xen-devel, Wei Liu, Roger Pau Monné

On 19/11/2019 16:44, Jan Beulich wrote:
> On 19.11.2019 16:15, Andrew Cooper wrote:
>> On 13/11/2019 13:29, Jan Beulich wrote:
>>> On 13.11.2019 14:22, Andrew Cooper wrote:
>>>> I am not convinced the behaviour is worth changing, and I don't have
>>>> time for this scope creep.
>>> There's no scope creep here at all.
>> Yes - it really is scope creep.
>>
>> This patch does not change the behaviour of Xen in the case of poor
>> virtualisation of the bit.  Xen will still crash either way.
> So I have to apologize. What I didn't notice is
>
> 	if (disable & MSR_IA32_MISC_ENABLE_XD_DISABLE) {
> 		write_efer(read_efer() | EFER_NX);
> 		printk(KERN_INFO
> 		       "re-enabled NX (Execute Disable) protection\n");
> 	}
>
> in early_init_intel(). I simply didn't expect we'd already have
> such a blind EFER write. I therefore agree now that this is a
> pre-existing bug that you don't make any worse.
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thankyou.  I'll get this series in now, along with some other straggling
4.13 content.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-11-21 15:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-01 20:24 [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Andrew Cooper
2019-11-01 20:25 ` [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline Andrew Cooper
2019-11-04 13:25   ` Jan Beulich
2019-11-04 14:59     ` Andrew Cooper
2019-11-04 15:03       ` Jan Beulich
2019-11-04 15:22         ` Andrew Cooper
2019-11-04 15:31           ` Jan Beulich
2019-11-12 16:09             ` Andrew Cooper
2019-11-12 17:15               ` Jan Beulich
2019-11-13 13:22                 ` Andrew Cooper
2019-11-13 13:29                   ` Jan Beulich
2019-11-19 15:15                     ` Andrew Cooper
2019-11-19 16:44                       ` Jan Beulich
2019-11-21 15:23                         ` Andrew Cooper
2019-11-01 20:25 ` [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot Andrew Cooper
2019-11-04 13:32   ` Jan Beulich
2019-11-04 15:31     ` Andrew Cooper
2019-11-04 15:41       ` Jan Beulich
2019-11-05 11:08   ` Sergey Dyasli
2019-11-01 20:25 ` [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic Andrew Cooper
2019-11-04 13:35   ` Jan Beulich
2019-11-20  5:20 ` [Xen-devel] [PATCH for-4.13 0/3] Fix PV shim ballooning problems Jürgen Groß

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).