* [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up
@ 2019-09-25 15:19 Jan Beulich
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Jan Beulich @ 2019-09-25 15:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monné

1: x86: suppress XPTI-related TLB flushes when possible
2: x86/mm: honor opt_pcid also for 32-bit PV domains
3: x86/HVM: move NOFLUSH handling out of hvm_set_cr3()
4: x86/HVM: refuse CR3 loads with reserved (upper) bits set
5: x86/HVM: cosmetics to hvm_set_cr3()

The first patch was previously submitted standalone (as v3) and
hasn't changed since then; it more or less fits into this group,
so it is included here going forward.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
@ 2019-09-25 15:23 ` Jan Beulich
  2020-05-18 17:09   ` Roger Pau Monné
  2020-05-22 11:00   ` Andrew Cooper
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 2/5] x86/mm: honor opt_pcid also for 32-bit PV domains Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Jan Beulich @ 2019-09-25 15:23 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monné

When there's no XPTI-enabled PV domain at all, there's no need to issue
respective TLB flushes. Hardwire opt_xpti_* to false when !PV, and
record the creation of PV domains by bumping opt_xpti_* accordingly.

As to using a sticky opt_xpti_domu vs incrementing/decrementing
opt_xpti_hwdom, this is done to avoid
(a) widening the former variable,
(b) any risk of a missed flush, which would result in an XSA if a DomU
    was able to exercise it, and
(c) any races updating the variable.
Fundamentally the TLB flush done when context switching out the domain's
vCPUs for the last time before destroying the domain ought to be
sufficient, so in principle DomU handling could be made to match hwdom's.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-base.
v2: Add comment to spec_ctrl.h. Explain difference in accounting of DomU
    and hwdom.
---
TBD: The hardwiring to false could be extended to opt_pv_l1tf_* and (for
     !HVM) opt_l1d_flush as well.

---
 xen/arch/x86/flushtlb.c         |    2 +-
 xen/arch/x86/pv/domain.c        |   14 +++++++++++++-
 xen/arch/x86/spec_ctrl.c        |    6 ++++++
 xen/include/asm-x86/spec_ctrl.h |   11 +++++++++++
 4 files changed, 31 insertions(+), 2 deletions(-)

--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -207,7 +207,7 @@ unsigned int flush_area_local(const void
                  */
                 invpcid_flush_one(PCID_PV_PRIV, addr);
                 invpcid_flush_one(PCID_PV_USER, addr);
-                if ( opt_xpti_hwdom || opt_xpti_domu )
+                if ( opt_xpti_hwdom > 1 || opt_xpti_domu > 1 )
                 {
                     invpcid_flush_one(PCID_PV_PRIV | PCID_PV_XPTI, addr);
                     invpcid_flush_one(PCID_PV_USER | PCID_PV_XPTI, addr);
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -272,6 +272,9 @@ void pv_domain_destroy(struct domain *d)
     destroy_perdomain_mapping(d, GDT_LDT_VIRT_START,
                               GDT_LDT_MBYTES << (20 - PAGE_SHIFT));
 
+    opt_xpti_hwdom -= IS_ENABLED(CONFIG_LATE_HWDOM) &&
+                      !d->domain_id && opt_xpti_hwdom;
+
     XFREE(d->arch.pv.cpuidmasks);
 
     FREE_XENHEAP_PAGE(d->arch.pv.gdt_ldt_l1tab);
@@ -310,7 +313,16 @@ int pv_domain_initialise(struct domain *
     /* 64-bit PV guest by default. */
     d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
-    d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
+    if ( is_hardware_domain(d) && opt_xpti_hwdom )
+    {
+        d->arch.pv.xpti = true;
+        ++opt_xpti_hwdom;
+    }
+    if ( !is_hardware_domain(d) && opt_xpti_domu )
+    {
+        d->arch.pv.xpti = true;
+        opt_xpti_domu = 2;
+    }
 
     if ( !is_pv_32bit_domain(d) && use_invpcid && cpu_has_pcid )
         switch ( ACCESS_ONCE(opt_pcid) )
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -85,10 +85,12 @@ static int __init parse_spec_ctrl(const
 
             opt_eager_fpu = 0;
 
+#ifdef CONFIG_PV
             if ( opt_xpti_hwdom < 0 )
                 opt_xpti_hwdom = 0;
             if ( opt_xpti_domu < 0 )
                 opt_xpti_domu = 0;
+#endif
 
             if ( opt_smt < 0 )
                 opt_smt = 1;
@@ -187,6 +189,7 @@ static int __init parse_spec_ctrl(const
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+#ifdef CONFIG_PV
 int8_t __read_mostly opt_xpti_hwdom = -1;
 int8_t __read_mostly opt_xpti_domu = -1;
 
@@ -253,6 +256,9 @@ static __init int parse_xpti(const char
     return rc;
 }
 custom_param("xpti", parse_xpti);
+#else /* !CONFIG_PV */
+# define xpti_init_default(caps) ((void)(caps))
+#endif /* CONFIG_PV */
 
 int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
 int8_t __read_mostly opt_pv_l1tf_domu = -1;
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -43,7 +43,18 @@ extern bool bsp_delay_spec_ctrl;
 extern uint8_t default_xen_spec_ctrl;
 extern uint8_t default_spec_ctrl_flags;
 
+#ifdef CONFIG_PV
+/*
+ * Values -1, 0, and 1 have the usual meaning of "not established yet",
+ * "disabled", and "enabled". Values larger than 1 indicate there's actually
+ * at least one such domain (or there has been). This way XPTI-specific TLB
+ * flushes can be avoided when no XPTI-enabled domain is/was active.
+ */
 extern int8_t opt_xpti_hwdom, opt_xpti_domu;
+#else
+# define opt_xpti_hwdom false
+# define opt_xpti_domu false
+#endif
 
 extern int8_t opt_pv_l1tf_hwdom, opt_pv_l1tf_domu;
 



* [Xen-devel] [PATCH v3 2/5] x86/mm: honor opt_pcid also for 32-bit PV domains
  2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible Jan Beulich
@ 2019-09-25 15:23 ` Jan Beulich
  2020-05-22 11:40   ` Andrew Cooper
  2019-09-25 15:25 ` [Xen-devel] [PATCH v3 3/5] x86/HVM: move NOFLUSH handling out of hvm_set_cr3() Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2019-09-25 15:23 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monné

I can't see any technical or performance reason why we should treat
32-bit PV differently from 64-bit PV in this regard.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

---
 xen/arch/x86/pv/domain.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -180,7 +180,24 @@ int switch_compat(struct domain *d)
     d->arch.x87_fip_width = 4;
 
     d->arch.pv.xpti = false;
-    d->arch.pv.pcid = false;
+
+    if ( use_invpcid && cpu_has_pcid )
+        switch ( ACCESS_ONCE(opt_pcid) )
+        {
+        case PCID_OFF:
+        case PCID_XPTI:
+            d->arch.pv.pcid = false;
+            break;
+
+        case PCID_ALL:
+        case PCID_NOXPTI:
+            d->arch.pv.pcid = true;
+            break;
+
+        default:
+            ASSERT_UNREACHABLE();
+            break;
+        }
 
     return 0;
 
@@ -324,7 +341,7 @@ int pv_domain_initialise(struct domain *
         opt_xpti_domu = 2;
     }
 
-    if ( !is_pv_32bit_domain(d) && use_invpcid && cpu_has_pcid )
+    if ( use_invpcid && cpu_has_pcid )
         switch ( ACCESS_ONCE(opt_pcid) )
         {
         case PCID_OFF:



* [Xen-devel] [PATCH v3 3/5] x86/HVM: move NOFLUSH handling out of hvm_set_cr3()
  2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible Jan Beulich
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 2/5] x86/mm: honor opt_pcid also for 32-bit PV domains Jan Beulich
@ 2019-09-25 15:25 ` Jan Beulich
  2020-05-22 10:40   ` Andrew Cooper
  2019-09-25 15:25 ` [Xen-devel] [PATCH v3 4/5] x86/HVM: refuse CR3 loads with reserved (upper) bits set Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2019-09-25 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Kevin Tian, Tamas K Lengyel, Razvan Cojocaru,
	Wei Liu, Paul Durrant, George Dunlap, Andrew Cooper,
	Suravee Suthikulpanit, Jun Nakajima, Alexandru Isaila,
	Boris Ostrovsky, Roger Pau Monné

The bit is meaningful only for MOV-to-CR3 instructions, not anywhere
else; in particular it has no meaning when loading nested guest state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v3: Further restrict "noflush" local variable scopes. Remove (now
    redundant) zapping of X86_CR3_NOFLUSH from hvm_monitor_cr().

---
 xen/arch/x86/hvm/emulate.c        |    8 +++++++-
 xen/arch/x86/hvm/hvm.c            |   20 ++++++++++----------
 xen/arch/x86/hvm/monitor.c        |    3 ---
 xen/arch/x86/hvm/svm/nestedsvm.c  |    6 +++---
 xen/arch/x86/hvm/vm_event.c       |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c       |    4 ++--
 xen/include/asm-x86/domain.h      |    2 ++
 xen/include/asm-x86/hvm/support.h |    2 +-
 8 files changed, 26 insertions(+), 21 deletions(-)

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2123,8 +2123,14 @@ static int hvmemul_write_cr(
         break;
 
     case 3:
-        rc = hvm_set_cr3(val, true);
+    {
+        bool noflush = hvm_pcid_enabled(current) && (val & X86_CR3_NOFLUSH);
+
+        if ( noflush )
+            val &= ~X86_CR3_NOFLUSH;
+        rc = hvm_set_cr3(val, noflush, true);
         break;
+    }
 
     case 4:
         rc = hvm_set_cr4(val, true);
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2076,8 +2076,14 @@ int hvm_mov_to_cr(unsigned int cr, unsig
         break;
 
     case 3:
-        rc = hvm_set_cr3(val, true);
+    {
+        bool noflush = hvm_pcid_enabled(curr) && (val & X86_CR3_NOFLUSH);
+
+        if ( noflush )
+            val &= ~X86_CR3_NOFLUSH;
+        rc = hvm_set_cr3(val, noflush, true);
         break;
+    }
 
     case 4:
         rc = hvm_set_cr4(val, true);
@@ -2294,12 +2300,11 @@ int hvm_set_cr0(unsigned long value, boo
     return X86EMUL_OKAY;
 }
 
-int hvm_set_cr3(unsigned long value, bool may_defer)
+int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)
 {
     struct vcpu *v = current;
     struct page_info *page;
     unsigned long old = v->arch.hvm.guest_cr[3];
-    bool noflush = false;
 
     if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
@@ -2311,17 +2316,12 @@ int hvm_set_cr3(unsigned long value, boo
             /* The actual write will occur in hvm_do_resume(), if permitted. */
             v->arch.vm_event->write_data.do_write.cr3 = 1;
             v->arch.vm_event->write_data.cr3 = value;
+            v->arch.vm_event->write_data.cr3_noflush = noflush;
 
             return X86EMUL_OKAY;
         }
     }
 
-    if ( hvm_pcid_enabled(v) ) /* Clear the noflush bit. */
-    {
-        noflush = value & X86_CR3_NOFLUSH;
-        value &= ~X86_CR3_NOFLUSH;
-    }
-
     if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) &&
          ((value ^ v->arch.hvm.guest_cr[3]) >> PAGE_SHIFT) )
     {
@@ -3016,7 +3016,7 @@ void hvm_task_switch(
     if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
         goto out;
 
-    rc = hvm_set_cr3(tss.cr3, true);
+    rc = hvm_set_cr3(tss.cr3, false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if ( rc != X86EMUL_OKAY )
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -38,9 +38,6 @@ bool hvm_monitor_cr(unsigned int index,
     struct arch_domain *ad = &curr->domain->arch;
     unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
 
-    if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
-        value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
-
     if ( (ad->monitor.write_ctrlreg_enabled & ctrlreg_bitmask) &&
          (!(ad->monitor.write_ctrlreg_onchangeonly & ctrlreg_bitmask) ||
           value != old) &&
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -324,7 +324,7 @@ static int nsvm_vcpu_hostrestore(struct
         v->arch.guest_table = pagetable_null();
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
     }
-    rc = hvm_set_cr3(n1vmcb->_cr3, true);
+    rc = hvm_set_cr3(n1vmcb->_cr3, false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -584,7 +584,7 @@ static int nsvm_vmcb_prepare4vmrun(struc
         nestedsvm_vmcb_set_nestedp2m(v, ns_vmcb, n2vmcb);
 
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3, true);
+        rc = hvm_set_cr3(ns_vmcb->_cr3, false, true);
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         if (rc != X86EMUL_OKAY)
@@ -598,7 +598,7 @@ static int nsvm_vmcb_prepare4vmrun(struc
          * we assume it intercepts page faults.
          */
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3, true);
+        rc = hvm_set_cr3(ns_vmcb->_cr3, false, true);
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         if (rc != X86EMUL_OKAY)
--- a/xen/arch/x86/hvm/vm_event.c
+++ b/xen/arch/x86/hvm/vm_event.c
@@ -110,7 +110,7 @@ void hvm_vm_event_do_resume(struct vcpu
 
     if ( unlikely(w->do_write.cr3) )
     {
-        if ( hvm_set_cr3(w->cr3, false) == X86EMUL_EXCEPTION )
+        if ( hvm_set_cr3(w->cr3, w->cr3_noflush, false) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         w->do_write.cr3 = 0;
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1032,7 +1032,7 @@ static void load_shadow_guest_state(stru
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), true);
+    rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
@@ -1246,7 +1246,7 @@ static void load_vvmcs_host_state(struct
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), true);
+    rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -274,6 +274,8 @@ struct monitor_write_data {
         unsigned int cr4 : 1;
     } do_write;
 
+    bool cr3_noflush;
+
     uint32_t msr;
     uint64_t value;
     uint64_t cr0;
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -136,7 +136,7 @@ void hvm_shadow_handle_cd(struct vcpu *v
  */
 int hvm_set_efer(uint64_t value);
 int hvm_set_cr0(unsigned long value, bool may_defer);
-int hvm_set_cr3(unsigned long value, bool may_defer);
+int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer);
 int hvm_set_cr4(unsigned long value, bool may_defer);
 int hvm_descriptor_access_intercept(uint64_t exit_info,
                                     uint64_t vmx_exit_qualification,



* [Xen-devel] [PATCH v3 4/5] x86/HVM: refuse CR3 loads with reserved (upper) bits set
  2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
                   ` (2 preceding siblings ...)
  2019-09-25 15:25 ` [Xen-devel] [PATCH v3 3/5] x86/HVM: move NOFLUSH handling out of hvm_set_cr3() Jan Beulich
@ 2019-09-25 15:25 ` Jan Beulich
  2019-09-25 15:26 ` [Xen-devel] [PATCH v3 5/5] x86/HVM: cosmetics to hvm_set_cr3() Jan Beulich
  2020-04-28  7:59 ` Ping: [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
  5 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2019-09-25 15:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monné

While bits 11 and below are, if not used for other purposes, reserved
but ignored, bits beyond the physical address width are supposed to
raise exceptions (at least in the non-nested case; I'm not convinced
the current nested SVM/VMX behavior of raising #GP(0) here is correct,
but that's not the subject of this change).

Introduce currd as a local variable, and replace other v->domain
instances at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Correct return value in hvm_load_cpu_ctxt(). Re-base.
v2: Simplify the expressions used for the reserved bit checks.

---
 xen/arch/x86/hvm/hvm.c |   24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1016,6 +1016,13 @@ static int hvm_load_cpu_ctxt(struct doma
         return -EINVAL;
     }
 
+    if ( ctxt.cr3 >> d->arch.cpuid->extd.maxphysaddr )
+    {
+        printk(XENLOG_G_ERR "HVM%d restore: bad CR3 %#" PRIx64 "\n",
+               d->domain_id, ctxt.cr3);
+        return -EINVAL;
+    }
+
     if ( (ctxt.flags & ~XEN_X86_FPU_INITIALISED) != 0 )
     {
         gprintk(XENLOG_ERR, "bad flags value in CPU context: %#x\n",
@@ -2303,10 +2310,18 @@ int hvm_set_cr0(unsigned long value, boo
 int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)
 {
     struct vcpu *v = current;
+    struct domain *currd = v->domain;
     struct page_info *page;
     unsigned long old = v->arch.hvm.guest_cr[3];
 
-    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+    if ( value >> currd->arch.cpuid->extd.maxphysaddr )
+    {
+        HVM_DBG_LOG(DBG_LEVEL_1,
+                    "Attempt to set reserved CR3 bit(s): %lx", value);
+        return X86EMUL_EXCEPTION;
+    }
+
+    if ( may_defer && unlikely(currd->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
     {
         ASSERT(v->arch.vm_event);
@@ -2322,13 +2337,12 @@ int hvm_set_cr3(unsigned long value, boo
         }
     }
 
-    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) &&
+    if ( hvm_paging_enabled(v) && !paging_mode_hap(currd) &&
          ((value ^ v->arch.hvm.guest_cr[3]) >> PAGE_SHIFT) )
     {
         /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
         HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value);
-        page = get_page_from_gfn(v->domain, value >> PAGE_SHIFT,
-                                 NULL, P2M_ALLOC);
+        page = get_page_from_gfn(currd, value >> PAGE_SHIFT, NULL, P2M_ALLOC);
         if ( !page )
             goto bad_cr3;
 
@@ -2344,7 +2358,7 @@ int hvm_set_cr3(unsigned long value, boo
 
  bad_cr3:
     gdprintk(XENLOG_ERR, "Invalid CR3\n");
-    domain_crash(v->domain);
+    domain_crash(currd);
     return X86EMUL_UNHANDLEABLE;
 }
 



* [Xen-devel] [PATCH v3 5/5] x86/HVM: cosmetics to hvm_set_cr3()
  2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
                   ` (3 preceding siblings ...)
  2019-09-25 15:25 ` [Xen-devel] [PATCH v3 4/5] x86/HVM: refuse CR3 loads with reserved (upper) bits set Jan Beulich
@ 2019-09-25 15:26 ` Jan Beulich
  2020-04-28  7:59 ` Ping: [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
  5 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2019-09-25 15:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monné

Eliminate the not really useful local variable "old". Reduce the scope
of "page". Rename the local variable latching "current" to "curr".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over change earlier in the series.

---
 xen/arch/x86/hvm/hvm.c |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2309,10 +2309,8 @@ int hvm_set_cr0(unsigned long value, boo
 
 int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)
 {
-    struct vcpu *v = current;
-    struct domain *currd = v->domain;
-    struct page_info *page;
-    unsigned long old = v->arch.hvm.guest_cr[3];
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
 
     if ( value >> currd->arch.cpuid->extd.maxphysaddr )
     {
@@ -2324,36 +2322,38 @@ int hvm_set_cr3(unsigned long value, boo
     if ( may_defer && unlikely(currd->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
     {
-        ASSERT(v->arch.vm_event);
+        ASSERT(curr->arch.vm_event);
 
-        if ( hvm_monitor_crX(CR3, value, old) )
+        if ( hvm_monitor_crX(CR3, value, curr->arch.hvm.guest_cr[3]) )
         {
             /* The actual write will occur in hvm_do_resume(), if permitted. */
-            v->arch.vm_event->write_data.do_write.cr3 = 1;
-            v->arch.vm_event->write_data.cr3 = value;
-            v->arch.vm_event->write_data.cr3_noflush = noflush;
+            curr->arch.vm_event->write_data.do_write.cr3 = 1;
+            curr->arch.vm_event->write_data.cr3 = value;
+            curr->arch.vm_event->write_data.cr3_noflush = noflush;
 
             return X86EMUL_OKAY;
         }
     }
 
-    if ( hvm_paging_enabled(v) && !paging_mode_hap(currd) &&
-         ((value ^ v->arch.hvm.guest_cr[3]) >> PAGE_SHIFT) )
+    if ( hvm_paging_enabled(curr) && !paging_mode_hap(currd) &&
+         ((value ^ curr->arch.hvm.guest_cr[3]) >> PAGE_SHIFT) )
     {
         /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
+        struct page_info *page;
+
         HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value);
         page = get_page_from_gfn(currd, value >> PAGE_SHIFT, NULL, P2M_ALLOC);
         if ( !page )
             goto bad_cr3;
 
-        put_page(pagetable_get_page(v->arch.guest_table));
-        v->arch.guest_table = pagetable_from_page(page);
+        put_page(pagetable_get_page(curr->arch.guest_table));
+        curr->arch.guest_table = pagetable_from_page(page);
 
         HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx", value);
     }
 
-    v->arch.hvm.guest_cr[3] = value;
-    paging_update_cr3(v, noflush);
+    curr->arch.hvm.guest_cr[3] = value;
+    paging_update_cr3(curr, noflush);
     return X86EMUL_OKAY;
 
  bad_cr3:



* Ping: [PATCH v3 0/5] (remaining) XSA-292 follow-up
  2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
                   ` (4 preceding siblings ...)
  2019-09-25 15:26 ` [Xen-devel] [PATCH v3 5/5] x86/HVM: cosmetics to hvm_set_cr3() Jan Beulich
@ 2020-04-28  7:59 ` Jan Beulich
  5 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-04-28  7:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, xen-devel, Wei Liu, Roger Pau Monné

Andrew,

On 25.09.2019 17:19, Jan Beulich wrote:
> 1: x86: suppress XPTI-related TLB flushes when possible
> 2: x86/mm: honor opt_pcid also for 32-bit PV domains

I realize these two weren't entirely uncontroversial. May I
please ask that you get back to them, more than half a year
after their posting? For patch 1 I don't think I got
anything back yet on v2 / v3. For patch 2 see
https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01721.html

> 3: x86/HVM: move NOFLUSH handling out of hvm_set_cr3()

See
https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01689.html
https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01737.html

> 4: x86/HVM: refuse CR3 loads with reserved (upper) bits set
> 5: x86/HVM: cosmetics to hvm_set_cr3()

These two have your ack, but depend on at least patch 3 afaict.

Jan



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible Jan Beulich
@ 2020-05-18 17:09   ` Roger Pau Monné
  2020-05-19  7:55     ` Jan Beulich
  2020-05-22 11:00   ` Andrew Cooper
  1 sibling, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2020-05-18 17:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On Wed, Sep 25, 2019 at 05:23:11PM +0200, Jan Beulich wrote:
> When there's no XPTI-enabled PV domain at all, there's no need to issue
> respective TLB flushes. Hardwire opt_xpti_* to false when !PV, and
> record the creation of PV domains by bumping opt_xpti_* accordingly.
> 
> As to using a sticky opt_xpti_domu vs incrementing/decrementing
> opt_xpti_hwdom, this is done to avoid
> (a) widening the former variable,
> (b) any risk of a missed flush, which would result in an XSA if a DomU
>     was able to exercise it, and
> (c) any races updating the variable.
> Fundamentally the TLB flush done when context switching out the domain's
> vCPUs for the last time before destroying the domain ought to be
> sufficient, so in principle DomU handling could be made to match hwdom's.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3: Re-base.
> v2: Add comment to spec_ctrl.h. Explain difference in accounting of DomU
>     and hwdom.
> ---
> TBD: The hardwiring to false could be extended to opt_pv_l1tf_* and (for
>      !HVM) opt_l1d_flush as well.
> 
> ---
>  xen/arch/x86/flushtlb.c         |    2 +-
>  xen/arch/x86/pv/domain.c        |   14 +++++++++++++-
>  xen/arch/x86/spec_ctrl.c        |    6 ++++++
>  xen/include/asm-x86/spec_ctrl.h |   11 +++++++++++
>  4 files changed, 31 insertions(+), 2 deletions(-)
> 
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -207,7 +207,7 @@ unsigned int flush_area_local(const void
>                   */
>                  invpcid_flush_one(PCID_PV_PRIV, addr);
>                  invpcid_flush_one(PCID_PV_USER, addr);
> -                if ( opt_xpti_hwdom || opt_xpti_domu )
> +                if ( opt_xpti_hwdom > 1 || opt_xpti_domu > 1 )
>                  {
>                      invpcid_flush_one(PCID_PV_PRIV | PCID_PV_XPTI, addr);
>                      invpcid_flush_one(PCID_PV_USER | PCID_PV_XPTI, addr);
> --- a/xen/arch/x86/pv/domain.c
> +++ b/xen/arch/x86/pv/domain.c
> @@ -272,6 +272,9 @@ void pv_domain_destroy(struct domain *d)
>      destroy_perdomain_mapping(d, GDT_LDT_VIRT_START,
>                                GDT_LDT_MBYTES << (20 - PAGE_SHIFT));
>  
> +    opt_xpti_hwdom -= IS_ENABLED(CONFIG_LATE_HWDOM) &&
> +                      !d->domain_id && opt_xpti_hwdom;
> +
>      XFREE(d->arch.pv.cpuidmasks);
>  
>      FREE_XENHEAP_PAGE(d->arch.pv.gdt_ldt_l1tab);
> @@ -310,7 +313,16 @@ int pv_domain_initialise(struct domain *
>      /* 64-bit PV guest by default. */
>      d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>  
> -    d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
> +    if ( is_hardware_domain(d) && opt_xpti_hwdom )
> +    {
> +        d->arch.pv.xpti = true;
> +        ++opt_xpti_hwdom;
> +    }
> +    if ( !is_hardware_domain(d) && opt_xpti_domu )
> +    {
> +        d->arch.pv.xpti = true;
> +        opt_xpti_domu = 2;

I wonder whether a store fence is needed here in order to guarantee
that opt_xpti_domu is visible to flush_area_local before proceeding
any further with domain creation.

Thanks, Roger.



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2020-05-18 17:09   ` Roger Pau Monné
@ 2020-05-19  7:55     ` Jan Beulich
  2020-05-19  9:15       ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2020-05-19  7:55 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On 18.05.2020 19:09, Roger Pau Monné wrote:
> On Wed, Sep 25, 2019 at 05:23:11PM +0200, Jan Beulich wrote:
>> @@ -310,7 +313,16 @@ int pv_domain_initialise(struct domain *
>>      /* 64-bit PV guest by default. */
>>      d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>>  
>> -    d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
>> +    if ( is_hardware_domain(d) && opt_xpti_hwdom )
>> +    {
>> +        d->arch.pv.xpti = true;
>> +        ++opt_xpti_hwdom;
>> +    }
>> +    if ( !is_hardware_domain(d) && opt_xpti_domu )
>> +    {
>> +        d->arch.pv.xpti = true;
>> +        opt_xpti_domu = 2;
> 
> I wonder whether a store fence is needed here in order to guarantee
> that opt_xpti_domu is visible to flush_area_local before proceeding
> any further with domain creation.

The changed behavior of flush_area_local() becomes relevant only
once the new domain runs. This being x86 code, the write can't
remain invisible any later than the point where the function
returns, as the store can't be deferred past that (in reality it
can't be deferred even until after the next [real] function call
or the next barrier()). And due to x86's cache-coherent nature
(for WB memory), the moment the store instruction completes the
new value is visible to all other CPUs.

Jan



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2020-05-19  7:55     ` Jan Beulich
@ 2020-05-19  9:15       ` Roger Pau Monné
  2020-05-19  9:45         ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2020-05-19  9:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On Tue, May 19, 2020 at 09:55:38AM +0200, Jan Beulich wrote:
> On 18.05.2020 19:09, Roger Pau Monné wrote:
> > On Wed, Sep 25, 2019 at 05:23:11PM +0200, Jan Beulich wrote:
> >> @@ -310,7 +313,16 @@ int pv_domain_initialise(struct domain *
> >>      /* 64-bit PV guest by default. */
> >>      d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
> >>  
> >> -    d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
> >> +    if ( is_hardware_domain(d) && opt_xpti_hwdom )
> >> +    {
> >> +        d->arch.pv.xpti = true;
> >> +        ++opt_xpti_hwdom;
> >> +    }
> >> +    if ( !is_hardware_domain(d) && opt_xpti_domu )
> >> +    {
> >> +        d->arch.pv.xpti = true;
> >> +        opt_xpti_domu = 2;
> > 
> > I wonder whether a store fence is needed here in order to guarantee
> > that opt_xpti_domu is visible to flush_area_local before proceeding
> > any further with domain creation.
> 
> The changed behavior of flush_area_local() becomes relevant only
> once the new domain runs. This being x86 code, the write can't
> remain invisible any later than when the function returns, as the
> store can't be deferred past that point (in reality it can't be
> deferred even until after the next [real] function call or the
> next barrier()). And due to x86's cache-coherent nature (for WB
> memory), the moment the store insn completes, the new value is
> visible to all other CPUs.

Yes, I think it's fine because this is x86 specific code. A comment
in that regard might be nice, but I'm not going to make this a strong
request.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I also think that turning opt_xpti_domu into a proper atomic and
increasing/decreasing it (maybe a cmpxchg would be needed) upon PV
domain creation/destruction should be able to accurately keep track
of PV domUs, and hence could be used to further reduce the flushes
when no PV domains are running?
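The suggestion above could look roughly like this. It is only a sketch
under the stated assumptions, using C11 atomics rather than Xen's own
atomic helpers; none of these names exist in the tree.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Count of live XPTI-enabled PV domUs: bumped on creation, dropped on
 * destruction, so the extra flushing can stop again once the last such
 * domain is gone (unlike the sticky opt_xpti_domu = 2 approach). */
static atomic_int xpti_domu_count;

static void xpti_domu_created(void)
{
    atomic_fetch_add(&xpti_domu_count, 1);
}

static void xpti_domu_destroyed(void)
{
    atomic_fetch_sub(&xpti_domu_count, 1);
}

static bool xpti_domu_flush_wanted(void)
{
    return atomic_load(&xpti_domu_count) > 0;
}
```

As the follow-up notes, making destruction actually safe would still
require ensuring the final context-switch-out flush has happened before
the count drops to zero.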

Thanks, Roger.



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2020-05-19  9:15       ` Roger Pau Monné
@ 2020-05-19  9:45         ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-05-19  9:45 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On 19.05.2020 11:15, Roger Pau Monné wrote:
> On Tue, May 19, 2020 at 09:55:38AM +0200, Jan Beulich wrote:
>> On 18.05.2020 19:09, Roger Pau Monné wrote:
>>> On Wed, Sep 25, 2019 at 05:23:11PM +0200, Jan Beulich wrote:
>>>> @@ -310,7 +313,16 @@ int pv_domain_initialise(struct domain *
>>>>      /* 64-bit PV guest by default. */
>>>>      d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>>>>  
>>>> -    d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
>>>> +    if ( is_hardware_domain(d) && opt_xpti_hwdom )
>>>> +    {
>>>> +        d->arch.pv.xpti = true;
>>>> +        ++opt_xpti_hwdom;
>>>> +    }
>>>> +    if ( !is_hardware_domain(d) && opt_xpti_domu )
>>>> +    {
>>>> +        d->arch.pv.xpti = true;
>>>> +        opt_xpti_domu = 2;
>>>
>>> I wonder whether a store fence is needed here in order to guarantee
>>> that opt_xpti_domu is visible to flush_area_local before proceeding
>>> any further with domain creation.
>>
>> The changed behavior of flush_area_local() becomes relevant only
>> once the new domain runs. This being x86 code, the write can't
>> remain invisible any later than when the function returns, as the
>> store can't be deferred past that point (in reality it can't be
>> deferred even until after the next [real] function call or the
>> next barrier()). And due to x86's cache-coherent nature (for WB
>> memory), the moment the store insn completes, the new value is
>> visible to all other CPUs.
> 
> Yes, I think it's fine because this is x86 specific code. A comment
> in that regard might be nice, but I'm not going to make this a strong
> request.
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> I also think that turning opt_xpti_domu into a proper atomic and
> increasing/decreasing it (maybe a cmpxchg would be needed) upon PV
> domain creation/destruction should be able to accurately keep track
> of PV domUs, and hence could be used to further reduce the flushes
> when no PV domains are running?

Possibly, which would take care of (c) in the reasons the commit
message gives. (a) and in particular (b) would then still need
addressing.

Jan



* Re: [PATCH v3 3/5] x86/HVM: move NOFLUSH handling out of hvm_set_cr3()
  2019-09-25 15:25 ` [Xen-devel] [PATCH v3 3/5] x86/HVM: move NOFLUSH handling out of hvm_set_cr3() Jan Beulich
@ 2020-05-22 10:40   ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2020-05-22 10:40 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Petre Pircalabu, Kevin Tian, Tamas K Lengyel, Wei Liu,
	Razvan Cojocaru, Paul Durrant, George Dunlap,
	Suravee Suthikulpanit, Jun Nakajima, Alexandru Isaila,
	Boris Ostrovsky, Roger Pau Monné

On 25/09/2019 16:25, Jan Beulich wrote:
> The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in
> particular not when loading nested guest state.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Paul Durrant <paul@xen.org>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible Jan Beulich
  2020-05-18 17:09   ` Roger Pau Monné
@ 2020-05-22 11:00   ` Andrew Cooper
  2020-05-22 11:13     ` Roger Pau Monné
  2020-05-22 11:42     ` Jan Beulich
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Cooper @ 2020-05-22 11:00 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: George Dunlap, Wei Liu, Roger Pau Monné

On 25/09/2019 16:23, Jan Beulich wrote:
> When there's no XPTI-enabled PV domain at all, there's no need to issue
> respective TLB flushes. Hardwire opt_xpti_* to false when !PV, and
> record the creation of PV domains by bumping opt_xpti_* accordingly.
>
> As to the sticky opt_xpti_domu vs increment/decrement of opt_xpti_hwdom,
> this is done this way to avoid
> (a) widening the former variable,
> (b) any risk of a missed flush, which would result in an XSA if a DomU
>     was able to exercise it, and
> (c) any races updating the variable.
> Fundamentally the TLB flush done when context switching out the domain's
> vCPU-s the last time before destroying the domain ought to be
> sufficient, so in principle DomU handling could be made match hwdom's.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I am still concerned about the added complexity for no obvious use case.

Under what circumstances do we expect to XPTI-ness come and go on a
system, outside of custom dev-testing scenarios?



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2020-05-22 11:00   ` Andrew Cooper
@ 2020-05-22 11:13     ` Roger Pau Monné
  2020-05-22 11:58       ` Andrew Cooper
  2020-05-22 11:42     ` Jan Beulich
  1 sibling, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2020-05-22 11:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, xen-devel, Wei Liu, Jan Beulich

On Fri, May 22, 2020 at 12:00:14PM +0100, Andrew Cooper wrote:
> On 25/09/2019 16:23, Jan Beulich wrote:
> > When there's no XPTI-enabled PV domain at all, there's no need to issue
> > respective TLB flushes. Hardwire opt_xpti_* to false when !PV, and
> > record the creation of PV domains by bumping opt_xpti_* accordingly.
> >
> > As to the sticky opt_xpti_domu vs increment/decrement of opt_xpti_hwdom,
> > this is done this way to avoid
> > (a) widening the former variable,
> > (b) any risk of a missed flush, which would result in an XSA if a DomU
> >     was able to exercise it, and
> > (c) any races updating the variable.
> > Fundamentally the TLB flush done when context switching out the domain's
> > vCPU-s the last time before destroying the domain ought to be
> > sufficient, so in principle DomU handling could be made match hwdom's.
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I am still concerned about the added complexity for no obvious use case.
> 
> Under what circumstances do we expect to XPTI-ness come and go on a
> system, outside of custom dev-testing scenarios?

XPTI-ness will be sticky, in the sense that once enabled it cannot
be disabled anymore.

Roger.



* Re: [PATCH v3 2/5] x86/mm: honor opt_pcid also for 32-bit PV domains
  2019-09-25 15:23 ` [Xen-devel] [PATCH v3 2/5] x86/mm: honor opt_pcid also for 32-bit PV domains Jan Beulich
@ 2020-05-22 11:40   ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2020-05-22 11:40 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: George Dunlap, Wei Liu, Roger Pau Monné

On 25/09/2019 16:23, Jan Beulich wrote:
> I can't see any technical or performance reason why we should treat
> 32-bit PV different from 64-bit PV in this regard.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

There are technical reasons, and a very good perf reason not to...

There is no such thing as a user/kernel split for 32bit guests (as
TF_kernel_mode remains set unconditionally), and there is no such thing
as an XPTI split (32bit code can't attack Xen using meltdown).

What you would gain is the perf hit of maintaining unused PCIDs' worth
of mappings (seeing as INVPCID is horribly expensive even on modern CPUs).

The only way this might not hurt performance is if it was tied to a PV
ABI extension letting 32bit PV guests split their user/kernel mappings
and have Xen handle the transition automatically, at which point a
user/kernel PCID split in Xen would be better than the guest kernel
trying to do KPTI on its own.

~Andrew



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2020-05-22 11:00   ` Andrew Cooper
  2020-05-22 11:13     ` Roger Pau Monné
@ 2020-05-22 11:42     ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-05-22 11:42 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap, Wei Liu, Roger Pau Monné

On 22.05.2020 13:00, Andrew Cooper wrote:
> On 25/09/2019 16:23, Jan Beulich wrote:
>> When there's no XPTI-enabled PV domain at all, there's no need to issue
>> respective TLB flushes. Hardwire opt_xpti_* to false when !PV, and
>> record the creation of PV domains by bumping opt_xpti_* accordingly.
>>
>> As to the sticky opt_xpti_domu vs increment/decrement of opt_xpti_hwdom,
>> this is done this way to avoid
>> (a) widening the former variable,
>> (b) any risk of a missed flush, which would result in an XSA if a DomU
>>     was able to exercise it, and
>> (c) any races updating the variable.
>> Fundamentally the TLB flush done when context switching out the domain's
>> vCPU-s the last time before destroying the domain ought to be
>> sufficient, so in principle DomU handling could be made match hwdom's.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I am still concerned about the added complexity for no obvious use case.
> 
> Under what circumstances do we expect to XPTI-ness come and go on a
> system, outside of custom dev-testing scenarios?

Run a PVH Dom0 with just HVM guests for a while on a system, until you
find a need to run a PV guest there (perhaps because of an emergency).

Jan



* Re: [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible
  2020-05-22 11:13     ` Roger Pau Monné
@ 2020-05-22 11:58       ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2020-05-22 11:58 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George Dunlap, xen-devel, Wei Liu, Jan Beulich

On 22/05/2020 12:13, Roger Pau Monné wrote:
> On Fri, May 22, 2020 at 12:00:14PM +0100, Andrew Cooper wrote:
>> On 25/09/2019 16:23, Jan Beulich wrote:
>>> When there's no XPTI-enabled PV domain at all, there's no need to issue
>>> respective TLB flushes. Hardwire opt_xpti_* to false when !PV, and
>>> record the creation of PV domains by bumping opt_xpti_* accordingly.
>>>
>>> As to the sticky opt_xpti_domu vs increment/decrement of opt_xpti_hwdom,
>>> this is done this way to avoid
>>> (a) widening the former variable,
>>> (b) any risk of a missed flush, which would result in an XSA if a DomU
>>>     was able to exercise it, and
>>> (c) any races updating the variable.
>>> Fundamentally the TLB flush done when context switching out the domain's
>>> vCPU-s the last time before destroying the domain ought to be
>>> sufficient, so in principle DomU handling could be made match hwdom's.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> I am still concerned about the added complexity for no obvious use case.
>>
>> Under what circumstances do we expect to XPTI-ness come and go on a
>> system, outside of custom dev-testing scenarios?
> XPTI-ness will be sticky, in the sense that once enabled it cannot
> be disabled anymore.

I guess the question was a little too rhetorical, so let's spell it out.

You're either on Meltdown-vulnerable hardware, or not.  If not, none of
this logic is relevant (AFAICT).

If you're on Meltdown-vulnerable hardware and in a production
environment, you're running with XPTI (at which point none of this
complexity is relevant).

The only plausible case I can see where this would make a difference is
a dev environment starting with a non-XPTI dom0, then booting an XPTI
guest, at which point can we seriously justify bizarre logic like:

    opt_xpti_hwdom -= IS_ENABLED(CONFIG_LATE_HWDOM) &&
                      !d->domain_id && opt_xpti_hwdom;

just for an alleged perf improvement in a development corner case?
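For what it's worth, the quoted expression is arithmetic shorthand for a
plain conditional decrement. An equivalent spelled-out form looks like
this; IS_ENABLED, the domain structure, and the helper name are stubbed
out or invented here purely for illustration.

```c
#include <stdint.h>

#define IS_ENABLED(option) (option)
#define CONFIG_LATE_HWDOM 1          /* assumed enabled for this sketch */

struct domain { uint16_t domain_id; };

static int8_t opt_xpti_hwdom;

/* Equivalent to:
 *   opt_xpti_hwdom -= IS_ENABLED(CONFIG_LATE_HWDOM) &&
 *                     !d->domain_id && opt_xpti_hwdom;
 * i.e. undo the hwdom bump for the initial dom0 when a late hardware
 * domain will take over the hardware-domain role. */
static void undo_hwdom_bump(const struct domain *d)
{
    if ( IS_ENABLED(CONFIG_LATE_HWDOM) && !d->domain_id && opt_xpti_hwdom )
        --opt_xpti_hwdom;
}
```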

~Andrew



end of thread, other threads:[~2020-05-22 11:58 UTC | newest]

Thread overview: 17+ messages
2019-09-25 15:19 [Xen-devel] [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
2019-09-25 15:23 ` [Xen-devel] [PATCH v3 1/5] x86: suppress XPTI-related TLB flushes when possible Jan Beulich
2020-05-18 17:09   ` Roger Pau Monné
2020-05-19  7:55     ` Jan Beulich
2020-05-19  9:15       ` Roger Pau Monné
2020-05-19  9:45         ` Jan Beulich
2020-05-22 11:00   ` Andrew Cooper
2020-05-22 11:13     ` Roger Pau Monné
2020-05-22 11:58       ` Andrew Cooper
2020-05-22 11:42     ` Jan Beulich
2019-09-25 15:23 ` [Xen-devel] [PATCH v3 2/5] x86/mm: honor opt_pcid also for 32-bit PV domains Jan Beulich
2020-05-22 11:40   ` Andrew Cooper
2019-09-25 15:25 ` [Xen-devel] [PATCH v3 3/5] x86/HVM: move NOFLUSH handling out of hvm_set_cr3() Jan Beulich
2020-05-22 10:40   ` Andrew Cooper
2019-09-25 15:25 ` [Xen-devel] [PATCH v3 4/5] x86/HVM: refuse CR3 loads with reserved (upper) bits set Jan Beulich
2019-09-25 15:26 ` [Xen-devel] [PATCH v3 5/5] x86/HVM: cosmetics to hvm_set_cr3() Jan Beulich
2020-04-28  7:59 ` Ping: [PATCH v3 0/5] (remaining) XSA-292 follow-up Jan Beulich
