All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
@ 2018-02-20 11:58 Andrew Cooper
  2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
                   ` (5 more replies)
  0 siblings, 6 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 11:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

This rats nest was discovered when finding that MSR_TSC_AUX leaked into PV
guests.  It is RFC because I haven't done extensive testing on the result, and
because there are some functional changes for the virtualised TSC modes.

Andrew Cooper (5):
  x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
  x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  x86/time: Rework pv_soft_rdtsc() to aid further cleanup
  x86/pv: Remove deferred RDTSC{,P} handling in pv_emulate_privileged_op()
  x86: Rework MSR_TSC_AUX handling from scratch.

 xen/arch/x86/domain.c                       |  5 ++-
 xen/arch/x86/domctl.c                       | 15 ++++++++-
 xen/arch/x86/hvm/hvm.c                      | 51 ++++++++++++++++++++---------
 xen/arch/x86/hvm/svm/svm.c                  |  4 +--
 xen/arch/x86/hvm/vmx/vmx.c                  |  4 +--
 xen/arch/x86/msr.c                          | 18 ++++++++++
 xen/arch/x86/pv/emul-inv-op.c               |  5 ++-
 xen/arch/x86/pv/emul-priv-op.c              | 30 ++---------------
 xen/arch/x86/time.c                         | 19 +++++++----
 xen/arch/x86/x86_emulate/x86_emulate.c      |  1 +
 xen/include/asm-x86/hvm/hvm.h               |  6 ----
 xen/include/asm-x86/hvm/vcpu.h              |  1 -
 xen/include/asm-x86/msr.h                   | 25 ++++++++++++--
 xen/include/asm-x86/time.h                  |  2 +-
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 xen/include/public/arch-x86/hvm/save.h      |  2 ++
 xen/tools/gen-cpuid.py                      |  3 ++
 17 files changed, 123 insertions(+), 70 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
  2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
@ 2018-02-20 11:58 ` Andrew Cooper
  2018-02-20 14:54   ` Roger Pau Monné
                     ` (2 more replies)
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 11:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

c/s d2f86bf604 which introduced "struct hvm_save_descriptor *d" accidentally
ended up shadowing the "struct domain *d" function parameter.  Rename the
former to desc.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/hvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 91bc3e8..5d39210 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1338,7 +1338,7 @@ static int hvm_save_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
 
     for_each_vcpu ( d, v )
     {
-        struct hvm_save_descriptor *d = _p(&h->data[h->cur]);
+        struct hvm_save_descriptor *desc = _p(&h->data[h->cur]);
         struct hvm_msr *ctxt;
         unsigned int i;
 
@@ -1386,7 +1386,7 @@ static int hvm_save_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
         if ( ctxt->count )
         {
             /* Rewrite length to indicate how much space we actually used. */
-            d->length = HVM_CPU_MSR_SIZE(ctxt->count);
+            desc->length = HVM_CPU_MSR_SIZE(ctxt->count);
             h->cur += HVM_CPU_MSR_SIZE(ctxt->count);
         }
         else
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
  2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
@ 2018-02-20 11:58 ` Andrew Cooper
  2018-02-20 15:22   ` Wei Liu
                     ` (4 more replies)
  2018-02-20 11:58 ` [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup Andrew Cooper
                   ` (3 subsequent siblings)
  5 siblings, 5 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 11:58 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Boris Ostrovsky, Suravee Suthikulpanit, Roger Pau Monné

If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
advertised/hidden.

At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
Therefore, default PV guests can read the value from a previously scheduled
HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.

Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.

To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
update of the MSR, and use this function consistently across the codebase.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

N.B. This has been deemed not a security issue, because the MSR doesn't store
any sensitive information.
---
 xen/arch/x86/domain.c      |  6 +++---
 xen/arch/x86/hvm/hvm.c     |  2 +-
 xen/arch/x86/hvm/svm/svm.c |  2 +-
 xen/arch/x86/hvm/vmx/vmx.c |  2 +-
 xen/arch/x86/msr.c         |  2 ++
 xen/include/asm-x86/msr.h  | 16 ++++++++++++++--
 6 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f93327b..9c3527f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1533,9 +1533,9 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
     if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
         activate_debugregs(v);
 
-    if ( (v->domain->arch.tsc_mode ==  TSC_MODE_PVRDTSCP) &&
-         boot_cpu_has(X86_FEATURE_RDTSCP) )
-        write_rdtscp_aux(v->domain->arch.incarnation);
+    if ( cpu_has_rdtscp )
+        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
+                      ? v->domain->arch.incarnation : 0);
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5d39210..0539551 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3579,7 +3579,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
         v->arch.hvm_vcpu.msr_tsc_aux = (uint32_t)msr_content;
         if ( cpu_has_rdtscp
              && (v->domain->arch.tsc_mode != TSC_MODE_PVRDTSCP) )
-            wrmsrl(MSR_TSC_AUX, (uint32_t)msr_content);
+            wrmsr_tsc_aux(msr_content);
         break;
 
     case MSR_IA32_APICBASE:
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 9f58afc..f53f430 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1093,7 +1093,7 @@ static void svm_ctxt_switch_to(struct vcpu *v)
     svm_tsc_ratio_load(v);
 
     if ( cpu_has_rdtscp )
-        wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
 
 static void noreturn svm_do_resume(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 5cd689e..31acb0e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -618,7 +618,7 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     }
 
     if ( cpu_has_rdtscp )
-        wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 7875d9c..7ba9a10 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -24,6 +24,8 @@
 #include <xen/sched.h>
 #include <asm/msr.h>
 
+DEFINE_PER_CPU(uint32_t, tsc_aux);
+
 struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
                          __read_mostly  pv_max_msr_domain_policy;
 
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 928f1cc..9475a71 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -115,8 +115,6 @@ static inline uint64_t rdtsc_ordered(void)
     __write_tsc(val);                                           \
 })
 
-#define write_rdtscp_aux(val) wrmsr(MSR_TSC_AUX, (val), 0)
-
 #define rdpmc(counter,low,high) \
      __asm__ __volatile__("rdpmc" \
 			  : "=a" (low), "=d" (high) \
@@ -210,6 +208,20 @@ static inline void write_efer(uint64_t val)
 
 DECLARE_PER_CPU(u32, ler_msr);
 
+DECLARE_PER_CPU(uint32_t, tsc_aux);
+
+/* Lazy update of MSR_TSC_AUX */
+static inline void wrmsr_tsc_aux(uint32_t val)
+{
+    uint32_t *this_tsc_aux = &this_cpu(tsc_aux);
+
+    if ( *this_tsc_aux != val )
+    {
+        wrmsr(MSR_TSC_AUX, val, 0);
+        *this_tsc_aux = val;
+    }
+}
+
 /* MSR policy object for shared per-domain MSRs */
 struct msr_domain_policy
 {
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup
  2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
  2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
@ 2018-02-20 11:58 ` Andrew Cooper
  2018-02-20 15:32   ` Wei Liu
                     ` (2 more replies)
  2018-02-20 11:58 ` [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op() Andrew Cooper
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 11:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
the way of some intended cleanup.

 * Drop the rdtscp parameter and always make the caller responsible for ecx
   updates when appropriate.
 * Switch the function from being void, and return the main timestamp in the
   return value.

The regs parameter is still needed, but only for the stats collection, once
again bringing into question their utility.  The parameter can however switch
to being const.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/pv/emul-inv-op.c  |  7 ++++++-
 xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++----
 xen/arch/x86/time.c            |  8 ++------
 xen/include/asm-x86/time.h     |  2 +-
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
index f894417..b1916b4 100644
--- a/xen/arch/x86/pv/emul-inv-op.c
+++ b/xen/arch/x86/pv/emul-inv-op.c
@@ -46,6 +46,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     char opcode[3];
     unsigned long eip, rc;
     struct vcpu *v = current;
+    struct domain *currd = v->domain;
 
     eip = regs->rip;
     if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
@@ -56,7 +57,11 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
         return 0;
     eip += sizeof(opcode);
-    pv_soft_rdtsc(v, regs, 1);
+
+    msr_split(regs, pv_soft_rdtsc(v, regs));
+    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
+                 ? currd->arch.incarnation : 0);
+
     pv_emul_instruction_done(regs, eip);
     return EXCRET_fault_fixed;
 }
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 17aaf97..d4d64f2 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1374,10 +1374,14 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
     case X86EMUL_OKAY:
         if ( ctxt.tsc & TSC_BASE )
         {
-            if ( ctxt.tsc & TSC_AUX )
-                pv_soft_rdtsc(curr, regs, 1);
-            else if ( currd->arch.vtsc )
-                pv_soft_rdtsc(curr, regs, 0);
+            if ( currd->arch.vtsc || (ctxt.tsc & TSC_AUX) )
+            {
+                msr_split(regs, pv_soft_rdtsc(curr, regs));
+
+                if ( ctxt.tsc & TSC_AUX )
+                    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
+                                 ? currd->arch.incarnation : 0);
+            }
             else
                 msr_split(regs, rdtsc());
         }
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index c90524d..c4ca515 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2024,7 +2024,7 @@ u64 gtsc_to_gtime(struct domain *d, u64 tsc)
     return time;
 }
 
-void pv_soft_rdtsc(struct vcpu *v, struct cpu_user_regs *regs, int rdtscp)
+uint64_t pv_soft_rdtsc(struct vcpu *v, const struct cpu_user_regs *regs)
 {
     s_time_t now = get_s_time();
     struct domain *d = v->domain;
@@ -2045,11 +2045,7 @@ void pv_soft_rdtsc(struct vcpu *v, struct cpu_user_regs *regs, int rdtscp)
 
     spin_unlock(&d->arch.vtsc_lock);
 
-    msr_split(regs, gtime_to_gtsc(d, now));
-
-    if ( rdtscp )
-         regs->rcx =
-             (d->arch.tsc_mode == TSC_MODE_PVRDTSCP) ? d->arch.incarnation : 0;
+    return gtime_to_gtsc(d, now);
 }
 
 bool clocksource_is_tsc(void)
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index 046302e..3bac74c 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -56,7 +56,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns);
 
 uint64_t tsc_ticks2ns(uint64_t ticks);
 
-void pv_soft_rdtsc(struct vcpu *v, struct cpu_user_regs *regs, int rdtscp);
+uint64_t pv_soft_rdtsc(struct vcpu *v, const struct cpu_user_regs *regs);
 u64 gtime_to_gtsc(struct domain *d, u64 time);
 u64 gtsc_to_gtime(struct domain *d, u64 tsc);
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
  2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
                   ` (2 preceding siblings ...)
  2018-02-20 11:58 ` [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup Andrew Cooper
@ 2018-02-20 11:58 ` Andrew Cooper
  2018-02-20 16:08   ` Wei Liu
                     ` (2 more replies)
  2018-02-20 11:58 ` [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch Andrew Cooper
  2018-02-26 19:12 ` [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
  5 siblings, 3 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 11:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The handling of RDTSCP for PV guests has been broken (AFAICT forever).

To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
and the emulator doesn't perform appropriate feature checking.  (Conversely,
we unilaterally advertise RDPID which uses the same path, but it should never
trap on #GP to arrive here in the first place).

A PV guest typically can see RDTSCP in native CPUID, so userspace will
probably end up using it.  On a capable pipeline (without TSD, see below), it
will execute normally and return non-virtualised data.

When a virtual TSC mode is not specified for the domain, CR4.TSD is left
clear, so executing RDTSCP will execute without trapping.  However, a guest
kernel may set TSD itself, at which point the emulator should not suddenly
switch to virtualised TSC mode and start handing out differently-scaled
values.

Drop all the deferral logic, and return scaled or raw TSC values depending
only on currd->arch.vtsc.  This changes the exact moment at which the
timestamp is taken, but that doesn't matter from the guests point of view, and
is consistent with the HVM side of things.  It also means that RDTSC and
RDTSCP are now consistent WRT handing out native or virtualised timestamps.

The MSR_TSC_AUX case unconditionally returns the migration incarnation or
zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
of hardware.

This is a behavioural change for guests, but the semantics are rather more
sane.  It lays groundwork for further fixes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/pv/emul-priv-op.c | 35 +++++------------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index d4d64f2..4e3641d 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -60,9 +60,6 @@ struct priv_op_ctxt {
     } cs;
     char *io_emul_stub;
     unsigned int bpmatch;
-    unsigned int tsc;
-#define TSC_BASE 1
-#define TSC_AUX 2
 };
 
 /* I/O emulation support. Helper routines for, and type of, the stack stub. */
@@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct domain *d)
 static int read_msr(unsigned int reg, uint64_t *val,
                     struct x86_emulate_ctxt *ctxt)
 {
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    const struct vcpu *curr = current;
+    struct vcpu *curr = current;
     const struct domain *currd = curr->domain;
     bool vpmu_msr = false;
     int ret;
@@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
         *val = curr->arch.pv_vcpu.gs_base_user;
         return X86EMUL_OKAY;
 
-    /*
-     * In order to fully retain original behavior, defer calling
-     * pv_soft_rdtsc() until after emulation. This may want/need to be
-     * reconsidered.
-     */
     case MSR_IA32_TSC:
-        poc->tsc |= TSC_BASE;
-        goto normal;
+        *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
+        return X86EMUL_OKAY;
 
     case MSR_TSC_AUX:
-        poc->tsc |= TSC_AUX;
-        if ( cpu_has_rdtscp )
-            goto normal;
-        *val = 0;
+        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
+                          ? currd->arch.incarnation : 0);
         return X86EMUL_OKAY;
 
     case MSR_EFER:
@@ -1372,20 +1361,6 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
     switch ( rc )
     {
     case X86EMUL_OKAY:
-        if ( ctxt.tsc & TSC_BASE )
-        {
-            if ( currd->arch.vtsc || (ctxt.tsc & TSC_AUX) )
-            {
-                msr_split(regs, pv_soft_rdtsc(curr, regs));
-
-                if ( ctxt.tsc & TSC_AUX )
-                    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
-                                 ? currd->arch.incarnation : 0);
-            }
-            else
-                msr_split(regs, rdtsc());
-        }
-
         if ( ctxt.ctxt.retire.singlestep )
             ctxt.bpmatch |= DR_STEP;
         if ( ctxt.bpmatch )
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
                   ` (3 preceding siblings ...)
  2018-02-20 11:58 ` [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op() Andrew Cooper
@ 2018-02-20 11:58 ` Andrew Cooper
  2018-02-20 17:03   ` Wei Liu
                     ` (2 more replies)
  2018-02-26 19:12 ` [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
  5 siblings, 3 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 11:58 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Boris Ostrovsky, Suravee Suthikulpanit, Roger Pau Monné

There are many problems with MSR_TSC_AUX handling.

To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
depends on RDTSCP.

For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
PV guests, which in turn allows RDPID to work.

To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
into the generic MSR policy infrastructure, and becomes common.  One
improvement is that we will now reject invalid values, rather than silently
truncating an accepting them.  This also causes the emulator to reject RDTSCP
for guests without the features.

One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
previous behaviour of this mode was to silently drop writes, but as it is a
break in the x86 ABI to start with, switch the semantics to be more sane, and
have TSC_MODE_PVRDTSCP make the MSR properly read-only.

With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
migrating.  Cope with it the common migration logic.  Care must be taken
however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as the
receiving side will reject the guest_wrmsr().

What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
emulation code functions correctly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/domain.c                       |  3 +-
 xen/arch/x86/domctl.c                       | 15 ++++++++-
 xen/arch/x86/hvm/hvm.c                      | 47 ++++++++++++++++++++---------
 xen/arch/x86/hvm/svm/svm.c                  |  4 +--
 xen/arch/x86/hvm/vmx/vmx.c                  |  4 +--
 xen/arch/x86/msr.c                          | 16 ++++++++++
 xen/arch/x86/pv/emul-inv-op.c               |  4 +--
 xen/arch/x86/pv/emul-priv-op.c              |  5 ---
 xen/arch/x86/time.c                         | 11 +++++++
 xen/arch/x86/x86_emulate/x86_emulate.c      |  1 +
 xen/include/asm-x86/hvm/hvm.h               |  6 ----
 xen/include/asm-x86/hvm/vcpu.h              |  1 -
 xen/include/asm-x86/msr.h                   |  9 ++++++
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 xen/include/public/arch-x86/hvm/save.h      |  2 ++
 xen/tools/gen-cpuid.py                      |  3 ++
 16 files changed, 96 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9c3527f..144d6f0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1534,8 +1534,7 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
         activate_debugregs(v);
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
-                      ? v->domain->arch.incarnation : 0);
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3a..979afdf 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1249,6 +1249,7 @@ long arch_do_domctl(
         static const uint32_t msrs_to_send[] = {
             MSR_SPEC_CTRL,
             MSR_INTEL_MISC_FEATURES_ENABLES,
+            MSR_TSC_AUX,
         };
         uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send);
 
@@ -1284,7 +1285,18 @@ long arch_do_domctl(
                 for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
                 {
                     uint64_t val;
-                    int rc = guest_rdmsr(v, msrs_to_send[j], &val);
+                    int rc;
+
+                    /*
+                     * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
+                     * case, the MSR is read-only, and should be rejected if
+                     * seen on the restore side.
+                     */
+                    if ( msrs_to_send[j] == MSR_TSC_AUX &&
+                         d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+                        continue;
+
+                    rc = guest_rdmsr(v, msrs_to_send[j], &val);
 
                     /*
                      * It is the programmers responsibility to ensure that
@@ -1373,6 +1385,7 @@ long arch_do_domctl(
                 {
                 case MSR_SPEC_CTRL:
                 case MSR_INTEL_MISC_FEATURES_ENABLES:
+                case MSR_TSC_AUX:
                     if ( guest_wrmsr(v, msr.index, msr.value) != X86EMUL_OKAY )
                         break;
                     continue;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0539551..ab24f87 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -792,7 +792,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 
         ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
 
-        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
+        /* ctxt.msr_tsc_aux omitted - now sent via generic MSR record. */
 
         hvm_get_segment_register(v, x86_seg_idtr, &seg);
         ctxt.idtr_limit = seg.limit;
@@ -1046,7 +1046,24 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     if ( hvm_funcs.tsc_scaling.setup )
         hvm_funcs.tsc_scaling.setup(v);
 
-    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
+    /*
+     * Backwards compatibility.  MSR_TSC_AUX contains different information
+     * depending on whether TSC_MODE_PVRDTSCP is enabled.
+     *
+     * Before Xen 4.11, ctxt.msr_tsc_aux was sent unconditionally and restored
+     * here, but the value was otherwise ignored in TSC_MODE_PVRDTSCP.
+     *
+     * In 4.11, the logic was changed to send MSR_TSC_AUX via the generic MSR
+     * mechanism if tsc_mode != TSC_MODE_PVRDTSCP.  If tsc_mode ==
+     * TSC_MODE_PVRDTSCP, the tsc logic is responsibile for setting the
+     * correct value.
+     *
+     * For compatibility with migration streams from before 4.11, we restore
+     * from ctxt.msr_tsc_aux if the TSC code hasn't/isn't in charge, and we've
+     * not seen a value arrive in the generic MSR record.
+     */
+    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP && !v->arch.msr->tsc_aux )
+        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
 
     hvm_set_guest_tsc_fixed(v, ctxt.tsc, d->arch.hvm_domain.sync_tsc);
 
@@ -1329,6 +1346,7 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
 static const uint32_t msrs_to_send[] = {
     MSR_SPEC_CTRL,
     MSR_INTEL_MISC_FEATURES_ENABLES,
+    MSR_TSC_AUX,
 };
 static unsigned int __read_mostly msr_count_max = ARRAY_SIZE(msrs_to_send);
 
@@ -1351,7 +1369,18 @@ static int hvm_save_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
         for ( i = 0; i < ARRAY_SIZE(msrs_to_send); ++i )
         {
             uint64_t val;
-            int rc = guest_rdmsr(v, msrs_to_send[i], &val);
+            int rc;
+
+            /*
+             * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this case, the
+             * MSR is read-only, and should be rejected if seen on the restore
+             * side.
+             */
+            if ( msrs_to_send[i] == MSR_TSC_AUX &&
+                 d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+                continue;
+
+            rc = guest_rdmsr(v, msrs_to_send[i], &val);
 
             /*
              * It is the programmers responsibility to ensure that
@@ -1465,6 +1494,7 @@ static int hvm_load_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
 
         case MSR_SPEC_CTRL:
         case MSR_INTEL_MISC_FEATURES_ENABLES:
+        case MSR_TSC_AUX:
             rc = guest_wrmsr(v, ctxt->msr[i].index, ctxt->msr[i].val);
 
             if ( rc != X86EMUL_OKAY )
@@ -3424,10 +3454,6 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content = v->arch.hvm_vcpu.msr_tsc_adjust;
         break;
 
-    case MSR_TSC_AUX:
-        *msr_content = hvm_msr_tsc_aux(v);
-        break;
-
     case MSR_IA32_APICBASE:
         *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
         break;
@@ -3575,13 +3601,6 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
         hvm_set_guest_tsc_adjust(v, msr_content);
         break;
 
-    case MSR_TSC_AUX:
-        v->arch.hvm_vcpu.msr_tsc_aux = (uint32_t)msr_content;
-        if ( cpu_has_rdtscp
-             && (v->domain->arch.tsc_mode != TSC_MODE_PVRDTSCP) )
-            wrmsr_tsc_aux(msr_content);
-        break;
-
     case MSR_IA32_APICBASE:
         if ( !vlapic_msr_set(vcpu_vlapic(v), msr_content) )
             goto gp_fault;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index f53f430..9406624 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1093,7 +1093,7 @@ static void svm_ctxt_switch_to(struct vcpu *v)
     svm_tsc_ratio_load(v);
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 static void noreturn svm_do_resume(struct vcpu *v)
@@ -2842,7 +2842,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         break;
 
     case VMEXIT_RDTSCP:
-        regs->rcx = hvm_msr_tsc_aux(v);
+        regs->rcx = v->arch.msr->tsc_aux;
         /* fall through */
     case VMEXIT_RDTSC:
         svm_vmexit_do_rdtsc(regs);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 31acb0e..45fd9c5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -618,7 +618,7 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     }
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -3858,7 +3858,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_invlpg_intercept(exit_qualification);
         break;
     case EXIT_REASON_RDTSCP:
-        regs->rcx = hvm_msr_tsc_aux(v);
+        regs->rcx = v->arch.msr->tsc_aux;
         /* fall through */
     case EXIT_REASON_RDTSC:
         update_guest_eip(); /* Safe: RDTSC, RDTSCP */
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 7ba9a10..aab0722 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -156,6 +156,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
                _MSR_MISC_FEATURES_CPUID_FAULTING;
         break;
 
+    case MSR_TSC_AUX:
+        if ( !cp->extd.rdtscp )
+            goto gp_fault;
+
+        *val = vp->tsc_aux;
+        break;
+
     default:
         return X86EMUL_UNHANDLEABLE;
     }
@@ -231,6 +238,15 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         break;
     }
 
+    case MSR_TSC_AUX:
+        if ( !cp->extd.rdtscp ||                      /* MSR available? */
+             d->arch.tsc_mode == TSC_MODE_PVRDTSCP || /* MSR read-only? */
+             val != (uint32_t)val )                   /* Rsvd bits set? */
+            goto gp_fault;
+
+        wrmsr_tsc_aux(vp->tsc_aux = val);
+        break;
+
     default:
         return X86EMUL_UNHANDLEABLE;
     }
diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
index b1916b4..c071372 100644
--- a/xen/arch/x86/pv/emul-inv-op.c
+++ b/xen/arch/x86/pv/emul-inv-op.c
@@ -46,7 +46,6 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     char opcode[3];
     unsigned long eip, rc;
     struct vcpu *v = current;
-    struct domain *currd = v->domain;
 
     eip = regs->rip;
     if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
@@ -59,8 +58,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     eip += sizeof(opcode);
 
     msr_split(regs, pv_soft_rdtsc(v, regs));
-    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
-                 ? currd->arch.incarnation : 0);
+    regs->rcx = v->arch.msr->tsc_aux;
 
     pv_emul_instruction_done(regs, eip);
     return EXCRET_fault_fixed;
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 4e3641d..e9e2313 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -880,11 +880,6 @@ static int read_msr(unsigned int reg, uint64_t *val,
         *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
         return X86EMUL_OKAY;
 
-    case MSR_TSC_AUX:
-        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
-                          ? currd->arch.incarnation : 0);
-        return X86EMUL_OKAY;
-
     case MSR_EFER:
         *val = read_efer();
         if ( is_pv_32bit_domain(currd) )
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index c4ca515..7119d65 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
         }
         break;
     }
+
     d->arch.incarnation = incarnation + 1;
+
+    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+    {
+        struct vcpu *v;
+
+        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
+        for_each_vcpu ( d, v )
+            v->arch.msr->tsc_aux = d->arch.incarnation;
+    }
+
     if ( is_hvm_domain(d) )
     {
         if ( hvm_tsc_scaling_supported && !d->arch.vtsc )
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index 85383ea..efb6003 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5141,6 +5141,7 @@ x86_emulate(
 
         case 0xf9: /* rdtscp */
             fail_if(ops->read_msr == NULL);
+            /* Getting a result implies vcpu_has_rdtscp() */
             if ( (rc = ops->read_msr(MSR_TSC_AUX,
                                      &msr_val, ctxt)) != X86EMUL_OKAY )
                 goto done;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dd3dd5f..7708fdf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -511,12 +511,6 @@ static inline void hvm_invalidate_regs_fields(struct cpu_user_regs *regs)
 int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
                               struct npfec npfec);
 
-#define hvm_msr_tsc_aux(v) ({                                               \
-    struct domain *__d = (v)->domain;                                       \
-    (__d->arch.tsc_mode == TSC_MODE_PVRDTSCP)                               \
-        ? (u32)__d->arch.incarnation : (u32)(v)->arch.hvm_vcpu.msr_tsc_aux; \
-})
-
 int hvm_x2apic_msr_read(struct vcpu *v, unsigned int msr, uint64_t *msr_content);
 int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t msr_content);
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index d93166f..60425f3 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -170,7 +170,6 @@ struct hvm_vcpu {
 
     struct hvm_vcpu_asid n1asid;
 
-    u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
     u64                 msr_xss;
 
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 9475a71..485113f 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -250,6 +250,15 @@ struct msr_vcpu_policy
         bool available; /* This MSR is non-architectural */
         bool cpuid_faulting;
     } misc_features_enables;
+
+    /*
+     * 0xc0000103 - MSR_TSC_AUX
+     *
+     * Usually guest chosen.  However, when using TSC_MODE_PVRDTSCP the value
+     * is Xen-controlled and is the migration incarnation, at which point the
+     * MSR becomes read-only to the guest.
+     */
+    uint32_t tsc_aux;
 };
 
 void init_guest_msr_policy(void);
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index fa81af1..78fa466 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -156,7 +156,7 @@ XEN_CPUFEATURE(NX,            2*32+20) /*A  Execute Disable */
 XEN_CPUFEATURE(MMXEXT,        2*32+22) /*A  AMD MMX extensions */
 XEN_CPUFEATURE(FFXSR,         2*32+25) /*A  FFXSR instruction optimizations */
 XEN_CPUFEATURE(PAGE1GB,       2*32+26) /*H  1Gb large page support */
-XEN_CPUFEATURE(RDTSCP,        2*32+27) /*S  RDTSCP */
+XEN_CPUFEATURE(RDTSCP,        2*32+27) /*A  RDTSCP */
 XEN_CPUFEATURE(LM,            2*32+29) /*A  Long Mode (x86-64) */
 XEN_CPUFEATURE(3DNOWEXT,      2*32+30) /*A  AMD 3DNow! extensions */
 XEN_CPUFEATURE(3DNOW,         2*32+31) /*A  3DNow! */
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 4691d4d..1c1ba4e 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -141,6 +141,8 @@ struct hvm_hw_cpu {
     uint64_t msr_cstar;
     uint64_t msr_syscall_mask;
     uint64_t msr_efer;
+
+    /* Superceeded by general MSR policy in Xen 4.11.  Written as 0. */
     uint64_t msr_tsc_aux;
 
     /* guest's idea of what rdtsc() would return */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 613b909..601a7a0 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -235,6 +235,9 @@ def crunch_numbers(state):
         # absence of any enabled xstate.
         AVX: [FMA, FMA4, F16C, AVX2, XOP],
 
+        # MSR_TSC_AUX is enumerated by RDTSCP, but RDPID also reads TSC_AUX.
+        RDTSCP: [RDPID],
+
         # CX16 is only encodable in Long Mode.  LAHF_LM indicates that the
         # SAHF/LAHF instructions are reintroduced in Long Mode.  1GB
         # superpages, PCID and PKU are only available in 4 level paging.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
  2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
@ 2018-02-20 14:54   ` Roger Pau Monné
  2018-02-20 15:12   ` Wei Liu
  2018-02-23 13:53   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-20 14:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 11:58:39AM +0000, Andrew Cooper wrote:
> c/s d2f86bf604 which introduced "struct hvm_save_descriptor *d" accidentally
> ended up shadowing the "struct domain *d" function parameter.  Rename the
> former to desc.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Would be nice to enable -Wshadow but that requires a lot more of work.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
  2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
  2018-02-20 14:54   ` Roger Pau Monné
@ 2018-02-20 15:12   ` Wei Liu
  2018-02-23 13:53   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2018-02-20 15:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 11:58:39AM +0000, Andrew Cooper wrote:
> c/s d2f86bf604 which introduced "struct hvm_save_descriptor *d" accidentally
> ended up shadowing the "struct domain *d" function parameter.  Rename the
> former to desc.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
@ 2018-02-20 15:22   ` Wei Liu
  2018-02-20 15:26     ` Andrew Cooper
  2018-02-20 15:49   ` Roger Pau Monné
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Wei Liu @ 2018-02-20 15:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky, Roger Pau Monné

On Tue, Feb 20, 2018 at 11:58:40AM +0000, Andrew Cooper wrote:
> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
> advertised/hidden.
> 

This setup works only because CR4.TSD=0?

> At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
> TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
> Therefore, default PV guests can read the value from a previously scheduled
> HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
> 
> Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.
> 
> To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
> update of the MSR, and use this function consistently across the codebase.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

The code looks correct to me.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 15:22   ` Wei Liu
@ 2018-02-20 15:26     ` Andrew Cooper
  2018-02-20 15:32       ` Wei Liu
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 15:26 UTC (permalink / raw)
  To: Wei Liu
  Cc: Kevin Tian, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky, Roger Pau Monné

On 20/02/18 15:22, Wei Liu wrote:
> On Tue, Feb 20, 2018 at 11:58:40AM +0000, Andrew Cooper wrote:
>> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
>> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
>> advertised/hidden.
>>
> This setup works only because CR4.TSD=0?

Having CR4.TSD clear is the default, and means RDTSCP will work at any
privilege level.  Setting CR4.TSD (either due to virtualised TSC, or
because the guest kernel wants to trap user accesses) will cause RDTSCP
to trap into emul-priv-op.

There is no way of causing RDPID to trap (on hardware which supports the
instruction), and it will read read the current value of MSR_TSC_AUX.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 15:26     ` Andrew Cooper
@ 2018-02-20 15:32       ` Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2018-02-20 15:32 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Xen-devel, Jun Nakajima,
	Boris Ostrovsky, Suravee Suthikulpanit, Roger Pau Monné

On Tue, Feb 20, 2018 at 03:26:20PM +0000, Andrew Cooper wrote:
> On 20/02/18 15:22, Wei Liu wrote:
> > On Tue, Feb 20, 2018 at 11:58:40AM +0000, Andrew Cooper wrote:
> >> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
> >> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
> >> advertised/hidden.
> >>
> > This setup works only because CR4.TSD=0?
> 
> Having CR4.TSD clear is the default, and means RDTSCP will work at any
> privilege level.  Setting CR4.TSD (either due to virtualised TSC, or
> because the guest kernel wants to trap user accesses) will cause RDTSCP
> to trap into emul-priv-op.
> 
> There is no way of causing RDPID to trap (on hardware which supports the
> instruction), and it will read read the current value of MSR_TSC_AUX.

Alright.

In any case:

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup
  2018-02-20 11:58 ` [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup Andrew Cooper
@ 2018-02-20 15:32   ` Wei Liu
  2018-02-20 16:04   ` Roger Pau Monné
  2018-02-23 14:38   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2018-02-20 15:32 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 11:58:41AM +0000, Andrew Cooper wrote:
> Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
> the way of some intended cleanup.
> 
>  * Drop the rdtscp parameter and always make the caller responsible for ecx
>    updates when appropriate.
>  * Switch the function from being void, and return the main timestamp in the
>    return value.
> 
> The regs parameter is still needed, but only for the stats collection, once
> again bringing into question their utility.  The parameter can however switch
> to being const.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
  2018-02-20 15:22   ` Wei Liu
@ 2018-02-20 15:49   ` Roger Pau Monné
  2018-02-23 14:04   ` Jan Beulich
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-20 15:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky

On Tue, Feb 20, 2018 at 11:58:40AM +0000, Andrew Cooper wrote:
> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
> advertised/hidden.
> 
> At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
> TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
> Therefore, default PV guests can read the value from a previously scheduled
> HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
> 
> Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.
> 
> To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
> update of the MSR, and use this function consistently across the codebase.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup
  2018-02-20 11:58 ` [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup Andrew Cooper
  2018-02-20 15:32   ` Wei Liu
@ 2018-02-20 16:04   ` Roger Pau Monné
  2018-02-20 16:07     ` Andrew Cooper
  2018-02-23 14:38   ` Jan Beulich
  2 siblings, 1 reply; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-20 16:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 11:58:41AM +0000, Andrew Cooper wrote:
> Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
> the way of some intended cleanup.
> 
>  * Drop the rdtscp parameter and always make the caller responsible for ecx
>    updates when appropriate.
>  * Switch the function from being void, and return the main timestamp in the
>    return value.
> 
> The regs parameter is still needed, but only for the stats collection, once
> again bringing into question their utility.  The parameter can however switch
> to being const.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/arch/x86/pv/emul-inv-op.c  |  7 ++++++-
>  xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++----
>  xen/arch/x86/time.c            |  8 ++------
>  xen/include/asm-x86/time.h     |  2 +-
>  4 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
> index f894417..b1916b4 100644
> --- a/xen/arch/x86/pv/emul-inv-op.c
> +++ b/xen/arch/x86/pv/emul-inv-op.c
> @@ -46,6 +46,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
>      char opcode[3];
>      unsigned long eip, rc;
>      struct vcpu *v = current;
> +    struct domain *currd = v->domain;
const?

>  
>      eip = regs->rip;
>      if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
> @@ -56,7 +57,11 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
>      if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
>          return 0;
>      eip += sizeof(opcode);
> -    pv_soft_rdtsc(v, regs, 1);
> +
> +    msr_split(regs, pv_soft_rdtsc(v, regs));
> +    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
> +                 ? currd->arch.incarnation : 0);

In the previous patch you use the same expression without any
parentheses. I think I prefer the version without parentheses. I also
wonder whether it would make sense to turn this into a macro (seeing
it being used in the previous patch and also further below).

With those fixed (if applicable):

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup
  2018-02-20 16:04   ` Roger Pau Monné
@ 2018-02-20 16:07     ` Andrew Cooper
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 16:07 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Wei Liu, Jan Beulich, Xen-devel

On 20/02/18 16:04, Roger Pau Monné wrote:
> On Tue, Feb 20, 2018 at 11:58:41AM +0000, Andrew Cooper wrote:
>> Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
>> the way of some intended cleanup.
>>
>>  * Drop the rdtscp parameter and always make the caller responsible for ecx
>>    updates when appropriate.
>>  * Switch the function from being void, and return the main timestamp in the
>>    return value.
>>
>> The regs parameter is still needed, but only for the stats collection, once
>> again bringing into question their utility.  The parameter can however switch
>> to being const.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  xen/arch/x86/pv/emul-inv-op.c  |  7 ++++++-
>>  xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++----
>>  xen/arch/x86/time.c            |  8 ++------
>>  xen/include/asm-x86/time.h     |  2 +-
>>  4 files changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
>> index f894417..b1916b4 100644
>> --- a/xen/arch/x86/pv/emul-inv-op.c
>> +++ b/xen/arch/x86/pv/emul-inv-op.c
>> @@ -46,6 +46,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
>>      char opcode[3];
>>      unsigned long eip, rc;
>>      struct vcpu *v = current;
>> +    struct domain *currd = v->domain;
> const?
>
>>  
>>      eip = regs->rip;
>>      if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
>> @@ -56,7 +57,11 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
>>      if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
>>          return 0;
>>      eip += sizeof(opcode);
>> -    pv_soft_rdtsc(v, regs, 1);
>> +
>> +    msr_split(regs, pv_soft_rdtsc(v, regs));
>> +    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
>> +                 ? currd->arch.incarnation : 0);
> In the previous patch you use the same expression without any
> parentheses. I think I prefer the version without parentheses. I also
> wonder whether it would make sense to turn this into a macro (seeing
> it being used in the previous patch and also further below).
>
> With those fixed (if applicable):

None of them survive patch 5 in the series.  I'll fix the formatting,
but I'm not going to macro-ise anything.

>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks, Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
  2018-02-20 11:58 ` [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op() Andrew Cooper
@ 2018-02-20 16:08   ` Wei Liu
  2018-02-20 16:28   ` Roger Pau Monné
  2018-02-23 14:40   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2018-02-20 16:08 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 11:58:42AM +0000, Andrew Cooper wrote:
> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> 
> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> we unilaterally advertise RDPID which uses the same path, but it should never
> trap on #GP to arrive here in the first place).
> 
> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> probably end up using it.  On a capable pipeline (without TSD, see below), it
> will execute normally and return non-virtualised data.
> 
> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> clear, so executing RDTSCP will execute without trapping.  However, a guest
> kernel may set TSD itself, at which point the emulator should not suddenly
> switch to virtualised TSC mode and start handing out differently-scaled
> values.
> 
> Drop all the deferral logic, and return scaled or raw TSC values depending
> only on currd->arch.vtsc.  This changes the exact moment at which the
> timestamp is taken, but that doesn't matter from the guests point of view, and
> is consistent with the HVM side of things.  It also means that RDTSC and
> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
> 
> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> of hardware.
> 
> This is a behavioural change for guests, but the semantics are rather more
> sane.  It lays groundwork for further fixes.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
  2018-02-20 11:58 ` [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op() Andrew Cooper
  2018-02-20 16:08   ` Wei Liu
@ 2018-02-20 16:28   ` Roger Pau Monné
  2018-02-20 16:37     ` Andrew Cooper
  2018-02-23 14:40   ` Jan Beulich
  2 siblings, 1 reply; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-20 16:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 11:58:42AM +0000, Andrew Cooper wrote:
> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> 
> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> we unilaterally advertise RDPID which uses the same path, but it should never
> trap on #GP to arrive here in the first place).
> 
> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> probably end up using it.  On a capable pipeline (without TSD, see below), it
> will execute normally and return non-virtualised data.
> 
> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> clear, so executing RDTSCP will execute without trapping.  However, a guest
> kernel may set TSD itself, at which point the emulator should not suddenly
> switch to virtualised TSC mode and start handing out differently-scaled
> values.
> 
> Drop all the deferral logic, and return scaled or raw TSC values depending
> only on currd->arch.vtsc.  This changes the exact moment at which the
> timestamp is taken, but that doesn't matter from the guests point of view, and
> is consistent with the HVM side of things.  It also means that RDTSC and
> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
> 
> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> of hardware.
> 
> This is a behavioural change for guests, but the semantics are rather more
> sane.  It lays groundwork for further fixes.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/arch/x86/pv/emul-priv-op.c | 35 +++++------------------------------
>  1 file changed, 5 insertions(+), 30 deletions(-)
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index d4d64f2..4e3641d 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -60,9 +60,6 @@ struct priv_op_ctxt {
>      } cs;
>      char *io_emul_stub;
>      unsigned int bpmatch;
> -    unsigned int tsc;
> -#define TSC_BASE 1
> -#define TSC_AUX 2
>  };
>  
>  /* I/O emulation support. Helper routines for, and type of, the stack stub. */
> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct domain *d)
>  static int read_msr(unsigned int reg, uint64_t *val,
>                      struct x86_emulate_ctxt *ctxt)
>  {
> -    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
> -    const struct vcpu *curr = current;
> +    struct vcpu *curr = current;

I think you can keep the const here?

>      const struct domain *currd = curr->domain;
>      bool vpmu_msr = false;
>      int ret;
> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
>          *val = curr->arch.pv_vcpu.gs_base_user;
>          return X86EMUL_OKAY;
>  
> -    /*
> -     * In order to fully retain original behavior, defer calling
> -     * pv_soft_rdtsc() until after emulation. This may want/need to be
> -     * reconsidered.
> -     */
>      case MSR_IA32_TSC:
> -        poc->tsc |= TSC_BASE;
> -        goto normal;
> +        *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
> +        return X86EMUL_OKAY;
>  
>      case MSR_TSC_AUX:
> -        poc->tsc |= TSC_AUX;
> -        if ( cpu_has_rdtscp )
> -            goto normal;
> -        *val = 0;
> +        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
> +                          ? currd->arch.incarnation : 0);

I wonder whether Xen should inject a #GP here if tsc_mode is not
PVRDTSCP and RDPID is not available.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
  2018-02-20 16:28   ` Roger Pau Monné
@ 2018-02-20 16:37     ` Andrew Cooper
  2018-02-20 17:40       ` Roger Pau Monné
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 16:37 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Wei Liu, Jan Beulich, Xen-devel

On 20/02/18 16:28, Roger Pau Monné wrote:
> On Tue, Feb 20, 2018 at 11:58:42AM +0000, Andrew Cooper wrote:
>> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
>>
>> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
>> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
>> and the emulator doesn't perform appropriate feature checking.  (Conversely,
>> we unilaterally advertise RDPID which uses the same path, but it should never
>> trap on #GP to arrive here in the first place).
>>
>> A PV guest typically can see RDTSCP in native CPUID, so userspace will
>> probably end up using it.  On a capable pipeline (without TSD, see below), it
>> will execute normally and return non-virtualised data.
>>
>> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
>> clear, so executing RDTSCP will execute without trapping.  However, a guest
>> kernel may set TSD itself, at which point the emulator should not suddenly
>> switch to virtualised TSC mode and start handing out differently-scaled
>> values.
>>
>> Drop all the deferral logic, and return scaled or raw TSC values depending
>> only on currd->arch.vtsc.  This changes the exact moment at which the
>> timestamp is taken, but that doesn't matter from the guests point of view, and
>> is consistent with the HVM side of things.  It also means that RDTSC and
>> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
>>
>> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
>> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
>> of hardware.
>>
>> This is a behavioural change for guests, but the semantics are rather more
>> sane.  It lays groundwork for further fixes.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  xen/arch/x86/pv/emul-priv-op.c | 35 +++++------------------------------
>>  1 file changed, 5 insertions(+), 30 deletions(-)
>>
>> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
>> index d4d64f2..4e3641d 100644
>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -60,9 +60,6 @@ struct priv_op_ctxt {
>>      } cs;
>>      char *io_emul_stub;
>>      unsigned int bpmatch;
>> -    unsigned int tsc;
>> -#define TSC_BASE 1
>> -#define TSC_AUX 2
>>  };
>>  
>>  /* I/O emulation support. Helper routines for, and type of, the stack stub. */
>> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct domain *d)
>>  static int read_msr(unsigned int reg, uint64_t *val,
>>                      struct x86_emulate_ctxt *ctxt)
>>  {
>> -    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
>> -    const struct vcpu *curr = current;
>> +    struct vcpu *curr = current;
> I think you can keep the const here?

pv_soft_rdtsc() mutates curr.  This change is most certainly not spurious.

>>      const struct domain *currd = curr->domain;
>>      bool vpmu_msr = false;
>>      int ret;
>> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
>>          *val = curr->arch.pv_vcpu.gs_base_user;
>>          return X86EMUL_OKAY;
>>  
>> -    /*
>> -     * In order to fully retain original behavior, defer calling
>> -     * pv_soft_rdtsc() until after emulation. This may want/need to be
>> -     * reconsidered.
>> -     */
>>      case MSR_IA32_TSC:
>> -        poc->tsc |= TSC_BASE;
>> -        goto normal;
>> +        *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
>> +        return X86EMUL_OKAY;
>>  
>>      case MSR_TSC_AUX:
>> -        poc->tsc |= TSC_AUX;
>> -        if ( cpu_has_rdtscp )
>> -            goto normal;
>> -        *val = 0;
>> +        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
>> +                          ? currd->arch.incarnation : 0);
> I wonder whether Xen should inject a #GP here if tsc_mode is not
> PVRDTSCP and RDPID is not available.

RDPID would be a layering violation.  Also, a lot of this changes again
in patch 5.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 11:58 ` [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch Andrew Cooper
@ 2018-02-20 17:03   ` Wei Liu
  2018-02-20 17:42     ` Andrew Cooper
  2018-02-20 17:35   ` Roger Pau Monné
  2018-02-21 11:36   ` [PATCH v2 " Andrew Cooper
  2 siblings, 1 reply; 47+ messages in thread
From: Wei Liu @ 2018-02-20 17:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky, Roger Pau Monné

On Tue, Feb 20, 2018 at 11:58:43AM +0000, Andrew Cooper wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
     ^
     begin

> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.
> 
> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating an accepting them.  This also causes the emulator to reject RDTSCP
             ^
             and

> for guests without the features.
> 
>  
[...]
> @@ -1284,7 +1285,18 @@ long arch_do_domctl(
>                  for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
>                  {
>                      uint64_t val;
> -                    int rc = guest_rdmsr(v, msrs_to_send[j], &val);
> +                    int rc;
> +
> +                    /*
> +                     * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
> +                     * case, the MSR is read-only, and should be rejected if
> +                     * seen on the restore side.
> +                     */
> +                    if ( msrs_to_send[j] == MSR_TSC_AUX &&
> +                         d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
> +                        continue;

Shouldn't we increment incarnation and send it over to the remote end?
Or send the original value and let the remote increments it?

Frankly I'm not sure how guest is supposed to use that value, but
letting the receiving end restart at 0, which can end up using the same
incarnation as before seems conceptually wrong to me. The more so you
mention a lot of migration's in your previous patches.

This is not disagreeing with the point that IA32_TSC_AUX should be
read-only for PV guest.

And +1 for the unification of HVM and PV code for handling this.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 11:58 ` [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch Andrew Cooper
  2018-02-20 17:03   ` Wei Liu
@ 2018-02-20 17:35   ` Roger Pau Monné
  2018-02-20 18:28     ` Andrew Cooper
  2018-02-21 11:36   ` [PATCH v2 " Andrew Cooper
  2 siblings, 1 reply; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-20 17:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky

On Tue, Feb 20, 2018 at 11:58:43AM +0000, Andrew Cooper wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.
> 
> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating an accepting them.  This also causes the emulator to reject RDTSCP
> for guests without the features.
> 
> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> previous behaviour of this mode was to silently drop writes, but as it is a
> break in the x86 ABI to start with, switch the semantics to be more sane, and
> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> 
> With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
> migrating.  Cope with it the common migration logic.  Care must be taken
> however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as the
> receiving side will reject the guest_wrmsr().
> 
> What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
> all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
> emulation code functions correctly.

I have one likely stupid question about the PVRDTSCP, how does the
guest know it's actually using it? I'm not able to find any cpuid or
xenfeat to signal the guest it's running in PVRDTSCP mode, and thus
that MSR_TSC_AUX contains this magic incarnation value?

Which TBH seems quite pointless, the guest should be perfectly capable
of maintaining it's own count of migrations.

> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0539551..ab24f87 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -792,7 +792,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>  
>          ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
>  
> -        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
> +        /* ctxt.msr_tsc_aux omitted - now sent via generic MSR record. */
>  
>          hvm_get_segment_register(v, x86_seg_idtr, &seg);
>          ctxt.idtr_limit = seg.limit;
> @@ -1046,7 +1046,24 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>      if ( hvm_funcs.tsc_scaling.setup )
>          hvm_funcs.tsc_scaling.setup(v);
>  
> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
> +    /*
> +     * Backwards compatibility.  MSR_TSC_AUX contains different information
> +     * depending on whether TSC_MODE_PVRDTSCP is enabled.
> +     *
> +     * Before Xen 4.11, ctxt.msr_tsc_aux was sent unconditionally and restored
> +     * here, but the value was otherwise ignored in TSC_MODE_PVRDTSCP.
> +     *
> +     * In 4.11, the logic was changed to send MSR_TSC_AUX via the generic MSR
> +     * mechanism if tsc_mode != TSC_MODE_PVRDTSCP.  If tsc_mode ==
> +     * TSC_MODE_PVRDTSCP, the tsc logic is responsibile for setting the
> +     * correct value.
> +     *
> +     * For compatibility with migration streams from before 4.11, we restore
> +     * from ctxt.msr_tsc_aux if the TSC code hasn't/isn't in charge, and we've
> +     * not seen a value arrive in the generic MSR record.
> +     */
> +    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP && !v->arch.msr->tsc_aux )
> +        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;

I'm having a hard time figuring out whether the MSRs have been loaded
at this point, I assume there's some guarantee that MSR will be loaded
before loading the CPU context?

The rest LGTM.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
  2018-02-20 16:37     ` Andrew Cooper
@ 2018-02-20 17:40       ` Roger Pau Monné
  0 siblings, 0 replies; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-20 17:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Tue, Feb 20, 2018 at 04:37:51PM +0000, Andrew Cooper wrote:
> On 20/02/18 16:28, Roger Pau Monné wrote:
> > On Tue, Feb 20, 2018 at 11:58:42AM +0000, Andrew Cooper wrote:
> >> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> >>
> >> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> >> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> >> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> >> we unilaterally advertise RDPID which uses the same path, but it should never
> >> trap on #GP to arrive here in the first place).
> >>
> >> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> >> probably end up using it.  On a capable pipeline (without TSD, see below), it
> >> will execute normally and return non-virtualised data.
> >>
> >> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> >> clear, so executing RDTSCP will execute without trapping.  However, a guest
> >> kernel may set TSD itself, at which point the emulator should not suddenly
> >> switch to virtualised TSC mode and start handing out differently-scaled
> >> values.
> >>
> >> Drop all the deferral logic, and return scaled or raw TSC values depending
> >> only on currd->arch.vtsc.  This changes the exact moment at which the
> >> timestamp is taken, but that doesn't matter from the guests point of view, and
> >> is consistent with the HVM side of things.  It also means that RDTSC and
> >> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
> >>
> >> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> >> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> >> of hardware.
> >>
> >> This is a behavioural change for guests, but the semantics are rather more
> >> sane.  It lays groundwork for further fixes.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Wei Liu <wei.liu2@citrix.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> ---
> >>  xen/arch/x86/pv/emul-priv-op.c | 35 +++++------------------------------
> >>  1 file changed, 5 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> >> index d4d64f2..4e3641d 100644
> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -60,9 +60,6 @@ struct priv_op_ctxt {
> >>      } cs;
> >>      char *io_emul_stub;
> >>      unsigned int bpmatch;
> >> -    unsigned int tsc;
> >> -#define TSC_BASE 1
> >> -#define TSC_AUX 2
> >>  };
> >>  
> >>  /* I/O emulation support. Helper routines for, and type of, the stack stub. */
> >> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct domain *d)
> >>  static int read_msr(unsigned int reg, uint64_t *val,
> >>                      struct x86_emulate_ctxt *ctxt)
> >>  {
> >> -    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
> >> -    const struct vcpu *curr = current;
> >> +    struct vcpu *curr = current;
> > I think you can keep the const here?
> 
> pv_soft_rdtsc() mutates curr.  This change is most certainly not spurious.

Oh right, you constified regs but not the vcpu parameter. AFAICT the
vcpu parameter of pv_soft_rdtsc could also be constified? It's only
used to get the domain and whether the guests is in kernel or user
mode.

> >>      const struct domain *currd = curr->domain;
> >>      bool vpmu_msr = false;
> >>      int ret;
> >> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
> >>          *val = curr->arch.pv_vcpu.gs_base_user;
> >>          return X86EMUL_OKAY;
> >>  
> >> -    /*
> >> -     * In order to fully retain original behavior, defer calling
> >> -     * pv_soft_rdtsc() until after emulation. This may want/need to be
> >> -     * reconsidered.
> >> -     */
> >>      case MSR_IA32_TSC:
> >> -        poc->tsc |= TSC_BASE;
> >> -        goto normal;
> >> +        *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
> >> +        return X86EMUL_OKAY;
> >>  
> >>      case MSR_TSC_AUX:
> >> -        poc->tsc |= TSC_AUX;
> >> -        if ( cpu_has_rdtscp )
> >> -            goto normal;
> >> -        *val = 0;
> >> +        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
> >> +                          ? currd->arch.incarnation : 0);
> > I wonder whether Xen should inject a #GP here if tsc_mode is not
> > PVRDTSCP and RDPID is not available.
> 
> RDPID would be a layering violation.  Also, a lot of this changes again
> in patch 5.

I see, patch 5 makes this all much better. And this is not any worse
that the previous behavior:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 17:03   ` Wei Liu
@ 2018-02-20 17:42     ` Andrew Cooper
  2018-02-21 11:08       ` Wei Liu
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 17:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: Kevin Tian, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky, Roger Pau Monné

On 20/02/18 17:03, Wei Liu wrote:
> On Tue, Feb 20, 2018 at 11:58:43AM +0000, Andrew Cooper wrote:
>> There are many problems with MSR_TSC_AUX handling.
>>
>> To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
>      ^
>      begin
>
>> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
>> depends on RDTSCP.
>>
>> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
>> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
>> PV guests, which in turn allows RDPID to work.
>>
>> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
>> into the generic MSR policy infrastructure, and becomes common.  One
>> improvement is that we will now reject invalid values, rather than silently
>> truncating an accepting them.  This also causes the emulator to reject RDTSCP
>              ^
>              and
>
>> for guests without the features.
>>
>>  
> [...]
>> @@ -1284,7 +1285,18 @@ long arch_do_domctl(
>>                  for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
>>                  {
>>                      uint64_t val;
>> -                    int rc = guest_rdmsr(v, msrs_to_send[j], &val);
>> +                    int rc;
>> +
>> +                    /*
>> +                     * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
>> +                     * case, the MSR is read-only, and should be rejected if
>> +                     * seen on the restore side.
>> +                     */
>> +                    if ( msrs_to_send[j] == MSR_TSC_AUX &&
>> +                         d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
>> +                        continue;
> Shouldn't we increment incarnation and send it over to the remote end?
> Or send the original value and let the remote increments it?

incarnation, and its increments, is handled in tsc_set_info(), which is
keyed off the TSC_INFO record in the migration stream.  That side of
things "already works" (FSVO "works").

The check here is to cause us to skip everything to do with migrating
MSR_TSC_AUX if we think the TSC code is responsible for the value.

> Frankly I'm not sure how guest is supposed to use that value, but
> letting the receiving end restart at 0, which can end up using the same
> incarnation as before seems conceptually wrong to me. The more so you
> mention a lot of migration's in your previous patches.

I don't understand what thought and planning lead to TSC_MODE_PVRDTSCP.

From an implementation side of things, it was also responsible for
introducing a binary incompatibility into the migration stream between
3.3 and 3.4 which caused sporadic loss of interrupts on migrate, and was
sufficiently complicated to track down that it only got fixed in 4.2 
(c/s 620ce29a2d45, but not that that change is an accurate reflection of
how much stress and effort went into tracking the issue down).

> This is not disagreeing with the point that IA32_TSC_AUX should be
> read-only for PV guest.

Any guest running with TSC_MODE_PVRDTSCP.  It supposedly works for HVM
guests as well.

> And +1 for the unification of HVM and PV code for handling this.

An observant person might notice that all MSR handing is starting to
become common, and this might be following suite from CPUID a few
release ago.  This totally isn't on purpose...

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 17:35   ` Roger Pau Monné
@ 2018-02-20 18:28     ` Andrew Cooper
  2018-02-21 10:13       ` Roger Pau Monné
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-20 18:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky

On 20/02/18 17:35, Roger Pau Monné wrote:
> On Tue, Feb 20, 2018 at 11:58:43AM +0000, Andrew Cooper wrote:
>> There are many problems with MSR_TSC_AUX handling.
>>
>> To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
>> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
>> depends on RDTSCP.
>>
>> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
>> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
>> PV guests, which in turn allows RDPID to work.
>>
>> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
>> into the generic MSR policy infrastructure, and becomes common.  One
>> improvement is that we will now reject invalid values, rather than silently
>> truncating an accepting them.  This also causes the emulator to reject RDTSCP
>> for guests without the features.
>>
>> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
>> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
>> previous behaviour of this mode was to silently drop writes, but as it is a
>> break in the x86 ABI to start with, switch the semantics to be more sane, and
>> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
>>
>> With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
>> migrating.  Cope with it the common migration logic.  Care must be taken
>> however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as the
>> receiving side will reject the guest_wrmsr().
>>
>> What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
>> all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
>> emulation code functions correctly.
> I have one likely stupid question about the PVRDTSCP, how does the
> guest know it's actually using it? I'm not able to find any cpuid or
> xenfeat to signal the guest it's running in PVRDTSCP mode, and thus
> that MSR_TSC_AUX contains this magic incarnation value?

40000x03[0].b

Which I notice now isn't a constant in the public API. :(

(Oh - another misfeature this feature is responsible for is the fact
that there is a time subleaf of the hypervisor CPUID leaves, without any
ability to calculate the number of valid subleaves.)

I've never seen any guest-side code to cope with feature, even in the
classic-linux fork.  I'll defer to Konrad on the topic, but I'm fairly
sure it is vestigial Oracle-ism.

>
> Which TBH seems quite pointless, the guest should be perfectly capable
> of maintaining it's own count of migrations.

The only useful piece of information is whether Xen is having to
virtualise your time values because the frequency is now different to
how it was previously advertised.

Oh - we also include support for emulating RDTSCP for PV guests only, on
incapable hardware (via emul-invl-op).  I was considering restricting
this to PVRDTSC mode only.

>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 0539551..ab24f87 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -792,7 +792,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>>  
>>          ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
>>  
>> -        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
>> +        /* ctxt.msr_tsc_aux omitted - now sent via generic MSR record. */
>>  
>>          hvm_get_segment_register(v, x86_seg_idtr, &seg);
>>          ctxt.idtr_limit = seg.limit;
>> @@ -1046,7 +1046,24 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>>      if ( hvm_funcs.tsc_scaling.setup )
>>          hvm_funcs.tsc_scaling.setup(v);
>>  
>> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
>> +    /*
>> +     * Backwards compatibility.  MSR_TSC_AUX contains different information
>> +     * depending on whether TSC_MODE_PVRDTSCP is enabled.
>> +     *
>> +     * Before Xen 4.11, ctxt.msr_tsc_aux was sent unconditionally and restored
>> +     * here, but the value was otherwise ignored in TSC_MODE_PVRDTSCP.
>> +     *
>> +     * In 4.11, the logic was changed to send MSR_TSC_AUX via the generic MSR
>> +     * mechanism if tsc_mode != TSC_MODE_PVRDTSCP.  If tsc_mode ==
>> +     * TSC_MODE_PVRDTSCP, the tsc logic is responsibile for setting the
>> +     * correct value.
>> +     *
>> +     * For compatibility with migration streams from before 4.11, we restore
>> +     * from ctxt.msr_tsc_aux if the TSC code hasn't/isn't in charge, and we've
>> +     * not seen a value arrive in the generic MSR record.
>> +     */
>> +    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP && !v->arch.msr->tsc_aux )
>> +        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
> I'm having a hard time figuring out whether the MSRs have been loaded
> at this point, I assume there's some guarantee that MSR will be loaded
> before loading the CPU context?

Actually, this is subtly broken, but for different reasons.  It will
break systems Remus/COLO (if anyone still cares).

TSC_INFO is ahead of HVM_CONTEXT in the migration stream, but this is
"because it always was like that", not because there is a specified
dependency.

The HVM_CONTEXT record contains both this CPU record, and the MSR
record.  There is no particular order of records in the blob, but they
appear in age order meaning that MSR record is later in the stream than
the CPU record.

The TSC mode itself (unlike its parameters) really needs to be
create-time static configuration (along with a load of other
parameters), and there is a protoplan to make this happen.

However, I think we can rely on TSC_MODE_PVRDTSCP being known before
considering whether to restore.  Restoring data will, in practice, come
either from ctxt.msr_tsc_aux, or from a later MSR record, but not both. 
However, in the case that ctxt.msr_tsc_aux is the source of information,
we should (re)load it when restoring into a non-clean domain, which is
the Remus/COLO case.

The other option is to leave the HVM migration side using this value,
and only move the PV side in the common MSR path.  This is probably a
better option until I get around to migration-v2 for the hypervisor
data, and unifying the PV and HVM state handling.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 18:28     ` Andrew Cooper
@ 2018-02-21 10:13       ` Roger Pau Monné
  0 siblings, 0 replies; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-21 10:13 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky

On Tue, Feb 20, 2018 at 06:28:15PM +0000, Andrew Cooper wrote:
> On 20/02/18 17:35, Roger Pau Monné wrote:
> > On Tue, Feb 20, 2018 at 11:58:43AM +0000, Andrew Cooper wrote:
> >> There are many problems with MSR_TSC_AUX handling.
> >>
> >> To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> >> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> >> depends on RDTSCP.
> >>
> >> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> >> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> >> PV guests, which in turn allows RDPID to work.
> >>
> >> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> >> into the generic MSR policy infrastructure, and becomes common.  One
> >> improvement is that we will now reject invalid values, rather than silently
> >> truncating an accepting them.  This also causes the emulator to reject RDTSCP
> >> for guests without the features.
> >>
> >> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> >> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> >> previous behaviour of this mode was to silently drop writes, but as it is a
> >> break in the x86 ABI to start with, switch the semantics to be more sane, and
> >> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> >>
> >> With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
> >> migrating.  Cope with it the common migration logic.  Care must be taken
> >> however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as the
> >> receiving side will reject the guest_wrmsr().
> >>
> >> What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
> >> all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
> >> emulation code functions correctly.
> > I have one likely stupid question about the PVRDTSCP, how does the
> > guest know it's actually using it? I'm not able to find any cpuid or
> > xenfeat to signal the guest it's running in PVRDTSCP mode, and thus
> > that MSR_TSC_AUX contains this magic incarnation value?
> 
> 40000x03[0].b
> 
> Which I notice now isn't a constant in the public API. :(

Right, looks like TSC_MODE_* should be in cpuid.h, this is part of the
ABI. The values are already listed in the comment, but it's very easy
for this to get out of sync.

> (Oh - another misfeature this feature is responsible for is the fact
> that there is a time subleaf of the hypervisor CPUID leaves, without any
> ability to calculate the number of valid subleaves.)

The same applies to the HVM leaf AFAICT, although eax on sub-leaf 0 can
be used to signal the presence of other sub-leaves in that case.

> I've never seen any guest-side code to cope with feature, even in the
> classic-linux fork.  I'll defer to Konrad on the topic, but I'm fairly
> sure it is vestigial Oracle-ism.
> 
> >
> > Which TBH seems quite pointless, the guest should be perfectly capable
> > of maintaining it's own count of migrations.
> 
> The only useful piece of information is whether Xen is having to
> virtualise your time values because the frequency is now different to
> how it was previously advertised.
> 
> Oh - we also include support for emulating RDTSCP for PV guests only, on
> incapable hardware (via emul-invl-op).  I was considering restricting
> this to PVRDTSC mode only.

Seems sensible.

> >
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >> index 0539551..ab24f87 100644
> >> --- a/xen/arch/x86/hvm/hvm.c
> >> +++ b/xen/arch/x86/hvm/hvm.c
> >> @@ -792,7 +792,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >>  
> >>          ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
> >>  
> >> -        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
> >> +        /* ctxt.msr_tsc_aux omitted - now sent via generic MSR record. */
> >>  
> >>          hvm_get_segment_register(v, x86_seg_idtr, &seg);
> >>          ctxt.idtr_limit = seg.limit;
> >> @@ -1046,7 +1046,24 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >>      if ( hvm_funcs.tsc_scaling.setup )
> >>          hvm_funcs.tsc_scaling.setup(v);
> >>  
> >> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
> >> +    /*
> >> +     * Backwards compatibility.  MSR_TSC_AUX contains different information
> >> +     * depending on whether TSC_MODE_PVRDTSCP is enabled.
> >> +     *
> >> +     * Before Xen 4.11, ctxt.msr_tsc_aux was sent unconditionally and restored
> >> +     * here, but the value was otherwise ignored in TSC_MODE_PVRDTSCP.
> >> +     *
> >> +     * In 4.11, the logic was changed to send MSR_TSC_AUX via the generic MSR
> >> +     * mechanism if tsc_mode != TSC_MODE_PVRDTSCP.  If tsc_mode ==
> >> +     * TSC_MODE_PVRDTSCP, the tsc logic is responsibile for setting the
> >> +     * correct value.
> >> +     *
> >> +     * For compatibility with migration streams from before 4.11, we restore
> >> +     * from ctxt.msr_tsc_aux if the TSC code hasn't/isn't in charge, and we've
> >> +     * not seen a value arrive in the generic MSR record.
> >> +     */
> >> +    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP && !v->arch.msr->tsc_aux )
> >> +        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
> > I'm having a hard time figuring out whether the MSRs have been loaded
> > at this point, I assume there's some guarantee that MSR will be loaded
> > before loading the CPU context?
> 
> Actually, this is subtly broken, but for different reasons.  It will
> break systems Remus/COLO (if anyone still cares).
> 
> TSC_INFO is ahead of HVM_CONTEXT in the migration stream, but this is
> "because it always was like that", not because there is a specified
> dependency.
> 
> The HVM_CONTEXT record contains both this CPU record, and the MSR
> record.  There is no particular order of records in the blob, but they
> appear in age order meaning that MSR record is later in the stream than
> the CPU record.

So if the MSR record comes later in the stream, the comment is
wrong:

"and we've not seen a value arrive in the generic MSR record."

Will always be true when processing the CPU record, because the MSR
record hasn't been processed yet.

I would rewrite that as:

"For compatibility with migration streams from before 4.11, we restore
from ctxt.msr_tsc_aux if the TSC code hasn't/isn't in charge. Check
whether tsc_aux already contains a value, in case the MSR record was
restored before the CPU one. If not just set it to the value found in
the CPU record, a later MSR record can always overwrite it."

Which seems clearer to me, but maybe that's because I'm not very
familiar with the code itself.

> The TSC mode itself (unlike its parameters) really needs to be
> create-time static configuration (along with a load of other
> parameters), and there is a protoplan to make this happen.
> 
> However, I think we can rely on TSC_MODE_PVRDTSCP being known before
> considering whether to restore.  Restoring data will, in practice, come
> either from ctxt.msr_tsc_aux, or from a later MSR record, but not both. 
> However, in the case that ctxt.msr_tsc_aux is the source of information,
> we should (re)load it when restoring into a non-clean domain, which is
> the Remus/COLO case.

Hm, maybe an 'easy' way to solve this would be to just zero all the
relevant internal structures before attempting a restore?

> The other option is to leave the HVM migration side using this value,
> and only move the PV side in the common MSR path.  This is probably a
> better option until I get around to migration-v2 for the hypervisor
> data, and unifying the PV and HVM state handling.

But the HVM code as-is seems wrong to me for the PVRDTSCP case because
it overwrites the new incarnation with the previous one from the CPU
record on restore? (because the TSC mode is restored before the CPU
record).

Maybe we could get rid of the PVRDTSCP mode completely and forget
about all that?

We just need to reserve that TSC mode in the public ABI in order to
prevent anyone from reusing it, but I'm not sure we need to continue
to implement it.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 17:42     ` Andrew Cooper
@ 2018-02-21 11:08       ` Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2018-02-21 11:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Xen-devel, Jun Nakajima,
	Boris Ostrovsky, Suravee Suthikulpanit, Roger Pau Monné

On Tue, Feb 20, 2018 at 05:42:06PM +0000, Andrew Cooper wrote:
> On 20/02/18 17:03, Wei Liu wrote:
> > On Tue, Feb 20, 2018 at 11:58:43AM +0000, Andrew Cooper wrote:
> >> There are many problems with MSR_TSC_AUX handling.
> >>
> >> To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> >      ^
> >      begin
> >
> >> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> >> depends on RDTSCP.
> >>
> >> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> >> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> >> PV guests, which in turn allows RDPID to work.
> >>
> >> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> >> into the generic MSR policy infrastructure, and becomes common.  One
> >> improvement is that we will now reject invalid values, rather than silently
> >> truncating an accepting them.  This also causes the emulator to reject RDTSCP
> >              ^
> >              and
> >
> >> for guests without the features.
> >>
> >>  
> > [...]
> >> @@ -1284,7 +1285,18 @@ long arch_do_domctl(
> >>                  for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
> >>                  {
> >>                      uint64_t val;
> >> -                    int rc = guest_rdmsr(v, msrs_to_send[j], &val);
> >> +                    int rc;
> >> +
> >> +                    /*
> >> +                     * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
> >> +                     * case, the MSR is read-only, and should be rejected if
> >> +                     * seen on the restore side.
> >> +                     */
> >> +                    if ( msrs_to_send[j] == MSR_TSC_AUX &&
> >> +                         d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
> >> +                        continue;
> > Shouldn't we increment incarnation and send it over to the remote end?
> > Or send the original value and let the remote increments it?
> 
> incarnation, and its increments, is handled in tsc_set_info(), which is
> keyed off the TSC_INFO record in the migration stream.  That side of
> things "already works" (FSVO "works").
> 

OK. That's fine then.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-20 11:58 ` [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch Andrew Cooper
  2018-02-20 17:03   ` Wei Liu
  2018-02-20 17:35   ` Roger Pau Monné
@ 2018-02-21 11:36   ` Andrew Cooper
  2018-02-21 12:06     ` Wei Liu
                       ` (2 more replies)
  2 siblings, 3 replies; 47+ messages in thread
From: Andrew Cooper @ 2018-02-21 11:36 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Boris Ostrovsky, Suravee Suthikulpanit, Roger Pau Monné

There are many problems with MSR_TSC_AUX handling.

To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
depends on RDTSCP.

For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
PV guests, which in turn allows RDPID to work.

To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
into the generic MSR policy infrastructure, and becomes common.  One
improvement is that we will now reject invalid values, rather than silently
truncating and accepting them.  This also causes the emulator to reject RDTSCP
for guests without the features.

One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
previous behaviour of this mode was to silently drop writes, but as it is a
break in the x86 ABI to start with, switch the semantics to be more sane, and
have TSC_MODE_PVRDTSCP make the MSR properly read-only.

With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
migrating, so it is moved into the common MSR logic for PV guests.  Care must
be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as
the receiving side will reject the guest_wrmsr().  The HVM side is tweaked as
well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in
control of the value.

What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
emulation code functions correctly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

v2:
 * Spelling corrections in the commit message
 * Fix HVM migration issues
---
 xen/arch/x86/domain.c                       |  3 +--
 xen/arch/x86/domctl.c                       | 15 ++++++++++++++-
 xen/arch/x86/hvm/hvm.c                      | 19 ++++++-------------
 xen/arch/x86/hvm/svm/svm.c                  |  4 ++--
 xen/arch/x86/hvm/vmx/vmx.c                  |  4 ++--
 xen/arch/x86/msr.c                          | 16 ++++++++++++++++
 xen/arch/x86/pv/emul-inv-op.c               |  4 +---
 xen/arch/x86/pv/emul-priv-op.c              |  5 -----
 xen/arch/x86/time.c                         | 11 +++++++++++
 xen/arch/x86/x86_emulate/x86_emulate.c      |  1 +
 xen/include/asm-x86/hvm/hvm.h               |  6 ------
 xen/include/asm-x86/hvm/vcpu.h              |  1 -
 xen/include/asm-x86/msr.h                   |  9 +++++++++
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 xen/include/public/arch-x86/hvm/save.h      |  5 +++++
 xen/tools/gen-cpuid.py                      |  3 +++
 16 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9c3527f..144d6f0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1534,8 +1534,7 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
         activate_debugregs(v);
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
-                      ? v->domain->arch.incarnation : 0);
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3a..979afdf 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1249,6 +1249,7 @@ long arch_do_domctl(
         static const uint32_t msrs_to_send[] = {
             MSR_SPEC_CTRL,
             MSR_INTEL_MISC_FEATURES_ENABLES,
+            MSR_TSC_AUX,
         };
         uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send);
 
@@ -1284,7 +1285,18 @@ long arch_do_domctl(
                 for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
                 {
                     uint64_t val;
-                    int rc = guest_rdmsr(v, msrs_to_send[j], &val);
+                    int rc;
+
+                    /*
+                     * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
+                     * case, the MSR is read-only, and should be rejected if
+                     * seen on the restore side.
+                     */
+                    if ( msrs_to_send[j] == MSR_TSC_AUX &&
+                         d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+                        continue;
+
+                    rc = guest_rdmsr(v, msrs_to_send[j], &val);
 
                     /*
                      * It is the programmers responsibility to ensure that
@@ -1373,6 +1385,7 @@ long arch_do_domctl(
                 {
                 case MSR_SPEC_CTRL:
                 case MSR_INTEL_MISC_FEATURES_ENABLES:
+                case MSR_TSC_AUX:
                     if ( guest_wrmsr(v, msr.index, msr.value) != X86EMUL_OKAY )
                         break;
                     continue;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0539551..e45f6df 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -792,7 +792,9 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 
         ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
 
-        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
+        /* Only send MSR_TSC_AUX if it isn't being handled by the TSC logic. */
+        if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
+            ctxt.msr_tsc_aux = v->arch.msr->tsc_aux;
 
         hvm_get_segment_register(v, x86_seg_idtr, &seg);
         ctxt.idtr_limit = seg.limit;
@@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     if ( hvm_funcs.tsc_scaling.setup )
         hvm_funcs.tsc_scaling.setup(v);
 
-    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
+    /* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */
+    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
+        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
 
     hvm_set_guest_tsc_fixed(v, ctxt.tsc, d->arch.hvm_domain.sync_tsc);
 
@@ -3424,10 +3428,6 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content = v->arch.hvm_vcpu.msr_tsc_adjust;
         break;
 
-    case MSR_TSC_AUX:
-        *msr_content = hvm_msr_tsc_aux(v);
-        break;
-
     case MSR_IA32_APICBASE:
         *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
         break;
@@ -3575,13 +3575,6 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
         hvm_set_guest_tsc_adjust(v, msr_content);
         break;
 
-    case MSR_TSC_AUX:
-        v->arch.hvm_vcpu.msr_tsc_aux = (uint32_t)msr_content;
-        if ( cpu_has_rdtscp
-             && (v->domain->arch.tsc_mode != TSC_MODE_PVRDTSCP) )
-            wrmsr_tsc_aux(msr_content);
-        break;
-
     case MSR_IA32_APICBASE:
         if ( !vlapic_msr_set(vcpu_vlapic(v), msr_content) )
             goto gp_fault;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index f53f430..9406624 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1093,7 +1093,7 @@ static void svm_ctxt_switch_to(struct vcpu *v)
     svm_tsc_ratio_load(v);
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 static void noreturn svm_do_resume(struct vcpu *v)
@@ -2842,7 +2842,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         break;
 
     case VMEXIT_RDTSCP:
-        regs->rcx = hvm_msr_tsc_aux(v);
+        regs->rcx = v->arch.msr->tsc_aux;
         /* fall through */
     case VMEXIT_RDTSC:
         svm_vmexit_do_rdtsc(regs);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 31acb0e..45fd9c5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -618,7 +618,7 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     }
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -3858,7 +3858,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_invlpg_intercept(exit_qualification);
         break;
     case EXIT_REASON_RDTSCP:
-        regs->rcx = hvm_msr_tsc_aux(v);
+        regs->rcx = v->arch.msr->tsc_aux;
         /* fall through */
     case EXIT_REASON_RDTSC:
         update_guest_eip(); /* Safe: RDTSC, RDTSCP */
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 8ae3b4e..8a8cad2 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
                _MSR_MISC_FEATURES_CPUID_FAULTING;
         break;
 
+    case MSR_TSC_AUX:
+        if ( !cp->extd.rdtscp )
+            goto gp_fault;
+
+        *val = vp->tsc_aux;
+        break;
+
     default:
         return X86EMUL_UNHANDLEABLE;
     }
@@ -250,6 +257,15 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         break;
     }
 
+    case MSR_TSC_AUX:
+        if ( !cp->extd.rdtscp ||                      /* MSR available? */
+             d->arch.tsc_mode == TSC_MODE_PVRDTSCP || /* MSR read-only? */
+             val != (uint32_t)val )                   /* Rsvd bits set? */
+            goto gp_fault;
+
+        wrmsr_tsc_aux(vp->tsc_aux = val);
+        break;
+
     default:
         return X86EMUL_UNHANDLEABLE;
     }
diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
index be95a5f..c071372 100644
--- a/xen/arch/x86/pv/emul-inv-op.c
+++ b/xen/arch/x86/pv/emul-inv-op.c
@@ -46,7 +46,6 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     char opcode[3];
     unsigned long eip, rc;
     struct vcpu *v = current;
-    const struct domain *currd = v->domain;
 
     eip = regs->rip;
     if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
@@ -59,8 +58,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     eip += sizeof(opcode);
 
     msr_split(regs, pv_soft_rdtsc(v, regs));
-    regs->rcx = (currd->arch.tsc_mode == TSC_MODE_PVRDTSCP
-                 ? currd->arch.incarnation : 0);
+    regs->rcx = v->arch.msr->tsc_aux;
 
     pv_emul_instruction_done(regs, eip);
     return EXCRET_fault_fixed;
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index f2888ab..7e64c10 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -880,11 +880,6 @@ static int read_msr(unsigned int reg, uint64_t *val,
         *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
         return X86EMUL_OKAY;
 
-    case MSR_TSC_AUX:
-        *val = (uint32_t)(currd->arch.tsc_mode == TSC_MODE_PVRDTSCP
-                          ? currd->arch.incarnation : 0);
-        return X86EMUL_OKAY;
-
     case MSR_EFER:
         *val = read_efer();
         if ( is_pv_32bit_domain(currd) )
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 1a6fde6..37d487c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
         }
         break;
     }
+
     d->arch.incarnation = incarnation + 1;
+
+    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+    {
+        struct vcpu *v;
+
+        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
+        for_each_vcpu ( d, v )
+            v->arch.msr->tsc_aux = d->arch.incarnation;
+    }
+
     if ( is_hvm_domain(d) )
     {
         if ( hvm_tsc_scaling_supported && !d->arch.vtsc )
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index 85383ea..efb6003 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5141,6 +5141,7 @@ x86_emulate(
 
         case 0xf9: /* rdtscp */
             fail_if(ops->read_msr == NULL);
+            /* Getting a result implies vcpu_has_rdtscp() */
             if ( (rc = ops->read_msr(MSR_TSC_AUX,
                                      &msr_val, ctxt)) != X86EMUL_OKAY )
                 goto done;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dd3dd5f..7708fdf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -511,12 +511,6 @@ static inline void hvm_invalidate_regs_fields(struct cpu_user_regs *regs)
 int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
                               struct npfec npfec);
 
-#define hvm_msr_tsc_aux(v) ({                                               \
-    struct domain *__d = (v)->domain;                                       \
-    (__d->arch.tsc_mode == TSC_MODE_PVRDTSCP)                               \
-        ? (u32)__d->arch.incarnation : (u32)(v)->arch.hvm_vcpu.msr_tsc_aux; \
-})
-
 int hvm_x2apic_msr_read(struct vcpu *v, unsigned int msr, uint64_t *msr_content);
 int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t msr_content);
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index d93166f..60425f3 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -170,7 +170,6 @@ struct hvm_vcpu {
 
     struct hvm_vcpu_asid n1asid;
 
-    u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
     u64                 msr_xss;
 
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index a5072a2..c13388a 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -258,6 +258,15 @@ struct msr_vcpu_policy
         bool available; /* This MSR is non-architectural */
         bool cpuid_faulting;
     } misc_features_enables;
+
+    /*
+     * 0xc0000103 - MSR_TSC_AUX
+     *
+     * Usually guest chosen.  However, when using TSC_MODE_PVRDTSCP the value
+     * is Xen-controlled and is the migration incarnation, at which point the
+     * MSR becomes read-only to the guest.
+     */
+    uint32_t tsc_aux;
 };
 
 void init_guest_msr_policy(void);
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index fa81af1..78fa466 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -156,7 +156,7 @@ XEN_CPUFEATURE(NX,            2*32+20) /*A  Execute Disable */
 XEN_CPUFEATURE(MMXEXT,        2*32+22) /*A  AMD MMX extensions */
 XEN_CPUFEATURE(FFXSR,         2*32+25) /*A  FFXSR instruction optimizations */
 XEN_CPUFEATURE(PAGE1GB,       2*32+26) /*H  1Gb large page support */
-XEN_CPUFEATURE(RDTSCP,        2*32+27) /*S  RDTSCP */
+XEN_CPUFEATURE(RDTSCP,        2*32+27) /*A  RDTSCP */
 XEN_CPUFEATURE(LM,            2*32+29) /*A  Long Mode (x86-64) */
 XEN_CPUFEATURE(3DNOWEXT,      2*32+30) /*A  AMD 3DNow! extensions */
 XEN_CPUFEATURE(3DNOW,         2*32+31) /*A  3DNow! */
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 4691d4d..0df3e26 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -141,6 +141,11 @@ struct hvm_hw_cpu {
     uint64_t msr_cstar;
     uint64_t msr_syscall_mask;
     uint64_t msr_efer;
+
+    /*
+     * Since 4.11, this value is written as 0 if TSC_MODE_PVRDTSCP is active,
+     * and the value is actually controlled by the TSC logic.
+     */
     uint64_t msr_tsc_aux;
 
     /* guest's idea of what rdtsc() would return */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 613b909..601a7a0 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -235,6 +235,9 @@ def crunch_numbers(state):
         # absence of any enabled xstate.
         AVX: [FMA, FMA4, F16C, AVX2, XOP],
 
+        # MSR_TSC_AUX is enumerated by RDTSCP, but RDPID also reads TSC_AUX.
+        RDTSCP: [RDPID],
+
         # CX16 is only encodable in Long Mode.  LAHF_LM indicates that the
         # SAHF/LAHF instructions are reintroduced in Long Mode.  1GB
         # superpages, PCID and PKU are only available in 4 level paging.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-21 11:36   ` [PATCH v2 " Andrew Cooper
@ 2018-02-21 12:06     ` Wei Liu
  2018-02-21 13:04     ` Roger Pau Monné
  2018-02-23 15:05     ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2018-02-21 12:06 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky, Roger Pau Monné

On Wed, Feb 21, 2018 at 11:36:15AM +0000, Andrew Cooper wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.
> 
> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating and accepting them.  This also causes the emulator to reject RDTSCP
> for guests without the features.
> 
> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> previous behaviour of this mode was to silently drop writes, but as it is a
> break in the x86 ABI to start with, switch the semantics to be more sane, and
> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> 
> With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
> migrating, so it is moved into the common MSR logic for PV guests.  Care must
> be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as
> the receiving side will reject the guest_wrmsr().  The HVM side is tweaked as
> well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in
> control of the value.
> 
> What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
> all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
> emulation code functions correctly.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-21 11:36   ` [PATCH v2 " Andrew Cooper
  2018-02-21 12:06     ` Wei Liu
@ 2018-02-21 13:04     ` Roger Pau Monné
  2018-02-23 15:05     ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Roger Pau Monné @ 2018-02-21 13:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel, Jan Beulich,
	Suravee Suthikulpanit, Boris Ostrovsky

On Wed, Feb 21, 2018 at 11:36:15AM +0000, Andrew Cooper wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.
> 
> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating and accepting them.  This also causes the emulator to reject RDTSCP
> for guests without the features.
> 
> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> previous behaviour of this mode was to silently drop writes, but as it is a
> break in the x86 ABI to start with, switch the semantics to be more sane, and
> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> 
> With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
> migrating, so it is moved into the common MSR logic for PV guests.  Care must
> be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as
> the receiving side will reject the guest_wrmsr().  The HVM side is tweaked as
> well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in
> control of the value.
> 
> What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
> all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
> emulation code functions correctly.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just one comment nit below.

> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0539551..e45f6df 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -792,7 +792,9 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>  
>          ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
>  
> -        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
> +        /* Only send MSR_TSC_AUX if it isn't being handled by the TSC logic. */
> +        if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
> +            ctxt.msr_tsc_aux = v->arch.msr->tsc_aux;
>  
>          hvm_get_segment_register(v, x86_seg_idtr, &seg);
>          ctxt.idtr_limit = seg.limit;
> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>      if ( hvm_funcs.tsc_scaling.setup )
>          hvm_funcs.tsc_scaling.setup(v);
>  
> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
> +    /* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */
               ^ set?

We actually accept it. Ie: Xen doesn't don't error out when
msr_tsc_aux is set and d->arch.tsc_mode == TSC_MODE_PVRDTSCP, or else
we would break backwards compatibility.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
  2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
  2018-02-20 14:54   ` Roger Pau Monné
  2018-02-20 15:12   ` Wei Liu
@ 2018-02-23 13:53   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2018-02-23 13:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
> c/s d2f86bf604 which introduced "struct hvm_save_descriptor *d" accidentally
> ended up shadowing the "struct domain *d" function parameter.  Rename the
> former to desc.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
  2018-02-20 15:22   ` Wei Liu
  2018-02-20 15:49   ` Roger Pau Monné
@ 2018-02-23 14:04   ` Jan Beulich
  2018-02-23 14:22     ` Andrew Cooper
  2018-02-26 11:25   ` Jan Beulich
  2018-02-26 19:52   ` Boris Ostrovsky
  4 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2018-02-23 14:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1533,9 +1533,9 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
>      if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
>          activate_debugregs(v);
>  
> -    if ( (v->domain->arch.tsc_mode ==  TSC_MODE_PVRDTSCP) &&
> -         boot_cpu_has(X86_FEATURE_RDTSCP) )
> -        write_rdtscp_aux(v->domain->arch.incarnation);
> +    if ( cpu_has_rdtscp )
> +        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
> +                      ? v->domain->arch.incarnation : 0);

Wouldn't the conditional better check cpu_has_rdtscp || cpu_has_rdpid
(then also better matching the description)? And if so, then perhaps
also in other, pre-existing conditionals?

> @@ -210,6 +208,20 @@ static inline void write_efer(uint64_t val)
>  
>  DECLARE_PER_CPU(u32, ler_msr);
>  
> +DECLARE_PER_CPU(uint32_t, tsc_aux);
> +
> +/* Lazy update of MSR_TSC_AUX */
> +static inline void wrmsr_tsc_aux(uint32_t val)

Along the lines of rdtsc() (which also reads an MSR in reality)
wrtsc_aux()?

> +{
> +    uint32_t *this_tsc_aux = &this_cpu(tsc_aux);
> +
> +    if ( *this_tsc_aux != val )
> +    {
> +        wrmsr(MSR_TSC_AUX, val, 0);
> +        *this_tsc_aux = val;

I think it is generally more safe to write the cached value first, so
that some asynchronous writer would pick up the intended new
value. But obviously that's marginal here.

With at least the adjustments to the conditionals
Reviewed-by: Jan Beulich <jbeulich@suse.com>
If, otoh, you disagree, then we'll need to settle on something
first.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-23 14:04   ` Jan Beulich
@ 2018-02-23 14:22     ` Andrew Cooper
  2018-02-23 15:09       ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-23 14:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

On 23/02/18 14:04, Jan Beulich wrote:
>>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1533,9 +1533,9 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
>>      if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
>>          activate_debugregs(v);
>>  
>> -    if ( (v->domain->arch.tsc_mode ==  TSC_MODE_PVRDTSCP) &&
>> -         boot_cpu_has(X86_FEATURE_RDTSCP) )
>> -        write_rdtscp_aux(v->domain->arch.incarnation);
>> +    if ( cpu_has_rdtscp )
>> +        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
>> +                      ? v->domain->arch.incarnation : 0);
> Wouldn't the conditional better check cpu_has_rdtscp || cpu_has_rdpid
> (then also better matching the description)? And if so, then perhaps
> also in other, pre-existing conditionals?

cpu_has_rdpid already implies cpu_has_rdtscp in practice, making the ||
redundant.  No hardware exists with rdpid but without rdtscp.

Xen's CPUID logic gets updated to cope in patch 5, but the only place
this can currently go wrong is when running Xen virtualised under Xen
with PVRDTSC enabled, and I don't think we need to care about this case.

>
>> @@ -210,6 +208,20 @@ static inline void write_efer(uint64_t val)
>>  
>>  DECLARE_PER_CPU(u32, ler_msr);
>>  
>> +DECLARE_PER_CPU(uint32_t, tsc_aux);
>> +
>> +/* Lazy update of MSR_TSC_AUX */
>> +static inline void wrmsr_tsc_aux(uint32_t val)
> Along the lines of rdtsc() (which also reads an MSR in reality)
> wrtsc_aux()?

rdtsc() is named after the instruction.

There is no wrtsc instruction, and naming the function like that hides
its purpose.

>
>> +{
>> +    uint32_t *this_tsc_aux = &this_cpu(tsc_aux);
>> +
>> +    if ( *this_tsc_aux != val )
>> +    {
>> +        wrmsr(MSR_TSC_AUX, val, 0);
>> +        *this_tsc_aux = val;
> I think it is generally more safe to write the cached value first, so
> that some asynchronous writer would pick up the intended new
> value. But obviously that's marginal here.

Ok, although there is on chance of asyncrhonous writing here.

>
> With at least the adjustments to the conditionals
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> If, otoh, you disagree, then we'll need to settle on something
> first.

I'm afraid I disagree.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup
  2018-02-20 11:58 ` [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup Andrew Cooper
  2018-02-20 15:32   ` Wei Liu
  2018-02-20 16:04   ` Roger Pau Monné
@ 2018-02-23 14:38   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2018-02-23 14:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
> Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
> the way of some intended cleanup.
> 
>  * Drop the rdtscp parameter and always make the caller responsible for ecx
>    updates when appropriate.
>  * Switch the function from being void, and return the main timestamp in the
>    return value.
> 
> The regs parameter is still needed, but only for the stats collection, once
> again bringing into question their utility.  The parameter can however 
> switch
> to being const.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
  2018-02-20 11:58 ` [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op() Andrew Cooper
  2018-02-20 16:08   ` Wei Liu
  2018-02-20 16:28   ` Roger Pau Monné
@ 2018-02-23 14:40   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2018-02-23 14:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> 
> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> we unilaterally advertise RDPID which uses the same path, but it should never
> trap on #GP to arrive here in the first place).
> 
> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> probably end up using it.  On a capable pipeline (without TSD, see below), it
> will execute normally and return non-virtualised data.
> 
> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> clear, so executing RDTSCP will execute without trapping.  However, a guest
> kernel may set TSD itself, at which point the emulator should not suddenly
> switch to virtualised TSC mode and start handing out differently-scaled
> values.
> 
> Drop all the deferral logic, and return scaled or raw TSC values depending
> only on currd->arch.vtsc.  This changes the exact moment at which the
> timestamp is taken, but that doesn't matter from the guests point of view, and
> is consistent with the HVM side of things.  It also means that RDTSC and
> RDTSCP are now consistent WRT handing out native or virtualised timestamps.

The reason I didn't want to drop that deferral logic back when I
converted this code to use the main emulator was that
pv_soft_rdtsc() updates guest state beyond register values.
That is if the TSC_AUX access fails (and hence instruction
emulation as a whole fails), we would still have updated that
other state. The stats part of this is quite likely irrelevant in this
regard, but the update to d->arch.vtsc_last may yield
unintended guest observable change in behavior.

I don't mean to say this is a no-go, but it is a change that goes
beyond what you describe, and I'd like it to (a) be spelled out
and (b) given an explanation of why this is okay.

> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> of hardware.

This, iiuc, is correct solely because we don't currently permit PV
guests to write TSC_AUX. Which renders wrong exposing RDPID
to such guests. But without having checked yet, I could imagine
patch 5 to take care of this.

The changes themselves look fine to me, provided the wider
behavioral change is both intended and acceptable.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-21 11:36   ` [PATCH v2 " Andrew Cooper
  2018-02-21 12:06     ` Wei Liu
  2018-02-21 13:04     ` Roger Pau Monné
@ 2018-02-23 15:05     ` Jan Beulich
  2018-02-23 15:51       ` Andrew Cooper
  2 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2018-02-23 15:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

>>> On 21.02.18 at 12:36, <andrew.cooper3@citrix.com> wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.

Are you sure this isn't a documentation mistake? If it indeed isn't, of
course my earlier comments regarding the use of cpu_has_rdpid
would have been wrong (and that patch would be fine without the
requested adjustment).

> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating and accepting them.  This also causes the emulator to reject RDTSCP
> for guests without the features.
> 
> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> previous behaviour of this mode was to silently drop writes, but as it is a
> break in the x86 ABI to start with, switch the semantics to be more sane, and
> have TSC_MODE_PVRDTSCP make the MSR properly read-only.

All of this obviously wants an ack and/or testing by the Oracle folks
(assuming this is still in use somewhere).

> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>      if ( hvm_funcs.tsc_scaling.setup )
>          hvm_funcs.tsc_scaling.setup(v);
>  
> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
> +    /* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */
> +    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
> +        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;

Since you talk about range checking in the description, shouldn't
you reject here values with the upper 32 bits non-zero?

> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>                 _MSR_MISC_FEATURES_CPUID_FAULTING;
>          break;
>  
> +    case MSR_TSC_AUX:
> +        if ( !cp->extd.rdtscp )
> +            goto gp_fault;

Isn't this breaking the PV use without the feature flag set, but
running in TSC_MODE_PVRDTSCP? I.e. don't you want

        if ( !cp->extd.rdtscp &&
             d->arch.tsc_mode != TSC_MODE_PVRDTSCP )

? Remember there are two cases, one being that the host has the
feature flag set, but the guest has it clear, and the other being
that both have it clear. Since in the former case the guest can read
the MSR through RDTSCP, I think the MSR access ought to be
allowed too.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>          }
>          break;
>      }
> +
>      d->arch.incarnation = incarnation + 1;
> +
> +    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
> +    {
> +        struct vcpu *v;
> +
> +        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
> +        for_each_vcpu ( d, v )
> +            v->arch.msr->tsc_aux = d->arch.incarnation;
> +    }

This not needing a lock might warrant a comment (the domain is
[explicitly or implicitly] paused when coming here).

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -235,6 +235,9 @@ def crunch_numbers(state):
>          # absence of any enabled xstate.
>          AVX: [FMA, FMA4, F16C, AVX2, XOP],
>  
> +        # MSR_TSC_AUX is enumerated by RDTSCP, but RDPID also reads TSC_AUX.
> +        RDTSCP: [RDPID],

As per above I'm not convinced this is a valid dependency.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-23 14:22     ` Andrew Cooper
@ 2018-02-23 15:09       ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2018-02-23 15:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, BorisOstrovsky, Roger Pau Monné

>>> On 23.02.18 at 15:22, <andrew.cooper3@citrix.com> wrote:
> On 23/02/18 14:04, Jan Beulich wrote:
>>>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1533,9 +1533,9 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
>>>      if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
>>>          activate_debugregs(v);
>>>  
>>> -    if ( (v->domain->arch.tsc_mode ==  TSC_MODE_PVRDTSCP) &&
>>> -         boot_cpu_has(X86_FEATURE_RDTSCP) )
>>> -        write_rdtscp_aux(v->domain->arch.incarnation);
>>> +    if ( cpu_has_rdtscp )
>>> +        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
>>> +                      ? v->domain->arch.incarnation : 0);
>> Wouldn't the conditional better check cpu_has_rdtscp || cpu_has_rdpid
>> (then also better matching the description)? And if so, then perhaps
>> also in other, pre-existing conditionals?
> 
> cpu_has_rdpid already implies cpu_has_rdtscp in practice, making the ||
> redundant.  No hardware exists with rdpid but without rdtscp.

The question isn't whether such hardware actually exists, but
whether the architecture allows for such hardware to exist in
theory. The dependency you add in patch 5 implies you think so;
I continue to think that the more sane behavior would be if the
MSR was tied to either flag.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-23 15:05     ` Jan Beulich
@ 2018-02-23 15:51       ` Andrew Cooper
  2018-02-26 11:30         ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-23 15:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

On 23/02/18 15:05, Jan Beulich wrote:
>>>> On 21.02.18 at 12:36, <andrew.cooper3@citrix.com> wrote:
>> There are many problems with MSR_TSC_AUX handling.
>>
>> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
>> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
>> depends on RDTSCP.
> Are you sure this isn't a documentation mistake?

No, but nor do I have reason to doubt what is written.  It seems
plausible and reasonable to me.

RDTSCP has been in hardware for many generations now (and appears to be
architectural in AMD64 although not picked up by Intel initially), while
RDPID is new in Skylake.

The documentation is consistent between Vol 2 (CPUID, RDPID) and Vol 4
(MSRs), none of which make any link between RDPID enumeration and
TSC_AUX, but plenty of links between RDTSCP and TSC_AUX.

>  If it indeed isn't, of
> course my earlier comments regarding the use of cpu_has_rdpid
> would have been wrong (and that patch would be fine without the
> requested adjustment).
>
>> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
>> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
>> PV guests, which in turn allows RDPID to work.
>>
>> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
>> into the generic MSR policy infrastructure, and becomes common.  One
>> improvement is that we will now reject invalid values, rather than silently
>> truncating and accepting them.  This also causes the emulator to reject RDTSCP
>> for guests without the features.
>>
>> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
>> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
>> previous behaviour of this mode was to silently drop writes, but as it is a
>> break in the x86 ABI to start with, switch the semantics to be more sane, and
>> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> All of this obviously wants an ack and/or testing by the Oracle folks
> (assuming this is still in use somewhere).
>
>> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>>      if ( hvm_funcs.tsc_scaling.setup )
>>          hvm_funcs.tsc_scaling.setup(v);
>>  
>> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
>> +    /* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */
>> +    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>> +        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
> Since you talk about range checking in the description, shouldn't
> you reject here values with the upper 32 bits non-zero?

Probably. In reality all migration streams have it within range, because
of the types used on the sending side.

>
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>>                 _MSR_MISC_FEATURES_CPUID_FAULTING;
>>          break;
>>  
>> +    case MSR_TSC_AUX:
>> +        if ( !cp->extd.rdtscp )
>> +            goto gp_fault;
> Isn't this breaking the PV use without the feature flag set, but
> running in TSC_MODE_PVRDTSCP? I.e. don't you want
>
>         if ( !cp->extd.rdtscp &&
>              d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>
> ? Remember there are two cases, one being that the host has the
> feature flag set, but the guest has it clear, and the other being
> that both have it clear. Since in the former case the guest can read
> the MSR through RDTSCP, I think the MSR access ought to be
> allowed too.

There is at least a 3rd case, of no hardware RDTSCP support, which is
why we also emulate it in emul-invl-op.c

A guest trying to use PVRDTSC necessarily needs out-of-band signalling
to set it up.  I do not think it is reasonable or appropriate to retain
the ABI breakage of completing reads of the MSR when the instruction
should be architecturally unavailable.

But as you've said and I agree, we definitely need Oracle's input here.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>>          }
>>          break;
>>      }
>> +
>>      d->arch.incarnation = incarnation + 1;
>> +
>> +    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
>> +    {
>> +        struct vcpu *v;
>> +
>> +        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
>> +        for_each_vcpu ( d, v )
>> +            v->arch.msr->tsc_aux = d->arch.incarnation;
>> +    }
> This not needing a lock might warrant a comment (the domain is
> [explicitly or implicitly] paused when coming here).

This new piece of logic isn't any different to the rest of
tsc_set_info() WRT being paused.  It is explicitly paused at this point.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
                     ` (2 preceding siblings ...)
  2018-02-23 14:04   ` Jan Beulich
@ 2018-02-26 11:25   ` Jan Beulich
  2018-02-26 19:11     ` [ping] " Andrew Cooper
  2018-02-26 19:52   ` Boris Ostrovsky
  4 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2018-02-26 11:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
> advertised/hidden.
> 
> At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
> TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
> Therefore, default PV guests can read the value from a previously scheduled
> HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
> 
> Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common 
> case.
> 
> To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
> update of the MSR, and use this function consistently across the codebase.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Despite me continuing to think that RDTSCP and RDPID should be
fully independent features, this being in line with the SDM:
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
  2018-02-23 15:51       ` Andrew Cooper
@ 2018-02-26 11:30         ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2018-02-26 11:30 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

>>> On 23.02.18 at 16:51, <andrew.cooper3@citrix.com> wrote:
> On 23/02/18 15:05, Jan Beulich wrote:
>>>>> On 21.02.18 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/msr.c
>>> +++ b/xen/arch/x86/msr.c
>>> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>>>                 _MSR_MISC_FEATURES_CPUID_FAULTING;
>>>          break;
>>>  
>>> +    case MSR_TSC_AUX:
>>> +        if ( !cp->extd.rdtscp )
>>> +            goto gp_fault;
>> Isn't this breaking the PV use without the feature flag set, but
>> running in TSC_MODE_PVRDTSCP? I.e. don't you want
>>
>>         if ( !cp->extd.rdtscp &&
>>              d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>>
>> ? Remember there are two cases, one being that the host has the
>> feature flag set, but the guest has it clear, and the other being
>> that both have it clear. Since in the former case the guest can read
>> the MSR through RDTSCP, I think the MSR access ought to be
>> allowed too.
> 
> There is at least a 3rd case, of no hardware RDTSCP support, which is
> why we also emulate it in emul-invl-op.c

Well, that's the "both have it clear" case I've mentioned above.

> A guest trying to use PVRDTSC necessarily needs out-of-band signalling
> to set it up.  I do not think it is reasonable or appropriate to retain
> the ABI breakage of completing reads of the MSR when the instruction
> should be architecturally unavailable.

Well, one can take either position, so I'm not going to object to
you not wanting to make the change.

>>> --- a/xen/arch/x86/time.c
>>> +++ b/xen/arch/x86/time.c
>>> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>>>          }
>>>          break;
>>>      }
>>> +
>>>      d->arch.incarnation = incarnation + 1;
>>> +
>>> +    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
>>> +    {
>>> +        struct vcpu *v;
>>> +
>>> +        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
>>> +        for_each_vcpu ( d, v )
>>> +            v->arch.msr->tsc_aux = d->arch.incarnation;
>>> +    }
>> This not needing a lock might warrant a comment (the domain is
>> [explicitly or implicitly] paused when coming here).
> 
> This new piece of logic isn't any different to the rest of
> tsc_set_info() WRT being paused.  It is explicitly paused at this point.

True.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [ping] Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-26 11:25   ` Jan Beulich
@ 2018-02-26 19:11     ` Andrew Cooper
  2018-02-27  5:38       ` Tian, Kevin
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-26 19:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, SuraveeSuthikulpanit, Xen-devel,
	Jun Nakajima, Boris Ostrovsky, Roger Pau Monné

On 26/02/18 11:25, Jan Beulich wrote:
>>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
>> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
>> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
>> advertised/hidden.
>>
>> At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
>> TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
>> Therefore, default PV guests can read the value from a previously scheduled
>> HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
>>
>> Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common 
>> case.
>>
>> To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
>> update of the MSR, and use this function consistently across the codebase.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Despite me continuing to think that RDTSCP and RDPID should be
> fully independent features, this being in line with the SDM:
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks.

Given the important of this patch, I feel it is time to ping the VT-x
and SVM maintainers for their input.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
  2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
                   ` (4 preceding siblings ...)
  2018-02-20 11:58 ` [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch Andrew Cooper
@ 2018-02-26 19:12 ` Andrew Cooper
  2018-02-26 19:44   ` Boris Ostrovsky
  5 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-26 19:12 UTC (permalink / raw)
  To: Xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 20/02/18 11:58, Andrew Cooper wrote:
> This rats nest was discovered when finding that MSR_TSC_AUX leaked into PV
> guests.  It is RFC because I haven't done extensive testing on the result, and
> because there are some functional changes for the virtualised TSC modes.
>
> Andrew Cooper (5):
>   x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
>   x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
>   x86/time: Rework pv_soft_rdtsc() to aid further cleanup
>   x86/pv: Remove deferred RDTSC{,P} handling in pv_emulate_privileged_op()
>   x86: Rework MSR_TSC_AUX handling from scratch.

Konrad/Boris: Can we have any input WRT TSC_MODE_PVRDTSCP usage?  Are
you still using the feature, or is it abandoned?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
  2018-02-26 19:12 ` [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
@ 2018-02-26 19:44   ` Boris Ostrovsky
  2018-02-26 23:30     ` Andrew Cooper
  0 siblings, 1 reply; 47+ messages in thread
From: Boris Ostrovsky @ 2018-02-26 19:44 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 02/26/2018 02:12 PM, Andrew Cooper wrote:
> On 20/02/18 11:58, Andrew Cooper wrote:
>> This rats nest was discovered when finding that MSR_TSC_AUX leaked into PV
>> guests.  It is RFC because I haven't done extensive testing on the result, and
>> because there are some functional changes for the virtualised TSC modes.
>>
>> Andrew Cooper (5):
>>   x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
>>   x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
>>   x86/time: Rework pv_soft_rdtsc() to aid further cleanup
>>   x86/pv: Remove deferred RDTSC{,P} handling in pv_emulate_privileged_op()
>>   x86: Rework MSR_TSC_AUX handling from scratch.
> Konrad/Boris: Can we have any input WRT TSC_MODE_PVRDTSCP usage?  Are
> you still using the feature, or is it abandoned?


I already asked a few internal teams about, haven't heard back.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
                     ` (3 preceding siblings ...)
  2018-02-26 11:25   ` Jan Beulich
@ 2018-02-26 19:52   ` Boris Ostrovsky
  4 siblings, 0 replies; 47+ messages in thread
From: Boris Ostrovsky @ 2018-02-26 19:52 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich,
	Suravee Suthikulpanit, Roger Pau Monné

On 02/20/2018 06:58 AM, Andrew Cooper wrote:
> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
> advertised/hidden.
>
> At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
> TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
> Therefore, default PV guests can read the value from a previously scheduled
> HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
>
> Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.
>
> To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
> update of the MSR, and use this function consistently across the codebase.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

(Apologies for the delay. I am quite behind with my emails)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
  2018-02-26 19:44   ` Boris Ostrovsky
@ 2018-02-26 23:30     ` Andrew Cooper
  2018-03-09 18:05       ` Boris Ostrovsky
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-02-26 23:30 UTC (permalink / raw)
  To: Boris Ostrovsky, Xen-devel, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 26/02/2018 19:44, Boris Ostrovsky wrote:
> On 02/26/2018 02:12 PM, Andrew Cooper wrote:
>> On 20/02/18 11:58, Andrew Cooper wrote:
>>> This rats nest was discovered when finding that MSR_TSC_AUX leaked into PV
>>> guests.  It is RFC because I haven't done extensive testing on the result, and
>>> because there are some functional changes for the virtualised TSC modes.
>>>
>>> Andrew Cooper (5):
>>>   x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
>>>   x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
>>>   x86/time: Rework pv_soft_rdtsc() to aid further cleanup
>>>   x86/pv: Remove deferred RDTSC{,P} handling in pv_emulate_privileged_op()
>>>   x86: Rework MSR_TSC_AUX handling from scratch.
>> Konrad/Boris: Can we have any input WRT TSC_MODE_PVRDTSCP usage?  Are
>> you still using the feature, or is it abandoned?
> I already asked a few internal teams about, haven't heard back.

Ah ok - thanks.  I'll wait to hear back from you then.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [ping] Re: [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
  2018-02-26 19:11     ` [ping] " Andrew Cooper
@ 2018-02-27  5:38       ` Tian, Kevin
  0 siblings, 0 replies; 47+ messages in thread
From: Tian, Kevin @ 2018-02-27  5:38 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Wei Liu, SuraveeSuthikulpanit, Xen-devel, Nakajima, Jun,
	Boris Ostrovsky, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, February 27, 2018 3:11 AM
> 
> On 26/02/18 11:25, Jan Beulich wrote:
> >>>> On 20.02.18 at 12:58, <andrew.cooper3@citrix.com> wrote:
> >> If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the
> value in
> >> MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
> >> advertised/hidden.
> >>
> >> At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
> >> TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
> >> Therefore, default PV guests can read the value from a previously
> scheduled
> >> HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
> >>
> >> Alter the PV path to always write to MSR_TSC_AUX, using 0 in the
> common
> >> case.
> >>
> >> To amortise overhead cost, introduce wrmsr_tsc_aux() which performs
> a lazy
> >> update of the MSR, and use this function consistently across the
> codebase.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Despite me continuing to think that RDTSCP and RDPID should be
> > fully independent features, this being in line with the SDM:
> > Acked-by: Jan Beulich <jbeulich@suse.com>
> 
> Thanks.
> 
> Given the important of this patch, I feel it is time to ping the VT-x
> and SVM maintainers for their input.
> 
> ~Andrew

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
  2018-02-26 23:30     ` Andrew Cooper
@ 2018-03-09 18:05       ` Boris Ostrovsky
  2018-03-09 18:41         ` Andrew Cooper
  0 siblings, 1 reply; 47+ messages in thread
From: Boris Ostrovsky @ 2018-03-09 18:05 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné



On 02/26/2018 06:30 PM, Andrew Cooper wrote:
> On 26/02/2018 19:44, Boris Ostrovsky wrote:
>> On 02/26/2018 02:12 PM, Andrew Cooper wrote:
>>> On 20/02/18 11:58, Andrew Cooper wrote:
>>>> This rats nest was discovered when finding that MSR_TSC_AUX leaked into PV
>>>> guests.  It is RFC because I haven't done extensive testing on the result, and
>>>> because there are some functional changes for the virtualised TSC modes.
>>>>
>>>> Andrew Cooper (5):
>>>>    x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
>>>>    x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
>>>>    x86/time: Rework pv_soft_rdtsc() to aid further cleanup
>>>>    x86/pv: Remove deferred RDTSC{,P} handling in pv_emulate_privileged_op()
>>>>    x86: Rework MSR_TSC_AUX handling from scratch.
>>> Konrad/Boris: Can we have any input WRT TSC_MODE_PVRDTSCP usage?  Are
>>> you still using the feature, or is it abandoned?
>> I already asked a few internal teams about, haven't heard back.
> 
> Ah ok - thanks.  I'll wait to hear back from you then.


Took longer than I hoped, sorry.

Couldn't find anyone who is still using this mode (or perhaps noone 
wanted to admit to this ;-)).

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
  2018-03-09 18:05       ` Boris Ostrovsky
@ 2018-03-09 18:41         ` Andrew Cooper
  2018-03-09 19:10           ` Boris Ostrovsky
  0 siblings, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2018-03-09 18:41 UTC (permalink / raw)
  To: Boris Ostrovsky, Xen-devel, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 09/03/2018 18:05, Boris Ostrovsky wrote:
>
>
> On 02/26/2018 06:30 PM, Andrew Cooper wrote:
>> On 26/02/2018 19:44, Boris Ostrovsky wrote:
>>> On 02/26/2018 02:12 PM, Andrew Cooper wrote:
>>>> On 20/02/18 11:58, Andrew Cooper wrote:
>>>>> This rats nest was discovered when finding that MSR_TSC_AUX leaked
>>>>> into PV
>>>>> guests.  It is RFC because I haven't done extensive testing on the
>>>>> result, and
>>>>> because there are some functional changes for the virtualised TSC
>>>>> modes.
>>>>>
>>>>> Andrew Cooper (5):
>>>>>    x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
>>>>>    x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV
>>>>> context
>>>>>    x86/time: Rework pv_soft_rdtsc() to aid further cleanup
>>>>>    x86/pv: Remove deferred RDTSC{,P} handling in
>>>>> pv_emulate_privileged_op()
>>>>>    x86: Rework MSR_TSC_AUX handling from scratch.
>>>> Konrad/Boris: Can we have any input WRT TSC_MODE_PVRDTSCP usage?  Are
>>>> you still using the feature, or is it abandoned?
>>> I already asked a few internal teams about, haven't heard back.
>>
>> Ah ok - thanks.  I'll wait to hear back from you then.
>
>
> Took longer than I hoped, sorry.
>
> Couldn't find anyone who is still using this mode (or perhaps noone
> wanted to admit to this ;-)).

Is this your (Oracle's) blessing to purge the feature and pretend that
it never existed?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests
  2018-03-09 18:41         ` Andrew Cooper
@ 2018-03-09 19:10           ` Boris Ostrovsky
  0 siblings, 0 replies; 47+ messages in thread
From: Boris Ostrovsky @ 2018-03-09 19:10 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné



On 03/09/2018 01:41 PM, Andrew Cooper wrote:
> On 09/03/2018 18:05, Boris Ostrovsky wrote:
>>
>>
>> On 02/26/2018 06:30 PM, Andrew Cooper wrote:
>>> On 26/02/2018 19:44, Boris Ostrovsky wrote:
>>>> On 02/26/2018 02:12 PM, Andrew Cooper wrote:
>>>>> On 20/02/18 11:58, Andrew Cooper wrote:
>>>>>> This rats nest was discovered when finding that MSR_TSC_AUX leaked
>>>>>> into PV
>>>>>> guests.  It is RFC because I haven't done extensive testing on the
>>>>>> result, and
>>>>>> because there are some functional changes for the virtualised TSC
>>>>>> modes.
>>>>>>
>>>>>> Andrew Cooper (5):
>>>>>>     x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
>>>>>>     x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV
>>>>>> context
>>>>>>     x86/time: Rework pv_soft_rdtsc() to aid further cleanup
>>>>>>     x86/pv: Remove deferred RDTSC{,P} handling in
>>>>>> pv_emulate_privileged_op()
>>>>>>     x86: Rework MSR_TSC_AUX handling from scratch.
>>>>> Konrad/Boris: Can we have any input WRT TSC_MODE_PVRDTSCP usage?  Are
>>>>> you still using the feature, or is it abandoned?
>>>> I already asked a few internal teams about, haven't heard back.
>>>
>>> Ah ok - thanks.  I'll wait to hear back from you then.
>>
>>
>> Took longer than I hoped, sorry.
>>
>> Couldn't find anyone who is still using this mode (or perhaps noone
>> wanted to admit to this ;-)).
> 
> Is this your (Oracle's) blessing to purge the feature and pretend that
> it never existed?


Not sure about the second part but for the first part (purging) --- sure.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2018-03-09 19:10 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-20 11:58 [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
2018-02-20 11:58 ` [PATCH 1/5] x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs() Andrew Cooper
2018-02-20 14:54   ` Roger Pau Monné
2018-02-20 15:12   ` Wei Liu
2018-02-23 13:53   ` Jan Beulich
2018-02-20 11:58 ` [PATCH 2/5] x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context Andrew Cooper
2018-02-20 15:22   ` Wei Liu
2018-02-20 15:26     ` Andrew Cooper
2018-02-20 15:32       ` Wei Liu
2018-02-20 15:49   ` Roger Pau Monné
2018-02-23 14:04   ` Jan Beulich
2018-02-23 14:22     ` Andrew Cooper
2018-02-23 15:09       ` Jan Beulich
2018-02-26 11:25   ` Jan Beulich
2018-02-26 19:11     ` [ping] " Andrew Cooper
2018-02-27  5:38       ` Tian, Kevin
2018-02-26 19:52   ` Boris Ostrovsky
2018-02-20 11:58 ` [PATCH 3/5] x86/time: Rework pv_soft_rdtsc() to aid further cleanup Andrew Cooper
2018-02-20 15:32   ` Wei Liu
2018-02-20 16:04   ` Roger Pau Monné
2018-02-20 16:07     ` Andrew Cooper
2018-02-23 14:38   ` Jan Beulich
2018-02-20 11:58 ` [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op() Andrew Cooper
2018-02-20 16:08   ` Wei Liu
2018-02-20 16:28   ` Roger Pau Monné
2018-02-20 16:37     ` Andrew Cooper
2018-02-20 17:40       ` Roger Pau Monné
2018-02-23 14:40   ` Jan Beulich
2018-02-20 11:58 ` [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch Andrew Cooper
2018-02-20 17:03   ` Wei Liu
2018-02-20 17:42     ` Andrew Cooper
2018-02-21 11:08       ` Wei Liu
2018-02-20 17:35   ` Roger Pau Monné
2018-02-20 18:28     ` Andrew Cooper
2018-02-21 10:13       ` Roger Pau Monné
2018-02-21 11:36   ` [PATCH v2 " Andrew Cooper
2018-02-21 12:06     ` Wei Liu
2018-02-21 13:04     ` Roger Pau Monné
2018-02-23 15:05     ` Jan Beulich
2018-02-23 15:51       ` Andrew Cooper
2018-02-26 11:30         ` Jan Beulich
2018-02-26 19:12 ` [RFC PATCH 0/5] x86: Multiple fixes to MSR_TSC_AUX and RDTSCP handling for guests Andrew Cooper
2018-02-26 19:44   ` Boris Ostrovsky
2018-02-26 23:30     ` Andrew Cooper
2018-03-09 18:05       ` Boris Ostrovsky
2018-03-09 18:41         ` Andrew Cooper
2018-03-09 19:10           ` Boris Ostrovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.