* [PATCH v4 00/11] Add LMCE support
@ 2017-06-26  9:16 Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl Haozhong Zhang
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper,
	Ian Jackson, Jun Nakajima

v3 can be found at https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg04118.html.

Changes in v4:
 * Patch 1 is new and fixes a stale comment.
 * The changes to MCE barriers in v3 Patch 1 "x86/mce: handle LMCE locally"
   are split out into a separate v4 Patch 2. The rest of v3 Patch 1 becomes
   v4 Patch 3.
 * Introduce a new per-cpu list to handle MC# that may re-occur while the
   MCE softirq handler is running. See the commit message of Patch 3 for
   details.
 * Other minor changes are logged in Patches 7 - 9. As the changes in
   Patches 7 & 9 are minor, I keep Jan's R-b on them.
 * Patch 8 still lacks R-b/A-b for the toolstack side.

Haozhong Zhang (11):
  [N  ] 01/11 xen/mce: fix comment of struct mc_telem_cpu_ctl
  [N  ] 02/11 xen/mce: allow mce_barrier_{enter,exit} to return without waiting
  [N  ] 03/11 x86/mce: handle host LMCE
  [  R] 04/11 x86/mce_intel: detect and enable LMCE on Intel host
  [  R] 05/11 x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL
  [  R] 06/11 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  [ MR] 07/11 x86/vmce: enable injecting LMCE to guest on Intel host

  [ MR] 08/11 x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
        (lack R-b/A-b for toolstack side)

  [ MR] 09/11 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  [  A] 10/11 tools/libxc: add support of injecting MC# to specified CPUs
  [  A] 11/11 tools/xen-mceinj: add support of injecting LMCE

 N: new in this version
 M: modified in this version
 R: got R-b
 A: got A-b

 docs/man/xl.cfg.pod.5.in                | 24 ++++++++
 tools/libxc/include/xenctrl.h           |  2 +
 tools/libxc/xc_misc.c                   | 52 ++++++++++++++++-
 tools/libxc/xc_sr_save_x86_hvm.c        |  1 +
 tools/libxl/libxl.h                     |  7 +++
 tools/libxl/libxl_dom.c                 | 15 +++++
 tools/libxl/libxl_types.idl             |  1 +
 tools/tests/mce-test/tools/xen-mceinj.c | 50 ++++++++++++++++-
 tools/xl/xl_parse.c                     | 31 ++++++++++-
 xen/arch/x86/cpu/mcheck/barrier.c       | 12 ++--
 xen/arch/x86/cpu/mcheck/barrier.h       | 12 +++-
 xen/arch/x86/cpu/mcheck/mcaction.c      | 21 +++++--
 xen/arch/x86/cpu/mcheck/mce.c           | 90 ++++++++++++++++++++----------
 xen/arch/x86/cpu/mcheck/mce.h           |  2 +
 xen/arch/x86/cpu/mcheck/mce_intel.c     | 50 +++++++++++++++--
 xen/arch/x86/cpu/mcheck/mctelem.c       | 99 +++++++++++++++++++++++++++++----
 xen/arch/x86/cpu/mcheck/mctelem.h       |  5 +-
 xen/arch/x86/cpu/mcheck/vmce.c          | 64 ++++++++++++++++++++-
 xen/arch/x86/cpu/mcheck/vmce.h          |  2 +-
 xen/arch/x86/cpu/mcheck/x86_mca.h       |  9 ++-
 xen/arch/x86/hvm/hvm.c                  |  5 ++
 xen/arch/x86/hvm/vmx/vmx.c              |  9 +++
 xen/arch/x86/hvm/vmx/vvmx.c             |  4 --
 xen/include/asm-x86/mce.h               |  3 +
 xen/include/asm-x86/msr-index.h         |  2 +
 xen/include/public/arch-x86/hvm/save.h  |  1 +
 xen/include/public/arch-x86/xen-mca.h   |  1 +
 xen/include/public/hvm/params.h         |  7 ++-
 28 files changed, 505 insertions(+), 76 deletions(-)

-- 
2.11.0



* [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-27  6:28   ` Jan Beulich
  2017-06-26  9:16 ` [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting Haozhong Zhang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

struct mc_telem_cpu_ctl is now used as the type of per-cpu variables,
rather than of a global variable shared by all CPUs, so parts of its
comment no longer apply.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mctelem.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index 96048ebcc0..57abeab357 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -108,9 +108,7 @@ static struct mc_telem_ctl {
 struct mc_telem_cpu_ctl {
 	/*
 	 * Per-CPU processing lists, used for deferred (softirq)
-	 * processing of telemetry. @pending is indexed by the
-	 * CPU that the telemetry belongs to. @processing is indexed
-	 * by the CPU that is processing the telemetry.
+	 * processing of telemetry.
 	 */
 	struct mctelem_ent *pending;
 	struct mctelem_ent *processing;
-- 
2.11.0



* [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-27  6:38   ` Jan Beulich
  2017-06-26  9:16 ` [PATCH v4 03/11] x86/mce: handle host LMCE Haozhong Zhang
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

Add a 'nowait' argument to mce_barrier_{enter,exit}() to allow them to
return immediately without waiting for mce_barrier_{enter,exit}() on
other CPUs. This is useful when handling LMCE, where
mce_barrier_{enter,exit}() are called on only one CPU.
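
For illustration, a minimal sketch of the intended calling pattern,
assuming the caller already knows whether the event being handled is an
LMCE (the variable 'lmce' below is hypothetical and merely stands for
that knowledge):

    bool lmce = ...;  /* true iff only the local CPU received this MCE */

    /* Returns immediately when lmce is true, i.e. no rendezvous. */
    mce_barrier_enter(&mce_trap_bar, lmce);
    /* ... per-CPU handling ... */
    mce_barrier_exit(&mce_trap_bar, lmce);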

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/barrier.c | 12 ++++++------
 xen/arch/x86/cpu/mcheck/barrier.h | 12 ++++++++++--
 xen/arch/x86/cpu/mcheck/mce.c     | 20 ++++++++++----------
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/barrier.c b/xen/arch/x86/cpu/mcheck/barrier.c
index 5dce1fb9b9..0b3b09103d 100644
--- a/xen/arch/x86/cpu/mcheck/barrier.c
+++ b/xen/arch/x86/cpu/mcheck/barrier.c
@@ -16,11 +16,11 @@ void mce_barrier_dec(struct mce_softirq_barrier *bar)
     atomic_dec(&bar->val);
 }
 
-void mce_barrier_enter(struct mce_softirq_barrier *bar)
+void mce_barrier_enter(struct mce_softirq_barrier *bar, bool nowait)
 {
     int gen;
 
-    if (!mce_broadcast)
+    if ( !mce_broadcast || nowait )
         return;
     atomic_inc(&bar->ingen);
     gen = atomic_read(&bar->outgen);
@@ -34,11 +34,11 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar)
     }
 }
 
-void mce_barrier_exit(struct mce_softirq_barrier *bar)
+void mce_barrier_exit(struct mce_softirq_barrier *bar, bool nowait)
 {
     int gen;
 
-    if ( !mce_broadcast )
+    if ( !mce_broadcast || nowait )
         return;
     atomic_inc(&bar->outgen);
     gen = atomic_read(&bar->ingen);
@@ -54,6 +54,6 @@ void mce_barrier_exit(struct mce_softirq_barrier *bar)
 
 void mce_barrier(struct mce_softirq_barrier *bar)
 {
-    mce_barrier_enter(bar);
-    mce_barrier_exit(bar);
+    mce_barrier_enter(bar, false);
+    mce_barrier_exit(bar, false);
 }
diff --git a/xen/arch/x86/cpu/mcheck/barrier.h b/xen/arch/x86/cpu/mcheck/barrier.h
index d3ccf8b15f..f6b4370945 100644
--- a/xen/arch/x86/cpu/mcheck/barrier.h
+++ b/xen/arch/x86/cpu/mcheck/barrier.h
@@ -32,6 +32,14 @@ void mce_barrier_init(struct mce_softirq_barrier *);
 void mce_barrier_dec(struct mce_softirq_barrier *);
 
 /*
+ * If nowait is true, mce_barrier_enter/exit() will return immediately
+ * without touching the barrier. It's used when handling a LMCE which
+ * is received on only one CPU and thus does not invoke
+ * mce_barrier_enter/exit() calls on all CPUs.
+ *
+ * If nowait is false, mce_barrier_enter/exit() will handle the given
+ * barrier as below.
+ *
  * Increment the generation number and the value. The generation number
  * is incremented when entering a barrier. This way, it can be checked
  * on exit if a CPU is trying to re-enter the barrier. This can happen
@@ -43,8 +51,8 @@ void mce_barrier_dec(struct mce_softirq_barrier *);
  * These barrier functions should always be paired, so that the
  * counter value will reach 0 again after all CPUs have exited.
  */
-void mce_barrier_enter(struct mce_softirq_barrier *);
-void mce_barrier_exit(struct mce_softirq_barrier *);
+void mce_barrier_enter(struct mce_softirq_barrier *, bool nowait);
+void mce_barrier_exit(struct mce_softirq_barrier *, bool nowait);
 
 void mce_barrier(struct mce_softirq_barrier *);
 
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 54fd000aa0..1e0b03c38b 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -497,15 +497,15 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
     }
     mce_spin_unlock(&mce_logout_lock);
 
-    mce_barrier_enter(&mce_trap_bar);
+    mce_barrier_enter(&mce_trap_bar, false);
     if ( mctc != NULL && mce_urgent_action(regs, mctc))
         cpumask_set_cpu(smp_processor_id(), &mce_fatal_cpus);
-    mce_barrier_exit(&mce_trap_bar);
+    mce_barrier_exit(&mce_trap_bar, false);
 
     /*
      * Wait until everybody has processed the trap.
      */
-    mce_barrier_enter(&mce_trap_bar);
+    mce_barrier_enter(&mce_trap_bar, false);
     if (atomic_read(&severity_cpu) == smp_processor_id())
     {
         /* According to SDM, if no error bank found on any cpus,
@@ -524,16 +524,16 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
         atomic_set(&found_error, 0);
         atomic_set(&severity_cpu, -1);
     }
-    mce_barrier_exit(&mce_trap_bar);
+    mce_barrier_exit(&mce_trap_bar, false);
 
     /* Clear flags after above fatal check */
-    mce_barrier_enter(&mce_trap_bar);
+    mce_barrier_enter(&mce_trap_bar, false);
     gstatus = mca_rdmsr(MSR_IA32_MCG_STATUS);
     if ((gstatus & MCG_STATUS_MCIP) != 0) {
         mce_printk(MCE_CRITICAL, "MCE: Clear MCIP@ last step");
         mca_wrmsr(MSR_IA32_MCG_STATUS, 0);
     }
-    mce_barrier_exit(&mce_trap_bar);
+    mce_barrier_exit(&mce_trap_bar, false);
 
     raise_softirq(MACHINE_CHECK_SOFTIRQ);
 }
@@ -1703,7 +1703,7 @@ static void mce_softirq(void)
 
     mce_printk(MCE_VERBOSE, "CPU%d enter softirq\n", cpu);
 
-    mce_barrier_enter(&mce_inside_bar);
+    mce_barrier_enter(&mce_inside_bar, false);
 
     /*
      * Everybody is here. Now let's see who gets to do the
@@ -1716,10 +1716,10 @@ static void mce_softirq(void)
 
     atomic_set(&severity_cpu, cpu);
 
-    mce_barrier_enter(&mce_severity_bar);
+    mce_barrier_enter(&mce_severity_bar, false);
     if (!mctelem_has_deferred(cpu))
         atomic_set(&severity_cpu, cpu);
-    mce_barrier_exit(&mce_severity_bar);
+    mce_barrier_exit(&mce_severity_bar, false);
 
     /* We choose severity_cpu for further processing */
     if (atomic_read(&severity_cpu) == cpu) {
@@ -1740,7 +1740,7 @@ static void mce_softirq(void)
         }
     }
 
-    mce_barrier_exit(&mce_inside_bar);
+    mce_barrier_exit(&mce_inside_bar, false);
 }
 
 /* Machine Check owner judge algorithm:
-- 
2.11.0



* [PATCH v4 03/11] x86/mce: handle host LMCE
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-27  7:13   ` Jan Beulich
  2017-06-26  9:16 ` [PATCH v4 04/11] x86/mce_intel: detect and enable LMCE on Intel host Haozhong Zhang
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

A round of mce_softirq() may handle multiple deferred MCE's.
 1/ If all of them are LMCE's, then mce_softirq() is called on only one
    CPU and should not wait for others.
 2/ If at least one of them is a non-local MCE, then mce_softirq()
    should synchronize with other CPUs.
mce_softirq() should check which of these two cases applies and handle
it accordingly.

Because mce_softirq() can be interrupted by another MC#, we should also
ensure that the deferred MCE handling in mce_softirq() is not affected
if the result of that check changes afterwards.

A per-cpu list 'lmce_pending' is introduced to 'struct mc_telem_cpu_ctl'
alongside the existing per-cpu list 'pending', in order to handle LMCE.

The MC# handler mcheck_cmn_handler() ensures that
 1/ if all deferred MCE's on a CPU are LMCE's, then all of their
    telemetries are kept only in 'lmce_pending' on that CPU;
 2/ if at least one deferred MCE on a CPU is not an LMCE, then all
    telemetries of deferred MCE's on that CPU are kept only in
    'pending' on that CPU.

Therefore, whether 'lmce_pending' is non-empty can be used in the MCE
softirq handler mce_softirq() to determine whether the first of the two
cases above applies.

mce_softirq() atomically moves deferred MCE's from either the list
'lmce_pending' on the current CPU or the lists 'pending' on the current
or other CPUs to the list 'processing' on the current CPU, and then
handles the deferred MCE's in 'processing'. A new MC# arriving before
or after the atomic move may change the result of the check, but it
does not change whether the MCE's already in 'processing' are LMCE or
not, so mce_softirq() can still handle 'processing' according to the
result of the previous check.
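
The deferral rule above can be condensed into the following sketch (the
real implementation is the mctelem_defer() change in the mctelem.c hunk
below; 'push' and 'move_all' are shorthand for the mctelem_xchg_head()
calls used there):

    if (ctl->pending)               /* a non-local MCE is already queued */
        push(&ctl->pending, tep);
    else if (lmce)                  /* only LMCE's seen so far */
        push(&ctl->lmce_pending, tep);
    else {                          /* first non-local MCE arrives */
        move_all(&ctl->lmce_pending, &ctl->pending);
        push(&ctl->pending, tep);
    }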

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mcaction.c |  4 +-
 xen/arch/x86/cpu/mcheck/mce.c      | 66 ++++++++++++++------------
 xen/arch/x86/cpu/mcheck/mce.h      |  1 +
 xen/arch/x86/cpu/mcheck/mctelem.c  | 95 +++++++++++++++++++++++++++++++++++---
 xen/arch/x86/cpu/mcheck/mctelem.h  |  5 +-
 xen/arch/x86/cpu/mcheck/x86_mca.h  |  4 +-
 6 files changed, 135 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index dab9eac306..ca17d22bd8 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -96,7 +96,9 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 
                 bank->mc_addr = gfn << PAGE_SHIFT |
                   (bank->mc_addr & (PAGE_SIZE -1 ));
-                if (fill_vmsr_data(bank, d, global->mc_gstatus,
+                /* TODO: support injecting LMCE */
+                if (fill_vmsr_data(bank, d,
+                                   global->mc_gstatus & ~MCG_STATUS_LMCE,
                                    vmce_vcpuid == VMCE_INJECT_BROADCAST))
                 {
                     mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d "
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 1e0b03c38b..2428cc0762 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -387,6 +387,7 @@ mcheck_mca_logout(enum mca_source who, struct mca_banks *bankmask,
         sp->errcnt = errcnt;
         sp->ripv = (gstatus & MCG_STATUS_RIPV) != 0;
         sp->eipv = (gstatus & MCG_STATUS_EIPV) != 0;
+        sp->lmce = (gstatus & MCG_STATUS_LMCE) != 0;
         sp->uc = uc;
         sp->pcc = pcc;
         sp->recoverable = recover;
@@ -454,6 +455,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
     uint64_t gstatus;
     mctelem_cookie_t mctc = NULL;
     struct mca_summary bs;
+    bool lmce;
 
     mce_spin_lock(&mce_logout_lock);
 
@@ -462,6 +464,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
             sizeof(long) * BITS_TO_LONGS(clear_bank->num));
     }
     mctc = mcheck_mca_logout(MCA_MCE_SCAN, bankmask, &bs, clear_bank);
+    lmce = bs.lmce;
 
     if (bs.errcnt) {
         /*
@@ -470,7 +473,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
         if (bs.uc || bs.pcc) {
             add_taint(TAINT_MACHINE_CHECK);
             if (mctc != NULL)
-                mctelem_defer(mctc);
+                mctelem_defer(mctc, lmce);
             /*
              * For PCC=1 and can't be recovered, context is lost, so
              * reboot now without clearing the banks, and deal with
@@ -497,16 +500,16 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
     }
     mce_spin_unlock(&mce_logout_lock);
 
-    mce_barrier_enter(&mce_trap_bar, false);
+    mce_barrier_enter(&mce_trap_bar, lmce);
     if ( mctc != NULL && mce_urgent_action(regs, mctc))
         cpumask_set_cpu(smp_processor_id(), &mce_fatal_cpus);
-    mce_barrier_exit(&mce_trap_bar, false);
+    mce_barrier_exit(&mce_trap_bar, lmce);
 
     /*
      * Wait until everybody has processed the trap.
      */
-    mce_barrier_enter(&mce_trap_bar, false);
-    if (atomic_read(&severity_cpu) == smp_processor_id())
+    mce_barrier_enter(&mce_trap_bar, lmce);
+    if (lmce || atomic_read(&severity_cpu) == smp_processor_id())
     {
         /* According to SDM, if no error bank found on any cpus,
          * something unexpected happening, we can't do any
@@ -524,16 +527,16 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
         atomic_set(&found_error, 0);
         atomic_set(&severity_cpu, -1);
     }
-    mce_barrier_exit(&mce_trap_bar, false);
+    mce_barrier_exit(&mce_trap_bar, lmce);
 
     /* Clear flags after above fatal check */
-    mce_barrier_enter(&mce_trap_bar, false);
+    mce_barrier_enter(&mce_trap_bar, lmce);
     gstatus = mca_rdmsr(MSR_IA32_MCG_STATUS);
     if ((gstatus & MCG_STATUS_MCIP) != 0) {
         mce_printk(MCE_CRITICAL, "MCE: Clear MCIP@ last step");
         mca_wrmsr(MSR_IA32_MCG_STATUS, 0);
     }
-    mce_barrier_exit(&mce_trap_bar, false);
+    mce_barrier_exit(&mce_trap_bar, lmce);
 
     raise_softirq(MACHINE_CHECK_SOFTIRQ);
 }
@@ -1562,7 +1565,8 @@ static void mc_panic_dump(void)
 
     dprintk(XENLOG_ERR, "Begin dump mc_info\n");
     for_each_online_cpu(cpu)
-        mctelem_process_deferred(cpu, x86_mcinfo_dump_panic);
+        mctelem_process_deferred(cpu, x86_mcinfo_dump_panic,
+                                 mctelem_has_deferred_lmce(cpu));
     dprintk(XENLOG_ERR, "End dump mc_info, %x mcinfo dumped\n", mcinfo_dumpped);
 }
 
@@ -1700,38 +1704,42 @@ static void mce_softirq(void)
     static atomic_t severity_cpu;
     int cpu = smp_processor_id();
     unsigned int workcpu;
+    bool lmce = mctelem_has_deferred_lmce(cpu);
 
     mce_printk(MCE_VERBOSE, "CPU%d enter softirq\n", cpu);
 
-    mce_barrier_enter(&mce_inside_bar, false);
+    mce_barrier_enter(&mce_inside_bar, lmce);
 
-    /*
-     * Everybody is here. Now let's see who gets to do the
-     * recovery work. Right now we just see if there's a CPU
-     * that did not have any problems, and pick that one.
-     *
-     * First, just set a default value: the last CPU who reaches this
-     * will overwrite the value and become the default.
-     */
-
-    atomic_set(&severity_cpu, cpu);
+    if (!lmce) {
+        /*
+         * Everybody is here. Now let's see who gets to do the
+         * recovery work. Right now we just see if there's a CPU
+         * that did not have any problems, and pick that one.
+         *
+         * First, just set a default value: the last CPU who reaches this
+         * will overwrite the value and become the default.
+         */
 
-    mce_barrier_enter(&mce_severity_bar, false);
-    if (!mctelem_has_deferred(cpu))
         atomic_set(&severity_cpu, cpu);
-    mce_barrier_exit(&mce_severity_bar, false);
 
-    /* We choose severity_cpu for further processing */
-    if (atomic_read(&severity_cpu) == cpu) {
+        mce_barrier_enter(&mce_severity_bar, false);
+        if (!mctelem_has_deferred(cpu))
+            atomic_set(&severity_cpu, cpu);
+        mce_barrier_exit(&mce_severity_bar, false);
+    }
 
+    /* We choose severity_cpu for further processing */
+    if (lmce || atomic_read(&severity_cpu) == cpu) {
         mce_printk(MCE_VERBOSE, "CPU%d handling errors\n", cpu);
 
         /* Step1: Fill DOM0 LOG buffer, vMCE injection buffer and
          * vMCE MSRs virtualization buffer
          */
-        for_each_online_cpu(workcpu) {
-            mctelem_process_deferred(workcpu, mce_delayed_action);
-        }
+        if (lmce)
+            mctelem_process_deferred(cpu, mce_delayed_action, true);
+        else
+            for_each_online_cpu(workcpu)
+                mctelem_process_deferred(workcpu, mce_delayed_action, false);
 
         /* Step2: Send Log to DOM0 through vIRQ */
         if (dom0_vmce_enabled()) {
@@ -1740,7 +1748,7 @@ static void mce_softirq(void)
         }
     }
 
-    mce_barrier_exit(&mce_inside_bar, false);
+    mce_barrier_exit(&mce_inside_bar, lmce);
 }
 
 /* Machine Check owner judge algorithm:
diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index 10e5cebf8b..4f13791948 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -109,6 +109,7 @@ struct mca_summary {
     int         eipv;   /* meaningful on #MC */
     bool        uc;     /* UC flag */
     bool        pcc;    /* PCC flag */
+    bool        lmce;   /* LMCE flag (Intel only) */
     bool        recoverable; /* software error recoverable flag */
 };
 
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index 57abeab357..76bca32785 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -109,8 +109,22 @@ struct mc_telem_cpu_ctl {
 	/*
 	 * Per-CPU processing lists, used for deferred (softirq)
 	 * processing of telemetry.
+	 *
+	 * The two pending lists @lmce_pending and @pending grow at
+	 * the head in the reverse chronological order.
+	 *
+	 * @pending and @lmce_pending on the same CPU are mutually
+	 * exclusive, i.e. deferred MCE on a CPU are either all in
+	 * @lmce_pending or all in @pending. In the former case, all
+	 * deferred MCE are LMCE. In the latter case, both LMCE and
+	 * non-local MCE can be in @pending, and @pending contains at
+	 * least one non-local MCE if it's not empty.
+	 *
+	 * Changes to @pending and @lmce_pending should be performed
+	 * via mctelem_process_deferred() and mctelem_defer(), in order
+	 * to guarantee the above mutual exclusivity.
 	 */
-	struct mctelem_ent *pending;
+	struct mctelem_ent *pending, *lmce_pending;
 	struct mctelem_ent *processing;
 };
 static DEFINE_PER_CPU(struct mc_telem_cpu_ctl, mctctl);
@@ -131,26 +145,88 @@ static void mctelem_xchg_head(struct mctelem_ent **headp,
 	}
 }
 
-
-void mctelem_defer(mctelem_cookie_t cookie)
+/**
+ * Append a telemetry of deferred MCE to a per-cpu pending list,
+ * either @pending or @lmce_pending, according to rules below:
+ *  - if @pending is not empty, then the new telemetry will be
+ *    appended to @pending;
+ *  - if @pending is empty and the new telemetry is for a deferred
+ *    LMCE, then the new telemetry will be appended to @lmce_pending;
+ *  - if @pending is empty and the new telemetry is for a deferred
+ *    non-local MCE, all existing telemetries in @lmce_pending will be
+ *    moved to @pending and then the new telemetry will be appended to
+ *    @pending.
+ *
+ * This function must be called with MCIP bit set, so that it does not
+ * need to worry about MC# re-occurring in this function.
+ *
+ * As a result, this function can preserve the mutual exclusivity
+ * between @pending and @lmce_pending (see their comments in struct
+ * mc_telem_cpu_ctl).
+ *
+ * Parameters:
+ *  @cookie: telemetry of the deferred MCE
+ *  @lmce:   indicate whether the telemetry is for LMCE
+ */
+void mctelem_defer(mctelem_cookie_t cookie, bool lmce)
 {
 	struct mctelem_ent *tep = COOKIE2MCTE(cookie);
-
-	mctelem_xchg_head(&this_cpu(mctctl.pending), &tep->mcte_next, tep);
+	struct mc_telem_cpu_ctl *mctctl = &this_cpu(mctctl);
+
+	ASSERT(mctctl->pending == NULL || mctctl->lmce_pending == NULL);
+
+	if (mctctl->pending)
+		mctelem_xchg_head(&mctctl->pending, &tep->mcte_next, tep);
+	else if (lmce)
+		mctelem_xchg_head(&mctctl->lmce_pending, &tep->mcte_next, tep);
+	else {
+		if (mctctl->lmce_pending)
+			mctelem_xchg_head(&mctctl->lmce_pending,
+					  &mctctl->pending, NULL);
+		mctelem_xchg_head(&mctctl->pending, &tep->mcte_next, tep);
+	}
 }
 
+/**
+ * Move telemetries of deferred MCE from the per-cpu pending list on
+ * this or another CPU to the per-cpu processing list on this CPU, and
+ * then process all deferred MCE on the processing list.
+ *
+ * This function can be called with MCIP bit set (e.g. from MC#
+ * handler) or cleared (from MCE softirq handler). In the latter case,
+ * MC# may re-occur in this function.
+ *
+ * Parameters:
+ *  @cpu:  indicate the CPU where the pending list is
+ *  @fn:   the function to handle the deferred MCE
+ *  @lmce: indicate which pending list on @cpu is handled
+ */
 void mctelem_process_deferred(unsigned int cpu,
-			      int (*fn)(mctelem_cookie_t))
+			      int (*fn)(mctelem_cookie_t),
+			      bool lmce)
 {
 	struct mctelem_ent *tep;
 	struct mctelem_ent *head, *prev;
+	struct mc_telem_cpu_ctl *mctctl = &per_cpu(mctctl, cpu);
 	int ret;
 
 	/*
 	 * First, unhook the list of telemetry structures, and	
 	 * hook it up to the processing list head for this CPU.
+	 *
+	 * If @lmce is true and a non-local MC# occurs before the
+	 * following atomic exchange, @lmce will not hold after
+	 * resumption, because all telemetries in @lmce_pending on
+	 * @cpu are moved to @pending on @cpu in mcheck_cmn_handler().
+	 * In such a case, no telemetries will be handled in this
+	 * function after resumption. Another round of MCE softirq,
+	 * which was raised by above mcheck_cmn_handler(), will handle
+	 * those moved telemetries in @pending on @cpu.
+	 *
+	 * If another MC# occurs after the following atomic exchange,
+	 * it will be handled by another round of MCE softirq.
 	 */
-	mctelem_xchg_head(&per_cpu(mctctl.pending, cpu),
+	mctelem_xchg_head(lmce ? &mctctl->lmce_pending : &mctctl->pending,
 			  &this_cpu(mctctl.processing), NULL);
 
 	head = this_cpu(mctctl.processing);
@@ -194,6 +270,11 @@ bool mctelem_has_deferred(unsigned int cpu)
 	return false;
 }
 
+bool mctelem_has_deferred_lmce(unsigned int cpu)
+{
+	return per_cpu(mctctl.lmce_pending, cpu) != NULL;
+}
+
 /* Free an entry to its native free list; the entry must not be linked on
  * any list.
  */
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.h b/xen/arch/x86/cpu/mcheck/mctelem.h
index 9fcde4f6b8..d4eba53ae0 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.h
+++ b/xen/arch/x86/cpu/mcheck/mctelem.h
@@ -67,9 +67,10 @@ extern void mctelem_dismiss(mctelem_cookie_t);
 extern mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t);
 extern void mctelem_consume_oldest_end(mctelem_cookie_t);
 extern void mctelem_ack(mctelem_class_t, mctelem_cookie_t);
-extern void mctelem_defer(mctelem_cookie_t);
+extern void mctelem_defer(mctelem_cookie_t, bool lmce);
 extern void mctelem_process_deferred(unsigned int,
-    int (*)(mctelem_cookie_t));
+                                     int (*)(mctelem_cookie_t), bool lmce);
 bool mctelem_has_deferred(unsigned int);
+bool mctelem_has_deferred_lmce(unsigned int cpu);
 
 #endif
diff --git a/xen/arch/x86/cpu/mcheck/x86_mca.h b/xen/arch/x86/cpu/mcheck/x86_mca.h
index 34d1921ce1..de03f829c3 100644
--- a/xen/arch/x86/cpu/mcheck/x86_mca.h
+++ b/xen/arch/x86/cpu/mcheck/x86_mca.h
@@ -42,7 +42,9 @@
 #define MCG_STATUS_RIPV         0x0000000000000001ULL
 #define MCG_STATUS_EIPV         0x0000000000000002ULL
 #define MCG_STATUS_MCIP         0x0000000000000004ULL
-/* Bits 3-63 are reserved */
+#define MCG_STATUS_LMCE         0x0000000000000008ULL  /* Intel specific */
+/* Bits 3-63 are reserved on CPU not supporting LMCE */
+/* Bits 4-63 are reserved on CPU supporting LMCE */
 
 /* Bitfield of MSR_K8_MCi_STATUS registers */
 /* MCA error code */
-- 
2.11.0



* [PATCH v4 04/11] x86/mce_intel: detect and enable LMCE on Intel host
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (2 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 03/11] x86/mce: handle host LMCE Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 05/11] x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

Enable LMCE if it's supported by the host CPU. If the Xen boot parameter
"mce_fb = 1" is present, LMCE will be forcibly disabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce_intel.c | 46 ++++++++++++++++++++++++++++++++-----
 xen/arch/x86/cpu/mcheck/x86_mca.h   |  5 ++++
 xen/include/asm-x86/msr-index.h     |  2 ++
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 4e976c45f8..020b02deff 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -29,6 +29,9 @@ boolean_param("mce_fb", mce_force_broadcast);
 
 static int __read_mostly nr_intel_ext_msrs;
 
+/* If mce_force_broadcast == 1, lmce_support will be disabled forcibly. */
+static bool __read_mostly lmce_support;
+
 /* Intel SDM define bit15~bit0 of IA32_MCi_STATUS as the MC error code */
 #define INTEL_MCCOD_MASK 0xFFFF
 
@@ -698,10 +701,34 @@ static bool mce_is_broadcast(struct cpuinfo_x86 *c)
     return false;
 }
 
+static bool intel_enable_lmce(void)
+{
+    uint64_t msr_content;
+
+    /*
+     * Section "Enabling Local Machine Check" in Intel SDM Vol 3
+     * requires software must ensure the LOCK bit and LMCE_ON bit
+     * of MSR_IA32_FEATURE_CONTROL are set before setting
+     * MSR_IA32_MCG_EXT_CTL.LMCE_EN.
+     */
+
+    if ( rdmsr_safe(MSR_IA32_FEATURE_CONTROL, msr_content) )
+        return false;
+
+    if ( (msr_content & IA32_FEATURE_CONTROL_LOCK) &&
+         (msr_content & IA32_FEATURE_CONTROL_LMCE_ON) )
+    {
+        wrmsrl(MSR_IA32_MCG_EXT_CTL, MCG_EXT_CTL_LMCE_EN);
+        return true;
+    }
+
+    return false;
+}
+
 /* Check and init MCA */
 static void intel_init_mca(struct cpuinfo_x86 *c)
 {
-    bool broadcast, cmci = false, ser = false;
+    bool broadcast, cmci = false, ser = false, lmce = false;
     int ext_num = 0, first;
     uint64_t msr_content;
 
@@ -721,33 +748,40 @@ static void intel_init_mca(struct cpuinfo_x86 *c)
 
     first = mce_firstbank(c);
 
+    if (!mce_force_broadcast && (msr_content & MCG_LMCE_P))
+        lmce = intel_enable_lmce();
+
 #define CAP(enabled, name) ((enabled) ? ", " name : "")
     if (smp_processor_id() == 0)
     {
         dprintk(XENLOG_INFO,
-                "MCA capability: firstbank %d, %d ext MSRs%s%s%s\n",
+                "MCA Capability: firstbank %d, extended MCE MSR %d%s%s%s%s\n",
                 first, ext_num,
                 CAP(broadcast, "BCAST"),
                 CAP(ser, "SER"),
-                CAP(cmci, "CMCI"));
+                CAP(cmci, "CMCI"),
+                CAP(lmce, "LMCE"));
 
         mce_broadcast = broadcast;
         cmci_support = cmci;
         ser_support = ser;
+        lmce_support = lmce;
         nr_intel_ext_msrs = ext_num;
         firstbank = first;
     }
     else if (cmci != cmci_support || ser != ser_support ||
              broadcast != mce_broadcast ||
-             first != firstbank || ext_num != nr_intel_ext_msrs)
+             first != firstbank || ext_num != nr_intel_ext_msrs ||
+             lmce != lmce_support)
         dprintk(XENLOG_WARNING,
                 "CPU%u has different MCA capability "
-                "(firstbank %d, %d ext MSRs%s%s%s)"
+                "(firstbank %d, extended MCE MSR %d%s%s%s%s)"
                 " than BSP, may cause undetermined result!!!\n",
                 smp_processor_id(), first, ext_num,
                 CAP(broadcast, "BCAST"),
                 CAP(ser, "SER"),
-                CAP(cmci, "CMCI"));
+                CAP(cmci, "CMCI"),
+                CAP(lmce, "LMCE"));
 #undef CAP
 }
 
diff --git a/xen/arch/x86/cpu/mcheck/x86_mca.h b/xen/arch/x86/cpu/mcheck/x86_mca.h
index de03f829c3..0f87bcf63e 100644
--- a/xen/arch/x86/cpu/mcheck/x86_mca.h
+++ b/xen/arch/x86/cpu/mcheck/x86_mca.h
@@ -36,6 +36,7 @@
 #define MCG_TES_P               (1ULL<<11) /* Intel specific */
 #define MCG_EXT_CNT             16         /* Intel specific */
 #define MCG_SER_P               (1ULL<<24) /* Intel specific */
+#define MCG_LMCE_P              (1ULL<<27) /* Intel specific */
 /* Other bits are reserved */
 
 /* Bitfield of the MSR_IA32_MCG_STATUS register */
@@ -46,6 +47,10 @@
 /* Bits 3-63 are reserved on CPU not supporting LMCE */
 /* Bits 4-63 are reserved on CPU supporting LMCE */
 
+/* Bitfield of MSR_IA32_MCG_EXT_CTL register (Intel Specific) */
+#define MCG_EXT_CTL_LMCE_EN     (1ULL<<0)
+/* Other bits are reserved */
+
 /* Bitfield of MSR_K8_MCi_STATUS registers */
 /* MCA error code */
 #define MCi_STATUS_MCA          0x000000000000ffffULL
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 771e7500af..756b23d19e 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -51,6 +51,7 @@
 #define MSR_IA32_MCG_CAP		0x00000179
 #define MSR_IA32_MCG_STATUS		0x0000017a
 #define MSR_IA32_MCG_CTL		0x0000017b
+#define MSR_IA32_MCG_EXT_CTL	0x000004d0
 
 #define MSR_IA32_PEBS_ENABLE		0x000003f1
 #define MSR_IA32_DS_AREA		0x00000600
@@ -296,6 +297,7 @@
 #define IA32_FEATURE_CONTROL_SENTER_PARAM_CTL         0x7f00
 #define IA32_FEATURE_CONTROL_ENABLE_SENTER            0x8000
 #define IA32_FEATURE_CONTROL_SGX_ENABLE               0x40000
+#define IA32_FEATURE_CONTROL_LMCE_ON                  0x100000
 
 #define MSR_IA32_TSC_ADJUST		0x0000003b
 
-- 
2.11.0



* [PATCH v4 05/11] x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (3 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 04/11] x86/mce_intel: detect and enable LMCE on Intel host Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 06/11] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL Haozhong Zhang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Jun Nakajima, Jan Beulich, Andrew Cooper

If MCG_LMCE_P is present in the guest MSR_IA32_MCG_CAP, then set the
LMCE and LOCK bits in the guest MSR_IA32_FEATURE_CONTROL. The Intel SDM
requires that those bits be set before software can enable LMCE.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/cpu/mcheck/mce_intel.c | 4 ++++
 xen/arch/x86/hvm/vmx/vmx.c          | 9 +++++++++
 xen/arch/x86/hvm/vmx/vvmx.c         | 4 ----
 xen/include/asm-x86/mce.h           | 1 +
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 020b02deff..5cb49ca697 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -946,3 +946,7 @@ int vmce_intel_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
     return 1;
 }
 
+bool vmce_has_lmce(const struct vcpu *v)
+{
+    return v->arch.vmce.mcg_cap & MCG_LMCE_P;
+}
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c53b24955a..6a193ef9d4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -55,6 +55,7 @@
 #include <asm/hvm/nestedhvm.h>
 #include <asm/altp2m.h>
 #include <asm/event.h>
+#include <asm/mce.h>
 #include <asm/monitor.h>
 #include <public/arch-x86/cpuid.h>
 
@@ -2856,6 +2857,8 @@ static int is_last_branch_msr(u32 ecx)
 
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
+    const struct vcpu *curr = current;
+
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
 
     switch ( msr )
@@ -2873,6 +2876,12 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         __vmread(GUEST_IA32_DEBUGCTL, msr_content);
         break;
     case MSR_IA32_FEATURE_CONTROL:
+        *msr_content = IA32_FEATURE_CONTROL_LOCK;
+        if ( vmce_has_lmce(curr) )
+            *msr_content |= IA32_FEATURE_CONTROL_LMCE_ON;
+        if ( nestedhvm_enabled(curr->domain) )
+            *msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
+        break;
     case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
         if ( !nvmx_msr_read_intercept(msr, msr_content) )
             goto gp_fault;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3560faec6d..f451935ea6 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2084,10 +2084,6 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
         data = gen_vmx_msr(data, VMX_ENTRY_CTLS_DEFAULT1, host_data);
         break;
 
-    case MSR_IA32_FEATURE_CONTROL:
-        data = IA32_FEATURE_CONTROL_LOCK |
-               IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
-        break;
     case MSR_IA32_VMX_VMCS_ENUM:
         /* The max index of VVMCS encoding is 0x1f. */
         data = 0x1f << 1;
diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h
index 549bef3ebe..56ad1f92dd 100644
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -36,6 +36,7 @@ extern void vmce_init_vcpu(struct vcpu *);
 extern int vmce_restore_vcpu(struct vcpu *, const struct hvm_vmce_vcpu *);
 extern int vmce_wrmsr(uint32_t msr, uint64_t val);
 extern int vmce_rdmsr(uint32_t msr, uint64_t *val);
+extern bool vmce_has_lmce(const struct vcpu *v);
 
 extern unsigned int nr_mce_banks;
 
-- 
2.11.0



* [PATCH v4 06/11] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (4 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 05/11] x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 07/11] x86/vmce: enable injecting LMCE to guest on Intel host Haozhong Zhang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

If MCG_LMCE_P is present in the guest MSR_IA32_MCG_CAP, then allow the
guest to read/write MSR_IA32_MCG_EXT_CTL.
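
For illustration, a sketch of what a guest (with MCG_LMCE_P set in its
MSR_IA32_MCG_CAP) is then able to do, using the constants defined
earlier in this series; wrmsrl() stands for whatever MSR-write
primitive the guest kernel uses:

    /* Guest side: enable LMCE signaling via the emulated MSR. */
    wrmsrl(MSR_IA32_MCG_EXT_CTL, MCG_EXT_CTL_LMCE_EN);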

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/vmce.c         | 34 +++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/mce.h              |  1 +
 xen/include/public/arch-x86/hvm/save.h |  1 +
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index d591d31600..210670638f 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -90,6 +90,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
     v->arch.vmce.mcg_cap = ctxt->caps;
     v->arch.vmce.bank[0].mci_ctl2 = ctxt->mci_ctl2_bank0;
     v->arch.vmce.bank[1].mci_ctl2 = ctxt->mci_ctl2_bank1;
+    v->arch.vmce.mcg_ext_ctl = ctxt->mcg_ext_ctl;
 
     return 0;
 }
@@ -199,6 +200,26 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val)
         mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_CTL %#"PRIx64"\n", cur, *val);
         break;
 
+    case MSR_IA32_MCG_EXT_CTL:
+        /*
+         * If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, the LMCE and LOCK
+         * bits are always set in guest MSR_IA32_FEATURE_CONTROL by Xen, so it
+         * does not need to check them here.
+         */
+        if ( cur->arch.vmce.mcg_cap & MCG_LMCE_P )
+        {
+            *val = cur->arch.vmce.mcg_ext_ctl;
+            mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL %#"PRIx64"\n",
+                       cur, *val);
+        }
+        else
+        {
+            ret = -1;
+            mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL, not supported\n",
+                       cur);
+        }
+        break;
+
     default:
         ret = mce_bank_msr(cur, msr) ? bank_mce_rdmsr(cur, msr, val) : 0;
         break;
@@ -308,6 +329,16 @@ int vmce_wrmsr(uint32_t msr, uint64_t val)
         mce_printk(MCE_VERBOSE, "MCE: %pv: MCG_CAP is r/o\n", cur);
         break;
 
+    case MSR_IA32_MCG_EXT_CTL:
+        if ( (cur->arch.vmce.mcg_cap & MCG_LMCE_P) &&
+             !(val & ~MCG_EXT_CTL_LMCE_EN) )
+            cur->arch.vmce.mcg_ext_ctl = val;
+        else
+            ret = -1;
+        mce_printk(MCE_VERBOSE, "MCE: %pv: wr MCG_EXT_CTL %"PRIx64"%s\n",
+                   cur, val, (ret == -1) ? ", not supported" : "");
+        break;
+
     default:
         ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0;
         break;
@@ -326,7 +357,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
         struct hvm_vmce_vcpu ctxt = {
             .caps = v->arch.vmce.mcg_cap,
             .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2,
-            .mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2
+            .mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2,
+            .mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl,
         };
 
         err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt);
diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h
index 56ad1f92dd..35f9962638 100644
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -27,6 +27,7 @@ struct vmce_bank {
 struct vmce {
     uint64_t mcg_cap;
     uint64_t mcg_status;
+    uint64_t mcg_ext_ctl;
     spinlock_t lock;
     struct vmce_bank bank[GUEST_MC_BANK_NUM];
 };
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 816973b9c2..fd7bf3fb38 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -610,6 +610,7 @@ struct hvm_vmce_vcpu {
     uint64_t caps;
     uint64_t mci_ctl2_bank0;
     uint64_t mci_ctl2_bank1;
+    uint64_t mcg_ext_ctl;
 };
 
 DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu);
-- 
2.11.0



* [PATCH v4 07/11] x86/vmce: enable injecting LMCE to guest on Intel host
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (5 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 06/11] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 08/11] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP Haozhong Zhang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

Inject an LMCE to the guest if the host MCE is an LMCE and the affected
vcpu is known. Otherwise, broadcast the MCE to all vcpus on an Intel
host.
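
In other words, a condensed sketch of the decision made in the
mcaction.c hunk below ('lmce_enabled_for()' is shorthand for the
MCG_STATUS_LMCE / MCG_EXT_CTL_LMCE_EN checks performed there on Intel
hosts):

    if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
        global->mc_domid != bank->mc_domid ||
        !lmce_enabled_for(d->vcpu[mc_vcpuid]))
        vmce_vcpuid = VMCE_INJECT_BROADCAST;  /* fall back to broadcast */
    else
        vmce_vcpuid = mc_vcpuid;              /* inject only to that vcpu */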

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes in v4:
 (Take Jan's R-b with following changes.)
 * Change type of mc_vcpuid in mc_memerr_dhandler() from uint16_t to
   unsigned int.
 * Add a missing space in ASSERT() in fill_vmsr_data().
---
 xen/arch/x86/cpu/mcheck/mcaction.c | 23 ++++++++++++++++-------
 xen/arch/x86/cpu/mcheck/vmce.c     | 11 ++++++++++-
 xen/arch/x86/cpu/mcheck/vmce.h     |  2 +-
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index ca17d22bd8..f959bed2cb 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -44,6 +44,7 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
     unsigned long mfn, gfn;
     uint32_t status;
     int vmce_vcpuid;
+    unsigned int mc_vcpuid;
 
     if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) {
         dprintk(XENLOG_WARNING,
@@ -88,18 +89,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
                     goto vmce_failed;
                 }
 
-                if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-                    global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
+                mc_vcpuid = global->mc_vcpuid;
+                if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
+                    /*
+                     * Because MC# may happen asynchronously with the actual
+                     * operation that triggers the error, the domain ID as
+                     * well as the vCPU ID collected in 'global' at MC# are
+                     * not always precise. In that case, fallback to broadcast.
+                     */
+                    global->mc_domid != bank->mc_domid ||
+                    (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+                     (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
+                      !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl &
+                        MCG_EXT_CTL_LMCE_EN))))
                     vmce_vcpuid = VMCE_INJECT_BROADCAST;
                 else
-                    vmce_vcpuid = global->mc_vcpuid;
+                    vmce_vcpuid = mc_vcpuid;
 
                 bank->mc_addr = gfn << PAGE_SHIFT |
                   (bank->mc_addr & (PAGE_SIZE -1 ));
-                /* TODO: support injecting LMCE */
-                if (fill_vmsr_data(bank, d,
-                                   global->mc_gstatus & ~MCG_STATUS_LMCE,
-                                   vmce_vcpuid == VMCE_INJECT_BROADCAST))
+                if (fill_vmsr_data(bank, d, global->mc_gstatus, vmce_vcpuid))
                 {
                     mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d "
                       "failed\n", bank->mc_domid);
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 210670638f..9830835c5a 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -464,14 +464,23 @@ static int vcpu_fill_mc_msrs(struct vcpu *v, uint64_t mcg_status,
 }
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-                   uint64_t gstatus, bool broadcast)
+                   uint64_t gstatus, int vmce_vcpuid)
 {
     struct vcpu *v = d->vcpu[0];
+    bool broadcast = (vmce_vcpuid == VMCE_INJECT_BROADCAST);
     int ret, err;
 
     if ( mc_bank->mc_domid == DOMID_INVALID )
         return -EINVAL;
 
+    if ( broadcast )
+        gstatus &= ~MCG_STATUS_LMCE;
+    else if ( gstatus & MCG_STATUS_LMCE )
+    {
+        ASSERT(vmce_vcpuid >= 0 && vmce_vcpuid < d->max_vcpus);
+        v = d->vcpu[vmce_vcpuid];
+    }
+
     /*
      * vMCE with the actual error information is injected to vCPU0,
      * and, if broadcast is required, we choose to inject less severe
diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h
index 74f6381460..2797e00275 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -17,7 +17,7 @@ int vmce_amd_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
 int vmce_amd_wrmsr(struct vcpu *, uint32_t msr, uint64_t val);
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-                   uint64_t gstatus, bool broadcast);
+                   uint64_t gstatus, int vmce_vcpuid);
 
 #define VMCE_INJECT_BROADCAST (-1)
 int inject_vmce(struct domain *d, int vcpu);
-- 
2.11.0



* [PATCH v4 08/11] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (6 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 07/11] x86/vmce: enable injecting LMCE to guest on Intel host Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-29 13:02   ` Wei Liu
  2017-06-26  9:16 ` [PATCH v4 09/11] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2 Haozhong Zhang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

If LMCE is supported by the host and ' mca_caps = [ "lmce" ] ' is
present in the xl config, the LMCE capability will be exposed in the
guest MSR_IA32_MCG_CAP. By default, LMCE is not exposed to the guest, in
order to preserve backwards migration compatibility.
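
For illustration, a guest config fragment using this option (assuming an
HVM guest; all other settings omitted):

    builder = "hvm"
    mca_caps = [ "lmce" ]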

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com> for hypervisor side
---
Changes in v4:
 * Handle HVM_PARAM_MCA_CAP in xc_sr_save_x86_hvm.c:write_hvm_params().

Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 docs/man/xl.cfg.pod.5.in            | 24 ++++++++++++++++++++++++
 tools/libxc/xc_sr_save_x86_hvm.c    |  1 +
 tools/libxl/libxl.h                 |  7 +++++++
 tools/libxl/libxl_dom.c             | 15 +++++++++++++++
 tools/libxl/libxl_types.idl         |  1 +
 tools/xl/xl_parse.c                 | 31 +++++++++++++++++++++++++++++--
 xen/arch/x86/cpu/mcheck/mce.h       |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c      | 19 ++++++++++++++++++-
 xen/arch/x86/hvm/hvm.c              |  5 +++++
 xen/include/asm-x86/mce.h           |  1 +
 xen/include/public/hvm/params.h     |  7 ++++++-
 12 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 38084c723a..51ec74325d 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2168,6 +2168,30 @@ natively or via hardware backwards compatibility support.
 
 =back
 
+=head3 x86
+
+=over 4
+
+=item B<mca_caps=[ "CAP", "CAP", ... ]>
+
+(HVM only) Enable MCA capabilities besides default ones enabled
+by Xen hypervisor for the HVM domain. "CAP" can be one in the
+following list:
+
+=over 4
+
+=item B<"lmce">
+
+Intel local MCE
+
+=item B<default>
+
+No MCA capabilities in above list are enabled.
+
+=back
+
+=back
+
 =head1 SEE ALSO
 
 =over 4
diff --git a/tools/libxc/xc_sr_save_x86_hvm.c b/tools/libxc/xc_sr_save_x86_hvm.c
index fc5c6ea93e..e17bb59146 100644
--- a/tools/libxc/xc_sr_save_x86_hvm.c
+++ b/tools/libxc/xc_sr_save_x86_hvm.c
@@ -77,6 +77,7 @@ static int write_hvm_params(struct xc_sr_context *ctx)
         HVM_PARAM_IOREQ_SERVER_PFN,
         HVM_PARAM_NR_IOREQ_SERVER_PAGES,
         HVM_PARAM_X87_FIP_WIDTH,
+        HVM_PARAM_MCA_CAP,
     };
 
     xc_interface *xch = ctx->xch;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index cf8687aa7e..7cf0f31f68 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -922,6 +922,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  * If this is defined, the Code and Data Prioritization feature is supported.
  */
 #define LIBXL_HAVE_PSR_CDP 1
+
+/*
+ * LIBXL_HAVE_MCA_CAPS
+ *
+ * If this is defined, setting MCA capabilities for HVM domain is supported.
+ */
+#define LIBXL_HAVE_MCA_CAPS 1
 #endif
 
 /*
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 5d914a59ee..f54fd49a73 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -279,6 +279,17 @@ err:
     libxl_bitmap_dispose(&enlightenments);
     return ERROR_FAIL;
 }
+
+static int hvm_set_mca_capabilities(libxl__gc *gc, uint32_t domid,
+                                    libxl_domain_build_info *const info)
+{
+    unsigned long caps = info->u.hvm.mca_caps;
+
+    if (!caps)
+        return 0;
+
+    return xc_hvm_param_set(CTX->xch, domid, HVM_PARAM_MCA_CAP, caps);
+}
 #endif
 
 static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
@@ -440,6 +451,10 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         rc = hvm_set_viridian_features(gc, domid, info);
         if (rc)
             return rc;
+
+        rc = hvm_set_mca_capabilities(gc, domid, info);
+        if (rc)
+            return rc;
 #endif
     }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 22044259f3..8a9849c643 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -564,6 +564,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
+                                       ("mca_caps",         uint64),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 856a304b30..5c2bf17222 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -18,6 +18,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <xen/hvm/e820.h>
+#include <xen/hvm/params.h>
 
 #include <libxl.h>
 #include <libxl_utils.h>
@@ -813,8 +814,9 @@ void parse_config_data(const char *config_source,
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs, *p9devs;
-    XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs;
-    int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian;
+    XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs,
+                   *mca_caps;
+    int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian, num_mca_caps;
     int pci_power_mgmt = 0;
     int pci_msitranslate = 0;
     int pci_permissive = 0;
@@ -1182,6 +1184,31 @@ void parse_config_data(const char *config_source,
 
         if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
             b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
+
+        switch (xlu_cfg_get_list(config, "mca_caps",
+                                 &mca_caps, &num_mca_caps, 1))
+        {
+        case 0: /* Success */
+            for (i = 0; i < num_mca_caps; i++) {
+                buf = xlu_cfg_get_listitem(mca_caps, i);
+                if (!strcmp(buf, "lmce"))
+                    b_info->u.hvm.mca_caps |= XEN_HVM_MCA_CAP_LMCE;
+                else {
+                    fprintf(stderr, "ERROR: unrecognized MCA capability '%s'.\n",
+                            buf);
+                    exit(-ERROR_FAIL);
+                }
+            }
+            break;
+
+        case ESRCH: /* Option not present */
+            break;
+
+        default:
+            fprintf(stderr, "ERROR: unable to parse mca_caps.\n");
+            exit(-ERROR_FAIL);
+        }
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index 4f13791948..664161a2af 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -38,6 +38,7 @@ enum mcheck_type {
 };
 
 extern uint8_t cmci_apic_vector;
+extern bool lmce_support;
 
 /* Init functions */
 enum mcheck_type amd_mcheck_init(struct cpuinfo_x86 *c);
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 5cb49ca697..4c001b407f 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -30,7 +30,7 @@ boolean_param("mce_fb", mce_force_broadcast);
 static int __read_mostly nr_intel_ext_msrs;
 
 /* If mce_force_broadcast == 1, lmce_support will be disabled forcibly. */
-static bool __read_mostly lmce_support;
+bool __read_mostly lmce_support;
 
 /* Intel SDM define bit15~bit0 of IA32_MCi_STATUS as the MC error code */
 #define INTEL_MCCOD_MASK 0xFFFF
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 9830835c5a..a34c3d38d1 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -74,7 +74,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
     unsigned long guest_mcg_cap;
 
     if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        guest_mcg_cap = INTEL_GUEST_MCG_CAP;
+        guest_mcg_cap = INTEL_GUEST_MCG_CAP | MCG_LMCE_P;
     else
         guest_mcg_cap = AMD_GUEST_MCG_CAP;
 
@@ -546,3 +546,20 @@ int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn)
     return rc;
 }
 
+int vmce_enable_mca_cap(struct domain *d, uint64_t cap)
+{
+    struct vcpu *v;
+
+    if ( cap & ~XEN_HVM_MCA_CAP_MASK )
+        return -EINVAL;
+
+    if ( cap & XEN_HVM_MCA_CAP_LMCE )
+    {
+        if ( !lmce_support )
+            return -EINVAL;
+        for_each_vcpu(d, v)
+            v->arch.vmce.mcg_cap |= MCG_LMCE_P;
+    }
+
+    return 0;
+}
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 70ddc81d44..fa72d1bd1d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3985,6 +3985,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_IOREQ_SERVER_PFN:
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
+    case HVM_PARAM_MCA_CAP:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
@@ -4196,6 +4197,10 @@ static int hvmop_set_param(
                                                (0x10000 / 8) + 1) << 32);
         a.value |= VM86_TSS_UPDATED;
         break;
+
+    case HVM_PARAM_MCA_CAP:
+        rc = vmce_enable_mca_cap(d, a.value);
+        break;
     }
 
     if ( rc != 0 )
diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h
index 35f9962638..d2933c91bf 100644
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -38,6 +38,7 @@ extern int vmce_restore_vcpu(struct vcpu *, const struct hvm_vmce_vcpu *);
 extern int vmce_wrmsr(uint32_t msr, uint64_t val);
 extern int vmce_rdmsr(uint32_t msr, uint64_t *val);
 extern bool vmce_has_lmce(const struct vcpu *v);
+extern int vmce_enable_mca_cap(struct domain *d, uint64_t cap);
 
 extern unsigned int nr_mce_banks;
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 1f3ed0906d..2ec2e7c80f 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -274,6 +274,11 @@
  */
 #define HVM_PARAM_VM86_TSS_SIZED 37
 
-#define HVM_NR_PARAMS 38
+/* Enable MCA capabilities. */
+#define HVM_PARAM_MCA_CAP 38
+#define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
+#define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
+
+#define HVM_NR_PARAMS 39
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
2.11.0



* [PATCH v4 09/11] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (7 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 08/11] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 10/11] tools/libxc: add support of injecting MC# to specified CPUs Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 11/11] tools/xen-mceinj: add support of injecting LMCE Haozhong Zhang
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Jan Beulich, Andrew Cooper

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes in v4:
 (Take Jan's R-b with following changes.)
 * Adjust error messages.
---
 xen/arch/x86/cpu/mcheck/mce.c         | 24 +++++++++++++++++++++++-
 xen/include/public/arch-x86/xen-mca.h |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 2428cc0762..19e8a70cd6 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1485,11 +1485,12 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
     {
         const cpumask_t *cpumap;
         cpumask_var_t cmv;
+        bool broadcast = op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST;
 
         if (nr_mce_banks == 0)
             return x86_mcerr("do_mca #MC", -ENODEV);
 
-        if ( op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST )
+        if ( broadcast )
             cpumap = &cpu_online_map;
         else
         {
@@ -1529,6 +1530,27 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
             }
             break;
 
+        case XEN_MC_INJECT_TYPE_LMCE:
+            if ( !lmce_support )
+            {
+                ret = x86_mcerr("No LMCE support", -EINVAL);
+                break;
+            }
+            if ( broadcast )
+            {
+                ret = x86_mcerr("Broadcast cannot be used with LMCE", -EINVAL);
+                break;
+            }
+            /* Ensure at most one CPU is specified. */
+            if ( nr_cpu_ids > cpumask_next(cpumask_first(cpumap), cpumap) )
+            {
+                ret = x86_mcerr("More than one CPU specified for LMCE",
+                                -EINVAL);
+                break;
+            }
+            on_selected_cpus(cpumap, x86_mc_mceinject, NULL, 1);
+            break;
+
         default:
             ret = x86_mcerr("Wrong mca type\n", -EINVAL);
             break;
diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h
index 7db990723b..dc35267249 100644
--- a/xen/include/public/arch-x86/xen-mca.h
+++ b/xen/include/public/arch-x86/xen-mca.h
@@ -414,6 +414,7 @@ struct xen_mc_mceinject {
 #define XEN_MC_INJECT_TYPE_MASK     0x7
 #define XEN_MC_INJECT_TYPE_MCE      0x0
 #define XEN_MC_INJECT_TYPE_CMCI     0x1
+#define XEN_MC_INJECT_TYPE_LMCE     0x2
 
 #define XEN_MC_INJECT_CPU_BROADCAST 0x8
 
-- 
2.11.0



* [PATCH v4 10/11] tools/libxc: add support of injecting MC# to specified CPUs
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (8 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 09/11] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2 Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  2017-06-26  9:16 ` [PATCH v4 11/11] tools/xen-mceinj: add support of injecting LMCE Haozhong Zhang
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Ian Jackson, Wei Liu

Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the
current xc_mca_op() does not use this feature and does not provide an
interface to callers. This commit adds a new xc_mca_op_inject_v2() that
receives a cpumap providing the set of target CPUs.
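
For illustration only (not part of this patch), a caller targeting a single
CPU could wrap the new function roughly as below; the helper name is made up
and error handling is minimal:

    /* Hypothetical wrapper around the new interface -- sketch only. */
    #include <stdlib.h>
    #include <xenctrl.h>

    static int inject_lmce_on_cpu(xc_interface *xch, unsigned int cpu,
                                  unsigned int nr_cpus)
    {
        size_t sz = (nr_cpus + 7) / 8;         /* one bit per CPU */
        xc_cpumap_t cpumap = calloc(1, sz);
        int ret;

        if ( !cpumap )
            return -1;
        cpumap[cpu / 8] |= 1 << (cpu % 8);     /* select only 'cpu' */
        ret = xc_mca_op_inject_v2(xch, XEN_MC_INJECT_TYPE_LMCE,
                                  cpumap, nr_cpus);
        free(cpumap);
        return ret;
    }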

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenctrl.h |  2 ++
 tools/libxc/xc_misc.c         | 52 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1629f412dd..85169b0553 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1799,6 +1799,8 @@ int xc_cpuid_apply_policy(xc_interface *xch,
 void xc_cpuid_to_str(const unsigned int *regs,
                      char **strs); /* some strs[] may be NULL if ENOMEM */
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
+int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
+                        xc_cpumap_t cpumap, unsigned int nr_cpus);
 #endif
 
 struct xc_px_val {
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 88084fde30..2303293c6c 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc)
     xc_hypercall_bounce_post(xch, mc);
     return ret;
 }
-#endif
+
+int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
+                        xc_cpumap_t cpumap, unsigned int nr_bits)
+{
+    int ret = -1;
+    struct xen_mc mc_buf, *mc = &mc_buf;
+    struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2;
+
+    DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+
+    memset(mc, 0, sizeof(*mc));
+
+    if ( cpumap )
+    {
+        if ( !nr_bits )
+        {
+            errno = EINVAL;
+            goto out;
+        }
+
+        HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8);
+        if ( xc_hypercall_bounce_pre(xch, cpumap) )
+        {
+            PERROR("Could not bounce cpumap memory buffer");
+            goto out;
+        }
+        set_xen_guest_handle(inject->cpumap.bitmap, cpumap);
+        inject->cpumap.nr_bits = nr_bits;
+    }
+
+    inject->flags = flags;
+    mc->cmd = XEN_MC_inject_v2;
+    mc->interface_version = XEN_MCA_INTERFACE_VERSION;
+
+    if ( xc_hypercall_bounce_pre(xch, mc) )
+    {
+        PERROR("Could not bounce xen_mc memory buffer");
+        goto out_free_cpumap;
+    }
+
+    ret = xencall1(xch->xcall, __HYPERVISOR_mca, HYPERCALL_BUFFER_AS_ARG(mc));
+
+    xc_hypercall_bounce_post(xch, mc);
+out_free_cpumap:
+    if ( cpumap )
+        xc_hypercall_bounce_post(xch, cpumap);
+out:
+    return ret;
+}
+#endif /* __i386__ || __x86_64__ */
 
 int xc_perfc_reset(xc_interface *xch)
 {
-- 
2.11.0



* [PATCH v4 11/11] tools/xen-mceinj: add support of injecting LMCE
  2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
                   ` (9 preceding siblings ...)
  2017-06-26  9:16 ` [PATCH v4 10/11] tools/libxc: add support of injecting MC# to specified CPUs Haozhong Zhang
@ 2017-06-26  9:16 ` Haozhong Zhang
  10 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2017-06-26  9:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Ian Jackson, Wei Liu

If option '-l' or '--lmce' is specified and the host supports LMCE,
xen-mceinj will inject an LMCE to the CPU specified by '-c' (or CPU0 if '-c'
is not present).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/tests/mce-test/tools/xen-mceinj.c | 50 +++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c
index bae5a46eb5..380e42190c 100644
--- a/tools/tests/mce-test/tools/xen-mceinj.c
+++ b/tools/tests/mce-test/tools/xen-mceinj.c
@@ -56,6 +56,8 @@
 #define MSR_IA32_MC0_MISC        0x00000403
 #define MSR_IA32_MC0_CTL2        0x00000280
 
+#define MCG_STATUS_LMCE          0x8
+
 struct mce_info {
     const char *description;
     uint8_t mcg_stat;
@@ -113,6 +115,7 @@ static struct mce_info mce_table[] = {
 #define LOGFILE stdout
 
 int dump;
+int lmce;
 struct xen_mc_msrinject msr_inj;
 
 static void Lprintf(const char *fmt, ...)
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr)
     return xc_mca_op(xc_handle, &mc);
 }
 
+static int inject_lmce(xc_interface *xc_handle, unsigned int cpu)
+{
+    uint8_t *cpumap = NULL;
+    size_t cpumap_size, line, shift;
+    unsigned int nr_cpus;
+    int ret;
+
+    nr_cpus = mca_cpuinfo(xc_handle);
+    if ( !nr_cpus )
+        err(xc_handle, "Failed to get mca_cpuinfo");
+    if ( cpu >= nr_cpus )
+        err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1);
+
+    cpumap_size = (nr_cpus + 7) / 8;
+    cpumap = malloc(cpumap_size);
+    if ( !cpumap )
+        err(xc_handle, "Failed to allocate cpumap\n");
+    memset(cpumap, 0, cpumap_size);
+    line = cpu / 8;
+    shift = cpu % 8;
+    memset(cpumap + line, 1 << shift, 1);
+
+    ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE,
+                              cpumap, cpumap_size * 8);
+
+    free(cpumap);
+    return ret;
+}
+
 static uint64_t bank_addr(int bank, int type)
 {
     uint64_t addr;
@@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
                   uint32_t cpu_nr, uint32_t domain, uint64_t gaddr)
 {
     int ret = 0;
+    uint8_t mcg_status = mce->mcg_stat;
 
-    ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain);
+    if ( lmce )
+    {
+        if ( mce->cmci )
+            err(xc_handle, "No support to inject CMCI as LMCE");
+        mcg_status |= MCG_STATUS_LMCE;
+    }
+    ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain);
     if ( ret )
         err(xc_handle, "Failed to inject MCG_STATUS MSR");
 
@@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
         err(xc_handle, "Failed to inject MSR");
     if ( mce->cmci )
         ret = inject_cmci(xc_handle, cpu_nr);
+    else if ( lmce )
+        ret = inject_lmce(xc_handle, cpu_nr);
     else
         ret = inject_mce(xc_handle, cpu_nr);
     if ( ret )
@@ -393,6 +434,7 @@ static struct option opts[] = {
     {"dump", 0, 0, 'D'},
     {"help", 0, 0, 'h'},
     {"page", 0, 0, 'p'},
+    {"lmce", 0, 0, 'l'},
     {"", 0, 0, '\0'}
 };
 
@@ -409,6 +451,7 @@ static void help(void)
            "  -d, --domain=DOMID   target domain, the default is Xen itself\n"
            "  -h, --help           print this page\n"
            "  -p, --page=ADDR      physical address to report\n"
+           "  -l, --lmce           inject as LMCE (Intel only)\n"
            "  -t, --type=ERROR     error type\n");
 
     for ( i = 0; i < MCE_TABLE_SIZE; i++ )
@@ -438,7 +481,7 @@ int main(int argc, char *argv[])
     }
 
     while ( 1 ) {
-        c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, &opt_index);
+        c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, &opt_index);
         if ( c == -1 )
             break;
         switch ( c ) {
@@ -463,6 +506,9 @@ int main(int argc, char *argv[])
         case 't':
             type = strtol(optarg, NULL, 0);
             break;
+        case 'l':
+            lmce = 1;
+            break;
         case 'h':
         default:
             help();
-- 
2.11.0



* Re: [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl
  2017-06-26  9:16 ` [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl Haozhong Zhang
@ 2017-06-27  6:28   ` Jan Beulich
  2017-06-29  1:56     ` haozhong.zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2017-06-27  6:28 UTC (permalink / raw)
  To: haozhong.zhang; +Cc: andrew.cooper3, xen-devel

>>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:16 AM >>>
>struct mc_telem_cpu_ctl is now used as the type of per-cpu variables,
>rather than a global variable shared by all CPUs, so some of its
>comments no longer apply.
>
>Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

There's no need to re-send, but for the future in any such cases where you're
adjusting for an earlier oversight it would be nice to make the connection by
naming the earlier commit.

Jan



* Re: [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting
  2017-06-26  9:16 ` [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting Haozhong Zhang
@ 2017-06-27  6:38   ` Jan Beulich
  2017-06-29  2:00     ` haozhong.zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2017-06-27  6:38 UTC (permalink / raw)
  To: haozhong.zhang, xen-devel; +Cc: andrew.cooper3

>>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:17 AM >>>
>Add a 'nowait' argument to mce_barrier_{enter,exit}() to allow them to
>return immediately without waiting for mce_barrier_{enter,exit}() on other
>CPUs. This is useful when handling LMCE, where mce_barrier_{enter,exit}
>are called only on one CPU.
>
>Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

The patch could have my ack in principle, but ...

>--- a/xen/arch/x86/cpu/mcheck/barrier.h
>+++ b/xen/arch/x86/cpu/mcheck/barrier.h
>@@ -32,6 +32,14 @@ void mce_barrier_init(struct mce_softirq_barrier *);
 >void mce_barrier_dec(struct mce_softirq_barrier *);
 >
 >/*
>+ * If nowait is true, mce_barrier_enter/exit() will return immediately
>+ * without touching the barrier. It's used when handling a LMCE which
>+ * is received on only one CPU and thus does not invoke
>+ * mce_barrier_enter/exit() calls on all CPUs.
>+ *
>+ * If nowait is false, mce_barrier_enter/exit() will handle the given
>+ * barrier as below.
>+ *
  >* Increment the generation number and the value. The generation number
  >* is incremented when entering a barrier. This way, it can be checked
  >* on exit if a CPU is trying to re-enter the barrier. This can happen

... especially with the complete lack of any mention of the mce_broadcast
check inside the functions, I wonder whether it wouldn't be better for the
callers now to pass "!mce_broadcast" into the functions, instead of
"false". What do you think? Which then further makes me wonder
whether the parameter wouldn't better be inverted ("wait" instead of
"nowait").

Jan



* Re: [PATCH v4 03/11] x86/mce: handle host LMCE
  2017-06-26  9:16 ` [PATCH v4 03/11] x86/mce: handle host LMCE Haozhong Zhang
@ 2017-06-27  7:13   ` Jan Beulich
  2017-06-29  3:29     ` haozhong.zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2017-06-27  7:13 UTC (permalink / raw)
  To: haozhong.zhang; +Cc: andrew.cooper3, xen-devel

>>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:17 AM >>>
>+/**
>+ * Append a telemetry of deferred MCE to a per-cpu pending list,
>+ * either @pending or @lmce_pending, according to rules below:
>+ *  - if @pending is not empty, then the new telemetry will be
>+ *    appended to @pending;
>+ *  - if @pending is empty and the new telemetry is for a deferred
>+ *    LMCE, then the new telemetry will be appended to @lmce_pending;
>+ *  - if @pending is empty and the new telemetry is for a deferred
>+ *    non-local MCE, all existing telemetries in @lmce_pending will be
>+ *    moved to @pending and then the new telemetry will be appended to
>+ *    @pending.
>+ *
>+ * This function must be called with MCIP bit set, so that it does not
>+ * need to worry about MC# re-occurring in this function.
>+ *
>+ * As a result, this function can preserve the mutual exclusivity
>+ * between @pending and @lmce_pending (see their comments in struct
>+ * mc_telem_cpu_ctl).
>+ *
>+ * Parameters:
>+ *  @cookie: telemetry of the deferred MCE
>+ *  @lmce:   indicate whether the telemetry is for LMCE
>+ */
>+void mctelem_defer(mctelem_cookie_t cookie, bool lmce)
 >{
 	>struct mctelem_ent *tep = COOKIE2MCTE(cookie);
>-
>-	mctelem_xchg_head(&this_cpu(mctctl.pending), &tep->mcte_next, tep);
>+	struct mc_telem_cpu_ctl *mctctl = &this_cpu(mctctl);
>+
>+	ASSERT(mctctl->pending == NULL || mctctl->lmce_pending == NULL);
>+
>+	if (mctctl->pending)
>+		mctelem_xchg_head(&mctctl->pending, &tep->mcte_next, tep);
>+	else if (lmce)
>+		mctelem_xchg_head(&mctctl->lmce_pending, &tep->mcte_next, tep);
>+	else {
>+		if (mctctl->lmce_pending)
>+			mctelem_xchg_head(&mctctl->lmce_pending,
>+					  &mctctl->pending, NULL);

I don't think this is sufficiently proven to be safe: This may set ->pending to
non-NULL more than once, and while your comment above considers the
producer side, it doesn't consider the consumer(s). This is even more so that
the consumer side uses potentially stale information to tell which list head to
update.

Jan



* Re: [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl
  2017-06-27  6:28   ` Jan Beulich
@ 2017-06-29  1:56     ` haozhong.zhang
  0 siblings, 0 replies; 20+ messages in thread
From: haozhong.zhang @ 2017-06-29  1:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel

On 06/27/17 00:28 -0600, Jan Beulich wrote:
> >>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:16 AM >>>
> >struct mc_telem_cpu_ctl is now used as the type of per-cpu variables,
> >rather than a global variable shared by all CPUs, so some of its
> >comments no longer apply.
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 
> There's no need to re-send, but for the future in any such cases where you're
> adjusting for an earlier oversight it would be nice to make the connection by
> naming the earlier commit.
> 

c/s cbc585158f ("x86/mce: eliminate unnecessary NR_CPUS-sized arrays") introduced
struct mc_telem_cpu_ctl and used it in a per-CPU manner.

Haozhong


* Re: [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting
  2017-06-27  6:38   ` Jan Beulich
@ 2017-06-29  2:00     ` haozhong.zhang
  0 siblings, 0 replies; 20+ messages in thread
From: haozhong.zhang @ 2017-06-29  2:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel

On 06/27/17 00:38 -0600, Jan Beulich wrote:
> >>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:17 AM >>>
> >Add a 'nowait' argument to mce_barrier_{enter,exit}() to allow them to
> >return immediately without waiting for mce_barrier_{enter,exit}() on other
> >CPUs. This is useful when handling LMCE, where mce_barrier_{enter,exit}
> >are called only on one CPU.
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> The patch could have my ack in principle, but ...
> 
> >--- a/xen/arch/x86/cpu/mcheck/barrier.h
> >+++ b/xen/arch/x86/cpu/mcheck/barrier.h
> >@@ -32,6 +32,14 @@ void mce_barrier_init(struct mce_softirq_barrier *);
>  >void mce_barrier_dec(struct mce_softirq_barrier *);
>  >
>  >/*
> >+ * If nowait is true, mce_barrier_enter/exit() will return immediately
> >+ * without touching the barrier. It's used when handling a LMCE which
> >+ * is received on only one CPU and thus does not invoke
> >+ * mce_barrier_enter/exit() calls on all CPUs.
> >+ *
> >+ * If nowait is false, mce_barrier_enter/exit() will handle the given
> >+ * barrier as below.
> >+ *
>   >* Increment the generation number and the value. The generation number
>   >* is incremented when entering a barrier. This way, it can be checked
>   >* on exit if a CPU is trying to re-enter the barrier. This can happen
> 
> ... especially with the complete lack of any mention of the mce_broadcast
> check inside the functions, I wonder whether it wouldn't be better for the
> callers now to pass "!mce_broadcast" into the functions, instead of
> "false". What do you think? Which then further makes me wonder
> whether the parameter wouldn't better be inverted ("wait" instead of
> "nowait").
> 

In that case, it's better to use "wait" and let the caller pass in
"mce_broadcast".

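Roughly, the resulting interface would then look like this (just a sketch of
the direction, not the final patch; mce_trap_bar stands for any of the
existing barriers):

    void mce_barrier_enter(struct mce_softirq_barrier *bar, bool wait);
    void mce_barrier_exit(struct mce_softirq_barrier *bar, bool wait);

    /* broadcast MC#: all CPUs rendezvous as before */
    mce_barrier_enter(&mce_trap_bar, mce_broadcast);
    /* handle this CPU's banks, then */
    mce_barrier_exit(&mce_trap_bar, mce_broadcast);
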
Haozhong


* Re: [PATCH v4 03/11] x86/mce: handle host LMCE
  2017-06-27  7:13   ` Jan Beulich
@ 2017-06-29  3:29     ` haozhong.zhang
  2017-06-29  6:29       ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: haozhong.zhang @ 2017-06-29  3:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel

On 06/27/17 01:13 -0600, Jan Beulich wrote:
> >>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:17 AM >>>
> >+/**
> >+ * Append a telemetry of deferred MCE to a per-cpu pending list,
> >+ * either @pending or @lmce_pending, according to rules below:
> >+ *  - if @pending is not empty, then the new telemetry will be
> >+ *    appended to @pending;
> >+ *  - if @pending is empty and the new telemetry is for a deferred
> >+ *    LMCE, then the new telemetry will be appended to @lmce_pending;
> >+ *  - if @pending is empty and the new telemetry is for a deferred
> >+ *    non-local MCE, all existing telemetries in @lmce_pending will be
> >+ *    moved to @pending and then the new telemetry will be appended to
> >+ *    @pending.
> >+ *
> >+ * This function must be called with MCIP bit set, so that it does not
> >+ * need to worry about MC# re-occurring in this function.
> >+ *
> >+ * As a result, this function can preserve the mutual exclusivity
> >+ * between @pending and @lmce_pending (see their comments in struct
> >+ * mc_telem_cpu_ctl).
> >+ *
> >+ * Parameters:
> >+ *  @cookie: telemetry of the deferred MCE
> >+ *  @lmce:   indicate whether the telemetry is for LMCE
> >+ */
> >+void mctelem_defer(mctelem_cookie_t cookie, bool lmce)
>  >{
>  	>struct mctelem_ent *tep = COOKIE2MCTE(cookie);
> >-
> >-	mctelem_xchg_head(&this_cpu(mctctl.pending), &tep->mcte_next, tep);
> >+	struct mc_telem_cpu_ctl *mctctl = &this_cpu(mctctl);
> >+
> >+	ASSERT(mctctl->pending == NULL || mctctl->lmce_pending == NULL);
> >+
> >+	if (mctctl->pending)
> >+		mctelem_xchg_head(&mctctl->pending, &tep->mcte_next, tep);
> >+	else if (lmce)
> >+		mctelem_xchg_head(&mctctl->lmce_pending, &tep->mcte_next, tep);
> >+	else {
> >+		if (mctctl->lmce_pending)
> >+			mctelem_xchg_head(&mctctl->lmce_pending,
> >+					  &mctctl->pending, NULL);
> 
> I don't think this is sufficiently proven to be safe: This may set ->pending to
> non-NULL more than once, and while your comment above considers the
> producer side, it doesn't consider the consumer(s). This is even more so that
> the consumer side uses potentially stale information to tell which list head to
> update.

What problems do you think will be caused by setting ->pending to
non-NULL more than once? The only such case is the last else branch:
it corresponds to a broadcasting MC#, so all CPUs are in the exception
context and no one is consuming ->pending at this moment.
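
For example, a comment along these lines in the last else branch would make
that explicit (wording is only a sketch):

    /*
     * This is the non-local (broadcast) MC# path: all CPUs are in the
     * #MC handler with MCIP set, so no consumer can be walking
     * ->pending concurrently, and writing ->pending more than once
     * here is harmless.
     */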

Haozhong



* Re: [PATCH v4 03/11] x86/mce: handle host LMCE
  2017-06-29  3:29     ` haozhong.zhang
@ 2017-06-29  6:29       ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2017-06-29  6:29 UTC (permalink / raw)
  To: haozhong.zhang; +Cc: andrew.cooper3, xen-devel

>>> <haozhong.zhang@intel.com> 06/29/17 5:29 AM >>>
>On 06/27/17 01:13 -0600, Jan Beulich wrote:
>> >>> Haozhong Zhang <haozhong.zhang@intel.com> 06/26/17 11:17 AM >>>
>> >+	if (mctctl->pending)
>> >+		mctelem_xchg_head(&mctctl->pending, &tep->mcte_next, tep);
>> >+	else if (lmce)
>> >+		mctelem_xchg_head(&mctctl->lmce_pending, &tep->mcte_next, tep);
>> >+	else {
>> >+		if (mctctl->lmce_pending)
>> >+			mctelem_xchg_head(&mctctl->lmce_pending,
>> >+					  &mctctl->pending, NULL);
>> 
>> I don't think this is sufficiently proven to be safe: This may set ->pending to
>> non-NULL more than once, and while your comment above considers the
>> producer side, it doesn't consider the consumer(s). This is even more so that
>> the consumer side uses potentially stale information to tell which list head to
>> update.
>
>What problems do you think will be caused by setting ->pending to
>non-NULL more than once? The only such case is the last else branch:
>it corresponds to a broadcasting MC#, so all CPUs are in the exception
>context and no one is consuming ->pending at this moment.

Right, but for cases like this I think it is necessary to make this explicit via
adding a comment. The operation by itself is not as atomic as we'd want it
to be without having to consider the context in which it is being used.

Jan



* Re: [PATCH v4 08/11] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
  2017-06-26  9:16 ` [PATCH v4 08/11] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP Haozhong Zhang
@ 2017-06-29 13:02   ` Wei Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Wei Liu @ 2017-06-29 13:02 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich, xen-devel

On Mon, Jun 26, 2017 at 05:16:22PM +0800, Haozhong Zhang wrote:
> If LMCE is supported by the host and ' mca_caps = [ "lmce" ] ' is present
> in the xl config, the LMCE capability will be exposed in the guest
> MSR_IA32_MCG_CAP. By default, LMCE is not exposed to the guest, so as to
> preserve backwards migration compatibility.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com> for hypervisor side

I suppose you already tried a local migration:

Acked-by: Wei Liu <wei.liu2@citrix.com>


end of thread, other threads:[~2017-06-29 13:02 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-26  9:16 [PATCH v4 00/11] Add LMCE support Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 01/11] xen/mce: fix comment of struct mc_telem_cpu_ctl Haozhong Zhang
2017-06-27  6:28   ` Jan Beulich
2017-06-29  1:56     ` haozhong.zhang
2017-06-26  9:16 ` [PATCH v4 02/11] xen/mce: allow mce_barrier_{enter, exit} to return without waiting Haozhong Zhang
2017-06-27  6:38   ` Jan Beulich
2017-06-29  2:00     ` haozhong.zhang
2017-06-26  9:16 ` [PATCH v4 03/11] x86/mce: handle host LMCE Haozhong Zhang
2017-06-27  7:13   ` Jan Beulich
2017-06-29  3:29     ` haozhong.zhang
2017-06-29  6:29       ` Jan Beulich
2017-06-26  9:16 ` [PATCH v4 04/11] x86/mce_intel: detect and enable LMCE on Intel host Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 05/11] x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 06/11] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 07/11] x86/vmce: enable injecting LMCE to guest on Intel host Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 08/11] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP Haozhong Zhang
2017-06-29 13:02   ` Wei Liu
2017-06-26  9:16 ` [PATCH v4 09/11] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2 Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 10/11] tools/libxc: add support of injecting MC# to specified CPUs Haozhong Zhang
2017-06-26  9:16 ` [PATCH v4 11/11] tools/xen-mceinj: add support of injecting LMCE Haozhong Zhang
