All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] MCE code cleanup and bugfix
@ 2017-02-27  5:09 Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 1/7] xen/mce: adjust comment of callback register functions Haozhong Zhang
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Christoph Egger,
	Ian Jackson, Jan Beulich

Changes in v2:
 * Only patch 6 is changed, which removes ASSERT's rather than
   modifying them.
 * Take R-b in patch 2 - 5, and A-b in patch 7.
 * Patch 1 in v1 (it only adjusts the code comment) hasn't got any R-b
   or A-b.

This patch series separates the uncommitted code cleanup and bugfix
patches from my previous patch series "[PATCH 00/19] MCE code cleanup and add LMCE support"
with other new fixes.

* This patch series is based on Jan's patch "x86/MCE: sanitize domain/vcpu ID handling".
* Patch 1 & 6 are new ones.
* Patch 2 is the previous patch 4 w/ the same title.
* Patch 3 - 5 are the previous patch 8 - 10 w/ the last one renamed.
* Patch 7 is the previous patch 11 w/ the same title.
* Individual changes are logged in each patch.

Haozhong Zhang (7):
  1/7 xen/mce: adjust comment of callback register functions
  2/7 xen/mce: remove unused x86_mcinfo_add()
  3/7 x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve()
  4/7 x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case
  5/7 x86/mce: clear MSR_IA32_MCG_STATUS by writing 0
  6/7 xen/mce: remove ASSERT's about mce_dhandler_num in mce_action()
  7/7 tools/xen-mceinj: fix the type of cpu number

 tools/tests/mce-test/tools/xen-mceinj.c | 12 +++---
 xen/arch/x86/cpu/mcheck/mcaction.c      | 20 ++++-----
 xen/arch/x86/cpu/mcheck/mce.c           | 37 ++++-------------
 xen/arch/x86/cpu/mcheck/mce.h           | 24 +++++------
 xen/arch/x86/cpu/mcheck/mce_amd.c       |  4 +-
 xen/arch/x86/cpu/mcheck/mce_intel.c     |  6 +--
 xen/arch/x86/cpu/mcheck/vmce.c          | 74 +++++++++++++++++++++++----------
 xen/arch/x86/cpu/mcheck/vmce.h          |  2 +-
 8 files changed, 91 insertions(+), 88 deletions(-)

-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/7] xen/mce: adjust comment of callback register functions
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  2017-03-01 22:10   ` Konrad Rzeszutek Wilk
  2017-02-27  5:09 ` [PATCH v2 2/7] xen/mce: remove unused x86_mcinfo_add() Haozhong Zhang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Christoph Egger, Jan Beulich, Andrew Cooper

c/s e966818264908e842e2847f579ca4d94e586eaac added
mce_need_clearbank_register below the comment of
x86_mce_callback_register(). This commit (1) adjusts the first
paragraph of comment to be a general statement of all callback
register functions, and (2) moves the second paragraph to the
front of x86_mce_callback_register().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index e83d431..42bb782 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -128,24 +128,24 @@ extern void mcheck_mca_clearbanks(struct mca_banks *);
 extern mctelem_cookie_t mcheck_mca_logout(enum mca_source, struct mca_banks *,
     struct mca_summary *, struct mca_banks *);
 
-/* Register a callback to be made during bank telemetry logout.
- * This callback is only available to those machine check handlers
+/* Register callbacks to be made during bank telemetry logout.
+ * Those callbacks are only available to those machine check handlers
  * that call to the common mcheck_cmn_handler or who use the common
  * telemetry logout function mcheck_mca_logout in error polling.
- *
- * This can be used to collect additional information (typically non-
- * architectural) provided by newer CPU families/models without the need
- * to duplicate the whole handler resulting in various handlers each with
- * its own tweaks and bugs.  The callback receives an struct mc_info pointer
- * which it can use with x86_mcinfo_add to add additional telemetry,
- * the current MCA bank number we are reading telemetry from, and the
- * MCi_STATUS value for that bank.
  */
 
 /* Register a handler for judging whether the bank need to be cleared */
 typedef int (*mce_need_clearbank_t)(enum mca_source who, u64 status);
 extern void mce_need_clearbank_register(mce_need_clearbank_t);
 
+/* Register a callback to collect additional information (typically non-
+ * architectural) provided by newer CPU families/models without the need
+ * to duplicate the whole handler resulting in various handlers each with
+ * its own tweaks and bugs. The callback receives an struct mc_info pointer
+ * which it can use with x86_mcinfo_add to add additional telemetry,
+ * the current MCA bank number we are reading telemetry from, and the
+ * MCi_STATUS value for that bank.
+ */
 typedef struct mcinfo_extended *(*x86_mce_callback_t)
     (struct mc_info *, uint16_t, uint64_t);
 extern void x86_mce_callback_register(x86_mce_callback_t);
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/7] xen/mce: remove unused x86_mcinfo_add()
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 1/7] xen/mce: adjust comment of callback register functions Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 3/7] x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve() Haozhong Zhang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Christoph Egger, Jan Beulich, Andrew Cooper

c/s 9d13fd9fd320a7740c6446c048ff6a2990095966 turned to update the
mcinfo buffer in-place instead of using x86_mcinfo_add(). The last
uses of x86_mcinfo_add() were removed by that commit as well.
Therefore, x86_mcinfo_add() was deprecated in fact.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 16 ----------------
 xen/arch/x86/cpu/mcheck/mce.h |  3 +--
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index e41989d..aa9304f 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -835,22 +835,6 @@ void *x86_mcinfo_reserve(struct mc_info *mi, int size)
     return memset(mic_index, 0, size);
 }
 
-void *x86_mcinfo_add(struct mc_info *mi, void *mcinfo)
-{
-    struct mcinfo_common *mic, *buf;
-
-    mic = (struct mcinfo_common *)mcinfo;
-    buf = x86_mcinfo_reserve(mi, mic->size);
-
-    if ( !buf )
-        mce_printk(MCE_CRITICAL,
-                   "mcinfo_add: No space left in mc_info\n");
-    else
-        memcpy(buf, mic, mic->size);
-
-    return buf;
-}
-
 static void x86_mcinfo_apei_save(
     struct mcinfo_global *mc_global, struct mcinfo_bank *mc_bank)
 {
diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index 42bb782..a28e862 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -142,7 +142,7 @@ extern void mce_need_clearbank_register(mce_need_clearbank_t);
  * architectural) provided by newer CPU families/models without the need
  * to duplicate the whole handler resulting in various handlers each with
  * its own tweaks and bugs. The callback receives an struct mc_info pointer
- * which it can use with x86_mcinfo_add to add additional telemetry,
+ * which it can use with x86_mcinfo_reserve to add additional telemetry,
  * the current MCA bank number we are reading telemetry from, and the
  * MCi_STATUS value for that bank.
  */
@@ -150,7 +150,6 @@ typedef struct mcinfo_extended *(*x86_mce_callback_t)
     (struct mc_info *, uint16_t, uint64_t);
 extern void x86_mce_callback_register(x86_mce_callback_t);
 
-void *x86_mcinfo_add(struct mc_info *mi, void *mcinfo);
 void *x86_mcinfo_reserve(struct mc_info *mi, int size);
 void x86_mcinfo_dump(struct mc_info *mi);
 
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/7] x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve()
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 1/7] xen/mce: adjust comment of callback register functions Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 2/7] xen/mce: remove unused x86_mcinfo_add() Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 4/7] x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case Haozhong Zhang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Christoph Egger, Jan Beulich, Andrew Cooper

All existing calls to x86_mcinfo_reserve() are followed by statements
that set the size and the type of the reserved space, so move them into
x86_mcinfo_reserve() to simplify the code.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mcaction.c  |  4 +---
 xen/arch/x86/cpu/mcheck/mce.c       | 17 +++++++++--------
 xen/arch/x86/cpu/mcheck/mce.h       |  3 ++-
 xen/arch/x86/cpu/mcheck/mce_amd.c   |  4 +---
 xen/arch/x86/cpu/mcheck/mce_intel.c |  6 +-----
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index 322163a..32056f2 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -13,14 +13,12 @@ mci_action_add_pageoffline(int bank, struct mc_info *mi,
     if (!mi)
         return NULL;
 
-    rec = x86_mcinfo_reserve(mi, sizeof(*rec));
+    rec = x86_mcinfo_reserve(mi, sizeof(*rec), MC_TYPE_RECOVERY);
     if (!rec) {
         mi->flags |= MCINFO_FLAGS_UNCOMPLETE;
         return NULL;
     }
 
-    rec->common.type = MC_TYPE_RECOVERY;
-    rec->common.size = sizeof(*rec);
     rec->mc_bank = bank;
     rec->action_types = MC_ACTION_PAGE_OFFLINE;
     rec->action_info.page_retire.mfn = mfn;
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index aa9304f..53ca29c 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -205,7 +205,7 @@ static void mca_init_bank(enum mca_source who,
     if (!mi)
         return;
 
-    mib = x86_mcinfo_reserve(mi, sizeof(*mib));
+    mib = x86_mcinfo_reserve(mi, sizeof(*mib), MC_TYPE_BANK);
     if (!mib)
     {
         mi->flags |= MCINFO_FLAGS_UNCOMPLETE;
@@ -214,8 +214,6 @@ static void mca_init_bank(enum mca_source who,
 
     mib->mc_status = mca_rdmsr(MSR_IA32_MCx_STATUS(bank));
 
-    mib->common.type = MC_TYPE_BANK;
-    mib->common.size = sizeof (struct mcinfo_bank);
     mib->mc_bank = bank;
     mib->mc_domid = DOMID_INVALID;
 
@@ -251,8 +249,6 @@ static int mca_init_global(uint32_t flags, struct mcinfo_global *mig)
     const struct vcpu *curr = current;
 
     /* Set global information */
-    mig->common.type = MC_TYPE_GLOBAL;
-    mig->common.size = sizeof (struct mcinfo_global);
     status = mca_rdmsr(MSR_IA32_MCG_STATUS);
     mig->mc_gstatus = status;
     mig->mc_domid = DOMID_INVALID;
@@ -349,7 +345,7 @@ mcheck_mca_logout(enum mca_source who, struct mca_banks *bankmask,
             if ( (mctc = mctelem_reserve(which)) != NULL ) {
                 mci = mctelem_dataptr(mctc);
                 mcinfo_clear(mci);
-                mig = x86_mcinfo_reserve(mci, sizeof(*mig));
+                mig = x86_mcinfo_reserve(mci, sizeof(*mig), MC_TYPE_GLOBAL);
                 /* mc_info should at least hold up the global information */
                 ASSERT(mig);
                 mca_init_global(mc_flags, mig);
@@ -805,7 +801,8 @@ static void mcinfo_clear(struct mc_info *mi)
     x86_mcinfo_nentries(mi) = 0;
 }
 
-void *x86_mcinfo_reserve(struct mc_info *mi, int size)
+void *x86_mcinfo_reserve(struct mc_info *mi,
+                         unsigned int size, unsigned int type)
 {
     int i;
     unsigned long end1, end2;
@@ -832,7 +829,11 @@ void *x86_mcinfo_reserve(struct mc_info *mi, int size)
     /* there's enough space. add entry. */
     x86_mcinfo_nentries(mi)++;
 
-    return memset(mic_index, 0, size);
+    memset(mic_index, 0, size);
+    mic_index->size = size;
+    mic_index->type = type;
+
+    return mic_index;
 }
 
 static void x86_mcinfo_apei_save(
diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index a28e862..b479b20 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -150,7 +150,8 @@ typedef struct mcinfo_extended *(*x86_mce_callback_t)
     (struct mc_info *, uint16_t, uint64_t);
 extern void x86_mce_callback_register(x86_mce_callback_t);
 
-void *x86_mcinfo_reserve(struct mc_info *mi, int size);
+void *x86_mcinfo_reserve(struct mc_info *mi,
+                         unsigned int size, unsigned int type);
 void x86_mcinfo_dump(struct mc_info *mi);
 
 static inline int mce_vendor_bank_msr(const struct vcpu *v, uint32_t msr)
diff --git a/xen/arch/x86/cpu/mcheck/mce_amd.c b/xen/arch/x86/cpu/mcheck/mce_amd.c
index 599e465..fe51be9 100644
--- a/xen/arch/x86/cpu/mcheck/mce_amd.c
+++ b/xen/arch/x86/cpu/mcheck/mce_amd.c
@@ -218,15 +218,13 @@ amd_f10_handler(struct mc_info *mi, uint16_t bank, uint64_t status)
     if ( !(status & MCi_STATUS_MISCV) )
         return NULL;
 
-    mc_ext = x86_mcinfo_reserve(mi, sizeof(*mc_ext));
+    mc_ext = x86_mcinfo_reserve(mi, sizeof(*mc_ext), MC_TYPE_EXTENDED);
     if ( !mc_ext )
     {
         mi->flags |= MCINFO_FLAGS_UNCOMPLETE;
         return NULL;
     }
 
-    mc_ext->common.type = MC_TYPE_EXTENDED;
-    mc_ext->common.size = sizeof(*mc_ext);
     mc_ext->mc_msrs = 3;
 
     mc_ext->mc_msr[0].reg = MSR_F10_MC4_MISC1;
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 005e41d..189c1ed 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -200,17 +200,13 @@ intel_get_extended_msrs(struct mcinfo_global *mig, struct mc_info *mi)
             !(mig->mc_gstatus & MCG_STATUS_EIPV))
         return NULL;
 
-    mc_ext = x86_mcinfo_reserve(mi, sizeof(*mc_ext));
+    mc_ext = x86_mcinfo_reserve(mi, sizeof(*mc_ext), MC_TYPE_EXTENDED);
     if (!mc_ext)
     {
         mi->flags |= MCINFO_FLAGS_UNCOMPLETE;
         return NULL;
     }
 
-    /* this function will called when CAP(9).MCG_EXT_P = 1 */
-    mc_ext->common.type = MC_TYPE_EXTENDED;
-    mc_ext->common.size = sizeof(struct mcinfo_extended);
-
     for (i = MSR_IA32_MCG_EAX; i <= MSR_IA32_MCG_MISC; i++)
         intel_get_extended_msr(mc_ext, i);
 
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/7] x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
                   ` (2 preceding siblings ...)
  2017-02-27  5:09 ` [PATCH v2 3/7] x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve() Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 5/7] x86/mce: clear MSR_IA32_MCG_STATUS by writing 0 Haozhong Zhang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Christoph Egger, Jan Beulich, Andrew Cooper

The current implementation only fills MC MSRs on vcpu0 and leaves MC
MSRs on other vcpus empty in the broadcast case. When guest reads 0
from MSR_IA32_MCG_STATUS on vcpuN (N > 0), it may think it's not
possible to recover the execution on that vcpu and then get panic,
although MSR_IA32_MCG_STATUS filled on vcpu0 may imply the injected
vMCE is actually recoverable. To avoid such unnecessary guest panic,
set MSR_IA32_MCG_STATUS on vcpuN (N > 0) to MCG_STATUS_MCIP|MCG_STATUS_RIPV.

In addition, fill_vmsr_data(mc_bank, ...) is changed to return -EINVAL
rather than 0, if an invalid domain ID is contained in mc_bank.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mcaction.c | 16 ++++-----
 xen/arch/x86/cpu/mcheck/vmce.c     | 74 ++++++++++++++++++++++++++------------
 xen/arch/x86/cpu/mcheck/vmce.h     |  2 +-
 3 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index 32056f2..dab9eac 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -88,22 +88,22 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
                     goto vmce_failed;
                 }
 
+                if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
+                    global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
+                    vmce_vcpuid = VMCE_INJECT_BROADCAST;
+                else
+                    vmce_vcpuid = global->mc_vcpuid;
+
                 bank->mc_addr = gfn << PAGE_SHIFT |
                   (bank->mc_addr & (PAGE_SIZE -1 ));
-                if ( fill_vmsr_data(bank, d,
-                      global->mc_gstatus) == -1 )
+                if (fill_vmsr_data(bank, d, global->mc_gstatus,
+                                   vmce_vcpuid == VMCE_INJECT_BROADCAST))
                 {
                     mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d "
                       "failed\n", bank->mc_domid);
                     goto vmce_failed;
                 }
 
-                if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-                    global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
-                    vmce_vcpuid = VMCE_INJECT_BROADCAST;
-                else
-                    vmce_vcpuid = global->mc_vcpuid;
-
                 /* We will inject vMCE to DOMU*/
                 if ( inject_vmce(d, vmce_vcpuid) < 0 )
                 {
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 13b692c..2caccae 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -381,38 +381,66 @@ int inject_vmce(struct domain *d, int vcpu)
     return ret;
 }
 
-int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-                   uint64_t gstatus)
+static int vcpu_fill_mc_msrs(struct vcpu *v, uint64_t mcg_status,
+                             uint64_t mci_status, uint64_t mci_addr,
+                             uint64_t mci_misc)
 {
-    struct vcpu *v = d->vcpu[0];
-
-    if ( mc_bank->mc_domid != DOMID_INVALID )
+    if ( v->arch.vmce.mcg_status & MCG_STATUS_MCIP )
     {
-        if ( v->arch.vmce.mcg_status & MCG_STATUS_MCIP )
-        {
-            mce_printk(MCE_QUIET, "MCE: guest has not handled previous"
-                       " vMCE yet!\n");
-            return -1;
-        }
+        mce_printk(MCE_QUIET, "MCE: %pv: guest has not handled previous"
+                   " vMCE yet!\n", v);
+        return -EBUSY;
+    }
 
-        spin_lock(&v->arch.vmce.lock);
+    spin_lock(&v->arch.vmce.lock);
 
-        v->arch.vmce.mcg_status = gstatus;
-        /*
-         * 1. Skip bank 0 to avoid 'bank 0 quirk' of old processors
-         * 2. Filter MCi_STATUS MSCOD model specific error code to guest
-         */
-        v->arch.vmce.bank[1].mci_status = mc_bank->mc_status &
-                                              MCi_STATUS_MSCOD_MASK;
-        v->arch.vmce.bank[1].mci_addr = mc_bank->mc_addr;
-        v->arch.vmce.bank[1].mci_misc = mc_bank->mc_misc;
+    v->arch.vmce.mcg_status = mcg_status;
+    /*
+     * 1. Skip bank 0 to avoid 'bank 0 quirk' of old processors
+     * 2. Filter MCi_STATUS MSCOD model specific error code to guest
+     */
+    v->arch.vmce.bank[1].mci_status = mci_status & MCi_STATUS_MSCOD_MASK;
+    v->arch.vmce.bank[1].mci_addr = mci_addr;
+    v->arch.vmce.bank[1].mci_misc = mci_misc;
 
-        spin_unlock(&v->arch.vmce.lock);
-    }
+    spin_unlock(&v->arch.vmce.lock);
 
     return 0;
 }
 
+int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
+                   uint64_t gstatus, bool broadcast)
+{
+    struct vcpu *v = d->vcpu[0];
+    int ret, err;
+
+    if ( mc_bank->mc_domid == DOMID_INVALID )
+        return -EINVAL;
+
+    /*
+     * vMCE with the actual error information is injected to vCPU0,
+     * and, if broadcast is required, we choose to inject less severe
+     * vMCEs to other vCPUs. Thus guest can always get the severest
+     * error (i.e. the actual one) on vCPU0. If guest can recover from
+     * the severest error on vCPU0, the less severe errors on other
+     * vCPUs will not prevent guest from recovering on those vCPUs.
+     */
+    ret = vcpu_fill_mc_msrs(v, gstatus, mc_bank->mc_status,
+                            mc_bank->mc_addr, mc_bank->mc_misc);
+    if ( broadcast )
+        for_each_vcpu ( d, v )
+        {
+            if ( !v->vcpu_id )
+                continue;
+            err = vcpu_fill_mc_msrs(v, MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+                                    0, 0, 0);
+            if ( err )
+                ret = err;
+        }
+
+    return ret;
+}
+
 /* It's said some ram is setup as mmio_direct for UC cache attribute */
 #define P2M_UNMAP_TYPES (p2m_to_mask(p2m_ram_rw) \
                                 | p2m_to_mask(p2m_ram_logdirty) \
diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h
index 163ce3c..74f6381 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -17,7 +17,7 @@ int vmce_amd_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
 int vmce_amd_wrmsr(struct vcpu *, uint32_t msr, uint64_t val);
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-    uint64_t gstatus);
+                   uint64_t gstatus, bool broadcast);
 
 #define VMCE_INJECT_BROADCAST (-1)
 int inject_vmce(struct domain *d, int vcpu);
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 5/7] x86/mce: clear MSR_IA32_MCG_STATUS by writing 0
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
                   ` (3 preceding siblings ...)
  2017-02-27  5:09 ` [PATCH v2 4/7] x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action() Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 7/7] tools/xen-mceinj: fix the type of cpu number Haozhong Zhang
  6 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Boris Ostrovsky, Christoph Egger, Jan Beulich,
	Andrew Cooper

On Intel CPU, an attemp to write to MSR_IA32_MCG_STATUS with any
non-zero value would result in #GP.

This commit writes 0 on AMD CPU as well instead of just clearing MCIP
bit, because all non-reserved bits of MSR_IA32_MCG_STATUS have been
handled at this point.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 53ca29c..5a7e2ba 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -539,7 +539,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
     gstatus = mca_rdmsr(MSR_IA32_MCG_STATUS);
     if ((gstatus & MCG_STATUS_MCIP) != 0) {
         mce_printk(MCE_CRITICAL, "MCE: Clear MCIP@ last step");
-        mca_wrmsr(MSR_IA32_MCG_STATUS, gstatus & ~MCG_STATUS_MCIP);
+        mca_wrmsr(MSR_IA32_MCG_STATUS, 0);
     }
     mce_barrier_exit(&mce_trap_bar);
 
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action()
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
                   ` (4 preceding siblings ...)
  2017-02-27  5:09 ` [PATCH v2 5/7] x86/mce: clear MSR_IA32_MCG_STATUS by writing 0 Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  2017-02-28 15:10   ` Jan Beulich
  2017-03-08  1:58   ` [PATCH v3 6/7] xen/mce: remove ASSERT's about mce_[u|d]handler_num " Haozhong Zhang
  2017-02-27  5:09 ` [PATCH v2 7/7] tools/xen-mceinj: fix the type of cpu number Haozhong Zhang
  6 siblings, 2 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Christoph Egger, Jan Beulich, Andrew Cooper

The current production build works fine even though those ASSERT's are
violated. Remove them to make the debug build work as well.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v1:
 * Remove those ASSERT's rather than making them Intel-only.
---
 xen/arch/x86/cpu/mcheck/mce.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 5a7e2ba..a42cabd 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1621,9 +1621,6 @@ static enum mce_result mce_action(const struct cpu_user_regs *regs,
         handlers = mce_uhandlers;
     }
 
-    /* At least a default handler should be registerd */
-    ASSERT(handler_num);
-
     local_mi = (struct mc_info*)mctelem_dataptr(mctc);
     x86_mcinfo_lookup(mic, local_mi, MC_TYPE_GLOBAL);
     if (mic == NULL) {
@@ -1656,7 +1653,6 @@ static enum mce_result mce_action(const struct cpu_user_regs *regs,
                 break;
             }
         }
-        ASSERT(i != handler_num);
     }
 
     return worst_result;
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 7/7] tools/xen-mceinj: fix the type of cpu number
  2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
                   ` (5 preceding siblings ...)
  2017-02-27  5:09 ` [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action() Haozhong Zhang
@ 2017-02-27  5:09 ` Haozhong Zhang
  6 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-02-27  5:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Ian Jackson, Wei Liu

Use "unsigned int" rather than "int" to align to the type "uint32_t"
of xen_mc_physcpuinfo.ncpus.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/tests/mce-test/tools/xen-mceinj.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c
index 51abc8a..bae5a46 100644
--- a/tools/tests/mce-test/tools/xen-mceinj.c
+++ b/tools/tests/mce-test/tools/xen-mceinj.c
@@ -161,7 +161,7 @@ static int flush_msr_inj(xc_interface *xc_handle)
     return xc_mca_op(xc_handle, &mc);
 }
 
-static int mca_cpuinfo(xc_interface *xc_handle)
+static unsigned int mca_cpuinfo(xc_interface *xc_handle)
 {
     struct xen_mc mc;
 
@@ -176,16 +176,18 @@ static int mca_cpuinfo(xc_interface *xc_handle)
         return 0;
 }
 
-static int inject_cmci(xc_interface *xc_handle, int cpu_nr)
+static int inject_cmci(xc_interface *xc_handle, unsigned int cpu_nr)
 {
     struct xen_mc mc;
-    int nr_cpus;
+    unsigned int nr_cpus;
 
     memset(&mc, 0, sizeof(struct xen_mc));
 
     nr_cpus = mca_cpuinfo(xc_handle);
     if (!nr_cpus)
         err(xc_handle, "Failed to get mca_cpuinfo");
+    if (cpu_nr >= nr_cpus)
+        err(xc_handle, "-c %u is larger than %u", cpu_nr, nr_cpus - 1);
 
     mc.cmd = XEN_MC_inject_v2;
     mc.interface_version = XEN_MCA_INTERFACE_VERSION;
@@ -420,7 +422,7 @@ int main(int argc, char *argv[])
     int c, opt_index;
     uint32_t domid;
     xc_interface *xc_handle;
-    int cpu_nr;
+    unsigned int cpu_nr;
     uint64_t gaddr, max_gpa;
 
     /* Default Value */
@@ -444,7 +446,7 @@ int main(int argc, char *argv[])
             dump=1;
             break;
         case 'c':
-            cpu_nr = strtol(optarg, &optarg, 10);
+            cpu_nr = strtoul(optarg, &optarg, 10);
             if ( strlen(optarg) != 0 )
                 err(xc_handle, "Please input a digit parameter for CPU");
             break;
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action()
  2017-02-27  5:09 ` [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action() Haozhong Zhang
@ 2017-02-28 15:10   ` Jan Beulich
  2017-03-08  1:58   ` [PATCH v3 6/7] xen/mce: remove ASSERT's about mce_[u|d]handler_num " Haozhong Zhang
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2017-02-28 15:10 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Christoph Egger, xen-devel

>>> On 27.02.17 at 06:09, <haozhong.zhang@intel.com> wrote:
> The current production build works fine even though those ASSERT's are
> violated.

This - without any evidence - is rather weak an argument. Code
inspection, otoh, pretty clearly tells us the ASSERT()s aren't
needed, as there is no dependency on the asserted expressions
to be true. Therefore, while the change itself is fine, may I ask
for a better commit message?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/7] xen/mce: adjust comment of callback register functions
  2017-02-27  5:09 ` [PATCH v2 1/7] xen/mce: adjust comment of callback register functions Haozhong Zhang
@ 2017-03-01 22:10   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-03-01 22:10 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Christoph Egger, Jan Beulich, xen-devel

On Mon, Feb 27, 2017 at 01:09:06PM +0800, Haozhong Zhang wrote:
> c/s e966818264908e842e2847f579ca4d94e586eaac added
> mce_need_clearbank_register below the comment of
> x86_mce_callback_register(). This commit (1) adjusts the first
> paragraph of comment to be a general statement of all callback
> register functions, and (2) moves the second paragraph to the
> front of x86_mce_callback_register().
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 6/7] xen/mce: remove ASSERT's about mce_[u|d]handler_num in mce_action()
  2017-02-27  5:09 ` [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action() Haozhong Zhang
  2017-02-28 15:10   ` Jan Beulich
@ 2017-03-08  1:58   ` Haozhong Zhang
  2017-03-08  8:25     ` Jan Beulich
  1 sibling, 1 reply; 12+ messages in thread
From: Haozhong Zhang @ 2017-03-08  1:58 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Christoph Egger, Jan Beulich, Andrew Cooper

Those assertions as well as mce_[u|d]handlers[], mce_[u|d]handler_num
and mce_action() were intel only and lifted to the common code by c/s
3a91769d6e1. However, MCE handling on AMD does not use mce_[u|d]handlers[]
before and after that commit, so assertions in mce_action() about their
size do not make sense for AMD. To be worse, they can crash the debug
build on AMD. Remove them to make the debug build work on AMD.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes in v3:
 * Explain the reason in the commit message.

Because other patches in v2 have got R-b or A-b, I only send v3 patch 6.
---
 xen/arch/x86/cpu/mcheck/mce.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 35117f8..cd4f0ee 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1616,9 +1616,6 @@ static enum mce_result mce_action(const struct cpu_user_regs *regs,
         handlers = mce_uhandlers;
     }
 
-    /* At least a default handler should be registerd */
-    ASSERT(handler_num);
-
     local_mi = (struct mc_info*)mctelem_dataptr(mctc);
     x86_mcinfo_lookup(mic, local_mi, MC_TYPE_GLOBAL);
     if (mic == NULL) {
@@ -1651,7 +1648,6 @@ static enum mce_result mce_action(const struct cpu_user_regs *regs,
                 break;
             }
         }
-        ASSERT(i != handler_num);
     }
 
     return worst_result;
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 6/7] xen/mce: remove ASSERT's about mce_[u|d]handler_num in mce_action()
  2017-03-08  1:58   ` [PATCH v3 6/7] xen/mce: remove ASSERT's about mce_[u|d]handler_num " Haozhong Zhang
@ 2017-03-08  8:25     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2017-03-08  8:25 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Christoph Egger, xen-devel

>>> On 08.03.17 at 02:58, <haozhong.zhang@intel.com> wrote:
> Those assertions as well as mce_[u|d]handlers[], mce_[u|d]handler_num
> and mce_action() were intel only and lifted to the common code by c/s
> 3a91769d6e1. However, MCE handling on AMD does not use mce_[u|d]handlers[]
> before and after that commit, so assertions in mce_action() about their
> size do not make sense for AMD. To be worse, they can crash the debug
> build on AMD. Remove them to make the debug build work on AMD.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-03-08  8:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27  5:09 [PATCH v2 0/7] MCE code cleanup and bugfix Haozhong Zhang
2017-02-27  5:09 ` [PATCH v2 1/7] xen/mce: adjust comment of callback register functions Haozhong Zhang
2017-03-01 22:10   ` Konrad Rzeszutek Wilk
2017-02-27  5:09 ` [PATCH v2 2/7] xen/mce: remove unused x86_mcinfo_add() Haozhong Zhang
2017-02-27  5:09 ` [PATCH v2 3/7] x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve() Haozhong Zhang
2017-02-27  5:09 ` [PATCH v2 4/7] x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case Haozhong Zhang
2017-02-27  5:09 ` [PATCH v2 5/7] x86/mce: clear MSR_IA32_MCG_STATUS by writing 0 Haozhong Zhang
2017-02-27  5:09 ` [PATCH v2 6/7] xen/mce: remove ASSERT's about mce_dhandler_num in mce_action() Haozhong Zhang
2017-02-28 15:10   ` Jan Beulich
2017-03-08  1:58   ` [PATCH v3 6/7] xen/mce: remove ASSERT's about mce_[u|d]handler_num " Haozhong Zhang
2017-03-08  8:25     ` Jan Beulich
2017-02-27  5:09 ` [PATCH v2 7/7] tools/xen-mceinj: fix the type of cpu number Haozhong Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.