All of lore.kernel.org
 help / color / mirror / Atom feed
* [xen vMCE RFC V0.2] xen vMCE design
@ 2012-06-27  3:51 Liu, Jinsong
  2012-06-27 13:14 ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Jinsong @ 2012-06-27  3:51 UTC (permalink / raw)
  To: Jan Beulich, Keir Fraser, Auld, Will
  Cc: Luck, Tony, Raj, Ashok, Jiang, Yunhong, Dugger, Donald D,
	Nakajima, Jun, Zhang, Xiantao, Shan, Haitao, Li, Susie,
	xen-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 315 bytes --]

Hi,

This is updated xen vMCE design foils, according to comments from community recently.

This foils focus on vMCE part of Xen MCA, so as Keir said, it's some dense.
Later Will will present a document to elaborate more, including Intel MCA and surrounding features and Xen implementation.

Thanks,
Jinsong

[-- Attachment #2: xen vMCE design (v0 2).pdf --]
[-- Type: application/pdf, Size: 256622 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-27  3:51 [xen vMCE RFC V0.2] xen vMCE design Liu, Jinsong
@ 2012-06-27 13:14 ` Jan Beulich
  2012-06-28  8:54   ` Liu, Jinsong
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2012-06-27 13:14 UTC (permalink / raw)
  To: Jinsong Liu, Will Auld
  Cc: Ashok Raj, Donald D Dugger, Haitao Shan, Jun Nakajima, Susie Li,
	Tony Luck, Xiantao Zhang, Yunhong Jiang, xen-devel, linux-kernel,
	Keir Fraser

>>> On 27.06.12 at 05:51, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> This is updated xen vMCE design foils, according to comments from community 
> recently.
> 
> This foils focus on vMCE part of Xen MCA, so as Keir said, it's some dense.
> Later Will will present a document to elaborate more, including Intel MCA 
> and surrounding features and Xen implementation.

For MCi_CTL2 you probably meant to say "allow setting CMCI_EN
and error count threshold"?

The 2-bank approach also needs to be brought in line with the
current host derived value in MCG_CAP, especially if the code to
implement this new model doesn't make it into 4.2 (which would
generally save a larger value).

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-27 13:14 ` Jan Beulich
@ 2012-06-28  8:54   ` Liu, Jinsong
  2012-06-28  9:08     ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Jinsong @ 2012-06-28  8:54 UTC (permalink / raw)
  To: Jan Beulich, Auld, Will
  Cc: Raj, Ashok, Dugger, Donald D, Shan, Haitao, Nakajima, Jun, Li,
	Susie, Luck, Tony, Zhang, Xiantao, Jiang, Yunhong, xen-devel,
	linux-kernel, Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 1410 bytes --]

Jan Beulich wrote:
>>>> On 27.06.12 at 05:51, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> This is updated xen vMCE design foils, according to comments from
>> community recently. 
>> 
>> This foils focus on vMCE part of Xen MCA, so as Keir said, it's some
>> dense. Later Will will present a document to elaborate more,
>> including Intel MCA and surrounding features and Xen implementation.
> 
> For MCi_CTL2 you probably meant to say "allow setting CMCI_EN
> and error count threshold"?

Yes, exactly! the words is w/ subtle but important different meaning.

> 
> The 2-bank approach also needs to be brought in line with the
> current host derived value in MCG_CAP, especially if the code to
> implement this new model doesn't make it into 4.2 (which would
> generally save a larger value).
> 
> Jan

Let me repeat in my word to avoid misunderstanding about your concern:
Your concern rooted from the history patch (c/s 24887, as attached) which used to solve vMCE migration issue caused from bank number. I guess the patch was not in xen4.1.x but would be in xen 4.2 release recently (right? and when will xen 4.2 release?)
Per my understanding, you want us to make sure our new vMCE model compatible w/ olde vMCE. For example if our patch in xen 4.3 release, you want to make sure a guest migrate from xen 4.2 to 4.3 would not broken. Is this your concern?

Thanks,
Jinsong

[-- Attachment #2: Type: message/rfc822, Size: 44396 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 17119 bytes --]

This allows migration to a host with less MCA banks than the source
host had, while without this patch accesses to the excess banks' MSRs
caused #GP-s in the guest after migration (and it depended on the guest
kernel whether this would be fatal).

A fundamental question is whether we should also save/restore MCG_CTL
and MCi_CTL, as the HVM save record would better be defined to the
complete state that needs saving from the beginning (I'm unsure whether
the save/restore logic allows for future extension of an existing
record).

Of course, this change is expected to make migration from new to older
Xen impossible (again I'm unsure what the save/restore logic does with
records it doesn't even know about).

The (trivial) tools side change may seem unrelated, but the code should
have been that way from the beginning to allow the hypervisor to look
at currently unused ext_vcpucontext fields without risking to read
garbage when those fields get a meaning assigned in the future. This
isn't being enforced here - should it be? (Obviously, for backwards
compatibility, the hypervisor must assume these fields to be clear only
when the extended context's size exceeds the old original one.)

A future addition to this change might be to allow configuration of the
number of banks and other MCA capabilities for a guest before it starts
(i.e. to not inherits the values seen on the first host it runs on).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1879,6 +1879,7 @@ int xc_domain_save(xc_interface *xch, in
 
         domctl.cmd = XEN_DOMCTL_get_ext_vcpucontext;
         domctl.domain = dom;
+        memset(&domctl.u, 0, sizeof(domctl.u));
         domctl.u.ext_vcpucontext.vcpu = i;
         if ( xc_domctl(xch, &domctl) < 0 )
         {
--- a/tools/misc/xen-hvmctx.c
+++ b/tools/misc/xen-hvmctx.c
@@ -383,6 +383,13 @@ static void dump_viridian_vcpu(void)
            (unsigned long long) p.apic_assist);           
 }
 
+static void dump_vmce_vcpu(void)
+{
+    HVM_SAVE_TYPE(VMCE_VCPU) p;
+    READ(p);
+    printf("    VMCE_VCPU: caps %" PRIx64 "\n", p.caps);
+}
+
 int main(int argc, char **argv)
 {
     int entry, domid;
@@ -449,6 +456,7 @@ int main(int argc, char **argv)
         case HVM_SAVE_CODE(MTRR): dump_mtrr(); break;
         case HVM_SAVE_CODE(VIRIDIAN_DOMAIN): dump_viridian_domain(); break;
         case HVM_SAVE_CODE(VIRIDIAN_VCPU): dump_viridian_vcpu(); break;
+        case HVM_SAVE_CODE(VMCE_VCPU): dump_vmce_vcpu(); break;
         case HVM_SAVE_CODE(END): break;
         default:
             printf(" ** Don't understand type %u: skipping\n",
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -3,6 +3,7 @@
 #define _MCE_H
 
 #include <xen/init.h>
+#include <xen/sched.h>
 #include <xen/smp.h>
 #include <asm/types.h>
 #include <asm/traps.h>
@@ -54,8 +55,8 @@ int unmmap_broken_page(struct domain *d,
 u64 mce_cap_init(void);
 extern unsigned int firstbank;
 
-int intel_mce_rdmsr(uint32_t msr, uint64_t *val);
-int intel_mce_wrmsr(uint32_t msr, uint64_t val);
+int intel_mce_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
+int intel_mce_wrmsr(struct vcpu *, uint32_t msr, uint64_t val);
 
 struct mcinfo_extended *intel_get_extended_msrs(
     struct mcinfo_global *mig, struct mc_info *mi);
@@ -171,18 +172,20 @@ int vmce_domain_inject(struct mcinfo_ban
 
 extern int vmce_init(struct cpuinfo_x86 *c);
 
-static inline int mce_vendor_bank_msr(uint32_t msr)
+static inline int mce_vendor_bank_msr(const struct vcpu *v, uint32_t msr)
 {
     if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-         msr >= MSR_IA32_MC0_CTL2 && msr < (MSR_IA32_MC0_CTL2 + nr_mce_banks) )
+         msr >= MSR_IA32_MC0_CTL2 &&
+         msr < MSR_IA32_MCx_CTL2(v->arch.mcg_cap & MCG_CAP_COUNT) )
           return 1;
     return 0;
 }
 
-static inline int mce_bank_msr(uint32_t msr)
+static inline int mce_bank_msr(const struct vcpu *v, uint32_t msr)
 {
-    if ( (msr >= MSR_IA32_MC0_CTL && msr < MSR_IA32_MCx_CTL(nr_mce_banks)) ||
-        mce_vendor_bank_msr(msr) )
+    if ( (msr >= MSR_IA32_MC0_CTL &&
+          msr < MSR_IA32_MCx_CTL(v->arch.mcg_cap & MCG_CAP_COUNT)) ||
+         mce_vendor_bank_msr(v, msr) )
         return 1;
     return 0;
 }
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -1421,11 +1421,12 @@ enum mcheck_type intel_mcheck_init(struc
 }
 
 /* intel specific MCA MSR */
-int intel_mce_wrmsr(uint32_t msr, uint64_t val)
+int intel_mce_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
     int ret = 0;
 
-    if (msr >= MSR_IA32_MC0_CTL2 && msr < (MSR_IA32_MC0_CTL2 + nr_mce_banks))
+    if ( msr >= MSR_IA32_MC0_CTL2 &&
+         msr < MSR_IA32_MCx_CTL2(v->arch.mcg_cap & MCG_CAP_COUNT) )
     {
         mce_printk(MCE_QUIET, "We have disabled CMCI capability, "
                  "Guest should not write this MSR!\n");
@@ -1435,11 +1436,12 @@ int intel_mce_wrmsr(uint32_t msr, uint64
     return ret;
 }
 
-int intel_mce_rdmsr(uint32_t msr, uint64_t *val)
+int intel_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
 {
     int ret = 0;
 
-    if (msr >= MSR_IA32_MC0_CTL2 && msr < (MSR_IA32_MC0_CTL2 + nr_mce_banks))
+    if ( msr >= MSR_IA32_MC0_CTL2 &&
+         msr < MSR_IA32_MCx_CTL2(v->arch.mcg_cap & MCG_CAP_COUNT) )
     {
         mce_printk(MCE_QUIET, "We have disabled CMCI capability, "
                  "Guest should not read this MSR!\n");
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -10,6 +10,7 @@
 #include <xen/delay.h>
 #include <xen/smp.h>
 #include <xen/mm.h>
+#include <xen/hvm/save.h>
 #include <asm/processor.h>
 #include <public/sysctl.h>
 #include <asm/system.h>
@@ -42,7 +43,6 @@ int vmce_init_msr(struct domain *d)
            nr_mce_banks * sizeof(*dom_vmce(d)->mci_ctl));
 
     dom_vmce(d)->mcg_status = 0x0;
-    dom_vmce(d)->mcg_cap = g_mcg_cap;
     dom_vmce(d)->mcg_ctl = ~(uint64_t)0x0;
     dom_vmce(d)->nr_injection = 0;
 
@@ -61,21 +61,41 @@ void vmce_destroy_msr(struct domain *d)
     dom_vmce(d) = NULL;
 }
 
-static int bank_mce_rdmsr(struct domain *d, uint32_t msr, uint64_t *val)
+void vmce_init_vcpu(struct vcpu *v)
 {
-    int bank, ret = 1;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    v->arch.mcg_cap = g_mcg_cap;
+}
+
+int vmce_restore_vcpu(struct vcpu *v, uint64_t caps)
+{
+    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
+    {
+        dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities"
+                " %#" PRIx64 " for d%d:v%u (supported: %#Lx)\n",
+                is_hvm_vcpu(v) ? "HVM" : "PV", caps, v->domain->domain_id,
+                v->vcpu_id, g_mcg_cap & ~MCG_CAP_COUNT);
+        return -EPERM;
+    }
+
+    v->arch.mcg_cap = caps;
+    return 0;
+}
+
+static int bank_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
+{
+    int ret = 1;
+    unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
+    struct domain_mca_msrs *vmce = dom_vmce(v->domain);
     struct bank_entry *entry;
 
-    bank = (msr - MSR_IA32_MC0_CTL) / 4;
-    if ( bank >= nr_mce_banks )
-        return -1;
+    *val = 0;
 
     switch ( msr & (MSR_IA32_MC0_CTL | 3) )
     {
     case MSR_IA32_MC0_CTL:
-        *val = vmce->mci_ctl[bank] &
-            (h_mci_ctrl ? h_mci_ctrl[bank] : ~0UL);
+        if ( bank < nr_mce_banks )
+            *val = vmce->mci_ctl[bank] &
+                   (h_mci_ctrl ? h_mci_ctrl[bank] : ~0UL);
         mce_printk(MCE_VERBOSE, "MCE: rdmsr MC%u_CTL 0x%"PRIx64"\n",
                    bank, *val);
         break;
@@ -126,7 +146,7 @@ static int bank_mce_rdmsr(struct domain 
         switch ( boot_cpu_data.x86_vendor )
         {
         case X86_VENDOR_INTEL:
-            ret = intel_mce_rdmsr(msr, val);
+            ret = intel_mce_rdmsr(v, msr, val);
             break;
         default:
             ret = 0;
@@ -145,13 +165,13 @@ static int bank_mce_rdmsr(struct domain 
  */
 int vmce_rdmsr(uint32_t msr, uint64_t *val)
 {
-    struct domain *d = current->domain;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    const struct vcpu *cur = current;
+    struct domain_mca_msrs *vmce = dom_vmce(cur->domain);
     int ret = 1;
 
     *val = 0;
 
-    spin_lock(&dom_vmce(d)->lock);
+    spin_lock(&vmce->lock);
 
     switch ( msr )
     {
@@ -162,39 +182,38 @@ int vmce_rdmsr(uint32_t msr, uint64_t *v
                        "MCE: rdmsr MCG_STATUS 0x%"PRIx64"\n", *val);
         break;
     case MSR_IA32_MCG_CAP:
-        *val = vmce->mcg_cap;
+        *val = cur->arch.mcg_cap;
         mce_printk(MCE_VERBOSE, "MCE: rdmsr MCG_CAP 0x%"PRIx64"\n",
                    *val);
         break;
     case MSR_IA32_MCG_CTL:
         /* Always 0 if no CTL support */
-        *val = vmce->mcg_ctl & h_mcg_ctl;
+        if ( cur->arch.mcg_cap & MCG_CTL_P )
+            *val = vmce->mcg_ctl & h_mcg_ctl;
         mce_printk(MCE_VERBOSE, "MCE: rdmsr MCG_CTL 0x%"PRIx64"\n",
                    *val);
         break;
     default:
-        ret = mce_bank_msr(msr) ? bank_mce_rdmsr(d, msr, val) : 0;
+        ret = mce_bank_msr(cur, msr) ? bank_mce_rdmsr(cur, msr, val) : 0;
         break;
     }
 
-    spin_unlock(&dom_vmce(d)->lock);
+    spin_unlock(&vmce->lock);
     return ret;
 }
 
-static int bank_mce_wrmsr(struct domain *d, u32 msr, u64 val)
+static int bank_mce_wrmsr(struct vcpu *v, u32 msr, u64 val)
 {
-    int bank, ret = 1;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    int ret = 1;
+    unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
+    struct domain_mca_msrs *vmce = dom_vmce(v->domain);
     struct bank_entry *entry = NULL;
 
-    bank = (msr - MSR_IA32_MC0_CTL) / 4;
-    if ( bank >= nr_mce_banks )
-        return -EINVAL;
-
     switch ( msr & (MSR_IA32_MC0_CTL | 3) )
     {
     case MSR_IA32_MC0_CTL:
-        vmce->mci_ctl[bank] = val;
+        if ( bank < nr_mce_banks )
+            vmce->mci_ctl[bank] = val;
         break;
     case MSR_IA32_MC0_STATUS:
         /* Give the first entry of the list, it corresponds to current
@@ -202,9 +221,9 @@ static int bank_mce_wrmsr(struct domain 
          * the guest, this node will be deleted.
          * Only error bank is written. Non-error banks simply return.
          */
-        if ( !list_empty(&dom_vmce(d)->impact_header) )
+        if ( !list_empty(&vmce->impact_header) )
         {
-            entry = list_entry(dom_vmce(d)->impact_header.next,
+            entry = list_entry(vmce->impact_header.next,
                                struct bank_entry, list);
             if ( entry->bank == bank )
                 entry->mci_status = val;
@@ -228,7 +247,7 @@ static int bank_mce_wrmsr(struct domain 
         switch ( boot_cpu_data.x86_vendor )
         {
         case X86_VENDOR_INTEL:
-            ret = intel_mce_wrmsr(msr, val);
+            ret = intel_mce_wrmsr(v, msr, val);
             break;
         default:
             ret = 0;
@@ -247,9 +266,9 @@ static int bank_mce_wrmsr(struct domain 
  */
 int vmce_wrmsr(u32 msr, u64 val)
 {
-    struct domain *d = current->domain;
+    struct vcpu *cur = current;
     struct bank_entry *entry = NULL;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    struct domain_mca_msrs *vmce = dom_vmce(cur->domain);
     int ret = 1;
 
     if ( !g_mcg_cap )
@@ -266,7 +285,7 @@ int vmce_wrmsr(u32 msr, u64 val)
         vmce->mcg_status = val;
         mce_printk(MCE_VERBOSE, "MCE: wrmsr MCG_STATUS %"PRIx64"\n", val);
         /* For HVM guest, this is the point for deleting vMCE injection node */
-        if ( d->is_hvm && (vmce->nr_injection > 0) )
+        if ( is_hvm_vcpu(cur) && (vmce->nr_injection > 0) )
         {
             vmce->nr_injection--; /* Should be 0 */
             if ( !list_empty(&vmce->impact_header) )
@@ -293,7 +312,7 @@ int vmce_wrmsr(u32 msr, u64 val)
         ret = -1;
         break;
     default:
-        ret = mce_bank_msr(msr) ? bank_mce_wrmsr(d, msr, val) : 0;
+        ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0;
         break;
     }
 
@@ -301,6 +320,46 @@ int vmce_wrmsr(u32 msr, u64 val)
     return ret;
 }
 
+static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct vcpu *v;
+    int err = 0;
+
+    for_each_vcpu( d, v ) {
+        struct hvm_vmce_vcpu ctxt = {
+            .caps = v->arch.mcg_cap
+        };
+
+        err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt);
+        if ( err )
+            break;
+    }
+
+    return err;
+}
+
+static int vmce_load_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    unsigned int vcpuid = hvm_load_instance(h);
+    struct vcpu *v;
+    struct hvm_vmce_vcpu ctxt;
+    int err;
+
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        err = -EINVAL;
+    }
+    else
+        err = hvm_load_entry(VMCE_VCPU, h, &ctxt);
+
+    return err ?: vmce_restore_vcpu(v, ctxt.caps);
+}
+
+HVM_REGISTER_SAVE_RESTORE(VMCE_VCPU, vmce_save_vcpu_ctxt,
+                          vmce_load_vcpu_ctxt, 1, HVMSR_PER_VCPU);
+
 int inject_vmce(struct domain *d)
 {
     int cpu = smp_processor_id();
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -422,6 +422,8 @@ int vcpu_initialise(struct vcpu *v)
     if ( (rc = vcpu_init_fpu(v)) != 0 )
         return rc;
 
+    vmce_init_vcpu(v);
+
     if ( is_hvm_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1027,11 +1027,12 @@ long arch_do_domctl(
                 evc->syscall32_callback_eip    = 0;
                 evc->syscall32_disables_events = 0;
             }
+            evc->mcg_cap = v->arch.mcg_cap;
         }
         else
         {
             ret = -EINVAL;
-            if ( evc->size != sizeof(*evc) )
+            if ( evc->size < offsetof(typeof(*evc), mcg_cap) )
                 goto ext_vcpucontext_out;
 #ifdef __x86_64__
             if ( !is_hvm_domain(d) )
@@ -1059,6 +1060,10 @@ long arch_do_domctl(
                  (evc->syscall32_callback_cs & ~3) ||
                  evc->syscall32_callback_eip )
                 goto ext_vcpucontext_out;
+
+            if ( evc->size >= offsetof(typeof(*evc), mcg_cap) +
+                              sizeof(evc->mcg_cap) )
+                ret = vmce_restore_vcpu(v, evc->mcg_cap);
         }
 
         ret = 0;
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -488,6 +488,8 @@ struct arch_vcpu
     /* This variable determines whether nonlazy extended state has been used,
      * and thus should be saved/restored. */
     bool_t nonlazy_xstate_used;
+
+    uint64_t mcg_cap;
     
     struct paging_vcpu paging;
 
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -16,7 +16,6 @@ struct bank_entry {
 struct domain_mca_msrs
 {
     /* Guest should not change below values after DOM boot up */
-    uint64_t mcg_cap;
     uint64_t mcg_ctl;
     uint64_t mcg_status;
     uint64_t *mci_ctl;
@@ -28,6 +27,8 @@ struct domain_mca_msrs
 /* Guest vMCE MSRs virtualization */
 extern int vmce_init_msr(struct domain *d);
 extern void vmce_destroy_msr(struct domain *d);
+extern void vmce_init_vcpu(struct vcpu *);
+extern int vmce_restore_vcpu(struct vcpu *, uint64_t caps);
 extern int vmce_wrmsr(uint32_t msr, uint64_t val);
 extern int vmce_rdmsr(uint32_t msr, uint64_t *val);
 
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -585,9 +585,15 @@ struct hvm_viridian_vcpu_context {
 
 DECLARE_HVM_SAVE_TYPE(VIRIDIAN_VCPU, 17, struct hvm_viridian_vcpu_context);
 
+struct hvm_vmce_vcpu {
+    uint64_t caps;
+};
+
+DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu);
+
 /* 
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 17
+#define HVM_SAVE_CODE_MAX 18
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -559,7 +559,7 @@ struct xen_domctl_ext_vcpucontext {
     uint32_t         vcpu;
     /*
      * SET: Size of struct (IN)
-     * GET: Size of struct (OUT)
+     * GET: Size of struct (OUT, up to 128 bytes)
      */
     uint32_t         size;
 #if defined(__i386__) || defined(__x86_64__)
@@ -571,6 +571,7 @@ struct xen_domctl_ext_vcpucontext {
     uint16_t         sysenter_callback_cs;
     uint8_t          syscall32_disables_events;
     uint8_t          sysenter_disables_events;
+    uint64_aligned_t mcg_cap;
 #endif
 };
 typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t;



[-- Attachment #2.1.2: x86-vmce-sr.patch --]
[-- Type: text/plain, Size: 16665 bytes --]

x86/vMCE: save/restore MCA capabilities

This allows migration to a host with less MCA banks than the source
host had, while without this patch accesses to the excess banks' MSRs
caused #GP-s in the guest after migration (and it depended on the guest
kernel whether this would be fatal).

A fundamental question is whether we should also save/restore MCG_CTL
and MCi_CTL, as the HVM save record would better be defined to the
complete state that needs saving from the beginning (I'm unsure whether
the save/restore logic allows for future extension of an existing
record).

Of course, this change is expected to make migration from new to older
Xen impossible (again I'm unsure what the save/restore logic does with
records it doesn't even know about).

The (trivial) tools side change may seem unrelated, but the code should
have been that way from the beginning to allow the hypervisor to look
at currently unused ext_vcpucontext fields without risking to read
garbage when those fields get a meaning assigned in the future. This
isn't being enforced here - should it be? (Obviously, for backwards
compatibility, the hypervisor must assume these fields to be clear only
when the extended context's size exceeds the old original one.)

A future addition to this change might be to allow configuration of the
number of banks and other MCA capabilities for a guest before it starts
(i.e. to not inherits the values seen on the first host it runs on).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1879,6 +1879,7 @@ int xc_domain_save(xc_interface *xch, in
 
         domctl.cmd = XEN_DOMCTL_get_ext_vcpucontext;
         domctl.domain = dom;
+        memset(&domctl.u, 0, sizeof(domctl.u));
         domctl.u.ext_vcpucontext.vcpu = i;
         if ( xc_domctl(xch, &domctl) < 0 )
         {
--- a/tools/misc/xen-hvmctx.c
+++ b/tools/misc/xen-hvmctx.c
@@ -383,6 +383,13 @@ static void dump_viridian_vcpu(void)
            (unsigned long long) p.apic_assist);           
 }
 
+static void dump_vmce_vcpu(void)
+{
+    HVM_SAVE_TYPE(VMCE_VCPU) p;
+    READ(p);
+    printf("    VMCE_VCPU: caps %" PRIx64 "\n", p.caps);
+}
+
 int main(int argc, char **argv)
 {
     int entry, domid;
@@ -449,6 +456,7 @@ int main(int argc, char **argv)
         case HVM_SAVE_CODE(MTRR): dump_mtrr(); break;
         case HVM_SAVE_CODE(VIRIDIAN_DOMAIN): dump_viridian_domain(); break;
         case HVM_SAVE_CODE(VIRIDIAN_VCPU): dump_viridian_vcpu(); break;
+        case HVM_SAVE_CODE(VMCE_VCPU): dump_vmce_vcpu(); break;
         case HVM_SAVE_CODE(END): break;
         default:
             printf(" ** Don't understand type %u: skipping\n",
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -3,6 +3,7 @@
 #define _MCE_H
 
 #include <xen/init.h>
+#include <xen/sched.h>
 #include <xen/smp.h>
 #include <asm/types.h>
 #include <asm/traps.h>
@@ -54,8 +55,8 @@ int unmmap_broken_page(struct domain *d,
 u64 mce_cap_init(void);
 extern unsigned int firstbank;
 
-int intel_mce_rdmsr(uint32_t msr, uint64_t *val);
-int intel_mce_wrmsr(uint32_t msr, uint64_t val);
+int intel_mce_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
+int intel_mce_wrmsr(struct vcpu *, uint32_t msr, uint64_t val);
 
 struct mcinfo_extended *intel_get_extended_msrs(
     struct mcinfo_global *mig, struct mc_info *mi);
@@ -171,18 +172,20 @@ int vmce_domain_inject(struct mcinfo_ban
 
 extern int vmce_init(struct cpuinfo_x86 *c);
 
-static inline int mce_vendor_bank_msr(uint32_t msr)
+static inline int mce_vendor_bank_msr(const struct vcpu *v, uint32_t msr)
 {
     if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-         msr >= MSR_IA32_MC0_CTL2 && msr < (MSR_IA32_MC0_CTL2 + nr_mce_banks) )
+         msr >= MSR_IA32_MC0_CTL2 &&
+         msr < MSR_IA32_MCx_CTL2(v->arch.mcg_cap & MCG_CAP_COUNT) )
           return 1;
     return 0;
 }
 
-static inline int mce_bank_msr(uint32_t msr)
+static inline int mce_bank_msr(const struct vcpu *v, uint32_t msr)
 {
-    if ( (msr >= MSR_IA32_MC0_CTL && msr < MSR_IA32_MCx_CTL(nr_mce_banks)) ||
-        mce_vendor_bank_msr(msr) )
+    if ( (msr >= MSR_IA32_MC0_CTL &&
+          msr < MSR_IA32_MCx_CTL(v->arch.mcg_cap & MCG_CAP_COUNT)) ||
+         mce_vendor_bank_msr(v, msr) )
         return 1;
     return 0;
 }
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -1421,11 +1421,12 @@ enum mcheck_type intel_mcheck_init(struc
 }
 
 /* intel specific MCA MSR */
-int intel_mce_wrmsr(uint32_t msr, uint64_t val)
+int intel_mce_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
     int ret = 0;
 
-    if (msr >= MSR_IA32_MC0_CTL2 && msr < (MSR_IA32_MC0_CTL2 + nr_mce_banks))
+    if ( msr >= MSR_IA32_MC0_CTL2 &&
+         msr < MSR_IA32_MCx_CTL2(v->arch.mcg_cap & MCG_CAP_COUNT) )
     {
         mce_printk(MCE_QUIET, "We have disabled CMCI capability, "
                  "Guest should not write this MSR!\n");
@@ -1435,11 +1436,12 @@ int intel_mce_wrmsr(uint32_t msr, uint64
     return ret;
 }
 
-int intel_mce_rdmsr(uint32_t msr, uint64_t *val)
+int intel_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
 {
     int ret = 0;
 
-    if (msr >= MSR_IA32_MC0_CTL2 && msr < (MSR_IA32_MC0_CTL2 + nr_mce_banks))
+    if ( msr >= MSR_IA32_MC0_CTL2 &&
+         msr < MSR_IA32_MCx_CTL2(v->arch.mcg_cap & MCG_CAP_COUNT) )
     {
         mce_printk(MCE_QUIET, "We have disabled CMCI capability, "
                  "Guest should not read this MSR!\n");
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -10,6 +10,7 @@
 #include <xen/delay.h>
 #include <xen/smp.h>
 #include <xen/mm.h>
+#include <xen/hvm/save.h>
 #include <asm/processor.h>
 #include <public/sysctl.h>
 #include <asm/system.h>
@@ -42,7 +43,6 @@ int vmce_init_msr(struct domain *d)
            nr_mce_banks * sizeof(*dom_vmce(d)->mci_ctl));
 
     dom_vmce(d)->mcg_status = 0x0;
-    dom_vmce(d)->mcg_cap = g_mcg_cap;
     dom_vmce(d)->mcg_ctl = ~(uint64_t)0x0;
     dom_vmce(d)->nr_injection = 0;
 
@@ -61,21 +61,41 @@ void vmce_destroy_msr(struct domain *d)
     dom_vmce(d) = NULL;
 }
 
-static int bank_mce_rdmsr(struct domain *d, uint32_t msr, uint64_t *val)
+void vmce_init_vcpu(struct vcpu *v)
 {
-    int bank, ret = 1;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    v->arch.mcg_cap = g_mcg_cap;
+}
+
+int vmce_restore_vcpu(struct vcpu *v, uint64_t caps)
+{
+    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
+    {
+        dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities"
+                " %#" PRIx64 " for d%d:v%u (supported: %#Lx)\n",
+                is_hvm_vcpu(v) ? "HVM" : "PV", caps, v->domain->domain_id,
+                v->vcpu_id, g_mcg_cap & ~MCG_CAP_COUNT);
+        return -EPERM;
+    }
+
+    v->arch.mcg_cap = caps;
+    return 0;
+}
+
+static int bank_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
+{
+    int ret = 1;
+    unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
+    struct domain_mca_msrs *vmce = dom_vmce(v->domain);
     struct bank_entry *entry;
 
-    bank = (msr - MSR_IA32_MC0_CTL) / 4;
-    if ( bank >= nr_mce_banks )
-        return -1;
+    *val = 0;
 
     switch ( msr & (MSR_IA32_MC0_CTL | 3) )
     {
     case MSR_IA32_MC0_CTL:
-        *val = vmce->mci_ctl[bank] &
-            (h_mci_ctrl ? h_mci_ctrl[bank] : ~0UL);
+        if ( bank < nr_mce_banks )
+            *val = vmce->mci_ctl[bank] &
+                   (h_mci_ctrl ? h_mci_ctrl[bank] : ~0UL);
         mce_printk(MCE_VERBOSE, "MCE: rdmsr MC%u_CTL 0x%"PRIx64"\n",
                    bank, *val);
         break;
@@ -126,7 +146,7 @@ static int bank_mce_rdmsr(struct domain 
         switch ( boot_cpu_data.x86_vendor )
         {
         case X86_VENDOR_INTEL:
-            ret = intel_mce_rdmsr(msr, val);
+            ret = intel_mce_rdmsr(v, msr, val);
             break;
         default:
             ret = 0;
@@ -145,13 +165,13 @@ static int bank_mce_rdmsr(struct domain 
  */
 int vmce_rdmsr(uint32_t msr, uint64_t *val)
 {
-    struct domain *d = current->domain;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    const struct vcpu *cur = current;
+    struct domain_mca_msrs *vmce = dom_vmce(cur->domain);
     int ret = 1;
 
     *val = 0;
 
-    spin_lock(&dom_vmce(d)->lock);
+    spin_lock(&vmce->lock);
 
     switch ( msr )
     {
@@ -162,39 +182,38 @@ int vmce_rdmsr(uint32_t msr, uint64_t *v
                        "MCE: rdmsr MCG_STATUS 0x%"PRIx64"\n", *val);
         break;
     case MSR_IA32_MCG_CAP:
-        *val = vmce->mcg_cap;
+        *val = cur->arch.mcg_cap;
         mce_printk(MCE_VERBOSE, "MCE: rdmsr MCG_CAP 0x%"PRIx64"\n",
                    *val);
         break;
     case MSR_IA32_MCG_CTL:
         /* Always 0 if no CTL support */
-        *val = vmce->mcg_ctl & h_mcg_ctl;
+        if ( cur->arch.mcg_cap & MCG_CTL_P )
+            *val = vmce->mcg_ctl & h_mcg_ctl;
         mce_printk(MCE_VERBOSE, "MCE: rdmsr MCG_CTL 0x%"PRIx64"\n",
                    *val);
         break;
     default:
-        ret = mce_bank_msr(msr) ? bank_mce_rdmsr(d, msr, val) : 0;
+        ret = mce_bank_msr(cur, msr) ? bank_mce_rdmsr(cur, msr, val) : 0;
         break;
     }
 
-    spin_unlock(&dom_vmce(d)->lock);
+    spin_unlock(&vmce->lock);
     return ret;
 }
 
-static int bank_mce_wrmsr(struct domain *d, u32 msr, u64 val)
+static int bank_mce_wrmsr(struct vcpu *v, u32 msr, u64 val)
 {
-    int bank, ret = 1;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    int ret = 1;
+    unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
+    struct domain_mca_msrs *vmce = dom_vmce(v->domain);
     struct bank_entry *entry = NULL;
 
-    bank = (msr - MSR_IA32_MC0_CTL) / 4;
-    if ( bank >= nr_mce_banks )
-        return -EINVAL;
-
     switch ( msr & (MSR_IA32_MC0_CTL | 3) )
     {
     case MSR_IA32_MC0_CTL:
-        vmce->mci_ctl[bank] = val;
+        if ( bank < nr_mce_banks )
+            vmce->mci_ctl[bank] = val;
         break;
     case MSR_IA32_MC0_STATUS:
         /* Give the first entry of the list, it corresponds to current
@@ -202,9 +221,9 @@ static int bank_mce_wrmsr(struct domain 
          * the guest, this node will be deleted.
          * Only error bank is written. Non-error banks simply return.
          */
-        if ( !list_empty(&dom_vmce(d)->impact_header) )
+        if ( !list_empty(&vmce->impact_header) )
         {
-            entry = list_entry(dom_vmce(d)->impact_header.next,
+            entry = list_entry(vmce->impact_header.next,
                                struct bank_entry, list);
             if ( entry->bank == bank )
                 entry->mci_status = val;
@@ -228,7 +247,7 @@ static int bank_mce_wrmsr(struct domain 
         switch ( boot_cpu_data.x86_vendor )
         {
         case X86_VENDOR_INTEL:
-            ret = intel_mce_wrmsr(msr, val);
+            ret = intel_mce_wrmsr(v, msr, val);
             break;
         default:
             ret = 0;
@@ -247,9 +266,9 @@ static int bank_mce_wrmsr(struct domain 
  */
 int vmce_wrmsr(u32 msr, u64 val)
 {
-    struct domain *d = current->domain;
+    struct vcpu *cur = current;
     struct bank_entry *entry = NULL;
-    struct domain_mca_msrs *vmce = dom_vmce(d);
+    struct domain_mca_msrs *vmce = dom_vmce(cur->domain);
     int ret = 1;
 
     if ( !g_mcg_cap )
@@ -266,7 +285,7 @@ int vmce_wrmsr(u32 msr, u64 val)
         vmce->mcg_status = val;
         mce_printk(MCE_VERBOSE, "MCE: wrmsr MCG_STATUS %"PRIx64"\n", val);
         /* For HVM guest, this is the point for deleting vMCE injection node */
-        if ( d->is_hvm && (vmce->nr_injection > 0) )
+        if ( is_hvm_vcpu(cur) && (vmce->nr_injection > 0) )
         {
             vmce->nr_injection--; /* Should be 0 */
             if ( !list_empty(&vmce->impact_header) )
@@ -293,7 +312,7 @@ int vmce_wrmsr(u32 msr, u64 val)
         ret = -1;
         break;
     default:
-        ret = mce_bank_msr(msr) ? bank_mce_wrmsr(d, msr, val) : 0;
+        ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0;
         break;
     }
 
@@ -301,6 +320,46 @@ int vmce_wrmsr(u32 msr, u64 val)
     return ret;
 }
 
+static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    struct vcpu *v;
+    int err = 0;
+
+    for_each_vcpu( d, v ) {
+        struct hvm_vmce_vcpu ctxt = {
+            .caps = v->arch.mcg_cap
+        };
+
+        err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt);
+        if ( err )
+            break;
+    }
+
+    return err;
+}
+
+static int vmce_load_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+{
+    unsigned int vcpuid = hvm_load_instance(h);
+    struct vcpu *v;
+    struct hvm_vmce_vcpu ctxt;
+    int err;
+
+    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    {
+        dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no vcpu%u\n",
+                d->domain_id, vcpuid);
+        err = -EINVAL;
+    }
+    else
+        err = hvm_load_entry(VMCE_VCPU, h, &ctxt);
+
+    return err ?: vmce_restore_vcpu(v, ctxt.caps);
+}
+
+HVM_REGISTER_SAVE_RESTORE(VMCE_VCPU, vmce_save_vcpu_ctxt,
+                          vmce_load_vcpu_ctxt, 1, HVMSR_PER_VCPU);
+
 int inject_vmce(struct domain *d)
 {
     int cpu = smp_processor_id();
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -422,6 +422,8 @@ int vcpu_initialise(struct vcpu *v)
     if ( (rc = vcpu_init_fpu(v)) != 0 )
         return rc;
 
+    vmce_init_vcpu(v);
+
     if ( is_hvm_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1027,11 +1027,12 @@ long arch_do_domctl(
                 evc->syscall32_callback_eip    = 0;
                 evc->syscall32_disables_events = 0;
             }
+            evc->mcg_cap = v->arch.mcg_cap;
         }
         else
         {
             ret = -EINVAL;
-            if ( evc->size != sizeof(*evc) )
+            if ( evc->size < offsetof(typeof(*evc), mcg_cap) )
                 goto ext_vcpucontext_out;
 #ifdef __x86_64__
             if ( !is_hvm_domain(d) )
@@ -1059,6 +1060,10 @@ long arch_do_domctl(
                  (evc->syscall32_callback_cs & ~3) ||
                  evc->syscall32_callback_eip )
                 goto ext_vcpucontext_out;
+
+            if ( evc->size >= offsetof(typeof(*evc), mcg_cap) +
+                              sizeof(evc->mcg_cap) )
+                ret = vmce_restore_vcpu(v, evc->mcg_cap);
         }
 
         ret = 0;
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -488,6 +488,8 @@ struct arch_vcpu
     /* This variable determines whether nonlazy extended state has been used,
      * and thus should be saved/restored. */
     bool_t nonlazy_xstate_used;
+
+    uint64_t mcg_cap;
     
     struct paging_vcpu paging;
 
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -16,7 +16,6 @@ struct bank_entry {
 struct domain_mca_msrs
 {
     /* Guest should not change below values after DOM boot up */
-    uint64_t mcg_cap;
     uint64_t mcg_ctl;
     uint64_t mcg_status;
     uint64_t *mci_ctl;
@@ -28,6 +27,8 @@ struct domain_mca_msrs
 /* Guest vMCE MSRs virtualization */
 extern int vmce_init_msr(struct domain *d);
 extern void vmce_destroy_msr(struct domain *d);
+extern void vmce_init_vcpu(struct vcpu *);
+extern int vmce_restore_vcpu(struct vcpu *, uint64_t caps);
 extern int vmce_wrmsr(uint32_t msr, uint64_t val);
 extern int vmce_rdmsr(uint32_t msr, uint64_t *val);
 
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -585,9 +585,15 @@ struct hvm_viridian_vcpu_context {
 
 DECLARE_HVM_SAVE_TYPE(VIRIDIAN_VCPU, 17, struct hvm_viridian_vcpu_context);
 
+struct hvm_vmce_vcpu {
+    uint64_t caps;
+};
+
+DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu);
+
 /* 
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 17
+#define HVM_SAVE_CODE_MAX 18
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -559,7 +559,7 @@ struct xen_domctl_ext_vcpucontext {
     uint32_t         vcpu;
     /*
      * SET: Size of struct (IN)
-     * GET: Size of struct (OUT)
+     * GET: Size of struct (OUT, up to 128 bytes)
      */
     uint32_t         size;
 #if defined(__i386__) || defined(__x86_64__)
@@ -571,6 +571,7 @@ struct xen_domctl_ext_vcpucontext {
     uint16_t         sysenter_callback_cs;
     uint8_t          syscall32_disables_events;
     uint8_t          sysenter_disables_events;
+    uint64_aligned_t mcg_cap;
 #endif
 };
 typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t;

[-- Attachment #2.1.3: ATT00001.txt --]
[-- Type: text/plain, Size: 142 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28  8:54   ` Liu, Jinsong
@ 2012-06-28  9:08     ` Jan Beulich
  2012-06-28  9:40       ` Liu, Jinsong
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2012-06-28  9:08 UTC (permalink / raw)
  To: Jinsong Liu, Will Auld
  Cc: Ashok Raj, Donald D Dugger, Haitao Shan, Jun Nakajima, Susie Li,
	Tony Luck, Xiantao Zhang, Yunhong Jiang, xen-devel, linux-kernel,
	Keir Fraser

>>> On 28.06.12 at 10:54, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>> The 2-bank approach also needs to be brought in line with the
>> current host derived value in MCG_CAP, especially if the code to
>> implement this new model doesn't make it into 4.2 (which would
>> generally save a larger value).
> 
> Let me repeat in my word to avoid misunderstanding about your concern:
> Your concern rooted from the history patch (c/s 24887, as attached) which 
> used to solve vMCE migration issue caused from bank number. I guess the patch 
> was not in xen4.1.x but would be in xen 4.2 release recently (right? and when 
> will xen 4.2 release?)

4.2 is in feature freeze right now, preparing for the release.

> Per my understanding, you want us to make sure our new vMCE model compatible 
> w/ olde vMCE. For example if our patch in xen 4.3 release, you want to make 
> sure a guest migrate from xen 4.2 to 4.3 would not broken. Is this your 
> concern?

Yes. If we can't get the save/restore records adjusted in time
for 4.2, compatibility with 4.2 would be a requirement. If we
manage to get the necessary adjustments done in time, and if
they're not too intrusive (i.e. would be acceptable at this late
stage of the development cycle), then the concern could be
dropped from an upstream perspective due to the lack of
support in 4.1 (and hence no backward compatibility issues).
BUT this isn't as simple from a product usage point of view: As
the save/restore code currently in -unstable was coded up to
address a problem observed by SLE11 SP2 users, we already
backported those patches. So compatibility will be a requirement
in any case.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28  9:08     ` Jan Beulich
@ 2012-06-28  9:40       ` Liu, Jinsong
  2012-06-28  9:55         ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Jinsong @ 2012-06-28  9:40 UTC (permalink / raw)
  To: Jan Beulich, Auld, Will
  Cc: Raj, Ashok, Dugger, Donald D, Shan, Haitao, Nakajima, Jun, Li,
	Susie, Luck, Tony, Zhang, Xiantao, Jiang, Yunhong, xen-devel,
	linux-kernel, Keir Fraser

Jan Beulich wrote:
>>>> On 28.06.12 at 10:54, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>> The 2-bank approach also needs to be brought in line with the
>>> current host derived value in MCG_CAP, especially if the code to
>>> implement this new model doesn't make it into 4.2 (which would
>>> generally save a larger value).
>> 
>> Let me repeat in my word to avoid misunderstanding about your
>> concern: 
>> Your concern rooted from the history patch (c/s 24887, as attached)
>> which used to solve vMCE migration issue caused from bank number. I
>> guess the patch was not in xen4.1.x but would be in xen 4.2 release
>> recently (right? and when will xen 4.2 release?)
> 
> 4.2 is in feature freeze right now, preparing for the release.
> 
>> Per my understanding, you want us to make sure our new vMCE model
>> compatible w/ olde vMCE. For example if our patch in xen 4.3
>> release, you want to make sure a guest migrate from xen 4.2 to 4.3
>> would not broken. Is this your concern?
> 
> Yes. If we can't get the save/restore records adjusted in time
> for 4.2, compatibility with 4.2 would be a requirement. If we
> manage to get the necessary adjustments done in time, and if
> they're not too intrusive (i.e. would be acceptable at this late
> stage of the development cycle), then the concern could be
> dropped from an upstream perspective due to the lack of
> support in 4.1 (and hence no backward compatibility issues).
> BUT this isn't as simple from a product usage point of view: As
> the save/restore code currently in -unstable was coded up to
> address a problem observed by SLE11 SP2 users, we already
> backported those patches. So compatibility will be a requirement
> in any case.
> 
> Jan

A basic point of new vMCE is, it get rid of old vMCE, start setting up a new model from the very beginning. From coding point of view, backward compatibility issue would be dirty and troublesome.

The point is, old vMCE interface is host-based while new vMCE is pure s/w defined, hence troubles come from the interface heterogeneous (if need keep compatibility). The basic assumption of live migration from A to B is, A and B basically at same page, so it could be migrated by setting the smallest common feature/capability set (via cpuid, command line, etc.). However, old vMCE and new vMCE are quite different and cannot controlled effectively. For example, old vMCE has MCG_CTL but new vMCE doesn't, and new vMCE has CMCI support (and MCi_CTL2) but old vMCE doesn't. I even doubt the feasibility of keeping compatibility w/ old vMCE. An example is, it's hard to migrate between Intel cpu and AMD cpu.

So I would like to push new vMCE as quickly as possible. What's the timeline of vMCE developing that xen 4.2 could accept? I wonder if we could make major components of vMCE done before xen 4.2 timeline, and leave the surrounding features and the corner cases done later?

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28  9:40       ` Liu, Jinsong
@ 2012-06-28  9:55         ` Jan Beulich
  2012-06-28  9:58           ` [Xen-devel] " Ian Campbell
  2012-06-28 13:38           ` Liu, Jinsong
  0 siblings, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2012-06-28  9:55 UTC (permalink / raw)
  To: Jinsong Liu, Will Auld
  Cc: Ashok Raj, Donald D Dugger, Haitao Shan, Jun Nakajima, Susie Li,
	Tony Luck, Xiantao Zhang, Yunhong Jiang, xen-devel, linux-kernel,
	Keir Fraser

>>> On 28.06.12 at 11:40, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> So I would like to push new vMCE as quickly as possible. What's the timeline 
> of vMCE developing that xen 4.2 could accept?

Weeks ago, really. See http://lists.xen.org/archives/html/xen-devel/2012-06/msg01619.html
and follow-ups - we'd really only consider getting the save/restore
interface into forward compatible shape as acceptable.

> I wonder if we could make major 
> components of vMCE done before xen 4.2 timeline, and leave the surrounding 
> features and the corner cases done later?

Unfortunately it's likely going to be even less. However, if split
that way, chances are things could go into e.g. 4.2.1.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28  9:55         ` Jan Beulich
@ 2012-06-28  9:58           ` Ian Campbell
  2012-06-28 13:38           ` Liu, Jinsong
  1 sibling, 0 replies; 20+ messages in thread
From: Ian Campbell @ 2012-06-28  9:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jinsong Liu, Will Auld, Tony Luck, Keir (Xen.org),
	Ashok Raj, Yunhong Jiang, Susie Li, Haitao Shan, linux-kernel,
	Donald D Dugger, xen-devel, Jun Nakajima, Xiantao Zhang

On Thu, 2012-06-28 at 10:55 +0100, Jan Beulich wrote:
> >>> On 28.06.12 at 11:40, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> > So I would like to push new vMCE as quickly as possible. What's the timeline 
> > of vMCE developing that xen 4.2 could accept?
> 
> Weeks ago, really. See http://lists.xen.org/archives/html/xen-devel/2012-06/msg01619.html
> and follow-ups - we'd really only consider getting the save/restore
> interface into forward compatible shape as acceptable.

Yes it really is far to late to considering entire new features at this
stage.

Ian.

> 
> > I wonder if we could make major 
> > components of vMCE done before xen 4.2 timeline, and leave the surrounding 
> > features and the corner cases done later?
> 
> Unfortunately it's likely going to be even less. However, if split
> that way, chances are things could go into e.g. 4.2.1.
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28  9:55         ` Jan Beulich
  2012-06-28  9:58           ` [Xen-devel] " Ian Campbell
@ 2012-06-28 13:38           ` Liu, Jinsong
  2012-06-28 14:00             ` Jan Beulich
  1 sibling, 1 reply; 20+ messages in thread
From: Liu, Jinsong @ 2012-06-28 13:38 UTC (permalink / raw)
  To: Jan Beulich, Auld, Will
  Cc: Raj, Ashok, Dugger, Donald D, Shan, Haitao, Nakajima, Jun, Li,
	Susie, Luck, Tony, Zhang, Xiantao, Jiang, Yunhong, xen-devel,
	linux-kernel, Keir Fraser

Jan Beulich wrote:
>>>> On 28.06.12 at 11:40, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> So I would like to push new vMCE as quickly as possible. What's the
>> timeline of vMCE developing that xen 4.2 could accept?
> 
> Weeks ago, really. See
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01619.html
> and follow-ups - we'd really only consider getting the save/restore 
> interface into forward compatible shape as acceptable.
> 
>> I wonder if we could make major
>> components of vMCE done before xen 4.2 timeline, and leave the
>> surrounding features and the corner cases done later?
> 
> Unfortunately it's likely going to be even less. However, if split
> that way, chances are things could go into e.g. 4.2.1.
> 
> Jan

So let's look at current vMCE status first:
1). functionally it work abnormally for guest (but tolerated by some guest like linux/solaris);
2). before xen 4.1 it blocks migration when migrate from big bank to small bank platform;

We may try some middle steps, minimal adjusting for xen 4.2 release (to avoid futher compatible issue at xen 4.2.1, 4.3, ...):
1). we don't handle vMCE function bugs, only make sure migration works OK;
2). update vMCE interface to a middle clean status:
    * remove MCG_CTL (otherwise we have to add this useless MSR at new vMCE);
    * stick all 1's to MCi_CTL (avoid semantic difference);
    * for MCG_CAP, clear MCG_CTL_P, limit to 2 banks (otherwise dirty code have to be added at new vMCE);

Thoughts?

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28 13:38           ` Liu, Jinsong
@ 2012-06-28 14:00             ` Jan Beulich
  2012-06-28 17:02               ` Liu, Jinsong
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2012-06-28 14:00 UTC (permalink / raw)
  To: Jinsong Liu, Will Auld
  Cc: Ashok Raj, Donald D Dugger, Haitao Shan, Jun Nakajima, Susie Li,
	Tony Luck, Xiantao Zhang, Yunhong Jiang, xen-devel, linux-kernel,
	Keir Fraser

>>> On 28.06.12 at 15:38, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 28.06.12 at 11:40, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> So I would like to push new vMCE as quickly as possible. What's the
>>> timeline of vMCE developing that xen 4.2 could accept?
>> 
>> Weeks ago, really. See
>> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01619.html 
>> and follow-ups - we'd really only consider getting the save/restore 
>> interface into forward compatible shape as acceptable.
>> 
>>> I wonder if we could make major
>>> components of vMCE done before xen 4.2 timeline, and leave the
>>> surrounding features and the corner cases done later?
>> 
>> Unfortunately it's likely going to be even less. However, if split
>> that way, chances are things could go into e.g. 4.2.1.
>> 
>> Jan
> 
> So let's look at current vMCE status first:
> 1). functionally it work abnormally for guest (but tolerated by some guest 
> like linux/solaris);
> 2). before xen 4.1 it blocks migration when migrate from big bank to small 
> bank platform;

Before 4.2 you mean (in 4.1 we only have this as a backport in SLE11).

> We may try some middle steps, minimal adjusting for xen 4.2 release (to 
> avoid futher compatible issue at xen 4.2.1, 4.3, ...):
> 1). we don't handle vMCE function bugs, only make sure migration works OK;

That's the minimal goal.

> 2). update vMCE interface to a middle clean status:
>     * remove MCG_CTL (otherwise we have to add this useless MSR at new 
> vMCE);
>     * stick all 1's to MCi_CTL (avoid semantic difference);
>     * for MCG_CAP, clear MCG_CTL_P, limit to 2 banks (otherwise dirty code 
> have to be added at new vMCE);

Whether that's acceptable would need to be seen when code
is ready.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28 14:00             ` Jan Beulich
@ 2012-06-28 17:02               ` Liu, Jinsong
  2012-06-29  9:58                 ` [Xen-devel] " Christoph Egger
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Jinsong @ 2012-06-28 17:02 UTC (permalink / raw)
  To: Jan Beulich, Auld, Will, Ian Campbell
  Cc: Raj, Ashok, Dugger, Donald D, Shan, Haitao, Nakajima, Jun, Li,
	Susie, Luck, Tony, Zhang, Xiantao, Jiang, Yunhong, xen-devel,
	linux-kernel, Keir Fraser

Jan Beulich wrote:
>>>> On 28.06.12 at 15:38, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 28.06.12 at 11:40, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote: 
>>>> So I would like to push new vMCE as quickly as possible. What's the
>>>> timeline of vMCE developing that xen 4.2 could accept?
>>> 
>>> Weeks ago, really. See
>>> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01619.html
>>> and follow-ups - we'd really only consider getting the save/restore
>>> interface into forward compatible shape as acceptable.
>>> 
>>>> I wonder if we could make major
>>>> components of vMCE done before xen 4.2 timeline, and leave the
>>>> surrounding features and the corner cases done later?
>>> 
>>> Unfortunately it's likely going to be even less. However, if split
>>> that way, chances are things could go into e.g. 4.2.1.
>>> 
>>> Jan
>> 
>> So let's look at current vMCE status first:
>> 1). functionally it work abnormally for guest (but tolerated by some
>> guest like linux/solaris); 2). before xen 4.1 it blocks migration
>> when migrate from big bank to small bank platform;
> 
> Before 4.2 you mean (in 4.1 we only have this as a backport in SLE11).

Yes.

> 
>> We may try some middle steps, minimal adjusting for xen 4.2 release
>> (to avoid futher compatible issue at xen 4.2.1, 4.3, ...):
>> 1). we don't handle vMCE function bugs, only make sure migration
>> works OK; 
> 
> That's the minimal goal.

You mean to fix current vMCE function bugs in xen 4.2? That would involve much work hence too late for xen 4.2. In fact the bugs currently tolerated by guest, so it's important but non-urgent.

What we need to do urgently is to adjust current vMCE interface a little so that
1). it would not block xen 4.2 live migration
2). it would not bring compatibility issues to new vMCE in the future
These 2 points are our minimal targets for xen 4.2

Thanks,
Jinsong

> 
>> 2). update vMCE interface to a middle clean status:
>>     * remove MCG_CTL (otherwise we have to add this useless MSR at
>> new vMCE); 
>>     * stick all 1's to MCi_CTL (avoid semantic difference);
>>     * for MCG_CAP, clear MCG_CTL_P, limit to 2 banks (otherwise
>> dirty code have to be added at new vMCE);
> 
> Whether that's acceptable would need to be seen when code
> is ready.
> 
> Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-06-28 17:02               ` Liu, Jinsong
@ 2012-06-29  9:58                 ` Christoph Egger
  2012-07-02 17:32                   ` Liu, Jinsong
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Egger @ 2012-06-29  9:58 UTC (permalink / raw)
  To: Liu, Jinsong
  Cc: Jan Beulich, Auld, Will, Ian Campbell, Luck, Tony, Keir Fraser,
	Raj, Ashok, Jiang, Yunhong, Li, Susie, Shan, Haitao,
	linux-kernel, Dugger, Donald D, xen-devel, Nakajima, Jun, Zhang,
	Xiantao


Feedback from the AMD side:

slide 2:
- PV guests are supposed to install a MCE trap handler
  which reads the MSR values from struct mcinfo_bank.
  Hence it is unclear where the #GP should come from.
  Same for HVM guests which have a PV MCE "driver"
  (those are very rare in reality).

slide 3:
- unclear what "Weird per-domain MSRs" means
- unclear what "Unnatural MCE injection semantics" means

slide 4:
- typo: interace -> interface :-)
- enable UCR-related capabilities, but only on Intel machines
- Filter non-SRAO/SRAR banks:
  Rename it to "Let guest see northbridge bank only to the guest"

slide 7:
- ignore/disable CMCI and CTL2 on AMD

slide 8:
- Filter non-SRAO/SRAR banks:
  Rename it to "Let guest see northbridge bank only to the guest"
- Question: Should we allow the guest to inject errors? Does it make
  sense?
- always disable MCi_CTL2 on AMD

slide 9:
- Model specific issue: Also affects AMD as some models have
  l3 cache and some do not.
  E.g. it does not make sense to report l3 cache errors to guests


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-06-29  9:58                 ` [Xen-devel] " Christoph Egger
@ 2012-07-02 17:32                   ` Liu, Jinsong
  2012-07-03  7:16                     ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Jinsong @ 2012-07-02 17:32 UTC (permalink / raw)
  To: Christoph Egger
  Cc: Jan Beulich, Auld, Will, Ian Campbell, Luck, Tony, Keir Fraser,
	Raj, Ashok, Jiang, Yunhong, Li, Susie, Shan, Haitao,
	linux-kernel, Dugger, Donald D, xen-devel, Nakajima, Jun, Zhang,
	Xiantao

Thanks AMD's feedback :)

This vMCE design foils is basically for Intel MCA, involving many details specific to Intel.
I agree that for x86 Intel and AMD can share logic in many fields. However, for MCA logic Intel and AMD are quite different, like
1. MSRs interface, e.g. MCG_CAP, MCi_MICS, MCi_CTL2, etc;
2. error injection, AMD provide NMI/single MCE/broadcast MCE, while in our design only concern broadcast MCE# (and pretend to expose CMCI);
3. MCE handler: currently in xen Intel and AMD mce use different triggle method and mce handler;

Considering the big difference, I suggest we separately provide Intel vMCE and AMD vMCE (i.e. vmce_intel.c and vmce_amd.c).

Thanks,
Jinsong

Christoph Egger wrote:
> Feedback from the AMD side:
> 
> slide 2:
> - PV guests are supposed to install a MCE trap handler
>   which reads the MSR values from struct mcinfo_bank.
>   Hence it is unclear where the #GP should come from.
>   Same for HVM guests which have a PV MCE "driver"
>   (those are very rare in reality).
> 
> slide 3:
> - unclear what "Weird per-domain MSRs" means
> - unclear what "Unnatural MCE injection semantics" means
> 
> slide 4:
> - typo: interace -> interface :-)
> - enable UCR-related capabilities, but only on Intel machines
> - Filter non-SRAO/SRAR banks:
>   Rename it to "Let guest see northbridge bank only to the guest"
> 
> slide 7:
> - ignore/disable CMCI and CTL2 on AMD
> 
> slide 8:
> - Filter non-SRAO/SRAR banks:
>   Rename it to "Let guest see northbridge bank only to the guest"
> - Question: Should we allow the guest to inject errors? Does it make
>   sense?
> - always disable MCi_CTL2 on AMD
> 
> slide 9:
> - Model specific issue: Also affects AMD as some models have
>   l3 cache and some do not.
>   E.g. it does not make sense to report l3 cache errors to guests


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-07-02 17:32                   ` Liu, Jinsong
@ 2012-07-03  7:16                     ` Jan Beulich
  2012-07-03  9:45                       ` Christoph Egger
  2012-07-03 13:26                       ` Luck, Tony
  0 siblings, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2012-07-03  7:16 UTC (permalink / raw)
  To: Christoph Egger, Jinsong Liu
  Cc: IanCampbell, Ashok Raj, Donald D Dugger, Haitao Shan,
	Jun Nakajima, Susie Li, Tony Luck, Will Auld, Xiantao Zhang,
	Yunhong Jiang, xen-devel, linux-kernel, KeirFraser

>>> On 02.07.12 at 19:32, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Thanks AMD's feedback :)
> 
> This vMCE design foils is basically for Intel MCA, involving many details 
> specific to Intel.
> I agree that for x86 Intel and AMD can share logic in many fields. However, 
> for MCA logic Intel and AMD are quite different, like
> 1. MSRs interface, e.g. MCG_CAP, MCi_MICS, MCi_CTL2, etc;
> 2. error injection, AMD provide NMI/single MCE/broadcast MCE, while in our 
> design only concern broadcast MCE# (and pretend to expose CMCI);
> 3. MCE handler: currently in xen Intel and AMD mce use different triggle 
> method and mce handler;
> 
> Considering the big difference, I suggest we separately provide Intel vMCE 
> and AMD vMCE (i.e. vmce_intel.c and vmce_amd.c).

I'm not convinced of the need, and would prefer aiming at a
shared implementation unless issues arise that make this
impossible.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-07-03  7:16                     ` Jan Beulich
@ 2012-07-03  9:45                       ` Christoph Egger
  2012-07-03 13:26                       ` Luck, Tony
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Egger @ 2012-07-03  9:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jinsong Liu, IanCampbell, Ashok Raj, Donald D Dugger,
	Haitao Shan, Jun Nakajima, Susie Li, Tony Luck, Will Auld,
	Xiantao Zhang, Yunhong Jiang, xen-devel, linux-kernel,
	KeirFraser

On 07/03/12 09:16, Jan Beulich wrote:

>>>> On 02.07.12 at 19:32, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Thanks AMD's feedback :)
>>
>> This vMCE design foils is basically for Intel MCA, involving many details 
>> specific to Intel.
>> I agree that for x86 Intel and AMD can share logic in many fields. However, 
>> for MCA logic Intel and AMD are quite different, like
>> 1. MSRs interface, e.g. MCG_CAP, MCi_MICS, MCi_CTL2, etc;
>> 2. error injection, AMD provide NMI/single MCE/broadcast MCE, while in our 
>> design only concern broadcast MCE# (and pretend to expose CMCI);
>> 3. MCE handler: currently in xen Intel and AMD mce use different triggle 
>> method and mce handler;
>>
>> Considering the big difference, I suggest we separately provide Intel vMCE 
>> and AMD vMCE (i.e. vmce_intel.c and vmce_amd.c).
> 
> I'm not convinced of the need, and would prefer aiming at a
> shared implementation unless issues arise that make this
> impossible.


I have patches ready that do that. About 80% of mce_intel.c is not
Intel specific. I am just waiting for the feature freeze to end...

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-07-03  7:16                     ` Jan Beulich
  2012-07-03  9:45                       ` Christoph Egger
@ 2012-07-03 13:26                       ` Luck, Tony
  2012-07-03 14:50                         ` Christoph Egger
  1 sibling, 1 reply; 20+ messages in thread
From: Luck, Tony @ 2012-07-03 13:26 UTC (permalink / raw)
  To: Jan Beulich, Christoph Egger, Liu, Jinsong
  Cc: IanCampbell, Raj, Ashok, Dugger, Donald D, Shan, Haitao,
	Nakajima, Jun, Li, Susie, Auld, Will, Zhang, Xiantao, Jiang,
	Yunhong, xen-devel, linux-kernel, KeirFraser

> I'm not convinced of the need, and would prefer aiming at a
> shared implementation unless issues arise that make this
> impossible.

It does sound odd. Yes, Intel and AMD have differences around CMCI ... but we are never
going to send a CMCI to a guest (there is no point, it can't do anything useful with the
information, it may do something pointlessly stupid like stop using a guest physical page).
The only reason I suggested making MCG_CAP pretend that CMCI was supported was a
small optimization ... if a Linux guest sees that CMCI is supported, it will not poll the machine
check banks looking for corrected errors.

-Tony

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-07-03 13:26                       ` Luck, Tony
@ 2012-07-03 14:50                         ` Christoph Egger
  2012-07-03 15:08                           ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Egger @ 2012-07-03 14:50 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Jan Beulich, Liu, Jinsong, IanCampbell, Raj, Ashok, Dugger,
	Donald D, Shan, Haitao, Nakajima, Jun, Li, Susie, Auld, Will,
	Zhang, Xiantao, Jiang, Yunhong, xen-devel, linux-kernel,
	KeirFraser

On 07/03/12 15:26, Luck, Tony wrote:

>> I'm not convinced of the need, and would prefer aiming at a
>> shared implementation unless issues arise that make this
>> impossible.
> 
> It does sound odd. Yes, Intel and AMD have differences around CMCI ... but we are never
> going to send a CMCI to a guest (there is no point, it can't do anything useful with the
> information, it may do something pointlessly stupid like stop using a guest physical page).
> The only reason I suggested making MCG_CAP pretend that CMCI was supported was a
> small optimization ... if a Linux guest sees that CMCI is supported, it will not poll the machine
> check banks looking for corrected errors.


Are you talking about PV or HVM guest?

For HVM guests yes it makes sense to disable CMCI in MCG_CAP for both
AMD and Intel.

PV guests never read MCE MSRs directly. They install a trap handler and
use the hypercall to fetch the error telemetry. The xen interface is
common to both AMD and Intel - it's basically just an array of bytes.
The first bytes tell you how you can cast and interpret the bytes.
That *content* is specific to AMD or Intel.

How much logic can be shared is almost a matter of software design
and not hardware design.

Christoph

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-07-03 14:50                         ` Christoph Egger
@ 2012-07-03 15:08                           ` Jan Beulich
  2012-07-03 15:19                             ` Auld, Will
  2012-07-03 15:33                               ` Christoph Egger
  0 siblings, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2012-07-03 15:08 UTC (permalink / raw)
  To: Christoph Egger, Tony Luck
  Cc: IanCampbell, Ashok Raj, Donald D Dugger, Haitao Shan,
	Jinsong Liu, Jun Nakajima, Susie Li, Will Auld, Xiantao Zhang,
	Yunhong Jiang, xen-devel, linux-kernel, KeirFraser

>>> On 03.07.12 at 16:50, Christoph Egger <Christoph.Egger@amd.com> wrote:
> On 07/03/12 15:26, Luck, Tony wrote:
> 
>>> I'm not convinced of the need, and would prefer aiming at a
>>> shared implementation unless issues arise that make this
>>> impossible.
>> 
>> It does sound odd. Yes, Intel and AMD have differences around CMCI ... but 
> we are never
>> going to send a CMCI to a guest (there is no point, it can't do anything 
> useful with the
>> information, it may do something pointlessly stupid like stop using a guest 
> physical page).
>> The only reason I suggested making MCG_CAP pretend that CMCI was supported 
> was a
>> small optimization ... if a Linux guest sees that CMCI is supported, it will 
> not poll the machine
>> check banks looking for corrected errors.
> 
> 
> Are you talking about PV or HVM guest?
> 
> For HVM guests yes it makes sense to disable CMCI in MCG_CAP for both
> AMD and Intel.

"enable" you mean?

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [xen vMCE RFC V0.2] xen vMCE design
  2012-07-03 15:08                           ` Jan Beulich
@ 2012-07-03 15:19                             ` Auld, Will
  2012-07-03 15:33                               ` Christoph Egger
  1 sibling, 0 replies; 20+ messages in thread
From: Auld, Will @ 2012-07-03 15:19 UTC (permalink / raw)
  To: Jan Beulich, Christoph Egger, Luck, Tony
  Cc: Liu, Jinsong, Jiang, Yunhong, KeirFraser, IanCampbell, Li, Susie,
	Shan, Haitao, linux-kernel, Dugger, Donald D, xen-devel,
	Nakajima, Jun, Zhang, Xiantao, Raj, Ashok

[-- Attachment #1: Type: text/plain, Size: 1593 bytes --]

Here is the additional document that we mentioned on our proposal. It should be in close agreement with previous information and the discussion. 

Thanks,

Will 

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Tuesday, July 03, 2012 8:08 AM
To: Christoph Egger; Luck, Tony
Cc: IanCampbell; Raj, Ashok; Dugger, Donald D; Shan, Haitao; Liu, Jinsong; Nakajima, Jun; Li, Susie; Auld, Will; Zhang, Xiantao; Jiang, Yunhong; xen-devel@lists.xensource.com; linux-kernel@vger.kernel.org; KeirFraser
Subject: Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design

>>> On 03.07.12 at 16:50, Christoph Egger <Christoph.Egger@amd.com> wrote:
> On 07/03/12 15:26, Luck, Tony wrote:
> 
>>> I'm not convinced of the need, and would prefer aiming at a shared 
>>> implementation unless issues arise that make this impossible.
>> 
>> It does sound odd. Yes, Intel and AMD have differences around CMCI 
>> ... but
> we are never
>> going to send a CMCI to a guest (there is no point, it can't do 
>> anything
> useful with the
>> information, it may do something pointlessly stupid like stop using a 
>> guest
> physical page).
>> The only reason I suggested making MCG_CAP pretend that CMCI was 
>> supported
> was a
>> small optimization ... if a Linux guest sees that CMCI is supported, 
>> it will
> not poll the machine
>> check banks looking for corrected errors.
> 
> 
> Are you talking about PV or HVM guest?
> 
> For HVM guests yes it makes sense to disable CMCI in MCG_CAP for both 
> AMD and Intel.

"enable" you mean?

Jan


[-- Attachment #2: XenMachineCheckArchitecturev0.5.pdf --]
[-- Type: application/pdf, Size: 4935817 bytes --]

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
  2012-07-03 15:08                           ` Jan Beulich
@ 2012-07-03 15:33                               ` Christoph Egger
  2012-07-03 15:33                               ` Christoph Egger
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Egger @ 2012-07-03 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tony Luck, IanCampbell, Ashok Raj, Donald D Dugger, Haitao Shan,
	Jinsong Liu, Jun Nakajima, Susie Li, Will Auld, Xiantao Zhang,
	Yunhong Jiang, xen-devel, linux-kernel, KeirFraser

On 07/03/12 17:08, Jan Beulich wrote:

>>>> On 03.07.12 at 16:50, Christoph Egger <Christoph.Egger@amd.com> wrote:
>> On 07/03/12 15:26, Luck, Tony wrote:
>>
>>>> I'm not convinced of the need, and would prefer aiming at a
>>>> shared implementation unless issues arise that make this
>>>> impossible.
>>>
>>> It does sound odd. Yes, Intel and AMD have differences around CMCI ... but 
>> we are never
>>> going to send a CMCI to a guest (there is no point, it can't do anything 
>> useful with the
>>> information, it may do something pointlessly stupid like stop using a guest 
>> physical page).
>>> The only reason I suggested making MCG_CAP pretend that CMCI was supported 
>> was a
>>> small optimization ... if a Linux guest sees that CMCI is supported, it will 
>> not poll the machine
>>> check banks looking for corrected errors.
>>
>>
>> Are you talking about PV or HVM guest?
>>
>> For HVM guests yes it makes sense to disable CMCI in MCG_CAP for both
>> AMD and Intel.
> 
> "enable" you mean?


Oh, sorry. I misread one sentence.
If CMCI for the guest is really just emulated (= 100% software)
then yes, enable for both AMD and Intel is fine.

Christoph

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [xen vMCE RFC V0.2] xen vMCE design
@ 2012-07-03 15:33                               ` Christoph Egger
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Egger @ 2012-07-03 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tony Luck, IanCampbell, Ashok Raj, Donald D Dugger, Haitao Shan,
	Jinsong Liu, Jun Nakajima, Susie Li, Will Auld, Xiantao Zhang,
	Yunhong Jiang, xen-devel, linux-kernel, KeirFraser

On 07/03/12 17:08, Jan Beulich wrote:

>>>> On 03.07.12 at 16:50, Christoph Egger <Christoph.Egger@amd.com> wrote:
>> On 07/03/12 15:26, Luck, Tony wrote:
>>
>>>> I'm not convinced of the need, and would prefer aiming at a
>>>> shared implementation unless issues arise that make this
>>>> impossible.
>>>
>>> It does sound odd. Yes, Intel and AMD have differences around CMCI ... but 
>> we are never
>>> going to send a CMCI to a guest (there is no point, it can't do anything 
>> useful with the
>>> information, it may do something pointlessly stupid like stop using a guest 
>> physical page).
>>> The only reason I suggested making MCG_CAP pretend that CMCI was supported 
>> was a
>>> small optimization ... if a Linux guest sees that CMCI is supported, it will 
>> not poll the machine
>>> check banks looking for corrected errors.
>>
>>
>> Are you talking about PV or HVM guest?
>>
>> For HVM guests yes it makes sense to disable CMCI in MCG_CAP for both
>> AMD and Intel.
> 
> "enable" you mean?


Oh, sorry. I misread one sentence.
If CMCI for the guest is really just emulated (= 100% software)
then yes, enable for both AMD and Intel is fine.

Christoph

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-07-03 15:33 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-27  3:51 [xen vMCE RFC V0.2] xen vMCE design Liu, Jinsong
2012-06-27 13:14 ` Jan Beulich
2012-06-28  8:54   ` Liu, Jinsong
2012-06-28  9:08     ` Jan Beulich
2012-06-28  9:40       ` Liu, Jinsong
2012-06-28  9:55         ` Jan Beulich
2012-06-28  9:58           ` [Xen-devel] " Ian Campbell
2012-06-28 13:38           ` Liu, Jinsong
2012-06-28 14:00             ` Jan Beulich
2012-06-28 17:02               ` Liu, Jinsong
2012-06-29  9:58                 ` [Xen-devel] " Christoph Egger
2012-07-02 17:32                   ` Liu, Jinsong
2012-07-03  7:16                     ` Jan Beulich
2012-07-03  9:45                       ` Christoph Egger
2012-07-03 13:26                       ` Luck, Tony
2012-07-03 14:50                         ` Christoph Egger
2012-07-03 15:08                           ` Jan Beulich
2012-07-03 15:19                             ` Auld, Will
2012-07-03 15:33                             ` [Xen-devel] " Christoph Egger
2012-07-03 15:33                               ` Christoph Egger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.