* [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
@ 2012-06-13  8:05 Liu, Jinsong
  2012-06-13  8:53 ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Liu, Jinsong @ 2012-06-13  8:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Shan, Haitao, Jiang, Yunhong, Keir Fraser,
	'xen-devel@lists.xensource.com',
	Zhang, Xiantao

Xen vMCE bugfix: inject vMCE# to all vcpus

In our testing of Win8 guest MCE, we found a bug: no matter what SRAO/SRAR
error Xen injects into the Win8 guest, the guest always reboots.

The root cause is that the current Xen vMCE logic injects vMCE# only to vcpu0,
which is not correct for Intel MCE (under the Intel architecture, hardware
generates MCE# to all CPUs).

This patch fixes the vMCE injection bug by injecting vMCE# to all vcpus.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

diff -r 19c15f3dfe1f xen/arch/x86/cpu/mcheck/mce.h
--- a/xen/arch/x86/cpu/mcheck/mce.h	Tue Jun 05 03:18:00 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/mce.h	Wed Jun 13 23:40:45 2012 +0800
@@ -167,7 +167,6 @@
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
         uint64_t gstatus);
-int inject_vmce(struct domain *d);
 int vmce_domain_inject(struct mcinfo_bank *bank, struct domain *d, struct mcinfo_global *global);
 
 extern int vmce_init(struct cpuinfo_x86 *c);
diff -r 19c15f3dfe1f xen/arch/x86/cpu/mcheck/mce_intel.c
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c	Tue Jun 05 03:18:00 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c	Wed Jun 13 23:40:45 2012 +0800
@@ -638,6 +638,32 @@
     return rec;
 }
 
+static int inject_vmce(struct domain *d)
+{
+    struct vcpu *v;
+
+    /* inject vMCE to all vcpus */
+    for_each_vcpu(d, v)
+    {
+        if ( !test_and_set_bool(v->mce_pending) &&
+            ((d->is_hvm) ? 1 :
+            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
+        {
+            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
+                       d->domain_id, v->vcpu_id);
+            vcpu_kick(v);
+        }
+        else
+        {
+            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
+                       d->domain_id, v->vcpu_id);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 static void intel_memerr_dhandler(
              struct mca_binfo *binfo,
              enum mce_result *result,
@@ -718,11 +744,8 @@
 
                 /* We will inject vMCE to DOMU*/
                 if ( inject_vmce(d) < 0 )
-                {
-                    mce_printk(MCE_QUIET, "inject vMCE to DOM%d"
-                      " failed\n", d->domain_id);
                     goto vmce_failed;
-                }
+
                 /* Impacted domain go on with domain's recovery job
                  * if the domain has its own MCA handler.
                  * For xen, it has contained the error and finished
diff -r 19c15f3dfe1f xen/arch/x86/cpu/mcheck/vmce.c
--- a/xen/arch/x86/cpu/mcheck/vmce.c	Tue Jun 05 03:18:00 2012 +0800
+++ b/xen/arch/x86/cpu/mcheck/vmce.c	Wed Jun 13 23:40:45 2012 +0800
@@ -389,53 +389,6 @@
 HVM_REGISTER_SAVE_RESTORE(VMCE_VCPU, vmce_save_vcpu_ctxt,
                           vmce_load_vcpu_ctxt, 1, HVMSR_PER_VCPU);
 
-int inject_vmce(struct domain *d)
-{
-    int cpu = smp_processor_id();
-
-    /* PV guest and HVM guest have different vMCE# injection methods. */
-    if ( !test_and_set_bool(d->vcpu[0]->mce_pending) )
-    {
-        if ( d->is_hvm )
-        {
-            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to HVM DOM %d\n",
-                       d->domain_id);
-            vcpu_kick(d->vcpu[0]);
-        }
-        else
-        {
-            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to PV DOM%d\n",
-                       d->domain_id);
-            if ( guest_has_trap_callback(d, 0, TRAP_machine_check) )
-            {
-                cpumask_copy(d->vcpu[0]->cpu_affinity_tmp,
-                             d->vcpu[0]->cpu_affinity);
-                mce_printk(MCE_VERBOSE, "MCE: CPU%d set affinity, old %d\n",
-                           cpu, d->vcpu[0]->processor);
-                vcpu_set_affinity(d->vcpu[0], cpumask_of(cpu));
-                vcpu_kick(d->vcpu[0]);
-            }
-            else
-            {
-                mce_printk(MCE_VERBOSE,
-                           "MCE: Kill PV guest with No MCE handler\n");
-                domain_crash(d);
-            }
-        }
-    }
-    else
-    {
-        /* new vMCE comes while first one has not been injected yet,
-         * in this case, inject fail. [We can't lose this vMCE for
-         * the mce node's consistency].
-         */
-        mce_printk(MCE_QUIET, "There's a pending vMCE waiting to be injected "
-                   " to this DOM%d!\n", d->domain_id);
-        return -1;
-    }
-    return 0;
-}
-
 /* This node list records errors impacting a domain. when one
  * MCE# happens, one error bank impacts a domain. This error node
  * will be inserted to the tail of the per_dom data for vMCE# MSR


* Re: [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
  2012-06-13  8:05 [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus Liu, Jinsong
@ 2012-06-13  8:53 ` Jan Beulich
  2012-06-13 10:49   ` Liu, Jinsong
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2012-06-13  8:53 UTC (permalink / raw)
  To: Jinsong Liu
  Cc: Keir Fraser, Yunhong Jiang, Haitao Shan, Xiantao Zhang, xen-devel

>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Xen vMCE bugfix: inject vMCE# to all vcpus
> 
> In our test for win8 guest mce, we find a bug in that no matter what 
> SRAO/SRAR
> error xen inject to win8 guest, it always reboot.
> 
> The root cause is, current Xen vMCE logic inject vMCE# only to vcpu0, this 
> is
> not correct for Intel MCE (Under Intel arch, h/w generate MCE# to all CPUs).
> 
> This patch fix vMCE injection bug, injecting vMCE# to all vcpus.

I see no correlation between the fix (and its description) and the
problem at hand: Why would Win8 reboot if it doesn't receive a
particular MCE on all CPUs? Isn't that model specific behavior?

Furthermore I doubt that an MCE on one socket indeed causes
MCE-s on all other sockets, not to speak of distinct NUMA nodes
(it would already surprise me if MCE-s got broadcast across cores
within a socket, unless they are caused by a resource shared
across cores).

> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c	Tue Jun 05 03:18:00 2012 +0800
> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c	Wed Jun 13 23:40:45 2012 +0800
> @@ -638,6 +638,32 @@
>      return rec;
>  }
>  
> +static int inject_vmce(struct domain *d)

Is it really necessary to move this vendor independent function
into a vendor specific source file?

> +{
> +    struct vcpu *v;
> +
> +    /* inject vMCE to all vcpus */
> +    for_each_vcpu(d, v)
> +    {
> +        if ( !test_and_set_bool(v->mce_pending) &&
> +            ((d->is_hvm) ? 1 :
> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )

Quite strange a way to say

            (d->is_hvm || guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check))
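
With that, the check inside the loop would read roughly as follows (a sketch
only, assuming the rest of the patch stays as posted):

            if ( !test_and_set_bool(v->mce_pending) &&
                 (d->is_hvm ||
                  guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
            {
                mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
                           d->domain_id, v->vcpu_id);
                vcpu_kick(v);
            }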

> +        {
> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
> +                       d->domain_id, v->vcpu_id);
> +            vcpu_kick(v);
> +        }
> +        else
> +        {
> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
> +                       d->domain_id, v->vcpu_id);
> +            return -1;

Why do you bail here? This is particularly bad if v->mce_pending
was already set on some vCPU (as that could simply mean the guest
just didn't get around to handling the vMCE yet).

> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  static void intel_memerr_dhandler(
>               struct mca_binfo *binfo,
>               enum mce_result *result,

Also, how does this whole change interact with vmce_{rd,wr}msr()?
The struct bank_entry instances live on a per-domain list, so the
vMCE being delivered to all vCPU-s means they will all race for the
single entry (and might erroneously access others, particularly in
vmce_wrmsr()'s MCG_STATUS handling).
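
To illustrate the concern with hypothetical, simplified structures (not the
actual vmce.c ones): with a single per-domain list, every vcpu that handles
the broadcast vMCE effectively ends up consuming the same head entry, e.g.:

    /* hypothetical, simplified -- not the real vmce.c data structures */
    struct bank_entry {
        struct list_head list;
        uint64_t mci_status;
    };

    /* roughly what each vcpu would do when the guest clears MCG_STATUS;
     * with the vMCE now pending on N vcpus, N callers race here for the
     * single entry queued by the error handler */
    static void consume_head_entry(struct list_head *impact_header)
    {
        struct bank_entry *entry;

        entry = list_entry(impact_header->next, struct bank_entry, list);
        list_del(&entry->list);
        xfree(entry);
    }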

Jan


* Re: [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
  2012-06-13  8:53 ` Jan Beulich
@ 2012-06-13 10:49   ` Liu, Jinsong
  2012-06-13 11:41     ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Liu, Jinsong @ 2012-06-13 10:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Jiang, Yunhong, Shan, Haitao, Zhang, Xiantao, xen-devel

Jan Beulich wrote:
>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Xen vMCE bugfix: inject vMCE# to all vcpus
>> 
>> In our test for win8 guest mce, we find a bug in that no matter what
>> SRAO/SRAR error xen inject to win8 guest, it always reboot.
>> 
>> The root cause is, current Xen vMCE logic inject vMCE# only to
>> vcpu0, this is not correct for Intel MCE (Under Intel arch, h/w
>> generate MCE# to all CPUs). 
>> 
>> This patch fix vMCE injection bug, injecting vMCE# to all vcpus.
> 
> I see no correlation between the fix (and its description) and the
> problem at hand: Why would Win8 reboot if it doesn't receive a
> particular MCE on all CPUs? Isn't that model specific behavior?

It's not model specific. For Intel processors, MCE# is broadcast to all logical processors on systems that support UCR errors (refer to Intel SDM 15.9.3.1 & 15.9.3.2). So for vMCE injection under the Intel architecture, it's a bug to inject vMCE# only to vcpu0.

As for why Win8 reboots, I guess it needs to wait for all CPUs to enter the MCE handler before going ahead, and if that times out it reboots. I have no Win8 code, so I can only infer this, but our test results support it.
(It's reasonable for Win8 to do so: per Intel's SDM, because the machine check MSR resources are potentially shared among the logical processors on the same package/core, the MCE handler may be required to synchronize with the other processors that received a machine check error and to serialize access to the machine check registers when analyzing, logging and clearing the information in them.)

> 
> Furthermore I doubt that an MCE on one socket indeed causes
> MCE-s on all other sockets, not to speak of distinct NUMA nodes
> (it would already surprise me if MCE-s got broadcast across cores
> within a socket, unless they are caused by a resource shared
> across cores).

I partly agree. But at least as the Intel SDM currently stands, architecturally it is broadcast.

> 
>> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c	Tue Jun 05 03:18:00 2012
>> +0800 +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c	Wed Jun 13 23:40:45
>>      2012 +0800 @@ -638,6 +638,32 @@ return rec;
>>  }
>> 
>> +static int inject_vmce(struct domain *d)
> 
> Is it really necessary to move this vendor independent function
> into a vendor specific source file?

For AMD, I'm not quite sure whether injecting vMCE to all vcpus is proper for its architecture. I recall AMD uses NMI / a single MCE# / broadcast MCE# on its different platforms.

BTW, in the current Xen MCE logic inject_vmce() is only used by Intel MCE, so I put it into the Intel MCE code to avoid potential issues. Intel MCE and AMD MCE differ in some points (including MCE# injection), so we'd better not lump them together. Only if we are totally sure some logic is safe to share between Intel and AMD should we put it into common MCE code.

> 
>> +{
>> +    struct vcpu *v;
>> +
>> +    /* inject vMCE to all vcpus */
>> +    for_each_vcpu(d, v)
>> +    {
>> +        if ( !test_and_set_bool(v->mce_pending) &&
>> +            ((d->is_hvm) ? 1 :
>> +            guest_has_trap_callback(d, v->vcpu_id,
>> TRAP_machine_check)) ) 
> 
> Quite strange a way to say
> 
>             (d->is_hvm || guest_has_trap_callback(d, v->vcpu_id,
> TRAP_machine_check)) 

For HVM it's OK to just set v->mce_pending, while for PV it also needs to check whether the guest callback has been registered.

> 
>> +        {
>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d
>> vcpu%d\n", +                       d->domain_id, v->vcpu_id);
>> +            vcpu_kick(v);
>> +        }
>> +        else
>> +        {
>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d
>> vcpu%d\n", +                       d->domain_id, v->vcpu_id);
>> +            return -1;
> 
> Why do you bail here? This is particularly bad if v->mce_pending
> was already set on some vCPU (as that could simply mean the guest
> just didn't get around to handle the vMCE yet).

The case where v->mce_pending is already set when a new vMCE arrives will result in a domain crash.
I don't think the guest can still handle this case; e.g. on bare metal, if a 2nd MCE occurs while the 1st has not been handled yet, the OS reboots directly.

> 
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static void intel_memerr_dhandler(
>>               struct mca_binfo *binfo,
>>               enum mce_result *result,
> 
> Also, how does this whole change interact with vmce_{rd,wr}msr()?
> The struct bank_entry instances live on a per-domain list, so the
> vMCE being delivered to all vCPU-s means they will all race for the
> single entry (and might erroneously access others, particularly in
> vmce_wrmsr()'s MCG_STATUS handling).

Yes, I agree.
However, we have recently done a vMCE redesign (currently under internal review) and will present it soon. In the new approach the MSRs are per-vcpu instead of per-domain, so the MSR race issue (and the effort to clean it up) becomes moot. For this bugfix patch, I only touch vMCE# injection itself.

Thanks,
Jinsong


* Re: [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
  2012-06-13 10:49   ` Liu, Jinsong
@ 2012-06-13 11:41     ` Jan Beulich
  2012-06-13 12:37       ` Liu, Jinsong
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2012-06-13 11:41 UTC (permalink / raw)
  To: Jinsong Liu
  Cc: Keir Fraser, Yunhong Jiang, Haitao Shan, Xiantao Zhang, xen-devel

>>> On 13.06.12 at 12:49, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Xen vMCE bugfix: inject vMCE# to all vcpus
>>> 
>>> In our test for win8 guest mce, we find a bug in that no matter what
>>> SRAO/SRAR error xen inject to win8 guest, it always reboot.
>>> 
>>> The root cause is, current Xen vMCE logic inject vMCE# only to
>>> vcpu0, this is not correct for Intel MCE (Under Intel arch, h/w
>>> generate MCE# to all CPUs). 
>>> 
>>> This patch fix vMCE injection bug, injecting vMCE# to all vcpus.
>> 
>> I see no correlation between the fix (and its description) and the
>> problem at hand: Why would Win8 reboot if it doesn't receive a
>> particular MCE on all CPUs? Isn't that model specific behavior?
> 
> It's not model specific. For Intel processor, MCE# is broadcast to all 
> logical processors on the system on which the UCR errors are supported (refer 
> Intel SDM 15.9.3.1 & 15.9.3.2). So for vMCE injection under Intel arch, it's a 
> bug if inject vMCE# only to vcpu0.

This is for a certain group of errors, but I can't find any statement
that this is general architectural behavior.

>> Furthermore I doubt that an MCE on one socket indeed causes
>> MCE-s on all other sockets, not to speak of distinct NUMA nodes
>> (it would already surprise me if MCE-s got broadcast across cores
>> within a socket, unless they are caused by a resource shared
>> across cores).
> 
> Somehow I agree. But at least currently from Intel SDM, architecturally it 
> would broadcast.

I can't even see how this would work reliably across packages or
nodes: Suppose two entities each encounter a UCR - they would
each signal the other one, and the handling of this signal would
need to be synchronized with the handling of the local UCR
(irrespective of the order in which the events arrive).

>>> +        if ( !test_and_set_bool(v->mce_pending) &&
>>> +            ((d->is_hvm) ? 1 :
>>> +            guest_has_trap_callback(d, v->vcpu_id,
>>> TRAP_machine_check)) ) 
>> 
>> Quite strange a way to say
>> 
>>             (d->is_hvm || guest_has_trap_callback(d, v->vcpu_id,
>> TRAP_machine_check)) 
> 
> For hvm it's OK to just set v->mce_pending. While for pv it need also check if 
> guest callback has been registered.

Sure. I was just asking why you use a conditional operator when
the simpler || would do.

>>> +        {
>>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d
>>> vcpu%d\n", +                       d->domain_id, v->vcpu_id);
>>> +            vcpu_kick(v);
>>> +        }
>>> +        else
>>> +        {
>>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d
>>> vcpu%d\n", +                       d->domain_id, v->vcpu_id);
>>> +            return -1;
>> 
>> Why do you bail here? This is particularly bad if v->mce_pending
>> was already set on some vCPU (as that could simply mean the guest
>> just didn't get around to handle the vMCE yet).
> 
> the case v->mce_pending already set while new vmce come will result in domain 
> crash.
> I don't think guest can still handle this case, e.g. under baremetal if 2nd 
> mce occur while 1st mce have not been handled os will reboot directly.

Sorry, no. This goes back to the questionable nature of the
broadcasting above - if a CPU encounters a UCR itself and gets
signaled of a remote CPU having encountered one, it should be
able to cope. Otherwise the risk of (pointless!) system death
would increase with the number of CPUs.
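
One possible shape of the loop that treats an already-pending vMCE as
non-fatal (an illustration only, not a concrete proposal) would be:

    for_each_vcpu(d, v)
    {
        if ( !(d->is_hvm ||
               guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
            return -1;                 /* guest can't take a vMCE at all */

        if ( !test_and_set_bool(v->mce_pending) )
            vcpu_kick(v);              /* newly pending: deliver it */
        /* else: a vMCE is already pending on this vcpu -- nothing to do */
    }

    return 0;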

>> Also, how does this whole change interact with vmce_{rd,wr}msr()?
>> The struct bank_entry instances live on a per-domain list, so the
>> vMCE being delivered to all vCPU-s means they will all race for the
>> single entry (and might erroneously access others, particularly in
>> vmce_wrmsr()'s MCG_STATUS handling).
> 
> Yes, I agree.
> However, recently we have done vMCE re-design work (ongoing some internal 
> review) and would present sooner. In new approach, MSRs are per-vcpu instead 
> of per-domain.  MSRs race issue (and the effort to make it clean) would be 
> pointless then. So at this butfix patch, I only touch vMCE# injection itself.

I understand that you want the simpler version as bug fix, but
you didn't answer whether it was at all verified that the single
per-domain entry logic would actually cope with the broadcast
delivery logic you're adding here. I'm afraid the per-vCPU entry
logic is a prerequisite to this change (in whatever form it may end
up going in).

Jan


* Re: [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
  2012-06-13 11:41     ` Jan Beulich
@ 2012-06-13 12:37       ` Liu, Jinsong
  0 siblings, 0 replies; 5+ messages in thread
From: Liu, Jinsong @ 2012-06-13 12:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Jiang, Yunhong, Shan, Haitao, Zhang, Xiantao, xen-devel

Jan Beulich wrote:
>>>> On 13.06.12 at 12:49, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote: 
>>>> Xen vMCE bugfix: inject vMCE# to all vcpus
>>>> 
>>>> In our test for win8 guest mce, we find a bug in that no matter
>>>> what SRAO/SRAR error xen inject to win8 guest, it always reboot.
>>>> 
>>>> The root cause is, current Xen vMCE logic inject vMCE# only to
>>>> vcpu0, this is not correct for Intel MCE (Under Intel arch, h/w
>>>> generate MCE# to all CPUs). 
>>>> 
>>>> This patch fix vMCE injection bug, injecting vMCE# to all vcpus.
>>> 
>>> I see no correlation between the fix (and its description) and the
>>> problem at hand: Why would Win8 reboot if it doesn't receive a
>>> particular MCE on all CPUs? Isn't that model specific behavior?
>> 
>> It's not model specific. For Intel processor, MCE# is broadcast to
>> all logical processors on the system on which the UCR errors are
>> supported (refer Intel SDM 15.9.3.1 & 15.9.3.2). So for vMCE
>> injection under Intel arch, it's a bug if inject vMCE# only to vcpu0.
> 
> This is for a certain group of errors, but I can't find any statement
> that this is general architectural behavior.

Xen MCE only injects SRAO/SRAR errors into the guest, and both belong to the UCR error class.
For other errors like CE and UCNA, the hypervisor just raises a virq to dom0's mcelog for logging.

> 
>>> Furthermore I doubt that an MCE on one socket indeed causes
>>> MCE-s on all other sockets, not to speak of distinct NUMA nodes
>>> (it would already surprise me if MCE-s got broadcast across cores
>>> within a socket, unless they are caused by a resource shared
>>> across cores).
>> 
>> Somehow I agree. But at least currently from Intel SDM,
>> architecturally it would broadcast.
> 
> I can't even see how this would work reliably across packages or
> nodes: Suppose two entities each encounter a UCR - they would
> each signal the other one, and the handling of this signal would
> need to be synchronized with the handling of the local UCR
> (irrespective of the order in which the events arrive).

I don't think that is how the hardware actually works, though I don't have all the details.

Basically, a processor is just a 'report point' (via a bank) indicating what error occurred. For example, an error is (mostly) generated at memory and travels along a path, e.g. memory --> memory controller --> QPI --> cache controller --> ... --> ... --> CPU core. It's not one processor signaling the other CPUs. An MCA mechanism monitors the different units, decides when and which errors get reported, and notifies all processors (that's what broadcast means).

> 
>>>> +        if ( !test_and_set_bool(v->mce_pending) &&
>>>> +            ((d->is_hvm) ? 1 :
>>>> +            guest_has_trap_callback(d, v->vcpu_id,
>>>> TRAP_machine_check)) )
>>> 
>>> Quite strange a way to say
>>> 
>>>             (d->is_hvm || guest_has_trap_callback(d, v->vcpu_id,
>>> TRAP_machine_check))
>> 
>> For hvm it's OK to just set v->mce_pending. While for pv it need
>> also check if guest callback has been registered.
> 
> Sure. I was just asking why you use a conditional operator when
> the simpler || would do.

Ah, I see. That's better.

> 
>>>> +        {
>>>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d
>>>> vcpu%d\n", +                       d->domain_id, v->vcpu_id); +   
>>>> vcpu_kick(v); +        }
>>>> +        else
>>>> +        {
>>>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d
>>>> vcpu%d\n", +                       d->domain_id, v->vcpu_id);
>>>> +            return -1;
>>> 
>>> Why do you bail here? This is particularly bad if v->mce_pending
>>> was already set on some vCPU (as that could simply mean the guest
>>> just didn't get around to handle the vMCE yet).
>> 
>> the case v->mce_pending already set while new vmce come will result
>> in domain crash. I don't think guest can still handle this case,
>> e.g. under baremetal if 2nd mce occur while 1st mce have not been
>> handled os will reboot directly. 
> 
> Sorry, no. This goes back to the questionable nature of the
> broadcasting above - if a CPU encounters a UCR itself and gets
> signaled of a remote CPU having encountered one, it should be
> able to cope. Otherwise the risk of (pointless!) system death
> would increase with the number of CPUs.
> 
>>> Also, how does this whole change interact with vmce_{rd,wr}msr()?
>>> The struct bank_entry instances live on a per-domain list, so the
>>> vMCE being delivered to all vCPU-s means they will all race for the
>>> single entry (and might erroneously access others, particularly in
>>> vmce_wrmsr()'s MCG_STATUS handling).
>> 
>> Yes, I agree.
>> However, recently we have done vMCE re-design work (ongoing some
>> internal review) and would present sooner. In new approach, MSRs are
>> per-vcpu instead of per-domain.  MSRs race issue (and the effort to
>> make it clean) would be pointless then. So at this butfix patch, I
>> only touch vMCE# injection itself. 
> 
> I understand that you want the simpler version as bug fix, but
> you didn't answer whether it was at all verified that the single
> per-domain entry logic would actually cope with the broadcast
> delivery logic you're adding here. I'm afraid the per-vCPU entry
> logic is a prerequisite to this change (in whatever form it may end
> up going in).
> 
> Jan

Agreed, let's hold it. After we present the new vMCE approach, we can do it then, with no race issue.

Thanks,
Jinsong

