linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] Add mcelog support for xen platform
@ 2012-04-19 13:29 Liu, Jinsong
  2012-04-20 18:40 ` Konrad Rzeszutek Wilk
  2012-04-21  4:14 ` [Xen-devel] " Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 25+ messages in thread
From: Liu, Jinsong @ 2012-04-19 13:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 21524 bytes --]

>From af4467b6cf0104ce98cc160438d865256c7d5561 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong <jinsong.liu@intel.com>
Date: Fri, 20 Apr 2012 05:08:38 +0800
Subject: [PATCH 1/3] Add mcelog support for xen platform

When MCA error occurs, it would be handled by xen hypervisor first,
and then the error information would be sent to dom0 for logging.

This patch gets error information from xen hypervisor and convert
xen format error into linux format mcelog. This logic is basically
self-contained, not much touching other kernel components.

So under xen platform when MCA error occurs, the error information
would be sent to dom0 and converted into native linux mcelog format.
By using tools like mcelog tool users could read specific error information,
like what they did under native linux.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/xen/hypercall.h |    8 +
 arch/x86/xen/enlighten.c             |    4 +-
 drivers/xen/Kconfig                  |    8 +
 drivers/xen/Makefile                 |    1 +
 drivers/xen/mcelog.c                 |  240 ++++++++++++++++++++++++
 include/xen/interface/xen-mca.h      |  336 ++++++++++++++++++++++++++++++++++
 6 files changed, 594 insertions(+), 3 deletions(-)
 create mode 100644 drivers/xen/mcelog.c
 create mode 100644 include/xen/interface/xen-mca.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 5728852..59c226d 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -48,6 +48,7 @@
 #include <xen/interface/sched.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/platform.h>
+#include <xen/interface/xen-mca.h>
 
 /*
  * The hypercall asms have to meet several constraints:
@@ -302,6 +303,13 @@ HYPERVISOR_set_timer_op(u64 timeout)
 }
 
 static inline int
+HYPERVISOR_mca(struct xen_mc *mc_op)
+{
+	mc_op->interface_version = XEN_MCA_INTERFACE_VERSION;
+	return _hypercall1(int, mca, mc_op);
+}
+
+static inline int
 HYPERVISOR_dom0_op(struct xen_platform_op *platform_op)
 {
 	platform_op->interface_version = XENPF_INTERFACE_VERSION;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4f51beb..15628d4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -329,9 +329,7 @@ static void __init xen_init_cpuid_mask(void)
 	unsigned int xsave_mask;
 
 	cpuid_leaf1_edx_mask =
-		~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
-		  (1 << X86_FEATURE_MCA)  |  /* disable MCA */
-		  (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
+		~((1 << X86_FEATURE_MTRR) |  /* disable MTRR */
 		  (1 << X86_FEATURE_ACC));   /* thermal monitoring */
 
 	if (!xen_initial_domain())
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 9424313..0f54241 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -194,4 +194,12 @@ config XEN_ACPI_PROCESSOR
           module will be called xen_acpi_processor  If you do not know what to choose,
           select M here. If the CPUFREQ drivers are built in, select Y here.
 
+config XEN_MCE_LOG
+	bool "Xen platform mcelog"
+	depends on XEN_DOM0 && X86_64 && X86_MCE
+	default n
+	help
+	  Allow kernel fetching MCE error from Xen platform and
+	  converting it into Linux mcelog format for mcelog tools
+
 endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 9adc5be..1d3e763 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_XEN_GNTDEV)		+= xen-gntdev.o
 obj-$(CONFIG_XEN_GRANT_DEV_ALLOC)	+= xen-gntalloc.o
 obj-$(CONFIG_XENFS)			+= xenfs/
 obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
+obj-$(CONFIG_XEN_MCE_LOG)		+= mcelog.o
 obj-$(CONFIG_XEN_PVHVM)			+= platform-pci.o
 obj-$(CONFIG_XEN_TMEM)			+= tmem.o
 obj-$(CONFIG_SWIOTLB_XEN)		+= swiotlb-xen.o
diff --git a/drivers/xen/mcelog.c b/drivers/xen/mcelog.c
new file mode 100644
index 0000000..95ceb5e
--- /dev/null
+++ b/drivers/xen/mcelog.c
@@ -0,0 +1,240 @@
+/******************************************************************************
+ * mcelog.c
+ * Driver for receiving and transferring machine check error infomation
+ *
+ * Copyright (c) 2012 Intel Corporation
+ * Author: Liu, Jinsong <jinsong.liu@intel.com>
+ * Author: Jiang, Yunhong <yunhong.jiang@intel.com>
+ * Author: Ke, Liping <liping.ke@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <asm/mce.h>
+
+#include <xen/interface/xen.h>
+#include <xen/events.h>
+#include <xen/interface/vcpu.h>
+#include <xen/xen.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/hypervisor.h>
+
+#define XEN_MCELOG "xen_mcelog: "
+
+static struct mc_info g_mi;
+static struct mcinfo_logical_cpu *g_physinfo;
+static uint32_t ncpus;
+
+static DEFINE_SPINLOCK(mcelog_lock);
+
+static int convert_log(struct mc_info *mi)
+{
+	struct mcinfo_common *mic;
+	struct mcinfo_global *mc_global;
+	struct mcinfo_bank *mc_bank;
+	struct mce m;
+	uint32_t i;
+
+	mic = NULL;
+	x86_mcinfo_lookup(&mic, mi, MC_TYPE_GLOBAL);
+	if (unlikely(!mic)) {
+		pr_warning(XEN_MCELOG "Failed to find global error info\n");
+		return -ENODEV;
+	}
+
+	mce_setup(&m);
+	mc_global = (struct mcinfo_global *)mic;
+	m.mcgstatus = mc_global->mc_gstatus;
+	m.apicid = mc_global->mc_apicid;
+
+	for (i = 0; i < ncpus; i++)
+		if (g_physinfo[i].mc_apicid == m.apicid)
+			break;
+	if (unlikely(i == ncpus)) {
+		pr_warning(XEN_MCELOG "Failed to match cpu with apicid %d\n",
+			   m.apicid);
+		return -ENODEV;
+	}
+
+	m.socketid = g_physinfo[i].mc_chipid;
+	m.cpu = m.extcpu = g_physinfo[i].mc_cpunr;
+	m.cpuvendor = (__u8)g_physinfo[i].mc_vendor;
+	m.mcgcap = g_physinfo[i].mc_msrvalues[__MC_MSR_MCGCAP].value;
+
+	mic = NULL;
+	x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK);
+	if (unlikely(!mic)) {
+		pr_warning(XEN_MCELOG "Fail to find bank error info\n");
+		return -ENODEV;
+	}
+
+	do {
+		if ((!mic) || (mic->size == 0) ||
+		    (mic->type != MC_TYPE_GLOBAL   &&
+		     mic->type != MC_TYPE_BANK     &&
+		     mic->type != MC_TYPE_EXTENDED &&
+		     mic->type != MC_TYPE_RECOVERY))
+			break;
+
+		if (mic->type == MC_TYPE_BANK) {
+			mc_bank = (struct mcinfo_bank *)mic;
+			m.misc = mc_bank->mc_misc;
+			m.status = mc_bank->mc_status;
+			m.addr = mc_bank->mc_addr;
+			m.tsc = mc_bank->mc_tsc;
+			m.bank = mc_bank->mc_bank;
+			m.finished = 1;
+			/*log this record*/
+			mce_log(&m);
+		}
+		mic = x86_mcinfo_next(mic);
+	} while (1);
+
+	return 0;
+}
+
+static int mc_queue_handle(uint32_t flags)
+{
+	struct xen_mc mc_op;
+	int ret = 0;
+	unsigned long tmp;
+
+	spin_lock_irqsave(&mcelog_lock, tmp);
+
+	mc_op.cmd = XEN_MC_fetch;
+	mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
+	set_xen_guest_handle(mc_op.u.mc_fetch.data, &g_mi);
+	do {
+		mc_op.u.mc_fetch.flags = flags;
+		ret = HYPERVISOR_mca(&mc_op);
+		if (ret) {
+			pr_err(XEN_MCELOG "Failed to fetch %s error log\n",
+			       (flags == XEN_MC_URGENT) ?
+			       "urgnet" : "nonurgent");
+			break;
+		}
+
+		if (mc_op.u.mc_fetch.flags & XEN_MC_NODATA ||
+		    mc_op.u.mc_fetch.flags & XEN_MC_FETCHFAILED)
+			break;
+		else {
+			ret = convert_log(&g_mi);
+			if (ret)
+				pr_warning(XEN_MCELOG
+					   "Failed to convert this error log, "
+					   "continue acking it anyway\n");
+
+			mc_op.u.mc_fetch.flags = flags | XEN_MC_ACK;
+			ret = HYPERVISOR_mca(&mc_op);
+			if (ret) {
+				pr_err(XEN_MCELOG
+				       "Failed to ack previous error log\n");
+				break;
+			}
+		}
+	} while (1);
+
+	spin_unlock_irqrestore(&mcelog_lock, tmp);
+
+	return ret;
+}
+
+/* virq handler for machine check error info*/
+static irqreturn_t xen_mce_interrupt(int irq, void *dev_id)
+{
+	int err;
+
+	/* urgent mc_info */
+	err = mc_queue_handle(XEN_MC_URGENT);
+	if (err)
+		pr_err(XEN_MCELOG
+		       "Failed to handle urgent mc_info queue, "
+		       "continue handling nonurgent mc_info queue anyway.\n");
+
+	/* nonurgent mc_info */
+	err = mc_queue_handle(XEN_MC_NONURGENT);
+	if (err)
+		pr_err(XEN_MCELOG
+		       "Failed to handle nonurgent mc_info queue.\n");
+
+	return IRQ_HANDLED;
+}
+
+static int bind_virq_for_mce(void)
+{
+	int ret;
+	struct xen_mc mc_op;
+
+	memset(&mc_op, 0, sizeof(struct xen_mc));
+
+	/* Fetch physical CPU Numbers */
+	mc_op.cmd = XEN_MC_physcpuinfo;
+	mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
+	set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
+	ret = HYPERVISOR_mca(&mc_op);
+	if (ret) {
+		pr_err(XEN_MCELOG "Failed to get CPU numbers\n");
+		return ret;
+	}
+
+	/* Fetch each CPU Physical Info for later reference*/
+	ncpus = mc_op.u.mc_physcpuinfo.ncpus;
+	g_physinfo = kcalloc(ncpus, sizeof(struct mcinfo_logical_cpu),
+			     GFP_KERNEL);
+	if (!g_physinfo)
+		return -ENOMEM;
+	set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
+	ret = HYPERVISOR_mca(&mc_op);
+	if (ret) {
+		pr_err(XEN_MCELOG "Failed to get CPU info\n");
+		kfree(g_physinfo);
+		return ret;
+	}
+
+	ret  = bind_virq_to_irqhandler(VIRQ_MCA, 0,
+				       xen_mce_interrupt, 0, "mce", NULL);
+	if (ret < 0) {
+		pr_err(XEN_MCELOG "Failed to bind virq\n");
+		kfree(g_physinfo);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int __init mcelog_init(void)
+{
+	/* Only DOM0 is responsible for MCE logging */
+	if (xen_initial_domain())
+		return bind_virq_for_mce();
+
+	return -ENODEV;
+}
+late_initcall(mcelog_init);
diff --git a/include/xen/interface/xen-mca.h b/include/xen/interface/xen-mca.h
new file mode 100644
index 0000000..f2561db
--- /dev/null
+++ b/include/xen/interface/xen-mca.h
@@ -0,0 +1,336 @@
+/******************************************************************************
+ * arch-x86/mca.h
+ * Guest OS machine check interface to x86 Xen.
+ *
+ * Contributed by Advanced Micro Devices, Inc.
+ * Author: Christoph Egger <Christoph.Egger@amd.com>
+ *
+ * Updated by Intel Corporation
+ * Author: Liu, Jinsong <jinsong.liu@intel.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_MCA_H__
+#define __XEN_PUBLIC_ARCH_X86_MCA_H__
+
+/* Hypercall */
+#define __HYPERVISOR_mca __HYPERVISOR_arch_0
+
+#define XEN_MCA_INTERFACE_VERSION	0x01ecc003
+
+/* IN: Dom0 calls hypercall to retrieve nonurgent error log entry */
+#define XEN_MC_NONURGENT	0x1
+/* IN: Dom0 calls hypercall to retrieve urgent error log entry */
+#define XEN_MC_URGENT		0x2
+/* IN: Dom0 acknowledges previosly-fetched error log entry */
+#define XEN_MC_ACK		0x4
+
+/* OUT: All is ok */
+#define XEN_MC_OK		0x0
+/* OUT: Domain could not fetch data. */
+#define XEN_MC_FETCHFAILED	0x1
+/* OUT: There was no machine check data to fetch. */
+#define XEN_MC_NODATA		0x2
+
+#ifndef __ASSEMBLY__
+/* vIRQ injected to Dom0 */
+#define VIRQ_MCA VIRQ_ARCH_0
+
+/*
+ * mc_info entry types
+ * mca machine check info are recorded in mc_info entries.
+ * when fetch mca info, it can use MC_TYPE_... to distinguish
+ * different mca info.
+ */
+#define MC_TYPE_GLOBAL		0
+#define MC_TYPE_BANK		1
+#define MC_TYPE_EXTENDED	2
+#define MC_TYPE_RECOVERY	3
+
+struct mcinfo_common {
+	uint16_t type; /* structure type */
+	uint16_t size; /* size of this struct in bytes */
+};
+
+#define MC_FLAG_CORRECTABLE	(1 << 0)
+#define MC_FLAG_UNCORRECTABLE	(1 << 1)
+#define MC_FLAG_RECOVERABLE	(1 << 2)
+#define MC_FLAG_POLLED		(1 << 3)
+#define MC_FLAG_RESET		(1 << 4)
+#define MC_FLAG_CMCI		(1 << 5)
+#define MC_FLAG_MCE		(1 << 6)
+
+/* contains x86 global mc information */
+struct mcinfo_global {
+	struct mcinfo_common common;
+
+	uint16_t mc_domid; /* running domain at the time in error */
+	uint16_t mc_vcpuid; /* virtual cpu scheduled for mc_domid */
+	uint32_t mc_socketid; /* physical socket of the physical core */
+	uint16_t mc_coreid; /* physical impacted core */
+	uint16_t mc_core_threadid; /* core thread of physical core */
+	uint32_t mc_apicid;
+	uint32_t mc_flags;
+	uint64_t mc_gstatus; /* global status */
+};
+
+/* contains x86 bank mc information */
+struct mcinfo_bank {
+	struct mcinfo_common common;
+
+	uint16_t mc_bank; /* bank nr */
+	uint16_t mc_domid; /* domain referenced by mc_addr if valid */
+	uint64_t mc_status; /* bank status */
+	uint64_t mc_addr; /* bank address */
+	uint64_t mc_misc;
+	uint64_t mc_ctrl2;
+	uint64_t mc_tsc;
+};
+
+struct mcinfo_msr {
+	uint64_t reg; /* MSR */
+	uint64_t value; /* MSR value */
+};
+
+/* contains mc information from other or additional mc MSRs */
+struct mcinfo_extended {
+	struct mcinfo_common common;
+	uint32_t mc_msrs; /* Number of msr with valid values. */
+	/*
+	 * Currently Intel extended MSR (32/64) include all gp registers
+	 * and E(R)FLAGS, E(R)IP, E(R)MISC, up to 11/19 of them might be
+	 * useful at present. So expand this array to 16/32 to leave room.
+	 */
+	struct mcinfo_msr mc_msr[sizeof(void *) * 4];
+};
+
+/* Recovery Action flags. Giving recovery result information to DOM0 */
+
+/* Xen takes successful recovery action, the error is recovered */
+#define REC_ACTION_RECOVERED (0x1 << 0)
+/* No action is performed by XEN */
+#define REC_ACTION_NONE (0x1 << 1)
+/* It's possible DOM0 might take action ownership in some case */
+#define REC_ACTION_NEED_RESET (0x1 << 2)
+
+/*
+ * Different Recovery Action types, if the action is performed successfully,
+ * REC_ACTION_RECOVERED flag will be returned.
+ */
+
+/* Page Offline Action */
+#define MC_ACTION_PAGE_OFFLINE (0x1 << 0)
+/* CPU offline Action */
+#define MC_ACTION_CPU_OFFLINE (0x1 << 1)
+/* L3 cache disable Action */
+#define MC_ACTION_CACHE_SHRINK (0x1 << 2)
+
+/*
+ * Below interface used between XEN/DOM0 for passing XEN's recovery action
+ * information to DOM0.
+ */
+struct page_offline_action {
+	/* Params for passing the offlined page number to DOM0 */
+	uint64_t mfn;
+	uint64_t status;
+};
+
+struct cpu_offline_action {
+	/* Params for passing the identity of the offlined CPU to DOM0 */
+	uint32_t mc_socketid;
+	uint16_t mc_coreid;
+	uint16_t mc_core_threadid;
+};
+
+#define MAX_UNION_SIZE 16
+struct mcinfo_recovery {
+	struct mcinfo_common common;
+	uint16_t mc_bank; /* bank nr */
+	uint8_t action_flags;
+	uint8_t action_types;
+	union {
+		struct page_offline_action page_retire;
+		struct cpu_offline_action cpu_offline;
+		uint8_t pad[MAX_UNION_SIZE];
+	} action_info;
+};
+
+
+#define MCINFO_MAXSIZE 768
+struct mc_info {
+	/* Number of mcinfo_* entries in mi_data */
+	uint32_t mi_nentries;
+	uint32_t flags;
+	uint64_t mi_data[(MCINFO_MAXSIZE - 1) / 8];
+};
+DEFINE_GUEST_HANDLE_STRUCT(mc_info);
+
+#define __MC_MSR_ARRAYSIZE 8
+#define __MC_MSR_MCGCAP 0
+#define __MC_NMSRS 1
+#define MC_NCAPS 7
+struct mcinfo_logical_cpu {
+	uint32_t mc_cpunr;
+	uint32_t mc_chipid;
+	uint16_t mc_coreid;
+	uint16_t mc_threadid;
+	uint32_t mc_apicid;
+	uint32_t mc_clusterid;
+	uint32_t mc_ncores;
+	uint32_t mc_ncores_active;
+	uint32_t mc_nthreads;
+	uint32_t mc_cpuid_level;
+	uint32_t mc_family;
+	uint32_t mc_vendor;
+	uint32_t mc_model;
+	uint32_t mc_step;
+	char mc_vendorid[16];
+	char mc_brandid[64];
+	uint32_t mc_cpu_caps[MC_NCAPS];
+	uint32_t mc_cache_size;
+	uint32_t mc_cache_alignment;
+	uint32_t mc_nmsrvals;
+	struct mcinfo_msr mc_msrvalues[__MC_MSR_ARRAYSIZE];
+};
+DEFINE_GUEST_HANDLE_STRUCT(mcinfo_logical_cpu);
+
+/*
+ * Prototype:
+ *    uint32_t x86_mcinfo_nentries(struct mc_info *mi);
+ */
+#define x86_mcinfo_nentries(_mi)    \
+	((_mi)->mi_nentries)
+/*
+ * Prototype:
+ *    struct mcinfo_common *x86_mcinfo_first(struct mc_info *mi);
+ */
+#define x86_mcinfo_first(_mi)       \
+	((struct mcinfo_common *)(_mi)->mi_data)
+/*
+ * Prototype:
+ *    struct mcinfo_common *x86_mcinfo_next(struct mcinfo_common *mic);
+ */
+#define x86_mcinfo_next(_mic)       \
+	((struct mcinfo_common *)((uint8_t *)(_mic) + (_mic)->size))
+
+/*
+ * Prototype:
+ *    void x86_mcinfo_lookup(void *ret, struct mc_info *mi, uint16_t type);
+ */
+static inline void x86_mcinfo_lookup(struct mcinfo_common **ret,
+				     struct mc_info *mi, uint16_t type)
+{
+	uint32_t i;
+	struct mcinfo_common *mic;
+	bool found = 0;
+
+	if (!ret || !mi)
+		return;
+
+	mic = x86_mcinfo_first(mi);
+	for (i = 0; i < x86_mcinfo_nentries(mi); i++) {
+		if (mic->type == type) {
+			found = 1;
+			break;
+		}
+		mic = x86_mcinfo_next(mic);
+	}
+
+	*ret = found ? mic : NULL;
+}
+
+/*
+ * Fetch machine check data from hypervisor.
+ */
+#define XEN_MC_fetch		1
+struct xen_mc_fetch {
+	/*
+	 * IN: XEN_MC_NONURGENT, XEN_MC_URGENT,
+	 * XEN_MC_ACK if ack'king an earlier fetch
+	 * OUT: XEN_MC_OK, XEN_MC_FETCHAILED, XEN_MC_NODATA
+	 */
+	uint32_t flags;
+	uint32_t _pad0;
+	/* OUT: id for ack, IN: id we are ack'ing */
+	uint64_t fetch_id;
+
+	/* OUT variables. */
+	GUEST_HANDLE(mc_info) data;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_mc_fetch);
+
+
+/*
+ * This tells the hypervisor to notify a DomU about the machine check error
+ */
+#define XEN_MC_notifydomain	2
+struct xen_mc_notifydomain {
+	/* IN variables */
+	uint16_t mc_domid; /* The unprivileged domain to notify */
+	uint16_t mc_vcpuid; /* The vcpu in mc_domid to notify */
+
+	/* IN/OUT variables */
+	uint32_t flags;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_mc_notifydomain);
+
+#define XEN_MC_physcpuinfo	3
+struct xen_mc_physcpuinfo {
+	/* IN/OUT */
+	uint32_t ncpus;
+	uint32_t _pad0;
+	/* OUT */
+	GUEST_HANDLE(mcinfo_logical_cpu) info;
+};
+
+#define XEN_MC_msrinject	4
+#define MC_MSRINJ_MAXMSRS	8
+struct xen_mc_msrinject {
+	/* IN */
+	uint32_t mcinj_cpunr; /* target processor id */
+	uint32_t mcinj_flags; /* see MC_MSRINJ_F_* below */
+	uint32_t mcinj_count; /* 0 .. count-1 in array are valid */
+	uint32_t _pad0;
+	struct mcinfo_msr mcinj_msr[MC_MSRINJ_MAXMSRS];
+};
+
+/* Flags for mcinj_flags above; bits 16-31 are reserved */
+#define MC_MSRINJ_F_INTERPOSE	0x1
+
+#define XEN_MC_mceinject	5
+struct xen_mc_mceinject {
+	unsigned int mceinj_cpunr; /* target processor id */
+};
+
+struct xen_mc {
+	uint32_t cmd;
+	uint32_t interface_version; /* XEN_MCA_INTERFACE_VERSION */
+	union {
+		struct xen_mc_fetch        mc_fetch;
+		struct xen_mc_notifydomain mc_notifydomain;
+		struct xen_mc_physcpuinfo  mc_physcpuinfo;
+		struct xen_mc_msrinject    mc_msrinject;
+		struct xen_mc_mceinject    mc_mceinject;
+	} u;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_mc);
+
+#endif /* __ASSEMBLY__ */
+#endif /* __XEN_PUBLIC_ARCH_X86_MCA_H__ */
-- 
1.7.1

[-- Attachment #2: 0001-Add-mcelog-support-for-xen-platform.patch --]
[-- Type: application/octet-stream, Size: 20832 bytes --]

From af4467b6cf0104ce98cc160438d865256c7d5561 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong <jinsong.liu@intel.com>
Date: Fri, 20 Apr 2012 05:08:38 +0800
Subject: [PATCH 1/3] Add mcelog support for xen platform

When MCA error occurs, it would be handled by xen hypervisor first,
and then the error information would be sent to dom0 for logging.

This patch gets error information from xen hypervisor and convert
xen format error into linux format mcelog. This logic is basically
self-contained, not much touching other kernel components.

So under xen platform when MCA error occurs, the error information
would be sent to dom0 and converted into native linux mcelog format.
By using tools like mcelog tool users could read specific error information,
like what they did under native linux.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/xen/hypercall.h |    8 +
 arch/x86/xen/enlighten.c             |    4 +-
 drivers/xen/Kconfig                  |    8 +
 drivers/xen/Makefile                 |    1 +
 drivers/xen/mcelog.c                 |  240 ++++++++++++++++++++++++
 include/xen/interface/xen-mca.h      |  336 ++++++++++++++++++++++++++++++++++
 6 files changed, 594 insertions(+), 3 deletions(-)
 create mode 100644 drivers/xen/mcelog.c
 create mode 100644 include/xen/interface/xen-mca.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 5728852..59c226d 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -48,6 +48,7 @@
 #include <xen/interface/sched.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/platform.h>
+#include <xen/interface/xen-mca.h>
 
 /*
  * The hypercall asms have to meet several constraints:
@@ -302,6 +303,13 @@ HYPERVISOR_set_timer_op(u64 timeout)
 }
 
 static inline int
+HYPERVISOR_mca(struct xen_mc *mc_op)
+{
+	mc_op->interface_version = XEN_MCA_INTERFACE_VERSION;
+	return _hypercall1(int, mca, mc_op);
+}
+
+static inline int
 HYPERVISOR_dom0_op(struct xen_platform_op *platform_op)
 {
 	platform_op->interface_version = XENPF_INTERFACE_VERSION;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4f51beb..15628d4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -329,9 +329,7 @@ static void __init xen_init_cpuid_mask(void)
 	unsigned int xsave_mask;
 
 	cpuid_leaf1_edx_mask =
-		~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
-		  (1 << X86_FEATURE_MCA)  |  /* disable MCA */
-		  (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
+		~((1 << X86_FEATURE_MTRR) |  /* disable MTRR */
 		  (1 << X86_FEATURE_ACC));   /* thermal monitoring */
 
 	if (!xen_initial_domain())
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 9424313..0f54241 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -194,4 +194,12 @@ config XEN_ACPI_PROCESSOR
           module will be called xen_acpi_processor  If you do not know what to choose,
           select M here. If the CPUFREQ drivers are built in, select Y here.
 
+config XEN_MCE_LOG
+	bool "Xen platform mcelog"
+	depends on XEN_DOM0 && X86_64 && X86_MCE
+	default n
+	help
+	  Allow kernel fetching MCE error from Xen platform and
+	  converting it into Linux mcelog format for mcelog tools
+
 endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 9adc5be..1d3e763 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_XEN_GNTDEV)		+= xen-gntdev.o
 obj-$(CONFIG_XEN_GRANT_DEV_ALLOC)	+= xen-gntalloc.o
 obj-$(CONFIG_XENFS)			+= xenfs/
 obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
+obj-$(CONFIG_XEN_MCE_LOG)		+= mcelog.o
 obj-$(CONFIG_XEN_PVHVM)			+= platform-pci.o
 obj-$(CONFIG_XEN_TMEM)			+= tmem.o
 obj-$(CONFIG_SWIOTLB_XEN)		+= swiotlb-xen.o
diff --git a/drivers/xen/mcelog.c b/drivers/xen/mcelog.c
new file mode 100644
index 0000000..95ceb5e
--- /dev/null
+++ b/drivers/xen/mcelog.c
@@ -0,0 +1,240 @@
+/******************************************************************************
+ * mcelog.c
+ * Driver for receiving and transferring machine check error infomation
+ *
+ * Copyright (c) 2012 Intel Corporation
+ * Author: Liu, Jinsong <jinsong.liu@intel.com>
+ * Author: Jiang, Yunhong <yunhong.jiang@intel.com>
+ * Author: Ke, Liping <liping.ke@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <asm/mce.h>
+
+#include <xen/interface/xen.h>
+#include <xen/events.h>
+#include <xen/interface/vcpu.h>
+#include <xen/xen.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/hypervisor.h>
+
+#define XEN_MCELOG "xen_mcelog: "
+
+static struct mc_info g_mi;
+static struct mcinfo_logical_cpu *g_physinfo;
+static uint32_t ncpus;
+
+static DEFINE_SPINLOCK(mcelog_lock);
+
+static int convert_log(struct mc_info *mi)
+{
+	struct mcinfo_common *mic;
+	struct mcinfo_global *mc_global;
+	struct mcinfo_bank *mc_bank;
+	struct mce m;
+	uint32_t i;
+
+	mic = NULL;
+	x86_mcinfo_lookup(&mic, mi, MC_TYPE_GLOBAL);
+	if (unlikely(!mic)) {
+		pr_warning(XEN_MCELOG "Failed to find global error info\n");
+		return -ENODEV;
+	}
+
+	mce_setup(&m);
+	mc_global = (struct mcinfo_global *)mic;
+	m.mcgstatus = mc_global->mc_gstatus;
+	m.apicid = mc_global->mc_apicid;
+
+	for (i = 0; i < ncpus; i++)
+		if (g_physinfo[i].mc_apicid == m.apicid)
+			break;
+	if (unlikely(i == ncpus)) {
+		pr_warning(XEN_MCELOG "Failed to match cpu with apicid %d\n",
+			   m.apicid);
+		return -ENODEV;
+	}
+
+	m.socketid = g_physinfo[i].mc_chipid;
+	m.cpu = m.extcpu = g_physinfo[i].mc_cpunr;
+	m.cpuvendor = (__u8)g_physinfo[i].mc_vendor;
+	m.mcgcap = g_physinfo[i].mc_msrvalues[__MC_MSR_MCGCAP].value;
+
+	mic = NULL;
+	x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK);
+	if (unlikely(!mic)) {
+		pr_warning(XEN_MCELOG "Fail to find bank error info\n");
+		return -ENODEV;
+	}
+
+	do {
+		if ((!mic) || (mic->size == 0) ||
+		    (mic->type != MC_TYPE_GLOBAL   &&
+		     mic->type != MC_TYPE_BANK     &&
+		     mic->type != MC_TYPE_EXTENDED &&
+		     mic->type != MC_TYPE_RECOVERY))
+			break;
+
+		if (mic->type == MC_TYPE_BANK) {
+			mc_bank = (struct mcinfo_bank *)mic;
+			m.misc = mc_bank->mc_misc;
+			m.status = mc_bank->mc_status;
+			m.addr = mc_bank->mc_addr;
+			m.tsc = mc_bank->mc_tsc;
+			m.bank = mc_bank->mc_bank;
+			m.finished = 1;
+			/*log this record*/
+			mce_log(&m);
+		}
+		mic = x86_mcinfo_next(mic);
+	} while (1);
+
+	return 0;
+}
+
+static int mc_queue_handle(uint32_t flags)
+{
+	struct xen_mc mc_op;
+	int ret = 0;
+	unsigned long tmp;
+
+	spin_lock_irqsave(&mcelog_lock, tmp);
+
+	mc_op.cmd = XEN_MC_fetch;
+	mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
+	set_xen_guest_handle(mc_op.u.mc_fetch.data, &g_mi);
+	do {
+		mc_op.u.mc_fetch.flags = flags;
+		ret = HYPERVISOR_mca(&mc_op);
+		if (ret) {
+			pr_err(XEN_MCELOG "Failed to fetch %s error log\n",
+			       (flags == XEN_MC_URGENT) ?
+			       "urgnet" : "nonurgent");
+			break;
+		}
+
+		if (mc_op.u.mc_fetch.flags & XEN_MC_NODATA ||
+		    mc_op.u.mc_fetch.flags & XEN_MC_FETCHFAILED)
+			break;
+		else {
+			ret = convert_log(&g_mi);
+			if (ret)
+				pr_warning(XEN_MCELOG
+					   "Failed to convert this error log, "
+					   "continue acking it anyway\n");
+
+			mc_op.u.mc_fetch.flags = flags | XEN_MC_ACK;
+			ret = HYPERVISOR_mca(&mc_op);
+			if (ret) {
+				pr_err(XEN_MCELOG
+				       "Failed to ack previous error log\n");
+				break;
+			}
+		}
+	} while (1);
+
+	spin_unlock_irqrestore(&mcelog_lock, tmp);
+
+	return ret;
+}
+
+/* virq handler for machine check error info*/
+static irqreturn_t xen_mce_interrupt(int irq, void *dev_id)
+{
+	int err;
+
+	/* urgent mc_info */
+	err = mc_queue_handle(XEN_MC_URGENT);
+	if (err)
+		pr_err(XEN_MCELOG
+		       "Failed to handle urgent mc_info queue, "
+		       "continue handling nonurgent mc_info queue anyway.\n");
+
+	/* nonurgent mc_info */
+	err = mc_queue_handle(XEN_MC_NONURGENT);
+	if (err)
+		pr_err(XEN_MCELOG
+		       "Failed to handle nonurgent mc_info queue.\n");
+
+	return IRQ_HANDLED;
+}
+
+static int bind_virq_for_mce(void)
+{
+	int ret;
+	struct xen_mc mc_op;
+
+	memset(&mc_op, 0, sizeof(struct xen_mc));
+
+	/* Fetch physical CPU Numbers */
+	mc_op.cmd = XEN_MC_physcpuinfo;
+	mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
+	set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
+	ret = HYPERVISOR_mca(&mc_op);
+	if (ret) {
+		pr_err(XEN_MCELOG "Failed to get CPU numbers\n");
+		return ret;
+	}
+
+	/* Fetch each CPU Physical Info for later reference*/
+	ncpus = mc_op.u.mc_physcpuinfo.ncpus;
+	g_physinfo = kcalloc(ncpus, sizeof(struct mcinfo_logical_cpu),
+			     GFP_KERNEL);
+	if (!g_physinfo)
+		return -ENOMEM;
+	set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
+	ret = HYPERVISOR_mca(&mc_op);
+	if (ret) {
+		pr_err(XEN_MCELOG "Failed to get CPU info\n");
+		kfree(g_physinfo);
+		return ret;
+	}
+
+	ret  = bind_virq_to_irqhandler(VIRQ_MCA, 0,
+				       xen_mce_interrupt, 0, "mce", NULL);
+	if (ret < 0) {
+		pr_err(XEN_MCELOG "Failed to bind virq\n");
+		kfree(g_physinfo);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int __init mcelog_init(void)
+{
+	/* Only DOM0 is responsible for MCE logging */
+	if (xen_initial_domain())
+		return bind_virq_for_mce();
+
+	return -ENODEV;
+}
+late_initcall(mcelog_init);
diff --git a/include/xen/interface/xen-mca.h b/include/xen/interface/xen-mca.h
new file mode 100644
index 0000000..f2561db
--- /dev/null
+++ b/include/xen/interface/xen-mca.h
@@ -0,0 +1,336 @@
+/******************************************************************************
+ * arch-x86/mca.h
+ * Guest OS machine check interface to x86 Xen.
+ *
+ * Contributed by Advanced Micro Devices, Inc.
+ * Author: Christoph Egger <Christoph.Egger@amd.com>
+ *
+ * Updated by Intel Corporation
+ * Author: Liu, Jinsong <jinsong.liu@intel.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_MCA_H__
+#define __XEN_PUBLIC_ARCH_X86_MCA_H__
+
+/* Hypercall */
+#define __HYPERVISOR_mca __HYPERVISOR_arch_0
+
+#define XEN_MCA_INTERFACE_VERSION	0x01ecc003
+
+/* IN: Dom0 calls hypercall to retrieve nonurgent error log entry */
+#define XEN_MC_NONURGENT	0x1
+/* IN: Dom0 calls hypercall to retrieve urgent error log entry */
+#define XEN_MC_URGENT		0x2
+/* IN: Dom0 acknowledges previosly-fetched error log entry */
+#define XEN_MC_ACK		0x4
+
+/* OUT: All is ok */
+#define XEN_MC_OK		0x0
+/* OUT: Domain could not fetch data. */
+#define XEN_MC_FETCHFAILED	0x1
+/* OUT: There was no machine check data to fetch. */
+#define XEN_MC_NODATA		0x2
+
+#ifndef __ASSEMBLY__
+/* vIRQ injected to Dom0 */
+#define VIRQ_MCA VIRQ_ARCH_0
+
+/*
+ * mc_info entry types
+ * mca machine check info are recorded in mc_info entries.
+ * when fetch mca info, it can use MC_TYPE_... to distinguish
+ * different mca info.
+ */
+#define MC_TYPE_GLOBAL		0
+#define MC_TYPE_BANK		1
+#define MC_TYPE_EXTENDED	2
+#define MC_TYPE_RECOVERY	3
+
+struct mcinfo_common {
+	uint16_t type; /* structure type */
+	uint16_t size; /* size of this struct in bytes */
+};
+
+#define MC_FLAG_CORRECTABLE	(1 << 0)
+#define MC_FLAG_UNCORRECTABLE	(1 << 1)
+#define MC_FLAG_RECOVERABLE	(1 << 2)
+#define MC_FLAG_POLLED		(1 << 3)
+#define MC_FLAG_RESET		(1 << 4)
+#define MC_FLAG_CMCI		(1 << 5)
+#define MC_FLAG_MCE		(1 << 6)
+
+/* contains x86 global mc information */
+struct mcinfo_global {
+	struct mcinfo_common common;
+
+	uint16_t mc_domid; /* running domain at the time in error */
+	uint16_t mc_vcpuid; /* virtual cpu scheduled for mc_domid */
+	uint32_t mc_socketid; /* physical socket of the physical core */
+	uint16_t mc_coreid; /* physical impacted core */
+	uint16_t mc_core_threadid; /* core thread of physical core */
+	uint32_t mc_apicid;
+	uint32_t mc_flags;
+	uint64_t mc_gstatus; /* global status */
+};
+
+/* contains x86 bank mc information */
+struct mcinfo_bank {
+	struct mcinfo_common common;
+
+	uint16_t mc_bank; /* bank nr */
+	uint16_t mc_domid; /* domain referenced by mc_addr if valid */
+	uint64_t mc_status; /* bank status */
+	uint64_t mc_addr; /* bank address */
+	uint64_t mc_misc;
+	uint64_t mc_ctrl2;
+	uint64_t mc_tsc;
+};
+
+struct mcinfo_msr {
+	uint64_t reg; /* MSR */
+	uint64_t value; /* MSR value */
+};
+
+/* contains mc information from other or additional mc MSRs */
+struct mcinfo_extended {
+	struct mcinfo_common common;
+	uint32_t mc_msrs; /* Number of msr with valid values. */
+	/*
+	 * Currently Intel extended MSR (32/64) include all gp registers
+	 * and E(R)FLAGS, E(R)IP, E(R)MISC, up to 11/19 of them might be
+	 * useful at present. So expand this array to 16/32 to leave room.
+	 */
+	struct mcinfo_msr mc_msr[sizeof(void *) * 4];
+};
+
+/* Recovery Action flags. Giving recovery result information to DOM0 */
+
+/* Xen takes successful recovery action, the error is recovered */
+#define REC_ACTION_RECOVERED (0x1 << 0)
+/* No action is performed by XEN */
+#define REC_ACTION_NONE (0x1 << 1)
+/* It's possible DOM0 might take action ownership in some case */
+#define REC_ACTION_NEED_RESET (0x1 << 2)
+
+/*
+ * Different Recovery Action types, if the action is performed successfully,
+ * REC_ACTION_RECOVERED flag will be returned.
+ */
+
+/* Page Offline Action */
+#define MC_ACTION_PAGE_OFFLINE (0x1 << 0)
+/* CPU offline Action */
+#define MC_ACTION_CPU_OFFLINE (0x1 << 1)
+/* L3 cache disable Action */
+#define MC_ACTION_CACHE_SHRINK (0x1 << 2)
+
+/*
+ * Below interface used between XEN/DOM0 for passing XEN's recovery action
+ * information to DOM0.
+ */
+struct page_offline_action {
+	/* Params for passing the offlined page number to DOM0 */
+	uint64_t mfn;
+	uint64_t status;
+};
+
+struct cpu_offline_action {
+	/* Params for passing the identity of the offlined CPU to DOM0 */
+	uint32_t mc_socketid;
+	uint16_t mc_coreid;
+	uint16_t mc_core_threadid;
+};
+
+#define MAX_UNION_SIZE 16
+struct mcinfo_recovery {
+	struct mcinfo_common common;
+	uint16_t mc_bank; /* bank nr */
+	uint8_t action_flags;
+	uint8_t action_types;
+	union {
+		struct page_offline_action page_retire;
+		struct cpu_offline_action cpu_offline;
+		uint8_t pad[MAX_UNION_SIZE];
+	} action_info;
+};
+
+
+#define MCINFO_MAXSIZE 768
+struct mc_info {
+	/* Number of mcinfo_* entries in mi_data */
+	uint32_t mi_nentries;
+	uint32_t flags;
+	uint64_t mi_data[(MCINFO_MAXSIZE - 1) / 8];
+};
+DEFINE_GUEST_HANDLE_STRUCT(mc_info);
+
+#define __MC_MSR_ARRAYSIZE 8
+#define __MC_MSR_MCGCAP 0
+#define __MC_NMSRS 1
+#define MC_NCAPS 7
+struct mcinfo_logical_cpu {
+	uint32_t mc_cpunr;
+	uint32_t mc_chipid;
+	uint16_t mc_coreid;
+	uint16_t mc_threadid;
+	uint32_t mc_apicid;
+	uint32_t mc_clusterid;
+	uint32_t mc_ncores;
+	uint32_t mc_ncores_active;
+	uint32_t mc_nthreads;
+	uint32_t mc_cpuid_level;
+	uint32_t mc_family;
+	uint32_t mc_vendor;
+	uint32_t mc_model;
+	uint32_t mc_step;
+	char mc_vendorid[16];
+	char mc_brandid[64];
+	uint32_t mc_cpu_caps[MC_NCAPS];
+	uint32_t mc_cache_size;
+	uint32_t mc_cache_alignment;
+	uint32_t mc_nmsrvals;
+	struct mcinfo_msr mc_msrvalues[__MC_MSR_ARRAYSIZE];
+};
+DEFINE_GUEST_HANDLE_STRUCT(mcinfo_logical_cpu);
+
+/*
+ * Prototype:
+ *    uint32_t x86_mcinfo_nentries(struct mc_info *mi);
+ */
+#define x86_mcinfo_nentries(_mi)    \
+	((_mi)->mi_nentries)
+/*
+ * Prototype:
+ *    struct mcinfo_common *x86_mcinfo_first(struct mc_info *mi);
+ */
+#define x86_mcinfo_first(_mi)       \
+	((struct mcinfo_common *)(_mi)->mi_data)
+/*
+ * Prototype:
+ *    struct mcinfo_common *x86_mcinfo_next(struct mcinfo_common *mic);
+ */
+#define x86_mcinfo_next(_mic)       \
+	((struct mcinfo_common *)((uint8_t *)(_mic) + (_mic)->size))
+
+/*
+ * Prototype:
+ *    void x86_mcinfo_lookup(void *ret, struct mc_info *mi, uint16_t type);
+ */
+static inline void x86_mcinfo_lookup(struct mcinfo_common **ret,
+				     struct mc_info *mi, uint16_t type)
+{
+	uint32_t i;
+	struct mcinfo_common *mic;
+	bool found = 0;
+
+	if (!ret || !mi)
+		return;
+
+	mic = x86_mcinfo_first(mi);
+	for (i = 0; i < x86_mcinfo_nentries(mi); i++) {
+		if (mic->type == type) {
+			found = 1;
+			break;
+		}
+		mic = x86_mcinfo_next(mic);
+	}
+
+	*ret = found ? mic : NULL;
+}
+
+/*
+ * Fetch machine check data from hypervisor.
+ */
+#define XEN_MC_fetch		1
+struct xen_mc_fetch {
+	/*
+	 * IN: XEN_MC_NONURGENT, XEN_MC_URGENT,
+	 * XEN_MC_ACK if ack'king an earlier fetch
+	 * OUT: XEN_MC_OK, XEN_MC_FETCHAILED, XEN_MC_NODATA
+	 */
+	uint32_t flags;
+	uint32_t _pad0;
+	/* OUT: id for ack, IN: id we are ack'ing */
+	uint64_t fetch_id;
+
+	/* OUT variables. */
+	GUEST_HANDLE(mc_info) data;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_mc_fetch);
+
+
+/*
+ * This tells the hypervisor to notify a DomU about the machine check error
+ */
+#define XEN_MC_notifydomain	2
+struct xen_mc_notifydomain {
+	/* IN variables */
+	uint16_t mc_domid; /* The unprivileged domain to notify */
+	uint16_t mc_vcpuid; /* The vcpu in mc_domid to notify */
+
+	/* IN/OUT variables */
+	uint32_t flags;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_mc_notifydomain);
+
+#define XEN_MC_physcpuinfo	3
+struct xen_mc_physcpuinfo {
+	/* IN/OUT */
+	uint32_t ncpus;
+	uint32_t _pad0;
+	/* OUT */
+	GUEST_HANDLE(mcinfo_logical_cpu) info;
+};
+
+#define XEN_MC_msrinject	4
+#define MC_MSRINJ_MAXMSRS	8
+struct xen_mc_msrinject {
+	/* IN */
+	uint32_t mcinj_cpunr; /* target processor id */
+	uint32_t mcinj_flags; /* see MC_MSRINJ_F_* below */
+	uint32_t mcinj_count; /* 0 .. count-1 in array are valid */
+	uint32_t _pad0;
+	struct mcinfo_msr mcinj_msr[MC_MSRINJ_MAXMSRS];
+};
+
+/* Flags for mcinj_flags above; bits 16-31 are reserved */
+#define MC_MSRINJ_F_INTERPOSE	0x1
+
+#define XEN_MC_mceinject	5
+struct xen_mc_mceinject {
+	unsigned int mceinj_cpunr; /* target processor id */
+};
+
+struct xen_mc {
+	uint32_t cmd;
+	uint32_t interface_version; /* XEN_MCA_INTERFACE_VERSION */
+	union {
+		struct xen_mc_fetch        mc_fetch;
+		struct xen_mc_notifydomain mc_notifydomain;
+		struct xen_mc_physcpuinfo  mc_physcpuinfo;
+		struct xen_mc_msrinject    mc_msrinject;
+		struct xen_mc_mceinject    mc_mceinject;
+	} u;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_mc);
+
+#endif /* __ASSEMBLY__ */
+#endif /* __XEN_PUBLIC_ARCH_X86_MCA_H__ */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] Add mcelog support for xen platform
  2012-04-19 13:29 [PATCH 1/3] Add mcelog support for xen platform Liu, Jinsong
@ 2012-04-20 18:40 ` Konrad Rzeszutek Wilk
  2012-04-20 20:00   ` Liu, Jinsong
  2012-04-21  4:14 ` [Xen-devel] " Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-20 18:40 UTC (permalink / raw)
  To: Liu, Jinsong; +Cc: xen-devel, linux-kernel

On Thu, Apr 19, 2012 at 01:29:06PM +0000, Liu, Jinsong wrote:
> >From af4467b6cf0104ce98cc160438d865256c7d5561 Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong <jinsong.liu@intel.com>
> Date: Fri, 20 Apr 2012 05:08:38 +0800
> Subject: [PATCH 1/3] Add mcelog support for xen platform
> 
> When MCA error occurs, it would be handled by xen hypervisor first,
> and then the error information would be sent to dom0 for logging.

How do I test this?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/3] Add mcelog support for xen platform
  2012-04-20 18:40 ` Konrad Rzeszutek Wilk
@ 2012-04-20 20:00   ` Liu, Jinsong
  0 siblings, 0 replies; 25+ messages in thread
From: Liu, Jinsong @ 2012-04-20 20:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, linux-kernel

Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 19, 2012 at 01:29:06PM +0000, Liu, Jinsong wrote:
>>> From af4467b6cf0104ce98cc160438d865256c7d5561 Mon Sep 17 00:00:00
>>> 2001 
>> From: Liu, Jinsong <jinsong.liu@intel.com>
>> Date: Fri, 20 Apr 2012 05:08:38 +0800
>> Subject: [PATCH 1/3] Add mcelog support for xen platform
>> 
>> When MCA error occurs, it would be handled by xen hypervisor first,
>> and then the error information would be sent to dom0 for logging.
> 
> How do I test this?

It can be tested by EINJ tool to inject an memory error, and by mcelog tool to read error info, basically it need:

1. enable einj tools: load EINJ kernel module and mount debugfs;

2. set error type
# cat available_error_type (depend on your bios)
0x00000002      Processor Uncorrectable non-fatal
0x00000008      Memory Correctable
0x00000010      Memory Uncorrectable non-fatal
# echo 0x8 > error_type

3. find a proper physical address for injection (usually use a free page like xen heap page)
 
4. set the injection address and mask
# echo address > param1
# echo mask > param2
 
5. inject error
# echo 1 > error_inject
 
6. use mcelog tool to read error info


Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-19 13:29 [PATCH 1/3] Add mcelog support for xen platform Liu, Jinsong
  2012-04-20 18:40 ` Konrad Rzeszutek Wilk
@ 2012-04-21  4:14 ` Konrad Rzeszutek Wilk
  2012-04-21 10:45   ` Borislav Petkov
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-21  4:14 UTC (permalink / raw)
  To: Liu, Jinsong, tony.luck, bp, x86, linux-edac; +Cc: xen-devel, linux-kernel

On Thu, Apr 19, 2012 at 01:29:06PM +0000, Liu, Jinsong wrote:
> >From af4467b6cf0104ce98cc160438d865256c7d5561 Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong <jinsong.liu@intel.com>
> Date: Fri, 20 Apr 2012 05:08:38 +0800
> Subject: [PATCH 1/3] Add mcelog support for xen platform

Tony and Borislav,

Should this driver reside in arch/x86/kernel/cpu/mcheck?
I can keep it in drivers/xen, but I realized that perhaps it
should be in a different location?
> 
> When MCA error occurs, it would be handled by xen hypervisor first,
> and then the error information would be sent to dom0 for logging.
> 
> This patch gets error information from xen hypervisor and convert
> xen format error into linux format mcelog. This logic is basically
> self-contained, not much touching other kernel components.
> 
> So under xen platform when MCA error occurs, the error information
> would be sent to dom0 and converted into native linux mcelog format.
> By using tools like mcelog tool users could read specific error information,
> like what they did under native linux.
> 
> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
> Signed-off-by: Ke, Liping <liping.ke@intel.com>
> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  arch/x86/include/asm/xen/hypercall.h |    8 +
>  arch/x86/xen/enlighten.c             |    4 +-
>  drivers/xen/Kconfig                  |    8 +
>  drivers/xen/Makefile                 |    1 +
>  drivers/xen/mcelog.c                 |  240 ++++++++++++++++++++++++
>  include/xen/interface/xen-mca.h      |  336 ++++++++++++++++++++++++++++++++++
>  6 files changed, 594 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/xen/mcelog.c
>  create mode 100644 include/xen/interface/xen-mca.h
> 
> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
> index 5728852..59c226d 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -48,6 +48,7 @@
>  #include <xen/interface/sched.h>
>  #include <xen/interface/physdev.h>
>  #include <xen/interface/platform.h>
> +#include <xen/interface/xen-mca.h>
>  
>  /*
>   * The hypercall asms have to meet several constraints:
> @@ -302,6 +303,13 @@ HYPERVISOR_set_timer_op(u64 timeout)
>  }
>  
>  static inline int
> +HYPERVISOR_mca(struct xen_mc *mc_op)
> +{
> +	mc_op->interface_version = XEN_MCA_INTERFACE_VERSION;
> +	return _hypercall1(int, mca, mc_op);
> +}
> +
> +static inline int
>  HYPERVISOR_dom0_op(struct xen_platform_op *platform_op)
>  {
>  	platform_op->interface_version = XENPF_INTERFACE_VERSION;
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 4f51beb..15628d4 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -329,9 +329,7 @@ static void __init xen_init_cpuid_mask(void)
>  	unsigned int xsave_mask;
>  
>  	cpuid_leaf1_edx_mask =
> -		~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
> -		  (1 << X86_FEATURE_MCA)  |  /* disable MCA */
> -		  (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
> +		~((1 << X86_FEATURE_MTRR) |  /* disable MTRR */
>  		  (1 << X86_FEATURE_ACC));   /* thermal monitoring */
>  
>  	if (!xen_initial_domain())
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 9424313..0f54241 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -194,4 +194,12 @@ config XEN_ACPI_PROCESSOR
>            module will be called xen_acpi_processor  If you do not know what to choose,
>            select M here. If the CPUFREQ drivers are built in, select Y here.
>  
> +config XEN_MCE_LOG
> +	bool "Xen platform mcelog"
> +	depends on XEN_DOM0 && X86_64 && X86_MCE
> +	default n
> +	help
> +	  Allow kernel fetching MCE error from Xen platform and
> +	  converting it into Linux mcelog format for mcelog tools
> +
>  endmenu
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 9adc5be..1d3e763 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_XEN_GNTDEV)		+= xen-gntdev.o
>  obj-$(CONFIG_XEN_GRANT_DEV_ALLOC)	+= xen-gntalloc.o
>  obj-$(CONFIG_XENFS)			+= xenfs/
>  obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
> +obj-$(CONFIG_XEN_MCE_LOG)		+= mcelog.o
>  obj-$(CONFIG_XEN_PVHVM)			+= platform-pci.o
>  obj-$(CONFIG_XEN_TMEM)			+= tmem.o
>  obj-$(CONFIG_SWIOTLB_XEN)		+= swiotlb-xen.o
> diff --git a/drivers/xen/mcelog.c b/drivers/xen/mcelog.c
> new file mode 100644
> index 0000000..95ceb5e
> --- /dev/null
> +++ b/drivers/xen/mcelog.c
> @@ -0,0 +1,240 @@
> +/******************************************************************************
> + * mcelog.c
> + * Driver for receiving and transferring machine check error infomation
> + *
> + * Copyright (c) 2012 Intel Corporation
> + * Author: Liu, Jinsong <jinsong.liu@intel.com>
> + * Author: Jiang, Yunhong <yunhong.jiang@intel.com>
> + * Author: Ke, Liping <liping.ke@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <asm/mce.h>
> +
> +#include <xen/interface/xen.h>
> +#include <xen/events.h>
> +#include <xen/interface/vcpu.h>
> +#include <xen/xen.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/hypervisor.h>
> +
> +#define XEN_MCELOG "xen_mcelog: "
> +
> +static struct mc_info g_mi;
> +static struct mcinfo_logical_cpu *g_physinfo;
> +static uint32_t ncpus;
> +
> +static DEFINE_SPINLOCK(mcelog_lock);
> +
> +static int convert_log(struct mc_info *mi)
> +{
> +	struct mcinfo_common *mic;
> +	struct mcinfo_global *mc_global;
> +	struct mcinfo_bank *mc_bank;
> +	struct mce m;
> +	uint32_t i;
> +
> +	mic = NULL;
> +	x86_mcinfo_lookup(&mic, mi, MC_TYPE_GLOBAL);
> +	if (unlikely(!mic)) {
> +		pr_warning(XEN_MCELOG "Failed to find global error info\n");
> +		return -ENODEV;
> +	}
> +
> +	mce_setup(&m);
> +	mc_global = (struct mcinfo_global *)mic;
> +	m.mcgstatus = mc_global->mc_gstatus;
> +	m.apicid = mc_global->mc_apicid;
> +
> +	for (i = 0; i < ncpus; i++)
> +		if (g_physinfo[i].mc_apicid == m.apicid)
> +			break;
> +	if (unlikely(i == ncpus)) {
> +		pr_warning(XEN_MCELOG "Failed to match cpu with apicid %d\n",
> +			   m.apicid);
> +		return -ENODEV;
> +	}
> +
> +	m.socketid = g_physinfo[i].mc_chipid;
> +	m.cpu = m.extcpu = g_physinfo[i].mc_cpunr;
> +	m.cpuvendor = (__u8)g_physinfo[i].mc_vendor;
> +	m.mcgcap = g_physinfo[i].mc_msrvalues[__MC_MSR_MCGCAP].value;
> +
> +	mic = NULL;
> +	x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK);
> +	if (unlikely(!mic)) {
> +		pr_warning(XEN_MCELOG "Fail to find bank error info\n");
> +		return -ENODEV;
> +	}
> +
> +	do {
> +		if ((!mic) || (mic->size == 0) ||
> +		    (mic->type != MC_TYPE_GLOBAL   &&
> +		     mic->type != MC_TYPE_BANK     &&
> +		     mic->type != MC_TYPE_EXTENDED &&
> +		     mic->type != MC_TYPE_RECOVERY))
> +			break;
> +
> +		if (mic->type == MC_TYPE_BANK) {
> +			mc_bank = (struct mcinfo_bank *)mic;
> +			m.misc = mc_bank->mc_misc;
> +			m.status = mc_bank->mc_status;
> +			m.addr = mc_bank->mc_addr;
> +			m.tsc = mc_bank->mc_tsc;
> +			m.bank = mc_bank->mc_bank;
> +			m.finished = 1;
> +			/*log this record*/
> +			mce_log(&m);
> +		}
> +		mic = x86_mcinfo_next(mic);
> +	} while (1);
> +
> +	return 0;
> +}
> +
> +static int mc_queue_handle(uint32_t flags)
> +{
> +	struct xen_mc mc_op;
> +	int ret = 0;
> +	unsigned long tmp;
> +
> +	spin_lock_irqsave(&mcelog_lock, tmp);
> +
> +	mc_op.cmd = XEN_MC_fetch;
> +	mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
> +	set_xen_guest_handle(mc_op.u.mc_fetch.data, &g_mi);
> +	do {
> +		mc_op.u.mc_fetch.flags = flags;
> +		ret = HYPERVISOR_mca(&mc_op);
> +		if (ret) {
> +			pr_err(XEN_MCELOG "Failed to fetch %s error log\n",
> +			       (flags == XEN_MC_URGENT) ?
> +			       "urgnet" : "nonurgent");
> +			break;
> +		}
> +
> +		if (mc_op.u.mc_fetch.flags & XEN_MC_NODATA ||
> +		    mc_op.u.mc_fetch.flags & XEN_MC_FETCHFAILED)
> +			break;
> +		else {
> +			ret = convert_log(&g_mi);
> +			if (ret)
> +				pr_warning(XEN_MCELOG
> +					   "Failed to convert this error log, "
> +					   "continue acking it anyway\n");
> +
> +			mc_op.u.mc_fetch.flags = flags | XEN_MC_ACK;
> +			ret = HYPERVISOR_mca(&mc_op);
> +			if (ret) {
> +				pr_err(XEN_MCELOG
> +				       "Failed to ack previous error log\n");
> +				break;
> +			}
> +		}
> +	} while (1);
> +
> +	spin_unlock_irqrestore(&mcelog_lock, tmp);
> +
> +	return ret;
> +}
> +
> +/* virq handler for machine check error info*/
> +static irqreturn_t xen_mce_interrupt(int irq, void *dev_id)
> +{
> +	int err;
> +
> +	/* urgent mc_info */
> +	err = mc_queue_handle(XEN_MC_URGENT);
> +	if (err)
> +		pr_err(XEN_MCELOG
> +		       "Failed to handle urgent mc_info queue, "
> +		       "continue handling nonurgent mc_info queue anyway.\n");
> +
> +	/* nonurgent mc_info */
> +	err = mc_queue_handle(XEN_MC_NONURGENT);
> +	if (err)
> +		pr_err(XEN_MCELOG
> +		       "Failed to handle nonurgent mc_info queue.\n");
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int bind_virq_for_mce(void)
> +{
> +	int ret;
> +	struct xen_mc mc_op;
> +
> +	memset(&mc_op, 0, sizeof(struct xen_mc));
> +
> +	/* Fetch physical CPU Numbers */
> +	mc_op.cmd = XEN_MC_physcpuinfo;
> +	mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
> +	set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
> +	ret = HYPERVISOR_mca(&mc_op);
> +	if (ret) {
> +		pr_err(XEN_MCELOG "Failed to get CPU numbers\n");
> +		return ret;
> +	}
> +
> +	/* Fetch each CPU Physical Info for later reference*/
> +	ncpus = mc_op.u.mc_physcpuinfo.ncpus;
> +	g_physinfo = kcalloc(ncpus, sizeof(struct mcinfo_logical_cpu),
> +			     GFP_KERNEL);
> +	if (!g_physinfo)
> +		return -ENOMEM;
> +	set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
> +	ret = HYPERVISOR_mca(&mc_op);
> +	if (ret) {
> +		pr_err(XEN_MCELOG "Failed to get CPU info\n");
> +		kfree(g_physinfo);
> +		return ret;
> +	}
> +
> +	ret  = bind_virq_to_irqhandler(VIRQ_MCA, 0,
> +				       xen_mce_interrupt, 0, "mce", NULL);
> +	if (ret < 0) {
> +		pr_err(XEN_MCELOG "Failed to bind virq\n");
> +		kfree(g_physinfo);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init mcelog_init(void)
> +{
> +	/* Only DOM0 is responsible for MCE logging */
> +	if (xen_initial_domain())
> +		return bind_virq_for_mce();
> +
> +	return -ENODEV;
> +}
> +late_initcall(mcelog_init);
> diff --git a/include/xen/interface/xen-mca.h b/include/xen/interface/xen-mca.h
> new file mode 100644
> index 0000000..f2561db
> --- /dev/null
> +++ b/include/xen/interface/xen-mca.h
> @@ -0,0 +1,336 @@
> +/******************************************************************************
> + * arch-x86/mca.h
> + * Guest OS machine check interface to x86 Xen.
> + *
> + * Contributed by Advanced Micro Devices, Inc.
> + * Author: Christoph Egger <Christoph.Egger@amd.com>
> + *
> + * Updated by Intel Corporation
> + * Author: Liu, Jinsong <jinsong.liu@intel.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __XEN_PUBLIC_ARCH_X86_MCA_H__
> +#define __XEN_PUBLIC_ARCH_X86_MCA_H__
> +
> +/* Hypercall */
> +#define __HYPERVISOR_mca __HYPERVISOR_arch_0
> +
> +#define XEN_MCA_INTERFACE_VERSION	0x01ecc003
> +
> +/* IN: Dom0 calls hypercall to retrieve nonurgent error log entry */
> +#define XEN_MC_NONURGENT	0x1
> +/* IN: Dom0 calls hypercall to retrieve urgent error log entry */
> +#define XEN_MC_URGENT		0x2
> +/* IN: Dom0 acknowledges previosly-fetched error log entry */
> +#define XEN_MC_ACK		0x4
> +
> +/* OUT: All is ok */
> +#define XEN_MC_OK		0x0
> +/* OUT: Domain could not fetch data. */
> +#define XEN_MC_FETCHFAILED	0x1
> +/* OUT: There was no machine check data to fetch. */
> +#define XEN_MC_NODATA		0x2
> +
> +#ifndef __ASSEMBLY__
> +/* vIRQ injected to Dom0 */
> +#define VIRQ_MCA VIRQ_ARCH_0
> +
> +/*
> + * mc_info entry types
> + * mca machine check info are recorded in mc_info entries.
> + * when fetch mca info, it can use MC_TYPE_... to distinguish
> + * different mca info.
> + */
> +#define MC_TYPE_GLOBAL		0
> +#define MC_TYPE_BANK		1
> +#define MC_TYPE_EXTENDED	2
> +#define MC_TYPE_RECOVERY	3
> +
> +struct mcinfo_common {
> +	uint16_t type; /* structure type */
> +	uint16_t size; /* size of this struct in bytes */
> +};
> +
> +#define MC_FLAG_CORRECTABLE	(1 << 0)
> +#define MC_FLAG_UNCORRECTABLE	(1 << 1)
> +#define MC_FLAG_RECOVERABLE	(1 << 2)
> +#define MC_FLAG_POLLED		(1 << 3)
> +#define MC_FLAG_RESET		(1 << 4)
> +#define MC_FLAG_CMCI		(1 << 5)
> +#define MC_FLAG_MCE		(1 << 6)
> +
> +/* contains x86 global mc information */
> +struct mcinfo_global {
> +	struct mcinfo_common common;
> +
> +	uint16_t mc_domid; /* running domain at the time in error */
> +	uint16_t mc_vcpuid; /* virtual cpu scheduled for mc_domid */
> +	uint32_t mc_socketid; /* physical socket of the physical core */
> +	uint16_t mc_coreid; /* physical impacted core */
> +	uint16_t mc_core_threadid; /* core thread of physical core */
> +	uint32_t mc_apicid;
> +	uint32_t mc_flags;
> +	uint64_t mc_gstatus; /* global status */
> +};
> +
> +/* contains x86 bank mc information */
> +struct mcinfo_bank {
> +	struct mcinfo_common common;
> +
> +	uint16_t mc_bank; /* bank nr */
> +	uint16_t mc_domid; /* domain referenced by mc_addr if valid */
> +	uint64_t mc_status; /* bank status */
> +	uint64_t mc_addr; /* bank address */
> +	uint64_t mc_misc;
> +	uint64_t mc_ctrl2;
> +	uint64_t mc_tsc;
> +};
> +
> +struct mcinfo_msr {
> +	uint64_t reg; /* MSR */
> +	uint64_t value; /* MSR value */
> +};
> +
> +/* contains mc information from other or additional mc MSRs */
> +struct mcinfo_extended {
> +	struct mcinfo_common common;
> +	uint32_t mc_msrs; /* Number of msr with valid values. */
> +	/*
> +	 * Currently Intel extended MSR (32/64) include all gp registers
> +	 * and E(R)FLAGS, E(R)IP, E(R)MISC, up to 11/19 of them might be
> +	 * useful at present. So expand this array to 16/32 to leave room.
> +	 */
> +	struct mcinfo_msr mc_msr[sizeof(void *) * 4];
> +};
> +
> +/* Recovery Action flags. Giving recovery result information to DOM0 */
> +
> +/* Xen takes successful recovery action, the error is recovered */
> +#define REC_ACTION_RECOVERED (0x1 << 0)
> +/* No action is performed by XEN */
> +#define REC_ACTION_NONE (0x1 << 1)
> +/* It's possible DOM0 might take action ownership in some case */
> +#define REC_ACTION_NEED_RESET (0x1 << 2)
> +
> +/*
> + * Different Recovery Action types, if the action is performed successfully,
> + * REC_ACTION_RECOVERED flag will be returned.
> + */
> +
> +/* Page Offline Action */
> +#define MC_ACTION_PAGE_OFFLINE (0x1 << 0)
> +/* CPU offline Action */
> +#define MC_ACTION_CPU_OFFLINE (0x1 << 1)
> +/* L3 cache disable Action */
> +#define MC_ACTION_CACHE_SHRINK (0x1 << 2)
> +
> +/*
> + * Below interface used between XEN/DOM0 for passing XEN's recovery action
> + * information to DOM0.
> + */
> +struct page_offline_action {
> +	/* Params for passing the offlined page number to DOM0 */
> +	uint64_t mfn;
> +	uint64_t status;
> +};
> +
> +struct cpu_offline_action {
> +	/* Params for passing the identity of the offlined CPU to DOM0 */
> +	uint32_t mc_socketid;
> +	uint16_t mc_coreid;
> +	uint16_t mc_core_threadid;
> +};
> +
> +#define MAX_UNION_SIZE 16
> +struct mcinfo_recovery {
> +	struct mcinfo_common common;
> +	uint16_t mc_bank; /* bank nr */
> +	uint8_t action_flags;
> +	uint8_t action_types;
> +	union {
> +		struct page_offline_action page_retire;
> +		struct cpu_offline_action cpu_offline;
> +		uint8_t pad[MAX_UNION_SIZE];
> +	} action_info;
> +};
> +
> +
> +#define MCINFO_MAXSIZE 768
> +struct mc_info {
> +	/* Number of mcinfo_* entries in mi_data */
> +	uint32_t mi_nentries;
> +	uint32_t flags;
> +	uint64_t mi_data[(MCINFO_MAXSIZE - 1) / 8];
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(mc_info);
> +
> +#define __MC_MSR_ARRAYSIZE 8
> +#define __MC_MSR_MCGCAP 0
> +#define __MC_NMSRS 1
> +#define MC_NCAPS 7
> +struct mcinfo_logical_cpu {
> +	uint32_t mc_cpunr;
> +	uint32_t mc_chipid;
> +	uint16_t mc_coreid;
> +	uint16_t mc_threadid;
> +	uint32_t mc_apicid;
> +	uint32_t mc_clusterid;
> +	uint32_t mc_ncores;
> +	uint32_t mc_ncores_active;
> +	uint32_t mc_nthreads;
> +	uint32_t mc_cpuid_level;
> +	uint32_t mc_family;
> +	uint32_t mc_vendor;
> +	uint32_t mc_model;
> +	uint32_t mc_step;
> +	char mc_vendorid[16];
> +	char mc_brandid[64];
> +	uint32_t mc_cpu_caps[MC_NCAPS];
> +	uint32_t mc_cache_size;
> +	uint32_t mc_cache_alignment;
> +	uint32_t mc_nmsrvals;
> +	struct mcinfo_msr mc_msrvalues[__MC_MSR_ARRAYSIZE];
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(mcinfo_logical_cpu);
> +
> +/*
> + * Prototype:
> + *    uint32_t x86_mcinfo_nentries(struct mc_info *mi);
> + */
> +#define x86_mcinfo_nentries(_mi)    \
> +	((_mi)->mi_nentries)
> +/*
> + * Prototype:
> + *    struct mcinfo_common *x86_mcinfo_first(struct mc_info *mi);
> + */
> +#define x86_mcinfo_first(_mi)       \
> +	((struct mcinfo_common *)(_mi)->mi_data)
> +/*
> + * Prototype:
> + *    struct mcinfo_common *x86_mcinfo_next(struct mcinfo_common *mic);
> + */
> +#define x86_mcinfo_next(_mic)       \
> +	((struct mcinfo_common *)((uint8_t *)(_mic) + (_mic)->size))
> +
> +/*
> + * Prototype:
> + *    void x86_mcinfo_lookup(void *ret, struct mc_info *mi, uint16_t type);
> + */
> +static inline void x86_mcinfo_lookup(struct mcinfo_common **ret,
> +				     struct mc_info *mi, uint16_t type)
> +{
> +	uint32_t i;
> +	struct mcinfo_common *mic;
> +	bool found = 0;
> +
> +	if (!ret || !mi)
> +		return;
> +
> +	mic = x86_mcinfo_first(mi);
> +	for (i = 0; i < x86_mcinfo_nentries(mi); i++) {
> +		if (mic->type == type) {
> +			found = 1;
> +			break;
> +		}
> +		mic = x86_mcinfo_next(mic);
> +	}
> +
> +	*ret = found ? mic : NULL;
> +}
> +
> +/*
> + * Fetch machine check data from hypervisor.
> + */
> +#define XEN_MC_fetch		1
> +struct xen_mc_fetch {
> +	/*
> +	 * IN: XEN_MC_NONURGENT, XEN_MC_URGENT,
> +	 * XEN_MC_ACK if ack'king an earlier fetch
> +	 * OUT: XEN_MC_OK, XEN_MC_FETCHAILED, XEN_MC_NODATA
> +	 */
> +	uint32_t flags;
> +	uint32_t _pad0;
> +	/* OUT: id for ack, IN: id we are ack'ing */
> +	uint64_t fetch_id;
> +
> +	/* OUT variables. */
> +	GUEST_HANDLE(mc_info) data;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_mc_fetch);
> +
> +
> +/*
> + * This tells the hypervisor to notify a DomU about the machine check error
> + */
> +#define XEN_MC_notifydomain	2
> +struct xen_mc_notifydomain {
> +	/* IN variables */
> +	uint16_t mc_domid; /* The unprivileged domain to notify */
> +	uint16_t mc_vcpuid; /* The vcpu in mc_domid to notify */
> +
> +	/* IN/OUT variables */
> +	uint32_t flags;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_mc_notifydomain);
> +
> +#define XEN_MC_physcpuinfo	3
> +struct xen_mc_physcpuinfo {
> +	/* IN/OUT */
> +	uint32_t ncpus;
> +	uint32_t _pad0;
> +	/* OUT */
> +	GUEST_HANDLE(mcinfo_logical_cpu) info;
> +};
> +
> +#define XEN_MC_msrinject	4
> +#define MC_MSRINJ_MAXMSRS	8
> +struct xen_mc_msrinject {
> +	/* IN */
> +	uint32_t mcinj_cpunr; /* target processor id */
> +	uint32_t mcinj_flags; /* see MC_MSRINJ_F_* below */
> +	uint32_t mcinj_count; /* 0 .. count-1 in array are valid */
> +	uint32_t _pad0;
> +	struct mcinfo_msr mcinj_msr[MC_MSRINJ_MAXMSRS];
> +};
> +
> +/* Flags for mcinj_flags above; bits 16-31 are reserved */
> +#define MC_MSRINJ_F_INTERPOSE	0x1
> +
> +#define XEN_MC_mceinject	5
> +struct xen_mc_mceinject {
> +	unsigned int mceinj_cpunr; /* target processor id */
> +};
> +
> +struct xen_mc {
> +	uint32_t cmd;
> +	uint32_t interface_version; /* XEN_MCA_INTERFACE_VERSION */
> +	union {
> +		struct xen_mc_fetch        mc_fetch;
> +		struct xen_mc_notifydomain mc_notifydomain;
> +		struct xen_mc_physcpuinfo  mc_physcpuinfo;
> +		struct xen_mc_msrinject    mc_msrinject;
> +		struct xen_mc_mceinject    mc_mceinject;
> +	} u;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_mc);
> +
> +#endif /* __ASSEMBLY__ */
> +#endif /* __XEN_PUBLIC_ARCH_X86_MCA_H__ */
> -- 
> 1.7.1


> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-21  4:14 ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-04-21 10:45   ` Borislav Petkov
  2012-04-23  2:18     ` Konrad Rzeszutek Wilk
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Borislav Petkov @ 2012-04-21 10:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Liu, Jinsong, tony.luck, bp, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

+ x86 people.

On Sat, Apr 21, 2012 at 12:14:54AM -0400, Konrad Rzeszutek Wilk wrote:
> Should this driver reside in arch/x86/kernel/cpu/mcheck?

The fact that you raise such a question is already loaded with problems,
see below.

> I can keep it in drivers/xen, but I realized that perhaps it
> should be in a different location?
> > 
> > When MCA error occurs, it would be handled by xen hypervisor first,
> > and then the error information would be sent to dom0 for logging.
> > 
> > This patch gets error information from xen hypervisor and convert
> > xen format error into linux format mcelog. This logic is basically
> > self-contained, not much touching other kernel components.

I have a problem with that statement above. This thing uses struct mce
and mce_log(), which are internal to x86, and whenever we want to change
anything there, we won't be able to because xen uses it too.

And this has happened already with the whole microcode loading debacle.

So, my suggestion is to copy the pieces you need and create your own xen
version of /dev/mcelog and add it to dom0 so that there's no hooking
into baremetal code and whenever a dom0 kernel is running, you can
reroute the mcelog userspace tool to read /dev/xen_mcelog or whatever
and not hook into the x86 versions.

Because, if you'd hooked into it, just imagine one fine day, when we
remove mcelog support, what screaming the xen people will be doing when
mcelog doesn't work anymore.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-21 10:45   ` Borislav Petkov
@ 2012-04-23  2:18     ` Konrad Rzeszutek Wilk
  2012-04-23  6:09       ` Borislav Petkov
  2012-04-23 15:27     ` Luck, Tony
  2012-04-24  8:27     ` Andi Kleen
  2 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23  2:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Liu, Jinsong, tony.luck, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

> > > This patch gets error information from xen hypervisor and convert
> > > xen format error into linux format mcelog. This logic is basically
> > > self-contained, not much touching other kernel components.
> 
> I have a problem with that statement above. This thing uses struct mce
> and mce_log(), which are internal to x86, and whenever we want to change
> anything there, we won't be able to because xen uses it too.

Huh? You mean this:

       m.misc = mc_bank->mc_misc;
       m.status = mc_bank->mc_status;
       m.addr = mc_bank->mc_addr;
       m.tsc = mc_bank->mc_tsc;
       m.bank = mc_bank->mc_bank;
       m.finished = 1;
       /*log this record*/
       mce_log(&m);

?

This driver is not that much different from the APEI bridge to MCE code -
it just that instead of reading APEI blob data it reads it from an hypercall.

Taking this a step forward, If you change the 'mce_log' or the 'struct mce'
won't you have a problem with the APEI one b/c it uses it too?

The fix seems quite easy - you change the 'struct mce' and 'mce_log()'
along with the drivers that use it.

If you are worried about breaking something, then you can just send
the change to me or Liu to test it out before committing API changes
in the MCE code.


> 
> And this has happened already with the whole microcode loading debacle.

My recollection is that the existing microcode API had major issues that
could not fixed. The only fix was to make it be very early in the bootup
processes and that is what hpa would like developers to focus on.

> 
> So, my suggestion is to copy the pieces you need and create your own xen
> version of /dev/mcelog and add it to dom0 so that there's no hooking
> into baremetal code and whenever a dom0 kernel is running, you can
> reroute the mcelog userspace tool to read /dev/xen_mcelog or whatever
> and not hook into the x86 versions.
> 
> Because, if you'd hooked into it, just imagine one fine day, when we
> remove mcelog support, what screaming the xen people will be doing when
> mcelog doesn't work anymore.

You would have more screaming from the distro camp about removing
/dev/mcelog. But if that is your choice, I would send you an email
asking how I need to retool this driver to work with the new MCE gen2
code that you had in mind.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23  2:18     ` Konrad Rzeszutek Wilk
@ 2012-04-23  6:09       ` Borislav Petkov
  2012-04-23 15:06         ` Konrad Rzeszutek Wilk
  2012-04-23 16:17         ` Liu, Jinsong
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2012-04-23  6:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Liu, Jinsong, tony.luck, x86, linux-edac,
	xen-devel, linux-kernel, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

On Sun, Apr 22, 2012 at 10:18:27PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > This patch gets error information from xen hypervisor and convert
> > > > xen format error into linux format mcelog. This logic is basically
> > > > self-contained, not much touching other kernel components.
> > 
> > I have a problem with that statement above. This thing uses struct mce
> > and mce_log(), which are internal to x86, and whenever we want to change
> > anything there, we won't be able to because xen uses it too.
> 
> Huh? You mean this:
> 
>        m.misc = mc_bank->mc_misc;
>        m.status = mc_bank->mc_status;
>        m.addr = mc_bank->mc_addr;
>        m.tsc = mc_bank->mc_tsc;
>        m.bank = mc_bank->mc_bank;
>        m.finished = 1;
>        /*log this record*/
>        mce_log(&m);
> 
> ?
> 
> This driver is not that much different from the APEI bridge to MCE code -
> it just that instead of reading APEI blob data it reads it from an hypercall.

Let me ask you this: is APEI a virtualization solution of some sort?

No, it is the old windoze RAS crap but I guess Linux has to support it
now too through BIOS. And x86 vendors will have to support it too.

So it is piece of the firmware we'd have to deal with too.

Now xen is a whole another deal - it is purely a piece of software.

> The fix seems quite easy - you change the 'struct mce' and 'mce_log()'
> along with the drivers that use it.

This is exactly what I have a problem with: having to take care of xen
too. "No, Boris, nope, we cannot take your new feature because it breaks
xen." and also "Have you tested this on xen too?" where the only thing I
do is _hardware_ enablement and improving software support for it. And
xen is not hardware...

[..]

> If you are worried about breaking something, then you can just send
> the change to me or Liu to test it out before committing API changes
> in the MCE code.

This probably sounds good now but I don't think code changes like
that ever run as smoothly. Whenever there's breakage, there'll always
be people screaming against it - I just don't want code that enables
hardware to be crippled and unable to change because it breaks
completely unrelated pieces - it is bad as it is now.

> > And this has happened already with the whole microcode loading debacle.
> 
> My recollection is that the existing microcode API had major issues that
> could not fixed. The only fix was to make it be very early in the bootup
> processes and that is what hpa would like developers to focus on.

That was one side of the problem. The other was, AFAICR, creating a xen
microcode driver which was "on the same level" as the hardware microcode
drivers, which was completely bull*.

The problem is xen growing stuff everywhere in arch/x86/ and this way,
maybe even unwillingly, crippling development of hardware-related
features. I know you're willing to help and I know you mean it well, but
there's always some other problem in practice.

Now I keep wondering, why don't you guys simply create your own mcelog
ring buffer and interface on the userspace tool side instead of hooking
into lowlevel kernel stuff? I mean, the code is there, you simply have
to copy it into arch/xen/ or whatever you have there. Why do you have to
hook into arch/x86/ instead of doing your own stuff?

> > So, my suggestion is to copy the pieces you need and create your own xen
> > version of /dev/mcelog and add it to dom0 so that there's no hooking
> > into baremetal code and whenever a dom0 kernel is running, you can
> > reroute the mcelog userspace tool to read /dev/xen_mcelog or whatever
> > and not hook into the x86 versions.
> > 
> > Because, if you'd hooked into it, just imagine one fine day, when we
> > remove mcelog support, what screaming the xen people will be doing when
> > mcelog doesn't work anymore.
> 
> You would have more screaming from the distro camp about removing
> /dev/mcelog.

How do you know that? Don't you think that we probably would've talked
to them already and made preparations for conversion first?

> But if that is your choice, I would send you an email asking how I
> need to retool this driver to work with the new MCE gen2 code that you
> had in mind.

As I said above, I'm very sceptical this will ever work, I guess I'd
have to live and see.

Now, with your own buffer solution, nothing breaks and all is happy,
a win-win, if you wish. I think this is much simpler and easier a
solution.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23  6:09       ` Borislav Petkov
@ 2012-04-23 15:06         ` Konrad Rzeszutek Wilk
  2012-04-23 15:59           ` Borislav Petkov
  2012-04-23 16:17         ` Liu, Jinsong
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23 15:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Liu, Jinsong, tony.luck, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

> > This driver is not that much different from the APEI bridge to MCE code -
> > it just that instead of reading APEI blob data it reads it from an hypercall.
> 
> Let me ask you this: is APEI a virtualization solution of some sort?
> 
> No, it is the old windoze RAS crap but I guess Linux has to support it
> now too through BIOS. And x86 vendors will have to support it too.
> 
> So it is piece of the firmware we'd have to deal with too.
> 
> Now xen is a whole another deal - it is purely a piece of software.

Perfect. Software is more elastic than hardware - so the Xen ABI
for the MCE can be changed then to reflect the new format if required.
> 
> > The fix seems quite easy - you change the 'struct mce' and 'mce_log()'
> > along with the drivers that use it.
> 
> This is exactly what I have a problem with: having to take care of xen
> too. "No, Boris, nope, we cannot take your new feature because it breaks
> xen." and also "Have you tested this on xen too?" where the only thing I
> do is _hardware_ enablement and improving software support for it. And
> xen is not hardware...

Delegate testing to sub-maintainers. In this case that would be me
and Liu.

> 
> [..]
> 
> > If you are worried about breaking something, then you can just send
> > the change to me or Liu to test it out before committing API changes
> > in the MCE code.
> 
> This probably sounds good now but I don't think code changes like
> that ever run as smoothly. Whenever there's breakage, there'll always
> be people screaming against it - I just don't want code that enables

Right, regressions are bad.

> hardware to be crippled and unable to change because it breaks
> completely unrelated pieces - it is bad as it is now.

Can you point me to the existing examples of MCE's badness?

I remember the Greg KK's patches to the dynamic vs static and per-cpu
initialization - but that wasn't due to the MCE API. I think that
was due to Key Sievens transition from SysFS subsystem API to device API
patches that broke bunch of stuff.

> 
> > > And this has happened already with the whole microcode loading debacle.
> > 
> > My recollection is that the existing microcode API had major issues that
> > could not fixed. The only fix was to make it be very early in the bootup
> > processes and that is what hpa would like developers to focus on.
> 
> That was one side of the problem. The other was, AFAICR, creating a xen
> microcode driver which was "on the same level" as the hardware microcode
> drivers, which was completely bull*.

I think of Xen as the hypervisors on PowerPC boxes - for certain operations
you have to use hypercalls to do some hardware operations.

> 
> The problem is xen growing stuff everywhere in arch/x86/ and this way,
> maybe even unwillingly, crippling development of hardware-related
> features. I know you're willing to help and I know you mean it well, but
> there's always some other problem in practice.

I am not sure I see why we cannot fix the practical problems as they pop
up?

> 
> Now I keep wondering, why don't you guys simply create your own mcelog
> ring buffer and interface on the userspace tool side instead of hooking
> into lowlevel kernel stuff? I mean, the code is there, you simply have
> to copy it into arch/xen/ or whatever you have there. Why do you have to

Nowadays the kernel can transition to run under lguest, KVM, Xen or
baremetal as a single binary image instead of multiple compiled
kernels for a specific virtualization framework. As such, there is
no 'arch/xen,lguest,kvm', instead there is alternative_asm that patches
the low-level calls (set_pte, load_cr3, spinlock, time, etc), for the
appropiate virtualization (or CPU if done under baremetal) offering. Hence
the arch/x86 has expanded to support baremtal and virtualization
extensions in it (called paravirt_ops).

> hook into arch/x86/ instead of doing your own stuff?

I think what you are suggesting is to _not_ reuse existing APIs. That
seems counter-intuive to general software development. There are
exceptions of course - when the existing API needs to change a lot
(or needs to be thrown out), and there is this one little driver that
keeps on using the old interface and can't change - at that point I can
see the purpose of forking it. But until then - using existing APIs is
the way to go.

And I (along with Liu) will keep the Xen MCE driver evolving as it
needs to conform to the new kernel mcheck API.

> 
> > > So, my suggestion is to copy the pieces you need and create your own xen
> > > version of /dev/mcelog and add it to dom0 so that there's no hooking
> > > into baremetal code and whenever a dom0 kernel is running, you can
> > > reroute the mcelog userspace tool to read /dev/xen_mcelog or whatever
> > > and not hook into the x86 versions.
> > > 
> > > Because, if you'd hooked into it, just imagine one fine day, when we
> > > remove mcelog support, what screaming the xen people will be doing when
> > > mcelog doesn't work anymore.
> > 
> > You would have more screaming from the distro camp about removing
> > /dev/mcelog.
> 
> How do you know that? Don't you think that we probably would've talked
> to them already and made preparations for conversion first?

I was Googling around for it and I couldn't find anything that says
MCE is removed (which could be very well the fault of my poor
Googling-skills) and its replacement user-space program. Please do
point me to the URL so I can get some idea of what is brewing.

The one thing I saw was this https://lkml.org/lkml/2012/3/2/312 which
pointed to the /dev/mcelog struct changing (which then got NACK-ed), but
nothing about the internal 'struct mce' being dropped from drivers?

I couldn't find anything in the Documentation/feature-removal-schedule.txt 

There are hints of ras_printk and  /sys/devices/system/ras/agent but
they are related to printk (and I see that mce_log would use it too), but
that just seems to [from a driver perspective] to be the output code-paths
[like the MCE decoders] - and it allows to output to be put in a trace
buffer as well - instead of just in /dev/mcelog.

If the distros choose to stop using /dev/mcelog and use another mechanism
(and the MCE check drivers still do their job of collecting the data and
sending it downstream), then I don't see why "screaming the xen people will
be doing" - as they still get the MCE errors - in whatever way the distros
has choosen to present it.

> 
> > But if that is your choice, I would send you an email asking how I
> > need to retool this driver to work with the new MCE gen2 code that you
> > had in mind.
> 
> As I said above, I'm very sceptical this will ever work, I guess I'd
> have to live and see.
> 
> Now, with your own buffer solution, nothing breaks and all is happy,
> a win-win, if you wish. I think this is much simpler and easier a
> solution.

Not sure what you mean by 'own buffer solution'. Are you talking about
using the trace_mce_record or the ras_printk instead of the mce_log?
I would think that is the job of the MCE decoders?

Please keep in mind that this driver is not trying to decode anything -
it just lifting raw events, massaging them a bit, and sending them
downstream. Similar to how the APEI does it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-21 10:45   ` Borislav Petkov
  2012-04-23  2:18     ` Konrad Rzeszutek Wilk
@ 2012-04-23 15:27     ` Luck, Tony
  2012-04-23 15:32       ` Konrad Rzeszutek Wilk
                         ` (2 more replies)
  2012-04-24  8:27     ` Andi Kleen
  2 siblings, 3 replies; 25+ messages in thread
From: Luck, Tony @ 2012-04-23 15:27 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk
  Cc: Liu, Jinsong, x86, linux-edac, xen-devel, linux-kernel,
	H. Peter Anvin, Ingo Molnar, Thomas Gleixner

> Because, if you'd hooked into it, just imagine one fine day, when we
> remove mcelog support, what screaming the xen people will be doing when
> mcelog doesn't work anymore.

Agreed. Even before we get to deleting mcelog, "struct mce" can change (new
fields could be added) ... and you don't want to have your hypervisor to
have to know which version of Linux it is talking to.

-Tony

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:27     ` Luck, Tony
@ 2012-04-23 15:32       ` Konrad Rzeszutek Wilk
  2012-04-23 16:17         ` Luck, Tony
  2012-04-23 15:43       ` Liu, Jinsong
  2012-04-23 16:17       ` H. Peter Anvin
  2 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23 15:32 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Liu, Jinsong, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Mon, Apr 23, 2012 at 03:27:32PM +0000, Luck, Tony wrote:
> > Because, if you'd hooked into it, just imagine one fine day, when we
> > remove mcelog support, what screaming the xen people will be doing when
> > mcelog doesn't work anymore.
> 
> Agreed. Even before we get to deleting mcelog, "struct mce" can change (new
> fields could be added) ... and you don't want to have your hypervisor to
> have to know which version of Linux it is talking to.

I am having a hard time seeing how this is different from 'struct mce'
changing and the hardware remaining the same?

Can't the MCE check drivers deal with that - I mean that is their purpose -
to extract some "blob" of data from the hardware, massage it in the software
structure, and send its way. If you add some more fields in the software
structure - either the hardware can provide it or not. If not, it will have
to emulate it. This would be the same case with this driver.

This driver is _not_ doing a strict bit-by-bit copy of 'struct mcinfo'
to 'struct mce'. It is plucking the right bits and setting them in
'struct mce'.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:27     ` Luck, Tony
  2012-04-23 15:32       ` Konrad Rzeszutek Wilk
@ 2012-04-23 15:43       ` Liu, Jinsong
  2012-04-23 16:26         ` H. Peter Anvin
  2012-04-23 16:17       ` H. Peter Anvin
  2 siblings, 1 reply; 25+ messages in thread
From: Liu, Jinsong @ 2012-04-23 15:43 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov, Konrad Rzeszutek Wilk
  Cc: x86, linux-edac, xen-devel, linux-kernel, H. Peter Anvin,
	Ingo Molnar, Thomas Gleixner

Luck, Tony wrote:
>> Because, if you'd hooked into it, just imagine one fine day, when we
>> remove mcelog support, what screaming the xen people will be doing
>> when mcelog doesn't work anymore.
> 
> Agreed. Even before we get to deleting mcelog, "struct mce" can
> change (new fields could be added) ... and you don't want to have
> your hypervisor to have to know which version of Linux it is talking
> to. 
> 

Hmm, adding new fields block neither native nor xen -- xen just ignore new added fields, that would be no problem for xen:
/* Fields are zero when not available */
struct mce {
	...
}

and, hypervisor only record error into its mc_info cookie (in its own format), it doesn't care what's dom0's version and how dom0 translate it -- that's dom0's business. The role of hypervisor is virtual 'platform', just like h/w platform don't care what linux version runs over it.


Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:06         ` Konrad Rzeszutek Wilk
@ 2012-04-23 15:59           ` Borislav Petkov
  2012-04-23 16:23             ` Luck, Tony
  2012-04-23 16:54             ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2012-04-23 15:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Liu, Jinsong, tony.luck, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Mon, Apr 23, 2012 at 11:06:57AM -0400, Konrad Rzeszutek Wilk wrote:
> > > This driver is not that much different from the APEI bridge to MCE code -
> > > it just that instead of reading APEI blob data it reads it from an hypercall.
> > 
> > Let me ask you this: is APEI a virtualization solution of some sort?
> > 
> > No, it is the old windoze RAS crap but I guess Linux has to support it
> > now too through BIOS. And x86 vendors will have to support it too.
> > 
> > So it is piece of the firmware we'd have to deal with too.
> > 
> > Now xen is a whole another deal - it is purely a piece of software.
> 
> Perfect. Software is more elastic than hardware - so the Xen ABI
> for the MCE can be changed then to reflect the new format if required.

And this "elastic" piece of software is yet another thing I have to deal
with when working on MCE which is strictly doing _hardware_ support. And
I don't want to do that - if you hook in the mce_log() interface, you're
practically becoming yet another user of it which I have to account for
when doing changes to the core MCE code.

[..]

> Delegate testing to sub-maintainers. In this case that would be me and
> Liu.

Ok, just purely hypothetically, what happens if there's no one to update
this code anymore? What if Oracle or Citrix or whoever it is, is not
interested in maintaining xen anymore, or those drivers or whatever?
I'm stuck with users of this which I can't shake off because there's no
upstream development. Oh, and then there's the distros which already
have their installs and they'd never update their setup because it works
and why touch it... and so on and so on...

[..]

> > The problem is xen growing stuff everywhere in arch/x86/ and this way,
> > maybe even unwillingly, crippling development of hardware-related
> > features. I know you're willing to help and I know you mean it well, but
> > there's always some other problem in practice.
> 
> I am not sure I see why we cannot fix the practical problems as they pop
> up?

See above.

[..]

> > hook into arch/x86/ instead of doing your own stuff?
> 
> I think what you are suggesting is to _not_ reuse existing APIs. That
> seems counter-intuive to general software development. There are
> exceptions of course - when the existing API needs to change a lot
> (or needs to be thrown out), and there is this one little driver that
> keeps on using the old interface and can't change - at that point I can
> see the purpose of forking it. But until then - using existing APIs is
> the way to go.

I just love it when guys left and right start teaching me about stuff,
it's like I even begged for it...

You're not reusing the existing API because mce_log is not an API to
anything - it is an internal x86 MCE logging thing to put a struct mce
into a static ring buffer.

The API is /dev/mcelog and there you should interface, not hook into
internal stuff and cripple it from developing any further.

[snip more senseless drivel]

> > Now, with your own buffer solution, nothing breaks and all is happy,
> > a win-win, if you wish. I think this is much simpler and easier a
> > solution.
> 
> Not sure what you mean by 'own buffer solution'. Are you talking about
> using the trace_mce_record or the ras_printk instead of the mce_log?
> I would think that is the job of the MCE decoders?

No, I mean to copy the mce_log() code along with the functions that
create the char device /dev/mcelog - mce_chrdev_*.

Btw, you'd probably be done with it if you'd taken the time to do that
instead of arguing your cause.

> Please keep in mind that this driver is not trying to decode anything -

I know that.

> it just lifting raw events, massaging them a bit, and sending them
> downstream. Similar to how the APEI does it.

Please stop talking about APEI - I told you why it doesn't apply here in
a previous mail.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23  6:09       ` Borislav Petkov
  2012-04-23 15:06         ` Konrad Rzeszutek Wilk
@ 2012-04-23 16:17         ` Liu, Jinsong
  1 sibling, 0 replies; 25+ messages in thread
From: Liu, Jinsong @ 2012-04-23 16:17 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk
  Cc: Luck, Tony, x86, linux-edac, xen-devel, linux-kernel,
	H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Jan Beulich

>> 
>> This driver is not that much different from the APEI bridge to MCE
>> code - 
>> it just that instead of reading APEI blob data it reads it from an
>> hypercall. 
> 
> Let me ask you this: is APEI a virtualization solution of some sort?
> 
> No, it is the old windoze RAS crap but I guess Linux has to support it
> now too through BIOS. And x86 vendors will have to support it too.
> 
> So it is piece of the firmware we'd have to deal with too.
> 
> Now xen is a whole another deal - it is purely a piece of software.

But the role of xen (in fact, including other virtual solution like lguest/kvm) is virtual 'platform', equal to native h/w 'platform'.

> 
> Now I keep wondering, why don't you guys simply create your own mcelog
> ring buffer and interface on the userspace tool side instead of
> hooking into lowlevel kernel stuff? I mean, the code is there, you
> simply have to copy it into arch/xen/ or whatever you have there. Why
> do you have to hook into arch/x86/ instead of doing your own stuff?
> 
>>> So, my suggestion is to copy the pieces you need and create your
>>> own xen version of /dev/mcelog and add it to dom0 so that there's
>>> no hooking into baremetal code and whenever a dom0 kernel is
>>> running, you can reroute the mcelog userspace tool to read
>>> /dev/xen_mcelog or whatever and not hook into the x86 versions.

That would make distro camp confusing -- i.e. SLES10/11 unifiedly use /dev/mcelog for both native and xen mcelog, w/ almost same dom0 mcelog logic as this patch.

>>> 
>>> Because, if you'd hooked into it, just imagine one fine day, when we
>>> remove mcelog support, what screaming the xen people will be doing
>>> when mcelog doesn't work anymore.
>> 
>> You would have more screaming from the distro camp about removing
>> /dev/mcelog.
> 
> How do you know that? Don't you think that we probably would've talked
> to them already and made preparations for conversion first?

I guess Novell Suse doubt this.

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:32       ` Konrad Rzeszutek Wilk
@ 2012-04-23 16:17         ` Luck, Tony
  2012-04-23 16:51           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Luck, Tony @ 2012-04-23 16:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Liu, Jinsong, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

> This driver is _not_ doing a strict bit-by-bit copy of 'struct mcinfo'
> to 'struct mce'. It is plucking the right bits and setting them in
> 'struct mce'.

That isn't any worse that what we currently have ... but what we currently
have is ugly, and doing more ugly things just because we did them before
generally isn't a winning argument on the mailing lists.

-Tony

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:27     ` Luck, Tony
  2012-04-23 15:32       ` Konrad Rzeszutek Wilk
  2012-04-23 15:43       ` Liu, Jinsong
@ 2012-04-23 16:17       ` H. Peter Anvin
  2012-04-23 17:03         ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2012-04-23 16:17 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Konrad Rzeszutek Wilk, Liu, Jinsong, x86,
	linux-edac, xen-devel, linux-kernel, Ingo Molnar,
	Thomas Gleixner

On 04/23/2012 08:27 AM, Luck, Tony wrote:
>> Because, if you'd hooked into it, just imagine one fine day, when we
>> remove mcelog support, what screaming the xen people will be doing when
>> mcelog doesn't work anymore.
> 
> Agreed. Even before we get to deleting mcelog, "struct mce" can change (new
> fields could be added) ... and you don't want to have your hypervisor to
> have to know which version of Linux it is talking to.

This is a great example on the fundamental problem with Xen, or rather
the approach that Xen has taken of grabbing random kernel internals and
claiming them as APIs (or, in some cases even as ABIs.)  A lot of these
have had problems that are now very nearly unfixable, and that has
seriously stalled out the ability of evolve the Linux kernel in some
areas.  Note that the cost of this is borne by the development
community, not by the Xen maintainers.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:59           ` Borislav Petkov
@ 2012-04-23 16:23             ` Luck, Tony
  2012-04-23 16:54             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 25+ messages in thread
From: Luck, Tony @ 2012-04-23 16:23 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk
  Cc: Liu, Jinsong, x86, linux-edac, xen-devel, linux-kernel,
	H. Peter Anvin, Ingo Molnar, Thomas Gleixner

> Perfect. Software is more elastic than hardware - so the Xen ABI
> for the MCE can be changed then to reflect the new format if required.

To developers working upstream in open source projects ... yes, software
does look like it is easy to change.  But in the long term software is
very difficult to change. Enterprise versions for Linux now come with
up to ten years of support ... and they are expected to be updated with
support for new hardware for much of their support lifetime. So what goes
into upstream now will be a support burden for somebody until around 2022.

So please look at the suggestions that Boris has provided, and think about
if there is a cleaner, more maintainable way to pass error data around.

-Tony

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:43       ` Liu, Jinsong
@ 2012-04-23 16:26         ` H. Peter Anvin
  2012-04-23 17:05           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2012-04-23 16:26 UTC (permalink / raw)
  To: Liu, Jinsong
  Cc: Luck, Tony, Borislav Petkov, Konrad Rzeszutek Wilk, x86,
	linux-edac, xen-devel, linux-kernel, Ingo Molnar,
	Thomas Gleixner

On 04/23/2012 08:43 AM, Liu, Jinsong wrote:
> 
> and, hypervisor only record error into its mc_info cookie (in its
> own format), it doesn't care what's dom0's version and how dom0 translate it
> -- that's dom0's business. The role of hypervisor is virtual 'platform',
> just like h/w platform don't care what linux version runs over it.
> 

And this is the *OTHER* problem with Xen.  A platform is only a platform
if it is documented.  There are a number of PV hypervisors which *are*
documented in detail, but Xen is not one of them.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 16:17         ` Luck, Tony
@ 2012-04-23 16:51           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23 16:51 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Liu, Jinsong, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Mon, Apr 23, 2012 at 04:17:14PM +0000, Luck, Tony wrote:
> > This driver is _not_ doing a strict bit-by-bit copy of 'struct mcinfo'
> > to 'struct mce'. It is plucking the right bits and setting them in
> > 'struct mce'.
> 
> That isn't any worse that what we currently have ... but what we currently
> have is ugly, and doing more ugly things just because we did them before
> generally isn't a winning argument on the mailing lists.

Heh.

Until about an hour ago I did not know that what is there is considered
'ugly' and you guys had thoughts of how to make it nicer. That is good.
And the driver isn't set in stone by only using those function - it can
change.

I am not sure if I have stated, but providing this driver in my mind
is an implicit contract that I, as the sub-maintainer, will keep it
in lock-step with mce.h cleanups. That is after all, the job of a
(sub)-maintainer.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 15:59           ` Borislav Petkov
  2012-04-23 16:23             ` Luck, Tony
@ 2012-04-23 16:54             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23 16:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Liu, Jinsong, tony.luck, x86, linux-edac, xen-devel,
	linux-kernel, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

> > > Now xen is a whole another deal - it is purely a piece of software.
> > 
> > Perfect. Software is more elastic than hardware - so the Xen ABI
> > for the MCE can be changed then to reflect the new format if required.
> 
> And this "elastic" piece of software is yet another thing I have to deal
> with when working on MCE which is strictly doing _hardware_ support. And
> I don't want to do that - if you hook in the mce_log() interface, you're
> practically becoming yet another user of it which I have to account for
> when doing changes to the core MCE code.

Or you can delegate.
> 
> [..]
> 
> > Delegate testing to sub-maintainers. In this case that would be me and
> > Liu.
> 
> Ok, just purely hypothetically, what happens if there's no one to update
> this code anymore? What if Oracle or Citrix or whoever it is, is not
> interested in maintaining xen anymore, or those drivers or whatever?
> I'm stuck with users of this which I can't shake off because there's no
> upstream development. Oh, and then there's the distros which already
> have their installs and they'd never update their setup because it works
> and why touch it... and so on and so on...

In case I get hit by a bus and nobody steps up from the Xen community shortly
after that or Intel is not interested anymore in this driver, then you:
 1) mark it as CONFIG_BROKEN
 2) wait some time until somebody elects themselves as the maintainer.
 3) And if that does not become true, then you do 'git rm ..'. 

Distros are in the business of providing maintainship services. That is
after all how they make their money. The Kconfig by default is set to 'n'
and the wording can be written further to inhibit folks from using.

But I do follow Fedora Core and Ubuntu bugs that have the word 'xen' in
them to fix the issues they encounter. You are probably doing the same
thing (s/xen/mce)- so why don't you point the sub-maintainer to the
distro bug so he/she can fix it?


.. snip..
> > I am not sure I see why we cannot fix the practical problems as they pop
> > up?
> 
> See above.
> 
> [..]
> 
> > > hook into arch/x86/ instead of doing your own stuff?
> > 
> > I think what you are suggesting is to _not_ reuse existing APIs. That
> > seems counter-intuive to general software development. There are
> > exceptions of course - when the existing API needs to change a lot
> > (or needs to be thrown out), and there is this one little driver that
> > keeps on using the old interface and can't change - at that point I can
> > see the purpose of forking it. But until then - using existing APIs is
> > the way to go.
> 
> I just love it when guys left and right start teaching me about stuff,
> it's like I even begged for it...
> 
> You're not reusing the existing API because mce_log is not an API to
> anything - it is an internal x86 MCE logging thing to put a struct mce
> into a static ring buffer.
> 
> The API is /dev/mcelog and there you should interface, not hook into
> internal stuff and cripple it from developing any further.

I mentioned on a couple of occasions that we would be keeping the implementation
in lock-step and testing as neccessary. And if you choose to omit this
driver and asked us to develop the appropiate patch, we can surely
do that too.

> 
> [snip more senseless drivel]
> 
> > > Now, with your own buffer solution, nothing breaks and all is happy,
> > > a win-win, if you wish. I think this is much simpler and easier a
> > > solution.
> > 
> > Not sure what you mean by 'own buffer solution'. Are you talking about
> > using the trace_mce_record or the ras_printk instead of the mce_log?
> > I would think that is the job of the MCE decoders?
> 
> No, I mean to copy the mce_log() code along with the functions that
> create the char device /dev/mcelog - mce_chrdev_*.
> 
> Btw, you'd probably be done with it if you'd taken the time to do that
> instead of arguing your cause.

I am not advocating that we stick a ground in stone and say: "this is
how mce_log(x)" from now on MUST be used. If that is impression left by
my wording - my mistake. I would want the driver to change to adhere
to the new function calls/structures.

> 
> > Please keep in mind that this driver is not trying to decode anything -
> 
> I know that.
> 
> > it just lifting raw events, massaging them a bit, and sending them
> > downstream. Similar to how the APEI does it.
> 
> Please stop talking about APEI - I told you why it doesn't apply here in
> a previous mail.

Right, b/c it is hardware and is set in stone.
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Advanced Micro Devices GmbH
> Einsteinring 24, 85609 Dornach
> GM: Alberto Bozzo
> Reg: Dornach, Landkreis Muenchen
> HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 16:17       ` H. Peter Anvin
@ 2012-04-23 17:03         ` Konrad Rzeszutek Wilk
  2012-04-23 17:16           ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23 17:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luck, Tony, Liu, Jinsong, xen-devel, x86, linux-kernel,
	Borislav Petkov, Thomas Gleixner, Ingo Molnar, linux-edac

On Mon, Apr 23, 2012 at 09:17:28AM -0700, H. Peter Anvin wrote:
> On 04/23/2012 08:27 AM, Luck, Tony wrote:
> >> Because, if you'd hooked into it, just imagine one fine day, when we
> >> remove mcelog support, what screaming the xen people will be doing when
> >> mcelog doesn't work anymore.
> > 
> > Agreed. Even before we get to deleting mcelog, "struct mce" can change (new
> > fields could be added) ... and you don't want to have your hypervisor to
> > have to know which version of Linux it is talking to.
> 
> This is a great example on the fundamental problem with Xen, or rather
> the approach that Xen has taken of grabbing random kernel internals and
> claiming them as APIs (or, in some cases even as ABIs.)  A lot of these

I am _not_ claiming that. If I left you with that impression from my
responses - my fault for not getting my point across (the sleep deprevation
is probably not helping either).

I am _not_ stating that the usage of 'mce_log' or 'struct mce' MUST
remain the same from now on. No. I am saying that the driver will be
changed lock-step as Tony and Boris see fit in changing the functions.

And currently the way the existing MCE drivers do this - is by using
mce_log. This driver does is too - since the in-tree drivers do it this way.
When they change to use a different mechanism - this driver will as well.

> have had problems that are now very nearly unfixable, and that has
> seriously stalled out the ability of evolve the Linux kernel in some
> areas.  Note that the cost of this is borne by the development
> community, not by the Xen maintainers.

The ones I know that you are unhappy about are the MMU paravirt interfaces
and I did mention to you that once some prototype work is done and
it showed success, I will work on removing said support.

Why don't you send me your unhappy list so that is on my radar as well please?

> 
> 	-hpa
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 16:26         ` H. Peter Anvin
@ 2012-04-23 17:05           ` Konrad Rzeszutek Wilk
  2012-04-23 17:23             ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-23 17:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Liu, Jinsong, Luck, Tony, Borislav Petkov, x86, linux-edac,
	xen-devel, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Apr 23, 2012 at 09:26:42AM -0700, H. Peter Anvin wrote:
> On 04/23/2012 08:43 AM, Liu, Jinsong wrote:
> > 
> > and, hypervisor only record error into its mc_info cookie (in its
> > own format), it doesn't care what's dom0's version and how dom0 translate it
> > -- that's dom0's business. The role of hypervisor is virtual 'platform',
> > just like h/w platform don't care what linux version runs over it.
> > 
> 
> And this is the *OTHER* problem with Xen.  A platform is only a platform
> if it is documented.  There are a number of PV hypervisors which *are*
> documented in detail, but Xen is not one of them.

And slowly we are working on that. As I mentioned to you, we are in process
of documenting those calls so that there is a document similar to an SDM.
It takes time though.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 17:03         ` Konrad Rzeszutek Wilk
@ 2012-04-23 17:16           ` H. Peter Anvin
  0 siblings, 0 replies; 25+ messages in thread
From: H. Peter Anvin @ 2012-04-23 17:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Luck, Tony, Liu, Jinsong, xen-devel, x86, linux-kernel,
	Borislav Petkov, Thomas Gleixner, Ingo Molnar, linux-edac

My biggest beefs are page table manipulation, all the extra ugliness in kernel entry and exit, and lack of clear initialization semantics.  Additionally, any pvop which has unclear semantics - especially anything which is a nop on native and has zero documentation.

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Mon, Apr 23, 2012 at 09:17:28AM -0700, H. Peter Anvin wrote:
>> On 04/23/2012 08:27 AM, Luck, Tony wrote:
>> >> Because, if you'd hooked into it, just imagine one fine day, when
>we
>> >> remove mcelog support, what screaming the xen people will be doing
>when
>> >> mcelog doesn't work anymore.
>> > 
>> > Agreed. Even before we get to deleting mcelog, "struct mce" can
>change (new
>> > fields could be added) ... and you don't want to have your
>hypervisor to
>> > have to know which version of Linux it is talking to.
>> 
>> This is a great example on the fundamental problem with Xen, or
>rather
>> the approach that Xen has taken of grabbing random kernel internals
>and
>> claiming them as APIs (or, in some cases even as ABIs.)  A lot of
>these
>
>I am _not_ claiming that. If I left you with that impression from my
>responses - my fault for not getting my point across (the sleep
>deprevation
>is probably not helping either).
>
>I am _not_ stating that the usage of 'mce_log' or 'struct mce' MUST
>remain the same from now on. No. I am saying that the driver will be
>changed lock-step as Tony and Boris see fit in changing the functions.
>
>And currently the way the existing MCE drivers do this - is by using
>mce_log. This driver does is too - since the in-tree drivers do it this
>way.
>When they change to use a different mechanism - this driver will as
>well.
>
>> have had problems that are now very nearly unfixable, and that has
>> seriously stalled out the ability of evolve the Linux kernel in some
>> areas.  Note that the cost of this is borne by the development
>> community, not by the Xen maintainers.
>
>The ones I know that you are unhappy about are the MMU paravirt
>interfaces
>and I did mention to you that once some prototype work is done and
>it showed success, I will work on removing said support.
>
>Why don't you send me your unhappy list so that is on my radar as well
>please?
>
>> 
>> 	-hpa
>> 
>> -- 
>> H. Peter Anvin, Intel Open Source Technology Center
>> I work for Intel.  I don't speak on their behalf.
>> 
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-23 17:05           ` Konrad Rzeszutek Wilk
@ 2012-04-23 17:23             ` H. Peter Anvin
  0 siblings, 0 replies; 25+ messages in thread
From: H. Peter Anvin @ 2012-04-23 17:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Liu, Jinsong, Luck, Tony, Borislav Petkov, x86, linux-edac,
	xen-devel, linux-kernel, Ingo Molnar, Thomas Gleixner

I know.  It is very frustrating to me that this comes now rather than before Xen was merged, but changing the past is hard.  At least you are trying to fix it.

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Mon, Apr 23, 2012 at 09:26:42AM -0700, H. Peter Anvin wrote:
>> On 04/23/2012 08:43 AM, Liu, Jinsong wrote:
>> > 
>> > and, hypervisor only record error into its mc_info cookie (in its
>> > own format), it doesn't care what's dom0's version and how dom0
>translate it
>> > -- that's dom0's business. The role of hypervisor is virtual
>'platform',
>> > just like h/w platform don't care what linux version runs over it.
>> > 
>> 
>> And this is the *OTHER* problem with Xen.  A platform is only a
>platform
>> if it is documented.  There are a number of PV hypervisors which
>*are*
>> documented in detail, but Xen is not one of them.
>
>And slowly we are working on that. As I mentioned to you, we are in
>process
>of documenting those calls so that there is a document similar to an
>SDM.
>It takes time though.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-21 10:45   ` Borislav Petkov
  2012-04-23  2:18     ` Konrad Rzeszutek Wilk
  2012-04-23 15:27     ` Luck, Tony
@ 2012-04-24  8:27     ` Andi Kleen
  2012-04-25  8:11       ` Ingo Molnar
  2 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2012-04-24  8:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Konrad Rzeszutek Wilk, Liu, Jinsong, tony.luck, x86, linux-edac,
	xen-devel, linux-kernel, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner, torvalds

Borislav Petkov <bp@amd64.org> writes:

> Because, if you'd hooked into it, just imagine one fine day, when we
> remove mcelog support, what screaming the xen people will be doing when
> mcelog doesn't work anymore.

That's simple. /dev/mcelog is a widely used user space ABI and Linux
does never remove widely used user space ABIs, so this can never happen.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
  2012-04-24  8:27     ` Andi Kleen
@ 2012-04-25  8:11       ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2012-04-25  8:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Borislav Petkov, Konrad Rzeszutek Wilk, Liu, Jinsong, tony.luck,
	x86, linux-edac, xen-devel, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, torvalds


* Andi Kleen <andi@firstfloor.org> wrote:

> Borislav Petkov <bp@amd64.org> writes:
> 
> > Because, if you'd hooked into it, just imagine one fine day, when we
> > remove mcelog support, what screaming the xen people will be doing when
> > mcelog doesn't work anymore.
> 
> That's simple. /dev/mcelog is a widely used user space ABI 
> [...]

*COUGH* *SPLUTTER* *LAUGH*.

Thanks for making my day with the self-serving exaggeration of 
the year.

In truth /dev/mcelog is one of the crappiest ABIs Linux has - 
full stop. As a result it's barely used by anyone who can avoid 
it - its main usecase is the utility you wrote for it and some 
enterprise folks who desperately need that data and don't care 
how they get it. The mcelog user-space utility sucks too, 
understandably.

Nevertheless we probably have to keep the ABI around until a 
truly better replacement is out there (the RAS daemon for 
example) and mcelog usage eclipses to obscurity, but by all 
means we want to limit its further spreading /dev/mcelog to 
areas it has not polluted yet ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-04-25  8:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-19 13:29 [PATCH 1/3] Add mcelog support for xen platform Liu, Jinsong
2012-04-20 18:40 ` Konrad Rzeszutek Wilk
2012-04-20 20:00   ` Liu, Jinsong
2012-04-21  4:14 ` [Xen-devel] " Konrad Rzeszutek Wilk
2012-04-21 10:45   ` Borislav Petkov
2012-04-23  2:18     ` Konrad Rzeszutek Wilk
2012-04-23  6:09       ` Borislav Petkov
2012-04-23 15:06         ` Konrad Rzeszutek Wilk
2012-04-23 15:59           ` Borislav Petkov
2012-04-23 16:23             ` Luck, Tony
2012-04-23 16:54             ` Konrad Rzeszutek Wilk
2012-04-23 16:17         ` Liu, Jinsong
2012-04-23 15:27     ` Luck, Tony
2012-04-23 15:32       ` Konrad Rzeszutek Wilk
2012-04-23 16:17         ` Luck, Tony
2012-04-23 16:51           ` Konrad Rzeszutek Wilk
2012-04-23 15:43       ` Liu, Jinsong
2012-04-23 16:26         ` H. Peter Anvin
2012-04-23 17:05           ` Konrad Rzeszutek Wilk
2012-04-23 17:23             ` H. Peter Anvin
2012-04-23 16:17       ` H. Peter Anvin
2012-04-23 17:03         ` Konrad Rzeszutek Wilk
2012-04-23 17:16           ` H. Peter Anvin
2012-04-24  8:27     ` Andi Kleen
2012-04-25  8:11       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).