[PATCH 00/19] MCE code cleanup and add LMCE support

From: Haozhong Zhang @ 2017-02-17  6:39 UTC
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Jun Nakajima, Liu Jinsong,
	Christoph Egger, Ian Jackson, Jan Beulich, Andrew Cooper


This patch series adds LMCE support to Xen, although more than half
of the patches are code cleanups and bug fixes.

LMCE
--------------
Intel Local MCE (LMCE) is a feature of Intel Skylake server CPUs that
can deliver an MCE to a single processor thread instead of broadcasting
it to all threads, which can reduce the software load of MCE handling
on machines with a large number of processor threads.

The technical details of LMCE can be found in Intel SDM Vol. 3, chapter
"Machine-Check Architecture" (search for 'LMCE'). In short:
 * The LMCE capability is indicated by bit 27 (MCG_LMCE_P) of
   MSR_IA32_MCG_CAP.
 * LMCE is enabled by setting bit 20 (MSR_IA32_FEATURE_CONTROL_LMCE)
   of MSR_IA32_FEATURE_CONTROL and bit 0 (MCG_EXT_CTL_LMCE_EN) of
   MSR_IA32_MCG_EXT_CTL.
 * Software can determine whether an MCE is local to the current
   processor thread by checking bit 3 (MCG_STATUS_LMCE) of
   MSR_IA32_MCG_STATUS.

Patch Overview
--------------
In this patch series,
 * Xen enables LMCE by default if the host CPU supports it, unless the
   Xen boot parameter "mce_fb=1" is present.
 * Xen handles an LMCE only on the affected CPU and does not need all
   CPUs to enter the MCE handler.
 * A new xl config option "lmce=BOOLEAN" controls whether LMCE is
   supported for an HVM domain. It is disabled by default. If the host
   CPU does not support LMCE, this option is ignored.
 * For an HVM domain with LMCE support, if the vcpu affected by a host
   LMCE is known, Xen injects a vLMCE into that vcpu. If the affected
   vcpu is unknown, or LMCE support is disabled for the HVM domain, an
   MCE is broadcast to all vcpus of that domain as before.

This patch series is organized as below:
 * Patches 1 - 8 clean up the existing MCE code and make one improvement
   to debugging messages. No functional change is introduced.
 * Patches 9 - 11 fix bugs in vMCE injection, MCE handling and the
   xen-mceinj tool.
 * Patches 12 & 13 add host-side LMCE support, including detecting and
   enabling the LMCE feature and handling LMCE.
 * Patches 14 - 17 add guest-side LMCE support (HVM domains only so far),
   including emulating the LMCE feature and injecting LMCE into HVM
   domains.
 * Patches 18 & 19 add xen-mceinj support for injecting LMCE for testing
   purposes.

How to Test
--------------
0. This patch series can be tested either on an Intel CPU with LMCE
   support (Skylake-EX) or in a nested virtualization environment on
   KVM/QEMU (i.e., with Xen as the L1 hypervisor).

   QEMU 2.7.0 and later, with KVM in Linux kernel 4.8 and later, can
   emulate LMCE without requiring LMCE support in the host hardware.
   You can start a nested virtualization environment with LMCE support
   with the following command:
        qemu-system-x86_64 -enable-kvm \
                           -smp 32 -cpu kvm64,lmce=on,+vmx \
                           -hda PATH-TO-DISK-IMG -m 2048

1. Build, install and boot Xen with this patch series. You can include
   "mce_verbosity=verbose" in Xen boot parameters to get more detailed
   debugging messages about MCE.

2. At boot time, if the Xen boot parameter 'mce_fb=1' is not
   present, the Xen hypervisor should detect and enable LMCE and
   print the following message:

        (XEN) mce_intel.c:737: MCA Capability: BCAST 1 SER 1 CMCI 1 firstbank 0 extended MCE MSR LMCE 1

   If 'mce_fb=1' is specified, the last segment of the above message
   will be "LMCE 0", which indicates that Xen has not enabled LMCE.

3. Start an HVM domain with the attached config file xl.cfg. In the
   config,
    * "lmce = 1" enables LMCE for the domain.
    * "cpus = [ ... ]" pins each vcpu to a fixed host CPU. This helps
      the following steps figure out which CPU to inject the MCE into,
      but it may not be strictly necessary.

   Run Linux kernel 4.2 or later (which has LMCE support) in the
   domain.

   Run the latest mcelog (https://www.mcelog.org/) in the domain as
   well to log the MCEs injected in later steps. Depending on the guest
   Linux distro, the log can be in /var/log/mcelog, syslog or systemd
   journal.

   Compile and run the attached claim_page.c in the domain. claim_page.c
   allocates a page of memory, prints its base (guest) physical address
   and enters an infinite loop. For example, it may print a message like

        Physical address of mmaped page = 0x36d4d000

4. Use "xl vcpu-list" to figure out the host CPU on which claim_page
   is running. For example, xl vcpu-list may output

        Name     ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
        lmce-l2   1     0    4   r--     546.5  4 / all
        lmce-l2   1     1    5   -b-       8.4  5 / all
        lmce-l2   1     2    6   -b-       6.4  6 / all 
        lmce-l2   1     3    7   -b-       6.4  7 / all

   As claim_page is the only workload actively running in the domain,
   CPU 4 (VCPU 0) is very likely the one it is running on. (You may
   also want to pin claim_page to a single vcpu in the guest, e.g. with
   taskset.)

5. Use xen-mceinj to inject LMCE:
        ./xen-mceinj -c 4 -d 1 -p 0x36d4d000 -t 0 -l
                                                  ^^
                                                    inject LMCE

   If the injection succeeds, mcelog in the domain should generate a
   log entry like

        Hardware event. This is not a software error.
        MCE 0                                        
        CPU 0 BANK 1 TSC 2218fdf1380
        ^^^^^
             vcpu0 receives MCE
        RIP !INEXACT! 10:ffffffff810591e7            
        MISC 86 ADDR 36d4d000
                ^^^^^^^^^^^^^
                             error address
        TIME 1487302866 Fri Feb 17 11:41:06 2017     
        MCG status:RIPV MCIP LMCE
                             ^^^^
                                 LMCE is injected
        MCi status:                                  
        Uncorrected error                            
        Error enabled                                
        MCi_MISC register valid                      
        MCi_ADDR register valid                      
        SRAO                                         
        MCA: Generic CACHE Level-2 Eviction Error    
        STATUS bd2000000000017a MCGSTATUS d          
        MCGCAP 9000c02 APICID 1 SOCKETID 0           
        CPUID Vendor Intel Family 6 Model 79  

Haozhong Zhang (19):
  01/19 x86/mce: fix indentation style in xen-mca.h and mce.h
  02/19 x86/mce: remove declarations of non-existing functions in mce.h
  03/19 x86/mce: remove unnecessary braces around intel_get_extended_msrs()
  04/19 xen/mce: remove unused x86_mcinfo_add()
  05/19 x86/mce: merge loops to get Intel extended MC MSR
  06/19 x86/mce: merge intel_default_mce_dhandler/uhandler()
  07/19 x86/vmce: include domain/vcpu id in debug messages
  08/19 x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve()
  09/19 x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case
  10/19 x86/mce: always write 0 to MSR_IA32_MCG_STATUS on Intel CPU
  11/19 tools/xen-mceinj: fix the type of cpu number
  12/19 x86/mce: handle LMCE locally
  13/19 x86/mce_intel: detect and enable LMCE on Intel host
  14/19 x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL
  15/19 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  16/19 x86/vmce: enable injecting LMCE to guest on Intel host
  17/19 x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
  18/19 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  19/19 tools/xen-mceinj: support injecting LMCE

 docs/man/xl.cfg.pod.5.in                |  18 ++++
 tools/libxc/include/xenctrl.h           |   1 +
 tools/libxc/xc_misc.c                   |  25 ++++++
 tools/libxl/libxl_create.c              |   1 +
 tools/libxl/libxl_dom.c                 |   2 +
 tools/libxl/libxl_types.idl             |   1 +
 tools/libxl/xl_cmdimpl.c                |   3 +
 tools/tests/mce-test/tools/xen-mceinj.c |  70 +++++++++++++--
 xen/arch/x86/cpu/mcheck/barrier.c       |   4 +-
 xen/arch/x86/cpu/mcheck/mcaction.c      |  20 +++--
 xen/arch/x86/cpu/mcheck/mce.c           |  87 +++++++++++-------
 xen/arch/x86/cpu/mcheck/mce.h           |  51 ++++++-----
 xen/arch/x86/cpu/mcheck/mce_amd.c       |   4 +-
 xen/arch/x86/cpu/mcheck/mce_intel.c     |  86 ++++++++++--------
 xen/arch/x86/cpu/mcheck/vmce.c          | 153 ++++++++++++++++++++++++--------
 xen/arch/x86/cpu/mcheck/vmce.h          |   2 +-
 xen/arch/x86/cpu/mcheck/x86_mca.h       |   9 +-
 xen/arch/x86/hvm/hvm.c                  |   7 ++
 xen/arch/x86/hvm/vmx/vmx.c              |  10 +++
 xen/arch/x86/hvm/vmx/vvmx.c             |   4 -
 xen/include/asm-x86/mce.h               |   3 +
 xen/include/asm-x86/msr-index.h         |   2 +
 xen/include/public/arch-x86/hvm/save.h  |   2 +
 xen/include/public/arch-x86/xen-mca.h   |  25 +++---
 xen/include/public/hvm/params.h         |   5 +-
 25 files changed, 420 insertions(+), 175 deletions(-)


[-- Attachment #2: claim_page.c --]
[-- Type: text/x-csrc, Size: 1998 bytes --]

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

struct pagemaps {
    unsigned long long pfn:55;
    unsigned long long shift:6;
    unsigned long long rsvd:1;
    unsigned long long swapped:1;
    unsigned long long present:1;
};

static int translate_va2pa(uint64_t va, uint64_t pagesize, uint64_t *pa)
{
    int rc = 0;
    static const char *pagemap_file = "/proc/self/pagemap";
    struct pagemaps pinfo;
    uint64_t pinfo_size = sizeof(pinfo);
    uint64_t offset = va / pagesize * pinfo_size;
    int fd = open(pagemap_file, O_RDONLY);

    if (fd == -1) {
        rc = errno;
        fprintf(stderr, "Failed to open %s: %s\n", pagemap_file, strerror(rc));
        goto ret;
    }

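    ssize_t n = pread(fd, &pinfo, pinfo_size, offset);
    if (n != (ssize_t) pinfo_size) {
        /* A short read does not set errno, so report EIO instead. */
        rc = (n < 0) ? errno : EIO;
        fprintf(stderr, "Failed to read offset 0x%"PRIx64": %s\n",
                offset, strerror(rc));
        goto ret_close;
    }

    /* The PFN field is only meaningful if the page is present in RAM. */
    if (!pinfo.present) {
        rc = EFAULT;
        fprintf(stderr, "Page at 0x%"PRIx64" is not present\n", va);
        goto ret_close;
    }

    *pa = (pinfo.pfn * pagesize) | (va & (pagesize - 1));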

 ret_close:
    close(fd);
 ret:
    return rc;
}

int main(int argc, char **argv)
{
    void *buf;
    uint64_t buf_pa = 0;
    int pagesize = getpagesize();
    int rc = 0;

    buf = mmap(NULL, pagesize,
               PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS,
               -1, 0);
    if (buf == MAP_FAILED) {
        rc = errno;
        fprintf(stderr, "Failed to mmap a page: %s\n", strerror(rc));
        goto ret;
    }
    memset(buf, 0xcc, pagesize);

    rc = translate_va2pa((uint64_t) buf, pagesize, &buf_pa);
    if (rc || !buf_pa) {
        fprintf(stderr, "Failed to get physical address of mmaped page\n");
        goto ret_unmap;
    }

    fprintf(stderr, "Physical address of mmaped page = 0x%"PRIx64"\n", buf_pa);

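    /* Busy-loop forever so the page stays allocated while the MCE is
     * injected. The volatile counter keeps the loop from being optimized
     * away, and unsigned wrap-around is well defined. */
    volatile unsigned long spin = 0;
    for (;;)
        spin++;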

 ret_unmap:
    munmap(buf, pagesize);
 ret:
    return rc;
}

[-- Attachment #3: xl.cfg --]
[-- Type: text/plain, Size: 322 bytes --]

builder = "hvm"
name    = "lmce-l2"

vcpus   = 4
memory  = 1024
disk    = [ '/dev/vdb,raw,xvda,rw' ]
cpus    = [ "4", "5", "6", "7" ]

lmce    = 1

device_model_override = '/usr/local/lib/xen/bin/qemu-system-i386'
device_model_version  = 'qemu-xen'

sdl     = 0
vnc     = 1
vnclisten='0.0.0.0'
stdvga  = 1
serial  = 'pty'
