* [PATCH 0/4] add support for vNVDIMM
@ 2015-12-29 11:31 Haozhong Zhang
  2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-29 11:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

This patch series is the Xen part of the work to provide virtual NVDIMM
devices to guests. The corresponding QEMU patch series is sent separately
under the title "[PATCH 0/2] add vNVDIMM support for Xen".

* Background

 NVDIMM (Non-Volatile Dual In-line Memory Module) devices are going to
 be supported on Intel platforms. NVDIMM devices are discovered via ACPI
 and configured through the _DSM methods of the NVDIMM devices in ACPI.
 Relevant documents:
 [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
 [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
 [3] DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
 [4] Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
	       
 The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support for
 providing virtual NVDIMM in PMEM mode, in which NVDIMM devices are
 mapped into the CPU's address space and are accessed via normal memory
 reads/writes plus three special instructions (clflushopt/clwb/pcommit).

 This patch series and the corresponding QEMU patch series enable Xen
 to provide vNVDIMM devices to HVM domains.

* Design

 Supporting vNVDIMM in PMEM mode has three requirements.

 (1) Support the special instructions that operate on cache lines
     (clflushopt & clwb) and persistent memory (pcommit).

     clflushopt and clwb take a linear address as their operand, and we
     allow them to be executed directly (i.e. without emulation) in HVM
     domains. This is done by Xen patch 1.

     pcommit is also allowed to be executed directly by the L1 guest,
     and we let the L1 hypervisor handle pcommit executed in an L2
     guest. This is done by Xen patch 2.
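
     For reference, the flush/commit sequence a guest pmem driver would
     issue looks roughly like the sketch below. This is illustrative
     only: it assumes a 64-byte cache line, GCC-style inline asm, and
     encodes pcommit as raw opcode bytes in case the assembler does not
     know the mnemonic.

         #include <stddef.h>
         #include <stdint.h>

         #define CACHE_LINE 64

         static inline void clwb(void *p)
         {
             asm volatile("clwb %0" : "+m" (*(volatile char *)p));
         }

         static inline void pcommit(void)
         {
             /* PCOMMIT == 66 0F AE F8 */
             asm volatile(".byte 0x66, 0x0f, 0xae, 0xf8" ::: "memory");
         }

         /* Write back the buffer's cache lines and commit them to the
          * NVDIMM's power-fail safe domain. */
         static void persist(void *buf, size_t len)
         {
             uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
             uintptr_t end = (uintptr_t)buf + len;

             for ( ; p < end; p += CACHE_LINE )
                 clwb((void *)p);
             asm volatile("sfence" ::: "memory"); /* order the flushes   */
             pcommit();
             asm volatile("sfence" ::: "memory"); /* wait for the commit */
         }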

 (2) When an NVDIMM works in PMEM mode, it must be mapped into the
     CPU's address space.

     When an HVM domain is created, if it does not use any guest
     address space above 4 GB, the vNVDIMM is mapped into the guest
     address space starting at 4 GB. Otherwise, if the highest guest
     address used above 4 GB is X, the vNVDIMM is mapped into the guest
     address space above X. This is done by QEMU patch 1.
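
     The placement rule boils down to something like the following
     (illustrative sketch only; the function and variable names are
     made up and are not taken from the QEMU patches):

         #include <stdint.h>

         /* max_used_gpa: exclusive end of the highest guest address
          * already in use above 4 GB, or 0 if none. */
         static uint64_t vnvdimm_base(uint64_t max_used_gpa)
         {
             const uint64_t four_gb = 4ULL << 30;

             return (max_used_gpa <= four_gb) ? four_gb : max_used_gpa;
         }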

 (3) NVDIMM devices are discovered and configured through ACPI. A major
     and complicated part of the vNVDIMM implementation in upstream
     QEMU is building those ACPI tables. To avoid reimplementing
     similar code in hvmloader, we decided to reuse the ACPI tables
     built by QEMU.

     We patch QEMU to build the NFIT and other vNVDIMM ACPI tables when
     it is used as Xen's device model, and to copy them to the end of
     guest memory below 4 GB. The guest address and size of those ACPI
     tables are saved to xenstore so that hvmloader can find them. This
     is done by QEMU patch 2.

     We also patch hvmloader to load those extra ACPI tables. We reuse
     and extend the existing hvmloader mechanism for loading
     passthrough ACPI tables for this purpose. This is done by Xen
     patch 4.
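
     The xenstore handoff in item (3) amounts to the device model
     writing the address/length keys that hvmloader later reads. A
     rough sketch of the QEMU side, assuming the dm-acpi keys live
     under the guest's domain path like the existing acpi/address and
     acpi/length keys (error handling omitted):

         #include <inttypes.h>
         #include <stdint.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <xenstore.h>

         static void advertise_dm_acpi(unsigned int domid,
                                       uint32_t gpa, uint32_t len)
         {
             struct xs_handle *xsh = xs_open(0);
             char *dompath = xs_get_domain_path(xsh, domid);
             char path[256], val[32];

             snprintf(path, sizeof(path),
                      "%s/hvmloader/dm-acpi/address", dompath);
             snprintf(val, sizeof(val), "0x%"PRIx32, gpa);
             xs_write(xsh, XBT_NULL, path, val, strlen(val));

             snprintf(path, sizeof(path),
                      "%s/hvmloader/dm-acpi/length", dompath);
             snprintf(val, sizeof(val), "0x%"PRIx32, len);
             xs_write(xsh, XBT_NULL, path, val, strlen(val));

             free(dompath);
             xs_close(xsh);
         }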

 In addition, Xen patch 3 adds an xl configuration option 'nvdimm' and
 passes the parsed parameters to QEMU to create the vNVDIMM devices.

* Test
 (1) A patched upstream QEMU is used for the test. The QEMU patch
     series is sent separately with the title "[PATCH 0/2] add vNVDIMM
     support for Xen". (vNVDIMM support is not present in qemu-xen as
     of commit f165e58, so we use upstream QEMU instead.)

 (2) Prepare a memory backend file:
            dd if=/dev/zero of=/tmp/nvm0 bs=1G count=10

 (3) Add the following line to an HVM domain's xl configuration file:
            nvdimm = [ 'file=/tmp/nvm0,size=10240' ]

 (4) Launch an HVM domain from the above xl.cfg.

 (5) If the guest Linux kernel is 4.2 or newer and the kernel modules
     libnvdimm, nfit, nd_btt and nd_pmem are loaded, the whole NVDIMM
     device is exposed as a single namespace and /dev/pmem0 appears in
     the guest.



Haozhong Zhang (4):
  x86/hvm: allow guest to use clflushopt and clwb
  x86/hvm: add support for pcommit instruction
  tools/xl: add a new xl configuration 'nvdimm'
  hvmloader: add support to load extra ACPI tables from qemu

 docs/man/xl.cfg.pod.5                   | 19 ++++++++++++++
 tools/firmware/hvmloader/acpi/build.c   | 34 ++++++++++++++++++++-----
 tools/libxc/xc_cpufeature.h             |  4 ++-
 tools/libxc/xc_cpuid_x86.c              |  5 +++-
 tools/libxl/libxl_dm.c                  | 15 +++++++++--
 tools/libxl/libxl_types.idl             |  9 +++++++
 tools/libxl/xl_cmdimpl.c                | 45 +++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c                  | 10 ++++++++
 xen/arch/x86/hvm/vmx/vmcs.c             |  6 ++++-
 xen/arch/x86/hvm/vmx/vmx.c              |  1 +
 xen/arch/x86/hvm/vmx/vvmx.c             |  3 +++
 xen/include/asm-x86/cpufeature.h        |  7 +++++
 xen/include/asm-x86/hvm/vmx/vmcs.h      |  3 +++
 xen/include/asm-x86/hvm/vmx/vmx.h       |  1 +
 xen/include/public/hvm/hvm_xs_strings.h |  3 +++
 15 files changed, 154 insertions(+), 11 deletions(-)

-- 
2.4.8


* [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb
  2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
@ 2015-12-29 11:31 ` Haozhong Zhang
  2015-12-29 15:46   ` Andrew Cooper
  2015-12-29 11:31 ` [PATCH 2/4] x86/hvm: add support for pcommit instruction Haozhong Zhang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-29 11:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

Pass the CPU features CLFLUSHOPT and CLWB into HVM domains so that these
two instructions can be used by guests.

The specification of the above two instructions can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 tools/libxc/xc_cpufeature.h      | 3 ++-
 tools/libxc/xc_cpuid_x86.c       | 4 +++-
 xen/arch/x86/hvm/hvm.c           | 7 +++++++
 xen/include/asm-x86/cpufeature.h | 5 +++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..5288ac6 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -140,6 +140,7 @@
 #define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
 #define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
-
+#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB        24 /* CLWB instruction */
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 8882c01..fecfd6c 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -426,7 +426,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
                         bitmaskof(X86_FEATURE_RDSEED)  |
                         bitmaskof(X86_FEATURE_ADX)  |
                         bitmaskof(X86_FEATURE_SMAP) |
-                        bitmaskof(X86_FEATURE_FSGSBASE));
+                        bitmaskof(X86_FEATURE_FSGSBASE) |
+                        bitmaskof(X86_FEATURE_CLWB) |
+                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
         } else
             regs[1] = 0;
         regs[0] = regs[2] = regs[3] = 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21470ec..58c83a5 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4598,6 +4598,13 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         /* Don't expose INVPCID to non-hap hvm. */
         if ( (count == 0) && !hap_enabled(d) )
             *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+
+        if ( (count == 0) && !cpu_has_clflushopt )
+            *ebx &= ~cpufeat_mask(X86_FEATURE_CLFLUSHOPT);
+
+        if ( (count == 0) && !cpu_has_clwb )
+            *ebx &= ~cpufeat_mask(X86_FEATURE_CLWB);
+
         break;
     case 0xb:
         /* Fix the x2APIC identifier. */
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index ef96514..5818228 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -162,6 +162,8 @@
 #define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
 #define X86_FEATURE_ADX		(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
+#define X86_FEATURE_CLFLUSHOPT	(7*32+23) /* CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB	(7*32+24) /* CLWB instruction */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 8 */
 #define X86_FEATURE_PKU	(8*32+ 3) /* Protection Keys for Userspace */
@@ -234,6 +236,9 @@
 #define cpu_has_xgetbv1		boot_cpu_has(X86_FEATURE_XGETBV1)
 #define cpu_has_xsaves		boot_cpu_has(X86_FEATURE_XSAVES)
 
+#define cpu_has_clflushopt  boot_cpu_has(X86_FEATURE_CLFLUSHOPT)
+#define cpu_has_clwb        boot_cpu_has(X86_FEATURE_CLWB)
+
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
     CACHE_TYPE_DATA = 1,
-- 
2.4.8


* [PATCH 2/4] x86/hvm: add support for pcommit instruction
  2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
  2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
@ 2015-12-29 11:31 ` Haozhong Zhang
  2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-29 11:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

Pass the PCOMMIT CPU feature into HVM domains. Currently, we do not
intercept the pcommit instruction for L1 guests, and allow L1 to
intercept pcommit instructions executed in L2 guests.

The specification of the pcommit instruction can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 tools/libxc/xc_cpufeature.h        | 1 +
 tools/libxc/xc_cpuid_x86.c         | 1 +
 xen/arch/x86/hvm/hvm.c             | 3 +++
 xen/arch/x86/hvm/vmx/vmcs.c        | 6 +++++-
 xen/arch/x86/hvm/vmx/vmx.c         | 1 +
 xen/arch/x86/hvm/vmx/vvmx.c        | 3 +++
 xen/include/asm-x86/cpufeature.h   | 2 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h | 3 +++
 xen/include/asm-x86/hvm/vmx/vmx.h  | 1 +
 9 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index 5288ac6..ee53679 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -140,6 +140,7 @@
 #define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
 #define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
+#define X86_FEATURE_PCOMMIT     22 /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB        24 /* CLWB instruction */
 
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index fecfd6c..c142595 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -427,6 +427,7 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
                         bitmaskof(X86_FEATURE_ADX)  |
                         bitmaskof(X86_FEATURE_SMAP) |
                         bitmaskof(X86_FEATURE_FSGSBASE) |
+                        bitmaskof(X86_FEATURE_PCOMMIT) |
                         bitmaskof(X86_FEATURE_CLWB) |
                         bitmaskof(X86_FEATURE_CLFLUSHOPT));
         } else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 58c83a5..d12f619 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4605,6 +4605,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         if ( (count == 0) && !cpu_has_clwb )
             *ebx &= ~cpufeat_mask(X86_FEATURE_CLWB);
 
+        if ( (count == 0) && !cpu_has_pcommit )
+            *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
+
         break;
     case 0xb:
         /* Fix the x2APIC identifier. */
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index edd4c8d..9092a98 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -242,7 +242,8 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
                SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
-               SECONDARY_EXEC_XSAVES);
+               SECONDARY_EXEC_XSAVES |
+               SECONDARY_EXEC_PCOMMIT);
         rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -1075,6 +1076,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(PLE_WINDOW, ple_window);
     }
 
+    if ( cpu_has_vmx_pcommit )
+        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_PCOMMIT;
+
     if ( cpu_has_vmx_secondary_exec_control )
         __vmwrite(SECONDARY_VM_EXEC_CONTROL,
                   v->arch.hvm_vmx.secondary_exec_control);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b918b8a..0991cdf 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3517,6 +3517,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     case EXIT_REASON_ACCESS_LDTR_OR_TR:
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
     case EXIT_REASON_INVPCID:
+    case EXIT_REASON_PCOMMIT:
     /* fall through */
     default:
     exit_and_crash:
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ea1052e..271ec70 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1950,6 +1950,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
                SECONDARY_EXEC_ENABLE_VPID |
                SECONDARY_EXEC_UNRESTRICTED_GUEST |
                SECONDARY_EXEC_ENABLE_EPT;
+        if ( cpu_has_vmx_pcommit )
+            data |= SECONDARY_EXEC_PCOMMIT;
         data = gen_vmx_msr(data, 0, host_data);
         break;
     case MSR_IA32_VMX_EXIT_CTLS:
@@ -2226,6 +2228,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
     case EXIT_REASON_VMXON:
     case EXIT_REASON_INVEPT:
     case EXIT_REASON_XSETBV:
+    case EXIT_REASON_PCOMMIT:
         /* inject to L1 */
         nvcpu->nv_vmexit_pending = 1;
         break;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 5818228..7491e37 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -162,6 +162,7 @@
 #define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
 #define X86_FEATURE_ADX		(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
+#define X86_FEATURE_PCOMMIT	(7*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT	(7*32+23) /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB	(7*32+24) /* CLWB instruction */
 
@@ -238,6 +239,7 @@
 
 #define cpu_has_clflushopt  boot_cpu_has(X86_FEATURE_CLFLUSHOPT)
 #define cpu_has_clwb        boot_cpu_has(X86_FEATURE_CLWB)
+#define cpu_has_pcommit     boot_cpu_has(X86_FEATURE_PCOMMIT)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index d1496b8..77cf8da 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -236,6 +236,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
 #define SECONDARY_EXEC_XSAVES                   0x00100000
+#define SECONDARY_EXEC_PCOMMIT                  0x00200000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
@@ -303,6 +304,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
 #define cpu_has_vmx_xsaves \
     (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
+#define cpu_has_vmx_pcommit \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_PCOMMIT)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 1719965..14f3d32 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -213,6 +213,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
+#define EXIT_REASON_PCOMMIT             65
 
 /*
  * Interruption-information format
-- 
2.4.8


* [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm'
  2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
  2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
  2015-12-29 11:31 ` [PATCH 2/4] x86/hvm: add support for pcommit instruction Haozhong Zhang
@ 2015-12-29 11:31 ` Haozhong Zhang
  2016-01-04 11:16   ` Wei Liu
  2016-01-06 12:40   ` Jan Beulich
  2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-29 11:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

This configuration option is used to specify vNVDIMM devices which are
provided to the guest. xl parses this configuration and passes the
result to QEMU, which is responsible for creating the vNVDIMM devices.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 docs/man/xl.cfg.pod.5       | 19 +++++++++++++++++++
 tools/libxl/libxl_dm.c      | 15 +++++++++++++--
 tools/libxl/libxl_types.idl |  9 +++++++++
 tools/libxl/xl_cmdimpl.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8899f75..a10d28e 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -962,6 +962,25 @@ FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<nvdimm=[ "NVDIMM_SPEC_STRING", "NVDIMM_SPEC_STRING", ... ]>
+
+Specifies the NVDIMM devices which are provided to the guest.
+
+Each B<NVDIMM_SPEC_STRING> is a comma-separated list of C<KEY=VALUE>
+settings, from the following list:
+
+=over 4
+
+=item C<file=PATH_TO_NVDIMM_DEVICE_FILE>
+
+Specifies the path to the file of the NVDIMM device, e.g. file=/dev/pmem0.
+
+=item C<size=MBYTES>
+
+Specifies the size in Mbytes of the NVDIMM device.
+
+=back
+
 =back
 
 =head2 Paravirtualised (PV) Guest Specific Options
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0aaefd9..6fb4bbb 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -763,6 +763,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     const libxl_device_nic *nics = guest_config->nics;
     const int num_disks = guest_config->num_disks;
     const int num_nics = guest_config->num_nics;
+    const int num_nvdimms = guest_config->num_nvdimms;
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     const libxl_sdl_info *sdl = dm_sdl(guest_config);
     const char *keymap = dm_keymap(guest_config);
@@ -1124,7 +1125,6 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                                             machinearg, max_ram_below_4g);
             }
         }
-
         if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
             enum libxl_gfx_passthru_kind gfx_passthru_kind =
                             libxl__detect_gfx_passthru_kind(gc, guest_config);
@@ -1140,7 +1140,8 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                 return ERROR_INVAL;
             }
         }
-
+        if (num_nvdimms)
+            machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg);
         flexarray_append(dm_args, machinearg);
         for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
             flexarray_append(dm_args, b_info->extra_hvm[i]);
@@ -1154,6 +1155,16 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size));
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        for (i = 0; i < num_nvdimms; i++) {
+            flexarray_append(dm_args, "-device");
+            flexarray_append(dm_args,
+                             libxl__sprintf(gc, "pc-nvdimm,file=%s,size=%"PRIu64,
+                                            guest_config->nvdimms[i].file,
+                                            guest_config->nvdimms[i].size_mb));
+        }
+    }
+
+    if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (b_info->u.hvm.hdtype == LIBXL_HDTYPE_AHCI)
             flexarray_append_pair(dm_args, "-device", "ahci,id=ahci0");
         for (i = 0; i < num_disks; i++) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9658356..0a955a1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -617,6 +617,14 @@ libxl_device_vtpm = Struct("device_vtpm", [
     ("uuid",             libxl_uuid),
 ])
 
+libxl_device_nvdimm = Struct("device_nvdimm", [
+    ("backend_domid",    libxl_domid),
+    ("backend_domname",  string),
+    ("devid",            libxl_devid),
+    ("file",             string),
+    ("size_mb",          uint64),
+])
+
 libxl_device_channel = Struct("device_channel", [
     ("backend_domid", libxl_domid),
     ("backend_domname", string),
@@ -641,6 +649,7 @@ libxl_domain_config = Struct("domain_config", [
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
     ("vtpms", Array(libxl_device_vtpm, "num_vtpms")),
+    ("nvdimms", Array(libxl_device_nvdimm, "num_nvdimms")),
     # a channel manifests as a console with a name,
     # see docs/misc/channels.txt
     ("channels", Array(libxl_device_channel, "num_channels")),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index f9933cb..2db7d45 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1255,6 +1255,49 @@ static void parse_vnuma_config(const XLU_Config *config,
     free(vcpu_parsed);
 }
 
+/*
+ * NVDIMM config is in the format:
+ *   nvdimm = [ 'file=path-to-pmem-dev,size=size-of-file-in-MByte',
+ *              'file=path-to-pmem-dev,size=size-of-file-in-MByte',
+ *              ... ]
+ */
+static void parse_nvdimm_config(XLU_Config *config,
+                                libxl_domain_config *d_config)
+{
+    XLU_ConfigList *nvdimms;
+    const char *buf;
+
+    if (!xlu_cfg_get_list (config, "nvdimm", &nvdimms, 0, 0)) {
+        while ((buf = xlu_cfg_get_listitem(nvdimms,
+                                           d_config->num_nvdimms)) != NULL) {
+            libxl_device_nvdimm *nvdimm =
+                ARRAY_EXTEND_INIT(d_config->nvdimms, d_config->num_nvdimms,
+                                  libxl_device_nvdimm_init);
+            char *nvdimm_cfg_str = strdup(buf);
+            char *p, *p2;
+
+            p = strtok(nvdimm_cfg_str, ",");
+            if (!p)
+                goto next_nvdimm;
+            do {
+                while (*p == ' ')
+                    p++;
+                if ((p2 = strchr(p, '=')) == NULL)
+                    break;
+                *p2 = '\0';
+                if (!strcmp(p, "file")) {
+                    nvdimm->file = strdup(p2 + 1);
+                } else if (!strcmp(p, "size")) {
+                    nvdimm->size_mb = parse_ulong(p2 + 1);
+                }
+            } while ((p = strtok(NULL, ",")) != NULL);
+
+        next_nvdimm:
+            free(nvdimm_cfg_str);
+        }
+    }
+}
+
 static void parse_config_data(const char *config_source,
                               const char *config_data,
                               int config_len,
@@ -2392,6 +2435,8 @@ skip_vfb:
         }
      }
 
+    parse_nvdimm_config(config, d_config);
+
     xlu_cfg_destroy(config);
 }
 
-- 
2.4.8


* [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
                   ` (2 preceding siblings ...)
  2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
@ 2015-12-29 11:31 ` Haozhong Zhang
  2016-01-15 17:10   ` Jan Beulich
  2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
  2016-01-20  3:28 ` Tian, Kevin
  5 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-29 11:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

NVDIMM devices are detected and configured by software through
ACPI. Currently, QEMU maintains the ACPI tables of vNVDIMM devices. This
patch extends the existing hvmloader mechanism for loading passthrough
ACPI tables so that it also loads the extra ACPI tables built by QEMU.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 tools/firmware/hvmloader/acpi/build.c   | 34 +++++++++++++++++++++++++++------
 xen/include/public/hvm/hvm_xs_strings.h |  3 +++
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
index 503648c..72be3e0 100644
--- a/tools/firmware/hvmloader/acpi/build.c
+++ b/tools/firmware/hvmloader/acpi/build.c
@@ -292,8 +292,10 @@ static struct acpi_20_slit *construct_slit(void)
     return slit;
 }
 
-static int construct_passthrough_tables(unsigned long *table_ptrs,
-                                        int nr_tables)
+static int construct_passthrough_tables_common(unsigned long *table_ptrs,
+                                               int nr_tables,
+                                               const char *xs_acpi_pt_addr,
+                                               const char *xs_acpi_pt_length)
 {
     const char *s;
     uint8_t *acpi_pt_addr;
@@ -304,26 +306,28 @@ static int construct_passthrough_tables(unsigned long *table_ptrs,
     uint32_t total = 0;
     uint8_t *buffer;
 
-    s = xenstore_read(HVM_XS_ACPI_PT_ADDRESS, NULL);
+    s = xenstore_read(xs_acpi_pt_addr, NULL);
     if ( s == NULL )
-        return 0;    
+        return 0;
 
     acpi_pt_addr = (uint8_t*)(uint32_t)strtoll(s, NULL, 0);
     if ( acpi_pt_addr == NULL )
         return 0;
 
-    s = xenstore_read(HVM_XS_ACPI_PT_LENGTH, NULL);
+    s = xenstore_read(xs_acpi_pt_length, NULL);
     if ( s == NULL )
         return 0;
 
     acpi_pt_length = (uint32_t)strtoll(s, NULL, 0);
 
     for ( nr_added = 0; nr_added < nr_max; nr_added++ )
-    {        
+    {
         if ( (acpi_pt_length - total) < sizeof(struct acpi_header) )
             break;
 
         header = (struct acpi_header*)acpi_pt_addr;
+        set_checksum(header, offsetof(struct acpi_header, checksum),
+                     header->length);
 
         buffer = mem_alloc(header->length, 16);
         if ( buffer == NULL )
@@ -338,6 +342,21 @@ static int construct_passthrough_tables(unsigned long *table_ptrs,
     return nr_added;
 }
 
+static int construct_passthrough_tables(unsigned long *table_ptrs,
+                                        int nr_tables)
+{
+    return construct_passthrough_tables_common(table_ptrs, nr_tables,
+                                               HVM_XS_ACPI_PT_ADDRESS,
+                                               HVM_XS_ACPI_PT_LENGTH);
+}
+
+static int construct_dm_tables(unsigned long *table_ptrs, int nr_tables)
+{
+    return construct_passthrough_tables_common(table_ptrs, nr_tables,
+                                               HVM_XS_DM_ACPI_PT_ADDRESS,
+                                               HVM_XS_DM_ACPI_PT_LENGTH);
+}
+
 static int construct_secondary_tables(unsigned long *table_ptrs,
                                       struct acpi_info *info)
 {
@@ -454,6 +473,9 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
     /* Load any additional tables passed through. */
     nr_tables += construct_passthrough_tables(table_ptrs, nr_tables);
 
+    /* Load any additional tables from device model */
+    nr_tables += construct_dm_tables(table_ptrs, nr_tables);
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index 146b0b0..4698495 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -41,6 +41,9 @@
 #define HVM_XS_ACPI_PT_ADDRESS         "hvmloader/acpi/address"
 #define HVM_XS_ACPI_PT_LENGTH          "hvmloader/acpi/length"
 
+#define HVM_XS_DM_ACPI_PT_ADDRESS      "hvmloader/dm-acpi/address"
+#define HVM_XS_DM_ACPI_PT_LENGTH       "hvmloader/dm-acpi/length"
+
 /* Any number of SMBIOS types can be passed through to an HVM guest using
  * the following xenstore values. The values specify the guest physical
  * address and length of a block of SMBIOS structures for hvmloader to use.
-- 
2.4.8


* Re: [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb
  2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
@ 2015-12-29 15:46   ` Andrew Cooper
  2015-12-30  1:35     ` Haozhong Zhang
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2015-12-29 15:46 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, Jan Beulich, Keir Fraser

On 29/12/2015 11:31, Haozhong Zhang wrote:
> Pass CPU features CLFLUSHOPT and CLWB into HVM domain so that those two
> instructions can be used by guest.
>
> The specification of above two instructions can be found in
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Please be aware that my cpuid rework series completely changes all of
this code.  As this patch is small and self contained, it would be best
to get it accepted early and for me to rebase over the result.

As part of my cpuid work, I had come to the conclusion that CLFLUSHOPT,
CLWB and PCOMMIT were all safe for all guests to use, as they are deemed
safe for cpl3 code to use.  Is there any reason why these wouldn't be
safe for PV guests to use?

> ---
>   tools/libxc/xc_cpufeature.h      | 3 ++-
>   tools/libxc/xc_cpuid_x86.c       | 4 +++-
>   xen/arch/x86/hvm/hvm.c           | 7 +++++++
>   xen/include/asm-x86/cpufeature.h | 5 +++++
>   4 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
> index c3ddc80..5288ac6 100644
> --- a/tools/libxc/xc_cpufeature.h
> +++ b/tools/libxc/xc_cpufeature.h
> @@ -140,6 +140,7 @@
>   #define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
>   #define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
>   #define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
> -
> +#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
> +#define X86_FEATURE_CLWB        24 /* CLWB instruction */
>   
>   #endif /* __LIBXC_CPUFEATURE_H */
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index 8882c01..fecfd6c 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -426,7 +426,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
>                           bitmaskof(X86_FEATURE_RDSEED)  |
>                           bitmaskof(X86_FEATURE_ADX)  |
>                           bitmaskof(X86_FEATURE_SMAP) |
> -                        bitmaskof(X86_FEATURE_FSGSBASE));
> +                        bitmaskof(X86_FEATURE_FSGSBASE) |
> +                        bitmaskof(X86_FEATURE_CLWB) |
> +                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
>           } else
>               regs[1] = 0;
>           regs[0] = regs[2] = regs[3] = 0;

The entry for CLFLUSHOPT in the ISA Extension manual (August 2015) talks 
about CPUID.7(ECX=1).EBX[8:15] indicating the cache line size affected 
by the instruction.  However, I can't find any other reference to this 
information, nor an extension of the CPUID instruction in the ISA 
manual.  Should the Xen cpuid handling code be updated not to clobber this?

> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 21470ec..58c83a5 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4598,6 +4598,13 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>           /* Don't expose INVPCID to non-hap hvm. */
>           if ( (count == 0) && !hap_enabled(d) )
>               *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
> +
> +        if ( (count == 0) && !cpu_has_clflushopt )
> +            *ebx &= ~cpufeat_mask(X86_FEATURE_CLFLUSHOPT);
> +
> +        if ( (count == 0) && !cpu_has_clwb )
> +            *ebx &= ~cpufeat_mask(X86_FEATURE_CLWB);

Please refactor this code along with the if() in the context above, so
that count is only checked once.
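
For example, roughly (illustrative sketch only, following the existing
hvm_cpuid() style and the macros shown in the diff):

    if ( count == 0 )
    {
        /* Don't expose INVPCID to non-hap hvm. */
        if ( !hap_enabled(d) )
            *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);

        if ( !cpu_has_clflushopt )
            *ebx &= ~cpufeat_mask(X86_FEATURE_CLFLUSHOPT);

        if ( !cpu_has_clwb )
            *ebx &= ~cpufeat_mask(X86_FEATURE_CLWB);
    }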

~Andrew


* Re: [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb
  2015-12-29 15:46   ` Andrew Cooper
@ 2015-12-30  1:35     ` Haozhong Zhang
  2015-12-30  2:16       ` Haozhong Zhang
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-30  1:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On 12/29/15 15:46, Andrew Cooper wrote:
> On 29/12/2015 11:31, Haozhong Zhang wrote:
> >Pass CPU features CLFLUSHOPT and CLWB into HVM domain so that those two
> >instructions can be used by guest.
> >
> >The specification of above two instructions can be found in
> >https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> Please be aware that my cpuid rework series completely changes all of this
> code.   As this patch is small and self contained, it would be best to get
> it accepted early and for me to rebase over the result.
>

I'll split this patch series into two parts and put these two
instruction enabling patches in the first part.

> As part of my cpuid work, I had come to the conclusion that CLFLUSHOPT, CLWB
> and PCOMMIT were all safe for all guests to use, as they deemed safe for
> cpl3 code to use.  Is there any reason why these wouldn't be safe for PV
> guests to use?
>

Not because of a safety concern. These three instructions are usually
used with NVDIMM, which is only implemented for HVM domains in this
patch series, so I didn't enable them for PV. I think they can be
enabled for PV later by another patch.

> >---
> >  tools/libxc/xc_cpufeature.h      | 3 ++-
> >  tools/libxc/xc_cpuid_x86.c       | 4 +++-
> >  xen/arch/x86/hvm/hvm.c           | 7 +++++++
> >  xen/include/asm-x86/cpufeature.h | 5 +++++
> >  4 files changed, 17 insertions(+), 2 deletions(-)
> >
> >diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
> >index c3ddc80..5288ac6 100644
> >--- a/tools/libxc/xc_cpufeature.h
> >+++ b/tools/libxc/xc_cpufeature.h
> >@@ -140,6 +140,7 @@
> >  #define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
> >  #define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
> >  #define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
> >-
> >+#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
> >+#define X86_FEATURE_CLWB        24 /* CLWB instruction */
> >  #endif /* __LIBXC_CPUFEATURE_H */
> >diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> >index 8882c01..fecfd6c 100644
> >--- a/tools/libxc/xc_cpuid_x86.c
> >+++ b/tools/libxc/xc_cpuid_x86.c
> >@@ -426,7 +426,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
> >                          bitmaskof(X86_FEATURE_RDSEED)  |
> >                          bitmaskof(X86_FEATURE_ADX)  |
> >                          bitmaskof(X86_FEATURE_SMAP) |
> >-                        bitmaskof(X86_FEATURE_FSGSBASE));
> >+                        bitmaskof(X86_FEATURE_FSGSBASE) |
> >+                        bitmaskof(X86_FEATURE_CLWB) |
> >+                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
> >          } else
> >              regs[1] = 0;
> >          regs[0] = regs[2] = regs[3] = 0;
> 
> The entry for CLFLUSHOPT in the ISA Extension manual (August 2015) talks
> about CPUID.7(ECX=1).EBX[8:15] indicating the cache line size affected by
> the instruction.  However, I can't find any other reference to this
> information, nor an extension of the CPUID instruction in the ISA manual.
> Should the Xen cpuid handling code be updated not to clobber this?
>

Yes, I missed this part and will update in the next version.

> >diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >index 21470ec..58c83a5 100644
> >--- a/xen/arch/x86/hvm/hvm.c
> >+++ b/xen/arch/x86/hvm/hvm.c
> >@@ -4598,6 +4598,13 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >          /* Don't expose INVPCID to non-hap hvm. */
> >          if ( (count == 0) && !hap_enabled(d) )
> >              *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
> >+
> >+        if ( (count == 0) && !cpu_has_clflushopt )
> >+            *ebx &= ~cpufeat_mask(X86_FEATURE_CLFLUSHOPT);
> >+
> >+        if ( (count == 0) && !cpu_has_clwb )
> >+            *ebx &= ~cpufeat_mask(X86_FEATURE_CLWB);
> 
> Please refactor this code along with if() in context above, to only check
> count once.
>

Yes, I'll update in the next version.

Thanks,
Haozhong


* Re: [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb
  2015-12-30  1:35     ` Haozhong Zhang
@ 2015-12-30  2:16       ` Haozhong Zhang
  2015-12-30 10:33         ` Andrew Cooper
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2015-12-30  2:16 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, Keir Fraser, Jan Beulich, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Jun Nakajima,
	Kevin Tian

On 12/30/15 09:35, Haozhong Zhang wrote:
> On 12/29/15 15:46, Andrew Cooper wrote:
> > On 29/12/2015 11:31, Haozhong Zhang wrote:
> > >Pass CPU features CLFLUSHOPT and CLWB into HVM domain so that those two
> > >instructions can be used by guest.
> > >
> > >The specification of above two instructions can be found in
> > >https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> > >
> > >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > 
> > Please be aware that my cpuid rework series completely changes all of this
> > code.   As this patch is small and self contained, it would be best to get
> > it accepted early and for me to rebase over the result.
> >
> 
> I'll split this patch series into two parts and put these two
> instruction enabling patches in the first part.
> 
> > As part of my cpuid work, I had come to the conclusion that CLFLUSHOPT, CLWB
> > and PCOMMIT were all safe for all guests to use, as they deemed safe for
> > cpl3 code to use.  Is there any reason why these wouldn't be safe for PV
> > guests to use?
> >
> 
> Not for safety concern. These three instructions are usually used with
> NVDIMM which are only implemented for HVM domains in this patch
> series, so I didn't enable them for PV. I think they can be enabled
> for PV later by another patch.
> 
> > >---
> > >  tools/libxc/xc_cpufeature.h      | 3 ++-
> > >  tools/libxc/xc_cpuid_x86.c       | 4 +++-
> > >  xen/arch/x86/hvm/hvm.c           | 7 +++++++
> > >  xen/include/asm-x86/cpufeature.h | 5 +++++
> > >  4 files changed, 17 insertions(+), 2 deletions(-)
> > >
> > >diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
> > >index c3ddc80..5288ac6 100644
> > >--- a/tools/libxc/xc_cpufeature.h
> > >+++ b/tools/libxc/xc_cpufeature.h
> > >@@ -140,6 +140,7 @@
> > >  #define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
> > >  #define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
> > >  #define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
> > >-
> > >+#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
> > >+#define X86_FEATURE_CLWB        24 /* CLWB instruction */
> > >  #endif /* __LIBXC_CPUFEATURE_H */
> > >diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> > >index 8882c01..fecfd6c 100644
> > >--- a/tools/libxc/xc_cpuid_x86.c
> > >+++ b/tools/libxc/xc_cpuid_x86.c
> > >@@ -426,7 +426,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
> > >                          bitmaskof(X86_FEATURE_RDSEED)  |
> > >                          bitmaskof(X86_FEATURE_ADX)  |
> > >                          bitmaskof(X86_FEATURE_SMAP) |
> > >-                        bitmaskof(X86_FEATURE_FSGSBASE));
> > >+                        bitmaskof(X86_FEATURE_FSGSBASE) |
> > >+                        bitmaskof(X86_FEATURE_CLWB) |
> > >+                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
> > >          } else
> > >              regs[1] = 0;
> > >          regs[0] = regs[2] = regs[3] = 0;
> > 
> > The entry for CLFLUSHOPT in the ISA Extension manual (August 2015) talks
> > about CPUID.7(ECX=1).EBX[8:15] indicating the cache line size affected by
> > the instruction. However, I can't find any other reference to this
> > information, nor an extension of the CPUID instruction in the ISA manual.
> > Should the Xen cpuid handling code be updated not to clobber this?
> >
> 
> Yes, I missed this part and will update in the next version.
>

I double-checked the manual and it says that

 "The aligned cache line size affected is also indicated with the
  CPUID instruction (bits 8 through 15 of the EBX register when the
  initial value in the EAX register is 1)"

so I guess you really meant CPUID.1.EBX[8:15]. The 0x00000001 case
branch in xc_cpuid_hvm_policy() (and its callers) has already passed
the host CPUID.1.EBX[8:15] to HVM domains, so no more action is needed
in this patch.
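
For completeness, that field can be read as below (sketch; CPUID.1:EBX
bits 15:8 give the CLFLUSH line size in 8-byte units):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( __get_cpuid(1, &eax, &ebx, &ecx, &edx) )
            printf("CLFLUSH line size: %u bytes\n",
                   ((ebx >> 8) & 0xff) * 8);
        return 0;
    }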

Haozhong


* Re: [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb
  2015-12-30  2:16       ` Haozhong Zhang
@ 2015-12-30 10:33         ` Andrew Cooper
  0 siblings, 0 replies; 88+ messages in thread
From: Andrew Cooper @ 2015-12-30 10:33 UTC (permalink / raw)
  To: xen-devel, Keir Fraser, Jan Beulich, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Jun Nakajima,
	Kevin Tian

On 30/12/2015 02:16, Haozhong Zhang wrote:
> On 12/30/15 09:35, Haozhong Zhang wrote:
>> On 12/29/15 15:46, Andrew Cooper wrote:
>>> On 29/12/2015 11:31, Haozhong Zhang wrote:
>>>> Pass CPU features CLFLUSHOPT and CLWB into HVM domain so that those two
>>>> instructions can be used by guest.
>>>>
>>>> The specification of above two instructions can be found in
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>> Please be aware that my cpuid rework series completely changes all of this
>>> code.   As this patch is small and self contained, it would be best to get
>>> it accepted early and for me to rebase over the result.
>>>
>> I'll split this patch series into two parts and put these two
>> instruction enabling patches in the first part.
>>
>>> As part of my cpuid work, I had come to the conclusion that CLFLUSHOPT, CLWB
>>> and PCOMMIT were all safe for all guests to use, as they deemed safe for
>>> cpl3 code to use.  Is there any reason why these wouldn't be safe for PV
>>> guests to use?
>>>
>> Not for safety concern. These three instructions are usually used with
>> NVDIMM which are only implemented for HVM domains in this patch
>> series, so I didn't enable them for PV. I think they can be enabled
>> for PV later by another patch.
>>
>>>> ---
>>>>   tools/libxc/xc_cpufeature.h      | 3 ++-
>>>>   tools/libxc/xc_cpuid_x86.c       | 4 +++-
>>>>   xen/arch/x86/hvm/hvm.c           | 7 +++++++
>>>>   xen/include/asm-x86/cpufeature.h | 5 +++++
>>>>   4 files changed, 17 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
>>>> index c3ddc80..5288ac6 100644
>>>> --- a/tools/libxc/xc_cpufeature.h
>>>> +++ b/tools/libxc/xc_cpufeature.h
>>>> @@ -140,6 +140,7 @@
>>>>   #define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
>>>>   #define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
>>>>   #define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
>>>> -
>>>> +#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
>>>> +#define X86_FEATURE_CLWB        24 /* CLWB instruction */
>>>>   #endif /* __LIBXC_CPUFEATURE_H */
>>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>>>> index 8882c01..fecfd6c 100644
>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>> @@ -426,7 +426,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
>>>>                           bitmaskof(X86_FEATURE_RDSEED)  |
>>>>                           bitmaskof(X86_FEATURE_ADX)  |
>>>>                           bitmaskof(X86_FEATURE_SMAP) |
>>>> -                        bitmaskof(X86_FEATURE_FSGSBASE));
>>>> +                        bitmaskof(X86_FEATURE_FSGSBASE) |
>>>> +                        bitmaskof(X86_FEATURE_CLWB) |
>>>> +                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
>>>>           } else
>>>>               regs[1] = 0;
>>>>           regs[0] = regs[2] = regs[3] = 0;
>>> The entry for CLFLUSHOPT in the ISA Extension manual (August 2015) talks
>>> about CPUID.7(ECX=1).EBX[8:15] indicating the cache line size affected by
>>> the instruction. However, I can't find any other reference to this
>>> information, nor an extension of the CPUID instruction in the ISA manual.
>>> Should the Xen cpuid handling code be updated not to clobber this?
>>>
>> Yes, I missed this part and will update in the next version.
>>
> I double-checked the manual and it says that
>
>   "The aligned cache line size affected is also indicated with the
>    CPUID instruction (bits 8 through 15 of the EBX register when the
>    initial value in the EAX register is 1)"
>
> so I guess you really meant CPUID.1.EBX[8:15]. The 0x00000001 case
> branch in xc_cpuid_hvm_policy() (and its callers) has already passed
> the host CPUID.1.EBX[8:15] to HVM domains, so no more action is needed
> in this patch.

Oops sorry.  Yes - I misread the paragraph in the manual.

Apologies for the noise.

~Andrew


* Re: [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm'
  2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
@ 2016-01-04 11:16   ` Wei Liu
  2016-01-06 12:40   ` Jan Beulich
  1 sibling, 0 replies; 88+ messages in thread
From: Wei Liu @ 2016-01-04 11:16 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Wei Liu

On Tue, Dec 29, 2015 at 07:31:50PM +0800, Haozhong Zhang wrote:
> This configure is used to specify vNVDIMM devices which are provided to
> the guest. xl parses this configuration and passes the result to qemu
> that is responsible to create vNVDIMM devices.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

For the record, in your latest series you said you would be sending the
toolstack changes in a separate patch set, so I'll skip these two
patches for now.

Wei.


* Re: [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm'
  2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
  2016-01-04 11:16   ` Wei Liu
@ 2016-01-06 12:40   ` Jan Beulich
  2016-01-06 15:28     ` Haozhong Zhang
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-06 12:40 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

>>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -962,6 +962,25 @@ FIFO-based event channel ABI support up to 131,071 event channels.
>  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>  x86).
>  
> +=item B<nvdimm=[ "NVDIMM_SPEC_STRING", "NVDIMM_SPEC_STRING", ... ]>
> +
> +Specifies the NVDIMM devices which are provided to the guest.
> +
> +Each B<NVDIMM_SPEC_STRING> is a comma-separated list of C<KEY=VALUE>
> +settings, from the following list:
> +
> +=over 4
> +
> +=item C<file=PATH_TO_NVDIMM_DEVICE_FILE>
> +
> +Specifies the path to the file of the NVDIMM device, e.g. file=/dev/pmem0.
> +
> +=item C<size=MBYTES>
> +
> +Specifies the size in Mbytes of the NVDIMM device.

This looks odd: Either the entire file is meant to be passed (in
which case the size should be derivable) or you need an
(offset,size) pair here.

Jan


* Re: [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm'
  2016-01-06 12:40   ` Jan Beulich
@ 2016-01-06 15:28     ` Haozhong Zhang
  0 siblings, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-06 15:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

On 01/06/16 05:40, Jan Beulich wrote:
> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> > --- a/docs/man/xl.cfg.pod.5
> > +++ b/docs/man/xl.cfg.pod.5
> > @@ -962,6 +962,25 @@ FIFO-based event channel ABI support up to 131,071 event channels.
> >  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
> >  x86).
> >  
> > +=item B<nvdimm=[ "NVDIMM_SPEC_STRING", "NVDIMM_SPEC_STRING", ... ]>
> > +
> > +Specifies the NVDIMM devices which are provided to the guest.
> > +
> > +Each B<NVDIMM_SPEC_STRING> is a comma-separated list of C<KEY=VALUE>
> > +settings, from the following list:
> > +
> > +=over 4
> > +
> > +=item C<file=PATH_TO_NVDIMM_DEVICE_FILE>
> > +
> > +Specifies the path to the file of the NVDIMM device, e.g. file=/dev/pmem0.
> > +
> > +=item C<size=MBYTES>
> > +
> > +Specifies the size in Mbytes of the NVDIMM device.
> 
> This looks odd: Either the entire file is meant to be passed (in
> which case the size should be derivable) or you need an
> (offset,size) pair here.
>

The intent is to pass the entire file. I'll remove the 'size' option and
derive the size either in the toolstack or on the QEMU side.
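
Deriving it could look roughly like this (sketch only; regular files
report their size via st_size, while block devices such as /dev/pmem0
need the BLKGETSIZE64 ioctl; error handling omitted):

    #include <fcntl.h>
    #include <linux/fs.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static uint64_t pmem_backend_size(const char *path)
    {
        struct stat st;
        uint64_t size = 0;
        int fd = open(path, O_RDONLY);

        fstat(fd, &st);
        if ( S_ISBLK(st.st_mode) )
            ioctl(fd, BLKGETSIZE64, &size);
        else
            size = st.st_size;
        close(fd);
        return size;
    }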

Haozhong


* Re: [PATCH 0/4] add support for vNVDIMM
  2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
                   ` (3 preceding siblings ...)
  2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
@ 2016-01-06 15:37 ` Ian Campbell
  2016-01-06 15:47   ` Haozhong Zhang
  2016-01-20  3:28 ` Tian, Kevin
  5 siblings, 1 reply; 88+ messages in thread
From: Ian Campbell @ 2016-01-06 15:37 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, Jan Beulich, Keir Fraser

On Tue, 2015-12-29 at 19:31 +0800, Haozhong Zhang wrote:
> This patch series is the Xen part patch to provide virtual NVDIMM to
> guest. The corresponding QEMU patch series is sent separately with the
> title "[PATCH 0/2] add vNVDIMM support for Xen".

When you send multiple related series like this please could you tag them
in the 0/N subject line somehow as to the tree they are for. Either tagging
with "[PATCH XEN 0/4]" (via git send-email --subject-prefix="PATCH XEN") or
using something like "xen: add support for ..." (and the equivalent for
other trees).

In this case I incorrectly categorised this based on the subject as a
repost of a QEMU series I had seen just before and hence ignored it. I
spotted a bit of diffstat in a reply and have now put it into my queue to
look at, but that was pure luck.

Ian.


* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
@ 2016-01-06 15:47   ` Haozhong Zhang
  0 siblings, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-06 15:47 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kevin Tian, Keir Fraser, Jun Nakajima, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On 01/06/16 15:37, Ian Campbell wrote:
> On Tue, 2015-12-29 at 19:31 +0800, Haozhong Zhang wrote:
> > This patch series is the Xen part patch to provide virtual NVDIMM to
> > guest. The corresponding QEMU patch series is sent separately with the
> > title "[PATCH 0/2] add vNVDIMM support for Xen".
> 
> When you send multiple related series like this please could you tag them
> in the 0/N subject line somehow as to the tree they are for. Either tagging
> with "[PATCH XEN 0/4]" (via git send-email --subject-prefix="PATCH XEN") or
> using something like "xen: add support for ..." (and the equivalent for
> other trees).
> 
> In this case I incorrectly categorised this based on the subject as a
> repost of a QEMU series I had seen just before and hence ignored it. I
> spotted a bit of diffstat in a reply and have now put it into my queue to
> look at, but that was pure luck.
> 
> Ian.

Sorry for the trouble. I'll add tags in new versions.

Haozhong


* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
@ 2016-01-15 17:10   ` Jan Beulich
  2016-01-18  0:52     ` Haozhong Zhang
  0 siblings, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-15 17:10 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

>>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> NVDIMM devices are detected and configured by software through
> ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> patch extends the existing mechanism in hvmloader of loading passthrough
> ACPI tables to load extra ACPI tables built by QEMU.

Mechanically the patch looks okay, but whether it's actually needed
depends on whether indeed we want NV RAM managed in qemu instead of in
the hypervisor (where imo it belongs); I didn't see any reply yet to
that same comment of mine made (iirc) in the context of another patch.

Jan


* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-15 17:10   ` Jan Beulich
@ 2016-01-18  0:52     ` Haozhong Zhang
  2016-01-18  8:46       ` Jan Beulich
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-18  0:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

On 01/15/16 10:10, Jan Beulich wrote:
> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> > NVDIMM devices are detected and configured by software through
> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> > patch extends the existing mechanism in hvmloader of loading passthrough
> > ACPI tables to load extra ACPI tables built by QEMU.
> 
> Mechanically the patch looks okay, but whether it's actually needed
> depends on whether indeed we want NV RAM managed in qemu
> instead of in the hypervisor (where imo it belongs); I didn' see any
> reply yet to that same comment of mine made (iirc) in the context
> of another patch.
> 
> Jan
> 

One purpose of this patch series is to provide vNVDIMM backed by host
NVDIMM devices. Detecting and managing host NVDIMM devices (including
parsing ACPI, managing labels, etc.) requires non-trivial drivers, so I
leave this work to the dom0 Linux kernel. The current Linux kernel
abstracts NVDIMM devices as block devices (/dev/pmemXX). QEMU then
mmaps them into a certain range of dom0's address space and asks the
Xen hypervisor to map that range of address space to a domU.
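
For illustration only, a minimal sketch (plain C, assuming a /dev/pmem0
device node and an arbitrary 1 GB region) of the dom0-side step, i.e.
how a process such as QEMU obtains a virtual mapping of the pmem block
device before asking Xen to expose it to a guest:

    /* Sketch only: the device path and size are assumptions, and error
     * handling is minimal. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dev = "/dev/pmem0";   /* assumed pmem device node */
        size_t len = 1UL << 30;           /* assume a 1 GB region */

        int fd = open(dev, O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* MAP_SHARED so that stores reach the persistent medium once
         * flushed (clflushopt/clwb + pcommit). */
        void *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                        fd, 0);
        if (va == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return 1;
        }

        /* At this point the range [va, va + len) would be handed to Xen
         * so it can be mapped into the guest physical address space. */
        printf("mapped %s at %p\n", dev, va);

        munmap(va, len);
        close(fd);
        return 0;
    }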

However, there are two problems in this Xen patch series and the
corresponding QEMU patch series, which may require further
changes in the hypervisor and/or the toolstack.

(1) The QEMU patches use xc_hvm_map_io_range_to_ioreq_server() to map
    the host NVDIMM to a domU, which results in a VM exit for every
    guest read/write to the corresponding vNVDIMM device. I'm going to
    find a way to pass the address space range of the host NVDIMM
    through to a guest domU (similar to what xen-pt in QEMU does).

(2) Xen currently does not check whether the address that QEMU asks to
    map to a domU is really within the host NVDIMM address space.
    Therefore, the Xen hypervisor needs a way to determine the host
    NVDIMM address space, which could be done by parsing the ACPI NFIT
    tables.

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-18  0:52     ` Haozhong Zhang
@ 2016-01-18  8:46       ` Jan Beulich
  2016-01-19 11:37         ` Wei Liu
  2016-01-20  5:31         ` Haozhong Zhang
  0 siblings, 2 replies; 88+ messages in thread
From: Jan Beulich @ 2016-01-18  8:46 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

>>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
> On 01/15/16 10:10, Jan Beulich wrote:
>> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
>> > NVDIMM devices are detected and configured by software through
>> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
>> > patch extends the existing mechanism in hvmloader of loading passthrough
>> > ACPI tables to load extra ACPI tables built by QEMU.
>> 
>> Mechanically the patch looks okay, but whether it's actually needed
>> depends on whether indeed we want NV RAM managed in qemu
>> instead of in the hypervisor (where imo it belongs); I didn' see any
>> reply yet to that same comment of mine made (iirc) in the context
>> of another patch.
> 
> One purpose of this patch series is to provide vNVDIMM backed by host
> NVDIMM devices. It requires some drivers to detect and manage host
> NVDIMM devices (including parsing ACPI, managing labels, etc.) that
> are not trivial, so I leave this work to the dom0 linux. Current Linux
> kernel abstract NVDIMM devices as block devices (/dev/pmemXX). QEMU
> then mmaps them into certain range of dom0's address space and asks
> Xen hypervisor to map that range of address space to a domU.
> 
> However, there are two problems in this Xen patch series and the
> corresponding QEMU patch series, which may require further
> changes in hypervisor and/or toolstack.
> 
> (1) The QEMU patches use xc_hvm_map_io_range_to_ioreq_server() to map
>     the host NVDIMM to domU, which results VMEXIT for every guest
>     read/write to the corresponding vNVDIMM devices. I'm going to find
>     a way to passthrough the address space range of host NVDIMM to a
>     guest domU (similarly to what xen-pt in QEMU uses)
>     
> (2) Xen currently does not check whether the address that QEMU asks to
>     map to domU is really within the host NVDIMM address
>     space. Therefore, Xen hypervisor needs a way to decide the host
>     NVDIMM address space which can be done by parsing ACPI NFIT
>     tables.

These problems are a pretty direct result of the management of
NVDIMM not being done by the hypervisor.

Stating what qemu currently does is, I'm afraid, not really serving
the purpose of hashing out whether the management of NVDIMM,
just like that of "normal" RAM, wouldn't better be done by the
hypervisor. In fact so far I haven't seen any rationale (other than
the desire to share code with KVM) for the presently chosen
solution. Yet in KVM qemu is - afaict - much more of an integral part
of the hypervisor than it is in the Xen case (and even there core
management of the memory is left to the kernel, i.e. what
constitutes the core hypervisor there).

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-18  8:46       ` Jan Beulich
@ 2016-01-19 11:37         ` Wei Liu
  2016-01-19 11:46           ` Jan Beulich
  2016-01-20  5:31         ` Haozhong Zhang
  1 sibling, 1 reply; 88+ messages in thread
From: Wei Liu @ 2016-01-19 11:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser

On Mon, Jan 18, 2016 at 01:46:29AM -0700, Jan Beulich wrote:
> >>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
> > On 01/15/16 10:10, Jan Beulich wrote:
> >> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> >> > NVDIMM devices are detected and configured by software through
> >> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> >> > patch extends the existing mechanism in hvmloader of loading passthrough
> >> > ACPI tables to load extra ACPI tables built by QEMU.
> >> 
> >> Mechanically the patch looks okay, but whether it's actually needed
> >> depends on whether indeed we want NV RAM managed in qemu
> >> instead of in the hypervisor (where imo it belongs); I didn' see any
> >> reply yet to that same comment of mine made (iirc) in the context
> >> of another patch.
> > 
> > One purpose of this patch series is to provide vNVDIMM backed by host
> > NVDIMM devices. It requires some drivers to detect and manage host
> > NVDIMM devices (including parsing ACPI, managing labels, etc.) that
> > are not trivial, so I leave this work to the dom0 linux. Current Linux
> > kernel abstract NVDIMM devices as block devices (/dev/pmemXX). QEMU
> > then mmaps them into certain range of dom0's address space and asks
> > Xen hypervisor to map that range of address space to a domU.
> > 

OOI, do we have a viable solution for doing all these non-trivial
things in the core hypervisor? Are you proposing to design a new set of
hypercalls for NVDIMM?

Wei.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-19 11:37         ` Wei Liu
@ 2016-01-19 11:46           ` Jan Beulich
  2016-01-20  5:14             ` Tian, Kevin
  0 siblings, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-19 11:46 UTC (permalink / raw)
  To: Wei Liu
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima

>>> On 19.01.16 at 12:37, <wei.liu2@citrix.com> wrote:
> On Mon, Jan 18, 2016 at 01:46:29AM -0700, Jan Beulich wrote:
>> >>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
>> > On 01/15/16 10:10, Jan Beulich wrote:
>> >> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
>> >> > NVDIMM devices are detected and configured by software through
>> >> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
>> >> > patch extends the existing mechanism in hvmloader of loading passthrough
>> >> > ACPI tables to load extra ACPI tables built by QEMU.
>> >> 
>> >> Mechanically the patch looks okay, but whether it's actually needed
>> >> depends on whether indeed we want NV RAM managed in qemu
>> >> instead of in the hypervisor (where imo it belongs); I didn' see any
>> >> reply yet to that same comment of mine made (iirc) in the context
>> >> of another patch.
>> > 
>> > One purpose of this patch series is to provide vNVDIMM backed by host
>> > NVDIMM devices. It requires some drivers to detect and manage host
>> > NVDIMM devices (including parsing ACPI, managing labels, etc.) that
>> > are not trivial, so I leave this work to the dom0 linux. Current Linux
>> > kernel abstract NVDIMM devices as block devices (/dev/pmemXX). QEMU
>> > then mmaps them into certain range of dom0's address space and asks
>> > Xen hypervisor to map that range of address space to a domU.
>> > 
> 
> OOI Do we have a viable solution to do all these non-trivial things in
> core hypervisor?  Are you proposing designing a new set of hypercalls
> for NVDIMM?  

That's certainly a possibility; I lack sufficient detail to form an
opinion on which route is going to be best.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
                   ` (4 preceding siblings ...)
  2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
@ 2016-01-20  3:28 ` Tian, Kevin
  2016-01-20 12:43   ` Stefano Stabellini
  5 siblings, 1 reply; 88+ messages in thread
From: Tian, Kevin @ 2016-01-20  3:28 UTC (permalink / raw)
  To: Zhang, Haozhong, xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Nakajima, Jun,
	Andrew Cooper, Ian Jackson, Jan Beulich, Wei Liu

> From: Zhang, Haozhong
> Sent: Tuesday, December 29, 2015 7:32 PM
> 
> This patch series is the Xen part patch to provide virtual NVDIMM to
> guest. The corresponding QEMU patch series is sent separately with the
> title "[PATCH 0/2] add vNVDIMM support for Xen".
> 
> * Background
> 
>  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
>  supported on Intel's platform. NVDIMM devices are discovered via ACPI
>  and configured by _DSM method of NVDIMM device in ACPI. Some
>  documents can be found at
>  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
>  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
>  [3] DSM Interface Example:
> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
>  [4] Driver Writer's Guide:
> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> 
>  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
>  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
>  mapped into CPU's address space and are accessed via normal memory
>  read/write and three special instructions (clflushopt/clwb/pcommit).
> 
>  This patch series and the corresponding QEMU patch series enable Xen
>  to provide vNVDIMM devices to HVM domains.
> 
> * Design
> 
>  Supporting vNVDIMM in PMEM mode has three requirements.
> 

Although this design is about vNVDIMM, some background on how pNVDIMM
is managed in Xen would be helpful for understanding the whole design,
since in PMEM mode you need to map pNVDIMM into the GFN address space,
so there is the question of how pNVDIMM is allocated.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-19 11:46           ` Jan Beulich
@ 2016-01-20  5:14             ` Tian, Kevin
  2016-01-20  5:58               ` Zhang, Haozhong
  0 siblings, 1 reply; 88+ messages in thread
From: Tian, Kevin @ 2016-01-20  5:14 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu
  Cc: Zhang, Haozhong, Keir Fraser, Ian Campbell, StefanoStabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Nakajima, Jun

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, January 19, 2016 7:47 PM
> 
> >>> On 19.01.16 at 12:37, <wei.liu2@citrix.com> wrote:
> > On Mon, Jan 18, 2016 at 01:46:29AM -0700, Jan Beulich wrote:
> >> >>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
> >> > On 01/15/16 10:10, Jan Beulich wrote:
> >> >> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> >> >> > NVDIMM devices are detected and configured by software through
> >> >> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> >> >> > patch extends the existing mechanism in hvmloader of loading passthrough
> >> >> > ACPI tables to load extra ACPI tables built by QEMU.
> >> >>
> >> >> Mechanically the patch looks okay, but whether it's actually needed
> >> >> depends on whether indeed we want NV RAM managed in qemu
> >> >> instead of in the hypervisor (where imo it belongs); I didn' see any
> >> >> reply yet to that same comment of mine made (iirc) in the context
> >> >> of another patch.
> >> >
> >> > One purpose of this patch series is to provide vNVDIMM backed by host
> >> > NVDIMM devices. It requires some drivers to detect and manage host
> >> > NVDIMM devices (including parsing ACPI, managing labels, etc.) that
> >> > are not trivial, so I leave this work to the dom0 linux. Current Linux
> >> > kernel abstract NVDIMM devices as block devices (/dev/pmemXX). QEMU
> >> > then mmaps them into certain range of dom0's address space and asks
> >> > Xen hypervisor to map that range of address space to a domU.
> >> >
> >
> > OOI Do we have a viable solution to do all these non-trivial things in
> > core hypervisor?  Are you proposing designing a new set of hypercalls
> > for NVDIMM?
> 
> That's certainly a possibility; I lack sufficient detail to make myself
> an opinion which route is going to be best.
> 
> Jan

Hi, Haozhong,

Are the NVDIMM-related ACPI tables in plain (static) format, or do they
require an ACPI parser to decode? Is there a corresponding E820 entry?

Above information would be useful to help decide the direction.

At a glance I like Jan's idea that it's better to let Xen manage NVDIMM,
since it's a type of memory resource and we expect the hypervisor to
centrally manage memory.

On the other hand, the answer is different if we view this resource as
an MMIO resource, similar to PCI BAR MMIO, ACPI NVS, etc.; then it
should be fine to have Dom0 manage NVDIMM while Xen just controls the
mapping based on the existing I/O permission mechanism.

Another possible point for this model is that PMEM is only one mode of
an NVDIMM device, which can also be exposed as a storage device. In the
latter case the management has to be in Dom0, so we don't need to
scatter the management role across Dom0/Xen based on different modes.

Back to your earlier questions:

> (1) The QEMU patches use xc_hvm_map_io_range_to_ioreq_server() to map
>     the host NVDIMM to domU, which results VMEXIT for every guest
>     read/write to the corresponding vNVDIMM devices. I'm going to find
>     a way to passthrough the address space range of host NVDIMM to a
>     guest domU (similarly to what xen-pt in QEMU uses)
> 
> (2) Xen currently does not check whether the address that QEMU asks to
>     map to domU is really within the host NVDIMM address
>     space. Therefore, Xen hypervisor needs a way to decide the host
>     NVDIMM address space which can be done by parsing ACPI NFIT
>     tables.

If you look at how ACPI OpRegion is handled for IGD passthrough:

 ret = xc_domain_iomem_permission(xen_xc, xen_domid,
         (unsigned long)(igd_host_opregion >> XC_PAGE_SHIFT),
         XEN_PCI_INTEL_OPREGION_PAGES,
         XEN_PCI_INTEL_OPREGION_ENABLE_ACCESSED);

 ret = xc_domain_memory_mapping(xen_xc, xen_domid,
         (unsigned long)(igd_guest_opregion >> XC_PAGE_SHIFT),
         (unsigned long)(igd_host_opregion >> XC_PAGE_SHIFT),
         XEN_PCI_INTEL_OPREGION_PAGES,
         DPCI_ADD_MAPPING);

The above can address your two questions. Xen doesn't need to tell
exactly whether the assigned range actually belongs to the NVDIMM,
just like the policy for PCI assignment today.
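
For illustration, a minimal sketch of the same pattern applied to a
pmem range (the helper and the mfn/gfn/page-count parameters below are
hypothetical; in practice they would come from the dom0 driver and the
toolstack):

    #include <xenctrl.h>

    /* Hypothetical helper: grant the domain access to the host pmem
     * frames, then map them at the chosen guest frame numbers. */
    static int map_pmem_to_guest(xc_interface *xch, uint32_t domid,
                                 unsigned long host_mfn,
                                 unsigned long guest_gfn,
                                 unsigned long nr_pages)
    {
        int rc = xc_domain_iomem_permission(xch, domid, host_mfn,
                                            nr_pages, 1 /* allow */);
        if (rc)
            return rc;
        return xc_domain_memory_mapping(xch, domid, guest_gfn, host_mfn,
                                        nr_pages, DPCI_ADD_MAPPING);
    }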

Thanks
Kevin

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-18  8:46       ` Jan Beulich
  2016-01-19 11:37         ` Wei Liu
@ 2016-01-20  5:31         ` Haozhong Zhang
  2016-01-20  8:46           ` Jan Beulich
  1 sibling, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20  5:31 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu, Kevin Tian
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jun Nakajima

Hi Jan, Wei and Kevin,

On 01/18/16 01:46, Jan Beulich wrote:
> >>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
> > On 01/15/16 10:10, Jan Beulich wrote:
> >> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> >> > NVDIMM devices are detected and configured by software through
> >> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> >> > patch extends the existing mechanism in hvmloader of loading passthrough
> >> > ACPI tables to load extra ACPI tables built by QEMU.
> >> 
> >> Mechanically the patch looks okay, but whether it's actually needed
> >> depends on whether indeed we want NV RAM managed in qemu
> >> instead of in the hypervisor (where imo it belongs); I didn' see any
> >> reply yet to that same comment of mine made (iirc) in the context
> >> of another patch.
> > 
> > One purpose of this patch series is to provide vNVDIMM backed by host
> > NVDIMM devices. It requires some drivers to detect and manage host
> > NVDIMM devices (including parsing ACPI, managing labels, etc.) that
> > are not trivial, so I leave this work to the dom0 linux. Current Linux
> > kernel abstract NVDIMM devices as block devices (/dev/pmemXX). QEMU
> > then mmaps them into certain range of dom0's address space and asks
> > Xen hypervisor to map that range of address space to a domU.
> > 
> > However, there are two problems in this Xen patch series and the
> > corresponding QEMU patch series, which may require further
> > changes in hypervisor and/or toolstack.
> > 
> > (1) The QEMU patches use xc_hvm_map_io_range_to_ioreq_server() to map
> >     the host NVDIMM to domU, which results VMEXIT for every guest
> >     read/write to the corresponding vNVDIMM devices. I'm going to find
> >     a way to passthrough the address space range of host NVDIMM to a
> >     guest domU (similarly to what xen-pt in QEMU uses)
> >     
> > (2) Xen currently does not check whether the address that QEMU asks to
> >     map to domU is really within the host NVDIMM address
> >     space. Therefore, Xen hypervisor needs a way to decide the host
> >     NVDIMM address space which can be done by parsing ACPI NFIT
> >     tables.
> 
> These problems are a pretty direct result of the management of
> NVDIMM not being done by the hypervisor.
> 
> Stating what qemu currently does is, I'm afraid, not really serving
> the purpose of hashing out whether the management of NVDIMM,
> just like that of "normal" RAM, wouldn't better be done by the
> hypervisor. In fact so far I haven't seen any rationale (other than
> the desire to share code with KVM) for the presently chosen
> solution. Yet in KVM qemu is - afaict - much more of an integral part
> of the hypervisor than it is in the Xen case (and even there core
> management of the memory is left to the kernel, i.e. what
> constitutes the core hypervisor there).
> 
> Jan
> 

Sorry for the late reply; I was reading some code and trying to get
things clear for myself.

The primary reason for the current solution is to reuse the existing
NVDIMM driver in the Linux kernel.

One responsibility of this driver is to discover NVDIMM devices and
their parameters (e.g. which portion of an NVDIMM device can be mapped
into the system address space and which address it is mapped to) by
parsing ACPI NFIT tables. Looking at the NFIT spec in Sec 5.2.25 of the
ACPI Specification v6 and the actual code in the Linux kernel
(drivers/acpi/nfit.*), this is not a trivial task.

Secondly, the driver implements a convenient block device interface
that lets software access the areas where NVDIMM devices are mapped.
The existing vNVDIMM implementation in QEMU uses this interface.

As the Linux NVDIMM driver already does all of the above, why should we
reimplement it in Xen?

For the two problems raised in my previous reply, following are my
thoughts.

(1) (for the first problem) QEMU mmaps /dev/pmemXX into its virtual
    address space. When it works with KVM, it calls the KVM API to map
    that virtual address range into a guest physical address space.

    For Xen, I'm going to do a similar thing, but Xen does not seem to
    provide such an API. The closest one I can find is
    XEN_DOMCTL_memory_mapping (which is used by VGA passthrough in
    QEMU's xen_pt_graphics), but it does not accept a virtual address
    (QEMU only knows the virtual address at which it has mmapped
    /dev/pmemXX, not the machine address). Thus, I'm going to add a new
    one that does similar work but accepts a virtual address.

(2) (for the second problem) After having looked at the corresponding
    Linux kernel code and considering my comments at the beginning, I
    now doubt whether it's necessary to parse NFIT in Xen. Maybe I can
    follow what xen_pt_graphics does, that is, grant the guest
    permission to access the corresponding host NVDIMM address space
    range and then call the new hypercall added in (1).

    Again, a new hypercall that is similar to
    XEN_DOMCTL_iomem_permission but accepts a virtual address is
    needed; a possible shape of such an interface is sketched below.
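
    For illustration only, such an interface might look roughly like
    the following (a guess at a possible shape, not an existing libxc
    call; the name and parameters are made up):

        /* Hypothetical wrapper, not an existing call: ask Xen to map
         * nr_pages of a dom0 virtual address range (e.g. QEMU's mmap
         * of /dev/pmemXX) into a guest's physical address space
         * starting at first_gfn. */
        int xc_domain_map_vaddr_to_gfn(xc_interface *xch,
                                       uint32_t domid,
                                       void *vaddr,  /* dom0 virtual address */
                                       unsigned long nr_pages,
                                       unsigned long first_gfn);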

Any comments?

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20  5:14             ` Tian, Kevin
@ 2016-01-20  5:58               ` Zhang, Haozhong
  0 siblings, 0 replies; 88+ messages in thread
From: Zhang, Haozhong @ 2016-01-20  5:58 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Nakajima, Jun,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

On 01/20/16 13:14, Tian, Kevin wrote:
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Tuesday, January 19, 2016 7:47 PM
> > 
> > >>> On 19.01.16 at 12:37, <wei.liu2@citrix.com> wrote:
> > > On Mon, Jan 18, 2016 at 01:46:29AM -0700, Jan Beulich wrote:
> > >> >>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
> > >> > On 01/15/16 10:10, Jan Beulich wrote:
> > >> >> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> > >> >> > NVDIMM devices are detected and configured by software through
> > >> >> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> > >> >> > patch extends the existing mechanism in hvmloader of loading passthrough
> > >> >> > ACPI tables to load extra ACPI tables built by QEMU.
> > >> >>
> > >> >> Mechanically the patch looks okay, but whether it's actually needed
> > >> >> depends on whether indeed we want NV RAM managed in qemu
> > >> >> instead of in the hypervisor (where imo it belongs); I didn' see any
> > >> >> reply yet to that same comment of mine made (iirc) in the context
> > >> >> of another patch.
> > >> >
> > >> > One purpose of this patch series is to provide vNVDIMM backed by host
> > >> > NVDIMM devices. It requires some drivers to detect and manage host
> > >> > NVDIMM devices (including parsing ACPI, managing labels, etc.) that
> > >> > are not trivial, so I leave this work to the dom0 linux. Current Linux
> > >> > kernel abstract NVDIMM devices as block devices (/dev/pmemXX). QEMU
> > >> > then mmaps them into certain range of dom0's address space and asks
> > >> > Xen hypervisor to map that range of address space to a domU.
> > >> >
> > >
> > > OOI Do we have a viable solution to do all these non-trivial things in
> > > core hypervisor?  Are you proposing designing a new set of hypercalls
> > > for NVDIMM?
> > 
> > That's certainly a possibility; I lack sufficient detail to make myself
> > an opinion which route is going to be best.
> > 
> > Jan
> 
> Hi, Haozhong,
> 
> Are NVDIMM related ACPI table in plain text format, or do they require
> a ACPI parser to decode? Is there a corresponding E820 entry?
>

Most of them are in plain (static) format, but the driver still
evaluates the _FIT (Firmware Interface Table) method, and decoding is
needed for that.

> Above information would be useful to help decide the direction.
> 
> In a glimpse I like Jan's idea that it's better to let Xen manage NVDIMM
> since it's a type of memory resource while for memory we expect hypervisor
> to centrally manage.
> 
> However in another thought the answer is different if we view this 
> resource as a MMIO resource, similar to PCI BAR MMIO, ACPI NVS, etc.
> then it should be fine to have Dom0 manage NVDIMM then Xen just controls
> the mapping based on existing io permission mechanism.
>

It's more like an MMIO device than normal RAM.

> Another possible point for this model is that PMEM is only one mode of 
> NVDIMM device, which can be also exposed as a storage device. In the
> latter case the management has to be in Dom0. So we don't need to
> scatter the management role into Dom0/Xen based on different modes.
>

An NVDIMM device in PMEM mode is exposed as a storage device (a block
device, /dev/pmemXX) in Linux, and it's also used like a disk drive
(you can make a file system on it, create files on it, and even pass
individual files rather than the whole /dev/pmemXX to guests).

> Back to your earlier questions:
> 
> > (1) The QEMU patches use xc_hvm_map_io_range_to_ioreq_server() to map
> >     the host NVDIMM to domU, which results VMEXIT for every guest
> >     read/write to the corresponding vNVDIMM devices. I'm going to find
> >     a way to passthrough the address space range of host NVDIMM to a
> >     guest domU (similarly to what xen-pt in QEMU uses)
> > 
> > (2) Xen currently does not check whether the address that QEMU asks to
> >     map to domU is really within the host NVDIMM address
> >     space. Therefore, Xen hypervisor needs a way to decide the host
> >     NVDIMM address space which can be done by parsing ACPI NFIT
> >     tables.
> 
> If you look at how ACPI OpRegion is handled for IGD passthrough:
> 
>  241     ret = xc_domain_iomem_permission(xen_xc, xen_domid,
>  242             (unsigned long)(igd_host_opregion >> XC_PAGE_SHIFT),
>  243             XEN_PCI_INTEL_OPREGION_PAGES,
>  244             XEN_PCI_INTEL_OPREGION_ENABLE_ACCESSED);
> 
>  254     ret = xc_domain_memory_mapping(xen_xc, xen_domid,
>  255             (unsigned long)(igd_guest_opregion >> XC_PAGE_SHIFT),
>  256             (unsigned long)(igd_host_opregion >> XC_PAGE_SHIFT),
>  257             XEN_PCI_INTEL_OPREGION_PAGES,
>  258             DPCI_ADD_MAPPING);
>

Yes, I've noticed these two functions. The additional work would be
adding new ones that accept a virtual address, as QEMU has no easy way
to get the physical address of /dev/pmemXX and can only mmap it into
its virtual address space.

> Above can address your 2 questions. Xen doesn't need to tell exactly
> whether the assigned range actually belongs to NVDIMM, just like
> the policy for PCI assignment today.
>

Does that mean the Xen hypervisor can trust whatever address the dom0
kernel and QEMU provide?

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20  5:31         ` Haozhong Zhang
@ 2016-01-20  8:46           ` Jan Beulich
  2016-01-20  8:58             ` Andrew Cooper
  2016-01-20 11:04             ` Haozhong Zhang
  0 siblings, 2 replies; 88+ messages in thread
From: Jan Beulich @ 2016-01-20  8:46 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> The primary reason of current solution is to reuse existing NVDIMM
> driver in Linux kernel.

Re-using code in the Dom0 kernel has benefits and drawbacks, and
in any event needs to depend on proper layering to remain in place.
A benefit is less code duplication between Xen and Linux; along the
same lines a drawback is code duplication between various Dom0
OS variants.

> One responsibility of this driver is to discover NVDIMM devices and
> their parameters (e.g. which portion of an NVDIMM device can be mapped
> into the system address space and which address it is mapped to) by
> parsing ACPI NFIT tables. Looking at the NFIT spec in Sec 5.2.25 of
> ACPI Specification v6 and the actual code in Linux kernel
> (drivers/acpi/nfit.*), it's not a trivial task.

To answer one of Kevin's questions: the NFIT table doesn't appear
to require the ACPI interpreter; it seems more like SRAT and SLIT.
Also, you failed to answer Kevin's question regarding E820 entries: I
think NVDIMMs (or at least parts thereof) get represented in E820 (or
the EFI memory map), and if that's the case this would be a very
strong hint towards management needing to be in the hypervisor.

> Secondly, the driver implements a convenient block device interface to
> let software access areas where NVDIMM devices are mapped. The
> existing vNVDIMM implementation in QEMU uses this interface.
> 
> As Linux NVDIMM driver has already done above, why do we bother to
> reimplement them in Xen?

See above; a possibility is that we may need a split model (block
layer parts in Dom0, "normal memory" parts in the hypervisor).
Iirc the split is determined by firmware, and hence set in
stone by the time the OS (or hypervisor) boot starts.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20  8:46           ` Jan Beulich
@ 2016-01-20  8:58             ` Andrew Cooper
  2016-01-20 10:15               ` Haozhong Zhang
  2016-01-20 11:04             ` Haozhong Zhang
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2016-01-20  8:58 UTC (permalink / raw)
  To: Jan Beulich, Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

On 20/01/2016 08:46, Jan Beulich wrote:
>>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
>> The primary reason of current solution is to reuse existing NVDIMM
>> driver in Linux kernel.
> Re-using code in the Dom0 kernel has benefits and drawbacks, and
> in any event needs to depend on proper layering to remain in place.
> A benefit is less code duplication between Xen and Linux; along the
> same lines a drawback is code duplication between various Dom0
> OS variants.
>
>> One responsibility of this driver is to discover NVDIMM devices and
>> their parameters (e.g. which portion of an NVDIMM device can be mapped
>> into the system address space and which address it is mapped to) by
>> parsing ACPI NFIT tables. Looking at the NFIT spec in Sec 5.2.25 of
>> ACPI Specification v6 and the actual code in Linux kernel
>> (drivers/acpi/nfit.*), it's not a trivial task.
> To answer one of Kevin's questions: The NFIT table doesn't appear
> to require the ACPI interpreter. They seem more like SRAT and SLIT.
> Also you failed to answer Kevin's question regarding E820 entries: I
> think NVDIMM (or at least parts thereof) get represented in E820 (or
> the EFI memory map), and if that's the case this would be a very
> strong hint towards management needing to be in the hypervisor.

Conceptually, an NVDIMM is just like a fast SSD which is linearly mapped
into memory.  I am still on the dom0 side of this fence.

The real question is whether it is possible to take an NVDIMM, split it
in half, give each half to two different guests (with appropriate NFIT
tables) and that be sufficient for the guests to just work.

Either way, it needs to be a toolstack policy decision as to how to
split the resource.

~Andrew

>
>> Secondly, the driver implements a convenient block device interface to
>> let software access areas where NVDIMM devices are mapped. The
>> existing vNVDIMM implementation in QEMU uses this interface.
>>
>> As Linux NVDIMM driver has already done above, why do we bother to
>> reimplement them in Xen?
> See above; a possibility is that we may need a split model (block
> layer parts on Dom0, "normal memory" parts in the hypervisor.
> Iirc the split is being determined by firmware, and hence set in
> stone by the time OS (or hypervisor) boot starts.
>
> Jan
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20  8:58             ` Andrew Cooper
@ 2016-01-20 10:15               ` Haozhong Zhang
  2016-01-20 10:36                 ` Xiao Guangrong
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 10:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Xiao Guangrong, Ian Jackson, xen-devel, Jan Beulich,
	Jun Nakajima, Keir Fraser

On 01/20/16 08:58, Andrew Cooper wrote:
> On 20/01/2016 08:46, Jan Beulich wrote:
> >>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> >> The primary reason of current solution is to reuse existing NVDIMM
> >> driver in Linux kernel.
> > Re-using code in the Dom0 kernel has benefits and drawbacks, and
> > in any event needs to depend on proper layering to remain in place.
> > A benefit is less code duplication between Xen and Linux; along the
> > same lines a drawback is code duplication between various Dom0
> > OS variants.
> >
> >> One responsibility of this driver is to discover NVDIMM devices and
> >> their parameters (e.g. which portion of an NVDIMM device can be mapped
> >> into the system address space and which address it is mapped to) by
> >> parsing ACPI NFIT tables. Looking at the NFIT spec in Sec 5.2.25 of
> >> ACPI Specification v6 and the actual code in Linux kernel
> >> (drivers/acpi/nfit.*), it's not a trivial task.
> > To answer one of Kevin's questions: The NFIT table doesn't appear
> > to require the ACPI interpreter. They seem more like SRAT and SLIT.
> > Also you failed to answer Kevin's question regarding E820 entries: I
> > think NVDIMM (or at least parts thereof) get represented in E820 (or
> > the EFI memory map), and if that's the case this would be a very
> > strong hint towards management needing to be in the hypervisor.
>

CCing QEMU vNVDIMM maintainer: Xiao Guangrong

> Conceptually, an NVDIMM is just like a fast SSD which is linearly mapped
> into memory.  I am still on the dom0 side of this fence.
> 
> The real question is whether it is possible to take an NVDIMM, split it
> in half, give each half to two different guests (with appropriate NFIT
> tables) and that be sufficient for the guests to just work.
>

Yes, one NVDIMM device can be split into multiple parts and assigned
to different guests, and QEMU is responsible for maintaining virtual
NFIT tables for each part.

> Either way, it needs to be a toolstack policy decision as to how to
> split the resource.
>

But the split does not need to be done on the Xen side, IMO. It can be
done by the dom0 kernel and QEMU, as long as they tell the Xen
hypervisor the address space range of each part.

Haozhong

> ~Andrew
> 
> >
> >> Secondly, the driver implements a convenient block device interface to
> >> let software access areas where NVDIMM devices are mapped. The
> >> existing vNVDIMM implementation in QEMU uses this interface.
> >>
> >> As Linux NVDIMM driver has already done above, why do we bother to
> >> reimplement them in Xen?
> > See above; a possibility is that we may need a split model (block
> > layer parts on Dom0, "normal memory" parts in the hypervisor.
> > Iirc the split is being determined by firmware, and hence set in
> > stone by the time OS (or hypervisor) boot starts.
> >
> > Jan
> >
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 10:15               ` Haozhong Zhang
@ 2016-01-20 10:36                 ` Xiao Guangrong
  2016-01-20 13:16                   ` Andrew Cooper
  0 siblings, 1 reply; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-20 10:36 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, xen-devel,
	Keir Fraser


Hi,

On 01/20/2016 06:15 PM, Haozhong Zhang wrote:

> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
>
>> Conceptually, an NVDIMM is just like a fast SSD which is linearly mapped
>> into memory.  I am still on the dom0 side of this fence.
>>
>> The real question is whether it is possible to take an NVDIMM, split it
>> in half, give each half to two different guests (with appropriate NFIT
>> tables) and that be sufficient for the guests to just work.
>>
>
> Yes, one NVDIMM device can be split into multiple parts and assigned
> to different guests, and QEMU is responsible to maintain virtual NFIT
> tables for each part.
>
>> Either way, it needs to be a toolstack policy decision as to how to
>> split the resource.

Currently, we are using the NVDIMM as a block device, and a DAX-based
filesystem is created on it in Linux so that file-related accesses
directly reach the NVDIMM device.

In KVM, if the NVDIMM device needs to be shared by different VMs, we can
create multiple files on the DAX-based filesystem and assign a file to
each VM. In the future, we can enable namespaces (partition-like) for
PMEM and assign a namespace to each VM (the current Linux driver uses
the whole PMEM as a single namespace).

I think it is not easy to let the Xen hypervisor recognize NVDIMM
devices and manage the NVDIMM resource.

Thanks!

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20  8:46           ` Jan Beulich
  2016-01-20  8:58             ` Andrew Cooper
@ 2016-01-20 11:04             ` Haozhong Zhang
  2016-01-20 11:20               ` Jan Beulich
  2016-01-20 15:07               ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 11:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser

On 01/20/16 01:46, Jan Beulich wrote:
> >>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> > The primary reason of current solution is to reuse existing NVDIMM
> > driver in Linux kernel.
>

CC'ing QEMU vNVDIMM maintainer: Xiao Guangrong

> Re-using code in the Dom0 kernel has benefits and drawbacks, and
> in any event needs to depend on proper layering to remain in place.
> A benefit is less code duplication between Xen and Linux; along the
> same lines a drawback is code duplication between various Dom0
> OS variants.
>

I'm not sure about other Dom0 OSes, but Linux has had an NVDIMM driver
since 4.2.

> > One responsibility of this driver is to discover NVDIMM devices and
> > their parameters (e.g. which portion of an NVDIMM device can be mapped
> > into the system address space and which address it is mapped to) by
> > parsing ACPI NFIT tables. Looking at the NFIT spec in Sec 5.2.25 of
> > ACPI Specification v6 and the actual code in Linux kernel
> > (drivers/acpi/nfit.*), it's not a trivial task.
> 
> To answer one of Kevin's questions: The NFIT table doesn't appear
> to require the ACPI interpreter. They seem more like SRAT and SLIT.

Sorry, I made a mistake in another reply. NFIT does not contain
anything requiring an ACPI interpreter. But there are some _DSM methods
for NVDIMM in the SSDT, which do need an ACPI interpreter.

> Also you failed to answer Kevin's question regarding E820 entries: I
> think NVDIMM (or at least parts thereof) get represented in E820 (or
> the EFI memory map), and if that's the case this would be a very
> strong hint towards management needing to be in the hypervisor.
>

Legacy NVDIMM devices may use E820 entries or other ad-hoc ways to
announce their locations, but newer ones that follow the ACPI v6 spec
do not need E820 any more and only need the ACPI NFIT (i.e. firmware
may not build E820 entries for them).

The current Linux kernel can handle both legacy and new NVDIMM devices
and provides the same block device interface for both.

> > Secondly, the driver implements a convenient block device interface to
> > let software access areas where NVDIMM devices are mapped. The
> > existing vNVDIMM implementation in QEMU uses this interface.
> > 
> > As Linux NVDIMM driver has already done above, why do we bother to
> > reimplement them in Xen?
> 
> See above; a possibility is that we may need a split model (block
> layer parts on Dom0, "normal memory" parts in the hypervisor.
> Iirc the split is being determined by firmware, and hence set in
> stone by the time OS (or hypervisor) boot starts.
>

For the "normal memory" parts, do you mean parts that map the host
NVDIMM device's address space range to the guest? I'm going to
implement that part in hypervisor and expose it as a hypercall so that
it can be used by QEMU.

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 11:04             ` Haozhong Zhang
@ 2016-01-20 11:20               ` Jan Beulich
  2016-01-20 15:29                 ` Xiao Guangrong
  2016-01-20 15:07               ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-20 11:20 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

>>> On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
> On 01/20/16 01:46, Jan Beulich wrote:
>> >>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
>> > Secondly, the driver implements a convenient block device interface to
>> > let software access areas where NVDIMM devices are mapped. The
>> > existing vNVDIMM implementation in QEMU uses this interface.
>> > 
>> > As Linux NVDIMM driver has already done above, why do we bother to
>> > reimplement them in Xen?
>> 
>> See above; a possibility is that we may need a split model (block
>> layer parts on Dom0, "normal memory" parts in the hypervisor.
>> Iirc the split is being determined by firmware, and hence set in
>> stone by the time OS (or hypervisor) boot starts.
> 
> For the "normal memory" parts, do you mean parts that map the host
> NVDIMM device's address space range to the guest? I'm going to
> implement that part in hypervisor and expose it as a hypercall so that
> it can be used by QEMU.

To answer this I need to have my understanding of the partitioning
being done by firmware confirmed: If that's the case, then "normal"
means the part that doesn't get exposed as a block device (SSD).
In any event there's no correlation to guest exposure here.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-20  3:28 ` Tian, Kevin
@ 2016-01-20 12:43   ` Stefano Stabellini
  2016-01-20 14:26     ` Zhang, Haozhong
  0 siblings, 1 reply; 88+ messages in thread
From: Stefano Stabellini @ 2016-01-20 12:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Zhang, Haozhong, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Nakajima, Jun, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Wei Liu

On Wed, 20 Jan 2016, Tian, Kevin wrote:
> > From: Zhang, Haozhong
> > Sent: Tuesday, December 29, 2015 7:32 PM
> > 
> > This patch series is the Xen part patch to provide virtual NVDIMM to
> > guest. The corresponding QEMU patch series is sent separately with the
> > title "[PATCH 0/2] add vNVDIMM support for Xen".
> > 
> > * Background
> > 
> >  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
> >  supported on Intel's platform. NVDIMM devices are discovered via ACPI
> >  and configured by _DSM method of NVDIMM device in ACPI. Some
> >  documents can be found at
> >  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> >  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> >  [3] DSM Interface Example:
> > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> >  [4] Driver Writer's Guide:
> > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> > 
> >  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
> >  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
> >  mapped into CPU's address space and are accessed via normal memory
> >  read/write and three special instructions (clflushopt/clwb/pcommit).
> > 
> >  This patch series and the corresponding QEMU patch series enable Xen
> >  to provide vNVDIMM devices to HVM domains.
> > 
> > * Design
> > 
> >  Supporting vNVDIMM in PMEM mode has three requirements.
> > 
> 
> Although this design is about vNVDIMM, some background of how pNVDIMM
> is managed in Xen would be helpful to understand the whole design since
> in PMEM mode you need map pNVDIMM into GFN addr space so there's
> a matter of how pNVDIMM is allocated.

Yes, some background would be very helpful. Given that there are so many
moving parts on this (Xen, the Dom0 kernel, QEMU, hvmloader, libxl)
I suggest that we start with a design document for this feature.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 10:36                 ` Xiao Guangrong
@ 2016-01-20 13:16                   ` Andrew Cooper
  2016-01-20 14:29                     ` Stefano Stabellini
  2016-01-20 14:38                     ` Haozhong Zhang
  0 siblings, 2 replies; 88+ messages in thread
From: Andrew Cooper @ 2016-01-20 13:16 UTC (permalink / raw)
  To: Xiao Guangrong, Jan Beulich, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, xen-devel,
	Keir Fraser

On 20/01/16 10:36, Xiao Guangrong wrote:
>
> Hi,
>
> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
>
>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
>>
>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
>>> mapped
>>> into memory.  I am still on the dom0 side of this fence.
>>>
>>> The real question is whether it is possible to take an NVDIMM, split it
>>> in half, give each half to two different guests (with appropriate NFIT
>>> tables) and that be sufficient for the guests to just work.
>>>
>>
>> Yes, one NVDIMM device can be split into multiple parts and assigned
>> to different guests, and QEMU is responsible to maintain virtual NFIT
>> tables for each part.
>>
>>> Either way, it needs to be a toolstack policy decision as to how to
>>> split the resource.
>
> Currently, we are using NVDIMM as a block device and a DAX-based
> filesystem
> is created upon it in Linux so that file-related accesses directly reach
> the NVDIMM device.
>
> In KVM, If the NVDIMM device need to be shared by different VMs, we can
> create multiple files on the DAX-based filesystem and assign the file to
> each VMs. In the future, we can enable namespace (partition-like) for
> PMEM
> memory and assign the namespace to each VMs (current Linux driver uses
> the
> whole PMEM as a single namespace).
>
> I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> device
> and manager NVDIMM resource.
>
> Thanks!
>

The more I see about this, the more sure I am that we want to keep it as
a block device managed by dom0.

In the case of the DAX-based filesystem, I presume files are not
necessarily contiguous. I also presume that this is worked around by
permuting the mapping of the virtual NVDIMM such that it appears as
a contiguous block of addresses to the guest?

Today in Xen, Qemu already has the ability to create mappings in the
guest's address space, e.g. to map PCI device BARs.  I don't see a
conceptual difference here, although the security/permission model
certainly is more complicated.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-20 12:43   ` Stefano Stabellini
@ 2016-01-20 14:26     ` Zhang, Haozhong
  2016-01-20 14:35       ` Stefano Stabellini
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Haozhong @ 2016-01-20 14:26 UTC (permalink / raw)
  Cc: Tian, Kevin, Keir Fraser, Ian Campbell, Nakajima, Jun,
	Andrew Cooper, Ian Jackson, Xiao Guangrong, xen-devel,
	Jan Beulich, Wei Liu

On 01/20/16 12:43, Stefano Stabellini wrote:
> On Wed, 20 Jan 2016, Tian, Kevin wrote:
> > > From: Zhang, Haozhong
> > > Sent: Tuesday, December 29, 2015 7:32 PM
> > > 
> > > This patch series is the Xen part patch to provide virtual NVDIMM to
> > > guest. The corresponding QEMU patch series is sent separately with the
> > > title "[PATCH 0/2] add vNVDIMM support for Xen".
> > > 
> > > * Background
> > > 
> > >  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
> > >  supported on Intel's platform. NVDIMM devices are discovered via ACPI
> > >  and configured by _DSM method of NVDIMM device in ACPI. Some
> > >  documents can be found at
> > >  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> > >  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> > >  [3] DSM Interface Example:
> > > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> > >  [4] Driver Writer's Guide:
> > > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> > > 
> > >  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
> > >  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
> > >  mapped into CPU's address space and are accessed via normal memory
> > >  read/write and three special instructions (clflushopt/clwb/pcommit).
> > > 
> > >  This patch series and the corresponding QEMU patch series enable Xen
> > >  to provide vNVDIMM devices to HVM domains.
> > > 
> > > * Design
> > > 
> > >  Supporting vNVDIMM in PMEM mode has three requirements.
> > > 
> > 
> > Although this design is about vNVDIMM, some background of how pNVDIMM
> > is managed in Xen would be helpful to understand the whole design since
> > in PMEM mode you need map pNVDIMM into GFN addr space so there's
> > a matter of how pNVDIMM is allocated.
> 
> Yes, some background would be very helpful. Given that there are so many
> moving parts on this (Xen, the Dom0 kernel, QEMU, hvmloader, libxl)
> I suggest that we start with a design document for this feature.

Let me prepare a design document. Basically, it would include
the following contents. Please let me know if you want anything additional
to be included.

* What NVDIMM is and how it is used
* Software interface of NVDIMM
  - ACPI NFIT: what parameters are recorded and their usage
  - ACPI SSDT: what _DSM methods are provided and their functionality
  - New instructions: clflushopt/clwb/pcommit
* How the linux kernel drives NVDIMM
  - ACPI parsing
  - Block device interface
  - Partition NVDIMM devices
* How KVM/QEMU implements vNVDIMM
* What I propose to implement vNVDIMM in Xen
  - Xen hypervisor/toolstack: new instruction enabling and address mapping
  - Dom0 Linux kernel: host NVDIMM driver
  - QEMU: virtual NFIT/SSDT, _DSM handling, and role in address mapping

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 13:16                   ` Andrew Cooper
@ 2016-01-20 14:29                     ` Stefano Stabellini
  2016-01-20 14:42                       ` Haozhong Zhang
  2016-01-20 14:45                       ` Andrew Cooper
  2016-01-20 14:38                     ` Haozhong Zhang
  1 sibling, 2 replies; 88+ messages in thread
From: Stefano Stabellini @ 2016-01-20 14:29 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong, Keir Fraser

On Wed, 20 Jan 2016, Andrew Cooper wrote:
> On 20/01/16 10:36, Xiao Guangrong wrote:
> >
> > Hi,
> >
> > On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> >
> >> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> >>
> >>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> >>> mapped
> >>> into memory.  I am still on the dom0 side of this fence.
> >>>
> >>> The real question is whether it is possible to take an NVDIMM, split it
> >>> in half, give each half to two different guests (with appropriate NFIT
> >>> tables) and that be sufficient for the guests to just work.
> >>>
> >>
> >> Yes, one NVDIMM device can be split into multiple parts and assigned
> >> to different guests, and QEMU is responsible to maintain virtual NFIT
> >> tables for each part.
> >>
> >>> Either way, it needs to be a toolstack policy decision as to how to
> >>> split the resource.
> >
> > Currently, we are using NVDIMM as a block device and a DAX-based
> > filesystem
> > is created upon it in Linux so that file-related accesses directly reach
> > the NVDIMM device.
> >
> > In KVM, If the NVDIMM device need to be shared by different VMs, we can
> > create multiple files on the DAX-based filesystem and assign the file to
> > each VMs. In the future, we can enable namespace (partition-like) for
> > PMEM
> > memory and assign the namespace to each VMs (current Linux driver uses
> > the
> > whole PMEM as a single namespace).
> >
> > I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> > device
> > and manager NVDIMM resource.
> >
> > Thanks!
> >
> 
> The more I see about this, the more sure I am that we want to keep it as
> a block device managed by dom0.
> 
> In the case of the DAX-based filesystem, I presume files are not
> necessarily contiguous.  I also presume that this is worked around by
> permuting the mapping of the virtual NVDIMM such that the it appears as
> a contiguous block of addresses to the guest?
> 
> Today in Xen, Qemu already has the ability to create mappings in the
> guest's address space, e.g. to map PCI device BARs.  I don't see a
> conceptual difference here, although the security/permission model
> certainly is more complicated.

I imagine that mmap'ing these /dev/pmemXX devices requires root
privileges, does it not?

I wouldn't encourage the introduction of anything else that requires
root privileges in QEMU. With QEMU running as non-root by default in
4.7, the feature will not be available unless users explicitly ask to
run QEMU as root (which they shouldn't really).

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-20 14:26     ` Zhang, Haozhong
@ 2016-01-20 14:35       ` Stefano Stabellini
  2016-01-20 14:47         ` Zhang, Haozhong
  0 siblings, 1 reply; 88+ messages in thread
From: Stefano Stabellini @ 2016-01-20 14:35 UTC (permalink / raw)
  To: Zhang, Haozhong
  Cc: Tian, Kevin, Keir Fraser, Ian Campbell, Nakajima, Jun,
	Andrew Cooper, Ian Jackson, Xiao Guangrong, xen-devel,
	Jan Beulich, Wei Liu

On Wed, 20 Jan 2016, Zhang, Haozhong wrote:
> On 01/20/16 12:43, Stefano Stabellini wrote:
> > On Wed, 20 Jan 2016, Tian, Kevin wrote:
> > > > From: Zhang, Haozhong
> > > > Sent: Tuesday, December 29, 2015 7:32 PM
> > > > 
> > > > This patch series is the Xen part patch to provide virtual NVDIMM to
> > > > guest. The corresponding QEMU patch series is sent separately with the
> > > > title "[PATCH 0/2] add vNVDIMM support for Xen".
> > > > 
> > > > * Background
> > > > 
> > > >  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
> > > >  supported on Intel's platform. NVDIMM devices are discovered via ACPI
> > > >  and configured by _DSM method of NVDIMM device in ACPI. Some
> > > >  documents can be found at
> > > >  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> > > >  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> > > >  [3] DSM Interface Example:
> > > > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> > > >  [4] Driver Writer's Guide:
> > > > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> > > > 
> > > >  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
> > > >  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
> > > >  mapped into CPU's address space and are accessed via normal memory
> > > >  read/write and three special instructions (clflushopt/clwb/pcommit).
> > > > 
> > > >  This patch series and the corresponding QEMU patch series enable Xen
> > > >  to provide vNVDIMM devices to HVM domains.
> > > > 
> > > > * Design
> > > > 
> > > >  Supporting vNVDIMM in PMEM mode has three requirements.
> > > > 
> > > 
> > > Although this design is about vNVDIMM, some background of how pNVDIMM
> > > is managed in Xen would be helpful to understand the whole design since
> > > in PMEM mode you need map pNVDIMM into GFN addr space so there's
> > > a matter of how pNVDIMM is allocated.
> > 
> > Yes, some background would be very helpful. Given that there are so many
> > moving parts on this (Xen, the Dom0 kernel, QEMU, hvmloader, libxl)
> > I suggest that we start with a design document for this feature.
> 
> Let me prepare a design document. Basically, it would include
> following contents. Please let me know if you want anything additional
> to be included.

Thank you!


> * What NVDIMM is and how it is used
> * Software interface of NVDIMM
>   - ACPI NFIT: what parameters are recorded and their usage
>   - ACPI SSDT: what _DSM methods are provided and their functionality
>   - New instructions: clflushopt/clwb/pcommit
> * How the linux kernel drives NVDIMM
>   - ACPI parsing
>   - Block device interface
>   - Partition NVDIMM devices
> * How KVM/QEMU implements vNVDIMM

This is a very good start.


> * What I propose to implement vNVDIMM in Xen
>   - Xen hypervisor/toolstack: new instruction enabling and address mapping
>   - Dom0 Linux kernel: host NVDIMM driver
>   - QEMU: virtual NFIT/SSDT, _DSM handling, and role in address mapping

This is OK. It might also be good to list other options that were
discussed, but that is certainly not necessary in the first instance.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 13:16                   ` Andrew Cooper
  2016-01-20 14:29                     ` Stefano Stabellini
@ 2016-01-20 14:38                     ` Haozhong Zhang
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 14:38 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong, Keir Fraser

On 01/20/16 13:16, Andrew Cooper wrote:
> On 20/01/16 10:36, Xiao Guangrong wrote:
> >
> > Hi,
> >
> > On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> >
> >> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> >>
> >>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> >>> mapped
> >>> into memory.  I am still on the dom0 side of this fence.
> >>>
> >>> The real question is whether it is possible to take an NVDIMM, split it
> >>> in half, give each half to two different guests (with appropriate NFIT
> >>> tables) and that be sufficient for the guests to just work.
> >>>
> >>
> >> Yes, one NVDIMM device can be split into multiple parts and assigned
> >> to different guests, and QEMU is responsible to maintain virtual NFIT
> >> tables for each part.
> >>
> >>> Either way, it needs to be a toolstack policy decision as to how to
> >>> split the resource.
> >
> > Currently, we are using NVDIMM as a block device and a DAX-based
> > filesystem
> > is created upon it in Linux so that file-related accesses directly reach
> > the NVDIMM device.
> >
> > In KVM, If the NVDIMM device need to be shared by different VMs, we can
> > create multiple files on the DAX-based filesystem and assign the file to
> > each VMs. In the future, we can enable namespace (partition-like) for
> > PMEM
> > memory and assign the namespace to each VMs (current Linux driver uses
> > the
> > whole PMEM as a single namespace).
> >
> > I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> > device
> > and manager NVDIMM resource.
> >
> > Thanks!
> >
> 
> The more I see about this, the more sure I am that we want to keep it as
> a block device managed by dom0.
> 
> In the case of the DAX-based filesystem, I presume files are not
> necessarily contiguous.  I also presume that this is worked around by
> permuting the mapping of the virtual NVDIMM such that the it appears as
> a contiguous block of addresses to the guest?
>

No, the files do not need to be contiguous. We can map those
non-contiguous parts into a contiguous guest physical address space area,
and QEMU fills in the base address and size of that area in the vNFIT.
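
(As a rough illustration of that mapping step - not a proposal of the
final interface.  This just assumes libxc's existing
xc_domain_memory_mapping(), normally used for MMIO/BAR mappings, could be
pressed into service, which is exactly the open question here:)

#include <xenctrl.h>

/* Sketch only: place several scattered host extents (mfns[i], nr[i]
 * frames each) back-to-back at a contiguous guest frame range starting
 * at gfn_base.  QEMU would then report that contiguous range as the SPA
 * base/size in the virtual NFIT. */
static int map_vnvdimm_extents(xc_interface *xch, uint32_t domid,
                               unsigned long gfn_base,
                               const unsigned long *mfns,
                               const unsigned long *nr,
                               unsigned int nr_extents)
{
    unsigned long gfn = gfn_base;
    unsigned int i;

    for ( i = 0; i < nr_extents; i++ )
    {
        int rc = xc_domain_memory_mapping(xch, domid, gfn, mfns[i],
                                          nr[i], 1 /* add mapping */);
        if ( rc )
            return rc;
        gfn += nr[i];
    }
    return 0;
}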

> Today in Xen, Qemu already has the ability to create mappings in the
> guest's address space, e.g. to map PCI device BARs.  I don't see a
> conceptual difference here, although the security/permission model
> certainly is more complicated.
>

I'm preparing a design document; let's see afterwards what the better
solution would be.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 14:29                     ` Stefano Stabellini
@ 2016-01-20 14:42                       ` Haozhong Zhang
  2016-01-20 14:45                       ` Andrew Cooper
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 14:42 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Jun Nakajima, Xiao Guangrong,
	Keir Fraser

On 01/20/16 14:29, Stefano Stabellini wrote:
> On Wed, 20 Jan 2016, Andrew Cooper wrote:
> > On 20/01/16 10:36, Xiao Guangrong wrote:
> > >
> > > Hi,
> > >
> > > On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> > >
> > >> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> > >>
> > >>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> > >>> mapped
> > >>> into memory.  I am still on the dom0 side of this fence.
> > >>>
> > >>> The real question is whether it is possible to take an NVDIMM, split it
> > >>> in half, give each half to two different guests (with appropriate NFIT
> > >>> tables) and that be sufficient for the guests to just work.
> > >>>
> > >>
> > >> Yes, one NVDIMM device can be split into multiple parts and assigned
> > >> to different guests, and QEMU is responsible to maintain virtual NFIT
> > >> tables for each part.
> > >>
> > >>> Either way, it needs to be a toolstack policy decision as to how to
> > >>> split the resource.
> > >
> > > Currently, we are using NVDIMM as a block device and a DAX-based
> > > filesystem
> > > is created upon it in Linux so that file-related accesses directly reach
> > > the NVDIMM device.
> > >
> > > In KVM, If the NVDIMM device need to be shared by different VMs, we can
> > > create multiple files on the DAX-based filesystem and assign the file to
> > > each VMs. In the future, we can enable namespace (partition-like) for
> > > PMEM
> > > memory and assign the namespace to each VMs (current Linux driver uses
> > > the
> > > whole PMEM as a single namespace).
> > >
> > > I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> > > device
> > > and manager NVDIMM resource.
> > >
> > > Thanks!
> > >
> > 
> > The more I see about this, the more sure I am that we want to keep it as
> > a block device managed by dom0.
> > 
> > In the case of the DAX-based filesystem, I presume files are not
> > necessarily contiguous.  I also presume that this is worked around by
> > permuting the mapping of the virtual NVDIMM such that the it appears as
> > a contiguous block of addresses to the guest?
> > 
> > Today in Xen, Qemu already has the ability to create mappings in the
> > guest's address space, e.g. to map PCI device BARs.  I don't see a
> > conceptual difference here, although the security/permission model
> > certainly is more complicated.
> 
> I imagine that mmap'ing  these /dev/pmemXX devices require root
> privileges, does it not?
>

Yes, unless we assign non-root access permissions to /dev/pmemXX (but
this is not the default behavior of the Linux kernel so far).

> I wouldn't encourage the introduction of anything else that requires
> root privileges in QEMU. With QEMU running as non-root by default in
> 4.7, the feature will not be available unless users explicitly ask to
> run QEMU as root (which they shouldn't really).
>

Yes, I'll include those privileged operations in the design document.

Haozhong

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 14:29                     ` Stefano Stabellini
  2016-01-20 14:42                       ` Haozhong Zhang
@ 2016-01-20 14:45                       ` Andrew Cooper
  2016-01-20 14:53                         ` Haozhong Zhang
  2016-01-20 15:05                         ` Stefano Stabellini
  1 sibling, 2 replies; 88+ messages in thread
From: Andrew Cooper @ 2016-01-20 14:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Jun Nakajima, Ian Jackson,
	xen-devel, Jan Beulich, Xiao Guangrong, Keir Fraser

On 20/01/16 14:29, Stefano Stabellini wrote:
> On Wed, 20 Jan 2016, Andrew Cooper wrote:
>> On 20/01/16 10:36, Xiao Guangrong wrote:
>>> Hi,
>>>
>>> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
>>>
>>>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
>>>>
>>>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
>>>>> mapped
>>>>> into memory.  I am still on the dom0 side of this fence.
>>>>>
>>>>> The real question is whether it is possible to take an NVDIMM, split it
>>>>> in half, give each half to two different guests (with appropriate NFIT
>>>>> tables) and that be sufficient for the guests to just work.
>>>>>
>>>> Yes, one NVDIMM device can be split into multiple parts and assigned
>>>> to different guests, and QEMU is responsible to maintain virtual NFIT
>>>> tables for each part.
>>>>
>>>>> Either way, it needs to be a toolstack policy decision as to how to
>>>>> split the resource.
>>> Currently, we are using NVDIMM as a block device and a DAX-based
>>> filesystem
>>> is created upon it in Linux so that file-related accesses directly reach
>>> the NVDIMM device.
>>>
>>> In KVM, If the NVDIMM device need to be shared by different VMs, we can
>>> create multiple files on the DAX-based filesystem and assign the file to
>>> each VMs. In the future, we can enable namespace (partition-like) for
>>> PMEM
>>> memory and assign the namespace to each VMs (current Linux driver uses
>>> the
>>> whole PMEM as a single namespace).
>>>
>>> I think it is not a easy work to let Xen hypervisor recognize NVDIMM
>>> device
>>> and manager NVDIMM resource.
>>>
>>> Thanks!
>>>
>> The more I see about this, the more sure I am that we want to keep it as
>> a block device managed by dom0.
>>
>> In the case of the DAX-based filesystem, I presume files are not
>> necessarily contiguous.  I also presume that this is worked around by
>> permuting the mapping of the virtual NVDIMM such that the it appears as
>> a contiguous block of addresses to the guest?
>>
>> Today in Xen, Qemu already has the ability to create mappings in the
>> guest's address space, e.g. to map PCI device BARs.  I don't see a
>> conceptual difference here, although the security/permission model
>> certainly is more complicated.
> I imagine that mmap'ing  these /dev/pmemXX devices require root
> privileges, does it not?

I presume it does, although mmap()ing a file on a DAX filesystem will
work in the standard POSIX way.

Neither of these is sufficient, however.  That gets Qemu a mapping of
the NVDIMM, not the guest.  Something, one way or another, has to turn
this into appropriate add-to-physmap hypercalls.

>
> I wouldn't encourage the introduction of anything else that requires
> root privileges in QEMU. With QEMU running as non-root by default in
> 4.7, the feature will not be available unless users explicitly ask to
> run QEMU as root (which they shouldn't really).

This isn't how design works.

First, design a feature in an architecturally correct way, and then
design a security policy to fit.  (Note: both before implementation happens.)

We should not stunt a design based on an existing implementation.  In
particular, if the design shows that being a root-only feature is the only
sane way of doing this, it should be a root-only feature.  (I hope this
is not the case, but it shouldn't cloud the judgement of a design.)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-20 14:35       ` Stefano Stabellini
@ 2016-01-20 14:47         ` Zhang, Haozhong
  2016-01-20 14:54           ` Andrew Cooper
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Haozhong @ 2016-01-20 14:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Tian, Kevin, Keir Fraser, Ian Campbell, Nakajima, Jun,
	Andrew Cooper, Ian Jackson, Xiao Guangrong, xen-devel,
	Jan Beulich, Wei Liu

On 01/20/16 14:35, Stefano Stabellini wrote:
> On Wed, 20 Jan 2016, Zhang, Haozhong wrote:
> > On 01/20/16 12:43, Stefano Stabellini wrote:
> > > On Wed, 20 Jan 2016, Tian, Kevin wrote:
> > > > > From: Zhang, Haozhong
> > > > > Sent: Tuesday, December 29, 2015 7:32 PM
> > > > > 
> > > > > This patch series is the Xen part patch to provide virtual NVDIMM to
> > > > > guest. The corresponding QEMU patch series is sent separately with the
> > > > > title "[PATCH 0/2] add vNVDIMM support for Xen".
> > > > > 
> > > > > * Background
> > > > > 
> > > > >  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
> > > > >  supported on Intel's platform. NVDIMM devices are discovered via ACPI
> > > > >  and configured by _DSM method of NVDIMM device in ACPI. Some
> > > > >  documents can be found at
> > > > >  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> > > > >  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> > > > >  [3] DSM Interface Example:
> > > > > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> > > > >  [4] Driver Writer's Guide:
> > > > > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> > > > > 
> > > > >  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
> > > > >  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
> > > > >  mapped into CPU's address space and are accessed via normal memory
> > > > >  read/write and three special instructions (clflushopt/clwb/pcommit).
> > > > > 
> > > > >  This patch series and the corresponding QEMU patch series enable Xen
> > > > >  to provide vNVDIMM devices to HVM domains.
> > > > > 
> > > > > * Design
> > > > > 
> > > > >  Supporting vNVDIMM in PMEM mode has three requirements.
> > > > > 
> > > > 
> > > > Although this design is about vNVDIMM, some background of how pNVDIMM
> > > > is managed in Xen would be helpful to understand the whole design since
> > > > in PMEM mode you need map pNVDIMM into GFN addr space so there's
> > > > a matter of how pNVDIMM is allocated.
> > > 
> > > Yes, some background would be very helpful. Given that there are so many
> > > moving parts on this (Xen, the Dom0 kernel, QEMU, hvmloader, libxl)
> > > I suggest that we start with a design document for this feature.
> > 
> > Let me prepare a design document. Basically, it would include
> > following contents. Please let me know if you want anything additional
> > to be included.
> 
> Thank you!
> 
> 
> > * What NVDIMM is and how it is used
> > * Software interface of NVDIMM
> >   - ACPI NFIT: what parameters are recorded and their usage
> >   - ACPI SSDT: what _DSM methods are provided and their functionality
> >   - New instructions: clflushopt/clwb/pcommit
> > * How the linux kernel drives NVDIMM
> >   - ACPI parsing
> >   - Block device interface
> >   - Partition NVDIMM devices
> > * How KVM/QEMU implements vNVDIMM
> 
> This is a very good start.
> 
> 
> > * What I propose to implement vNVDIMM in Xen
> >   - Xen hypervisor/toolstack: new instruction enabling and address mapping
> >   - Dom0 Linux kernel: host NVDIMM driver
> >   - QEMU: virtual NFIT/SSDT, _DSM handling, and role in address mapping
> 
> This is OK. It might be also good to list other options that were
> discussed, but it is certainly not necessary in first instance.

I'll include them.

And one thing I missed above:
* What I propose to implement vNVDIMM in Xen
  - Building vNFIT and vSSDT: copy them from QEMU to the Xen toolstack

I know this is controversial, so I will record the other options and my
reason for this choice.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 14:45                       ` Andrew Cooper
@ 2016-01-20 14:53                         ` Haozhong Zhang
  2016-01-20 15:13                           ` Konrad Rzeszutek Wilk
  2016-01-20 15:05                         ` Stefano Stabellini
  1 sibling, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 14:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Ian Jackson, xen-devel, Jan Beulich, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

On 01/20/16 14:45, Andrew Cooper wrote:
> On 20/01/16 14:29, Stefano Stabellini wrote:
> > On Wed, 20 Jan 2016, Andrew Cooper wrote:
> >> On 20/01/16 10:36, Xiao Guangrong wrote:
> >>> Hi,
> >>>
> >>> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> >>>
> >>>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> >>>>
> >>>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> >>>>> mapped
> >>>>> into memory.  I am still on the dom0 side of this fence.
> >>>>>
> >>>>> The real question is whether it is possible to take an NVDIMM, split it
> >>>>> in half, give each half to two different guests (with appropriate NFIT
> >>>>> tables) and that be sufficient for the guests to just work.
> >>>>>
> >>>> Yes, one NVDIMM device can be split into multiple parts and assigned
> >>>> to different guests, and QEMU is responsible to maintain virtual NFIT
> >>>> tables for each part.
> >>>>
> >>>>> Either way, it needs to be a toolstack policy decision as to how to
> >>>>> split the resource.
> >>> Currently, we are using NVDIMM as a block device and a DAX-based
> >>> filesystem
> >>> is created upon it in Linux so that file-related accesses directly reach
> >>> the NVDIMM device.
> >>>
> >>> In KVM, If the NVDIMM device need to be shared by different VMs, we can
> >>> create multiple files on the DAX-based filesystem and assign the file to
> >>> each VMs. In the future, we can enable namespace (partition-like) for
> >>> PMEM
> >>> memory and assign the namespace to each VMs (current Linux driver uses
> >>> the
> >>> whole PMEM as a single namespace).
> >>>
> >>> I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> >>> device
> >>> and manager NVDIMM resource.
> >>>
> >>> Thanks!
> >>>
> >> The more I see about this, the more sure I am that we want to keep it as
> >> a block device managed by dom0.
> >>
> >> In the case of the DAX-based filesystem, I presume files are not
> >> necessarily contiguous.  I also presume that this is worked around by
> >> permuting the mapping of the virtual NVDIMM such that the it appears as
> >> a contiguous block of addresses to the guest?
> >>
> >> Today in Xen, Qemu already has the ability to create mappings in the
> >> guest's address space, e.g. to map PCI device BARs.  I don't see a
> >> conceptual difference here, although the security/permission model
> >> certainly is more complicated.
> > I imagine that mmap'ing  these /dev/pmemXX devices require root
> > privileges, does it not?
> 
> I presume it does, although mmap()ing a file on a DAX filesystem will
> work in the standard POSIX way.
> 
> Neither of these are sufficient however.  That gets Qemu a mapping of
> the NVDIMM, not the guest.  Something, one way or another, has to turn
> this into appropriate add-to-phymap hypercalls.
>

Yes, those hypercalls are what I'm going to add.

Haozhong

> >
> > I wouldn't encourage the introduction of anything else that requires
> > root privileges in QEMU. With QEMU running as non-root by default in
> > 4.7, the feature will not be available unless users explicitly ask to
> > run QEMU as root (which they shouldn't really).
> 
> This isn't how design works.
> 
> First, design a feature in an architecturally correct way, and then
> design an security policy to fit.  (note, both before implement happens).
> 
> We should not stunt design based on an existing implementation.  In
> particular, if design shows that being a root only feature is the only
> sane way of doing this, it should be a root only feature.  (I hope this
> is not the case, but it shouldn't cloud the judgement of a design).
> 
> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-20 14:47         ` Zhang, Haozhong
@ 2016-01-20 14:54           ` Andrew Cooper
  2016-01-20 15:59             ` Haozhong Zhang
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2016-01-20 14:54 UTC (permalink / raw)
  To: Stefano Stabellini, Tian, Kevin, xen-devel, Keir Fraser,
	Ian Jackson, Ian Campbell, Jan Beulich, Wei Liu, Nakajima, Jun,
	Xiao Guangrong

On 20/01/16 14:47, Zhang, Haozhong wrote:
> On 01/20/16 14:35, Stefano Stabellini wrote:
>> On Wed, 20 Jan 2016, Zhang, Haozhong wrote:
>>> On 01/20/16 12:43, Stefano Stabellini wrote:
>>>> On Wed, 20 Jan 2016, Tian, Kevin wrote:
>>>>>> From: Zhang, Haozhong
>>>>>> Sent: Tuesday, December 29, 2015 7:32 PM
>>>>>>
>>>>>> This patch series is the Xen part patch to provide virtual NVDIMM to
>>>>>> guest. The corresponding QEMU patch series is sent separately with the
>>>>>> title "[PATCH 0/2] add vNVDIMM support for Xen".
>>>>>>
>>>>>> * Background
>>>>>>
>>>>>>  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
>>>>>>  supported on Intel's platform. NVDIMM devices are discovered via ACPI
>>>>>>  and configured by _DSM method of NVDIMM device in ACPI. Some
>>>>>>  documents can be found at
>>>>>>  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
>>>>>>  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
>>>>>>  [3] DSM Interface Example:
>>>>>> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
>>>>>>  [4] Driver Writer's Guide:
>>>>>> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
>>>>>>
>>>>>>  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
>>>>>>  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
>>>>>>  mapped into CPU's address space and are accessed via normal memory
>>>>>>  read/write and three special instructions (clflushopt/clwb/pcommit).
>>>>>>
>>>>>>  This patch series and the corresponding QEMU patch series enable Xen
>>>>>>  to provide vNVDIMM devices to HVM domains.
>>>>>>
>>>>>> * Design
>>>>>>
>>>>>>  Supporting vNVDIMM in PMEM mode has three requirements.
>>>>>>
>>>>> Although this design is about vNVDIMM, some background of how pNVDIMM
>>>>> is managed in Xen would be helpful to understand the whole design since
>>>>> in PMEM mode you need map pNVDIMM into GFN addr space so there's
>>>>> a matter of how pNVDIMM is allocated.
>>>> Yes, some background would be very helpful. Given that there are so many
>>>> moving parts on this (Xen, the Dom0 kernel, QEMU, hvmloader, libxl)
>>>> I suggest that we start with a design document for this feature.
>>> Let me prepare a design document. Basically, it would include
>>> following contents. Please let me know if you want anything additional
>>> to be included.
>> Thank you!
>>
>>
>>> * What NVDIMM is and how it is used
>>> * Software interface of NVDIMM
>>>   - ACPI NFIT: what parameters are recorded and their usage
>>>   - ACPI SSDT: what _DSM methods are provided and their functionality
>>>   - New instructions: clflushopt/clwb/pcommit
>>> * How the linux kernel drives NVDIMM
>>>   - ACPI parsing
>>>   - Block device interface
>>>   - Partition NVDIMM devices
>>> * How KVM/QEMU implements vNVDIMM
>> This is a very good start.
>>
>>
>>> * What I propose to implement vNVDIMM in Xen
>>>   - Xen hypervisor/toolstack: new instruction enabling and address mapping
>>>   - Dom0 Linux kernel: host NVDIMM driver
>>>   - QEMU: virtual NFIT/SSDT, _DSM handling, and role in address mapping
>> This is OK. It might be also good to list other options that were
>> discussed, but it is certainly not necessary in first instance.
> I'll include them.
>
> And one thing missed above:
> * What I propose to implement vNVDIMM in Xen
>   - Building vNFIT and vSSDT: copy them from QEMU to Xen toolstack
>
> I know it is controversial and will record other options and my reason
> for this choice.

Please would you split the subjects of "how to architect guest NVDIMM
support in Xen" from "how to get suitable ACPI tables into a guest". 
The former depends on the latter, but they are two different problems to
solve and shouldn't be conflated in one issue.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 14:45                       ` Andrew Cooper
  2016-01-20 14:53                         ` Haozhong Zhang
@ 2016-01-20 15:05                         ` Stefano Stabellini
  2016-01-20 18:14                           ` Andrew Cooper
  1 sibling, 1 reply; 88+ messages in thread
From: Stefano Stabellini @ 2016-01-20 15:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong, Keir Fraser

On Wed, 20 Jan 2016, Andrew Cooper wrote:
> On 20/01/16 14:29, Stefano Stabellini wrote:
> > On Wed, 20 Jan 2016, Andrew Cooper wrote:
> >> On 20/01/16 10:36, Xiao Guangrong wrote:
> >>> Hi,
> >>>
> >>> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> >>>
> >>>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> >>>>
> >>>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> >>>>> mapped
> >>>>> into memory.  I am still on the dom0 side of this fence.
> >>>>>
> >>>>> The real question is whether it is possible to take an NVDIMM, split it
> >>>>> in half, give each half to two different guests (with appropriate NFIT
> >>>>> tables) and that be sufficient for the guests to just work.
> >>>>>
> >>>> Yes, one NVDIMM device can be split into multiple parts and assigned
> >>>> to different guests, and QEMU is responsible to maintain virtual NFIT
> >>>> tables for each part.
> >>>>
> >>>>> Either way, it needs to be a toolstack policy decision as to how to
> >>>>> split the resource.
> >>> Currently, we are using NVDIMM as a block device and a DAX-based
> >>> filesystem
> >>> is created upon it in Linux so that file-related accesses directly reach
> >>> the NVDIMM device.
> >>>
> >>> In KVM, If the NVDIMM device need to be shared by different VMs, we can
> >>> create multiple files on the DAX-based filesystem and assign the file to
> >>> each VMs. In the future, we can enable namespace (partition-like) for
> >>> PMEM
> >>> memory and assign the namespace to each VMs (current Linux driver uses
> >>> the
> >>> whole PMEM as a single namespace).
> >>>
> >>> I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> >>> device
> >>> and manager NVDIMM resource.
> >>>
> >>> Thanks!
> >>>
> >> The more I see about this, the more sure I am that we want to keep it as
> >> a block device managed by dom0.
> >>
> >> In the case of the DAX-based filesystem, I presume files are not
> >> necessarily contiguous.  I also presume that this is worked around by
> >> permuting the mapping of the virtual NVDIMM such that the it appears as
> >> a contiguous block of addresses to the guest?
> >>
> >> Today in Xen, Qemu already has the ability to create mappings in the
> >> guest's address space, e.g. to map PCI device BARs.  I don't see a
> >> conceptual difference here, although the security/permission model
> >> certainly is more complicated.
> > I imagine that mmap'ing  these /dev/pmemXX devices require root
> > privileges, does it not?
> 
> I presume it does, although mmap()ing a file on a DAX filesystem will
> work in the standard POSIX way.
> 
> Neither of these are sufficient however.  That gets Qemu a mapping of
> the NVDIMM, not the guest.  Something, one way or another, has to turn
> this into appropriate add-to-phymap hypercalls.
> 
> >
> > I wouldn't encourage the introduction of anything else that requires
> > root privileges in QEMU. With QEMU running as non-root by default in
> > 4.7, the feature will not be available unless users explicitly ask to
> > run QEMU as root (which they shouldn't really).
> 
> This isn't how design works.
> 
> First, design a feature in an architecturally correct way, and then
> design an security policy to fit.
>
> We should not stunt design based on an existing implementation.  In
> particular, if design shows that being a root only feature is the only
> sane way of doing this, it should be a root only feature.  (I hope this
> is not the case, but it shouldn't cloud the judgement of a design).

I would argue that security is an integral part of the architecture and
should not be retrofitted into it.

Is it really a good design if the only sane way to implement it is
making it a root-only feature? I think not. Designing security policies
for pieces of software that don't have the infrastructure for them is
costly, and that cost should be accounted for as part of the overall cost
of the solution rather than added to it in a second stage.


> (Note: both before implementation happens.)

That is ideal, but realistically in many cases nobody is able to produce
a design before the implementation happens. There are plenty of articles
written about this, going back to the 90s / early 00s.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 11:04             ` Haozhong Zhang
  2016-01-20 11:20               ` Jan Beulich
@ 2016-01-20 15:07               ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 15:07 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, xen-devel,
	Keir Fraser, Xiao Guangrong

On Wed, Jan 20, 2016 at 07:04:49PM +0800, Haozhong Zhang wrote:
> On 01/20/16 01:46, Jan Beulich wrote:
> > >>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> > > The primary reason of current solution is to reuse existing NVDIMM
> > > driver in Linux kernel.
> >
> 
> CC'ing QEMU vNVDIMM maintainer: Xiao Guangrong
> 
> > Re-using code in the Dom0 kernel has benefits and drawbacks, and
> > in any event needs to depend on proper layering to remain in place.
> > A benefit is less code duplication between Xen and Linux; along the
> > same lines a drawback is code duplication between various Dom0
> > OS variants.
> >
> 
> Not clear about other Dom0 OS. But for Linux, it already has a NVDIMM
> driver since 4.2.
> 
> > > One responsibility of this driver is to discover NVDIMM devices and
> > > their parameters (e.g. which portion of an NVDIMM device can be mapped
> > > into the system address space and which address it is mapped to) by
> > > parsing ACPI NFIT tables. Looking at the NFIT spec in Sec 5.2.25 of
> > > ACPI Specification v6 and the actual code in Linux kernel
> > > (drivers/acpi/nfit.*), it's not a trivial task.
> > 
> > To answer one of Kevin's questions: The NFIT table doesn't appear
> > to require the ACPI interpreter. They seem more like SRAT and SLIT.
> 
> Sorry, I made a mistake in another reply. NFIT does not contain
> anything requiring ACPI interpreter. But there are some _DSM methods
> for NVDIMM in SSDT, which needs ACPI interpreter.

Right, but those are for health checks and such. Not needed for boot-time
discovery of the ranges in memory of the NVDIMM.
> 
> > Also you failed to answer Kevin's question regarding E820 entries: I
> > think NVDIMM (or at least parts thereof) get represented in E820 (or
> > the EFI memory map), and if that's the case this would be a very
> > strong hint towards management needing to be in the hypervisor.
> >
> 
> Legacy NVDIMM devices may use E820 entries or other ad-hoc ways to
> announce their locations, but newer ones that follow ACPI v6 spec do
> not need E820 any more and only need ACPI NFIT (i.e. firmware may not
> build E820 entries for them).

I am missing something here.

Linux pvops uses a hypercall to construct its E820 (XENMEM_machine_memory_map);
see arch/x86/xen/setup.c:xen_memory_setup.

That hypercall gets a filtered E820 from the hypervisor. And the
hypervisor gets the E820 from multiboot2 - which gets it from grub2.

With the 'legacy NVDIMM' using E820_NVDIMM (type 12? 13) - they don't
show up in multiboot2 - which means Xen will ignore them (not sure
if it changes them to E820_RSRV or just leaves them alone).
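
(For reference, the E820 type values in question, as defined in Linux's
arch/x86/include/uapi/asm/e820.h - quoted from memory, so please
double-check:)

#define E820_PMEM    7   /* ACPI 6.0 persistent memory range */
#define E820_PRAM   12   /* legacy/non-standard persistent RAM ("E820_NVDIMM" above) */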

Anyhow, for the /dev/pmem0 driver in Linux to construct a block
device on the E820_NVDIMM range - it MUST have the E820 entry - but we
don't construct that.

I would think that one of the patches would be for the hypervisor
to recognize the E820_NVDIMM and associate that area with p2m_mmio
(so that the xc_memory_mapping hypercall would work on the MFNs)?

But you also mention ACPI v6 defining them as using ACPI NFIT -
so that would be treating the system address extracted from the
ACPI NFIT just as MMIO (except being WB instead of UC).

Either way - the Xen hypervisor should also parse the ACPI NFIT so
that it can mark that range as p2m_mmio (or does it do that by
default for any non-E820 ranges?). Does it actually need to
do that? Or is that optional?

I hope the design document will explain a bit of this.

> 
> The current linux kernel can handle both legacy and new NVDIMM devices
> and provide the same block device interface for them.

OK, so Xen would need to do that as well - so that the Linux kernel
can utilize it.
> 
> > > Secondly, the driver implements a convenient block device interface to
> > > let software access areas where NVDIMM devices are mapped. The
> > > existing vNVDIMM implementation in QEMU uses this interface.
> > > 
> > > As Linux NVDIMM driver has already done above, why do we bother to
> > > reimplement them in Xen?
> > 
> > See above; a possibility is that we may need a split model (block
> > layer parts on Dom0, "normal memory" parts in the hypervisor.
> > Iirc the split is being determined by firmware, and hence set in
> > stone by the time OS (or hypervisor) boot starts.
> >
> 
> For the "normal memory" parts, do you mean parts that map the host
> NVDIMM device's address space range to the guest? I'm going to
> implement that part in hypervisor and expose it as a hypercall so that
> it can be used by QEMU.
> 
> Haozhong
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 14:53                         ` Haozhong Zhang
@ 2016-01-20 15:13                           ` Konrad Rzeszutek Wilk
  2016-01-20 15:29                             ` Haozhong Zhang
  0 siblings, 1 reply; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 15:13 UTC (permalink / raw)
  To: Andrew Cooper, Stefano Stabellini, Kevin Tian, Wei Liu,
	Ian Campbell, Jun Nakajima, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong, Keir Fraser, bob.liu

On Wed, Jan 20, 2016 at 10:53:10PM +0800, Haozhong Zhang wrote:
> On 01/20/16 14:45, Andrew Cooper wrote:
> > On 20/01/16 14:29, Stefano Stabellini wrote:
> > > On Wed, 20 Jan 2016, Andrew Cooper wrote:
> > >> On 20/01/16 10:36, Xiao Guangrong wrote:
> > >>> Hi,
> > >>>
> > >>> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> > >>>
> > >>>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> > >>>>
> > >>>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> > >>>>> mapped
> > >>>>> into memory.  I am still on the dom0 side of this fence.
> > >>>>>
> > >>>>> The real question is whether it is possible to take an NVDIMM, split it
> > >>>>> in half, give each half to two different guests (with appropriate NFIT
> > >>>>> tables) and that be sufficient for the guests to just work.
> > >>>>>
> > >>>> Yes, one NVDIMM device can be split into multiple parts and assigned
> > >>>> to different guests, and QEMU is responsible to maintain virtual NFIT
> > >>>> tables for each part.
> > >>>>
> > >>>>> Either way, it needs to be a toolstack policy decision as to how to
> > >>>>> split the resource.
> > >>> Currently, we are using NVDIMM as a block device and a DAX-based
> > >>> filesystem
> > >>> is created upon it in Linux so that file-related accesses directly reach
> > >>> the NVDIMM device.
> > >>>
> > >>> In KVM, If the NVDIMM device need to be shared by different VMs, we can
> > >>> create multiple files on the DAX-based filesystem and assign the file to
> > >>> each VMs. In the future, we can enable namespace (partition-like) for
> > >>> PMEM
> > >>> memory and assign the namespace to each VMs (current Linux driver uses
> > >>> the
> > >>> whole PMEM as a single namespace).
> > >>>
> > >>> I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> > >>> device
> > >>> and manager NVDIMM resource.
> > >>>
> > >>> Thanks!
> > >>>
> > >> The more I see about this, the more sure I am that we want to keep it as
> > >> a block device managed by dom0.
> > >>
> > >> In the case of the DAX-based filesystem, I presume files are not
> > >> necessarily contiguous.  I also presume that this is worked around by
> > >> permuting the mapping of the virtual NVDIMM such that the it appears as
> > >> a contiguous block of addresses to the guest?
> > >>
> > >> Today in Xen, Qemu already has the ability to create mappings in the
> > >> guest's address space, e.g. to map PCI device BARs.  I don't see a
> > >> conceptual difference here, although the security/permission model
> > >> certainly is more complicated.
> > > I imagine that mmap'ing  these /dev/pmemXX devices require root
> > > privileges, does it not?
> > 
> > I presume it does, although mmap()ing a file on a DAX filesystem will
> > work in the standard POSIX way.
> > 
> > Neither of these are sufficient however.  That gets Qemu a mapping of
> > the NVDIMM, not the guest.  Something, one way or another, has to turn
> > this into appropriate add-to-phymap hypercalls.
> >
> 
> Yes, those hypercalls are what I'm going to add.

Why?

What you need (in a rough hand-wave way) is to:
 - mount /dev/pmem0
 - mmap the file on the /dev/pmem0 FS
 - walk the VMA for the file - extract the MFNs (machine frame numbers)
 - feed those frame numbers to the xc_memory_mapping hypercall. The
   guest pfns would be contiguous.
   Example: say the E820_NVDIMM starts at 8GB->16GB, so an 8GB file on
   the /dev/pmem0 FS - the guest pfns are 0x200000 upward.

   However the MFNs may be discontiguous as the NVDIMM could be
   1TB - and the 8GB file is scattered all over.
I believe that is all you would need to do?
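
(A rough sketch of the first two steps above; the helper name is made up
for illustration, and the file is assumed to already exist on a mounted
/dev/pmem0 DAX filesystem:)

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* mmap a file living on the DAX filesystem; with DAX there is no page
 * cache in between, so the mapping is backed directly by NVDIMM pages. */
static void *map_pmem_file(const char *path, size_t *len)
{
    struct stat st;
    void *va;
    int fd = open(path, O_RDWR);       /* e.g. /mnt/pmem0/guest1.img */

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    *len = st.st_size;
    va = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return va == MAP_FAILED ? NULL : va;
}

/* The remaining steps - walking this mapping to collect the backing
 * frame numbers and feeding them to the mapping hypercall - are the
 * open question discussed below (see the pagemap sketch later in the
 * thread). */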
> 
> Haozhong
> 
> > >
> > > I wouldn't encourage the introduction of anything else that requires
> > > root privileges in QEMU. With QEMU running as non-root by default in
> > > 4.7, the feature will not be available unless users explicitly ask to
> > > run QEMU as root (which they shouldn't really).
> > 
> > This isn't how design works.
> > 
> > First, design a feature in an architecturally correct way, and then
> > design an security policy to fit.  (note, both before implement happens).
> > 
> > We should not stunt design based on an existing implementation.  In
> > particular, if design shows that being a root only feature is the only
> > sane way of doing this, it should be a root only feature.  (I hope this
> > is not the case, but it shouldn't cloud the judgement of a design).
> > 
> > ~Andrew
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:13                           ` Konrad Rzeszutek Wilk
@ 2016-01-20 15:29                             ` Haozhong Zhang
  2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 15:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

On 01/20/16 10:13, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 20, 2016 at 10:53:10PM +0800, Haozhong Zhang wrote:
> > On 01/20/16 14:45, Andrew Cooper wrote:
> > > On 20/01/16 14:29, Stefano Stabellini wrote:
> > > > On Wed, 20 Jan 2016, Andrew Cooper wrote:
> > > >> On 20/01/16 10:36, Xiao Guangrong wrote:
> > > >>> Hi,
> > > >>>
> > > >>> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> > > >>>
> > > >>>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> > > >>>>
> > > >>>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> > > >>>>> mapped
> > > >>>>> into memory.  I am still on the dom0 side of this fence.
> > > >>>>>
> > > >>>>> The real question is whether it is possible to take an NVDIMM, split it
> > > >>>>> in half, give each half to two different guests (with appropriate NFIT
> > > >>>>> tables) and that be sufficient for the guests to just work.
> > > >>>>>
> > > >>>> Yes, one NVDIMM device can be split into multiple parts and assigned
> > > >>>> to different guests, and QEMU is responsible to maintain virtual NFIT
> > > >>>> tables for each part.
> > > >>>>
> > > >>>>> Either way, it needs to be a toolstack policy decision as to how to
> > > >>>>> split the resource.
> > > >>> Currently, we are using NVDIMM as a block device and a DAX-based
> > > >>> filesystem
> > > >>> is created upon it in Linux so that file-related accesses directly reach
> > > >>> the NVDIMM device.
> > > >>>
> > > >>> In KVM, If the NVDIMM device need to be shared by different VMs, we can
> > > >>> create multiple files on the DAX-based filesystem and assign the file to
> > > >>> each VMs. In the future, we can enable namespace (partition-like) for
> > > >>> PMEM
> > > >>> memory and assign the namespace to each VMs (current Linux driver uses
> > > >>> the
> > > >>> whole PMEM as a single namespace).
> > > >>>
> > > >>> I think it is not a easy work to let Xen hypervisor recognize NVDIMM
> > > >>> device
> > > >>> and manager NVDIMM resource.
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >> The more I see about this, the more sure I am that we want to keep it as
> > > >> a block device managed by dom0.
> > > >>
> > > >> In the case of the DAX-based filesystem, I presume files are not
> > > >> necessarily contiguous.  I also presume that this is worked around by
> > > >> permuting the mapping of the virtual NVDIMM such that the it appears as
> > > >> a contiguous block of addresses to the guest?
> > > >>
> > > >> Today in Xen, Qemu already has the ability to create mappings in the
> > > >> guest's address space, e.g. to map PCI device BARs.  I don't see a
> > > >> conceptual difference here, although the security/permission model
> > > >> certainly is more complicated.
> > > > I imagine that mmap'ing  these /dev/pmemXX devices require root
> > > > privileges, does it not?
> > > 
> > > I presume it does, although mmap()ing a file on a DAX filesystem will
> > > work in the standard POSIX way.
> > > 
> > > Neither of these are sufficient however.  That gets Qemu a mapping of
> > > the NVDIMM, not the guest.  Something, one way or another, has to turn
> > > this into appropriate add-to-phymap hypercalls.
> > >
> > 
> > Yes, those hypercalls are what I'm going to add.
> 
> Why?
> 
> What you need (in a rought hand-wave way) is to:
>  - mount /dev/pmem0
>  - mmap the file on /dev/pmem0 FS
>  - walk the VMA for the file - extract the MFN (machien frame numbers)

Can this step be done by QEMU? Or does the Linux kernel provide some
way for userspace to do the translation?

Haozhong

>  - feed those frame numbers to xc_memory_mapping hypercall. The
>    guest pfns would be contingous.
>    Example: say the E820_NVDIMM starts at 8GB->16GB, so an 8GB file on
>    /dev/pmem0 FS - the guest pfns are 0x200000 upward.
> 
>    However the MFNs may be discontingous as the NVDIMM could be an
>    1TB - and the 8GB file is scattered all over.
> 
> I believe that is all you would need to do?
> > 
> > Haozhong
> > 
> > > >
> > > > I wouldn't encourage the introduction of anything else that requires
> > > > root privileges in QEMU. With QEMU running as non-root by default in
> > > > 4.7, the feature will not be available unless users explicitly ask to
> > > > run QEMU as root (which they shouldn't really).
> > > 
> > > This isn't how design works.
> > > 
> > > First, design a feature in an architecturally correct way, and then
> > > design an security policy to fit.  (note, both before implement happens).
> > > 
> > > We should not stunt design based on an existing implementation.  In
> > > particular, if design shows that being a root only feature is the only
> > > sane way of doing this, it should be a root only feature.  (I hope this
> > > is not the case, but it shouldn't cloud the judgement of a design).
> > > 
> > > ~Andrew
> > > 
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xen.org
> > > http://lists.xen.org/xen-devel
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 11:20               ` Jan Beulich
@ 2016-01-20 15:29                 ` Xiao Guangrong
  2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
  2016-01-20 17:07                   ` Jan Beulich
  0 siblings, 2 replies; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-20 15:29 UTC (permalink / raw)
  To: Jan Beulich, Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser



On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>> On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
>> On 01/20/16 01:46, Jan Beulich wrote:
>>>>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
>>>> Secondly, the driver implements a convenient block device interface to
>>>> let software access areas where NVDIMM devices are mapped. The
>>>> existing vNVDIMM implementation in QEMU uses this interface.
>>>>
>>>> As Linux NVDIMM driver has already done above, why do we bother to
>>>> reimplement them in Xen?
>>>
>>> See above; a possibility is that we may need a split model (block
>>> layer parts on Dom0, "normal memory" parts in the hypervisor.
>>> Iirc the split is being determined by firmware, and hence set in
>>> stone by the time OS (or hypervisor) boot starts.
>>
>> For the "normal memory" parts, do you mean parts that map the host
>> NVDIMM device's address space range to the guest? I'm going to
>> implement that part in hypervisor and expose it as a hypercall so that
>> it can be used by QEMU.
>
> To answer this I need to have my understanding of the partitioning
> being done by firmware confirmed: If that's the case, then "normal"
> means the part that doesn't get exposed as a block device (SSD).
> In any event there's no correlation to guest exposure here.

Firmware does not manage the NVDIMM. All NVDIMM operations are handled
by the OS.

Actually, there are lots of things we should take into account if we move
the NVDIMM management into the hypervisor:
a) ACPI NFIT interpretation
    NFIT is a new ACPI table introduced in ACPI 6.0. It exports the base
    information of NVDIMM devices, which includes PMEM info, PBLK info,
    NVDIMM device interleaving, vendor info, etc. Let me explain them one
    by one (a sketch of the NFIT SPA range structure follows this list).

    PMEM and PBLK are two modes to access NVDIMM devices:
    1) PMEM can be treated as NV-RAM which is directly mapped into the CPU's
       address space so that the CPU can read/write it directly.
    2) as an NVDIMM has huge capacity and the CPU's address space is limited,
       the NVDIMM only offers two windows which are mapped into the CPU's
       address space, the data window and the access window, so that the CPU
       can use these two windows to access the whole NVDIMM device.

    NVDIMM devices are interleaved, and the interleave info is also exported
    so that we can calculate the address used to access a specific NVDIMM
    device.

    NVDIMM devices from different vendors can have different functions, so
    the vendor info is exported by NFIT to make the vendor's driver work.

b) ACPI SSDT interpretation
    The SSDT offers _DSM methods which control the NVDIMM device, such as
    label operations, health checks, etc., and hotplug support.

c) Resource management
    NVDIMM resource management is challenging because:
    1) PMEM is huge and slightly slower to access than RAM, so it is not
       suitable to manage it as page structs (I think this is not a big
       problem in the Xen hypervisor?)
    2) we need to partition it so it can be used by multiple VMs.
    3) we need to support PBLK and partition it in the future.

d) management tools support
    S.M.A.R.T.? error detection and recovery?

e) hotplug support

f) third-party drivers
    Vendor drivers need to be ported to the Xen hypervisor and be supported
    in the management tool.

g) ...
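
(Sketch of the NFIT System Physical Address Range structure mentioned in
(a), following ACPI 6.0 sec. 5.2.25.2 and Linux's struct
acpi_nfit_system_address; the type and field names here are illustrative,
not an existing header:)

#include <stdint.h>

/* One SPA range entry describes where a PMEM region (or a PBLK
 * control/data window) sits in the system physical address space. */
struct nfit_spa_range {
    uint16_t type;                /* 0 = SPA Range Structure */
    uint16_t length;              /* length of this structure */
    uint16_t range_index;         /* referenced by NVDIMM region entries */
    uint16_t flags;
    uint32_t reserved;
    uint32_t proximity_domain;    /* NUMA node of the range */
    uint8_t  range_type_guid[16]; /* e.g. the persistent-memory GUID */
    uint64_t spa_base;            /* system physical address base */
    uint64_t spa_length;          /* size of the range in bytes */
    uint64_t memory_mapping_attr; /* cacheability attributes (WB, ...) */
} __attribute__((packed));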

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:29                             ` Haozhong Zhang
@ 2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
  2016-01-20 15:54                                 ` Haozhong Zhang
  2016-01-21  3:35                                 ` Bob Liu
  0 siblings, 2 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 15:41 UTC (permalink / raw)
  To: Andrew Cooper, Stefano Stabellini, Kevin Tian, Wei Liu,
	Ian Campbell, Jun Nakajima, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong, Keir Fraser, bob.liu

> > > > Neither of these are sufficient however.  That gets Qemu a mapping of
> > > > the NVDIMM, not the guest.  Something, one way or another, has to turn
> > > > this into appropriate add-to-phymap hypercalls.
> > > >
> > > 
> > > Yes, those hypercalls are what I'm going to add.
> > 
> > Why?
> > 
> > What you need (in a rought hand-wave way) is to:
> >  - mount /dev/pmem0
> >  - mmap the file on /dev/pmem0 FS
> >  - walk the VMA for the file - extract the MFN (machien frame numbers)
> 
> Can this step be done by QEMU? Or does linux kernel provide some
> approach for the userspace to do the translation?

I don't know. I would think no - as you wouldn't want a userspace
application to figure out the physical frames from the virtual
address (unless it is root). But then if you look in
/proc/<pid>/maps and /proc/<pid>/smaps there is some data there.

Hm, /proc/<pid>/pagemap has something interesting.

See the pagemap_read function. That looks to be doing it?
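
(A minimal sketch of what pagemap_read exposes to userspace, i.e. how a
privileged process could turn a virtual address into the frame number
backing it; the helper is illustrative, entry format per
Documentation/vm/pagemap.txt.  Note it needs CAP_SYS_ADMIN to see real
frame numbers, and on a PV dom0 the result is, I think, a pseudo-physical
rather than a machine frame:)

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

static int vaddr_to_frame(const void *vaddr, uint64_t *frame)
{
    uint64_t entry;
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset = ((uintptr_t)vaddr / page_size) * sizeof(entry);
    int fd = open("/proc/self/pagemap", O_RDONLY);

    if (fd < 0)
        return -1;
    if (pread(fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
        close(fd);
        return -1;
    }
    close(fd);

    if (!(entry & (1ULL << 63)))          /* bit 63: page present */
        return -1;
    *frame = entry & ((1ULL << 55) - 1);  /* bits 0-54: frame number */
    return 0;
}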

> 
> Haozhong
> 
> >  - feed those frame numbers to xc_memory_mapping hypercall. The
> >    guest pfns would be contingous.
> >    Example: say the E820_NVDIMM starts at 8GB->16GB, so an 8GB file on
> >    /dev/pmem0 FS - the guest pfns are 0x200000 upward.
> > 
> >    However the MFNs may be discontingous as the NVDIMM could be an
> >    1TB - and the 8GB file is scattered all over.
> > 
> > I believe that is all you would need to do?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:29                 ` Xiao Guangrong
@ 2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
  2016-01-20 16:25                     ` Xiao Guangrong
  2016-01-20 17:07                   ` Jan Beulich
  1 sibling, 1 reply; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 15:47 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Keir Fraser

On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
> 
> 
> On 01/20/2016 07:20 PM, Jan Beulich wrote:
> >>>>On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
> >>On 01/20/16 01:46, Jan Beulich wrote:
> >>>>>>On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> >>>>Secondly, the driver implements a convenient block device interface to
> >>>>let software access areas where NVDIMM devices are mapped. The
> >>>>existing vNVDIMM implementation in QEMU uses this interface.
> >>>>
> >>>>As Linux NVDIMM driver has already done above, why do we bother to
> >>>>reimplement them in Xen?
> >>>
> >>>See above; a possibility is that we may need a split model (block
> >>>layer parts on Dom0, "normal memory" parts in the hypervisor.
> >>>Iirc the split is being determined by firmware, and hence set in
> >>>stone by the time OS (or hypervisor) boot starts.
> >>
> >>For the "normal memory" parts, do you mean parts that map the host
> >>NVDIMM device's address space range to the guest? I'm going to
> >>implement that part in hypervisor and expose it as a hypercall so that
> >>it can be used by QEMU.
> >
> >To answer this I need to have my understanding of the partitioning
> >being done by firmware confirmed: If that's the case, then "normal"
> >means the part that doesn't get exposed as a block device (SSD).
> >In any event there's no correlation to guest exposure here.
> 
> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
> by OS.
> 
> Actually, there are lots of things we should take into account if we move
> the NVDIMM management to hypervisor:

If you remove the block device part and just deal with the pmem part
then this gets smaller.

Also the _DSM operations - I can't see them being in the hypervisor - but only
in dom0 - which would have the right software to tickle the correct
ioctl on /dev/pmem to do the "management" (carve up the NVDIMM, perform
a SMART operation, etc).

> a) ACPI NFIT interpretation
>    A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>    base information of NVDIMM devices which includes PMEM info, PBLK
>    info, nvdimm device interleave, vendor info, etc. Let me explain it one
>    by one.

And it is a static table - like the MADT.
> 
>    PMEM and PBLK are two modes to access NVDIMM devices:
>    1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>       space so that CPU can r/w it directly.
>    2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>       only offers two windows which are mapped to CPU's address space, the data
>       window and access window, so that CPU can use these two windows to access
>       the whole NVDIMM device.
> 
>    NVDIMM device is interleaved whose info is also exported so that we can
>    calculate the address to access the specified NVDIMM device.

Right, along with the serial numbers.
> 
>    NVDIMM devices from different vendor can have different function so that the
>    vendor info is exported by NFIT to make vendor's driver work.

via _DSM right?
> 
> b) ACPI SSDT interpretation
>    SSDT offers _DSM method which controls NVDIMM device, such as label operation,
>    health check etc and hotplug support.

Sounds like the control domain (dom0) would be in charge of that.
> 
> c) Resource management
>    NVDIMM resource management challenged as:
>    1) PMEM is huge and it is little slower access than RAM so it is not suitable
>       to manage it as page struct (i think it is not a big problem in Xen
>       hypervisor?)
>    2) need to partition it to it be used in multiple VMs.
>    3) need to support PBLK and partition it in the future.

That all sounds to me like control domain (dom0) decisions, not Xen hypervisor ones.
> 
> d) management tools support
>    S.M.A.R.T? error detection and recovering?
> 
> c) hotplug support

How does that work? Ah, the _DSM will point to the new ACPI NFIT for the OS
to scan. That would require the hypervisor also reading this for it to
update its data structures.
> 
> d) third parts drivers
>    Vendor drivers need to be ported to xen hypervisor and let it be supported in
>    the management tool.

Ewww.

I presume the 'third party drivers' means more interesting _DSM features, right?
At the base level the firmware with this type of NVDIMM would still have
the basics - ACPI NFIT + E820_NVDIMM (optional).
> 
> e) ...
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
@ 2016-01-20 15:54                                 ` Haozhong Zhang
  2016-01-21  3:35                                 ` Bob Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 15:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

On 01/20/16 10:41, Konrad Rzeszutek Wilk wrote:
> > > > > Neither of these are sufficient however.  That gets Qemu a mapping of
> > > > > the NVDIMM, not the guest.  Something, one way or another, has to turn
> > > > > this into appropriate add-to-phymap hypercalls.
> > > > >
> > > > 
> > > > Yes, those hypercalls are what I'm going to add.
> > > 
> > > Why?
> > > 
> > > What you need (in a rough hand-wave way) is to:
> > >  - mount /dev/pmem0
> > >  - mmap the file on /dev/pmem0 FS
> > >  - walk the VMA for the file - extract the MFNs (machine frame numbers)
> > 
> > Can this step be done by QEMU? Or does linux kernel provide some
> > approach for the userspace to do the translation?
> 
> I don't know. I would think no - as you wouldn't want the userspace
> application to figure out the physical frames from the virtual
> address (unless they are root). But then if you look in
> /proc/<pid>/maps and /proc/<pid>/smaps there are some data there.
> 
> Hm, /proc/<pid>/pagemap has something interesting
> 
> See pagemap_read function. That looks to be doing it?
>

Interesting and good to know this. I'll have a look at it.

Thanks,
Haozhong
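
For reference, a minimal C sketch (not from any posted patch) of the
/proc/<pid>/pagemap lookup mentioned above. It assumes a privileged caller
(the kernel zeroes the PFN field for unprivileged readers) and a page that
has already been faulted in; bits 0-54 of each 64-bit entry hold the frame
number and bit 63 is the "present" flag:

/* Hedged sketch: translate a virtual address of an mmap'ed /dev/pmem0-backed
 * file into the frame number reported by /proc/self/pagemap (the interface
 * implemented by pagemap_read() in the kernel). */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGE_SHIFT   12
#define PM_PFN_MASK  ((1ULL << 55) - 1)   /* bits 0-54: page frame number */
#define PM_PRESENT   (1ULL << 63)         /* bit 63: page is present */

static uint64_t virt_to_frame(const void *vaddr)
{
    uint64_t entry = 0;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    off_t off = ((uintptr_t)vaddr >> PAGE_SHIFT) * sizeof(entry);

    if (fd < 0)
        return 0;
    if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry))
        entry = 0;
    close(fd);

    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

Whether the numbers reported this way under a PV dom0 are true machine frames
or dom0 pseudo-physical frames needing a further translation is one of the
details that would still need checking.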

> > 
> > Haozhong
> > 
> > >  - feed those frame numbers to xc_memory_mapping hypercall. The
> > >    guest pfns would be contiguous.
> > >    Example: say the E820_NVDIMM starts at 8GB->16GB, so an 8GB file on
> > >    /dev/pmem0 FS - the guest pfns are 0x200000 upward.
> > > 
> > >    However the MFNs may be discontiguous as the NVDIMM could be a
> > >    1TB device - and the 8GB file is scattered all over.
> > > 
> > > I believe that is all you would need to do?
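
For illustration, a hedged sketch of the last quoted step, assuming the call
referred to above as "xc_memory_mapping" is libxc's xc_domain_memory_mapping()
(the interface otherwise used for MMIO pass-through); domid, guest pfn base
and MFN extent are placeholders:

/* Hedged sketch: map one physically-contiguous extent of a pmem-backed file
 * into the guest's physmap.  A scattered 8GB file would need one call per
 * contiguous run of MFNs. */
#include <xenctrl.h>

static int map_pmem_extent(xc_interface *xch, uint32_t domid,
                           unsigned long gpfn, unsigned long mfn,
                           unsigned long nr_frames)
{
    /* DPCI_ADD_MAPPING adds the mappings; DPCI_REMOVE_MAPPING undoes them. */
    return xc_domain_memory_mapping(xch, domid, gpfn, mfn, nr_frames,
                                    DPCI_ADD_MAPPING);
}

Whether this existing MMIO-oriented interface is actually suitable for pmem
(cacheability, p2m type, and the VT-d questions raised elsewhere in the
thread) is exactly what is being debated.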

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/4] add support for vNVDIMM
  2016-01-20 14:54           ` Andrew Cooper
@ 2016-01-20 15:59             ` Haozhong Zhang
  0 siblings, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-20 15:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tian, Kevin, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Nakajima, Jun, Ian Jackson, Xiao Guangrong, xen-devel,
	Jan Beulich, Wei Liu

On 01/20/16 14:54, Andrew Cooper wrote:
> On 20/01/16 14:47, Zhang, Haozhong wrote:
> > On 01/20/16 14:35, Stefano Stabellini wrote:
> >> On Wed, 20 Jan 2016, Zhang, Haozhong wrote:
> >>> On 01/20/16 12:43, Stefano Stabellini wrote:
> >>>> On Wed, 20 Jan 2016, Tian, Kevin wrote:
> >>>>>> From: Zhang, Haozhong
> >>>>>> Sent: Tuesday, December 29, 2015 7:32 PM
> >>>>>>
> >>>>>> This patch series is the Xen part patch to provide virtual NVDIMM to
> >>>>>> guest. The corresponding QEMU patch series is sent separately with the
> >>>>>> title "[PATCH 0/2] add vNVDIMM support for Xen".
> >>>>>>
> >>>>>> * Background
> >>>>>>
> >>>>>>  NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be
> >>>>>>  supported on Intel's platform. NVDIMM devices are discovered via ACPI
> >>>>>>  and configured by _DSM method of NVDIMM device in ACPI. Some
> >>>>>>  documents can be found at
> >>>>>>  [1] ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> >>>>>>  [2] NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> >>>>>>  [3] DSM Interface Example:
> >>>>>> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> >>>>>>  [4] Driver Writer's Guide:
> >>>>>> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> >>>>>>
> >>>>>>  The upstream QEMU (commits 5c42eef ~ 70d1fb9) has added support to
> >>>>>>  provide virtual NVDIMM in PMEM mode, in which NVDIMM devices are
> >>>>>>  mapped into CPU's address space and are accessed via normal memory
> >>>>>>  read/write and three special instructions (clflushopt/clwb/pcommit).
> >>>>>>
> >>>>>>  This patch series and the corresponding QEMU patch series enable Xen
> >>>>>>  to provide vNVDIMM devices to HVM domains.
> >>>>>>
> >>>>>> * Design
> >>>>>>
> >>>>>>  Supporting vNVDIMM in PMEM mode has three requirements.
> >>>>>>
> >>>>> Although this design is about vNVDIMM, some background of how pNVDIMM
> >>>>> is managed in Xen would be helpful to understand the whole design since
> >>>>> in PMEM mode you need map pNVDIMM into GFN addr space so there's
> >>>>> a matter of how pNVDIMM is allocated.
> >>>> Yes, some background would be very helpful. Given that there are so many
> >>>> moving parts on this (Xen, the Dom0 kernel, QEMU, hvmloader, libxl)
> >>>> I suggest that we start with a design document for this feature.
> >>> Let me prepare a design document. Basically, it would include
> >>> following contents. Please let me know if you want anything additional
> >>> to be included.
> >> Thank you!
> >>
> >>
> >>> * What NVDIMM is and how it is used
> >>> * Software interface of NVDIMM
> >>>   - ACPI NFIT: what parameters are recorded and their usage
> >>>   - ACPI SSDT: what _DSM methods are provided and their functionality
> >>>   - New instructions: clflushopt/clwb/pcommit
> >>> * How the linux kernel drives NVDIMM
> >>>   - ACPI parsing
> >>>   - Block device interface
> >>>   - Partition NVDIMM devices
> >>> * How KVM/QEMU implements vNVDIMM
> >> This is a very good start.
> >>
> >>
> >>> * What I propose to implement vNVDIMM in Xen
> >>>   - Xen hypervisor/toolstack: new instruction enabling and address mapping
> >>>   - Dom0 Linux kernel: host NVDIMM driver
> >>>   - QEMU: virtual NFIT/SSDT, _DSM handling, and role in address mapping
> >> This is OK. It might be also good to list other options that were
> >> discussed, but it is certainly not necessary in first instance.
> > I'll include them.
> >
> > And one thing missed above:
> > * What I propose to implement vNVDIMM in Xen
> >   - Building vNFIT and vSSDT: copy them from QEMU to Xen toolstack
> >
> > I know it is controversial and will record other options and my reason
> > for this choice.
> 
> Please would you split the subjects of "how to architect guest NVDIMM
> support in Xen" from "how to get suitable ACPI tables into a guest". 
> The former depends on the latter, but they are two different problems to
> solve and shouldn't be conflated in one issue.
> 
> ~Andrew
>

Sure.

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
@ 2016-01-20 16:25                     ` Xiao Guangrong
  2016-01-20 16:47                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-20 16:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Jun Nakajima, Keir Fraser



On 01/20/2016 11:47 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>>>> On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
>>>> On 01/20/16 01:46, Jan Beulich wrote:
>>>>>>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
>>>>>> Secondly, the driver implements a convenient block device interface to
>>>>>> let software access areas where NVDIMM devices are mapped. The
>>>>>> existing vNVDIMM implementation in QEMU uses this interface.
>>>>>>
>>>>>> As Linux NVDIMM driver has already done above, why do we bother to
>>>>>> reimplement them in Xen?
>>>>>
>>>>> See above; a possibility is that we may need a split model (block
>>>>> layer parts on Dom0, "normal memory" parts in the hypervisor.
>>>>> Iirc the split is being determined by firmware, and hence set in
>>>>> stone by the time OS (or hypervisor) boot starts.
>>>>
>>>> For the "normal memory" parts, do you mean parts that map the host
>>>> NVDIMM device's address space range to the guest? I'm going to
>>>> implement that part in hypervisor and expose it as a hypercall so that
>>>> it can be used by QEMU.
>>>
>>> To answer this I need to have my understanding of the partitioning
>>> being done by firmware confirmed: If that's the case, then "normal"
>>> means the part that doesn't get exposed as a block device (SSD).
>>> In any event there's no correlation to guest exposure here.
>>
>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>> by OS.
>>
>> Actually, there are lots of things we should take into account if we move
>> the NVDIMM management to hypervisor:
>
> If you remove the block device part and just deal with pmem part then this
> gets smaller.
>

Yes indeed. But Xen cannot benefit from NVDIMM BLK; I don't think it will be in
the plan for a long time. :)

> Also the _DSM operations - I can't see them being in hypervisor - but only
> in the dom0 - which would have the right software to tickle the correct
> ioctl on /dev/pmem to do the "management" (carve the NVDIMM, perform
> an SMART operation, etc).

Yes, it is reasonable to put it in dom 0 and it makes management tools happy.

>
>> a) ACPI NFIT interpretation
>>     A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>     base information of NVDIMM devices which includes PMEM info, PBLK
>>     info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>     by one.
>
> And it is a static table. As in part of the MADT.

Yes, it is, but we need to fetch updated NVDIMM info from _FIT in the SSDT/DSDT instead
if an NVDIMM device is hotplugged; please see below.

>>
>>     PMEM and PBLK are two modes to access NVDIMM devices:
>>     1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>>        space so that CPU can r/w it directly.
>>     2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>>        only offers two windows which are mapped to CPU's address space, the data
>>        window and access window, so that CPU can use these two windows to access
>>        the whole NVDIMM device.
>>
>>     NVDIMM device is interleaved whose info is also exported so that we can
>>     calculate the address to access the specified NVDIMM device.
>
> Right, along with the serial numbers.
>>
>>     NVDIMM devices from different vendor can have different function so that the
>>     vendor info is exported by NFIT to make vendor's driver work.
>
> via _DSM right?

Yes.

>>
>> b) ACPI SSDT interpretation
>>     SSDT offers _DSM method which controls NVDIMM device, such as label operation,
>>     health check etc and hotplug support.
>
> Sounds like the control domain (dom0) would be in charge of that.

Yup. Dom0 is a better place to handle it.

>>
>> c) Resource management
>>     NVDIMM resource management challenged as:
>>     1) PMEM is huge and it is little slower access than RAM so it is not suitable
>>        to manage it as page struct (i think it is not a big problem in Xen
>>        hypervisor?)
>>     2) need to partition it to it be used in multiple VMs.
>>     3) need to support PBLK and partition it in the future.
>
> That all sounds to me like an control domain (dom0) decisions. Not Xen hypervisor.

Sure, so letting dom0 handle this is better; we are on the same page. :)

>>
>> d) management tools support
>>     S.M.A.R.T? error detection and recovering?
>>
>> c) hotplug support
>
> How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> to scan. That would require the hypervisor also reading this for it to
> update it's data-structures.

Similar to what you said. The NVDIMM root device in the SSDT/DSDT provides a new interface,
_FIT, which returns a new NFIT once a new device is hotplugged. And yes, domain 0 is
the better place to handle this case too.

>>
>> d) third parts drivers
>>     Vendor drivers need to be ported to xen hypervisor and let it be supported in
>>     the management tool.
>
> Ewww.
>
> I presume the 'third party drivers' mean more interesting _DSM features right?

Yes.

> On the base level the firmware with this type of NVDIMM would still have
> the basic - ACPI NFIT + E820_NVDIMM (optional).
>>

Yes.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 16:25                     ` Xiao Guangrong
@ 2016-01-20 16:47                       ` Konrad Rzeszutek Wilk
  2016-01-20 16:55                         ` Xiao Guangrong
  0 siblings, 1 reply; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 16:47 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Jun Nakajima, Keir Fraser

On Thu, Jan 21, 2016 at 12:25:08AM +0800, Xiao Guangrong wrote:
> 
> 
> On 01/20/2016 11:47 PM, Konrad Rzeszutek Wilk wrote:
> >On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 01/20/2016 07:20 PM, Jan Beulich wrote:
> >>>>>>On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
> >>>>On 01/20/16 01:46, Jan Beulich wrote:
> >>>>>>>>On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> >>>>>>Secondly, the driver implements a convenient block device interface to
> >>>>>>let software access areas where NVDIMM devices are mapped. The
> >>>>>>existing vNVDIMM implementation in QEMU uses this interface.
> >>>>>>
> >>>>>>As Linux NVDIMM driver has already done above, why do we bother to
> >>>>>>reimplement them in Xen?
> >>>>>
> >>>>>See above; a possibility is that we may need a split model (block
> >>>>>layer parts on Dom0, "normal memory" parts in the hypervisor.
> >>>>>Iirc the split is being determined by firmware, and hence set in
> >>>>>stone by the time OS (or hypervisor) boot starts.
> >>>>
> >>>>For the "normal memory" parts, do you mean parts that map the host
> >>>>NVDIMM device's address space range to the guest? I'm going to
> >>>>implement that part in hypervisor and expose it as a hypercall so that
> >>>>it can be used by QEMU.
> >>>
> >>>To answer this I need to have my understanding of the partitioning
> >>>being done by firmware confirmed: If that's the case, then "normal"
> >>>means the part that doesn't get exposed as a block device (SSD).
> >>>In any event there's no correlation to guest exposure here.
> >>
> >>Firmware does not manage NVDIMM. All the operations of nvdimm are handled
> >>by OS.
> >>
> >>Actually, there are lots of things we should take into account if we move
> >>the NVDIMM management to hypervisor:
> >
> >If you remove the block device part and just deal with pmem part then this
> >gets smaller.
> >
> 
> Yes indeed. But xen can not benefit from NVDIMM BLK, i think it is not a long
> time plan. :)
> 
> >Also the _DSM operations - I can't see them being in hypervisor - but only
> >in the dom0 - which would have the right software to tickle the correct
> >ioctl on /dev/pmem to do the "management" (carve the NVDIMM, perform
> >an SMART operation, etc).
> 
> Yes, it is reasonable to put it in dom 0 and it makes management tools happy.
> 
> >
> >>a) ACPI NFIT interpretation
> >>    A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
> >>    base information of NVDIMM devices which includes PMEM info, PBLK
> >>    info, nvdimm device interleave, vendor info, etc. Let me explain it one
> >>    by one.
> >
> >And it is a static table. As in part of the MADT.
> 
> Yes, it is, but we need to fetch updated nvdimm info from _FIT in SSDT/DSDT instead
> if a nvdimm device is hotpluged, please see below.
> 
> >>
> >>    PMEM and PBLK are two modes to access NVDIMM devices:
> >>    1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
> >>       space so that CPU can r/w it directly.
> >>    2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
> >>       only offers two windows which are mapped to CPU's address space, the data
> >>       window and access window, so that CPU can use these two windows to access
> >>       the whole NVDIMM device.
> >>
> >>    NVDIMM device is interleaved whose info is also exported so that we can
> >>    calculate the address to access the specified NVDIMM device.
> >
> >Right, along with the serial numbers.
> >>
> >>    NVDIMM devices from different vendor can have different function so that the
> >>    vendor info is exported by NFIT to make vendor's driver work.
> >
> >via _DSM right?
> 
> Yes.
> 
> >>
> >>b) ACPI SSDT interpretation
> >>    SSDT offers _DSM method which controls NVDIMM device, such as label operation,
> >>    health check etc and hotplug support.
> >
> >Sounds like the control domain (dom0) would be in charge of that.
> 
> Yup. Dom0 is a better place to handle it.
> 
> >>
> >>c) Resource management
> >>    NVDIMM resource management challenged as:
> >>    1) PMEM is huge and it is little slower access than RAM so it is not suitable
> >>       to manage it as page struct (i think it is not a big problem in Xen
> >>       hypervisor?)
> >>    2) need to partition it to it be used in multiple VMs.
> >>    3) need to support PBLK and partition it in the future.
> >
> >That all sounds to me like an control domain (dom0) decisions. Not Xen hypervisor.
> 
> Sure, so let dom0 handle this is better, we are on the same page. :)
> 
> >>
> >>d) management tools support
> >>    S.M.A.R.T? error detection and recovering?
> >>
> >>c) hotplug support
> >
> >How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> >to scan. That would require the hypervisor also reading this for it to
> >update it's data-structures.
> 
> Similar as you said. The NVDIMM root device in SSDT/DSDT dedicates a new interface,
> _FIT, which return the new NFIT once new device hotplugged. And yes, domain 0 is
> the better place handing this case too.

That one is a bit difficult. Both the OS and the hypervisor would need to know about
this (I think?) - dom0 since it gets the ACPI event and needs to process it. Then
the hypervisor needs to be told so it can slurp it up.

However I don't know if the hypervisor needs to know all the details of an
NVDIMM - or just the starting and ending ranges so that when a guest is created
and the VT-d is constructed - it can be assured that the ranges are valid.

I am not an expert on the P2M code - but I think that would need to be looked
at to make sure it is OK with stitching an E820_NVDIMM type "MFN" into a guest PFN.

> 
> >>
> >>d) third parts drivers
> >>    Vendor drivers need to be ported to xen hypervisor and let it be supported in
> >>    the management tool.
> >
> >Ewww.
> >
> >I presume the 'third party drivers' mean more interesting _DSM features right?
> 
> Yes.
> 
> >On the base level the firmware with this type of NVDIMM would still have
> >the basic - ACPI NFIT + E820_NVDIMM (optional).
> >>
> 
> Yes.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 16:47                       ` Konrad Rzeszutek Wilk
@ 2016-01-20 16:55                         ` Xiao Guangrong
  2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-20 16:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Keir Fraser



On 01/21/2016 12:47 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 21, 2016 at 12:25:08AM +0800, Xiao Guangrong wrote:
>>
>>
>> On 01/20/2016 11:47 PM, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>>>>>> On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
>>>>>> On 01/20/16 01:46, Jan Beulich wrote:
>>>>>>>>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
>>>>>>>> Secondly, the driver implements a convenient block device interface to
>>>>>>>> let software access areas where NVDIMM devices are mapped. The
>>>>>>>> existing vNVDIMM implementation in QEMU uses this interface.
>>>>>>>>
>>>>>>>> As Linux NVDIMM driver has already done above, why do we bother to
>>>>>>>> reimplement them in Xen?
>>>>>>>
>>>>>>> See above; a possibility is that we may need a split model (block
>>>>>>> layer parts on Dom0, "normal memory" parts in the hypervisor.
>>>>>>> Iirc the split is being determined by firmware, and hence set in
>>>>>>> stone by the time OS (or hypervisor) boot starts.
>>>>>>
>>>>>> For the "normal memory" parts, do you mean parts that map the host
>>>>>> NVDIMM device's address space range to the guest? I'm going to
>>>>>> implement that part in hypervisor and expose it as a hypercall so that
>>>>>> it can be used by QEMU.
>>>>>
>>>>> To answer this I need to have my understanding of the partitioning
>>>>> being done by firmware confirmed: If that's the case, then "normal"
>>>>> means the part that doesn't get exposed as a block device (SSD).
>>>>> In any event there's no correlation to guest exposure here.
>>>>
>>>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>>>> by OS.
>>>>
>>>> Actually, there are lots of things we should take into account if we move
>>>> the NVDIMM management to hypervisor:
>>>
>>> If you remove the block device part and just deal with pmem part then this
>>> gets smaller.
>>>
>>
>> Yes indeed. But xen can not benefit from NVDIMM BLK, i think it is not a long
>> time plan. :)
>>
>>> Also the _DSM operations - I can't see them being in hypervisor - but only
>>> in the dom0 - which would have the right software to tickle the correct
>>> ioctl on /dev/pmem to do the "management" (carve the NVDIMM, perform
>>> an SMART operation, etc).
>>
>> Yes, it is reasonable to put it in dom 0 and it makes management tools happy.
>>
>>>
>>>> a) ACPI NFIT interpretation
>>>>     A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>>>     base information of NVDIMM devices which includes PMEM info, PBLK
>>>>     info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>>>     by one.
>>>
>>> And it is a static table. As in part of the MADT.
>>
>> Yes, it is, but we need to fetch updated nvdimm info from _FIT in SSDT/DSDT instead
>> if a nvdimm device is hotpluged, please see below.
>>
>>>>
>>>>     PMEM and PBLK are two modes to access NVDIMM devices:
>>>>     1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>>>>        space so that CPU can r/w it directly.
>>>>     2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>>>>        only offers two windows which are mapped to CPU's address space, the data
>>>>        window and access window, so that CPU can use these two windows to access
>>>>        the whole NVDIMM device.
>>>>
>>>>     NVDIMM device is interleaved whose info is also exported so that we can
>>>>     calculate the address to access the specified NVDIMM device.
>>>
>>> Right, along with the serial numbers.
>>>>
>>>>     NVDIMM devices from different vendor can have different function so that the
>>>>     vendor info is exported by NFIT to make vendor's driver work.
>>>
>>> via _DSM right?
>>
>> Yes.
>>
>>>>
>>>> b) ACPI SSDT interpretation
>>>>     SSDT offers _DSM method which controls NVDIMM device, such as label operation,
>>>>     health check etc and hotplug support.
>>>
>>> Sounds like the control domain (dom0) would be in charge of that.
>>
>> Yup. Dom0 is a better place to handle it.
>>
>>>>
>>>> c) Resource management
>>>>     NVDIMM resource management challenged as:
>>>>     1) PMEM is huge and it is little slower access than RAM so it is not suitable
>>>>        to manage it as page struct (i think it is not a big problem in Xen
>>>>        hypervisor?)
>>>>     2) need to partition it to it be used in multiple VMs.
>>>>     3) need to support PBLK and partition it in the future.
>>>
>>> That all sounds to me like an control domain (dom0) decisions. Not Xen hypervisor.
>>
>> Sure, so let dom0 handle this is better, we are on the same page. :)
>>
>>>>
>>>> d) management tools support
>>>>     S.M.A.R.T? error detection and recovering?
>>>>
>>>> c) hotplug support
>>>
>>> How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
>>> to scan. That would require the hypervisor also reading this for it to
>>> update it's data-structures.
>>
>> Similar as you said. The NVDIMM root device in SSDT/DSDT dedicates a new interface,
>> _FIT, which return the new NFIT once new device hotplugged. And yes, domain 0 is
>> the better place handing this case too.
>
> That one is a bit difficult. Both the OS and the hypervisor would need to know about
> this (I think?). dom0 since it gets the ACPI event and needs to process it. Then
> the hypervisor needs to be told so it can slurp it up.

Can dom0 receive the interrupt triggered by device hotplug? If yes, we can let dom0
handle all the things like native. If it cannot, dom0 can interpret the ACPI, fetch
the irq info out and tell the hypervisor to pass the irq to dom0 - is that doable?

>
> However I don't know if the hypervisor needs to know all the details of an
> NVDIMM - or just the starting and ending ranges so that when an guest is created
> and the VT-d is constructed - it can be assured that the ranges are valid.
>
> I am not an expert on the P2M code - but I think that would need to be looked
> at to make sure it is OK with stitching an E820_NVDIMM type "MFN" into an guest PFN.

We had better not use "E820" as it lacks some advantages of ACPI, such as NUMA, hotplug,
and label support (namespaces)...

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:29                 ` Xiao Guangrong
  2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
@ 2016-01-20 17:07                   ` Jan Beulich
  2016-01-20 17:17                     ` Xiao Guangrong
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-20 17:07 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser

>>> On 20.01.16 at 16:29, <guangrong.xiao@linux.intel.com> wrote:
> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>> To answer this I need to have my understanding of the partitioning
>> being done by firmware confirmed: If that's the case, then "normal"
>> means the part that doesn't get exposed as a block device (SSD).
>> In any event there's no correlation to guest exposure here.
> 
> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
> by OS.
> 
> Actually, there are lots of things we should take into account if we move
> the NVDIMM management to hypervisor:
> a) ACPI NFIT interpretation
>     A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>     base information of NVDIMM devices which includes PMEM info, PBLK
>     info, nvdimm device interleave, vendor info, etc. Let me explain it one
>     by one.
> 
>     PMEM and PBLK are two modes to access NVDIMM devices:
>     1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>        space so that CPU can r/w it directly.
>     2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>        only offers two windows which are mapped to CPU's address space, the data
>        window and access window, so that CPU can use these two windows to access
>        the whole NVDIMM device.

You fail to mention PBLK. The question above really was about what
entity controls which of the two modes get used (and perhaps for
which parts of the overall NVDIMM).

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 17:07                   ` Jan Beulich
@ 2016-01-20 17:17                     ` Xiao Guangrong
  2016-01-21  8:18                       ` Jan Beulich
  0 siblings, 1 reply; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-20 17:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser



On 01/21/2016 01:07 AM, Jan Beulich wrote:
>>>> On 20.01.16 at 16:29, <guangrong.xiao@linux.intel.com> wrote:
>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>> To answer this I need to have my understanding of the partitioning
>>> being done by firmware confirmed: If that's the case, then "normal"
>>> means the part that doesn't get exposed as a block device (SSD).
>>> In any event there's no correlation to guest exposure here.
>>
>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>> by OS.
>>
>> Actually, there are lots of things we should take into account if we move
>> the NVDIMM management to hypervisor:
>> a) ACPI NFIT interpretation
>>      A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>      base information of NVDIMM devices which includes PMEM info, PBLK
>>      info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>      by one.
>>
>>      PMEM and PBLK are two modes to access NVDIMM devices:
>>      1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>>         space so that CPU can r/w it directly.
>>      2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>>         only offers two windows which are mapped to CPU's address space, the data
>>         window and access window, so that CPU can use these two windows to access
>>         the whole NVDIMM device.
>
> You fail to mention PBLK. The question above really was about what

The 2) is PBLK.

> entity controls which of the two modes get used (and perhaps for
> which parts of the overall NVDIMM).

So I think the "normal" you mentioned is about PMEM. :)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 16:55                         ` Xiao Guangrong
@ 2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
  2016-01-20 17:23                             ` Xiao Guangrong
  2016-01-21  3:12                             ` Haozhong Zhang
  0 siblings, 2 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 17:18 UTC (permalink / raw)
  To: Xiao Guangrong, feng.wu
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Keir Fraser

> >>>>c) hotplug support
> >>>
> >>>How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> >>>to scan. That would require the hypervisor also reading this for it to
> >>>update it's data-structures.
> >>
> >>Similar as you said. The NVDIMM root device in SSDT/DSDT dedicates a new interface,
> >>_FIT, which return the new NFIT once new device hotplugged. And yes, domain 0 is
> >>the better place handing this case too.
> >
> >That one is a bit difficult. Both the OS and the hypervisor would need to know about
> >this (I think?). dom0 since it gets the ACPI event and needs to process it. Then
> >the hypervisor needs to be told so it can slurp it up.
> 
> Can dom0 receive the interrupt triggered by device hotplug? If yes, we can let dom0

Yes of course it can.
> handle all the things like native. If it can not, dom0 can interpret ACPI and fetch
> the irq info out and tell hypervior to pass the irq to dom0, it is doable?
> 
> >
> >However I don't know if the hypervisor needs to know all the details of an
> >NVDIMM - or just the starting and ending ranges so that when an guest is created
> >and the VT-d is constructed - it can be assured that the ranges are valid.
> >
> >I am not an expert on the P2M code - but I think that would need to be looked
> >at to make sure it is OK with stitching an E820_NVDIMM type "MFN" into an guest PFN.
> 
> We do better do not use "E820" as it lacks some advantages of ACPI, such as, NUMA, hotplug,
> lable support (namespace)...

<hand-waves> I don't know what QEMU does for guests? I naively assumed it would
create an E820_NVDIMM along with the ACPI MADT NFIT tables (and the SSDT to have
the _DSM).

Either way, what I think you need to investigate is what is necessary for the
Xen hypervisor VT-d code (IOMMU) to have an entry which is the system address of
the NVDIMM. Based on that you will know what kind of exposure the hypervisor
needs to the _FIT and NFIT tables.

(Adding Feng Wu, the VT-d maintainer).

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
@ 2016-01-20 17:23                             ` Xiao Guangrong
  2016-01-20 17:48                               ` Konrad Rzeszutek Wilk
  2016-01-21  3:12                             ` Haozhong Zhang
  1 sibling, 1 reply; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-20 17:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, feng.wu
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Jun Nakajima, Keir Fraser



On 01/21/2016 01:18 AM, Konrad Rzeszutek Wilk wrote:
>>>>>> c) hotplug support
>>>>>
>>>>> How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
>>>>> to scan. That would require the hypervisor also reading this for it to
>>>>> update it's data-structures.
>>>>
>>>> Similar as you said. The NVDIMM root device in SSDT/DSDT dedicates a new interface,
>>>> _FIT, which return the new NFIT once new device hotplugged. And yes, domain 0 is
>>>> the better place handing this case too.
>>>
>>> That one is a bit difficult. Both the OS and the hypervisor would need to know about
>>> this (I think?). dom0 since it gets the ACPI event and needs to process it. Then
>>> the hypervisor needs to be told so it can slurp it up.
>>
>> Can dom0 receive the interrupt triggered by device hotplug? If yes, we can let dom0
>
> Yes of course it can.
>> handle all the things like native. If it can not, dom0 can interpret ACPI and fetch
>> the irq info out and tell hypervior to pass the irq to dom0, it is doable?
>>
>>>
>>> However I don't know if the hypervisor needs to know all the details of an
>>> NVDIMM - or just the starting and ending ranges so that when an guest is created
>>> and the VT-d is constructed - it can be assured that the ranges are valid.
>>>
>>> I am not an expert on the P2M code - but I think that would need to be looked
>>> at to make sure it is OK with stitching an E820_NVDIMM type "MFN" into an guest PFN.
>>
>> We do better do not use "E820" as it lacks some advantages of ACPI, such as, NUMA, hotplug,
>> lable support (namespace)...
>
> <hand-waves> I don't know what QEMU does for guests? I naively assumed it would
> create an E820_NVDIMM along with the ACPI MADT NFIT tables (and the SSDT to have
> the _DSM).

Ah, ACPI eliminates this E820 entry.

>
> Either way what I think you need to investigate is what is neccessary for the
> Xen hypervisor VT-d code (IOMMU) to have an entry which is the system address for
> the NVDIMM. Based on that - you will know what kind of exposure the hypervisor
> needs to the _FIT and NFIT tables.
>

Interesting. I did not consider using NVDIMM as a DMA target. Do you have a use case for
this kind of NVDIMM usage?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 17:23                             ` Xiao Guangrong
@ 2016-01-20 17:48                               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-20 17:48 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, feng.wu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Jun Nakajima, Wei Liu, Keir Fraser

On Thu, Jan 21, 2016 at 01:23:31AM +0800, Xiao Guangrong wrote:
> 
> 
> On 01/21/2016 01:18 AM, Konrad Rzeszutek Wilk wrote:
> >>>>>>c) hotplug support
> >>>>>
> >>>>>How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> >>>>>to scan. That would require the hypervisor also reading this for it to
> >>>>>update it's data-structures.
> >>>>
> >>>>Similar as you said. The NVDIMM root device in SSDT/DSDT dedicates a new interface,
> >>>>_FIT, which return the new NFIT once new device hotplugged. And yes, domain 0 is
> >>>>the better place handing this case too.
> >>>
> >>>That one is a bit difficult. Both the OS and the hypervisor would need to know about
> >>>this (I think?). dom0 since it gets the ACPI event and needs to process it. Then
> >>>the hypervisor needs to be told so it can slurp it up.
> >>
> >>Can dom0 receive the interrupt triggered by device hotplug? If yes, we can let dom0
> >
> >Yes of course it can.
> >>handle all the things like native. If it can not, dom0 can interpret ACPI and fetch
> >>the irq info out and tell hypervior to pass the irq to dom0, it is doable?
> >>
> >>>
> >>>However I don't know if the hypervisor needs to know all the details of an
> >>>NVDIMM - or just the starting and ending ranges so that when an guest is created
> >>>and the VT-d is constructed - it can be assured that the ranges are valid.
> >>>
> >>>I am not an expert on the P2M code - but I think that would need to be looked
> >>>at to make sure it is OK with stitching an E820_NVDIMM type "MFN" into an guest PFN.
> >>
> >>We do better do not use "E820" as it lacks some advantages of ACPI, such as, NUMA, hotplug,
> >>lable support (namespace)...
> >
> ><hand-waves> I don't know what QEMU does for guests? I naively assumed it would
> >create an E820_NVDIMM along with the ACPI MADT NFIT tables (and the SSDT to have
> >the _DSM).
> 
> Ah, ACPI eliminates this E820 entry.
> 
> >
> >Either way what I think you need to investigate is what is neccessary for the
> >Xen hypervisor VT-d code (IOMMU) to have an entry which is the system address for
> >the NVDIMM. Based on that - you will know what kind of exposure the hypervisor
> >needs to the _FIT and NFIT tables.
> >
> 
> Interesting. I did not consider using NVDIMM as DMA. Do you have usecase for this
> kind of NVDIMM usage?

An easy one is an iSCSI target. You could have an SR-IOV NIC on a host with TCM
enabled (CONFIG_TCM_FILEIO or CONFIG_TCM_IBLOCK). Mount a file on the /dev/pmem0
filesystem (a DAX-enabled FS) and export it as an iSCSI LUN. The traffic would go over the SR-IOV NIC.

The DMA transactions would be SR-IOV NIC <-> NVDIMM.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:05                         ` Stefano Stabellini
@ 2016-01-20 18:14                           ` Andrew Cooper
  0 siblings, 0 replies; 88+ messages in thread
From: Andrew Cooper @ 2016-01-20 18:14 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Jun Nakajima, Ian Jackson,
	xen-devel, Jan Beulich, Xiao Guangrong, Keir Fraser

On 20/01/16 15:05, Stefano Stabellini wrote:
> On Wed, 20 Jan 2016, Andrew Cooper wrote:
>> On 20/01/16 14:29, Stefano Stabellini wrote:
>>> On Wed, 20 Jan 2016, Andrew Cooper wrote:
>>>>
>>> I wouldn't encourage the introduction of anything else that requires
>>> root privileges in QEMU. With QEMU running as non-root by default in
>>> 4.7, the feature will not be available unless users explicitly ask to
>>> run QEMU as root (which they shouldn't really).
>> This isn't how design works.
>>
>> First, design a feature in an architecturally correct way, and then
>> design an security policy to fit.
>>
>> We should not stunt design based on an existing implementation.  In
>> particular, if design shows that being a root only feature is the only
>> sane way of doing this, it should be a root only feature.  (I hope this
>> is not the case, but it shouldn't cloud the judgement of a design).
> I would argue that security is an integral part of the architecture and
> should not be retrofitted into it.

There is no retrofitting - it is all part of the same overall design
before coding starts happening.

>
> Is it really a good design if the only sane way to implement it is
> making it a root-only feature? I think not.

Then you have missed the point.

If you fail at architecting the feature in the first place, someone else
is going to have to come along and reimplement it properly, then provide
some form of compatibility with the old one.

Security is an important consideration in the design; I do not wish to
understate that.  However, if the only way for a feature to be
architected properly is for the feature to be a root-only feature, then
it should be a root-only feature.

>  Designing security policies
> for pieces of software that don't have the infrastructure for them is
> costly and that cost should be accounted as part of the overall cost of
> the solution rather than added to it in a second stage.

That cost is far better spent designing it properly in the first place,
rather than having to come along and reimplement a v2 because v1 was broken.

>
>
>> (note, both before implement happens).
> That is ideal but realistically in many cases nobody is able to produce
> a design before the implementation happens.

It is perfectly easy.  This is the difference between software
engineering and software hacking.

There has been a lot of positive feedback from on-list design
documents.  It is a trend which needs to continue.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
  2016-01-20 17:23                             ` Xiao Guangrong
@ 2016-01-21  3:12                             ` Haozhong Zhang
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-21  3:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Kevin Tian, feng.wu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

On 01/20/16 12:18, Konrad Rzeszutek Wilk wrote:
> > >>>>c) hotplug support
> > >>>
> > >>>How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> > >>>to scan. That would require the hypervisor also reading this for it to
> > >>>update it's data-structures.
> > >>
> > >>Similar as you said. The NVDIMM root device in SSDT/DSDT dedicates a new interface,
> > >>_FIT, which return the new NFIT once new device hotplugged. And yes, domain 0 is
> > >>the better place handing this case too.
> > >
> > >That one is a bit difficult. Both the OS and the hypervisor would need to know about
> > >this (I think?). dom0 since it gets the ACPI event and needs to process it. Then
> > >the hypervisor needs to be told so it can slurp it up.
> > 
> > Can dom0 receive the interrupt triggered by device hotplug? If yes, we can let dom0
> 
> Yes of course it can.
> > handle all the things like native. If it can not, dom0 can interpret ACPI and fetch
> > the irq info out and tell hypervior to pass the irq to dom0, it is doable?
> > 
> > >
> > >However I don't know if the hypervisor needs to know all the details of an
> > >NVDIMM - or just the starting and ending ranges so that when an guest is created
> > >and the VT-d is constructed - it can be assured that the ranges are valid.
> > >
> > >I am not an expert on the P2M code - but I think that would need to be looked
> > >at to make sure it is OK with stitching an E820_NVDIMM type "MFN" into an guest PFN.
> > 
> > We do better do not use "E820" as it lacks some advantages of ACPI, such as, NUMA, hotplug,
> > lable support (namespace)...
> 
> <hand-waves> I don't know what QEMU does for guests? I naively assumed it would
> create an E820_NVDIMM along with the ACPI MADT NFIT tables (and the SSDT to have
> the _DSM).
>

ACPI 6 defines E820 type 7 for pmem (see Table 15-312 in Section 15),
legacy firmware may use the non-standard type 12 (and even older firmware
may use type 6, but Linux does not consider type 6 any more), and a
hot-plugged NVDIMM may not appear in E820 at all. I still think it's better to
let dom0 Linux, which already has the necessary drivers, handle all these device
probing tasks.
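
(For concreteness, the types referred to here, roughly as the Linux tree names
them - a sketch, with values taken from ACPI 6.0 Table 15-312 and the legacy
pre-standard convention:)

/* E820 address-range types relevant to persistent memory (sketch). */
enum {
    E820_PMEM = 7,   /* ACPI 6.0 "Address Range Persistent Memory" */
    E820_PRAM = 12,  /* legacy, non-standard pre-ACPI-6 NVDIMM range */
    /* type 6 was used by some very early firmware but is ignored now */
};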

> Either way what I think you need to investigate is what is neccessary for the
> Xen hypervisor VT-d code (IOMMU) to have an entry which is the system address for
> the NVDIMM. Based on that - you will know what kind of exposure the hypervisor
> needs to the _FIT and NFIT tables.
>
> (Adding Feng Wu, the VT-d maintainer).

I haven't considered VT-d at all. From your example in another reply,
it looks like the VT-d code needs to be aware of the address space
ranges of the NVDIMM, otherwise that example would not work. If so, maybe
we can let the dom0 Linux kernel report the address space ranges of
detected NVDIMM devices to the Xen hypervisor. Anyway, I'll investigate
this issue.

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
  2016-01-20 15:54                                 ` Haozhong Zhang
@ 2016-01-21  3:35                                 ` Bob Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Bob Liu @ 2016-01-21  3:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Jun Nakajima,
	Xiao Guangrong, Keir Fraser


On 01/20/2016 11:41 PM, Konrad Rzeszutek Wilk wrote:
>>>>> Neither of these are sufficient however.  That gets Qemu a mapping of
>>>>> the NVDIMM, not the guest.  Something, one way or another, has to turn
>>>>> this into appropriate add-to-phymap hypercalls.
>>>>>
>>>>
>>>> Yes, those hypercalls are what I'm going to add.
>>>
>>> Why?
>>>
>>> What you need (in a rough hand-wave way) is to:
>>>  - mount /dev/pmem0
>>>  - mmap the file on /dev/pmem0 FS
>>>  - walk the VMA for the file - extract the MFNs (machine frame numbers)
>>

If I understand right, in this case the MFNs come from the block layout of the DAX file?
If we find all the file blocks, then we get all the MFNs.

>> Can this step be done by QEMU? Or does linux kernel provide some
>> approach for the userspace to do the translation?
> 

The ioctl(fd, FIBMAP, &block) may help, which can get the LBAs that a given file occupies. 

-Bob
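
(A minimal sketch of the FIBMAP lookup mentioned above, for reference. FIBMAP
needs CAP_SYS_RAWIO, returns block numbers in filesystem-block units, and newer
code would more likely use the FIEMAP ioctl; whether either is the right way to
hand a DAX file's extents to Xen is still an open question in this thread.)

/* Hedged sketch: ask the filesystem which physical block backs logical
 * block 0 of an already-allocated file. */
#include <fcntl.h>
#include <linux/fs.h>     /* FIBMAP */
#include <stdio.h>
#include <sys/ioctl.h>

int main(int argc, char **argv)
{
    int block = 0;        /* in: logical block index; out: physical block */
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
        return 1;
    if (ioctl(fd, FIBMAP, &block) < 0) {
        perror("FIBMAP");
        return 1;
    }
    printf("logical block 0 of %s -> physical block %d\n", argv[1], block);
    return 0;
}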

> I don't know. I would think no - as you wouldn't want the userspace
> application to figure out the physical frames from the virtual
> address (unless they are root). But then if you look in
> /proc/<pid>/maps and /proc/<pid>/smaps there are some data there.
> 
> Hm, /proc/<pid>/pagemap has something interesting
> 
> See pagemap_read function. That looks to be doing it?
> 
>>
>> Haozhong
>>
>>>  - feed those frame numbers to xc_memory_mapping hypercall. The
>>>    guest pfns would be contiguous.
>>>    Example: say the E820_NVDIMM starts at 8GB->16GB, so an 8GB file on
>>>    /dev/pmem0 FS - the guest pfns are 0x200000 upward.
>>>
>>>    However the MFNs may be discontiguous as the NVDIMM could be a
>>>    1TB device - and the 8GB file is scattered all over.
>>>
>>> I believe that is all you would need to do?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-20 17:17                     ` Xiao Guangrong
@ 2016-01-21  8:18                       ` Jan Beulich
  2016-01-21  8:25                         ` Xiao Guangrong
  0 siblings, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-21  8:18 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser

>>> On 20.01.16 at 18:17, <guangrong.xiao@linux.intel.com> wrote:

> 
> On 01/21/2016 01:07 AM, Jan Beulich wrote:
>>>>> On 20.01.16 at 16:29, <guangrong.xiao@linux.intel.com> wrote:
>>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>> To answer this I need to have my understanding of the partitioning
>>>> being done by firmware confirmed: If that's the case, then "normal"
>>>> means the part that doesn't get exposed as a block device (SSD).
>>>> In any event there's no correlation to guest exposure here.
>>>
>>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>>> by OS.
>>>
>>> Actually, there are lots of things we should take into account if we move
>>> the NVDIMM management to hypervisor:
>>> a) ACPI NFIT interpretation
>>>      A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>>      base information of NVDIMM devices which includes PMEM info, PBLK
>>>      info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>>      by one.
>>>
>>>      PMEM and PBLK are two modes to access NVDIMM devices:
>>>      1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>>>         space so that CPU can r/w it directly.
>>>      2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>>>         only offers two windows which are mapped to CPU's address space, the data
>>>         window and access window, so that CPU can use these two windows to access
>>>         the whole NVDIMM device.
>>
>> You fail to mention PBLK. The question above really was about what
> 
> The 2) is PBLK.
> 
>> entity controls which of the two modes get used (and perhaps for
>> which parts of the overall NVDIMM).
> 
> So i think the "normal" you mentioned is about PMEM. :)

Yes. But then - other than what you said above - it still looks to me as
if the split between PMEM and PBLK is arranged for by firmware?

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21  8:18                       ` Jan Beulich
@ 2016-01-21  8:25                         ` Xiao Guangrong
  2016-01-21  8:53                           ` Jan Beulich
  0 siblings, 1 reply; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-21  8:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser



On 01/21/2016 04:18 PM, Jan Beulich wrote:
>>>> On 20.01.16 at 18:17, <guangrong.xiao@linux.intel.com> wrote:
>
>>
>> On 01/21/2016 01:07 AM, Jan Beulich wrote:
>>>>>> On 20.01.16 at 16:29, <guangrong.xiao@linux.intel.com> wrote:
>>>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>>> To answer this I need to have my understanding of the partitioning
>>>>> being done by firmware confirmed: If that's the case, then "normal"
>>>>> means the part that doesn't get exposed as a block device (SSD).
>>>>> In any event there's no correlation to guest exposure here.
>>>>
>>>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>>>> by OS.
>>>>
>>>> Actually, there are lots of things we should take into account if we move
>>>> the NVDIMM management to hypervisor:
>>>> a) ACPI NFIT interpretation
>>>>       A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>>>       base information of NVDIMM devices which includes PMEM info, PBLK
>>>>       info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>>>       by one.
>>>>
>>>>       PMEM and PBLK are two modes to access NVDIMM devices:
>>>>       1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>>>>          space so that CPU can r/w it directly.
>>>>       2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>>>>          only offers two windows which are mapped to CPU's address space, the data
>>>>          window and access window, so that CPU can use these two windows to access
>>>>          the whole NVDIMM device.
>>>
>>> You fail to mention PBLK. The question above really was about what
>>
>> The 2) is PBLK.
>>
>>> entity controls which of the two modes get used (and perhaps for
>>> which parts of the overall NVDIMM).
>>
>> So i think the "normal" you mentioned is about PMEM. :)
>
> Yes. But then - other than you said above - it still looks to me as
> if the split between PMEM and PBLK is arranged for by firmware?

Yes. But the OS/hypervisor is not expected to dynamically change this configuration (re-split);
i.e., from the PoV of the OS/hypervisor, it is static.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21  8:25                         ` Xiao Guangrong
@ 2016-01-21  8:53                           ` Jan Beulich
  2016-01-21  9:10                             ` Xiao Guangrong
  0 siblings, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-21  8:53 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser

>>> On 21.01.16 at 09:25, <guangrong.xiao@linux.intel.com> wrote:
> On 01/21/2016 04:18 PM, Jan Beulich wrote:
>> Yes. But then - other than you said above - it still looks to me as
>> if the split between PMEM and PBLK is arranged for by firmware?
> 
> Yes. But OS/Hypervisor is not excepted to dynamically change its configure 
> (re-split),
> i,e, for PoV of OS/Hypervisor, it is static.

Exactly, that has been my understanding. And hence the PMEM part
could be under the hypervisor's control, while the PBLK part could be
Dom0's responsibility.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21  8:53                           ` Jan Beulich
@ 2016-01-21  9:10                             ` Xiao Guangrong
  2016-01-21  9:29                               ` Andrew Cooper
  2016-01-21 10:25                               ` Jan Beulich
  0 siblings, 2 replies; 88+ messages in thread
From: Xiao Guangrong @ 2016-01-21  9:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser



On 01/21/2016 04:53 PM, Jan Beulich wrote:
>>>> On 21.01.16 at 09:25, <guangrong.xiao@linux.intel.com> wrote:
>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
>>> Yes. But then - other than you said above - it still looks to me as
>>> if the split between PMEM and PBLK is arranged for by firmware?
>>
>> Yes. But OS/Hypervisor is not excepted to dynamically change its configure
>> (re-split),
>> i,e, for PoV of OS/Hypervisor, it is static.
>
> Exactly, that has been my understanding. And hence the PMEM part
> could be under the hypervisor's control, while the PBLK part could be
> Dom0's responsibility.
>

I am not sure if I have understood your point. Is your suggestion that we
leave PMEM to the hypervisor and all other parts (PBLK and _DSM handling) to
Dom0? If yes, we would need to:
a) handle hotplug in the hypervisor (new PMEM add/remove), which requires the
    hypervisor to interpret the ACPI SSDT/DSDT.
b) filter out the _DSMs that control PMEM and handle them in the hypervisor.
c) have the hypervisor manage the PMEM resource pool and partition it among
    multiple VMs.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21  9:10                             ` Xiao Guangrong
@ 2016-01-21  9:29                               ` Andrew Cooper
  2016-01-21 10:26                                 ` Jan Beulich
  2016-01-21 10:25                               ` Jan Beulich
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2016-01-21  9:29 UTC (permalink / raw)
  To: Xiao Guangrong, Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Ian Jackson, xen-devel, Jun Nakajima,
	Keir Fraser

On 21/01/16 09:10, Xiao Guangrong wrote:
>
>
> On 01/21/2016 04:53 PM, Jan Beulich wrote:
>>>>> On 21.01.16 at 09:25, <guangrong.xiao@linux.intel.com> wrote:
>>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
>>>> Yes. But then - other than you said above - it still looks to me as
>>>> if the split between PMEM and PBLK is arranged for by firmware?
>>>
>>> Yes. But OS/Hypervisor is not excepted to dynamically change its
>>> configure
>>> (re-split),
>>> i,e, for PoV of OS/Hypervisor, it is static.
>>
>> Exactly, that has been my understanding. And hence the PMEM part
>> could be under the hypervisor's control, while the PBLK part could be
>> Dom0's responsibility.
>>
>
> I am not sure if i have understood your point. What your suggestion is
> that
> leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
> Dom0? If yes, we should:
> a) handle hotplug in hypervisor (new PMEM add/remove) that causes
> hyperivsor
>    interpret ACPI SSDT/DSDT.
> b) some _DSMs control PMEM so you should filter out these kind of
> _DSMs and
>    handle them in hypervisor.
> c) hypervisor should mange PMEM resource pool and partition it to
> multiple
>    VMs.

It is not possible for Xen to handle ACPI such as this.

There can only be one OSPM on a system, and 9/10ths of the functionality
needing it already lives in Dom0.

The only rational course of action is for Xen to treat both PBLK and
PMEM as "devices" and leave them in Dom0's hands.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21  9:10                             ` Xiao Guangrong
  2016-01-21  9:29                               ` Andrew Cooper
@ 2016-01-21 10:25                               ` Jan Beulich
  2016-01-21 14:01                                 ` Haozhong Zhang
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-21 10:25 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Keir Fraser

>>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
> On 01/21/2016 04:53 PM, Jan Beulich wrote:
>>>>> On 21.01.16 at 09:25, <guangrong.xiao@linux.intel.com> wrote:
>>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
>>>> Yes. But then - other than you said above - it still looks to me as
>>>> if the split between PMEM and PBLK is arranged for by firmware?
>>>
>>> Yes. But OS/Hypervisor is not excepted to dynamically change its configure
>>> (re-split),
>>> i,e, for PoV of OS/Hypervisor, it is static.
>>
>> Exactly, that has been my understanding. And hence the PMEM part
>> could be under the hypervisor's control, while the PBLK part could be
>> Dom0's responsibility.
>>
> 
> I am not sure if i have understood your point. What your suggestion is that
> leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
> Dom0? If yes, we should:
> a) handle hotplug in hypervisor (new PMEM add/remove) that causes hyperivsor
>     interpret ACPI SSDT/DSDT.

Why would this be different from ordinary memory hotplug, where
Dom0 deals with the ACPI CA interaction, notifying Xen about the
added memory?

> b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
>     handle them in hypervisor.

Not if (see above) following the model we currently have in place.

> c) hypervisor should mange PMEM resource pool and partition it to multiple
>     VMs.

Yes.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21  9:29                               ` Andrew Cooper
@ 2016-01-21 10:26                                 ` Jan Beulich
  0 siblings, 0 replies; 88+ messages in thread
From: Jan Beulich @ 2016-01-21 10:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Ian Jackson, xen-devel, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

>>> On 21.01.16 at 10:29, <andrew.cooper3@citrix.com> wrote:
> On 21/01/16 09:10, Xiao Guangrong wrote:
>> I am not sure if i have understood your point. What your suggestion is
>> that
>> leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
>> Dom0? If yes, we should:
>> a) handle hotplug in hypervisor (new PMEM add/remove) that causes
>> hyperivsor
>>    interpret ACPI SSDT/DSDT.
>> b) some _DSMs control PMEM so you should filter out these kind of
>> _DSMs and
>>    handle them in hypervisor.
>> c) hypervisor should mange PMEM resource pool and partition it to
>> multiple
>>    VMs.
> 
> It is not possible for Xen to handle ACPI such as this.
> 
> There can only be one OSPM on a system, and 9/10ths of the functionality
> needing it already lives in Dom0.
> 
> The only rational course of action is for Xen to treat both PBLK and
> PMEM as "devices" and leave them in Dom0's hands.

See my other reply: Why would this be different from "ordinary"
memory hotplug?

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21 10:25                               ` Jan Beulich
@ 2016-01-21 14:01                                 ` Haozhong Zhang
  2016-01-21 14:52                                   ` Jan Beulich
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-21 14:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Xiao Guangrong, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Wei Liu,
	Keir Fraser

On 01/21/16 03:25, Jan Beulich wrote:
> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
> > On 01/21/2016 04:53 PM, Jan Beulich wrote:
> >>>>> On 21.01.16 at 09:25, <guangrong.xiao@linux.intel.com> wrote:
> >>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
> >>>> Yes. But then - other than you said above - it still looks to me as
> >>>> if the split between PMEM and PBLK is arranged for by firmware?
> >>>
> >>> Yes. But OS/Hypervisor is not excepted to dynamically change its configure
> >>> (re-split),
> >>> i,e, for PoV of OS/Hypervisor, it is static.
> >>
> >> Exactly, that has been my understanding. And hence the PMEM part
> >> could be under the hypervisor's control, while the PBLK part could be
> >> Dom0's responsibility.
> >>
> > 
> > I am not sure if i have understood your point. What your suggestion is that
> > leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
> > Dom0? If yes, we should:
> > a) handle hotplug in hypervisor (new PMEM add/remove) that causes hyperivsor
> >     interpret ACPI SSDT/DSDT.
> 
> Why would this be different from ordinary memory hotplug, where
> Dom0 deals with the ACPI CA interaction, notifying Xen about the
> added memory?
>

The process of NVDIMM hotplug is similar to ordinary memory hotplug, so it
seems possible to support it in the Xen hypervisor in the same way.

> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
> >     handle them in hypervisor.
> 
> Not if (see above) following the model we currently have in place.
>

You mean letting Dom0 Linux evaluate those _DSMs and interact with the
hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?
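
For reference, roughly what that existing notification looks like from Dom0's
side (a minimal sketch: XENPF_mem_hotadd and its spfn/epfn/pxm fields follow
the public platform-op interface, while do_platform_op() below only stands in
for whatever hypercall wrapper the Dom0 kernel actually uses):

    /* Sketch: Dom0 tells Xen about a hot-added physical memory range.   */
    #include <stdint.h>

    #define XENPF_mem_hotadd 59      /* per xen/include/public/platform.h */

    struct xenpf_mem_hotadd {
        uint64_t spfn;               /* first pfn of the new range        */
        uint64_t epfn;               /* end pfn (exclusive)               */
        uint32_t pxm;                /* proximity domain from ACPI SRAT   */
        uint32_t flags;
    };

    int do_platform_op(uint32_t cmd, void *arg);   /* placeholder wrapper */

    static int notify_xen_mem_hotadd(uint64_t spfn, uint64_t epfn, uint32_t pxm)
    {
        struct xenpf_mem_hotadd op = {
            .spfn = spfn, .epfn = epfn, .pxm = pxm, .flags = 0,
        };
        return do_platform_op(XENPF_mem_hotadd, &op);
    }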

> > c) hypervisor should mange PMEM resource pool and partition it to multiple
> >     VMs.
> 
> Yes.
>

But I still do not quite understand this part: why must pmem resource
management and partitioning be done in the hypervisor?

I mean, suppose we allow the following steps of operations (for example):
(1) partition pmem in Dom0;
(2) get the address and size of each partition (part_addr, part_size);
(3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size, gpfn)
    to map a partition to the address gpfn in dom d (see the sketch below).
Only the last step requires the hypervisor. Would anything be wrong if we
allowed the above operations?
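
To make step (3) concrete, a minimal sketch of what the toolstack-side call
could look like; nvdimm_memory_mapping() and everything else here are
hypothetical names, only meant to show the shape of the interface:

    /* Hypothetical flow for steps (1)-(3): the partitioning itself happens
     * in Dom0 (namespaces, DAX files, ...); only the final mapping request
     * involves the hypervisor.  None of these names exist today. */
    #include <stdint.h>

    /* hypothetical hypercall wrapper: map the host pmem range
     * [part_addr, part_addr + part_size) at guest frame gpfn of domain d */
    int nvdimm_memory_mapping(uint32_t d, uint64_t part_addr,
                              uint64_t part_size, uint64_t gpfn);

    static int assign_pmem_partition(uint32_t d, uint64_t part_addr,
                                     uint64_t part_size, uint64_t gpfn)
    {
        const uint64_t page_size = 4096;

        /* the mapping is done at page granularity */
        if ((part_addr | part_size) & (page_size - 1))
            return -1;

        return nvdimm_memory_mapping(d, part_addr, part_size, gpfn);
    }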

Ha

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21 14:01                                 ` Haozhong Zhang
@ 2016-01-21 14:52                                   ` Jan Beulich
  2016-01-22  2:43                                     ` Haozhong Zhang
  2016-01-26 11:44                                     ` George Dunlap
  0 siblings, 2 replies; 88+ messages in thread
From: Jan Beulich @ 2016-01-21 14:52 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
> On 01/21/16 03:25, Jan Beulich wrote:
>> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
>> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
>> >     handle them in hypervisor.
>> 
>> Not if (see above) following the model we currently have in place.
>>
> 
> You mean let dom0 linux evaluates those _DSMs and interact with
> hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?

Yes.

>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
>> >     VMs.
>> 
>> Yes.
>>
> 
> But I Still do not quite understand this part: why must pmem resource
> management and partition be done in hypervisor?

Because that's where memory management belongs. And PMEM,
other than PBLK, is just another form of RAM.

> I mean if we allow the following steps of operations (for example)
> (1) partition pmem in dom 0
> (2) get address and size of each partition (part_addr, part_size)
> (3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size, 
> gpfn) to
>     map a partition to the address gpfn in dom d.
> Only the last step requires hypervisor. Would anything be wrong if we
> allow above operations?

The main issue is that this would imo be a layering violation. I'm
sure it can be made work, but that doesn't mean that's the way
it ought to work.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21 14:52                                   ` Jan Beulich
@ 2016-01-22  2:43                                     ` Haozhong Zhang
  2016-01-26 11:44                                     ` George Dunlap
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-22  2:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima,
	Xiao Guangrong, Keir Fraser

On 01/21/16 07:52, Jan Beulich wrote:
> >>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
> > On 01/21/16 03:25, Jan Beulich wrote:
> >> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
> >> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
> >> >     handle them in hypervisor.
> >> 
> >> Not if (see above) following the model we currently have in place.
> >>
> > 
> > You mean let dom0 linux evaluates those _DSMs and interact with
> > hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?
> 
> Yes.
> 
> >> > c) hypervisor should mange PMEM resource pool and partition it to multiple
> >> >     VMs.
> >> 
> >> Yes.
> >>
> > 
> > But I Still do not quite understand this part: why must pmem resource
> > management and partition be done in hypervisor?
> 
> Because that's where memory management belongs. And PMEM,
> other than PBLK, is just another form of RAM.
> 
> > I mean if we allow the following steps of operations (for example)
> > (1) partition pmem in dom 0
> > (2) get address and size of each partition (part_addr, part_size)
> > (3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size, 
> > gpfn) to
> >     map a partition to the address gpfn in dom d.
> > Only the last step requires hypervisor. Would anything be wrong if we
> > allow above operations?
> 
> The main issue is that this would imo be a layering violation. I'm
> sure it can be made work, but that doesn't mean that's the way
> it ought to work.
> 
> Jan
> 

OK, then it makes sense to put them in the hypervisor. I'll think about
this and note it in the design document.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-21 14:52                                   ` Jan Beulich
  2016-01-22  2:43                                     ` Haozhong Zhang
@ 2016-01-26 11:44                                     ` George Dunlap
  2016-01-26 12:44                                       ` Jan Beulich
  1 sibling, 1 reply; 88+ messages in thread
From: George Dunlap @ 2016-01-26 11:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
>> On 01/21/16 03:25, Jan Beulich wrote:
>>> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
>>> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
>>> >     handle them in hypervisor.
>>>
>>> Not if (see above) following the model we currently have in place.
>>>
>>
>> You mean let dom0 linux evaluates those _DSMs and interact with
>> hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?
>
> Yes.
>
>>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
>>> >     VMs.
>>>
>>> Yes.
>>>
>>
>> But I Still do not quite understand this part: why must pmem resource
>> management and partition be done in hypervisor?
>
> Because that's where memory management belongs. And PMEM,
> other than PBLK, is just another form of RAM.

I haven't looked more deeply into the details of this, but this
argument doesn't seem right to me.

Normal RAM in Xen is what might be called "fungible" -- at boot, all
RAM is zeroed, and it basically doesn't matter at all what RAM is
given to what guest.  (There are restrictions of course: lowmem for
DMA, contiguous superpages, &c; but within those groups, it doesn't
matter *which* bit of lowmem you get, as long as you get enough to do
your job.)  If you reboot your guest or hand RAM back to the
hypervisor, you assume that everything in it will disappear.  When you
ask for RAM, you can request some parameters that it will have
(lowmem, on a specific node, &c), but you can't request a specific
page that you had before.

This is not the case for PMEM.  The whole point of PMEM (correct me if
I'm wrong) is to be used for long-term storage that survives over
reboot.  It matters very much that a guest be given the same PRAM
after the host is rebooted that it was given before.  It doesn't make
any sense to manage it the way Xen currently manages RAM (i.e., that
you request a page and get whatever Xen happens to give you).

So if Xen is going to use PMEM, it will have to invent an entirely new
interface for guests, and it will have to keep track of those
resources across host reboots.  In other words, it will have to
duplicate all the work that Linux already does.  What do we gain from
that duplication?  Why not just leverage what's already implemented in
dom0?

>> I mean if we allow the following steps of operations (for example)
>> (1) partition pmem in dom 0
>> (2) get address and size of each partition (part_addr, part_size)
>> (3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size,
>> gpfn) to
>>     map a partition to the address gpfn in dom d.
>> Only the last step requires hypervisor. Would anything be wrong if we
>> allow above operations?
>
> The main issue is that this would imo be a layering violation. I'm
> sure it can be made work, but that doesn't mean that's the way
> it ought to work.

Jan, from a toolstack <-> Xen perspective, I'm not sure what alternative
there is to the interface above.  Won't the toolstack have to
1) figure out what nvdimm regions there are and 2) tell Xen how and
where to assign them to the guest no matter what we do?  And if we
want to assign arbitrary regions to arbitrary guests, then (part_addr,
part_size) and (gpfn) are going to be necessary bits of information.
The only difference would be whether part_addr is the machine address
or some abstracted address space (possibly starting at 0).

What does your ideal toolstack <-> Xen interface look like?

 -George

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 11:44                                     ` George Dunlap
@ 2016-01-26 12:44                                       ` Jan Beulich
  2016-01-26 12:54                                         ` Juergen Gross
                                                           ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Jan Beulich @ 2016-01-26 12:44 UTC (permalink / raw)
  To: George Dunlap
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

>>> On 26.01.16 at 12:44, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
>>> On 01/21/16 03:25, Jan Beulich wrote:
>>>> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
>>>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
>>>> >     VMs.
>>>>
>>>> Yes.
>>>>
>>>
>>> But I Still do not quite understand this part: why must pmem resource
>>> management and partition be done in hypervisor?
>>
>> Because that's where memory management belongs. And PMEM,
>> other than PBLK, is just another form of RAM.
> 
> I haven't looked more deeply into the details of this, but this
> argument doesn't seem right to me.
> 
> Normal RAM in Xen is what might be called "fungible" -- at boot, all
> RAM is zeroed, and it basically doesn't matter at all what RAM is
> given to what guest.  (There are restrictions of course: lowmem for
> DMA, contiguous superpages, &c; but within those groups, it doesn't
> matter *which* bit of lowmem you get, as long as you get enough to do
> your job.)  If you reboot your guest or hand RAM back to the
> hypervisor, you assume that everything in it will disappear.  When you
> ask for RAM, you can request some parameters that it will have
> (lowmem, on a specific node, &c), but you can't request a specific
> page that you had before.
> 
> This is not the case for PMEM.  The whole point of PMEM (correct me if
> I'm wrong) is to be used for long-term storage that survives over
> reboot.  It matters very much that a guest be given the same PRAM
> after the host is rebooted that it was given before.  It doesn't make
> any sense to manage it the way Xen currently manages RAM (i.e., that
> you request a page and get whatever Xen happens to give you).

Interesting. This isn't the usage model I have been thinking about
so far. Having just gone back to the original 0/4 mail, I'm afraid
we're really left guessing, and you guessed differently than I did.
My understanding of the intentions of PMEM so far was that this
is a high-capacity, slower than DRAM but much faster than e.g.
swapping to disk alternative to normal RAM. I.e. the persistent
aspect of it wouldn't matter at all in this case (other than for PBLK,
obviously).

However, thinking through your usage model I have problems
seeing it work in a reasonable way even with virtualization left
aside: To my knowledge there's no established protocol on how
multiple parties (different versions of the same OS, or even
completely different OSes) would arbitrate using such memory
ranges. And even for a single OS it is, other than for disks (and
hence PBLK), not immediately clear how it would communicate
from one boot to another what information got stored where,
or how it would react to some or all of this storage having
disappeared (just like a disk which got removed, which - unless
it held the boot partition - would normally have pretty little
effect on the OS coming back up).

> So if Xen is going to use PMEM, it will have to invent an entirely new
> interface for guests, and it will have to keep track of those
> resources across host reboots.  In other words, it will have to
> duplicate all the work that Linux already does.  What do we gain from
> that duplication?  Why not just leverage what's already implemented in
> dom0?

Indeed if my guessing on the intentions was wrong, then the
picture completely changes (also for the points you've made
further down).

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 12:44                                       ` Jan Beulich
@ 2016-01-26 12:54                                         ` Juergen Gross
  2016-01-26 14:44                                           ` Konrad Rzeszutek Wilk
  2016-01-26 13:58                                         ` George Dunlap
  2016-01-26 15:30                                         ` Haozhong Zhang
  2 siblings, 1 reply; 88+ messages in thread
From: Juergen Gross @ 2016-01-26 12:54 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On 26/01/16 13:44, Jan Beulich wrote:
>>>> On 26.01.16 at 12:44, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
>>>> On 01/21/16 03:25, Jan Beulich wrote:
>>>>>>>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
>>>>>> c) hypervisor should mange PMEM resource pool and partition it to multiple
>>>>>>     VMs.
>>>>>
>>>>> Yes.
>>>>>
>>>>
>>>> But I Still do not quite understand this part: why must pmem resource
>>>> management and partition be done in hypervisor?
>>>
>>> Because that's where memory management belongs. And PMEM,
>>> other than PBLK, is just another form of RAM.
>>
>> I haven't looked more deeply into the details of this, but this
>> argument doesn't seem right to me.
>>
>> Normal RAM in Xen is what might be called "fungible" -- at boot, all
>> RAM is zeroed, and it basically doesn't matter at all what RAM is
>> given to what guest.  (There are restrictions of course: lowmem for
>> DMA, contiguous superpages, &c; but within those groups, it doesn't
>> matter *which* bit of lowmem you get, as long as you get enough to do
>> your job.)  If you reboot your guest or hand RAM back to the
>> hypervisor, you assume that everything in it will disappear.  When you
>> ask for RAM, you can request some parameters that it will have
>> (lowmem, on a specific node, &c), but you can't request a specific
>> page that you had before.
>>
>> This is not the case for PMEM.  The whole point of PMEM (correct me if
>> I'm wrong) is to be used for long-term storage that survives over
>> reboot.  It matters very much that a guest be given the same PRAM
>> after the host is rebooted that it was given before.  It doesn't make
>> any sense to manage it the way Xen currently manages RAM (i.e., that
>> you request a page and get whatever Xen happens to give you).
> 
> Interesting. This isn't the usage model I have been thinking about
> so far. Having just gone back to the original 0/4 mail, I'm afraid
> we're really left guessing, and you guessed differently than I did.
> My understanding of the intentions of PMEM so far was that this
> is a high-capacity, slower than DRAM but much faster than e.g.
> swapping to disk alternative to normal RAM. I.e. the persistent
> aspect of it wouldn't matter at all in this case (other than for PBLK,
> obviously).
> 
> However, thinking through your usage model I have problems
> seeing it work in a reasonable way even with virtualization left
> aside: To my knowledge there's no established protocol on how
> multiple parties (different versions of the same OS, or even
> completely different OSes) would arbitrate using such memory
> ranges. And even for a single OS it is, other than for disks (and
> hence PBLK), not immediately clear how it would communicate
> from one boot to another what information got stored where,
> or how it would react to some or all of this storage having
> disappeared (just like a disk which got removed, which - unless
> it held the boot partition - would normally have pretty little
> effect on the OS coming back up).

Last year at Linux Plumbers Conference I attended a session dedicated
to NVDIMM support. I asked the very same question and the INTEL guy
there told me there is indeed something like a partition table meant
to describe the layout of the memory areas and their contents.

It would be nice to have a pointer to such information. Without anything
like this it might be rather difficult to find the best way to implement
NVDIMM support in Xen or any other product.


Juergen

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 12:44                                       ` Jan Beulich
  2016-01-26 12:54                                         ` Juergen Gross
@ 2016-01-26 13:58                                         ` George Dunlap
  2016-01-26 14:46                                           ` Konrad Rzeszutek Wilk
  2016-01-26 15:30                                         ` Haozhong Zhang
  2 siblings, 1 reply; 88+ messages in thread
From: George Dunlap @ 2016-01-26 13:58 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On 26/01/16 12:44, Jan Beulich wrote:
>>>> On 26.01.16 at 12:44, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
>>>> On 01/21/16 03:25, Jan Beulich wrote:
>>>>>>>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
>>>>>> c) hypervisor should mange PMEM resource pool and partition it to multiple
>>>>>>     VMs.
>>>>>
>>>>> Yes.
>>>>>
>>>>
>>>> But I Still do not quite understand this part: why must pmem resource
>>>> management and partition be done in hypervisor?
>>>
>>> Because that's where memory management belongs. And PMEM,
>>> other than PBLK, is just another form of RAM.
>>
>> I haven't looked more deeply into the details of this, but this
>> argument doesn't seem right to me.
>>
>> Normal RAM in Xen is what might be called "fungible" -- at boot, all
>> RAM is zeroed, and it basically doesn't matter at all what RAM is
>> given to what guest.  (There are restrictions of course: lowmem for
>> DMA, contiguous superpages, &c; but within those groups, it doesn't
>> matter *which* bit of lowmem you get, as long as you get enough to do
>> your job.)  If you reboot your guest or hand RAM back to the
>> hypervisor, you assume that everything in it will disappear.  When you
>> ask for RAM, you can request some parameters that it will have
>> (lowmem, on a specific node, &c), but you can't request a specific
>> page that you had before.
>>
>> This is not the case for PMEM.  The whole point of PMEM (correct me if
>> I'm wrong) is to be used for long-term storage that survives over
>> reboot.  It matters very much that a guest be given the same PRAM
>> after the host is rebooted that it was given before.  It doesn't make
>> any sense to manage it the way Xen currently manages RAM (i.e., that
>> you request a page and get whatever Xen happens to give you).
> 
> Interesting. This isn't the usage model I have been thinking about
> so far. Having just gone back to the original 0/4 mail, I'm afraid
> we're really left guessing, and you guessed differently than I did.
> My understanding of the intentions of PMEM so far was that this
> is a high-capacity, slower than DRAM but much faster than e.g.
> swapping to disk alternative to normal RAM. I.e. the persistent
> aspect of it wouldn't matter at all in this case (other than for PBLK,
> obviously).

Oh, right -- yes, if the usage model of PRAM is just "cheap slow RAM",
then you're right -- it is just another form of RAM, that should be
treated no differently than say, lowmem: a fungible resource that can be
requested by setting a flag.

Haozhong?

 -George

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 12:54                                         ` Juergen Gross
@ 2016-01-26 14:44                                           ` Konrad Rzeszutek Wilk
  2016-01-26 15:37                                             ` Jan Beulich
  0 siblings, 1 reply; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-26 14:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Jun Nakajima, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Xiao Guangrong, Keir Fraser

> Last year at Linux Plumbers Conference I attended a session dedicated
> to NVDIMM support. I asked the very same question and the INTEL guy
> there told me there is indeed something like a partition table meant
> to describe the layout of the memory areas and their contents.

It is described in detail at pmem.io; look at Documents, see
http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, in particular the Namespaces section.

Then I would recommend you read:
http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

followed by http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

And then for dessert:
https://www.kernel.org/doc/Documentation/nvdimm/nvdimm.txt
which explains it in more technical terms.
> 
> It would be nice to have a pointer to such information. Without anything
> like this it might be rather difficult to find the best solution how to
> implement NVDIMM support in Xen or any other product.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 13:58                                         ` George Dunlap
@ 2016-01-26 14:46                                           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-26 14:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Jun Nakajima, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Xiao Guangrong, Keir Fraser

On Tue, Jan 26, 2016 at 01:58:35PM +0000, George Dunlap wrote:
> On 26/01/16 12:44, Jan Beulich wrote:
> >>>> On 26.01.16 at 12:44, <George.Dunlap@eu.citrix.com> wrote:
> >> On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
> >>>> On 01/21/16 03:25, Jan Beulich wrote:
> >>>>>>>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
> >>>>>> c) hypervisor should mange PMEM resource pool and partition it to multiple
> >>>>>>     VMs.
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>
> >>>> But I Still do not quite understand this part: why must pmem resource
> >>>> management and partition be done in hypervisor?
> >>>
> >>> Because that's where memory management belongs. And PMEM,
> >>> other than PBLK, is just another form of RAM.
> >>
> >> I haven't looked more deeply into the details of this, but this
> >> argument doesn't seem right to me.
> >>
> >> Normal RAM in Xen is what might be called "fungible" -- at boot, all
> >> RAM is zeroed, and it basically doesn't matter at all what RAM is
> >> given to what guest.  (There are restrictions of course: lowmem for
> >> DMA, contiguous superpages, &c; but within those groups, it doesn't
> >> matter *which* bit of lowmem you get, as long as you get enough to do
> >> your job.)  If you reboot your guest or hand RAM back to the
> >> hypervisor, you assume that everything in it will disappear.  When you
> >> ask for RAM, you can request some parameters that it will have
> >> (lowmem, on a specific node, &c), but you can't request a specific
> >> page that you had before.
> >>
> >> This is not the case for PMEM.  The whole point of PMEM (correct me if
> >> I'm wrong) is to be used for long-term storage that survives over
> >> reboot.  It matters very much that a guest be given the same PRAM
> >> after the host is rebooted that it was given before.  It doesn't make
> >> any sense to manage it the way Xen currently manages RAM (i.e., that
> >> you request a page and get whatever Xen happens to give you).
> > 
> > Interesting. This isn't the usage model I have been thinking about
> > so far. Having just gone back to the original 0/4 mail, I'm afraid
> > we're really left guessing, and you guessed differently than I did.
> > My understanding of the intentions of PMEM so far was that this
> > is a high-capacity, slower than DRAM but much faster than e.g.
> > swapping to disk alternative to normal RAM. I.e. the persistent
> > aspect of it wouldn't matter at all in this case (other than for PBLK,
> > obviously).
> 
> Oh, right -- yes, if the usage model of PRAM is just "cheap slow RAM",
> then you're right -- it is just another form of RAM, that should be
> treated no differently than say, lowmem: a fungible resource that can be
> requested by setting a flag.

I would think of it as MMIO ranges rather than RAM. Yes, it is behind a
memory controller - but there are subtle things such as the new
instructions - pcommit, clflushopt, and others - that impact it.

Furthermore, ranges (contiguous and most likely discontiguous) of this
"RAM" have to be shared with guests (at least dom0) and with others
(multiple HVM guests).
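
To illustrate why those instructions matter to whoever ends up owning the
mapping, a minimal sketch of the store-then-flush sequence a guest would
issue on a word of mapped pmem (assumes a compiler providing the clwb
intrinsic, i.e. -mclwb; pcommit is left as a comment since intrinsic support
for it varies):

    /* Minimal persistence sketch: store, write back the cache line with
     * clwb, then fence.  On the hardware discussed here a pcommit would
     * follow to commit the data to the persistence domain; it is only a
     * comment below because toolchain support for its intrinsic varies. */
    #include <immintrin.h>
    #include <stdint.h>

    static void pmem_store_u64(volatile uint64_t *dst, uint64_t val)
    {
        *dst = val;                /* ordinary store into the mapped range   */
        _mm_clwb((void *)dst);     /* write back the line without evicting   */
        _mm_sfence();              /* order the write-back                   */
        /* pcommit would go here on CPUs/toolchains that support it */
    }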


> 
> Haozhong?
> 
>  -George
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 12:44                                       ` Jan Beulich
  2016-01-26 12:54                                         ` Juergen Gross
  2016-01-26 13:58                                         ` George Dunlap
@ 2016-01-26 15:30                                         ` Haozhong Zhang
  2016-01-26 15:33                                           ` Haozhong Zhang
  2016-01-26 15:57                                           ` Jan Beulich
  2 siblings, 2 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-26 15:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On 01/26/16 05:44, Jan Beulich wrote:
> >>> On 26.01.16 at 12:44, <George.Dunlap@eu.citrix.com> wrote:
> > On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
> >>> On 01/21/16 03:25, Jan Beulich wrote:
> >>>> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
> >>>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
> >>>> >     VMs.
> >>>>
> >>>> Yes.
> >>>>
> >>>
> >>> But I Still do not quite understand this part: why must pmem resource
> >>> management and partition be done in hypervisor?
> >>
> >> Because that's where memory management belongs. And PMEM,
> >> other than PBLK, is just another form of RAM.
> > 
> > I haven't looked more deeply into the details of this, but this
> > argument doesn't seem right to me.
> > 
> > Normal RAM in Xen is what might be called "fungible" -- at boot, all
> > RAM is zeroed, and it basically doesn't matter at all what RAM is
> > given to what guest.  (There are restrictions of course: lowmem for
> > DMA, contiguous superpages, &c; but within those groups, it doesn't
> > matter *which* bit of lowmem you get, as long as you get enough to do
> > your job.)  If you reboot your guest or hand RAM back to the
> > hypervisor, you assume that everything in it will disappear.  When you
> > ask for RAM, you can request some parameters that it will have
> > (lowmem, on a specific node, &c), but you can't request a specific
> > page that you had before.
> > 
> > This is not the case for PMEM.  The whole point of PMEM (correct me if
> > I'm wrong) is to be used for long-term storage that survives over
> > reboot.  It matters very much that a guest be given the same PRAM
> > after the host is rebooted that it was given before.  It doesn't make
> > any sense to manage it the way Xen currently manages RAM (i.e., that
> > you request a page and get whatever Xen happens to give you).
> 
> Interesting. This isn't the usage model I have been thinking about
> so far. Having just gone back to the original 0/4 mail, I'm afraid
> we're really left guessing, and you guessed differently than I did.
> My understanding of the intentions of PMEM so far was that this
> is a high-capacity, slower than DRAM but much faster than e.g.
> swapping to disk alternative to normal RAM. I.e. the persistent
> aspect of it wouldn't matter at all in this case (other than for PBLK,
> obviously).
>

Of course, pmem could be used in the way you thought because of its
'RAM' aspect. But I think the more meaningful usage comes from its
persistent aspect. For example, the implementation of some journaling
file systems could store logs in pmem rather than in normal RAM, so
that if a power failure happens before those in-memory logs are
completely written to disk, there would still be a chance to restore
them from pmem after the next boot (rather than abandoning all of
them).

(I'm still writing the design doc, which will include more details of
the underlying hardware and the software interface of NVDIMM exposed by
current Linux)

> However, thinking through your usage model I have problems
> seeing it work in a reasonable way even with virtualization left
> aside: To my knowledge there's no established protocol on how
> multiple parties (different versions of the same OS, or even
> completely different OSes) would arbitrate using such memory
> ranges. And even for a single OS it is, other than for disks (and
> hence PBLK), not immediately clear how it would communicate
> from one boot to another what information got stored where,
> or how it would react to some or all of this storage having
> disappeared (just like a disk which got removed, which - unless
> it held the boot partition - would normally have pretty little
> effect on the OS coming back up).
>

The label storage area is a persistent area on an NVDIMM that can be used to
store partition information. It's not included in pmem (the part that is
mapped into the system address space). Instead, it can only be accessed
through the NVDIMM _DSM method [1]. However, what contents are stored and
how they are interpreted are left to software. One way is to follow the
NVDIMM Namespace Specification [2] and store an array of labels that
describe the start address (from base 0 of pmem) and the size of each
partition, which is called a namespace. On Linux, each namespace is exposed
as a /dev/pmemXX device.
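
To give an idea of what is kept there, a rough transcription of one label as
laid out in [2] (and mirrored by the Linux libnvdimm driver); please treat
the field names and sizes below as my paraphrase of the spec rather than an
authoritative definition:

    /* One entry of the label array stored in the label storage area,
     * roughly following the NVDIMM Namespace Specification [2]. */
    #include <stdint.h>

    struct namespace_label {
        uint8_t  uuid[16];     /* identifies the namespace                   */
        char     name[64];     /* optional friendly name                     */
        uint32_t flags;        /* e.g. read-only / updating                  */
        uint16_t nlabel;       /* number of labels forming this namespace    */
        uint16_t position;     /* this label's position within that set      */
        uint64_t isetcookie;   /* interleave set cookie (consistency check)  */
        uint64_t lbasize;      /* sector size for block namespaces, 0 = pmem */
        uint64_t dpa;          /* start DIMM physical address of this piece  */
        uint64_t rawsize;      /* size of this piece in bytes                */
        uint32_t slot;         /* slot index in the label storage area       */
        uint32_t unused;
    };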

In virtualization, the (virtual) label storage area of a vNVDIMM and
the corresponding _DSM method are emulated by QEMU. The virtual label
storage area is not written to the host one. Instead, we can reserve an
area on pmem for the virtual one.

Besides namespaces, we can also create DAX file systems on pmem and
use files to partition it.

Haozhong

> > So if Xen is going to use PMEM, it will have to invent an entirely new
> > interface for guests, and it will have to keep track of those
> > resources across host reboots.  In other words, it will have to
> > duplicate all the work that Linux already does.  What do we gain from
> > that duplication?  Why not just leverage what's already implemented in
> > dom0?
> 
> Indeed if my guessing on the intentions was wrong, then the
> picture completely changes (also for the points you've made
> further down).
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 15:30                                         ` Haozhong Zhang
@ 2016-01-26 15:33                                           ` Haozhong Zhang
  2016-01-26 15:57                                           ` Jan Beulich
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-26 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On 01/26/16 23:30, Haozhong Zhang wrote:
> On 01/26/16 05:44, Jan Beulich wrote:
> > >>> On 26.01.16 at 12:44, <George.Dunlap@eu.citrix.com> wrote:
> > > On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@suse.com> wrote:
> > >>>>> On 21.01.16 at 15:01, <haozhong.zhang@intel.com> wrote:
> > >>> On 01/21/16 03:25, Jan Beulich wrote:
> > >>>> >>> On 21.01.16 at 10:10, <guangrong.xiao@linux.intel.com> wrote:
> > >>>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
> > >>>> >     VMs.
> > >>>>
> > >>>> Yes.
> > >>>>
> > >>>
> > >>> But I Still do not quite understand this part: why must pmem resource
> > >>> management and partition be done in hypervisor?
> > >>
> > >> Because that's where memory management belongs. And PMEM,
> > >> other than PBLK, is just another form of RAM.
> > > 
> > > I haven't looked more deeply into the details of this, but this
> > > argument doesn't seem right to me.
> > > 
> > > Normal RAM in Xen is what might be called "fungible" -- at boot, all
> > > RAM is zeroed, and it basically doesn't matter at all what RAM is
> > > given to what guest.  (There are restrictions of course: lowmem for
> > > DMA, contiguous superpages, &c; but within those groups, it doesn't
> > > matter *which* bit of lowmem you get, as long as you get enough to do
> > > your job.)  If you reboot your guest or hand RAM back to the
> > > hypervisor, you assume that everything in it will disappear.  When you
> > > ask for RAM, you can request some parameters that it will have
> > > (lowmem, on a specific node, &c), but you can't request a specific
> > > page that you had before.
> > > 
> > > This is not the case for PMEM.  The whole point of PMEM (correct me if
> > > I'm wrong) is to be used for long-term storage that survives over
> > > reboot.  It matters very much that a guest be given the same PRAM
> > > after the host is rebooted that it was given before.  It doesn't make
> > > any sense to manage it the way Xen currently manages RAM (i.e., that
> > > you request a page and get whatever Xen happens to give you).
> > 
> > Interesting. This isn't the usage model I have been thinking about
> > so far. Having just gone back to the original 0/4 mail, I'm afraid
> > we're really left guessing, and you guessed differently than I did.
> > My understanding of the intentions of PMEM so far was that this
> > is a high-capacity, slower than DRAM but much faster than e.g.
> > swapping to disk alternative to normal RAM. I.e. the persistent
> > aspect of it wouldn't matter at all in this case (other than for PBLK,
> > obviously).
> >
> 
> Of course, pmem could be used in the way you thought because of its
> 'ram' aspect. But I think the more meaningful usage is from its
> persistent aspect. For example, the implementation of some journal
> file systems could store logs in pmem rather than the normal ram, so
> that if a power failure happens before those in-memory logs are
> completely written to the disk, there would still be chance to restore
> them from pmem after next booting (rather than abandoning all of
> them).
> 
> (I'm still writing the design doc which will include more details of
> underlying hardware and the software interface of nvdimm exposed by
> current linux)
> 
> > However, thinking through your usage model I have problems
> > seeing it work in a reasonable way even with virtualization left
> > aside: To my knowledge there's no established protocol on how
> > multiple parties (different versions of the same OS, or even
> > completely different OSes) would arbitrate using such memory
> > ranges. And even for a single OS it is, other than for disks (and
> > hence PBLK), not immediately clear how it would communicate
> > from one boot to another what information got stored where,
> > or how it would react to some or all of this storage having
> > disappeared (just like a disk which got removed, which - unless
> > it held the boot partition - would normally have pretty little
> > effect on the OS coming back up).
> >
> 
> Label storage area is a persistent area on NVDIMM and can be used to
> store partitions information. It's not included in pmem (that part
> that is mapped into the system address space). Instead, it can be only
> accessed through NVDIMM _DSM method [1]. However, what contents are
> stored and how they are interpreted are left to software. One way is
> to follow NVDIMM Namespace Specification [2] to store an array of
> labels that describe the start address (from the base 0 of pmem) and
> the size of each partition, which is called as namespace. On Linux,
> each namespace is exposed as a /dev/pmemXX device.
> 
> In the virtualization, the (virtual) label storage area of vNVDIMM and
> the corresponding _DSM method are emulated by QEMU. The virtual label
> storage area is not written to the host one. Instead, we can reserve a
> piece area on pmem for the virtual one.
> 
> Besides namespaces, we can also create DAX file systems on pmem and
> use files to partition.
>

Forgot references:
[1] NVDIMM DSM Interface Examples, http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2] NVDIMM Namespace Specification, http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

> Haozhong
> 
> > > So if Xen is going to use PMEM, it will have to invent an entirely new
> > > interface for guests, and it will have to keep track of those
> > > resources across host reboots.  In other words, it will have to
> > > duplicate all the work that Linux already does.  What do we gain from
> > > that duplication?  Why not just leverage what's already implemented in
> > > dom0?
> > 
> > Indeed if my guessing on the intentions was wrong, then the
> > picture completely changes (also for the points you've made
> > further down).
> > 
> > Jan
> > 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 14:44                                           ` Konrad Rzeszutek Wilk
@ 2016-01-26 15:37                                             ` Jan Beulich
  2016-01-26 15:57                                               ` Haozhong Zhang
  0 siblings, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-26 15:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Juergen Gross, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jun Nakajima, Xiao Guangrong, Keir Fraser

>>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
>>  Last year at Linux Plumbers Conference I attended a session dedicated
>> to NVDIMM support. I asked the very same question and the INTEL guy
>> there told me there is indeed something like a partition table meant
>> to describe the layout of the memory areas and their contents.
> 
> It is described in details at pmem.io, look at  Documents, see
> http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf see Namespaces section.

Well, that's about how PMEM and PBLK ranges get marked, but not
about how use of the space inside a PMEM range is coordinated.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 15:30                                         ` Haozhong Zhang
  2016-01-26 15:33                                           ` Haozhong Zhang
@ 2016-01-26 15:57                                           ` Jan Beulich
  2016-01-27  2:23                                             ` Haozhong Zhang
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-26 15:57 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

>>> On 26.01.16 at 16:30, <haozhong.zhang@intel.com> wrote:
> On 01/26/16 05:44, Jan Beulich wrote:
>> Interesting. This isn't the usage model I have been thinking about
>> so far. Having just gone back to the original 0/4 mail, I'm afraid
>> we're really left guessing, and you guessed differently than I did.
>> My understanding of the intentions of PMEM so far was that this
>> is a high-capacity, slower than DRAM but much faster than e.g.
>> swapping to disk alternative to normal RAM. I.e. the persistent
>> aspect of it wouldn't matter at all in this case (other than for PBLK,
>> obviously).
> 
> Of course, pmem could be used in the way you thought because of its
> 'ram' aspect. But I think the more meaningful usage is from its
> persistent aspect. For example, the implementation of some journal
> file systems could store logs in pmem rather than the normal ram, so
> that if a power failure happens before those in-memory logs are
> completely written to the disk, there would still be chance to restore
> them from pmem after next booting (rather than abandoning all of
> them).

Well, that leaves open how that file system would find its log
after reboot, or how that log is protected from clobbering by
another OS booted in between.

>> However, thinking through your usage model I have problems
>> seeing it work in a reasonable way even with virtualization left
>> aside: To my knowledge there's no established protocol on how
>> multiple parties (different versions of the same OS, or even
>> completely different OSes) would arbitrate using such memory
>> ranges. And even for a single OS it is, other than for disks (and
>> hence PBLK), not immediately clear how it would communicate
>> from one boot to another what information got stored where,
>> or how it would react to some or all of this storage having
>> disappeared (just like a disk which got removed, which - unless
>> it held the boot partition - would normally have pretty little
>> effect on the OS coming back up).
> 
> Label storage area is a persistent area on NVDIMM and can be used to
> store partitions information. It's not included in pmem (that part
> that is mapped into the system address space). Instead, it can be only
> accessed through NVDIMM _DSM method [1]. However, what contents are
> stored and how they are interpreted are left to software. One way is
> to follow NVDIMM Namespace Specification [2] to store an array of
> labels that describe the start address (from the base 0 of pmem) and
> the size of each partition, which is called as namespace. On Linux,
> each namespace is exposed as a /dev/pmemXX device.

According to what I've just read in one of the documents Konrad
pointed us to, there can be just one PMEM label per DIMM. Unless
I misread of course...

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 15:37                                             ` Jan Beulich
@ 2016-01-26 15:57                                               ` Haozhong Zhang
  2016-01-26 16:34                                                 ` Jan Beulich
  0 siblings, 1 reply; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-26 15:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Kevin Tian, Wei Liu, Ian Campbell, George Dunlap,
	Andrew Cooper, Stefano Stabellini, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On 01/26/16 08:37, Jan Beulich wrote:
> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
> >>  Last year at Linux Plumbers Conference I attended a session dedicated
> >> to NVDIMM support. I asked the very same question and the INTEL guy
> >> there told me there is indeed something like a partition table meant
> >> to describe the layout of the memory areas and their contents.
> > 
> > It is described in details at pmem.io, look at  Documents, see
> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf see Namespaces section.
> 
> Well, that's about how PMEM and PBLK ranges get marked, but not
> about how use of the space inside a PMEM range is coordinated.
>

How an NVDIMM is partitioned into pmem and pblk is described by the ACPI NFIT
table. A namespace is to pmem what a partition table is to a disk.

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 15:57                                               ` Haozhong Zhang
@ 2016-01-26 16:34                                                 ` Jan Beulich
  2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
  2016-01-27 10:55                                                   ` George Dunlap
  0 siblings, 2 replies; 88+ messages in thread
From: Jan Beulich @ 2016-01-26 16:34 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Juergen Gross, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jun Nakajima, Xiao Guangrong, Keir Fraser

>>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
> On 01/26/16 08:37, Jan Beulich wrote:
>> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
>> >>  Last year at Linux Plumbers Conference I attended a session dedicated
>> >> to NVDIMM support. I asked the very same question and the INTEL guy
>> >> there told me there is indeed something like a partition table meant
>> >> to describe the layout of the memory areas and their contents.
>> > 
>> > It is described in details at pmem.io, look at  Documents, see
>> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf see Namespaces section.
>> 
>> Well, that's about how PMEM and PBLK ranges get marked, but not
>> about how use of the space inside a PMEM range is coordinated.
>>
> 
> How a NVDIMM is partitioned into pmem and pblk is described by ACPI NFIT 
> table.
> Namespace to pmem is something like partition table to disk.

But I'm talking about sub-dividing the space inside an individual
PMEM range.

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 16:34                                                 ` Jan Beulich
@ 2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
  2016-01-27  7:22                                                     ` Haozhong Zhang
  2016-01-27 10:16                                                     ` Jan Beulich
  2016-01-27 10:55                                                   ` George Dunlap
  1 sibling, 2 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-26 19:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jun Nakajima, Xiao Guangrong, Keir Fraser

On Tue, Jan 26, 2016 at 09:34:13AM -0700, Jan Beulich wrote:
> >>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
> > On 01/26/16 08:37, Jan Beulich wrote:
> >> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
> >> >>  Last year at Linux Plumbers Conference I attended a session dedicated
> >> >> to NVDIMM support. I asked the very same question and the INTEL guy
> >> >> there told me there is indeed something like a partition table meant
> >> >> to describe the layout of the memory areas and their contents.
> >> > 
> >> > It is described in detail at pmem.io; look at Documents, see
> >> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, Namespaces section.
> >> 
> >> Well, that's about how PMEM and PBLK ranges get marked, but not
> >> about how use of the space inside a PMEM range is coordinated.
> >>
> > 
> > > How an NVDIMM is partitioned into pmem and pblk regions is described by
> > > the ACPI NFIT table.
> > > A namespace is to pmem what a partition table is to a disk.
> 
> But I'm talking about sub-dividing the space inside an individual
> PMEM range.

The namespaces are it.

Once you have done that you can mount the PMEM range under, say, /dev/pmem0
and then put a filesystem on it (ext4, xfs) - and enable DAX support.
The DAX just means that the FS will bypass the page cache and write directly
to the virtual address.

Then one can create giant 'dd' images on this filesystem and pass them
to QEMU to expose as an NVDIMM to the guest. Because it is a file, the blocks
(or MFNs) for the contents of the file are most certainly discontiguous.
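
A rough sketch of that flow (paths, sizes and image names below are made up
for illustration; the QEMU options are upstream QEMU's vNVDIMM ones, whose
exact spelling may differ from what the Xen device-model path discussed in
this series ends up using):

  mkfs.ext4 /dev/pmem0
  mount -o dax /dev/pmem0 /mnt/pmem0    # DAX: no page cache in the FS path
  dd if=/dev/zero of=/mnt/pmem0/guest.img bs=1M count=4096
  qemu-system-x86_64 -machine pc,nvdimm=on -m 4G,slots=2,maxmem=8G \
    -object memory-backend-file,id=mem1,share=on,mem-path=/mnt/pmem0/guest.img,size=4G \
    -device nvdimm,id=nv1,memdev=mem1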

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 15:57                                           ` Jan Beulich
@ 2016-01-27  2:23                                             ` Haozhong Zhang
  0 siblings, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-27  2:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On 01/26/16 08:57, Jan Beulich wrote:
> >>> On 26.01.16 at 16:30, <haozhong.zhang@intel.com> wrote:
> > On 01/26/16 05:44, Jan Beulich wrote:
> >> Interesting. This isn't the usage model I have been thinking about
> >> so far. Having just gone back to the original 0/4 mail, I'm afraid
> >> we're really left guessing, and you guessed differently than I did.
> >> My understanding of the intention of PMEM so far was that it is a
> >> high-capacity alternative to normal RAM - slower than DRAM but much
> >> faster than e.g. swapping to disk. I.e. the persistent aspect of it
> >> wouldn't matter at all in this case (other than for PBLK, obviously).
> > 
> > Of course, pmem could be used in the way you thought because of its
> > 'ram' aspect. But I think the more meaningful usage is from its
> > persistent aspect. For example, some journaling file systems could
> > store their logs in pmem rather than in normal RAM, so that if a power
> > failure happens before those in-memory logs are completely written to
> > the disk, there would still be a chance to restore them from pmem after
> > the next boot (rather than abandoning all of them).
> 
> Well, that leaves open how that file system would find its log
> after reboot, or how that log is protected from clobbering by
> another OS booted in between.
>

It would depend on the concrete design of the OSes or applications
involved. This is just an example to show a possible usage of the
persistence aspect.

> >> However, thinking through your usage model I have problems
> >> seeing it work in a reasonable way even with virtualization left
> >> aside: To my knowledge there's no established protocol on how
> >> multiple parties (different versions of the same OS, or even
> >> completely different OSes) would arbitrate using such memory
> >> ranges. And even for a single OS it is, other than for disks (and
> >> hence PBLK), not immediately clear how it would communicate
> >> from one boot to another what information got stored where,
> >> or how it would react to some or all of this storage having
> >> disappeared (just like a disk which got removed, which - unless
> >> it held the boot partition - would normally have pretty little
> >> effect on the OS coming back up).
> > 
> > The label storage area is a persistent area on an NVDIMM and can be used
> > to store partition information. It's not included in pmem (the part that
> > is mapped into the system address space). Instead, it can only be
> > accessed through the NVDIMM _DSM method [1]. However, what contents are
> > stored and how they are interpreted are left to software. One way is to
> > follow the NVDIMM Namespace Specification [2] and store an array of
> > labels that describe the start address (from base 0 of the pmem range)
> > and the size of each partition, which is called a namespace. On Linux,
> > each namespace is exposed as a /dev/pmemXX device.
> 
> According to what I've just read in one of the documents Konrad
> pointed us to, there can be just one PMEM label per DIMM. Unless
> I misread of course...
>

My mistake, only one pmem label per DIMM.

Haozhong

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
@ 2016-01-27  7:22                                                     ` Haozhong Zhang
  2016-01-27 10:16                                                     ` Jan Beulich
  1 sibling, 0 replies; 88+ messages in thread
From: Haozhong Zhang @ 2016-01-27  7:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Juergen Gross, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jan Beulich, Jun Nakajima, Xiao Guangrong,
	Keir Fraser

On 01/26/16 14:32, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 26, 2016 at 09:34:13AM -0700, Jan Beulich wrote:
> > >>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
> > > On 01/26/16 08:37, Jan Beulich wrote:
> > >> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
> > >> >>  Last year at Linux Plumbers Conference I attended a session dedicated
> > >> >> to NVDIMM support. I asked the very same question and the INTEL guy
> > >> >> there told me there is indeed something like a partition table meant
> > >> >> to describe the layout of the memory areas and their contents.
> > >> > 
> > >> > It is described in detail at pmem.io; look at Documents, see
> > >> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, Namespaces section.
> > >> 
> > >> Well, that's about how PMEM and PBLK ranges get marked, but not
> > >> about how use of the space inside a PMEM range is coordinated.
> > >>
> > > 
> > > How an NVDIMM is partitioned into pmem and pblk regions is described by
> > > the ACPI NFIT table.
> > > A namespace is to pmem what a partition table is to a disk.
> > 
> > But I'm talking about sub-dividing the space inside an individual
> > PMEM range.
> 
> The namespaces are it.
>

Because only one persistent memory namespace is allowed for an
individual pmem range, namespaces cannot be used to sub-divide it.

> Once you have done them you can mount the PMEM range under say /dev/pmem0
> and then put a filesystem on it (ext4, xfs) - and enable DAX support.
> The DAX just means that the FS will bypass the page cache and write directly
> to the virtual address.
> 
> then one can create giant 'dd' images on this filesystem and pass it
> to QEMU to .. expose as NVDIMM to the guest. Because it is a file - the blocks
> (or MFNs) for the contents of the file are most certainly discontiguous.
>

Though the 'dd' image may occupy discontiguous MFNs on host pmem, we can map them
to contiguous guest PFNs.
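
For illustration (re-using the hypothetical paths from the sketch earlier in
the thread):

  # on the host: the backing file's extents on /dev/pmem0 may well be scattered
  filefrag -v /mnt/pmem0/guest.img
  # inside the guest: the vNVDIMM is still presented as a single contiguous
  # persistent-memory range
  grep -i persistent /proc/iomem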

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
  2016-01-27  7:22                                                     ` Haozhong Zhang
@ 2016-01-27 10:16                                                     ` Jan Beulich
  2016-01-27 14:50                                                       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2016-01-27 10:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Juergen Gross, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jun Nakajima, Xiao Guangrong, Keir Fraser

>>> On 26.01.16 at 20:32, <konrad.wilk@oracle.com> wrote:
> On Tue, Jan 26, 2016 at 09:34:13AM -0700, Jan Beulich wrote:
>> >>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
>> > On 01/26/16 08:37, Jan Beulich wrote:
>> >> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
>> >> >>  Last year at Linux Plumbers Conference I attended a session dedicated
>> >> >> to NVDIMM support. I asked the very same question and the INTEL guy
>> >> >> there told me there is indeed something like a partition table meant
>> >> >> to describe the layout of the memory areas and their contents.
>> >> > 
>> >> > It is described in detail at pmem.io; look at Documents, see
>> >> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, Namespaces section.
>> >> 
>> >> Well, that's about how PMEM and PBLK ranges get marked, but not
>> >> about how use of the space inside a PMEM range is coordinated.
>> >>
>> > 
>> > How an NVDIMM is partitioned into pmem and pblk regions is described by
>> > the ACPI NFIT table.
>> > A namespace is to pmem what a partition table is to a disk.
>> 
>> But I'm talking about sub-dividing the space inside an individual
>> PMEM range.
> 
> The namespaces are it.
> 
> Once you have done them you can mount the PMEM range under say /dev/pmem0
> and then put a filesystem on it (ext4, xfs) - and enable DAX support.
> The DAX just means that the FS will bypass the page cache and write directly
> to the virtual address.
> 
> then one can create giant 'dd' images on this filesystem and pass it
> to QEMU to .. expose as NVDIMM to the guest. Because it is a file - the blocks
> (or MFNs) for the contents of the file are most certainly discontiguous.

And what's the advantage of this over PBLK? I.e. why would one
want to separate PMEM and PBLK ranges if everything gets used
the same way anyway?

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-26 16:34                                                 ` Jan Beulich
  2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
@ 2016-01-27 10:55                                                   ` George Dunlap
  1 sibling, 0 replies; 88+ messages in thread
From: George Dunlap @ 2016-01-27 10:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, xen-devel,
	Jun Nakajima, Xiao Guangrong, Keir Fraser

On Tue, Jan 26, 2016 at 4:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
>> On 01/26/16 08:37, Jan Beulich wrote:
>>> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
>>> >>  Last year at Linux Plumbers Conference I attended a session dedicated
>>> >> to NVDIMM support. I asked the very same question and the INTEL guy
>>> >> there told me there is indeed something like a partition table meant
>>> >> to describe the layout of the memory areas and their contents.
>>> >
>>> > It is described in detail at pmem.io; look at Documents, see
>>> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, Namespaces section.
>>>
>>> Well, that's about how PMEM and PBLK ranges get marked, but not
>>> about how use of the space inside a PMEM range is coordinated.
>>>
>>
>> How an NVDIMM is partitioned into pmem and pblk regions is described by
>> the ACPI NFIT table.
>> A namespace is to pmem what a partition table is to a disk.
>
> But I'm talking about sub-dividing the space inside an individual
> PMEM range.

Well, as long as at a high level full PMEM blocks can be allocated /
assigned to a single OS, that OS can figure out whether / how to further
subdivide them (and store information about that subdivision).

But in any case, since it seems from what Haozhong and Konrad say
that the point of this *is* in fact to take advantage of the
persistence, allowing Linux to solve the problem of how to subdivide
PMEM blocks and just leveraging its solution would be better than
trying to duplicate all that effort inside of Xen.

 -George

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
  2016-01-27 10:16                                                     ` Jan Beulich
@ 2016-01-27 14:50                                                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 88+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-27 14:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Jun Nakajima, Xiao Guangrong, Keir Fraser

On Wed, Jan 27, 2016 at 03:16:59AM -0700, Jan Beulich wrote:
> >>> On 26.01.16 at 20:32, <konrad.wilk@oracle.com> wrote:
> > On Tue, Jan 26, 2016 at 09:34:13AM -0700, Jan Beulich wrote:
> >> >>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
> >> > On 01/26/16 08:37, Jan Beulich wrote:
> >> >> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
> >> >> >>  Last year at Linux Plumbers Conference I attended a session dedicated
> >> >> >> to NVDIMM support. I asked the very same question and the INTEL guy
> >> >> >> there told me there is indeed something like a partition table meant
> >> >> >> to describe the layout of the memory areas and their contents.
> >> >> > 
> >> >> > It is described in detail at pmem.io; look at Documents, see
> >> >> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, Namespaces section.
> >> >> 
> >> >> Well, that's about how PMEM and PBLK ranges get marked, but not
> >> >> about how use of the space inside a PMEM range is coordinated.
> >> >>
> >> > 
> >> > How an NVDIMM is partitioned into pmem and pblk regions is described by
> >> > the ACPI NFIT table.
> >> > A namespace is to pmem what a partition table is to a disk.
> >> 
> >> But I'm talking about sub-dividing the space inside an individual
> >> PMEM range.
> > 
> > The namespaces are it.
> > 
> > Once you have done them you can mount the PMEM range under say /dev/pmem0
> > and then put a filesystem on it (ext4, xfs) - and enable DAX support.
> > The DAX just means that the FS will bypass the page cache and write directly
> > to the virtual address.
> > 
> > then one can create giant 'dd' images on this filesystem and pass it
> > to QEMU to .. expose as NVDIMM to the guest. Because it is a file - the blocks
> > (or MFNs) for the contents of the file are most certainly discontiguous.
> 
> And what's the advantage of this over PBLK? I.e. why would one
> want to separate PMEM and PBLK ranges if everything gets used
> the same way anyway?

Speed. PBLK emulates hardware - by providing a sliding window into the DIMM.
The OS can only write to a ring buffer with the system address and the payload
(64 bytes I think?) - and the hardware (or firmware) picks it up and does the
writes to the NVDIMM.

The only motivation behind this is to deal with errors. Normal PMEM writes
do not report errors: if the media is busted, the hardware will engage its
remap logic and write somewhere else - until all of its remap blocks have
been exhausted. At that point writes (I presume, not sure) and reads will
report an error - but via an #MCE.

Part of this Xen design will be how to handle that :-)

With PBLK - I presume the hardware/firmware will read the block back after it
has written it - and if there are errors it will report them right away. Which
means you can easily hook PBLK nicely into RAID setups right away. It will be
slower than PMEM, but it does give you the normal error reporting - that is,
until the #MCE -> OS -> fs error-reporting logic gets figured out.

The #MCE logic is being developed right now by Tony Luck on LKML - and the
last I saw, the #MCE carries the system address, and the MCE code would tag
the affected pages with some bit so that applications would get a signal.

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2016-01-27 14:50 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
2015-12-29 15:46   ` Andrew Cooper
2015-12-30  1:35     ` Haozhong Zhang
2015-12-30  2:16       ` Haozhong Zhang
2015-12-30 10:33         ` Andrew Cooper
2015-12-29 11:31 ` [PATCH 2/4] x86/hvm: add support for pcommit instruction Haozhong Zhang
2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
2016-01-04 11:16   ` Wei Liu
2016-01-06 12:40   ` Jan Beulich
2016-01-06 15:28     ` Haozhong Zhang
2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
2016-01-15 17:10   ` Jan Beulich
2016-01-18  0:52     ` Haozhong Zhang
2016-01-18  8:46       ` Jan Beulich
2016-01-19 11:37         ` Wei Liu
2016-01-19 11:46           ` Jan Beulich
2016-01-20  5:14             ` Tian, Kevin
2016-01-20  5:58               ` Zhang, Haozhong
2016-01-20  5:31         ` Haozhong Zhang
2016-01-20  8:46           ` Jan Beulich
2016-01-20  8:58             ` Andrew Cooper
2016-01-20 10:15               ` Haozhong Zhang
2016-01-20 10:36                 ` Xiao Guangrong
2016-01-20 13:16                   ` Andrew Cooper
2016-01-20 14:29                     ` Stefano Stabellini
2016-01-20 14:42                       ` Haozhong Zhang
2016-01-20 14:45                       ` Andrew Cooper
2016-01-20 14:53                         ` Haozhong Zhang
2016-01-20 15:13                           ` Konrad Rzeszutek Wilk
2016-01-20 15:29                             ` Haozhong Zhang
2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
2016-01-20 15:54                                 ` Haozhong Zhang
2016-01-21  3:35                                 ` Bob Liu
2016-01-20 15:05                         ` Stefano Stabellini
2016-01-20 18:14                           ` Andrew Cooper
2016-01-20 14:38                     ` Haozhong Zhang
2016-01-20 11:04             ` Haozhong Zhang
2016-01-20 11:20               ` Jan Beulich
2016-01-20 15:29                 ` Xiao Guangrong
2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
2016-01-20 16:25                     ` Xiao Guangrong
2016-01-20 16:47                       ` Konrad Rzeszutek Wilk
2016-01-20 16:55                         ` Xiao Guangrong
2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
2016-01-20 17:23                             ` Xiao Guangrong
2016-01-20 17:48                               ` Konrad Rzeszutek Wilk
2016-01-21  3:12                             ` Haozhong Zhang
2016-01-20 17:07                   ` Jan Beulich
2016-01-20 17:17                     ` Xiao Guangrong
2016-01-21  8:18                       ` Jan Beulich
2016-01-21  8:25                         ` Xiao Guangrong
2016-01-21  8:53                           ` Jan Beulich
2016-01-21  9:10                             ` Xiao Guangrong
2016-01-21  9:29                               ` Andrew Cooper
2016-01-21 10:26                                 ` Jan Beulich
2016-01-21 10:25                               ` Jan Beulich
2016-01-21 14:01                                 ` Haozhong Zhang
2016-01-21 14:52                                   ` Jan Beulich
2016-01-22  2:43                                     ` Haozhong Zhang
2016-01-26 11:44                                     ` George Dunlap
2016-01-26 12:44                                       ` Jan Beulich
2016-01-26 12:54                                         ` Juergen Gross
2016-01-26 14:44                                           ` Konrad Rzeszutek Wilk
2016-01-26 15:37                                             ` Jan Beulich
2016-01-26 15:57                                               ` Haozhong Zhang
2016-01-26 16:34                                                 ` Jan Beulich
2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
2016-01-27  7:22                                                     ` Haozhong Zhang
2016-01-27 10:16                                                     ` Jan Beulich
2016-01-27 14:50                                                       ` Konrad Rzeszutek Wilk
2016-01-27 10:55                                                   ` George Dunlap
2016-01-26 13:58                                         ` George Dunlap
2016-01-26 14:46                                           ` Konrad Rzeszutek Wilk
2016-01-26 15:30                                         ` Haozhong Zhang
2016-01-26 15:33                                           ` Haozhong Zhang
2016-01-26 15:57                                           ` Jan Beulich
2016-01-27  2:23                                             ` Haozhong Zhang
2016-01-20 15:07               ` Konrad Rzeszutek Wilk
2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
2016-01-06 15:47   ` Haozhong Zhang
2016-01-20  3:28 ` Tian, Kevin
2016-01-20 12:43   ` Stefano Stabellini
2016-01-20 14:26     ` Zhang, Haozhong
2016-01-20 14:35       ` Stefano Stabellini
2016-01-20 14:47         ` Zhang, Haozhong
2016-01-20 14:54           ` Andrew Cooper
2016-01-20 15:59             ` Haozhong Zhang
