* [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
@ 2017-09-11  4:37 Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
                   ` (40 more replies)
  0 siblings, 41 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Chao Peng, Dan Williams

Overview
==================

(RFC v2 can be found at https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)

This RFC v3 changes and grows a lot compared to the previous version.
The primary changes are listed below; most of them are intended to
simplify the first implementation and avoid further growth.

1. Drop the support for maintaining the frametable and M2P table of
   PMEM in RAM. In the future, we may add this support back.

2. Hide the host NFIT and deny access to host PMEM from Dom0. In
   other words, the kernel NVDIMM driver is not loaded in Dom0, and
   the existing management utilities (e.g. ndctl) do not work in Dom0
   anymore. This is to work around the interference of PMEM accesses
   between Dom0 and the Xen hypervisor. In the future, we may add a
   stub driver in Dom0 which will hold the PMEM pages being used by
   the Xen hypervisor and/or other domains.

3. As there is no NVDIMM driver and no management utilities in Dom0
   now, we cannot easily specify an area of host NVDIMM (e.g., by
   /dev/pmem0) or manage NVDIMM in Dom0 (e.g., creating labels).
   Instead, we have to specify the exact MFNs of host PMEM pages in
   xl domain configuration files and in the newly added Xen NVDIMM
   management utility xen-ndctl.

   If there are tasks that can only be handled by the existing driver
   and management utilities, such as recovery from hardware failures,
   they have to be accomplished outside of the Xen environment.

   Once item 2 is solved in the future, we will be able to make the
   existing driver and management utilities work in Dom0 again.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3


How to Test
==================

1. Build and install this patchset with the associated QEMU patches.

2. Use xen-ndctl to get a list of PMEM regions detected by the Xen
   hypervisor, e.g.
       
     # xen-ndctl list --raw
     Raw PMEM regions:
      0: MFN 0x480000 - 0x880000, PXM 3

   which indicates that a PMEM region is present at MFN 0x480000 -
   0x880000 in proximity domain (PXM) 3.

3. Set up a management area to manage the guest data areas.

     # xen-ndctl setup-mgmt 0x480000 0x4c0000
     # xen-ndctl list --mgmt
     Management PMEM regions:
      0: MFN 0x480000 - 0x4c0000, used 0xc00
 
   The first command sets up the PMEM area at MFN 0x480000 - 0x4c0000
   (1 GB) as a management area, which is also used to manage itself.
   The second command lists all management areas; the 'used' field
   shows the number of pages that have been used from the beginning
   of that area.

   The size ratio between a management area and the areas that it
   manages (including itself) should be at least 1 : 100 (i.e., 32
   bytes of frametable and 8 bytes of M2P table per 4 KB page); see
   the sizing sketch below.

   The size of a management area, as well as of a data area below, is
   currently restricted to a multiple of 256 MB. The alignment is
   restricted to a multiple of 2 MB.
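
   As a rough, illustrative sizing check (not part of this patch
   series), the sketch below applies the per-page figures quoted
   above (32 bytes of frametable + 8 bytes of M2P entry) to the
   example data area in step 4, rounding up to the 2 MB alignment.
   The actual 'used' value reported by xen-ndctl will be somewhat
   larger, since the management area also covers itself and is
   subject to further rounding inside the hypervisor.

     /* Illustrative sizing helper only; not part of this series. */
     #include <stdio.h>

     #define PAGE_SIZE      4096UL
     #define BYTES_PER_PAGE (32UL + 8UL)   /* frametable + M2P entry */
     #define PAGES_PER_2MB  512UL          /* 2 MB / 4 KB */

     static unsigned long mgmt_pages_needed(unsigned long nr_data_pages)
     {
         unsigned long bytes = nr_data_pages * BYTES_PER_PAGE;
         unsigned long pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;

         /* Round up to the 2 MB alignment mentioned above. */
         return (pages + PAGES_PER_2MB - 1) & ~(PAGES_PER_2MB - 1);
     }

     int main(void)
     {
         /* Data area from the example in step 4: MFN 0x4c0000 - 0x880000. */
         unsigned long nr = 0x880000UL - 0x4c0000UL;

         printf("~0x%lx management pages for 0x%lx data pages\n",
                mgmt_pages_needed(nr), nr);
         return 0;
     }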

4. Set up a data area that can be used by guests.

     # xen-ndctl setup-data 0x4c0000 0x880000 0x480c00 0x4c0000
     # xen-ndctl list --data
     Data PMEM regions:
      0: MFN 0x4c0000 - 0x880000, MGMT MFN 0x480c00 - 0x48b000

   The first command sets up the remaining PMEM pages, from MFN
   0x4c0000 to 0x880000, as a data area. The management pages from
   MFN 0x480c00 to 0x4c0000 are specified to manage this data area.
   The management pages actually used can be found with the second
   command.

5. Assign data pages to an HVM domain by adding the following line to
   the domain configuration.

     vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x100000' ]

   which assigns 4 GB of PMEM starting from MFN 0x4c0000 to that
   domain. A 4 GB PMEM device should be present in the guest (e.g.,
   as /dev/pmem0) after the above setup steps.

   There can be one or multiple entries in vnvdimms, which must not
   overlap with each other. Sharing PMEM pages between domains is not
   supported, so the PMEM pages assigned to different domains must
   not overlap with each other either (see the example below).
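
   For instance, the following hypothetical configuration (values
   made up for illustration, using the same 'type=mfn' syntax as
   above) assigns two non-overlapping 2 GB data regions to one
   domain:

     vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x80000',
                  'type=mfn, backend=0x540000, nr_pages=0x80000' ]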


Patch Organization
==================

This RFC v3 is composed of the following 6 parts, according to the
tasks they solve. The tool stack patches are collected and placed
into the corresponding parts.

- Part 0. Bug fix and code cleanup
    [01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
    [02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table()
    [03/39] x86_64/mm: avoid cleaning the unmapped frame table

- Part 1. Detect host PMEM
  Detect host PMEM via NFIT. No frametable or M2P table is created
  for the detected regions in this part.

    [04/39] xen/common: add Kconfig item for pmem support
    [05/39] x86/mm: exclude PMEM regions from initial frametable
    [06/39] acpi: probe valid PMEM regions via NFIT
    [07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
    [08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
    [09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
    [10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
    [11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
    [12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
    [13/39] tools/xen-ndctl: add command 'list'

- Part 2. Setup host PMEM for management and guest data usage
  Allow users or admins in Dom0 to set up host PMEM pages for
  management and guest data usage.
   * Management PMEM pages are used to store the frametable and M2P of
     PMEM pages (including themselves), and are never mapped to guests.
   * Guest data PMEM pages can be mapped to guests and used as the
     backend storage of virtual NVDIMM devices.

    [14/39] x86_64/mm: refactor memory_add()
    [15/39] x86_64/mm: allow customized location of extended frametable and M2P table
    [16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
    [17/39] tools/xen-ndctl: add command 'setup-mgmt'
    [18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
    [19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
    [20/39] tools/xen-ndctl: add option '--mgmt' to command 'list'
    [21/39] xen/pmem: support setup PMEM region for guest data usage
    [22/39] tools/xen-ndctl: add command 'setup-data'
    [23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
    [24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
    [25/39] tools/xen-ndctl: add option '--data' to command 'list'

- Part 3. Hypervisor support to map host PMEM pages to HVM domain
    [26/39] xen/pmem: add function to map PMEM pages to HVM domain
    [27/39] xen/pmem: release PMEM pages on HVM domain destruction
    [28/39] xen: add hypercall XENMEM_populate_pmem_map

- Part 4. Pass ACPI from QEMU to Xen
  Guest NFIT and NVDIMM namespace devices are built by QEMU. This part
  implements the interface for the device model to pass its ACPI (DM
  ACPI) to Xen, and loads DM ACPI. A simple blacklist mechanism is
  added to reject DM ACPI tables and namespace devices that may
  conflict with those built by Xen itself.

    [29/39] tools: reserve guest memory for ACPI from device model
    [30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc
    [31/39] tools/libacpi: add callback to translate GPA to GVA
    [32/39] tools/libacpi: add callbacks to access XenStore
    [33/39] tools/libacpi: add a simple AML builder
    [34/39] tools/libacpi: add DM ACPI blacklists
    [35/39] tools/libacpi: load ACPI built by the device model

- Part 5. Remaining tool stack changes
  Add xl domain configuration and generate new QEMU options for vNVDIMM.

    [36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices
    [37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors
    [38/39] tools/libxl: initiate PMEM mapping via QMP callback
    [39/39] tools/libxl: build qemu options from xl vNVDIMM configs


 .gitignore                              |   1 +
 docs/man/xl.cfg.pod.5.in                |  33 ++
 tools/firmware/hvmloader/Makefile       |   3 +-
 tools/firmware/hvmloader/util.c         |  75 ++++
 tools/firmware/hvmloader/util.h         |  10 +
 tools/firmware/hvmloader/xenbus.c       |  44 +-
 tools/flask/policy/modules/dom0.te      |   2 +-
 tools/flask/policy/modules/xen.if       |   2 +-
 tools/libacpi/acpi2_0.h                 |   2 +
 tools/libacpi/aml_build.c               | 326 ++++++++++++++
 tools/libacpi/aml_build.h               | 116 +++++
 tools/libacpi/build.c                   | 330 ++++++++++++++
 tools/libacpi/libacpi.h                 |  23 +
 tools/libxc/include/xc_dom.h            |   1 +
 tools/libxc/include/xenctrl.h           |  88 ++++
 tools/libxc/xc_dom_x86.c                |  13 +
 tools/libxc/xc_domain.c                 |  15 +
 tools/libxc/xc_misc.c                   | 157 +++++++
 tools/libxl/Makefile                    |   5 +-
 tools/libxl/libxl.h                     |   5 +
 tools/libxl/libxl_create.c              |   4 +-
 tools/libxl/libxl_dm.c                  |  81 +++-
 tools/libxl/libxl_dom.c                 |  25 ++
 tools/libxl/libxl_qmp.c                 | 139 +++++-
 tools/libxl/libxl_types.idl             |  16 +
 tools/libxl/libxl_vnvdimm.c             |  79 ++++
 tools/libxl/libxl_vnvdimm.h             |  30 ++
 tools/libxl/libxl_x86_acpi.c            |  36 ++
 tools/misc/Makefile                     |   4 +
 tools/misc/xen-ndctl.c                  | 399 +++++++++++++++++
 tools/xl/xl_parse.c                     | 125 +++++-
 tools/xl/xl_vmcontrol.c                 |  15 +-
 xen/arch/x86/acpi/boot.c                |   4 +
 xen/arch/x86/acpi/power.c               |   7 +
 xen/arch/x86/dom0_build.c               |   5 +
 xen/arch/x86/domain.c                   |  32 +-
 xen/arch/x86/mm.c                       | 123 ++++-
 xen/arch/x86/setup.c                    |   4 +
 xen/arch/x86/shutdown.c                 |   3 +
 xen/arch/x86/tboot.c                    |   4 +
 xen/arch/x86/x86_64/mm.c                | 309 +++++++++----
 xen/common/Kconfig                      |   8 +
 xen/common/Makefile                     |   1 +
 xen/common/compat/memory.c              |   1 +
 xen/common/domain.c                     |   3 +
 xen/common/kexec.c                      |   3 +
 xen/common/memory.c                     |  44 ++
 xen/common/pmem.c                       | 769 ++++++++++++++++++++++++++++++++
 xen/common/sysctl.c                     |   9 +
 xen/drivers/acpi/Makefile               |   2 +
 xen/drivers/acpi/nfit.c                 | 298 +++++++++++++
 xen/include/acpi/actbl1.h               |  69 +++
 xen/include/asm-x86/domain.h            |   1 +
 xen/include/asm-x86/mm.h                |  10 +-
 xen/include/public/hvm/hvm_xs_strings.h |   8 +
 xen/include/public/memory.h             |  14 +-
 xen/include/public/sysctl.h             | 100 ++++-
 xen/include/xen/acpi.h                  |  10 +
 xen/include/xen/pmem.h                  |  76 ++++
 xen/include/xen/sched.h                 |   3 +
 xen/include/xsm/dummy.h                 |  11 +
 xen/include/xsm/xsm.h                   |  12 +
 xen/xsm/dummy.c                         |   4 +
 xen/xsm/flask/hooks.c                   |  17 +
 xen/xsm/flask/policy/access_vectors     |   4 +
 65 files changed, 4044 insertions(+), 128 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h
 create mode 100644 tools/libxl/libxl_vnvdimm.c
 create mode 100644 tools/libxl/libxl_vnvdimm.h
 create mode 100644 tools/misc/xen-ndctl.c
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/drivers/acpi/nfit.c
 create mode 100644 xen/include/xen/pmem.h

-- 
2.14.1



* [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-10-27  6:49   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table() Haozhong Zhang
                   ` (39 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

The current check refuses hot-plugged memory that falls within a
single unused PDX group, which should be allowed.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 11746730b4..6c5221f90c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
         return 0;
 
     /* Make sure the new range is not present now */
-    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  & ~(PDX_GROUP_COUNT - 1))
-            / PDX_GROUP_COUNT;
+    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
     eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
-    if (sidx >= eidx)
-        return 0;
-
     s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
     if ( s > eidx )
         return 0;
-- 
2.14.1



* [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table()
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-10-27  6:58   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
identical to the former.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 6c5221f90c..c93383d7d9 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -720,12 +720,11 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
     spfn = info->spfn;
     epfn = info->epfn;
 
-    sva = (unsigned long)pdx_to_page(pfn_to_pdx(spfn));
-    eva = (unsigned long)pdx_to_page(pfn_to_pdx(epfn));
+    sva = (unsigned long)mfn_to_page(spfn);
+    eva = (unsigned long)mfn_to_page(epfn);
 
     /* Intialize all page */
-    memset(mfn_to_page(spfn), -1,
-           (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
+    memset((void *)sva, -1, eva - sva);
 
     while (sva < eva)
     {
-- 
2.14.1



* [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table() Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-10-27  8:10   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support Haozhong Zhang
                   ` (37 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

cleanup_frame_table() initializes the entire newly added frame table
to all -1's. If it's called after extend_frame_table() failed to map
the entire frame table, the initialization will hit a page fault.

Move the cleanup of partially mapped frametable to extend_frame_table(),
which has enough knowledge of the mapping status.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 51 ++++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c93383d7d9..f635e4bf70 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -710,15 +710,12 @@ void free_compat_arg_xlat(struct vcpu *v)
                               PFN_UP(COMPAT_ARG_XLAT_SIZE));
 }
 
-static void cleanup_frame_table(struct mem_hotadd_info *info)
+static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 {
+    struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
     unsigned long sva, eva;
     l3_pgentry_t l3e;
     l2_pgentry_t l2e;
-    unsigned long spfn, epfn;
-
-    spfn = info->spfn;
-    epfn = info->epfn;
 
     sva = (unsigned long)mfn_to_page(spfn);
     eva = (unsigned long)mfn_to_page(epfn);
@@ -744,7 +741,7 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
         if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) ==
               (_PAGE_PSE | _PAGE_PRESENT) )
         {
-            if (hotadd_mem_valid(l2e_get_pfn(l2e), info))
+            if ( hotadd_mem_valid(l2e_get_pfn(l2e), &info) )
                 destroy_xen_mappings(sva & ~((1UL << L2_PAGETABLE_SHIFT) - 1),
                          ((sva & ~((1UL << L2_PAGETABLE_SHIFT) -1 )) +
                             (1UL << L2_PAGETABLE_SHIFT) - 1));
@@ -769,28 +766,33 @@ static int setup_frametable_chunk(void *start, void *end,
 {
     unsigned long s = (unsigned long)start;
     unsigned long e = (unsigned long)end;
-    unsigned long mfn;
-    int err;
+    unsigned long cur, mfn;
+    int err = 0;
 
     ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
     ASSERT(!(e & ((1 << L2_PAGETABLE_SHIFT) - 1)));
 
-    for ( ; s < e; s += (1UL << L2_PAGETABLE_SHIFT))
+    for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
     {
         mfn = alloc_hotadd_mfn(info);
-        err = map_pages_to_xen(s, mfn, 1UL << PAGETABLE_ORDER,
+        err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
-            return err;
+            break;
     }
-    memset(start, -1, s - (unsigned long)start);
 
-    return 0;
+    if ( !err )
+        memset(start, -1, cur - s);
+    else
+        destroy_xen_mappings(s, cur);
+
+    return err;
 }
 
 static int extend_frame_table(struct mem_hotadd_info *info)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
+    int err = 0;
 
     spfn = info->spfn;
     epfn = info->epfn;
@@ -809,8 +811,6 @@ static int extend_frame_table(struct mem_hotadd_info *info)
 
     while ( cidx < eidx )
     {
-        int err;
-
         nidx = find_next_bit(pdx_group_valid, eidx, cidx);
         if ( nidx >= eidx )
             nidx = eidx;
@@ -818,14 +818,19 @@ static int extend_frame_table(struct mem_hotadd_info *info)
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
                                      info);
         if ( err )
-            return err;
+            break;
 
         cidx = find_next_zero_bit(pdx_group_valid, eidx, nidx);
     }
 
-    memset(mfn_to_page(spfn), 0,
-           (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
-    return 0;
+    if ( !err )
+        memset(mfn_to_page(spfn), 0,
+               (unsigned long)mfn_to_page(epfn) -
+               (unsigned long)mfn_to_page(spfn));
+    else
+        cleanup_frame_table(spfn, pdx_to_pfn(cidx * PDX_GROUP_COUNT));
+
+    return err;
 }
 
 void __init subarch_init_memory(void)
@@ -1404,8 +1409,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     info.cur = spfn;
 
     ret = extend_frame_table(&info);
-    if (ret)
-        goto destroy_frametable;
+    if ( ret )
+        goto restore_node_status;
 
     /* Set max_page as setup_m2p_table will use it*/
     if (max_page < epfn)
@@ -1448,8 +1453,8 @@ destroy_m2p:
     max_page = old_max;
     total_pages = old_total;
     max_pdx = pfn_to_pdx(max_page - 1) + 1;
-destroy_frametable:
-    cleanup_frame_table(&info);
+    cleanup_frame_table(spfn, epfn);
+restore_node_status:
     if ( !orig_online )
         node_set_offline(node);
     NODE_DATA(node)->node_start_pfn = old_node_start;
-- 
2.14.1



* [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (2 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Add CONFIG_NVDIMM_PMEM to enable NVDIMM persistent memory support. By
default, it is disabled (N).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dc8e876439..d4565b1c7b 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -279,4 +279,12 @@ config CMDLINE_OVERRIDE
 
 	  This is used to work around broken bootloaders. This should
 	  be set to 'N' under normal conditions.
+
+config NVDIMM_PMEM
+	bool "Persistent memory support"
+	default n
+	---help---
+	  Enable support for NVDIMM in the persistent memory mode.
+
+	  If unsure, say N.
 endmenu
-- 
2.14.1



* [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (3 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  5:58   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
                   ` (35 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, George Dunlap, Andrew Cooper, Jan Beulich,
	Chao Peng, Dan Williams

No specification forbids PMEM regions from appearing in the gaps
between RAM regions. If that does happen, init_frametable() would
need to allocate RAM for the part of the frametable covering those
PMEM regions. However, PMEM regions can be very large (several
terabytes or more), so init_frametable() may fail.

Because Xen does not use PMEM at boot time, we can defer the actual
resource allocation of the frametable of PMEM regions. At boot time,
all frametable pages of PMEM regions appearing between RAM regions
are mapped to a single RAM page filled with 0xff.

Any attempt to write to those frametable pages before their actual
resources are allocated implies a bug in Xen. Therefore, a read-only
mapping is used here to make such bugs explicit.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c         | 117 +++++++++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/setup.c      |   4 ++
 xen/drivers/acpi/Makefile |   2 +
 xen/drivers/acpi/nfit.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/acpi/actbl1.h |  43 +++++++++++++++++
 xen/include/xen/acpi.h    |   7 +++
 6 files changed, 278 insertions(+), 11 deletions(-)
 create mode 100644 xen/drivers/acpi/nfit.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e5a029c9be..2fdf609805 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -83,6 +83,9 @@
  * an application-supplied buffer).
  */
 
+#ifdef CONFIG_NVDIMM_PMEM
+#include <xen/acpi.h>
+#endif
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -196,31 +199,123 @@ static int __init parse_mmio_relax(const char *s)
 }
 custom_param("mmio-relax", parse_mmio_relax);
 
-static void __init init_frametable_chunk(void *start, void *end)
+static void __init init_frametable_ram_chunk(unsigned long s, unsigned long e)
 {
-    unsigned long s = (unsigned long)start;
-    unsigned long e = (unsigned long)end;
-    unsigned long step, mfn;
+    unsigned long cur, step, mfn;
 
-    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
-    for ( ; s < e; s += step << PAGE_SHIFT )
+    for ( cur = s; cur < e; cur += step << PAGE_SHIFT )
     {
         step = 1UL << (cpu_has_page1gb &&
-                       !(s & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
+                       !(cur & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
                        L3_PAGETABLE_SHIFT - PAGE_SHIFT :
                        L2_PAGETABLE_SHIFT - PAGE_SHIFT);
         /*
          * The hardcoded 4 below is arbitrary - just pick whatever you think
          * is reasonable to waste as a trade-off for using a large page.
          */
-        while ( step && s + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
+        while ( step && cur + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
             step >>= PAGETABLE_ORDER;
         mfn = alloc_boot_pages(step, step);
-        map_pages_to_xen(s, mfn, step, PAGE_HYPERVISOR);
+        map_pages_to_xen(cur, mfn, step, PAGE_HYPERVISOR);
     }
 
-    memset(start, 0, end - start);
-    memset(end, -1, s - e);
+    memset((void *)s, 0, e - s);
+    memset((void *)e, -1, cur - e);
+}
+
+#ifdef CONFIG_NVDIMM_PMEM
+static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e)
+{
+    static unsigned long pmem_init_frametable_mfn;
+
+    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
+
+    if ( !pmem_init_frametable_mfn )
+    {
+        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
+        if ( !pmem_init_frametable_mfn )
+            panic("Not enough memory for pmem initial frame table page");
+        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
+    }
+
+    while ( s < e )
+    {
+        /*
+         * The real frame table entries of a pmem region will be
+         * created when the pmem region is registered to hypervisor.
+         * Any write attempt to the initial entries of that pmem
+         * region implies potential hypervisor bugs. In order to make
+         * those bugs explicit, map those initial entries as read-only.
+         */
+        map_pages_to_xen(s, pmem_init_frametable_mfn, 1, PAGE_HYPERVISOR_RO);
+        s += PAGE_SIZE;
+    }
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
+static void __init init_frametable_chunk(void *start, void *end)
+{
+    unsigned long s = (unsigned long)start;
+    unsigned long e = (unsigned long)end;
+#ifdef CONFIG_NVDIMM_PMEM
+    unsigned long pmem_smfn, pmem_emfn;
+    unsigned long pmem_spage = s, pmem_epage = s;
+    unsigned long pmem_page_aligned;
+    bool found = false;
+#endif /* CONFIG_NVDIMM_PMEM */
+
+    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
+
+#ifndef CONFIG_NVDIMM_PMEM
+    init_frametable_ram_chunk(s, e);
+#else
+    while ( s < e )
+    {
+        /* No previous found pmem region overlaps with s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            found = acpi_nfit_boot_search_pmem(
+                mfn_x(page_to_mfn((struct page_info *)s)),
+                mfn_x(page_to_mfn((struct page_info *)e)),
+                &pmem_smfn, &pmem_emfn);
+            if ( found )
+            {
+                pmem_spage = (unsigned long)mfn_to_page(_mfn(pmem_smfn));
+                pmem_epage = (unsigned long)mfn_to_page(_mfn(pmem_emfn));
+            }
+        }
+
+        /* No pmem region found in s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            init_frametable_ram_chunk(s, e);
+            break;
+        }
+
+        if ( s < pmem_spage )
+        {
+            init_frametable_ram_chunk(s, pmem_spage);
+            pmem_page_aligned = (pmem_spage + PAGE_SIZE - 1) & PAGE_MASK;
+            if ( pmem_page_aligned > pmem_epage )
+                memset((void *)pmem_epage, -1, pmem_page_aligned - pmem_epage);
+            s = pmem_page_aligned;
+        }
+        else
+        {
+            pmem_page_aligned = pmem_epage & PAGE_MASK;
+            if ( pmem_page_aligned > s )
+                init_frametable_pmem_chunk(s, pmem_page_aligned);
+            if ( pmem_page_aligned < pmem_epage )
+            {
+                init_frametable_ram_chunk(pmem_page_aligned,
+                                          min(pmem_page_aligned + PAGE_SIZE, e));
+                memset((void *)pmem_page_aligned, -1,
+                       pmem_epage - pmem_page_aligned);
+            }
+            s = (pmem_epage + PAGE_SIZE - 1) & PAGE_MASK;
+        }
+    }
+#endif
 }
 
 void __init init_frametable(void)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3cbe305202..b9ebda8f4e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1358,6 +1358,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     BUILD_BUG_ON(MACH2PHYS_VIRT_START != RO_MPT_VIRT_START);
     BUILD_BUG_ON(MACH2PHYS_VIRT_END   != RO_MPT_VIRT_END);
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_boot_init();
+#endif
+
     init_frametable();
 
     if ( !acpi_boot_table_init_done )
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 444b11d583..c8bb869cb8 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o
 
 obj-$(CONFIG_X86) += hwregs.o
 obj-$(CONFIG_X86) += reboot.o
+
+obj-$(CONFIG_NVDIMM_PMEM) += nfit.o
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
new file mode 100644
index 0000000000..e099378ee0
--- /dev/null
+++ b/xen/drivers/acpi/nfit.c
@@ -0,0 +1,116 @@
+/*
+ * xen/drivers/acpi/nfit.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/acpi.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+
+/*
+ * GUID of a byte addressable persistent memory region
+ * (ref. ACPI 6.2, Section 5.2.25.2)
+ */
+static const uint8_t nfit_spa_pmem_guid[] =
+{
+    0x79, 0xd3, 0xf0, 0x66, 0xf3, 0xb4, 0x74, 0x40,
+    0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
+};
+
+struct acpi_nfit_desc {
+    struct acpi_table_nfit *acpi_table;
+};
+
+static struct acpi_nfit_desc nfit_desc;
+
+void __init acpi_nfit_boot_init(void)
+{
+    acpi_status status;
+    acpi_physical_address nfit_addr;
+    acpi_native_uint nfit_len;
+
+    status = acpi_get_table_phys(ACPI_SIG_NFIT, 0, &nfit_addr, &nfit_len);
+    if ( ACPI_FAILURE(status) )
+        return;
+
+    nfit_desc.acpi_table = (struct acpi_table_nfit *)__va(nfit_addr);
+    map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
+                     PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
+                     PAGE_HYPERVISOR);
+}
+
+/**
+ * Search pmem regions overlapped with the specified address range.
+ *
+ * Parameters:
+ *  @smfn, @emfn: the start and end MFN of address range to search
+ *  @ret_smfn, @ret_emfn: return the address range of the first pmem region
+ *                        in above range
+ *
+ * Return:
+ *  Return true if a pmem region is overlapped with @smfn - @emfn. The
+ *  start and end MFN of the lowest pmem region are returned via
+ *  @ret_smfn and @ret_emfn respectively.
+ *
+ *  Return false if no pmem region is overlapped with @smfn - @emfn.
+ */
+bool __init acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                       unsigned long *ret_smfn,
+                                       unsigned long *ret_emfn)
+{
+    struct acpi_table_nfit *nfit_table = nfit_desc.acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    unsigned long saddr = pfn_to_paddr(smfn), eaddr = pfn_to_paddr(emfn);
+    unsigned long ret_saddr = 0, ret_eaddr = 0;
+
+    if ( !nfit_table )
+        return false;
+
+    while ( hdr_offset < nfit_table->header.length )
+    {
+        struct acpi_nfit_header *hdr = (void *)nfit_table + hdr_offset;
+        struct acpi_nfit_system_address *spa;
+        unsigned long pmem_saddr, pmem_eaddr;
+
+        hdr_offset += hdr->length;
+
+        if ( hdr->type != ACPI_NFIT_TYPE_SYSTEM_ADDRESS )
+            continue;
+
+        spa = (struct acpi_nfit_system_address *)hdr;
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+
+        pmem_saddr = spa->address;
+        pmem_eaddr = pmem_saddr + spa->length;
+        if ( pmem_saddr >= eaddr || pmem_eaddr <= saddr )
+            continue;
+
+        if ( ret_saddr < pmem_saddr )
+            continue;
+        ret_saddr = pmem_saddr;
+        ret_eaddr = pmem_eaddr;
+    }
+
+    if ( ret_saddr == ret_eaddr )
+        return false;
+
+    *ret_smfn = paddr_to_pfn(ret_saddr);
+    *ret_emfn = paddr_to_pfn(ret_eaddr);
+
+    return true;
+}
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index e1991362dc..94d8d7775c 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -71,6 +71,7 @@
 #define ACPI_SIG_SBST           "SBST"	/* Smart Battery Specification Table */
 #define ACPI_SIG_SLIT           "SLIT"	/* System Locality Distance Information Table */
 #define ACPI_SIG_SRAT           "SRAT"	/* System Resource Affinity Table */
+#define ACPI_SIG_NFIT           "NFIT"	/* NVDIMM Firmware Interface Table */
 
 /*
  * All tables must be byte-packed to match the ACPI specification, since
@@ -903,6 +904,48 @@ struct acpi_msct_proximity {
 	u64 memory_capacity;	/* In bytes */
 };
 
+/*******************************************************************************
+ *
+ * NFIT - NVDIMM Interface Table (ACPI 6.0+)
+ *		  Version 1
+ *
+ ******************************************************************************/
+
+struct acpi_table_nfit {
+	struct acpi_table_header header;	/* Common ACPI table header */
+	u32 reserved;						/* Reserved, must be zero */
+};
+
+/* Subtable header for NFIT */
+
+struct acpi_nfit_header {
+	u16 type;
+	u16 length;
+};
+
+/* Values for subtable type in struct acpi_nfit_header */
+enum acpi_nfit_type {
+	ACPI_NFIT_TYPE_SYSTEM_ADDRESS = 0,
+	ACPI_NFIT_TYPE_MEMORY_MAP = 1,
+};
+
+/*
+ * NFIT Subtables
+ */
+
+/* 0: System Physical Address Range Structure */
+struct acpi_nfit_system_address {
+	struct acpi_nfit_header header;
+	u16 range_index;
+	u16 flags;
+	u32 reserved;		/* Reseved, must be zero */
+	u32 proximity_domain;
+	u8	range_guid[16];
+	u64 address;
+	u64 length;
+	u64 memory_mapping;
+};
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 9409350f05..1bd8f9f4e4 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -180,4 +180,11 @@ void acpi_reboot(void);
 void acpi_dmar_zap(void);
 void acpi_dmar_reinstate(void);
 
+#ifdef CONFIG_NVDIMM_PMEM
+void acpi_nfit_boot_init(void);
+bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                unsigned long *ret_smfn,
+                                unsigned long *ret_emfn);
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /*_LINUX_ACPI_H*/
-- 
2.14.1



* [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (4 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  6:15   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
                   ` (34 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

A PMEM region with failures (e.g., it was not properly flushed in the
last power cycle, or some blocks within it are broken) cannot be
safely used by Xen or guests. Scan the state flags of the NVDIMM
region mapping structures in the NFIT to check whether any failures
have happened to a PMEM region. Recovery from those failures is left
out of Xen (e.g., left to the firmware or other management utilities
on the bare metal).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/acpi/boot.c  |   4 ++
 xen/drivers/acpi/nfit.c   | 153 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/acpi/actbl1.h |  26 ++++++++
 xen/include/xen/acpi.h    |   1 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 8e6c96dcf6..f52a2c6dc5 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -732,5 +732,9 @@ int __init acpi_boot_init(void)
 
 	acpi_table_parse(ACPI_SIG_BGRT, acpi_invalidate_bgrt);
 
+#ifdef CONFIG_NVDIMM_PMEM
+	acpi_nfit_init();
+#endif
+
 	return 0;
 }
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index e099378ee0..b88a587b8d 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -31,11 +31,143 @@ static const uint8_t nfit_spa_pmem_guid[] =
     0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
 };
 
+struct nfit_spa_desc {
+    struct list_head link;
+    struct acpi_nfit_system_address *acpi_table;
+};
+
+struct nfit_memdev_desc {
+    struct list_head link;
+    struct acpi_nfit_memory_map *acpi_table;
+    struct nfit_spa_desc *spa_desc;
+};
+
 struct acpi_nfit_desc {
     struct acpi_table_nfit *acpi_table;
+    struct list_head spa_list;
+    struct list_head memdev_list;
 };
 
-static struct acpi_nfit_desc nfit_desc;
+static struct acpi_nfit_desc nfit_desc = {
+    .spa_list = LIST_HEAD_INIT(nfit_desc.spa_list),
+    .memdev_list = LIST_HEAD_INIT(nfit_desc.memdev_list),
+};
+
+static void __init acpi_nfit_del_subtables(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa, *spa_next;
+    struct nfit_memdev_desc *memdev, *memdev_next;
+
+    list_for_each_entry_safe(spa, spa_next, &desc->spa_list, link)
+    {
+        list_del(&spa->link);
+        xfree(spa);
+    }
+    list_for_each_entry_safe (memdev, memdev_next, &desc->memdev_list, link)
+    {
+        list_del(&memdev->link);
+        xfree(memdev);
+    }
+}
+
+static int __init acpi_nfit_add_subtables(struct acpi_nfit_desc *desc)
+{
+    struct acpi_table_nfit *nfit_table = desc->acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    uint32_t nfit_length = nfit_table->header.length;
+    struct acpi_nfit_header *hdr;
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    int ret = 0;
+
+#define INIT_DESC(desc, acpi_hdr, acpi_type, desc_list) \
+    do {                                                \
+        (desc) = xzalloc(typeof(*(desc)));              \
+        if ( unlikely(!(desc)) ) {                      \
+            ret = -ENOMEM;                              \
+            goto nomem;                                 \
+        }                                               \
+        (desc)->acpi_table = (acpi_type *)(acpi_hdr);   \
+        INIT_LIST_HEAD(&(desc)->link);                  \
+        list_add_tail(&(desc)->link, (desc_list));      \
+    } while ( 0 )
+
+    while ( hdr_offset < nfit_length )
+    {
+        hdr = (void *)nfit_table + hdr_offset;
+        hdr_offset += hdr->length;
+
+        switch ( hdr->type )
+        {
+        case ACPI_NFIT_TYPE_SYSTEM_ADDRESS:
+            INIT_DESC(spa_desc, hdr, struct acpi_nfit_system_address,
+                      &desc->spa_list);
+            break;
+
+        case ACPI_NFIT_TYPE_MEMORY_MAP:
+            INIT_DESC(memdev_desc, hdr, struct acpi_nfit_memory_map,
+                      &desc->memdev_list);
+            break;
+
+        default:
+            continue;
+        }
+    }
+
+#undef INIT_DESC
+
+    return 0;
+
+ nomem:
+    acpi_nfit_del_subtables(desc);
+
+    return ret;
+}
+
+static void __init acpi_nfit_link_subtables(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    uint16_t spa_idx;
+
+    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
+    {
+        spa_idx = memdev_desc->acpi_table->range_index;
+        list_for_each_entry(spa_desc, &desc->spa_list, link)
+        {
+            if ( spa_desc->acpi_table->range_index == spa_idx )
+                break;
+        }
+        memdev_desc->spa_desc = spa_desc;
+    }
+}
+
+static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    struct acpi_nfit_system_address *spa;
+    unsigned long smfn, emfn;
+
+    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
+    {
+        spa_desc = memdev_desc->spa_desc;
+
+        if ( !spa_desc ||
+             (memdev_desc->acpi_table->flags &
+              (ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED |
+               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
+               ACPI_NFIT_MEM_MAP_FAILED)) )
+            continue;
+
+        spa = spa_desc->acpi_table;
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+        smfn = paddr_to_pfn(spa->address);
+        emfn = paddr_to_pfn(spa->address + spa->length);
+        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
+    }
+}
 
 void __init acpi_nfit_boot_init(void)
 {
@@ -53,6 +185,25 @@ void __init acpi_nfit_boot_init(void)
                      PAGE_HYPERVISOR);
 }
 
+void __init acpi_nfit_init(void)
+{
+    if ( !nfit_desc.acpi_table )
+        return;
+
+    /* Collect all SPA and memory map sub-tables. */
+    if ( acpi_nfit_add_subtables(&nfit_desc) )
+    {
+        printk(XENLOG_ERR "NFIT: no memory for NFIT management\n");
+        return;
+    }
+
+    /* Link descriptors of SPA and memory map sub-tables. */
+    acpi_nfit_link_subtables(&nfit_desc);
+
+    /* Register valid pmem regions to Xen hypervisor. */
+    acpi_nfit_register_pmem(&nfit_desc);
+}
+
 /**
  * Search pmem regions overlapped with the specified address range.
  *
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index 94d8d7775c..037652916a 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -946,6 +946,32 @@ struct acpi_nfit_system_address {
 	u64 memory_mapping;
 };
 
+/* 1: Memory Device to System Address Range Map Structure */
+struct acpi_nfit_memory_map {
+	struct acpi_nfit_header header;
+	u32 device_handle;
+	u16 physical_id;
+	u16 region_id;
+	u16 range_index;
+	u16 region_index;
+	u64 region_size;
+	u64 region_offset;
+	u64 address;
+	u16 interleave_index;
+	u16 interleave_ways;
+	u16 flags;
+	u16 reserved;		/* Reserved, must be zero */
+};
+
+/* Flags in struct acpi_nfit_memory_map */
+#define ACPI_NFIT_MEM_SAVE_FAILED		(1)	/* 00: Last SAVE to Memory Device failed */
+#define ACPI_NFIT_MEM_RESTORE_FAILED	(1<<1)	/* 01: Last RESTORE from Memory Device failed */
+#define ACPI_NFIT_MEM_FLUSH_FAILED		(1<<2)	/* 02: Platform flush failed */
+#define ACPI_NFIT_MEM_NOT_ARMED			(1<<3)	/* 03: Memory Device is not armed */
+#define ACPI_NFIT_MEM_HEALTH_OBSERVED	(1<<4)	/* 04: Memory Device observed SMART/health events */
+#define ACPI_NFIT_MEM_HEALTH_ENABLED	(1<<5)	/* 05: SMART/health events enabled */
+#define ACPI_NFIT_MEM_MAP_FAILED		(1<<6)	/* 06: Mapping to SPA failed */
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 1bd8f9f4e4..088f01255d 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -185,6 +185,7 @@ void acpi_nfit_boot_init(void);
 bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
                                 unsigned long *ret_smfn,
                                 unsigned long *ret_emfn);
+void acpi_nfit_init(void);
 #endif /* CONFIG_NVDIMM_PMEM */
 
 #endif /*_LINUX_ACPI_H*/
-- 
2.14.1



* [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (5 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  6:26   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
                   ` (33 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Register valid PMEM regions probed via NFIT to the Xen hypervisor. No
frametable or M2P table is created for those PMEM regions at this
stage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/Makefile     |   1 +
 xen/common/pmem.c       | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/drivers/acpi/nfit.c |  12 ++++-
 xen/include/xen/pmem.h  |  28 +++++++++++
 4 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/include/xen/pmem.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 39e2614546..46f9d1f57f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -29,6 +29,7 @@ obj-y += notifier.o
 obj-y += page_alloc.o
 obj-$(CONFIG_HAS_PDX) += pdx.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
+obj-${CONFIG_NVDIMM_PMEM} += pmem.o
 obj-y += preempt.o
 obj-y += random.o
 obj-y += rangeset.o
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
new file mode 100644
index 0000000000..49648222a6
--- /dev/null
+++ b/xen/common/pmem.c
@@ -0,0 +1,130 @@
+/*
+ * xen/common/pmem.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/list.h>
+#include <xen/pmem.h>
+
+/*
+ * All PMEM regions presenting in NFIT SPA range structures are linked
+ * in this list.
+ */
+static LIST_HEAD(pmem_raw_regions);
+static unsigned int nr_raw_regions;
+
+struct pmem {
+    struct list_head link; /* link to one of PMEM region list */
+    unsigned long smfn;    /* start MFN of the PMEM region */
+    unsigned long emfn;    /* end MFN of the PMEM region */
+
+    union {
+        struct {
+            unsigned int pxm; /* proximity domain of the PMEM region */
+        } raw;
+    } u;
+};
+
+static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
+                          unsigned long smfn2, unsigned long emfn2)
+{
+    return (smfn1 >= smfn2 && smfn1 < emfn2) ||
+           (emfn1 > smfn2 && emfn1 <= emfn2);
+}
+
+/**
+ * Add a PMEM region to a list. All PMEM regions in the list are
+ * sorted in the ascending order of the start address. A PMEM region,
+ * whose range is overlapped with anyone in the list, cannot be added
+ * to the list.
+ *
+ * Parameters:
+ *  list:       the list to which a new PMEM region will be added
+ *  smfn, emfn: the range of the new PMEM region
+ *  entry:      return the new entry added to the list
+ *
+ * Return:
+ *  On success, return 0 and the new entry added to the list is
+ *  returned via @entry. Otherwise, return an error number and the
+ *  value of @entry is undefined.
+ */
+static int pmem_list_add(struct list_head *list,
+                         unsigned long smfn, unsigned long emfn,
+                         struct pmem **entry)
+{
+    struct list_head *cur;
+    struct pmem *new_pmem;
+    int rc = 0;
+
+    list_for_each_prev(cur, list)
+    {
+        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
+        unsigned long cur_smfn = cur_pmem->smfn;
+        unsigned long cur_emfn = cur_pmem->emfn;
+
+        if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) )
+        {
+            rc = -EEXIST;
+            goto out;
+        }
+
+        if ( cur_smfn < smfn )
+            break;
+    }
+
+    new_pmem = xzalloc(struct pmem);
+    if ( !new_pmem )
+    {
+        rc = -ENOMEM;
+        goto out;
+    }
+    new_pmem->smfn = smfn;
+    new_pmem->emfn = emfn;
+    list_add(&new_pmem->link, cur);
+
+ out:
+    if ( !rc && entry )
+        *entry = new_pmem;
+
+    return rc;
+}
+
+/**
+ * Register a pmem region to Xen.
+ *
+ * Parameters:
+ *  smfn, emfn: start and end MFNs of the pmem region
+ *  pxm:        the proximity domain of the pmem region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, an error number is returned.
+ */
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
+{
+    int rc;
+    struct pmem *pmem;
+
+    if ( smfn >= emfn )
+        return -EINVAL;
+
+    rc = pmem_list_add(&pmem_raw_regions, smfn, emfn, &pmem);
+    if ( !rc )
+        pmem->u.raw.pxm = pxm;
+    nr_raw_regions++;
+
+    return rc;
+}
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index b88a587b8d..68750c2edc 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -20,6 +20,7 @@
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
+#include <xen/pmem.h>
 
 /*
  * GUID of a byte addressable persistent memory region
@@ -148,6 +149,7 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
     struct nfit_memdev_desc *memdev_desc;
     struct acpi_nfit_system_address *spa;
     unsigned long smfn, emfn;
+    int rc;
 
     list_for_each_entry(memdev_desc, &desc->memdev_list, link)
     {
@@ -165,7 +167,15 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
             continue;
         smfn = paddr_to_pfn(spa->address);
         emfn = paddr_to_pfn(spa->address + spa->length);
-        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
+        rc = pmem_register(smfn, emfn, spa->proximity_domain);
+        if ( !rc )
+            printk(XENLOG_INFO
+                   "NFIT: PMEM MFNs 0x%lx - 0x%lx on PXM %u registered\n",
+                   smfn, emfn, spa->proximity_domain);
+        else
+            printk(XENLOG_ERR
+                   "NFIT: failed to register PMEM MFNs 0x%lx - 0x%lx on PXM %u, err %d\n",
+                   smfn, emfn, spa->proximity_domain, rc);
     }
 }
 
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
new file mode 100644
index 0000000000..41cb9bb04f
--- /dev/null
+++ b/xen/include/xen/pmem.h
@@ -0,0 +1,28 @@
+/*
+ * xen/include/xen/pmem.h
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_PMEM_H__
+#define __XEN_PMEM_H__
+#ifdef CONFIG_NVDIMM_PMEM
+
+#include <xen/types.h>
+
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+
+#endif /* CONFIG_NVDIMM_PMEM */
+#endif /* __XEN_PMEM_H__ */
-- 
2.14.1



* [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (6 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  6:51   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
                   ` (32 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Shane Wang,
	Chao Peng, Dan Williams, Gang Wei

... to avoid the interference with the PMEM driver and management
utilities in Dom0.
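
For reference, the hiding works by overwriting the NFIT signature in
place (see acpi_nfit_zap() below), the same approach Xen already uses
for DMAR, and by denying Dom0 access to the PMEM MFN ranges. The two
magic constants encode the zapped and the real signature on a
little-endian x86 host; the following standalone sketch (not part of
this patch) merely demonstrates that byte order:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      uint32_t zapped = 0x4e494654, real = 0x5449464e;
      char sig[5] = "";

      memcpy(sig, &zapped, 4);
      printf("zapped signature:     %s\n", sig); /* "TFIN": not looked up by Dom0 */
      memcpy(sig, &real, 4);
      printf("reinstated signature: %s\n", sig); /* "NFIT" */
      return 0;
  }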

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
---
 xen/arch/x86/acpi/power.c |  7 +++++++
 xen/arch/x86/dom0_build.c |  5 +++++
 xen/arch/x86/shutdown.c   |  3 +++
 xen/arch/x86/tboot.c      |  4 ++++
 xen/common/kexec.c        |  3 +++
 xen/common/pmem.c         | 21 +++++++++++++++++++++
 xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
 xen/include/xen/acpi.h    |  2 ++
 xen/include/xen/pmem.h    | 13 +++++++++++++
 9 files changed, 79 insertions(+)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 1e4e5680a7..d135715a49 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -178,6 +178,10 @@ static int enter_state(u32 state)
 
     freeze_domains();
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_reinstate();
+#endif
+
     acpi_dmar_reinstate();
 
     if ( (error = disable_nonboot_cpus()) )
@@ -260,6 +264,9 @@ static int enter_state(u32 state)
     mtrr_aps_sync_end();
     adjust_vtd_irq_affinities();
     acpi_dmar_zap();
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_zap();
+#endif
     thaw_domains();
     system_state = SYS_STATE_active;
     spin_unlock(&pm_lock);
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index f616b99ddc..10741e865a 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -8,6 +8,7 @@
 #include <xen/iocap.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/pmem.h>
 #include <xen/sched.h>
 #include <xen/sched-if.h>
 #include <xen/softirq.h>
@@ -452,6 +453,10 @@ int __init dom0_setup_permissions(struct domain *d)
             rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+#ifdef CONFIG_NVDIMM_PMEM
+    rc |= pmem_dom0_setup_permission(d);
+#endif
+
     return rc;
 }
 
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index a87aa60add..1902dfe73e 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -550,6 +550,9 @@ void machine_restart(unsigned int delay_millisecs)
 
     if ( tboot_in_measured_env() )
     {
+#ifdef CONFIG_NVDIMM_PMEM
+        acpi_nfit_reinstate();
+#endif
         acpi_dmar_reinstate();
         tboot_shutdown(TB_SHUTDOWN_REBOOT);
     }
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 59d7c477f4..24e3b81ff1 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -488,6 +488,10 @@ int __init tboot_parse_dmar_table(acpi_table_handler dmar_handler)
     /* but dom0 will read real table, so must zap it there too */
     acpi_dmar_zap();
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_zap();
+#endif
+
     return rc;
 }
 
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index fcc68bd4d8..c8c6138e71 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -366,6 +366,9 @@ static int kexec_common_shutdown(void)
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_reinstate();
+#endif
     acpi_dmar_reinstate();
 
     return 0;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 49648222a6..c9f5f6e904 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -18,6 +18,8 @@
 
 #include <xen/errno.h>
 #include <xen/list.h>
+#include <xen/iocap.h>
+#include <xen/paging.h>
 #include <xen/pmem.h>
 
 /*
@@ -128,3 +130,22 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
 
     return rc;
 }
+
+#ifdef CONFIG_X86
+
+int __init pmem_dom0_setup_permission(struct domain *d)
+{
+    struct list_head *cur;
+    struct pmem *pmem;
+    int rc = 0;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        pmem = list_entry(cur, struct pmem, link);
+        rc |= iomem_deny_access(d, pmem->smfn, pmem->emfn - 1);
+    }
+
+    return rc;
+}
+
+#endif /* CONFIG_X86 */
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index 68750c2edc..5f34cf2464 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -179,6 +179,24 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
     }
 }
 
+void acpi_nfit_zap(void)
+{
+    uint32_t sig = 0x4e494654; /* "TFIN" */
+
+    if ( nfit_desc.acpi_table )
+        write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+                     sig);
+}
+
+void acpi_nfit_reinstate(void)
+{
+    uint32_t sig = 0x5449464e; /* "NFIT" */
+
+    if ( nfit_desc.acpi_table )
+        write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+                     sig);
+}
+
 void __init acpi_nfit_boot_init(void)
 {
     acpi_status status;
@@ -193,6 +211,9 @@ void __init acpi_nfit_boot_init(void)
     map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
                      PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
                      PAGE_HYPERVISOR);
+
+    /* Hide NFIT from Dom0. */
+    acpi_nfit_zap();
 }
 
 void __init acpi_nfit_init(void)
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 088f01255d..77188193d0 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -186,6 +186,8 @@ bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
                                 unsigned long *ret_smfn,
                                 unsigned long *ret_emfn);
 void acpi_nfit_init(void);
+void acpi_nfit_zap(void);
+void acpi_nfit_reinstate(void);
 #endif /* CONFIG_NVDIMM_PMEM */
 
 #endif /*_LINUX_ACPI_H*/
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 41cb9bb04f..d5bd54ff19 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -24,5 +24,18 @@
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
 
+#ifdef CONFIG_X86
+
+int pmem_dom0_setup_permission(struct domain *d);
+
+#else /* !CONFIG_X86 */
+
+static inline int pmem_dom0_setup_permission(struct domain *d)
+{
+    return -ENOSYS;
+}
+
+#endif /* CONFIG_X86 */
+
 #endif /* CONFIG_NVDIMM_PMEM */
 #endif /* __XEN_PMEM_H__ */
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (7 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  7:40   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng,
	Dan Williams, Daniel De Graaf

XEN_SYSCTL_nvdimm_op will support a set of sub-commands to manage the
physical NVDIMM devices. This commit just adds the framework for this
hypercall, and does not implement any sub-commands.
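
As an illustration of the intended calling convention (a sketch only;
xc_nvdimm_do_op() is hypothetical and not part of this series), a
libxc-internal wrapper would fill the cmd/pad fields, issue the sysctl,
and translate the 'err' field, which the hypervisor sets to a positive
errno value, back into a negative return code:

  /* Requires libxc-internal headers (xc_private.h) for DECLARE_SYSCTL
   * and do_sysctl(). */
  static int xc_nvdimm_do_op(xc_interface *xch, uint32_t cmd)
  {
      DECLARE_SYSCTL;
      xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
      int rc;

      sysctl.cmd = XEN_SYSCTL_nvdimm_op;
      nvdimm->cmd = cmd;    /* a XEN_SYSCTL_nvdimm_pmem_* sub-command */
      nvdimm->pad = 0;
      nvdimm->err = 0;

      rc = do_sysctl(xch, &sysctl);
      if ( rc && nvdimm->err )
          rc = -nvdimm->err;    /* sub-command failure reported via 'err' */

      return rc;
  }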

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/flask/policy/modules/dom0.te  |  2 +-
 xen/common/pmem.c                   | 18 ++++++++++++++++++
 xen/common/sysctl.c                 |  9 +++++++++
 xen/include/public/sysctl.h         | 19 ++++++++++++++++++-
 xen/include/xen/pmem.h              |  2 ++
 xen/xsm/flask/hooks.c               |  4 ++++
 xen/xsm/flask/policy/access_vectors |  2 ++
 7 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index 338caaf41e..8a817b0b55 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -16,7 +16,7 @@ allow dom0_t xen_t:xen {
 allow dom0_t xen_t:xen2 {
 	resource_op psr_cmt_op psr_cat_op pmu_ctrl get_symbol
 	get_cpu_levelling_caps get_cpu_featureset livepatch_op
-	gcov_op set_parameter
+	gcov_op set_parameter nvdimm_op
 };
 
 # Allow dom0 to use all XENVER_ subops that have checks.
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index c9f5f6e904..d67f237cd5 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -131,6 +131,24 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
     return rc;
 }
 
+/**
+ * Top-level hypercall handler of XEN_SYSCTL_nvdimm_pmem_*.
+ *
+ * Parameters:
+ *  nvdimm: the hypercall parameters
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
+{
+    int rc = -ENOSYS;
+
+    nvdimm->err = -rc;
+
+    return rc;
+}
+
 #ifdef CONFIG_X86
 
 int __init pmem_dom0_setup_permission(struct domain *d)
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index a6882d1c9d..33c8fca081 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/pmstat.h>
 #include <xen/livepatch.h>
 #include <xen/gcov.h>
+#include <xen/pmem.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -503,6 +504,14 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         break;
     }
 
+#ifdef CONFIG_NVDIMM_PMEM
+    case XEN_SYSCTL_nvdimm_op:
+        ret = pmem_do_sysctl(&op->u.nvdimm);
+        if ( ret != -ENOSYS )
+            copyback = 1;
+        break;
+#endif
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7830b987da..e8272ae968 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -36,7 +36,7 @@
 #include "physdev.h"
 #include "tmem.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000F
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000010
 
 /*
  * Read console content from Xen buffer ring.
@@ -1114,6 +1114,21 @@ struct xen_sysctl_set_parameter {
 typedef struct xen_sysctl_set_parameter xen_sysctl_set_parameter_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 
+/*
+ * Interface for NVDIMM management.
+ */
+
+struct xen_sysctl_nvdimm_op {
+    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+    uint32_t pad; /* IN: Always zero. */
+    union {
+        /* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+    } u;
+    uint32_t err; /* OUT: error code */
+};
+typedef struct xen_sysctl_nvdimm_op xen_sysctl_nvdimm_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -1143,6 +1158,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_get_cpu_featureset            26
 #define XEN_SYSCTL_livepatch_op                  27
 #define XEN_SYSCTL_set_parameter                 28
+#define XEN_SYSCTL_nvdimm_op                     29
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -1172,6 +1188,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpu_featureset    cpu_featureset;
         struct xen_sysctl_livepatch_op      livepatch;
         struct xen_sysctl_set_parameter     set_parameter;
+        struct xen_sysctl_nvdimm_op         nvdimm;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index d5bd54ff19..922b12f570 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -20,9 +20,11 @@
 #define __XEN_PMEM_H__
 #ifdef CONFIG_NVDIMM_PMEM
 
+#include <public/sysctl.h>
 #include <xen/types.h>
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 
 #ifdef CONFIG_X86
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 56dc5b0ab9..edfe529495 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -832,6 +832,10 @@ static int flask_sysctl(int cmd)
         return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
                                     XEN2__SET_PARAMETER, NULL);
 
+    case XEN_SYSCTL_nvdimm_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__NVDIMM_OP, NULL);
+
     default:
         return avc_unknown_permission("sysctl", cmd);
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index da9f3dfb2e..af05826064 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -103,6 +103,8 @@ class xen2
     gcov_op
 # XEN_SYSCTL_set_parameter
     set_parameter
+# XEN_SYSCTL_nvdimm_op
+    nvdimm_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (8 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

XEN_SYSCTL_nvdimm_pmem_get_regions_nr, which is a command of the
hypercall XEN_SYSCTL_nvdimm_op, is used to get the number of PMEM
regions of the specified type (see PMEM_REGION_TYPE_*).
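
As an illustration (a sketch, not part of this patch), a minimal caller
of the new libxc wrapper could look like:

  #include <stdio.h>
  #include <xenctrl.h>

  int main(void)
  {
      xc_interface *xch = xc_interface_open(0, 0, 0);
      uint32_t nr = 0;
      int rc;

      if ( !xch )
          return 1;

      rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
      if ( rc )
          fprintf(stderr, "query failed: %d\n", rc);
      else
          printf("%u raw PMEM region(s) detected\n", nr);

      xc_interface_close(xch);
      return rc ? 1 : 0;
  }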

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h | 15 +++++++++++++++
 tools/libxc/xc_misc.c         | 24 ++++++++++++++++++++++++
 xen/common/pmem.c             | 29 ++++++++++++++++++++++++++++-
 xen/include/public/sysctl.h   | 16 ++++++++++++++--
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 43151cb415..e4d26967ba 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2572,6 +2572,21 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
                          xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+/*
+ * Get the number of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:  xc interface handle
+ *  type: the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  nr:   the number of PMEM regions is returned via this parameter
+ *
+ * Return:
+ *  On success, return 0 and the number of PMEM regions is returned via @nr.
+ *  Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
+                                  uint8_t type, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 7e15e904e3..fa66410869 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -888,6 +888,30 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout)
     return _xc_livepatch_action(xch, name, LIVEPATCH_ACTION_REPLACE, timeout);
 }
 
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
+{
+    DECLARE_SYSCTL;
+    xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
+    int rc;
+
+    if ( !nr || type != PMEM_REGION_TYPE_RAW )
+        return -EINVAL;
+
+    sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions_nr;
+    nvdimm->pad = 0;
+    nvdimm->u.pmem_regions_nr.type = type;
+    nvdimm->err = 0;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        *nr = nvdimm->u.pmem_regions_nr.num_regions;
+    else if ( nvdimm->err )
+        rc = -nvdimm->err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index d67f237cd5..995dfcb867 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -105,6 +105,23 @@ static int pmem_list_add(struct list_head *list,
     return rc;
 }
 
+static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
+{
+    int rc = 0;
+
+    switch ( regions_nr->type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        regions_nr->num_regions = nr_raw_regions;
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -142,7 +159,17 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
  */
 int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 {
-    int rc = -ENOSYS;
+    int rc;
+
+    switch ( nvdimm->cmd )
+    {
+    case XEN_SYSCTL_nvdimm_pmem_get_regions_nr:
+        rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
 
     nvdimm->err = -rc;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index e8272ae968..cf308bbc45 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1118,11 +1118,23 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
  * Interface for NVDIMM management.
  */
 
+/* Types of PMEM regions */
+#define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
+
+/* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
+struct xen_sysctl_nvdimm_pmem_regions_nr {
+    uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
+    uint32_t num_regions; /* OUT: the number of PMEM regions of type @type */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
+
 struct xen_sysctl_nvdimm_op {
-    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
+#define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
     uint32_t pad; /* IN: Always zero. */
     union {
-        /* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+        xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
     } u;
     uint32_t err; /* OUT: error code */
 };
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (9 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

XEN_SYSCTL_nvdimm_pmem_get_regions, which is a command of the hypercall
XEN_SYSCTL_nvdimm_op, is used to get a list of PMEM regions of the
specified type (see PMEM_REGION_TYPE_*).
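
As an illustration (a sketch, not part of this patch; essentially what
the xen-ndctl 'list' command added later in this series does), a caller
sizes the buffer with xc_nvdimm_pmem_get_regions_nr() and then fetches
the array:

  #include <errno.h>
  #include <inttypes.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <xenctrl.h>

  static int dump_raw_regions(xc_interface *xch)
  {
      xen_sysctl_nvdimm_pmem_raw_region_t *regions;
      uint32_t nr = 0, i;
      int rc;

      rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
      if ( rc || !nr )
          return rc;

      regions = calloc(nr, sizeof(*regions));
      if ( !regions )
          return -ENOMEM;

      rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, regions, &nr);
      if ( !rc )
          for ( i = 0; i < nr; i++ )
              printf("%u: MFN 0x%"PRIx64" - 0x%"PRIx64", PXM %u\n",
                     i, regions[i].smfn, regions[i].emfn, regions[i].pxm);

      free(regions);
      return rc;
  }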

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h | 18 ++++++++++++
 tools/libxc/xc_misc.c         | 63 ++++++++++++++++++++++++++++++++++++++++
 xen/common/pmem.c             | 67 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h   | 27 +++++++++++++++++
 4 files changed, 175 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index e4d26967ba..d750e67460 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2587,6 +2587,24 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
                                   uint8_t type, uint32_t *nr);
 
+/*
+ * Get an array of information of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:    xc interface handle
+ *  type:   the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  buffer: the buffer where the information of PMEM regions is returned,
+ *          the caller should allocate enough memory for it.
+ *  nr :    IN: the maximum number of PMEM regions that can be returned
+ *              in @buffer
+ *          OUT: the actual number of returned PMEM regions in @buffer
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+                               void *buffer, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index fa66410869..f9ce802eda 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -912,6 +912,69 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
     return rc;
 }
 
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+                               void *buffer, uint32_t *nr)
+{
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(buffer, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
+    xen_sysctl_nvdimm_pmem_regions_t *regions = &nvdimm->u.pmem_regions;
+    unsigned int max;
+    unsigned long size;
+    int rc;
+
+    if ( !buffer || !nr )
+        return -EINVAL;
+
+    max = *nr;
+    if ( !max )
+        return 0;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(buffer, size);
+    if ( xc_hypercall_bounce_pre(xch, buffer) )
+        return -EFAULT;
+
+    sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions;
+    nvdimm->pad = 0;
+    nvdimm->err = 0;
+    regions->type = type;
+    regions->num_regions = max;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
+        break;
+
+    default:
+        rc = -EINVAL;
+        goto out;
+    }
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        *nr = regions->num_regions;
+    else if ( nvdimm->err )
+        rc = -nvdimm->err;
+
+out:
+    xc_hypercall_bounce_post(xch, buffer);
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 995dfcb867..a737e7dc71 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -22,6 +22,8 @@
 #include <xen/paging.h>
 #include <xen/pmem.h>
 
+#include <asm/guest_access.h>
+
 /*
  * All PMEM regions presenting in NFIT SPA range structures are linked
  * in this list.
@@ -122,6 +124,67 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
     return rc;
 }
 
+static int pmem_get_raw_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_raw_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.pxm = pmem->u.raw.pxm;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    *num_regions = nr;
+
+    return rc;
+}
+
+static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
+{
+    unsigned int type = regions->type, max = regions->num_regions;
+    int rc = 0;
+
+    if ( !max )
+        return 0;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    if ( !rc )
+        regions->num_regions = max;
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -167,6 +230,10 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
         rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
         break;
 
+    case XEN_SYSCTL_nvdimm_pmem_get_regions:
+        rc = pmem_get_regions(&nvdimm->u.pmem_regions);
+        break;
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index cf308bbc45..2635b1c911 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1121,6 +1121,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
 
+/* PMEM_REGION_TYPE_RAW */
+struct xen_sysctl_nvdimm_pmem_raw_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint32_t pxm;
+};
+typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1129,12 +1138,30 @@ struct xen_sysctl_nvdimm_pmem_regions_nr {
 typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
 
+/* XEN_SYSCTL_nvdimm_pmem_get_regions */
+struct xen_sysctl_nvdimm_pmem_regions {
+    uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
+    uint32_t num_regions; /* IN: the maximum number of entries that can be
+                                 returned via the guest handle in @u_buffer
+                             OUT: the actual number of entries returned via
+                                  the guest handle in @u_buffer */
+    union {
+        /* if type == PMEM_REGION_TYPE_RAW */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+    } u_buffer;           /* IN: the guest handle via which the entries of
+                                 PMEM regions of the type @type are returned */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions xen_sysctl_nvdimm_pmem_regions_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
 #define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
+#define XEN_SYSCTL_nvdimm_pmem_get_regions        1
     uint32_t pad; /* IN: Always zero. */
     union {
         xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
+        xen_sysctl_nvdimm_pmem_regions_t pmem_regions;
     } u;
     uint32_t err; /* OUT: error code */
 };
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (10 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  5:10   ` Dan Williams
  2017-09-11  4:37 ` [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list' Haozhong Zhang
                   ` (28 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The kernel NVDIMM driver and the traditional NVDIMM management
utilities do not work in Dom0 now. 'xen-ndctl' is added as an
alternative, which manages NVDIMM via Xen hypercalls.
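
As an illustration (a hypothetical entry: the real 'setup-mgmt' command
is added by a later patch in this series, and its exact syntax and
handler name here are assumptions), extending the tool only requires a
new entry in the cmds[] table added below; main() then opens an xc
handle automatically for commands that set .need_xc before dispatching
to the handler:

      {
          .name    = "setup-mgmt",
          .syntax  = "<smfn> <emfn>",
          .help    = "Set up the given PMEM MFN range for management usage.\n",
          .handler = handle_setup_mgmt,  /* hypothetical handler */
          .need_xc = true,
      },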

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 .gitignore             |   1 +
 tools/misc/Makefile    |   4 ++
 tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 177 insertions(+)
 create mode 100644 tools/misc/xen-ndctl.c

diff --git a/.gitignore b/.gitignore
index ecb198f914..30655673f7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -216,6 +216,7 @@ tools/misc/xen-hvmctx
 tools/misc/xenlockprof
 tools/misc/lowmemd
 tools/misc/xencov
+tools/misc/xen-ndctl
 tools/pkg-config/*
 tools/qemu-xen-build
 tools/xentrace/xenalyze
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index eaa28793ef..124775b7f4 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -32,6 +32,7 @@ INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
 INSTALL_SBIN                   += xen-livepatch
 INSTALL_SBIN                   += xen-diag
+INSTALL_SBIN                   += xen-ndctl
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -118,4 +119,7 @@ xen-lowmemd: xen-lowmemd.o
 xencov: xencov.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-ndctl: xen-ndctl.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 -include $(DEPS_INCLUDE)
diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
new file mode 100644
index 0000000000..de40e29ff6
--- /dev/null
+++ b/tools/misc/xen-ndctl.c
@@ -0,0 +1,172 @@
+/*
+ * xen-ndctl.c
+ *
+ * Xen NVDIMM management tool
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without restriction,
+ * including without limitation the rights to use, copy, modify, merge,
+ * publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so,
+ * subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <xenctrl.h>
+
+static xc_interface *xch;
+
+static int handle_help(int argc, char *argv[]);
+static int handle_list_cmds(int argc, char *argv[]);
+
+static const struct xen_ndctl_cmd
+{
+    const char *name;
+    const char *syntax;
+    const char *help;
+    int (*handler)(int argc, char **argv);
+    bool need_xc;
+} cmds[] =
+{
+    {
+        .name    = "help",
+        .syntax  = "[command]",
+        .help    = "Show this message or the help message of 'command'.\n"
+                   "Use command 'list-cmds' to list all supported commands.\n",
+        .handler = handle_help,
+    },
+
+    {
+        .name    = "list-cmds",
+        .syntax  = "",
+        .help    = "List all supported commands.\n",
+        .handler = handle_list_cmds,
+    },
+};
+
+static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]);
+
+static void show_help(const char *cmd)
+{
+    unsigned int i;
+
+    if ( !cmd )
+    {
+        fprintf(stderr,
+                "Usage: xen-ndctl <command> [args]\n\n"
+                "List all supported commands by 'xen-ndctl list-cmds'.\n"
+                "Get help of a command by 'xen-ndctl help <command>'.\n");
+        return;
+    }
+
+    for ( i = 0; i < nr_cmds; i++ )
+        if ( !strcmp(cmd, cmds[i].name) )
+        {
+            fprintf(stderr, "Usage: xen-ndctl %s %s\n\n%s",
+                    cmds[i].name, cmds[i].syntax, cmds[i].help);
+            break;
+        }
+
+    if ( i == nr_cmds )
+        fprintf(stderr, "Unsupported command '%s'.\n"
+                "List all supported commands by 'xen-ndctl list-cmds'.\n",
+                cmd);
+}
+
+static int handle_unrecognized_argument(const char *cmd, const char *argv)
+{
+    fprintf(stderr, "Unrecognized argument: %s.\n\n", argv);
+    show_help(cmd);
+
+    return -EINVAL;
+}
+
+static int handle_help(int argc, char *argv[])
+{
+    if ( argc == 1 )
+        show_help(NULL);
+    else if ( argc == 2 )
+        show_help(argv[1]);
+    else
+        return handle_unrecognized_argument(argv[0], argv[2]);
+
+    return 0;
+}
+
+static int handle_list_cmds(int argc, char *argv[])
+{
+    unsigned int i;
+
+    if ( argc > 1 )
+        return handle_unrecognized_argument(argv[0], argv[1]);
+
+    for ( i = 0; i < nr_cmds; i++ )
+        fprintf(stderr, "%s\n", cmds[i].name);
+
+    return 0;
+}
+
+int main(int argc, char *argv[])
+{
+    unsigned int i;
+    int rc = 0;
+    const char *cmd;
+
+    if ( argc <= 1 )
+    {
+        show_help(NULL);
+        return 0;
+    }
+
+    cmd = argv[1];
+
+    for ( i = 0; i < nr_cmds; i++ )
+        if ( !strcmp(cmd, cmds[i].name) )
+        {
+            if ( cmds[i].need_xc )
+            {
+                xch = xc_interface_open(0, 0, 0);
+                if ( !xch )
+                {
+                    rc = -errno;
+                    fprintf(stderr, "Cannot get xc handler: %s\n",
+                            strerror(errno));
+                    break;
+                }
+            }
+            rc = cmds[i].handler(argc - 1, &argv[1]);
+            if ( rc )
+                fprintf(stderr, "\n'%s' failed: %s\n",
+                        cmds[i].name, strerror(-rc));
+            break;
+        }
+
+    if ( i == nr_cmds )
+    {
+        fprintf(stderr, "Unsupported command '%s'. "
+                "List all supported commands by 'xen-ndctl list-cmds'.\n",
+                cmd);
+        rc = -ENOSYS;
+    }
+
+    if ( xch )
+        xc_interface_close(xch);
+
+    return rc;
+}
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (11 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add() Haozhong Zhang
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

Two options are supported by the command 'list'. '--raw' lists all
PMEM regions detected by the Xen hypervisor, which can later be
configured for other usages. '--all', the default, implies all other
options (i.e. '--raw' and any options added in the future).
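
As an illustration (a sketch; '--mgmt' and handle_list_mgmt() are
hypothetical here), a new list option only needs another entry in the
list_hndrs[] table added below, and 'list --all' then covers it
automatically because handle_list() iterates over the whole table:

  static const struct list_handlers list_hndrs[] =
  {
      { "--raw",  handle_list_raw },
      { "--mgmt", handle_list_mgmt },  /* hypothetical future option */
  };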

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index de40e29ff6..6277a1eda2 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -27,12 +27,14 @@
 
 #include <errno.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <xenctrl.h>
 
 static xc_interface *xch;
 
 static int handle_help(int argc, char *argv[]);
+static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
@@ -52,6 +54,15 @@ static const struct xen_ndctl_cmd
         .handler = handle_help,
     },
 
+    {
+        .name    = "list",
+        .syntax  = "[--all | --raw ]",
+        .help    = "--all: the default option, list all PMEM regions of following types.\n"
+                   "--raw: list all PMEM regions detected by Xen hypervisor.\n",
+        .handler = handle_list,
+        .need_xc = true,
+    },
+
     {
         .name    = "list-cmds",
         .syntax  = "",
@@ -109,6 +120,70 @@ static int handle_help(int argc, char *argv[])
     return 0;
 }
 
+static int handle_list_raw(void)
+{
+    int rc;
+    unsigned int nr = 0, i;
+    xen_sysctl_nvdimm_pmem_raw_region_t *raw_list;
+
+    rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
+    if ( rc )
+    {
+        fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+                strerror(-rc));
+        return rc;
+    }
+
+    raw_list = malloc(nr * sizeof(*raw_list));
+    if ( !raw_list )
+        return -ENOMEM;
+
+    rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, raw_list, &nr);
+    if ( rc )
+        goto out;
+
+    printf("Raw PMEM regions:\n");
+    for ( i = 0; i < nr; i++ )
+        printf(" %u: MFN 0x%lx - 0x%lx, PXM %u\n",
+               i, raw_list[i].smfn, raw_list[i].emfn, raw_list[i].pxm);
+
+ out:
+    free(raw_list);
+
+    return rc;
+}
+
+static const struct list_handlers {
+    const char *option;
+    int (*handler)(void);
+} list_hndrs[] =
+{
+    { "--raw", handle_list_raw },
+};
+
+static const unsigned int nr_list_hndrs =
+    sizeof(list_hndrs) / sizeof(list_hndrs[0]);
+
+static int handle_list(int argc, char *argv[])
+{
+    bool list_all = argc <= 1 || !strcmp(argv[1], "--all");
+    unsigned int i;
+    bool handled = false;
+    int rc = 0;
+
+    for ( i = 0; i < nr_list_hndrs && !rc; i++)
+        if ( list_all || !strcmp(argv[1], list_hndrs[i].option) )
+        {
+            rc = list_hndrs[i].handler();
+            handled = true;
+        }
+
+    if ( !handled )
+        return handle_unrecognized_argument(argv[0], argv[1]);
+
+    return rc;
+}
+
 static int handle_list_cmds(int argc, char *argv[])
 {
     unsigned int i;
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add()
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (12 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list' Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Separate the revertible part of memory_add() into a new function
memory_add_common(), which will also be used in PMEM management. The
separation will ease the failure recovery in PMEM management. Several
coding-style issues in the touched code are fixed as well.

No functional change is introduced.
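
As an illustration of how the split is meant to be reused (a sketch
under assumptions; pmem_setup_region() is hypothetical, and the
allocator hook added by the next patch is omitted here), a
PMEM-management caller could invoke only the revertible part and skip
the direct map:

  static int pmem_setup_region(unsigned long smfn, unsigned long emfn,
                               unsigned int pxm)
  {
      struct mem_hotadd_info info =
          { .spfn = smfn, .epfn = emfn, .cur = smfn };

      /*
       * No share_hotadd_m2p_table()/transfer_pages_to_heap() here: if
       * this fails, memory_add_common() has already reverted its own
       * changes, and PMEM pages are not handed to the heap anyway.
       */
      return memory_add_common(&info, pxm, false /* no direct map */);
  }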

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 98 +++++++++++++++++++++++++++---------------------
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index f635e4bf70..c8ffafe8a8 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1337,21 +1337,16 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
     return 1;
 }
 
-/*
- * A bit paranoid for memory allocation failure issue since
- * it may be reason for memory add
- */
-int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+static int memory_add_common(struct mem_hotadd_info *info,
+                             unsigned int pxm, bool direct_map)
 {
-    struct mem_hotadd_info info;
+    unsigned long spfn = info->spfn, epfn = info->epfn;
     int ret;
     nodeid_t node;
     unsigned long old_max = max_page, old_total = total_pages;
     unsigned long old_node_start, old_node_span, orig_online;
     unsigned long i;
 
-    dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
-
     if ( !mem_hotadd_check(spfn, epfn) )
         return -EINVAL;
 
@@ -1366,22 +1361,25 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
         return -EINVAL;
     }
 
-    i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
-    if ( spfn < i )
-    {
-        ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
-                               min(epfn, i) - spfn, PAGE_HYPERVISOR);
-        if ( ret )
-            goto destroy_directmap;
-    }
-    if ( i < epfn )
+    if ( direct_map )
     {
-        if ( i < spfn )
-            i = spfn;
-        ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
-                               epfn - i, __PAGE_HYPERVISOR_RW);
-        if ( ret )
-            goto destroy_directmap;
+        i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
+        if ( spfn < i )
+        {
+            ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
+                                   min(epfn, i) - spfn, PAGE_HYPERVISOR);
+            if ( ret )
+                goto destroy_directmap;
+        }
+        if ( i < epfn )
+        {
+            if ( i < spfn )
+                i = spfn;
+            ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
+                                   epfn - i, __PAGE_HYPERVISOR_RW);
+            if ( ret )
+                goto destroy_directmap;
+        }
     }
 
     old_node_start = node_start_pfn(node);
@@ -1398,22 +1396,18 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     }
     else
     {
-        if (node_start_pfn(node) > spfn)
+        if ( node_start_pfn(node) > spfn )
             NODE_DATA(node)->node_start_pfn = spfn;
-        if (node_end_pfn(node) < epfn)
+        if ( node_end_pfn(node) < epfn )
             NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
     }
 
-    info.spfn = spfn;
-    info.epfn = epfn;
-    info.cur = spfn;
-
-    ret = extend_frame_table(&info);
+    ret = extend_frame_table(info);
     if ( ret )
         goto restore_node_status;
 
     /* Set max_page as setup_m2p_table will use it*/
-    if (max_page < epfn)
+    if ( max_page < epfn )
     {
         max_page = epfn;
         max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1421,7 +1415,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(&info);
+    ret = setup_m2p_table(info);
 
     if ( ret )
         goto destroy_m2p;
@@ -1429,11 +1423,12 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
     {
         for ( i = spfn; i < epfn; i++ )
-            if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
+            if ( iommu_map_page(hardware_domain, i, i,
+                                IOMMUF_readable|IOMMUF_writable) )
                 break;
         if ( i != epfn )
         {
-            while (i-- > old_max)
+            while ( i-- > old_max )
                 /* If statement to satisfy __must_check. */
                 if ( iommu_unmap_page(hardware_domain, i) )
                     continue;
@@ -1442,14 +1437,10 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
         }
     }
 
-    /* We can't revert any more */
-    share_hotadd_m2p_table(&info);
-    transfer_pages_to_heap(&info);
-
     return 0;
 
 destroy_m2p:
-    destroy_m2p_mapping(&info);
+    destroy_m2p_mapping(info);
     max_page = old_max;
     total_pages = old_total;
     max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1459,9 +1450,32 @@ restore_node_status:
         node_set_offline(node);
     NODE_DATA(node)->node_start_pfn = old_node_start;
     NODE_DATA(node)->node_spanned_pages = old_node_span;
- destroy_directmap:
-    destroy_xen_mappings((unsigned long)mfn_to_virt(spfn),
-                         (unsigned long)mfn_to_virt(epfn));
+destroy_directmap:
+    if ( direct_map )
+        destroy_xen_mappings((unsigned long)mfn_to_virt(spfn),
+                             (unsigned long)mfn_to_virt(epfn));
+
+    return ret;
+}
+
+/*
+ * A bit paranoid for memory allocation failure issue since
+ * it may be reason for memory add
+ */
+int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+{
+    struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    int ret;
+
+    dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
+
+    ret = memory_add_common(&info, pxm, true);
+    if ( !ret )
+    {
+        /* We can't revert any more */
+        share_hotadd_m2p_table(&info);
+        transfer_pages_to_heap(&info);
+    }
 
     return ret;
 }
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (13 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add() Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

As the existing data in a PMEM region is persistent, the Xen hypervisor
has no knowledge of which part is free to be used for the frame table
and M2P table of that PMEM region. Instead, we allow users or system
admins to specify the location of the frame table and M2P table. The
location is not necessarily at the beginning of the PMEM region, which
is different from the case of hotplugged RAM.

This commit adds support for a customized page allocation function,
which is used to allocate the memory for the frame table and M2P
table. No page free function is added; we require that all allocated
pages can be reclaimed, or have no effect outside of
memory_add_common(), if memory_add_common() fails.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 83 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 69 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c8ffafe8a8..d92307ca0b 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -106,13 +106,44 @@ struct mem_hotadd_info
     unsigned long cur;
 };
 
+struct mem_hotadd_alloc
+{
+    /*
+     * Allocate 2^PAGETABLE_ORDER pages.
+     *
+     * No free function is added right now, so we require that all
+     * allocated pages can be reclaimed easily or have no effect outside of
+     * memory_add_common(), if memory_add_common() fails.
+     *
+     * For example, alloc_hotadd_mfn(), which is used in RAM hotplug,
+     * allocates pages from the hotplugged RAM. If memory_add_common()
+     * fails, the hotplugged RAM will not be available to Xen, so
+     * pages allocated by alloc_hotadd_mfn() will never be used and
+     * have no effect.
+     *
+     * Parameters:
+     *  opaque:   arguments of the allocator (depending on the implementation)
+     *
+     * Return:
+     *  On success, return MFN of the first page.
+     *  Otherwise, return mfn_x(INVALID_MFN).
+     */
+    unsigned long (*alloc_mfns)(void *opaque);
+
+    /*
+     * Additional arguments passed to @alloc_mfns().
+     */
+    void *opaque;
+};
+
 static int hotadd_mem_valid(unsigned long pfn, struct mem_hotadd_info *info)
 {
     return (pfn < info->epfn && pfn >= info->spfn);
 }
 
-static unsigned long alloc_hotadd_mfn(struct mem_hotadd_info *info)
+static unsigned long alloc_hotadd_mfn(void *opaque)
 {
+    struct mem_hotadd_info *info = opaque;
     unsigned mfn;
 
     ASSERT((info->cur + ( 1UL << PAGETABLE_ORDER) < info->epfn) &&
@@ -315,7 +346,8 @@ static void destroy_m2p_mapping(struct mem_hotadd_info *info)
  * spfn/epfn: the pfn ranges to be setup
  * free_s/free_e: the pfn ranges that is free still
  */
-static int setup_compat_m2p_table(struct mem_hotadd_info *info)
+static int setup_compat_m2p_table(struct mem_hotadd_info *info,
+                                  struct mem_hotadd_alloc *alloc)
 {
     unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
     unsigned int n;
@@ -369,7 +401,13 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
         if ( n == CNT )
             continue;
 
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc->alloc_mfns(alloc->opaque);
+        if ( mfn == mfn_x(INVALID_MFN) )
+        {
+            err = -ENOMEM;
+            break;
+        }
+
         err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -389,7 +427,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
  * Allocate and map the machine-to-phys table.
  * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
  */
-static int setup_m2p_table(struct mem_hotadd_info *info)
+static int setup_m2p_table(struct mem_hotadd_info *info,
+                           struct mem_hotadd_alloc *alloc)
 {
     unsigned long i, va, smap, emap;
     unsigned int n;
@@ -438,7 +477,13 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
                 break;
         if ( n < CNT )
         {
-            unsigned long mfn = alloc_hotadd_mfn(info);
+            unsigned long mfn = alloc->alloc_mfns(alloc->opaque);
+
+            if ( mfn == mfn_x(INVALID_MFN) )
+            {
+                ret = -ENOMEM;
+                goto error;
+            }
 
             ret = map_pages_to_xen(
                         RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
@@ -483,7 +528,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 #undef CNT
 #undef MFN
 
-    ret = setup_compat_m2p_table(info);
+    ret = setup_compat_m2p_table(info, alloc);
 error:
     return ret;
 }
@@ -762,7 +807,7 @@ static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 }
 
 static int setup_frametable_chunk(void *start, void *end,
-                                  struct mem_hotadd_info *info)
+                                  struct mem_hotadd_alloc *alloc)
 {
     unsigned long s = (unsigned long)start;
     unsigned long e = (unsigned long)end;
@@ -774,7 +819,13 @@ static int setup_frametable_chunk(void *start, void *end,
 
     for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
     {
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc->alloc_mfns(alloc->opaque);
+        if ( mfn == mfn_x(INVALID_MFN) )
+        {
+            err = -ENOMEM;
+            break;
+        }
+
         err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -789,7 +840,8 @@ static int setup_frametable_chunk(void *start, void *end,
     return err;
 }
 
-static int extend_frame_table(struct mem_hotadd_info *info)
+static int extend_frame_table(struct mem_hotadd_info *info,
+                              struct mem_hotadd_alloc *alloc)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
     int err = 0;
@@ -816,7 +868,7 @@ static int extend_frame_table(struct mem_hotadd_info *info)
             nidx = eidx;
         err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
-                                     info);
+                                     alloc);
         if ( err )
             break;
 
@@ -1338,7 +1390,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
 }
 
 static int memory_add_common(struct mem_hotadd_info *info,
-                             unsigned int pxm, bool direct_map)
+                             unsigned int pxm, bool direct_map,
+                             struct mem_hotadd_alloc *alloc)
 {
     unsigned long spfn = info->spfn, epfn = info->epfn;
     int ret;
@@ -1402,7 +1455,7 @@ static int memory_add_common(struct mem_hotadd_info *info,
             NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
     }
 
-    ret = extend_frame_table(info);
+    ret = extend_frame_table(info, alloc);
     if ( ret )
         goto restore_node_status;
 
@@ -1415,7 +1468,7 @@ static int memory_add_common(struct mem_hotadd_info *info,
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(info);
+    ret = setup_m2p_table(info, alloc);
 
     if ( ret )
         goto destroy_m2p;
@@ -1465,11 +1518,13 @@ destroy_directmap:
 int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 {
     struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    struct mem_hotadd_alloc alloc =
+        { .alloc_mfns = alloc_hotadd_mfn, .opaque = &info };
     int ret;
 
     dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
 
-    ret = memory_add_common(&info, pxm, true);
+    ret = memory_add_common(&info, pxm, true, &alloc);
     if ( !ret )
     {
         /* We can't revert any more */
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (14 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt' Haozhong Zhang
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Chao Peng, Dan Williams

Add a command XEN_SYSCTL_nvdimm_pmem_setup to hypercall
XEN_SYSCTL_nvdimm_op to set up the frame table and M2P table of a PMEM
region. This command is currently used to set up the management PMEM
region, which is used to store the frame table and M2P table of other
PMEM regions and of itself. The management PMEM region should not be
mapped to any guest.

PMEM pages are not added to any Xen or domain heap. A new flag
PGC_pmem_page is used to indicate whether a page is from PMEM and to
avoid returning PMEM pages to the heaps.
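
As a rough illustration (not part of this patch), a toolstack program
could drive the new libxc wrapper as sketched below; the MFN range is a
made-up placeholder and error handling is minimal:

    #include <stdio.h>
    #include <xenctrl.h>

    /* Sketch only: the MFN range passed in is a placeholder. */
    static int example_setup_mgmt(unsigned long smfn, unsigned long emfn)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        int rc;

        if ( !xch )
            return -1;

        /* Ask Xen to use [smfn, emfn) for frame table / M2P storage. */
        rc = xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
        if ( rc )
            fprintf(stderr, "setup-mgmt failed: %d\n", rc);

        xc_interface_close(xch);
        return rc;
    }

This is essentially what the xen-ndctl 'setup-mgmt' command added later
in this series does after parsing its MFN arguments.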

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h |  16 +++++
 tools/libxc/xc_misc.c         |  34 ++++++++++
 xen/arch/x86/mm.c             |   3 +-
 xen/arch/x86/x86_64/mm.c      |  72 +++++++++++++++++++++
 xen/common/pmem.c             | 142 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h      |  10 ++-
 xen/include/public/sysctl.h   |  18 ++++++
 xen/include/xen/pmem.h        |   8 +++
 8 files changed, 301 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d750e67460..7c5707fe11 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2605,6 +2605,22 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
 int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
                                void *buffer, uint32_t *nr);
 
+/*
+ * Set up the specified PMEM pages for management usage. On success,
+ * these PMEM pages can be used to store the frametable and M2P table
+ * of themselves and of other PMEM pages. These management PMEM pages
+ * will never be mapped to a guest.
+ *
+ * Parameters:
+ *  xch:        xc interface handle
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index f9ce802eda..bebe6d04c8 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -975,6 +975,40 @@ out:
     return rc;
 }
 
+static void xc_nvdimm_pmem_setup_common(struct xen_sysctl *sysctl,
+                                        unsigned long smfn, unsigned long emfn,
+                                        unsigned long mgmt_smfn,
+                                        unsigned long mgmt_emfn)
+{
+    xen_sysctl_nvdimm_op_t *nvdimm = &sysctl->u.nvdimm;
+    xen_sysctl_nvdimm_pmem_setup_t *setup = &nvdimm->u.pmem_setup;
+
+    sysctl->cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_setup;
+    nvdimm->pad = 0;
+    nvdimm->err = 0;
+    setup->smfn = smfn;
+    setup->emfn = emfn;
+    setup->mgmt_smfn = mgmt_smfn;
+    setup->mgmt_emfn = mgmt_emfn;
+}
+
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn)
+{
+    DECLARE_SYSCTL;
+    int rc;
+
+    xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, smfn, emfn);
+    sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_MGMT;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( rc && sysctl.u.nvdimm.err )
+        rc = -sysctl.u.nvdimm.err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2fdf609805..93ccf198c9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2341,7 +2341,8 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
+             && cleanup_page_cacheattr(page) == 0 )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index d92307ca0b..7dbc5e966c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1535,6 +1535,78 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     return ret;
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static void pmem_init_frame_table(unsigned long smfn, unsigned long emfn)
+{
+    struct page_info *page = mfn_to_page(smfn), *epage = mfn_to_page(emfn);
+
+    while ( page < epage )
+    {
+        page->count_info = PGC_state_free | PGC_pmem_page;
+        page++;
+    }
+}
+
+/**
+ * Initialize frametable and M2P for the specified PMEM region.
+ *
+ * Parameters:
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and end MFN of the PMEM region used to store
+ *              the frame table and M2P table of above PMEM region. If
+ *              @smfn - @emfn is going to be mapped to guest, it should
+ *              not overlap with @mgmt_smfn - @mgmt_emfn. If @smfn - @emfn
+ *              is going to be used for management purpose, it should
+ *              be identical to @mgmt_smfn - @mgmt_emfn.
+ *  used_mgmt_mfns: return the number of pages used in @mgmt_smfn - @mgmt_emfn
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
+                    unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                    unsigned long *used_mgmt_mfns)
+{
+    struct mem_hotadd_info info =
+        { .spfn = smfn, .epfn = emfn, .cur = smfn };
+    struct mem_hotadd_info mgmt_info =
+        { .spfn = mgmt_smfn, .epfn = mgmt_emfn, .cur = mgmt_smfn };
+    struct mem_hotadd_alloc alloc =
+    {
+        .alloc_mfns = alloc_hotadd_mfn,
+        .opaque     = &mgmt_info
+    };
+    bool is_mgmt = (mgmt_smfn == smfn && mgmt_emfn == emfn);
+    int rc;
+
+    if ( mgmt_smfn == mfn_x(INVALID_MFN) || mgmt_emfn == mfn_x(INVALID_MFN) ||
+         mgmt_smfn >= mgmt_emfn )
+        return -EINVAL;
+
+    if ( !is_mgmt &&
+         ((smfn >= mgmt_smfn && smfn < mgmt_emfn) ||
+          (emfn > mgmt_smfn && emfn <= mgmt_emfn)) )
+        return -EINVAL;
+
+    rc = memory_add_common(&info, pxm, false, &alloc);
+    if ( rc )
+        return rc;
+
+    pmem_init_frame_table(smfn, emfn);
+
+    if ( !is_mgmt )
+        share_hotadd_m2p_table(&info);
+
+    if ( used_mgmt_mfns )
+        *used_mgmt_mfns = mgmt_info.cur - mgmt_info.spfn;
+
+    return 0;
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #include "compat/mm.c"
 
 /*
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index a737e7dc71..7a081c2879 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -31,6 +31,15 @@
 static LIST_HEAD(pmem_raw_regions);
 static unsigned int nr_raw_regions;
 
+/*
+ * All PMEM regions reserved for management purpose are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions.
+ */
+static LIST_HEAD(pmem_mgmt_regions);
+static DEFINE_SPINLOCK(pmem_mgmt_lock);
+static unsigned int nr_mgmt_regions;
+
 struct pmem {
     struct list_head link; /* link to one of PMEM region list */
     unsigned long smfn;    /* start MFN of the PMEM region */
@@ -40,6 +49,10 @@ struct pmem {
         struct {
             unsigned int pxm; /* proximity domain of the PMEM region */
         } raw;
+
+        struct {
+            unsigned long used; /* # of used pages in MGMT PMEM region */
+        } mgmt;
     } u;
 };
 
@@ -107,6 +120,18 @@ static int pmem_list_add(struct list_head *list,
     return rc;
 }
 
+/**
+ * Delete the specified entry from the list to which it's currently linked.
+ *
+ * Parameters:
+ *  entry: the entry to be deleted
+ */
+static void pmem_list_del(struct pmem *entry)
+{
+    list_del(&entry->link);
+    xfree(entry);
+}
+
 static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
 {
     int rc = 0;
@@ -185,6 +210,114 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
     return rc;
 }
 
+static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
+{
+    return mgmt_mfns >=
+        ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
+        ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+}
+
+static bool check_address_and_pxm(unsigned long smfn, unsigned long emfn,
+                                  unsigned int *ret_pxm)
+{
+    struct list_head *cur;
+    long pxm = -1;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        struct pmem *raw = list_entry(cur, struct pmem, link);
+        unsigned long raw_smfn = raw->smfn, raw_emfn = raw->emfn;
+
+        if ( !check_overlap(smfn, emfn, raw_smfn, raw_emfn) )
+            continue;
+
+        if ( smfn < raw_smfn )
+            return false;
+
+        if ( pxm != -1 && pxm != raw->u.raw.pxm )
+            return false;
+        pxm = raw->u.raw.pxm;
+
+        smfn = min(emfn, raw_emfn);
+        if ( smfn == emfn )
+            break;
+    }
+
+    *ret_pxm = pxm;
+
+    return smfn == emfn;
+}
+
+static int pmem_setup_mgmt(unsigned long smfn, unsigned long emfn)
+{
+    struct pmem *mgmt;
+    unsigned long used_mgmt_mfns;
+    unsigned int pxm;
+    int rc;
+
+    if ( smfn == mfn_x(INVALID_MFN) || emfn == mfn_x(INVALID_MFN) ||
+         smfn >= emfn )
+        return -EINVAL;
+
+    /*
+     * Require the PMEM region in one proximity domain, in order to
+     * avoid the error recovery from multiple calls to pmem_arch_setup()
+     * which is not revertible.
+     */
+    if ( !check_address_and_pxm(smfn, emfn, &pxm) )
+        return -EINVAL;
+
+    if ( !check_mgmt_size(emfn - smfn, emfn - smfn) )
+        return -ENOSPC;
+
+    spin_lock(&pmem_mgmt_lock);
+
+    rc = pmem_list_add(&pmem_mgmt_regions, smfn, emfn, &mgmt);
+    if ( rc )
+        goto out;
+
+    rc = pmem_arch_setup(smfn, emfn, pxm, smfn, emfn, &used_mgmt_mfns);
+    if ( rc )
+    {
+        pmem_list_del(mgmt);
+        goto out;
+    }
+
+    mgmt->u.mgmt.used = used_mgmt_mfns;
+    nr_mgmt_regions++;
+
+ out:
+    spin_unlock(&pmem_mgmt_lock);
+
+    return rc;
+}
+
+static int pmem_setup(unsigned long smfn, unsigned long emfn,
+                      unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                      unsigned int type)
+{
+    int rc;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_MGMT:
+        if ( smfn != mgmt_smfn || emfn != mgmt_emfn )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = pmem_setup_mgmt(smfn, emfn);
+
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -234,6 +367,15 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
         rc = pmem_get_regions(&nvdimm->u.pmem_regions);
         break;
 
+    case XEN_SYSCTL_nvdimm_pmem_setup:
+    {
+        struct xen_sysctl_nvdimm_pmem_setup *setup = &nvdimm->u.pmem_setup;
+        rc = pmem_setup(setup->smfn, setup->emfn,
+                        setup->mgmt_smfn, setup->mgmt_emfn,
+                        setup->type);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index bef45e8e9f..33a732846f 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -245,9 +245,11 @@ struct page_info
 #define PGC_state_offlined PG_mask(2, 9)
 #define PGC_state_free    PG_mask(3, 9)
 #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+/* Page is from PMEM? */
+#define PGC_pmem_page     PG_mask(1, 10)
 
  /* Count of references to this frame. */
-#define PGC_count_width   PG_shift(9)
+#define PGC_count_width   PG_shift(10)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
 /*
@@ -264,6 +266,12 @@ struct page_info
     ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) &&  \
      (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
 
+#ifdef CONFIG_NVDIMM_PMEM
+#define is_pmem_page(page) ((page)->count_info & PGC_pmem_page)
+#else
+#define is_pmem_page(page) false
+#endif
+
 #define PRtype_info "016lx"/* should only be used for printk's */
 
 /* The number of out-of-sync shadows we allow per vcpu (prime, please) */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 2635b1c911..5d208033a0 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1120,6 +1120,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
+#define PMEM_REGION_TYPE_MGMT       1 /* PMEM regions for management usage */
 
 /* PMEM_REGION_TYPE_RAW */
 struct xen_sysctl_nvdimm_pmem_raw_region {
@@ -1154,14 +1155,31 @@ struct xen_sysctl_nvdimm_pmem_regions {
 typedef struct xen_sysctl_nvdimm_pmem_regions xen_sysctl_nvdimm_pmem_regions_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_t);
 
+/* XEN_SYSCTL_nvdimm_pmem_setup */
+struct xen_sysctl_nvdimm_pmem_setup {
+    /* IN variables */
+    uint64_t smfn;      /* start MFN of the PMEM region */
+    uint64_t emfn;      /* end MFN of the PMEM region */
+    uint64_t mgmt_smfn;
+    uint64_t mgmt_emfn; /* start and end MFN of PMEM pages used to manage */
+                        /* above PMEM region. If the above PMEM region is */
+                        /* a management region, mgmt_{s,e}mfn is required */
+                        /* to be identical to {s,e}mfn. */
+    uint8_t  type;      /* Only PMEM_REGION_TYPE_MGMT is supported now */
+};
+typedef struct xen_sysctl_nvdimm_pmem_setup xen_sysctl_nvdimm_pmem_setup_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_setup_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
 #define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
 #define XEN_SYSCTL_nvdimm_pmem_get_regions        1
+#define XEN_SYSCTL_nvdimm_pmem_setup              2
     uint32_t pad; /* IN: Always zero. */
     union {
         xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
         xen_sysctl_nvdimm_pmem_regions_t pmem_regions;
+        xen_sysctl_nvdimm_pmem_setup_t pmem_setup;
     } u;
     uint32_t err; /* OUT: error code */
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 922b12f570..9323d679a6 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -29,6 +29,9 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 #ifdef CONFIG_X86
 
 int pmem_dom0_setup_permission(struct domain *d);
+int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
+                    unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                    unsigned long *used_mgmt_mfns);
 
 #else /* !CONFIG_X86 */
 
@@ -37,6 +40,11 @@ static inline int pmem_dom0_setup_permission(...)
     return -ENOSYS;
 }
 
+static inline int pmem_arch_setup(...)
+{
+    return -ENOSYS;
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
-- 
2.14.1



* [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (15 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

This command requests the Xen hypervisor to set up the specified PMEM
range for management usage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 6277a1eda2..1289a83dbe 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -36,6 +36,7 @@ static xc_interface *xch;
 static int handle_help(int argc, char *argv[]);
 static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
+static int handle_setup_mgmt(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
 {
@@ -69,6 +70,14 @@ static const struct xen_ndctl_cmd
         .help    = "List all supported commands.\n",
         .handler = handle_list_cmds,
     },
+
+    {
+        .name    = "setup-mgmt",
+        .syntax  = "<smfn> <emfn>",
+        .help    = "Setup a PMEM region from MFN 'smfn' to 'emfn' for management usage.\n\n",
+        .handler = handle_setup_mgmt,
+        .need_xc = true,
+    },
 };
 
 static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]);
@@ -197,6 +206,42 @@ static int handle_list_cmds(int argc, char *argv[])
     return 0;
 }
 
+static bool string_to_mfn(const char *str, unsigned long *ret)
+{
+    unsigned long l;
+
+    errno = 0;
+    l = strtoul(str, NULL, 0);
+
+    if ( !errno )
+        *ret = l;
+    else
+        fprintf(stderr, "Invalid MFN %s: %s\n", str, strerror(errno));
+
+    return !errno;
+}
+
+static int handle_setup_mgmt(int argc, char **argv)
+{
+    unsigned long smfn, emfn;
+
+    if ( argc < 3 )
+    {
+        fprintf(stderr, "Too few arguments.\n\n");
+        show_help(argv[0]);
+        return -EINVAL;
+    }
+
+    if ( !string_to_mfn(argv[1], &smfn) ||
+         !string_to_mfn(argv[2], &emfn) )
+        return -EINVAL;
+
+    if ( argc > 3 )
+        return handle_unrecognized_argument(argv[0], argv[3]);
+
+    return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
+}
+
 int main(int argc, char *argv[])
 {
     unsigned int i;
-- 
2.14.1



* [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (16 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt' Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
management PMEM regions.
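
As a sketch (not part of this patch), a caller holding an open xc
interface handle 'xch' could query the count as below, assuming the
usual <stdio.h>/<string.h> headers and the sysctl type constants are
visible, as they are in xen-ndctl:

    static void show_nr_mgmt_regions(xc_interface *xch)
    {
        uint32_t nr = 0;
        int rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT,
                                               &nr);

        if ( rc )
            fprintf(stderr, "cannot get the number of management regions: %s\n",
                    strerror(-rc));
        else
            printf("%u management PMEM region(s)\n", nr);
    }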

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c | 4 +++-
 xen/common/pmem.c     | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index bebe6d04c8..4b5558aaa5 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -894,7 +894,9 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
     xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
     int rc;
 
-    if ( !nr || type != PMEM_REGION_TYPE_RAW )
+    if ( !nr ||
+         (type != PMEM_REGION_TYPE_RAW &&
+          type != PMEM_REGION_TYPE_MGMT) )
         return -EINVAL;
 
     sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 7a081c2879..54b3e7119a 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -142,6 +142,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
         regions_nr->num_regions = nr_raw_regions;
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        regions_nr->num_regions = nr_mgmt_regions;
+        break;
+
     default:
         rc = -EINVAL;
     }
-- 
2.14.1



* [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (17 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list' Haozhong Zhang
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of
management PMEM regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c       |  8 ++++++++
 xen/common/pmem.c           | 45 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 11 +++++++++++
 3 files changed, 64 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 4b5558aaa5..3ad254f5ae 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -939,6 +939,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
+        break;
+
     default:
         return -EINVAL;
     }
@@ -960,6 +964,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
+        break;
+
     default:
         rc = -EINVAL;
         goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 54b3e7119a..dcd8160407 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -190,6 +190,47 @@ static int pmem_get_raw_regions(
     return rc;
 }
 
+static int pmem_get_mgmt_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_mgmt_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    spin_lock(&pmem_mgmt_lock);
+
+    list_for_each(cur, &pmem_mgmt_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.used_mfns = pmem->u.mgmt.used;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    spin_unlock(&pmem_mgmt_lock);
+
+    *num_regions = nr;
+
+    return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
     unsigned int type = regions->type, max = regions->num_regions;
@@ -204,6 +245,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
         rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5d208033a0..f825716446 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1131,6 +1131,15 @@ struct xen_sysctl_nvdimm_pmem_raw_region {
 typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
 
+/* PMEM_REGION_TYPE_MGMT */
+struct xen_sysctl_nvdimm_pmem_mgmt_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint64_t used_mfns;
+};
+typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1149,6 +1158,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
     union {
         /* if type == PMEM_REGION_TYPE_RAW */
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+        /* if type == PMEM_REGION_TYPE_MGMT */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
     } u_buffer;           /* IN: the guest handler where the entries of PMEM
                                  regions of the type @type are returned */
 };
-- 
2.14.1



* [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (18 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If the option '--mgmt' is present, the command 'list' will list all
PMEM regions for management usage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 1289a83dbe..058f8ccaf5 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -57,9 +57,10 @@ static const struct xen_ndctl_cmd
 
     {
         .name    = "list",
-        .syntax  = "[--all | --raw ]",
+        .syntax  = "[--all | --raw | --mgmt]",
         .help    = "--all: the default option, list all PMEM regions of following types.\n"
-                   "--raw: list all PMEM regions detected by Xen hypervisor.\n",
+                   "--raw: list all PMEM regions detected by Xen hypervisor.\n"
+                   "--mgmt: list all PMEM regions for management usage.\n",
         .handler = handle_list,
         .need_xc = true,
     },
@@ -162,12 +163,46 @@ static int handle_list_raw(void)
     return rc;
 }
 
+static int handle_list_mgmt(void)
+{
+    int rc;
+    unsigned int nr = 0, i;
+    xen_sysctl_nvdimm_pmem_mgmt_region_t *mgmt_list;
+
+    rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT, &nr);
+    if ( rc )
+    {
+        fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+                strerror(-rc));
+        return rc;
+    }
+
+    mgmt_list = malloc(nr * sizeof(*mgmt_list));
+    if ( !mgmt_list )
+        return -ENOMEM;
+
+    rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_MGMT, mgmt_list, &nr);
+    if ( rc )
+        goto out;
+
+    printf("Management PMEM regions:\n");
+    for ( i = 0; i < nr; i++ )
+        printf(" %u: MFN 0x%lx - 0x%lx, used 0x%lx\n",
+               i, mgmt_list[i].smfn, mgmt_list[i].emfn, mgmt_list[i].used_mfns);
+
+ out:
+    free(mgmt_list);
+
+    return rc;
+}
+
 static const struct list_handlers {
     const char *option;
     int (*handler)(void);
 } list_hndrs[] =
 {
     { "--raw", handle_list_raw },
+    { "--mgmt", handle_list_mgmt },
 };
 
 static const unsigned int nr_list_hndrs =
-- 
2.14.1



* [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (19 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list' Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data' Haozhong Zhang
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow the command XEN_SYSCTL_nvdimm_pmem_setup of hypercall
XEN_SYSCTL_nvdimm_op to set up a PMEM region for guest data
usage. After the setup, that PMEM region can be mapped into
the guest address space.
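
As a sketch (not part of this patch), a toolstack could invoke the new
setup path as below; all MFN values are placeholders, 'xch' is an open
xc interface handle, and the usual <stdio.h>/<string.h> headers are
assumed:

    /*
     * The management range must lie inside a region previously
     * registered with xc_nvdimm_pmem_setup_mgmt(), must start past the
     * pages that region already uses for itself (see the 'used' field
     * reported for management regions), and must not overlap the data
     * range.
     */
    static int example_setup_data(xc_interface *xch)
    {
        unsigned long data_smfn = 0x500000, data_emfn = 0x580000;
        unsigned long mgmt_smfn = 0x140400, mgmt_emfn = 0x150000;
        int rc = xc_nvdimm_pmem_setup_data(xch, data_smfn, data_emfn,
                                           mgmt_smfn, mgmt_emfn);

        if ( rc )
            fprintf(stderr, "setup-data failed: %s\n", strerror(-rc));

        return rc;
    }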

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h |  22 ++++++++
 tools/libxc/xc_misc.c         |  17 ++++++
 xen/common/pmem.c             | 118 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/sysctl.h   |   3 +-
 4 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 7c5707fe11..41e5e3408c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2621,6 +2621,28 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
                               unsigned long smfn, unsigned long emfn);
 
+/*
+ * Set up the specified PMEM pages for guest data usage. On success,
+ * these PMEM pages can be mapped to a guest and used as the backend
+ * of vNVDIMM devices.
+ *
+ * Parameters:
+ *  xch:        xc interface handle
+ *  smfn, emfn: the start and end of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and the end MFN of the PMEM region that is
+ *              used to manage this PMEM region. It must be in one of
+ *              those added by xc_nvdimm_pmem_setup_mgmt() calls, and
+ *              not overlap with @smfn - @emfn.
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn,
+                              unsigned long mgmt_smfn, unsigned long mgmt_emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 3ad254f5ae..ef2e9e0656 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -1019,6 +1019,23 @@ int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
     return rc;
 }
 
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn,
+                              unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    DECLARE_SYSCTL;
+    int rc;
+
+    xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, mgmt_smfn, mgmt_emfn);
+    sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_DATA;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( rc && sysctl.u.nvdimm.err )
+        rc = -sysctl.u.nvdimm.err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index dcd8160407..6891ed7a47 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -34,16 +34,26 @@ static unsigned int nr_raw_regions;
 /*
  * All PMEM regions reserved for management purpose are linked to this
  * list. All of them must be covered by one or multiple PMEM regions
- * in list pmem_raw_regions.
+ * in list pmem_raw_regions, and not appear in list pmem_data_regions.
  */
 static LIST_HEAD(pmem_mgmt_regions);
 static DEFINE_SPINLOCK(pmem_mgmt_lock);
 static unsigned int nr_mgmt_regions;
 
+/*
+ * All PMEM regions that can be mapped to guest are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions, and not appear in list pmem_mgmt_regions.
+ */
+static LIST_HEAD(pmem_data_regions);
+static DEFINE_SPINLOCK(pmem_data_lock);
+static unsigned int nr_data_regions;
+
 struct pmem {
     struct list_head link; /* link to one of PMEM region list */
     unsigned long smfn;    /* start MFN of the PMEM region */
     unsigned long emfn;    /* end MFN of the PMEM region */
+    spinlock_t lock;
 
     union {
         struct {
@@ -53,6 +63,11 @@ struct pmem {
         struct {
             unsigned long used; /* # of used pages in MGMT PMEM region */
         } mgmt;
+
+        struct {
+            unsigned long mgmt_smfn; /* start MFN of management region */
+            unsigned long mgmt_emfn; /* end MFN of management region */
+        } data;
     } u;
 };
 
@@ -111,6 +126,7 @@ static int pmem_list_add(struct list_head *list,
     }
     new_pmem->smfn = smfn;
     new_pmem->emfn = emfn;
+    spin_lock_init(&new_pmem->lock);
     list_add(&new_pmem->link, cur);
 
  out:
@@ -261,9 +277,16 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 
 static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
 {
-    return mgmt_mfns >=
+    unsigned long required =
         ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
         ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+
+    if ( required > mgmt_mfns )
+        printk(XENLOG_DEBUG "PMEM: insufficient management pages, "
+               "0x%lx pages required, 0x%lx pages available\n",
+               required, mgmt_mfns);
+
+    return mgmt_mfns >= required;
 }
 
 static bool check_address_and_pxm(unsigned long smfn, unsigned long emfn,
@@ -341,6 +364,93 @@ static int pmem_setup_mgmt(unsigned long smfn, unsigned long emfn)
     return rc;
 }
 
+static struct pmem *find_mgmt_region(unsigned long smfn, unsigned long emfn)
+{
+    struct list_head *cur;
+
+    ASSERT(spin_is_locked(&pmem_mgmt_lock));
+
+    list_for_each(cur, &pmem_mgmt_regions)
+    {
+        struct pmem *mgmt = list_entry(cur, struct pmem, link);
+
+        if ( smfn >= mgmt->smfn && emfn <= mgmt->emfn )
+            return mgmt;
+    }
+
+    return NULL;
+}
+
+static int pmem_setup_data(unsigned long smfn, unsigned long emfn,
+                           unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    struct pmem *data, *mgmt = NULL;
+    unsigned long used_mgmt_mfns;
+    unsigned int pxm;
+    int rc;
+
+    if ( smfn == mfn_x(INVALID_MFN) || emfn == mfn_x(INVALID_MFN) ||
+         smfn >= emfn )
+        return -EINVAL;
+
+    /*
+     * Require the PMEM region in one proximity domain, in order to
+     * avoid the error recovery from multiple calls to pmem_arch_setup()
+     * which is not revertible.
+     */
+    if ( !check_address_and_pxm(smfn, emfn, &pxm) )
+        return -EINVAL;
+
+    if ( mgmt_smfn == mfn_x(INVALID_MFN) || mgmt_emfn == mfn_x(INVALID_MFN) ||
+         mgmt_smfn >= mgmt_emfn )
+        return -EINVAL;
+
+    spin_lock(&pmem_mgmt_lock);
+    mgmt = find_mgmt_region(mgmt_smfn, mgmt_emfn);
+    if ( !mgmt )
+    {
+        spin_unlock(&pmem_mgmt_lock);
+        return -ENXIO;
+    }
+    spin_unlock(&pmem_mgmt_lock);
+
+    spin_lock(&mgmt->lock);
+
+    if ( mgmt_smfn < mgmt->smfn + mgmt->u.mgmt.used ||
+         !check_mgmt_size(mgmt_emfn - mgmt_smfn, emfn - smfn) )
+    {
+        spin_unlock(&mgmt->lock);
+        return -ENOSPC;
+    }
+
+    spin_lock(&pmem_data_lock);
+
+    rc = pmem_list_add(&pmem_data_regions, smfn, emfn, &data);
+    if ( rc )
+        goto out;
+    data->u.data.mgmt_smfn = data->u.data.mgmt_emfn = mfn_x(INVALID_MFN);
+
+    rc = pmem_arch_setup(smfn, emfn, pxm,
+                         mgmt_smfn, mgmt_emfn, &used_mgmt_mfns);
+    if ( rc )
+    {
+        pmem_list_del(data);
+        goto out;
+    }
+
+    mgmt->u.mgmt.used = mgmt_smfn - mgmt->smfn + used_mgmt_mfns;
+    data->u.data.mgmt_smfn = mgmt_smfn;
+    data->u.data.mgmt_emfn = mgmt->smfn + mgmt->u.mgmt.used;
+
+    nr_data_regions++;
+
+ out:
+    spin_unlock(&pmem_data_lock);
+    spin_unlock(&mgmt->lock);
+
+    return rc;
+}
+
 static int pmem_setup(unsigned long smfn, unsigned long emfn,
                       unsigned long mgmt_smfn, unsigned long mgmt_emfn,
                       unsigned int type)
@@ -360,6 +470,10 @@ static int pmem_setup(unsigned long smfn, unsigned long emfn,
 
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        rc = pmem_setup_data(smfn, emfn, mgmt_smfn, mgmt_emfn);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index f825716446..d7c12f23fb 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1121,6 +1121,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
 #define PMEM_REGION_TYPE_MGMT       1 /* PMEM regions for management usage */
+#define PMEM_REGION_TYPE_DATA       2 /* PMEM regions for guest data */
 
 /* PMEM_REGION_TYPE_RAW */
 struct xen_sysctl_nvdimm_pmem_raw_region {
@@ -1176,7 +1177,7 @@ struct xen_sysctl_nvdimm_pmem_setup {
                         /* above PMEM region. If the above PMEM region is */
                         /* a management region, mgmt_{s,e}mfn is required */
                         /* to be identical to {s,e}mfn. */
-    uint8_t  type;      /* Only PMEM_REGION_TYPE_MGMT is supported now */
+    uint8_t  type;      /* Must be one of PMEM_REGION_TYPE_{MGMT, DATA} */
 };
 typedef struct xen_sysctl_nvdimm_pmem_setup xen_sysctl_nvdimm_pmem_setup_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_setup_t);
-- 
2.14.1



* [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (20 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

This command requests the Xen hypervisor to set up the specified PMEM
range for guest data usage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 058f8ccaf5..320633ae05 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -37,6 +37,7 @@ static int handle_help(int argc, char *argv[]);
 static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
 static int handle_setup_mgmt(int argc, char *argv[]);
+static int handle_setup_data(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
 {
@@ -72,6 +73,18 @@ static const struct xen_ndctl_cmd
         .handler = handle_list_cmds,
     },
 
+    {
+        .name    = "setup-data",
+        .syntax  = "<smfn> <emfn> <mgmt_smfn> <mgmt_emfn>",
+        .help    = "Setup a PMEM region from MFN 'smfn' to 'emfn' for guest data usage,\n"
+                   "which can be used as the backend of the virtual NVDIMM devices.\n\n"
+                   "PMEM pages from MFN 'mgmt_smfn' to 'mgmt_emfn' is used to manage\n"
+                   "the above PMEM region, and should not overlap with MFN from 'smfn'\n"
+                   "to 'emfn'.\n",
+        .handler = handle_setup_data,
+        .need_xc = true,
+    },
+
     {
         .name    = "setup-mgmt",
         .syntax  = "<smfn> <emfn>",
@@ -277,6 +290,29 @@ static int handle_setup_mgmt(int argc, char **argv)
     return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
 }
 
+static int handle_setup_data(int argc, char **argv)
+{
+    unsigned long smfn, emfn, mgmt_smfn, mgmt_emfn;
+
+    if ( argc < 5 )
+    {
+        fprintf(stderr, "Too few arguments.\n\n");
+        show_help(argv[0]);
+        return -EINVAL;
+    }
+
+    if ( !string_to_mfn(argv[1], &smfn) ||
+         !string_to_mfn(argv[2], &emfn) ||
+         !string_to_mfn(argv[3], &mgmt_smfn) ||
+         !string_to_mfn(argv[4], &mgmt_emfn) )
+        return -EINVAL;
+
+    if ( argc > 5 )
+        return handle_unrecognized_argument(argv[0], argv[5]);
+
+    return xc_nvdimm_pmem_setup_data(xch, smfn, emfn, mgmt_smfn, mgmt_emfn);
+}
+
 int main(int argc, char *argv[])
 {
     unsigned int i;
-- 
2.14.1



* [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (21 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data' Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
data PMEM regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c | 3 ++-
 xen/common/pmem.c     | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index ef2e9e0656..db74df853a 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -896,7 +896,8 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
 
     if ( !nr ||
          (type != PMEM_REGION_TYPE_RAW &&
-          type != PMEM_REGION_TYPE_MGMT) )
+          type != PMEM_REGION_TYPE_MGMT &&
+          type != PMEM_REGION_TYPE_DATA) )
         return -EINVAL;
 
     sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 6891ed7a47..cbe557c220 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -162,6 +162,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
         regions_nr->num_regions = nr_mgmt_regions;
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        regions_nr->num_regions = nr_data_regions;
+        break;
+
     default:
         rc = -EINVAL;
     }
-- 
2.14.1



* [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (22 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list' Haozhong Zhang
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of data PMEM
regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c       |  8 ++++++++
 xen/common/pmem.c           | 46 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 12 ++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index db74df853a..93a1f8fdc5 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -944,6 +944,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        size = sizeof(xen_sysctl_nvdimm_pmem_data_region_t) * max;
+        break;
+
     default:
         return -EINVAL;
     }
@@ -969,6 +973,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        set_xen_guest_handle(regions->u_buffer.data_regions, buffer);
+        break;
+
     default:
         rc = -EINVAL;
         goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index cbe557c220..ed4a014c30 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -251,6 +251,48 @@ static int pmem_get_mgmt_regions(
     return rc;
 }
 
+static int pmem_get_data_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_data_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    spin_lock(&pmem_data_lock);
+
+    list_for_each(cur, &pmem_data_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.mgmt_smfn = pmem->u.data.mgmt_smfn;
+        region.mgmt_emfn = pmem->u.data.mgmt_emfn;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    spin_unlock(&pmem_data_lock);
+
+    *num_regions = nr;
+
+    return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
     unsigned int type = regions->type, max = regions->num_regions;
@@ -269,6 +311,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
         rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        rc = pmem_get_data_regions(regions->u_buffer.data_regions, &max);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index d7c12f23fb..8595ea438a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1141,6 +1141,16 @@ struct xen_sysctl_nvdimm_pmem_mgmt_region {
 typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
 
+/* PMEM_REGION_TYPE_DATA */
+struct xen_sysctl_nvdimm_pmem_data_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint64_t mgmt_smfn;
+    uint64_t mgmt_emfn;
+};
+typedef struct xen_sysctl_nvdimm_pmem_data_region xen_sysctl_nvdimm_pmem_data_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_data_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1161,6 +1171,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
         /* if type == PMEM_REGION_TYPE_MGMT */
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
+        /* if type == PMEM_REGION_TYPE_DATA */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) data_regions;
     } u_buffer;           /* IN: the guest handler where the entries of PMEM
                                  regions of the type @type are returned */
 };
-- 
2.14.1



* [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (23 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If the option '--data' is present, the command 'list' will list all
PMEM regions for guest data usage.
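
With this option in place, an invocation would print something like the
following; the values are purely illustrative:

    # xen-ndctl list --data
    Data PMEM regions:
     0: MFN 0x500000 - 0x580000, MGMT MFN 0x140400 - 0x141800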

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 320633ae05..33817863ca 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -58,10 +58,11 @@ static const struct xen_ndctl_cmd
 
     {
         .name    = "list",
-        .syntax  = "[--all | --raw | --mgmt]",
+        .syntax  = "[--all | --raw | --mgmt | --data]",
         .help    = "--all: the default option, list all PMEM regions of following types.\n"
                    "--raw: list all PMEM regions detected by Xen hypervisor.\n"
-                   "--mgmt: list all PMEM regions for management usage.\n",
+                   "--mgmt: list all PMEM regions for management usage.\n"
+                   "--data: list all PMEM regions that can be mapped to guest.\n",
         .handler = handle_list,
         .need_xc = true,
     },
@@ -209,6 +210,40 @@ static int handle_list_mgmt(void)
     return rc;
 }
 
+static int handle_list_data(void)
+{
+    int rc;
+    unsigned int nr = 0, i;
+    xen_sysctl_nvdimm_pmem_data_region_t *data_list;
+
+    rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_DATA, &nr);
+    if ( rc )
+    {
+        fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+                strerror(-rc));
+        return rc;
+    }
+
+    data_list = malloc(nr * sizeof(*data_list));
+    if ( !data_list )
+        return -ENOMEM;
+
+    rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_DATA, data_list, &nr);
+    if ( rc )
+        goto out;
+
+    printf("Data PMEM regions:\n");
+    for ( i = 0; i < nr; i++ )
+        printf(" %u: MFN 0x%lx - 0x%lx, MGMT MFN 0x%lx - 0x%lx\n",
+               i, data_list[i].smfn, data_list[i].emfn,
+               data_list[i].mgmt_smfn, data_list[i].mgmt_emfn);
+
+ out:
+    free(data_list);
+
+    return rc;
+}
+
 static const struct list_handlers {
     const char *option;
     int (*handler)(void);
@@ -216,6 +251,7 @@ static const struct list_handlers {
 {
     { "--raw", handle_list_raw },
     { "--mgmt", handle_list_mgmt },
+    { "--data", handle_list_data },
 };
 
 static const unsigned int nr_list_hndrs =
-- 
2.14.1



* [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (24 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list' Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

pmem_populate() is added to map the specified data PMEM pages to an
HVM domain. No caller is added in this commit.
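
Since no caller exists yet, the following is only a sketch of how a
future call site might drive pmem_populate(); the field names of
struct xen_pmem_map_args are inferred from the accesses made in
pmem_populate() itself and may differ from the final header, and
hypercall continuation handling is elided:

    /* Hypothetical call site; not added by this patch. */
    static int map_data_pmem_to_guest(struct domain *d, unsigned long gfn,
                                      unsigned long mfn, unsigned long nr_mfns)
    {
        struct xen_pmem_map_args args = {
            .domain = d,
            .gfn = gfn,
            .mfn = mfn,
            .nr_mfns = nr_mfns,
            .nr_done = 0,
            .preempted = 0,
        };
        int rc = pmem_populate(&args);

        /*
         * -ERESTART means a hypercall continuation is needed; a real
         * caller would re-invoke pmem_populate() with args.nr_done
         * preserved.
         */
        return rc;
    }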

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/domain.c     |   3 ++
 xen/common/pmem.c       | 141 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pmem.h  |  19 +++++++
 xen/include/xen/sched.h |   3 ++
 4 files changed, 166 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5aebcf265f..4354342b02 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
+    spin_lock_init(&d->pmem_lock);
+    INIT_PAGE_LIST_HEAD(&d->pmem_page_list);
+
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
     d->auto_node_affinity = 1;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index ed4a014c30..2f9ad64a26 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -17,10 +17,12 @@
  */
 
 #include <xen/errno.h>
+#include <xen/event.h>
 #include <xen/list.h>
 #include <xen/iocap.h>
 #include <xen/paging.h>
 #include <xen/pmem.h>
+#include <xen/sched.h>
 
 #include <asm/guest_access.h>
 
@@ -78,6 +80,31 @@ static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
            (emfn1 > smfn2 && emfn1 <= emfn2);
 }
 
+static bool check_cover(struct list_head *list,
+                        unsigned long smfn, unsigned long emfn)
+{
+    struct list_head *cur;
+    struct pmem *pmem;
+    unsigned long pmem_smfn, pmem_emfn;
+
+    list_for_each(cur, list)
+    {
+        pmem = list_entry(cur, struct pmem, link);
+        pmem_smfn = pmem->smfn;
+        pmem_emfn = pmem->emfn;
+
+        if ( smfn < pmem_smfn )
+            return false;
+
+        if ( emfn <= pmem_emfn )
+            return true;
+
+        smfn = max(smfn, pmem_emfn);
+    }
+
+    return false;
+}
+
 /**
  * Add a PMEM region to a list. All PMEM regions in the list are
  * sorted in the ascending order of the start address. A PMEM region,
@@ -600,6 +627,120 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 
 #ifdef CONFIG_X86
 
+static int pmem_assign_page(struct domain *d, struct page_info *pg,
+                            unsigned long gfn)
+{
+    int rc;
+
+    if ( pg->count_info != (PGC_state_free | PGC_pmem_page) )
+        return -EBUSY;
+
+    pg->count_info = PGC_allocated | PGC_state_inuse | PGC_pmem_page | 1;
+    pg->u.inuse.type_info = 0;
+    page_set_owner(pg, d);
+
+    rc = guest_physmap_add_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+    if ( rc )
+    {
+        page_set_owner(pg, NULL);
+        pg->count_info = PGC_state_free | PGC_pmem_page;
+
+        return rc;
+    }
+
+    spin_lock(&d->pmem_lock);
+    page_list_add_tail(pg, &d->pmem_page_list);
+    spin_unlock(&d->pmem_lock);
+
+    return 0;
+}
+
+static int pmem_unassign_page(struct domain *d, struct page_info *pg,
+                              unsigned long gfn)
+{
+    int rc;
+
+    spin_lock(&d->pmem_lock);
+    page_list_del(pg, &d->pmem_page_list);
+    spin_unlock(&d->pmem_lock);
+
+    rc = guest_physmap_remove_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+
+    page_set_owner(pg, NULL);
+    pg->count_info = PGC_state_free | PGC_pmem_page;
+
+    return 0;
+}
+
+int pmem_populate(struct xen_pmem_map_args *args)
+{
+    struct domain *d = args->domain;
+    unsigned long i = args->nr_done;
+    unsigned long mfn = args->mfn + i;
+    unsigned long emfn = args->mfn + args->nr_mfns;
+    unsigned long gfn = args->gfn + i;
+    struct page_info *page;
+    int rc = 0, err = 0;
+
+    if ( unlikely(d->is_dying) )
+        return -EINVAL;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    spin_lock(&pmem_data_lock);
+
+    if ( !check_cover(&pmem_data_regions, mfn, emfn) )
+    {
+        rc = -ENXIO;
+        goto out;
+    }
+
+    for ( ; mfn < emfn; i++, mfn++, gfn++ )
+    {
+        if ( i != args->nr_done && hypercall_preempt_check() )
+        {
+            args->preempted = 1;
+            rc = -ERESTART;
+            break;
+        }
+
+        page = mfn_to_page(mfn);
+        if ( !page_state_is(page, free) )
+        {
+            rc = -EBUSY;
+            break;
+        }
+
+        rc = pmem_assign_page(d, page, gfn);
+        if ( rc )
+            break;
+    }
+
+ out:
+    if ( rc && rc != -ERESTART )
+        while ( i-- && !err )
+            err = pmem_unassign_page(d, mfn_to_page(--mfn), --gfn);
+
+    spin_unlock(&pmem_data_lock);
+
+    if ( unlikely(err) )
+    {
+        /*
+         * If we unfortunately fail to recover from the previous
+         * failure, some PMEM pages may still be mapped to the
+         * domain. As pmem_populate() is now called only during domain
+         * creation, let's crash the domain.
+         */
+        domain_crash(d);
+        rc = err;
+    }
+
+    args->nr_done = i;
+
+    return rc;
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
     struct list_head *cur;
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 9323d679a6..2dab90530b 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -33,6 +33,20 @@ int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
                     unsigned long mgmt_smfn, unsigned long mgmt_emfn,
                     unsigned long *used_mgmt_mfns);
 
+struct xen_pmem_map_args {
+    struct domain *domain;
+
+    unsigned long mfn;     /* start MFN of pmem pages to be mapped */
+    unsigned long gfn;     /* start GFN of target domain */
+    unsigned long nr_mfns; /* number of pmem pages to be mapped */
+
+    /* For preemption ... */
+    unsigned long nr_done; /* number of pmem pages processed so far */
+    int preempted;         /* Is the operation preempted? */
+};
+
+int pmem_populate(struct xen_pmem_map_args *args);
+
 #else /* !CONFIG_X86 */
 
 static inline int pmem_dom0_setup_permission(...)
@@ -45,6 +59,11 @@ static inline int pmem_arch_setup(...)
     return -ENOSYS;
 }
 
+static inline int pmem_populate(...)
+{
+    return -ENOSYS;
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5b8f8c68ea..de5b85b1dd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -323,6 +323,9 @@ struct domain
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
 
+    spinlock_t       pmem_lock;       /* protect all following pmem_ fields */
+    struct page_list_head pmem_page_list; /* linked list of PMEM pages      */
+
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
     struct cpupool  *cpupool;
-- 
2.14.1



* [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (25 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, George Dunlap, Andrew Cooper, Jan Beulich,
	Chao Peng, Dan Williams

A new step RELMEM_pmem is added and taken before RELMEM_xen to release
all PMEM pages mapped to a HVM domain.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
---
 xen/arch/x86/domain.c        | 32 ++++++++++++++++++++++++++++----
 xen/arch/x86/mm.c            |  9 +++++++--
 xen/common/pmem.c            | 10 ++++++++++
 xen/include/asm-x86/domain.h |  1 +
 xen/include/xen/pmem.h       |  6 ++++++
 5 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbddc536d3..1c4e788780 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1755,11 +1755,15 @@ static int relinquish_memory(
 {
     struct page_info  *page;
     unsigned long     x, y;
+    bool              is_pmem_list = (list == &d->pmem_page_list);
     int               ret = 0;
 
     /* Use a recursive lock, as we may enter 'free_domheap_page'. */
     spin_lock_recursive(&d->page_alloc_lock);
 
+    if ( is_pmem_list )
+        spin_lock(&d->pmem_lock);
+
     while ( (page = page_list_remove_head(list)) )
     {
         /* Grab a reference to the page so it won't disappear from under us. */
@@ -1841,8 +1845,9 @@ static int relinquish_memory(
             }
         }
 
-        /* Put the page on the list and /then/ potentially free it. */
-        page_list_add_tail(page, &d->arch.relmem_list);
+        if ( !is_pmem_list )
+            /* Put the page on the list and /then/ potentially free it. */
+            page_list_add_tail(page, &d->arch.relmem_list);
         put_page(page);
 
         if ( hypercall_preempt_check() )
@@ -1852,10 +1857,13 @@ static int relinquish_memory(
         }
     }
 
-    /* list is empty at this point. */
-    page_list_move(list, &d->arch.relmem_list);
+    if ( !is_pmem_list )
+        /* list is empty at this point. */
+        page_list_move(list, &d->arch.relmem_list);
 
  out:
+    if ( is_pmem_list )
+        spin_unlock(&d->pmem_lock);
     spin_unlock_recursive(&d->page_alloc_lock);
     return ret;
 }
@@ -1922,13 +1930,29 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
+#ifndef CONFIG_NVDIMM_PMEM
         d->arch.relmem = RELMEM_xen;
+#else
+        d->arch.relmem = RELMEM_pmem;
+#endif
 
         spin_lock(&d->page_alloc_lock);
         page_list_splice(&d->arch.relmem_list, &d->page_list);
         INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
         spin_unlock(&d->page_alloc_lock);
 
+#ifdef CONFIG_NVDIMM_PMEM
+        /* Fallthrough. Relinquish every page of PMEM. */
+    case RELMEM_pmem:
+        if ( is_hvm_domain(d) )
+        {
+            ret = relinquish_memory(d, &d->pmem_page_list, ~0UL);
+            if ( ret )
+                return ret;
+        }
+        d->arch.relmem = RELMEM_xen;
+#endif
+
         /* Fallthrough. Relinquish every page of memory. */
     case RELMEM_xen:
         ret = relinquish_memory(d, &d->xenpage_list, ~0UL);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 93ccf198c9..26f9e5a13e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -106,6 +106,7 @@
 #include <xen/efi.h>
 #include <xen/grant_table.h>
 #include <xen/hypercall.h>
+#include <xen/pmem.h>
 #include <asm/paging.h>
 #include <asm/shadow.h>
 #include <asm/page.h>
@@ -2341,8 +2342,12 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
-             && cleanup_page_cacheattr(page) == 0 )
+#ifdef CONFIG_NVDIMM_PMEM
+        if ( is_pmem_page(page) )
+            pmem_page_cleanup(page);
+        else
+#endif
+        if ( cleanup_page_cacheattr(page) == 0 )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 2f9ad64a26..8b9378dce6 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -741,6 +741,16 @@ int pmem_populate(struct xen_pmem_map_args *args)
     return rc;
 }
 
+void pmem_page_cleanup(struct page_info *page)
+{
+    ASSERT(is_pmem_page(page));
+    ASSERT((page->count_info & PGC_count_mask) == 0);
+
+    page->count_info = PGC_pmem_page | PGC_state_free;
+    page_set_owner(page, NULL);
+    set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY);
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
     struct list_head *cur;
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index fb8bf17458..8322546b5d 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -303,6 +303,7 @@ struct arch_domain
     enum {
         RELMEM_not_started,
         RELMEM_shared,
+        RELMEM_pmem,
         RELMEM_xen,
         RELMEM_l4,
         RELMEM_l3,
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 2dab90530b..dfbc412065 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NVDIMM_PMEM
 
 #include <public/sysctl.h>
+#include <xen/mm.h>
 #include <xen/types.h>
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
@@ -46,6 +47,7 @@ struct xen_pmem_map_args {
 };
 
 int pmem_populate(struct xen_pmem_map_args *args);
+void pmem_page_cleanup(struct page_info *page);
 
 #else /* !CONFIG_X86 */
 
@@ -64,6 +66,10 @@ static inline int pmem_populate(...)
     return -ENOSYS;
 }
 
+static inline void pmem_page_cleanup(...)
+{
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
-- 
2.14.1



* [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (26 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model Haozhong Zhang
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams, Daniel De Graaf

This hypercall will be used by device models to map host PMEM pages to
the guest.
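As an illustration, a device model could invoke it through the new
libxc wrapper roughly as sketched below; the domain id, MFN, GFN and
page count are placeholders, not values taken from this patch:

    #include <stdio.h>
    #include <xenctrl.h>

    /* Sketch: map 'nr' host PMEM pages starting at 'mfn' to guest frame
     * 'gfn' of domain 'domid'. All values come from the caller. */
    static int map_vnvdimm(uint32_t domid, unsigned long mfn,
                           unsigned long gfn, unsigned long nr)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        int rc;

        if ( !xch )
            return -1;

        rc = xc_domain_populate_pmem_map(xch, domid, mfn, gfn, nr);
        if ( rc )
            fprintf(stderr, "XENMEM_populate_pmem_map failed: %d\n", rc);

        xc_interface_close(xch);
        return rc;
    }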

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/flask/policy/modules/xen.if   |  2 +-
 tools/libxc/include/xenctrl.h       | 17 ++++++++++++++
 tools/libxc/xc_domain.c             | 15 +++++++++++++
 xen/common/compat/memory.c          |  1 +
 xen/common/memory.c                 | 44 +++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h         | 14 +++++++++++-
 xen/include/xsm/dummy.h             | 11 ++++++++++
 xen/include/xsm/xsm.h               | 12 ++++++++++
 xen/xsm/dummy.c                     |  4 ++++
 xen/xsm/flask/hooks.c               | 13 +++++++++++
 xen/xsm/flask/policy/access_vectors |  2 ++
 11 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 912640002e..9634dee25f 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -55,7 +55,7 @@ define(`create_domain_common', `
 			psr_cmt_op psr_cat_op soft_reset };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
-	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
+	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp populate_pmem_map };
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { cacheattr getparam hvmctl sethvmc
 			setparam nested altp2mhvm altp2mhvm_op dm };
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 41e5e3408c..a81dcdbe58 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2643,6 +2643,23 @@ int xc_nvdimm_pmem_setup_data(xc_interface *xch,
                               unsigned long smfn, unsigned long emfn,
                               unsigned long mgmt_smfn, unsigned long mgmt_emfn);
 
+/*
+ * Map specified host PMEM pages to the specified guest address.
+ *
+ * Parameters:
+ *  xch:     xc interface handle
+ *  domid:   the target domain id
+ *  mfn:     the start MFN of the PMEM pages
+ *  gfn:     the start GFN of the target guest physical pages
+ *  nr_mfns: the number of PMEM pages to be mapped
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+                                unsigned long mfn, unsigned long gfn,
+                                unsigned long nr_mfns);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3bab4e8bab..b548da750a 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2397,6 +2397,21 @@ int xc_domain_soft_reset(xc_interface *xch,
     domctl.domain = (domid_t)domid;
     return do_domctl(xch, &domctl);
 }
+
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+                                unsigned long mfn, unsigned long gfn,
+                                unsigned long nr_mfns)
+{
+    struct xen_pmem_map args = {
+        .domid   = domid,
+        .mfn     = mfn,
+        .gfn     = gfn,
+        .nr_mfns = nr_mfns,
+    };
+
+    return do_memory_op(xch, XENMEM_populate_pmem_map, &args, sizeof(args));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 35bb259808..51bec835b9 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -525,6 +525,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
         case XENMEM_add_to_physmap:
         case XENMEM_remove_from_physmap:
         case XENMEM_access_op:
+        case XENMEM_populate_pmem_map:
             break;
 
         case XENMEM_get_vnumainfo:
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 26da6050f6..31ef480562 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -23,6 +23,7 @@
 #include <xen/numa.h>
 #include <xen/mem_access.h>
 #include <xen/trace.h>
+#include <xen/pmem.h>
 #include <asm/current.h>
 #include <asm/hardirq.h>
 #include <asm/p2m.h>
@@ -1379,6 +1380,49 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 #endif
 
+#ifdef CONFIG_NVDIMM_PMEM
+    case XENMEM_populate_pmem_map:
+    {
+        struct xen_pmem_map map;
+        struct xen_pmem_map_args args;
+
+        if ( copy_from_guest(&map, arg, 1) )
+            return -EFAULT;
+
+        if ( map.domid == DOMID_SELF )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( !d )
+            return -EINVAL;
+
+        rc = xsm_populate_pmem_map(XSM_TARGET, curr_d, d);
+        if ( rc )
+        {
+            rcu_unlock_domain(d);
+            return rc;
+        }
+
+        args.domain = d;
+        args.mfn = map.mfn;
+        args.gfn = map.gfn;
+        args.nr_mfns = map.nr_mfns;
+        args.nr_done = start_extent;
+        args.preempted = 0;
+
+        rc = pmem_populate(&args);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART && args.preempted )
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh",
+                op | (args.nr_done << MEMOP_EXTENT_SHIFT), arg);
+
+        break;
+    }
+#endif /* CONFIG_NVDIMM_PMEM */
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 29386df98b..d74436e4b0 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -650,7 +650,19 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 28 */
+#define XENMEM_populate_pmem_map 28
+
+struct xen_pmem_map {
+    /* IN */
+    domid_t domid;
+    unsigned long mfn;
+    unsigned long gfn;
+    unsigned int nr_mfns;
+};
+typedef struct xen_pmem_map xen_pmem_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmem_map_t);
+
+/* Next available subop number is 29 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index ba89ea4bc1..6107da308c 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -724,3 +724,14 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+#ifdef CONFIG_NVDIMM_PMEM
+
+static XSM_INLINE int xsm_populate_pmem_map(XSM_DEFAULT_ARG
+                                            struct domain *d1, struct domain *d2)
+{
+    XSM_ASSERT_ACTION(XSM_TARGET);
+    return xsm_default_action(action, d1, d2);
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 7f7feffc68..e43e79f719 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -180,6 +180,10 @@ struct xsm_operations {
     int (*dm_op) (struct domain *d);
 #endif
     int (*xen_version) (uint32_t cmd);
+
+#ifdef CONFIG_NVDIMM_PMEM
+    int (*populate_pmem_map) (struct domain *d1, struct domain *d2);
+#endif
 };
 
 #ifdef CONFIG_XSM
@@ -692,6 +696,14 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
     return xsm_ops->xen_version(op);
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+static inline int xsm_populate_pmem_map(xsm_default_t def,
+                                        struct domain *d1, struct domain *d2)
+{
+    return xsm_ops->populate_pmem_map(d1, d2);
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 479b103614..4d65eaca61 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -157,4 +157,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, dm_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+
+#ifdef CONFIG_NVDIMM_PMEM
+    set_to_dummy_if_null(ops, populate_pmem_map);
+#endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index edfe529495..d91f246b47 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1719,6 +1719,15 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static int flask_populate_pmem_map(struct domain *d1, struct domain *d2)
+{
+    return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__POPULATE_PMEM_MAP);
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1852,6 +1861,10 @@ static struct xsm_operations flask_ops = {
     .dm_op = flask_dm_op,
 #endif
     .xen_version = flask_xen_version,
+
+#ifdef CONFIG_NVDIMM_PMEM
+    .populate_pmem_map = flask_populate_pmem_map,
+#endif
 };
 
 void __init flask_init(const void *policy_buffer, size_t policy_size)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index af05826064..fe32fd93c8 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -387,6 +387,8 @@ class mmu
 # Allow a privileged domain to install a map of a page it does not own.  Used
 # for stub domain device models with the PV framebuffer.
     target_hack
+# XENMEM_populate_pmem_map
+    populate_pmem_map
 }
 
 # control of the paging_domctl split by subop
-- 
2.14.1



* [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (27 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc Haozhong Zhang
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

Some virtual devices (e.g. NVDIMM) require complex ACPI tables and
definition blocks (in AML), which a device model (e.g. QEMU) is
already able to construct. Instead of introducing a redundant
implementation to Xen, we would like to reuse the device model to
construct those ACPI objects.

This commit allows Xen to reserve an area in guest memory for the
device model to pass its ACPI tables and definition blocks to the
guest, where they will be loaded by hvmloader. The base guest physical
address and the size of the reserved area are passed to the device
model via the XenStore keys hvmloader/dm-acpi/{address, length}. An xl
config option "dm_acpi_pages = N" is added to specify the number of
reserved guest memory pages.
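For example, a device model could locate the reserved area roughly as
sketched below with libxenstore. The absolute paths match the keys
written by libxl in this patch; the helper name and the calling domid
are assumptions for illustration:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    /* Sketch: return the guest physical address and length of the
     * reserved DM ACPI area of domain 'domid'. */
    static int read_dm_acpi_area(unsigned int domid,
                                 uint64_t *addr, uint64_t *length)
    {
        struct xs_handle *xsh = xs_open(0);
        char path[64];
        char *val;
        unsigned int sz;

        if ( !xsh )
            return -1;

        snprintf(path, sizeof(path),
                 "/local/domain/%u/hvmloader/dm-acpi/address", domid);
        val = xs_read(xsh, XBT_NULL, path, &sz);
        *addr = val ? strtoull(val, NULL, 0) : 0;
        free(val);

        snprintf(path, sizeof(path),
                 "/local/domain/%u/hvmloader/dm-acpi/length", domid);
        val = xs_read(xsh, XBT_NULL, path, &sz);
        *length = val ? strtoull(val, NULL, 0) : 0;
        free(val);

        xs_close(xsh);
        return (*addr && *length) ? 0 : -1;
    }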

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h            |  1 +
 tools/libxc/xc_dom_x86.c                | 13 +++++++++++++
 tools/libxl/libxl_dom.c                 | 25 +++++++++++++++++++++++++
 tools/libxl/libxl_types.idl             |  1 +
 tools/xl/xl_parse.c                     | 17 ++++++++++++++++-
 xen/include/public/hvm/hvm_xs_strings.h |  8 ++++++++
 6 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index ce47058c41..7c541576e7 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -93,6 +93,7 @@ struct xc_dom_image {
     struct xc_dom_seg pgtables_seg;
     struct xc_dom_seg devicetree_seg;
     struct xc_dom_seg start_info_seg; /* HVMlite only */
+    struct xc_dom_seg dm_acpi_seg;    /* reserved PFNs for DM ACPI */
     xen_pfn_t start_info_pfn;
     xen_pfn_t console_pfn;
     xen_pfn_t xenstore_pfn;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..8755350295 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -674,6 +674,19 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
                          ioreq_server_pfn(0));
         xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                          NR_IOREQ_SERVER_PAGES);
+
+        if ( dom->dm_acpi_seg.pages )
+        {
+            size_t acpi_size = dom->dm_acpi_seg.pages * XC_DOM_PAGE_SIZE(dom);
+
+            rc = xc_dom_alloc_segment(dom, &dom->dm_acpi_seg, "DM ACPI",
+                                      0, acpi_size);
+            if ( rc != 0 )
+            {
+                DOMPRINTF("Unable to reserve memory for DM ACPI");
+                goto out;
+            }
+        }
     }
 
     rc = xc_dom_alloc_segment(dom, &dom->start_info_seg,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f54fd49a73..bad1719892 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -897,6 +897,29 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
             goto err;
     }
 
+    if (dom->dm_acpi_seg.pages) {
+        uint64_t guest_addr_out = dom->dm_acpi_seg.pfn * XC_DOM_PAGE_SIZE(dom);
+
+        if (guest_addr_out >= 0x100000000ULL) {
+            LOG(ERROR,
+                "Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G",
+                guest_addr_out);
+            goto err;
+        }
+
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid);
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64, guest_addr_out);
+        if (ret)
+            goto err;
+
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid);
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
+                               (uint64_t)(dom->dm_acpi_seg.pages *
+                                          XC_DOM_PAGE_SIZE(dom)));
+        if (ret)
+            goto err;
+    }
+
     return 0;
 
 err:
@@ -1184,6 +1207,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
             dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
     }
 
+    dom->dm_acpi_seg.pages = info->u.hvm.dm_acpi_pages;
+
     rc = libxl__build_dom(gc, domid, info, state, dom);
     if (rc != 0)
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 173d70acec..4acc0457f4 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -565,6 +565,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
                                        ("mca_caps",         uint64),
+                                       ("dm_acpi_pages",    integer),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 02ddd2e90d..ed562a1956 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -810,7 +810,7 @@ void parse_config_data(const char *config_source,
                        libxl_domain_config *d_config)
 {
     const char *buf;
-    long l, vcpus = 0;
+    long l, vcpus = 0, nr_dm_acpi_pages;
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs, *p9devs;
@@ -1929,6 +1929,21 @@ skip_usbdev:
 
 #undef parse_extra_args
 
+    if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
+        b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE) {
+        /* parse 'dm_acpi_pages' */
+        e = xlu_cfg_get_long(config, "dm_acpi_pages", &nr_dm_acpi_pages, 0);
+        if (e && e != ESRCH) {
+            fprintf(stderr, "ERROR: unable to parse dm_acpi_pages.\n");
+            exit(-ERROR_FAIL);
+        }
+        if (!e && nr_dm_acpi_pages <= 0) {
+            fprintf(stderr, "ERROR: require positive dm_acpi_pages.\n");
+            exit(-ERROR_FAIL);
+        }
+        b_info->u.hvm.dm_acpi_pages = nr_dm_acpi_pages;
+    }
+
     /* If we've already got vfb=[] for PV guest then ignore top level
      * VNC config. */
     if (c_info->type == LIBXL_DOMAIN_TYPE_PV && !d_config->num_vfbs) {
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index fea1dd4407..9f04ff2adc 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -80,4 +80,12 @@
  */
 #define HVM_XS_OEM_STRINGS             "bios-strings/oem-%d"
 
+/* If a range of guest memory is reserved to pass ACPI from the device
+ * model (e.g. QEMU), the start address and the size of the reserved
+ * guest memory are specified by following two xenstore values.
+ */
+#define HVM_XS_DM_ACPI_ROOT            "hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS         HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH          HVM_XS_DM_ACPI_ROOT"/length"
+
 #endif /* __XEN_PUBLIC_HVM_HVM_XS_STRINGS_H__ */
-- 
2.14.1



* [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (28 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

The AML builder added later needs to allocate contiguous memory across
multiple calls to mem_ops.alloc(). Therefore, it needs to know the
minimum alignment used by mem_ops.alloc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c | 2 ++
 tools/libacpi/libacpi.h         | 2 ++
 tools/libxl/libxl_x86_acpi.c    | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d24cd..c2218d9fcb 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -990,6 +990,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
 
+    ctxt.min_alloc_byte_align = 16;
+
     acpi_build_tables(&ctxt, config);
 
     hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23b0b..157f63f7bc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -52,6 +52,8 @@ struct acpi_ctxt {
         void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
     } mem_ops;
+
+    uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 176175676f..3b79b2179b 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -183,6 +183,8 @@ int libxl__dom_load_acpi(libxl__gc *gc,
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
+    libxl_ctxt.c.min_alloc_byte_align = 16;
+
     rc = init_acpi_config(gc, dom, b_info, &config);
     if (rc) {
         LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
-- 
2.14.1



* [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (29 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

The location of ACPI blobs passed from the device model is given as a
guest physical address. libacpi needs to convert the guest physical
address to a guest virtual address before it can access those ACPI
blobs.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  6 ++++++
 tools/firmware/hvmloader/util.h |  1 +
 tools/libacpi/libacpi.h         |  1 +
 tools/libxl/libxl_x86_acpi.c    | 10 ++++++++++
 4 files changed, 18 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index c2218d9fcb..2f8a4654b0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -871,6 +871,11 @@ static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
     return virt_to_phys(v);
 }
 
+static void *acpi_p2v(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    return phys_to_virt(p);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
                             uint32_t size, uint32_t align)
 {
@@ -989,6 +994,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.alloc = acpi_mem_alloc;
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
+    ctxt.mem_ops.p2v = acpi_p2v;
 
     ctxt.min_alloc_byte_align = 16;
 
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 2ef854eb8f..e9fe6c6e79 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -200,6 +200,7 @@ xen_pfn_t mem_hole_alloc(uint32_t nr_mfns);
 /* Allocate memory in a reserved region below 4GB. */
 void *mem_alloc(uint32_t size, uint32_t align);
 #define virt_to_phys(v) ((unsigned long)(v))
+#define phys_to_virt(p) ((void *)(p))
 
 /* Allocate memory in a scratch region */
 void *scratch_alloc(uint32_t size, uint32_t align);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 157f63f7bc..f5a1c384bc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -51,6 +51,7 @@ struct acpi_ctxt {
         void *(*alloc)(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align);
         void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
+        void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
     } mem_ops;
 
     uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 3b79b2179b..b14136949c 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -52,6 +52,15 @@ static unsigned long virt_to_phys(struct acpi_ctxt *ctxt, void *v)
             libxl_ctxt->alloc_base_paddr);
 }
 
+static void *phys_to_virt(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    struct libxl_acpi_ctxt *libxl_ctxt =
+        CONTAINER_OF(ctxt, struct libxl_acpi_ctxt, c);
+
+    return (void *)((p - libxl_ctxt->alloc_base_paddr) +
+                    libxl_ctxt->alloc_base_vaddr);
+}
+
 static void *mem_alloc(struct acpi_ctxt *ctxt,
                        uint32_t size, uint32_t align)
 {
@@ -181,6 +190,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
     libxl_ctxt.c.mem_ops.alloc = mem_alloc;
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
+    libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
     libxl_ctxt.c.min_alloc_byte_align = 16;
-- 
2.14.1



* [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (30 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder Haozhong Zhang
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

libacpi needs to access information placed in XenStore in order to
load the ACPI tables and definition blocks built by the device model.
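A later patch is expected to consume these callbacks roughly as
sketched below, walking the entries the device model advertises under
HVM_XS_DM_ACPI_ROOT ("hvmloader/dm-acpi", a path relative to the
domain as seen by hvmloader, introduced in the earlier "reserve guest
memory" patch). The per-entry layout is not defined here, so the read
in the loop is purely illustrative:

    /* Sketch only: enumerate device-model ACPI entries. */
    static void walk_dm_acpi(struct acpi_ctxt *ctxt)
    {
        unsigned int num = 0, i;
        char **entries = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
        const char *val;
        char path[64];

        if ( !entries )
            return;

        for ( i = 0; i < num; i++ )
        {
            snprintf(path, sizeof(path),
                     HVM_XS_DM_ACPI_ROOT"/%s", entries[i]);
            val = ctxt->xs_ops.read(ctxt, path);
            if ( !val )
                continue;
            /* ... parse 'val' and locate/load the corresponding blob ... */
        }
    }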

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c   | 52 +++++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h   |  9 +++++++
 tools/firmware/hvmloader/xenbus.c | 44 +++++++++++++++++++++++----------
 tools/libacpi/libacpi.h           | 10 ++++++++
 tools/libxl/libxl_x86_acpi.c      | 24 ++++++++++++++++++
 5 files changed, 126 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 2f8a4654b0..5b8a4ee9d0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -893,6 +893,53 @@ static uint32_t acpi_lapic_id(unsigned cpu)
     return LAPIC_ID(cpu);
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+    return xenstore_read(path, NULL);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+                         const char *path, const char *value)
+{
+    return xenstore_write(path, value);
+}
+
+static unsigned int count_strings(const char *strings, unsigned int len)
+{
+    const char *p;
+    unsigned int n;
+
+    for ( p = strings, n = 0; p < strings + len; p++ )
+        if ( *p == '\0' )
+            n++;
+
+    return n;
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+                                const char *path, unsigned int *num)
+{
+    const char *strings;
+    char *s, *p, **ret;
+    unsigned int len, n;
+
+    strings = xenstore_directory(path, &len, NULL);
+    if ( !strings )
+        return NULL;
+
+    n = count_strings(strings, len);
+    ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(p) + len, 0);
+    if ( !ret )
+        return NULL;
+    memcpy(&ret[n], strings, len);
+
+    s = (char *)&ret[n];
+    for ( p = s, *num = 0; p < s + len; p += strlen(p) + 1 )
+        ret[(*num)++] = p;
+
+    return ret;
+}
+
 void hvmloader_acpi_build_tables(struct acpi_config *config,
                                  unsigned int physical)
 {
@@ -998,6 +1045,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 
     ctxt.min_alloc_byte_align = 16;
 
+    ctxt.xs_ops.read = acpi_xs_read;
+    ctxt.xs_ops.write = acpi_xs_write;
+    ctxt.xs_ops.directory = acpi_xs_directory;
+    ctxt.xs_opaque = NULL;
+
     acpi_build_tables(&ctxt, config);
 
     hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index e9fe6c6e79..37e62d93c0 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -225,6 +225,15 @@ const char *xenstore_read(const char *path, const char *default_resp);
  */
 int xenstore_write(const char *path, const char *value);
 
+/* Read a xenstore directory. Return NULL, or a nul-terminated string
+ * which contains all names of directory entries. Names are separated
+ * by '\0'. The returned string is in a static buffer, so only valid
+ * until the next xenstore/xenbus operation.  If @default_resp is
+ * specified, it is returned in preference to a NULL or empty string
+ * received from xenstore.
+ */
+const char *xenstore_directory(const char *path, uint32_t *len,
+                               const char *default_resp);
 
 /* Get a HVM param.
  */
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index 2b89a56fce..387c0971e1 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -257,24 +257,16 @@ static int xenbus_recv(uint32_t *reply_len, const char **reply_data,
     return 0;
 }
 
-
-/* Read a xenstore key.  Returns a nul-terminated string (even if the XS
- * data wasn't nul-terminated) or NULL.  The returned string is in a
- * static buffer, so only valid until the next xenstore/xenbus operation.
- * If @default_resp is specified, it is returned in preference to a NULL or
- * empty string received from xenstore.
- */
-const char *xenstore_read(const char *path, const char *default_resp)
+static const char *xenstore_read_common(const char *path, uint32_t *len,
+                                        const char *default_resp, bool is_dir)
 {
-    uint32_t len = 0, type = 0;
+    uint32_t type = 0, expected_type = is_dir ? XS_DIRECTORY : XS_READ;
     const char *answer = NULL;
 
-    xenbus_send(XS_READ,
-                path, strlen(path),
-                "", 1, /* nul separator */
+    xenbus_send(expected_type, path, strlen(path), "", 1, /* nul separator */
                 NULL, 0);
 
-    if ( xenbus_recv(&len, &answer, &type) || (type != XS_READ) )
+    if ( xenbus_recv(len, &answer, &type) || type != expected_type )
         answer = NULL;
 
     if ( (default_resp != NULL) && ((answer == NULL) || (*answer == '\0')) )
@@ -284,6 +276,32 @@ const char *xenstore_read(const char *path, const char *default_resp)
     return answer;
 }
 
+/* Read a xenstore key.  Returns a nul-terminated string (even if the XS
+ * data wasn't nul-terminated) or NULL.  The returned string is in a
+ * static buffer, so only valid until the next xenstore/xenbus operation.
+ * If @default_resp is specified, it is returned in preference to a NULL or
+ * empty string received from xenstore.
+ */
+const char *xenstore_read(const char *path, const char *default_resp)
+{
+    uint32_t len = 0;
+
+    return xenstore_read_common(path, &len, default_resp, false);
+}
+
+/* Read a xenstore directory. Return NULL, or a nul-terminated string
+ * which contains all names of directory entries. Names are separated
+ * by '\0'. The returned string is in a static buffer, so only valid
+ * until the next xenstore/xenbus operation.  If @default_resp is
+ * specified, it is returned in preference to a NULL or empty string
+ * received from xenstore.
+ */
+const char *xenstore_directory(const char *path, uint32_t *len,
+                               const char *default_resp)
+{
+    return xenstore_read_common(path, len, default_resp, true);
+}
+
 /* Write a xenstore key.  @value must be a nul-terminated string. Returns
  * zero on success or a xenstore error code on failure.
  */
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index f5a1c384bc..ab86a35509 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -55,6 +55,16 @@ struct acpi_ctxt {
     } mem_ops;
 
     uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
+
+    struct acpi_xs_ops {
+        const char *(*read)(struct acpi_ctxt *ctxt, const char *path);
+        int (*write)(struct acpi_ctxt *ctxt, const char *path,
+                     const char *value);
+        char **(*directory)(struct acpi_ctxt *ctxt, const char *path,
+                            unsigned int *num);
+    } xs_ops;
+
+    void *xs_opaque;
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index b14136949c..cbfd9a373c 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -98,6 +98,25 @@ static uint32_t acpi_lapic_id(unsigned cpu)
     return cpu * 2;
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+    return libxl__xs_read((libxl__gc *)ctxt->xs_opaque, XBT_NULL, path);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+                         const char *path, const char *value)
+{
+    return libxl__xs_write_checked((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
+                                   path, value);
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+                                const char *path, unsigned int *num)
+{
+    return libxl__xs_directory((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
+                               path, num);
+}
+
 static int init_acpi_config(libxl__gc *gc, 
                             struct xc_dom_image *dom,
                             const libxl_domain_build_info *b_info,
@@ -195,6 +214,11 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
     libxl_ctxt.c.min_alloc_byte_align = 16;
 
+    libxl_ctxt.c.xs_ops.read = acpi_xs_read;
+    libxl_ctxt.c.xs_ops.write = acpi_xs_write;
+    libxl_ctxt.c.xs_ops.directory = acpi_xs_directory;
+    libxl_ctxt.c.xs_opaque = gc;
+
     rc = init_acpi_config(gc, dom, b_info, &config);
     if (rc) {
         LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
-- 
2.14.1



* [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (31 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists Haozhong Zhang
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

It is used by libacpi to generate SSDTs from ACPI namespace devices
built by the device model.
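To illustrate the intended prepend-style usage (the concrete SSDT
generation arrives in a later patch), a device-model provided AML blob
could be wrapped into Scope (\_SB) { Device (NVDR) { ... } } roughly
as below; the blob, its length, the device name and the helper name
are placeholders:

    /* Sketch: wrap 'blob' as Scope (\_SB) { Device (NVDR) { <blob> } }
     * and return the number of bytes built. */
    static int build_nvdimm_ssdt_body(struct acpi_ctxt *ctxt,
                                      const void *blob, uint32_t blob_len,
                                      uint32_t *out_len)
    {
        uint8_t *buf = aml_build_begin(ctxt);

        if ( aml_prepend_blob(buf, blob, blob_len) ||
             aml_prepend_device(buf, "NVDR") ||
             aml_prepend_scope(buf, "\\_SB") )
            return -1;   /* per aml_build.h, the buffer is now inconsistent */

        *out_len = aml_build_end();
        return 0;
    }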

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/Makefile |   3 +-
 tools/libacpi/aml_build.c         | 326 ++++++++++++++++++++++++++++++++++++++
 tools/libacpi/aml_build.h         | 116 ++++++++++++++
 tools/libxl/Makefile              |   3 +-
 4 files changed, 446 insertions(+), 2 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 7c4c0ce535..3e917507c8 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -76,11 +76,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
 DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
-ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
+ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o aml_build.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
 vpath build.c $(ACPI_PATH)
 vpath static_tables.c $(ACPI_PATH)
+vpath aml_build.c $(ACPI_PATH)
 OBJS += $(ACPI_OBJS)
 
 hvmloader: $(OBJS)
diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c
new file mode 100644
index 0000000000..9b4e28ad95
--- /dev/null
+++ b/tools/libacpi/aml_build.c
@@ -0,0 +1,326 @@
+/*
+ * tools/libacpi/aml_build.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+#include "aml_build.h"
+
+#define AML_OP_SCOPE     0x10
+#define AML_OP_EXT       0x5B
+#define AML_OP_DEVICE    0x82
+
+#define ACPI_NAMESEG_LEN 4
+
+struct aml_build_alloctor {
+    struct acpi_ctxt *ctxt;
+    uint8_t *buf;
+    uint32_t capacity;
+    uint32_t used;
+};
+static struct aml_build_alloctor alloc;
+
+static uint8_t *aml_buf_alloc(uint32_t size)
+{
+    uint8_t *buf = NULL;
+    struct acpi_ctxt *ctxt = alloc.ctxt;
+    uint32_t alloc_size, alloc_align = ctxt->min_alloc_byte_align;
+    uint32_t length = alloc.used + size;
+
+    /* Overflow ... */
+    if ( length < alloc.used )
+        return NULL;
+
+    if ( length <= alloc.capacity )
+    {
+        buf = alloc.buf + alloc.used;
+        alloc.used += size;
+    }
+    else
+    {
+        alloc_size = length - alloc.capacity;
+        alloc_size = (alloc_size + alloc_align) & ~(alloc_align - 1);
+        buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align);
+
+        if ( buf &&
+             buf == alloc.buf + alloc.capacity /* cont to existing buf */ )
+        {
+            alloc.capacity += alloc_size;
+            buf = alloc.buf + alloc.used;
+            alloc.used += size;
+        }
+        else
+            buf = NULL;
+    }
+
+    return buf;
+}
+
+static uint32_t get_package_length(uint8_t *pkg)
+{
+    uint32_t len;
+
+    len = pkg - alloc.buf;
+    len = alloc.used - len;
+
+    return len;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   @byte
+ *   the original content in @buf
+ */
+static int build_prepend_byte(uint8_t *buf, uint8_t byte)
+{
+    uint32_t len;
+
+    len = buf - alloc.buf;
+    len = alloc.used - len;
+
+    if ( !aml_buf_alloc(sizeof(uint8_t)) )
+        return -1;
+
+    if ( len )
+        memmove(buf + 1, buf, len);
+    buf[0] = byte;
+
+    return 0;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   AML encoding of four-character @name
+ *   the original content in @buf
+ *
+ * Refer to  ACPI spec 6.1, Sec 20.2.2 "Name Objects Encoding".
+ *
+ * XXX: names of multiple segments (e.g. X.Y.Z) are not supported
+ */
+static int build_prepend_name(uint8_t *buf, const char *name)
+{
+    uint8_t *p = buf;
+    const char *s = name;
+    uint32_t len, name_len;
+
+    while ( *s == '\\' || *s == '^' )
+    {
+        if ( build_prepend_byte(p, (uint8_t) *s) )
+            return -1;
+        ++p;
+        ++s;
+    }
+
+    if ( !*s )
+        return build_prepend_byte(p, 0x00);
+
+    len = p - alloc.buf;
+    len = alloc.used - len;
+    name_len = strlen(s);
+    ASSERT(name_len <= ACPI_NAMESEG_LEN);
+
+    if ( !aml_buf_alloc(ACPI_NAMESEG_LEN) )
+        return -1;
+    if ( len )
+        memmove(p + ACPI_NAMESEG_LEN, p, len);
+    memcpy(p, s, name_len);
+    memcpy(p + name_len, "____", ACPI_NAMESEG_LEN - name_len);
+
+    return 0;
+}
+
+enum {
+    PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
+    PACKAGE_LENGTH_2BYTE_SHIFT = 4,
+    PACKAGE_LENGTH_3BYTE_SHIFT = 12,
+    PACKAGE_LENGTH_4BYTE_SHIFT = 20,
+};
+
+/*
+ * On success, an object in the following form is stored at @pkg.
+ *   AML encoding of package length @length
+ *   the original content in @pkg
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.4 "Package Length Encoding".
+ */
+static int build_prepend_package_length(uint8_t *pkg, uint32_t length)
+{
+    int rc = 0;
+    uint8_t byte;
+    unsigned length_bytes;
+
+    if ( length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT) )
+        length_bytes = 1;
+    else if ( length + 2 < (1 << PACKAGE_LENGTH_3BYTE_SHIFT) )
+        length_bytes = 2;
+    else if ( length + 3 < (1 << PACKAGE_LENGTH_4BYTE_SHIFT) )
+        length_bytes = 3;
+    else
+        length_bytes = 4;
+
+    length += length_bytes;
+
+    switch ( length_bytes )
+    {
+    case 1:
+        byte = length;
+        return build_prepend_byte(pkg, byte);
+
+    case 4:
+        byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
+        if ( build_prepend_byte(pkg, byte) )
+            break;
+        length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
+        /* fall through */
+    case 3:
+        byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
+        if ( build_prepend_byte(pkg, byte) )
+            break;
+        length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
+        /* fall through */
+    case 2:
+        byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
+        if ( build_prepend_byte(pkg, byte) )
+            break;
+        length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
+        /* fall through */
+    }
+
+    if ( !rc )
+    {
+        /*
+         * Most significant two bits of byte zero indicate how many
+         * following bytes are in PkgLength encoding.
+         */
+        byte = ((length_bytes - 1) << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
+        rc = build_prepend_byte(pkg, byte);
+    }
+
+    return rc;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   @op
+ *   AML encoding of package length of @buf
+ *   original content in @buf
+ *
+ * Refer to comments of callers for ACPI spec sections.
+ */
+static int build_prepend_package(uint8_t *buf, uint8_t op)
+{
+    uint32_t length = get_package_length(buf);
+
+    if ( !build_prepend_package_length(buf, length) )
+        return build_prepend_byte(buf, op);
+    else
+        return -1;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   AML_OP_EXT
+ *   @op
+ *   AML encoding of package length of @buf
+ *   original content in @buf
+ *
+ * Refer to comments of callers for ACPI spec sections.
+ */
+static int build_prepend_ext_package(uint8_t *buf, uint8_t op)
+{
+    if ( !build_prepend_package(buf, op) )
+        return build_prepend_byte(buf, AML_OP_EXT);
+    else
+        return -1;
+}
+
+void *aml_build_begin(struct acpi_ctxt *ctxt)
+{
+    uint32_t align = ctxt->min_alloc_byte_align;
+
+    alloc.ctxt = ctxt;
+    alloc.buf = ctxt->mem_ops.alloc(ctxt, align, align);
+    alloc.capacity = align;
+    alloc.used = 0;
+
+    return alloc.buf;
+}
+
+uint32_t aml_build_end(void)
+{
+    return alloc.used;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   the first @length bytes in @blob
+ *   the original content in @buf
+ */
+int aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t blob_length)
+{
+    uint32_t len;
+
+    ASSERT(buf >= alloc.buf);
+    len = buf - alloc.buf;
+    ASSERT(alloc.used >= len);
+    len = alloc.used - len;
+
+    if ( !aml_buf_alloc(blob_length) )
+        return -1;
+    if ( len )
+        memmove(buf + blob_length, buf, len);
+
+    memcpy(buf, blob, blob_length);
+
+    return 0;
+}
+
+/*
+ * On success, an object decoded as below is stored at @buf.
+ *   Device (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.2 "Named Objects Encoding" -
+ * "DefDevice".
+ */
+int aml_prepend_device(uint8_t *buf, const char *name)
+{
+    if ( !build_prepend_name(buf, name) )
+        return build_prepend_ext_package(buf, AML_OP_DEVICE);
+    else
+        return -1;
+}
+
+/*
+ * On success, an object decoded as below is stored at @buf.
+ *   Scope (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.1 "Namespace Modifier Objects
+ * Encoding" - "DefScope".
+ */
+int aml_prepend_scope(uint8_t *buf, const char *name)
+{
+    if ( !build_prepend_name(buf, name) )
+        return build_prepend_package(buf, AML_OP_SCOPE);
+    else
+        return -1;
+}
diff --git a/tools/libacpi/aml_build.h b/tools/libacpi/aml_build.h
new file mode 100644
index 0000000000..30acc0f7a1
--- /dev/null
+++ b/tools/libacpi/aml_build.h
@@ -0,0 +1,116 @@
+/*
+ * tools/libacpi/aml_build.h
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _AML_BUILD_H_
+#define _AML_BUILD_H_
+
+#include <stdint.h>
+#include "libacpi.h"
+
+/*
+ * NB: All aml_prepend_* calls, which build AML code in one ACPI
+ *     table, should be placed between a pair of calls to
+ *     aml_build_begin() and aml_build_end(). Nested aml_build_begin()
+ *     and aml_build_end() are not supported.
+ *
+ * NB: If a call to aml_prepend_*() fails, the AML builder buffer
+ *     will be in an inconsistent state, and any following calls to
+ *     aml_prepend_*() will result in undefined behavior.
+ */
+
+/**
+ * Reset the AML builder and begin a new round of building.
+ *
+ * Parameters:
+ *   ctxt: ACPI context used by the AML builder
+ *
+ * Returns:
+ *   a pointer to the builder buffer where the AML code will be stored
+ */
+void *aml_build_begin(struct acpi_ctxt *ctxt);
+
+/**
+ * Mark the end of a round of AML building.
+ *
+ * Returns:
+ *  the number of bytes in the builder buffer built in this round
+ */
+uint32_t aml_build_end(void);
+
+/**
+ * Prepend a blob, which can contain arbitrary content, to the builder buffer.
+ *
+ * On success, an object in the following form is stored at @buf.
+ *   the first @length bytes in @blob
+ *   the original content in @buf
+ *
+ * Parameters:
+ *   buf:    pointer to the builder buffer
+ *   blob:   pointer to the blob
+ *   length: the number of bytes in the blob
+ *
+ * Return:
+ *   0 on success, -1 on failure.
+ */
+int aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t length);
+
+/**
+ * Prepend an AML device structure to the builder buffer. The existing
+ * data in the builder buffer is included in the AML device.
+ *
+ * On success, an object decoded as below is stored at @buf.
+ *   Device (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.2 "Named Objects Encoding" -
+ * "DefDevice".
+ *
+ * Parameters:
+ *   buf:  pointer to the builder buffer
+ *   name: the name of the device
+ *
+ * Return:
+ *   0 on success, -1 on failure.
+ */
+int aml_prepend_device(uint8_t *buf, const char *name);
+
+/**
+ * Prepend an AML scope structure to the builder buffer. The existing
+ * data in the builder buffer is included in the AML scope.
+ *
+ * On success, an object decoded as below is stored at @buf.
+ *   Scope (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.1 "Namespace Modifier Objects
+ * Encoding" - "DefScope".
+ *
+ * Parameters:
+ *   buf:  pointer to the builder buffer
+ *   name: the name of the scope
+ *
+ * Return:
+ *   0 on success, -1 on failure.
+ */
+int aml_prepend_scope(uint8_t *buf, const char *name);
+
+#endif /* _AML_BUILD_H_ */
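
For illustration, here is a minimal usage sketch of the API above (not
part of the patch): it wraps a caller-supplied AML blob in
Scope (\_SB) { Device (NAME) { <blob> } }, which mirrors how a later
patch in this series consumes the builder.  @ctxt, @name (a single
ACPI name segment), @blob and @blob_len are assumed to come from the
caller; error handling follows the NB note in aml_build.h.

  #include "aml_build.h"

  static int wrap_nsdev(struct acpi_ctxt *ctxt, const char *name,
                        const void *blob, uint32_t blob_len,
                        uint32_t *out_len)
  {
      uint8_t *buf = aml_build_begin(ctxt);

      if ( !buf )
          return -1;

      if ( aml_prepend_blob(buf, blob, blob_len) ||
           aml_prepend_device(buf, name) ||
           aml_prepend_scope(buf, "\\_SB") )
          return -1;  /* builder buffer is now inconsistent, see NB above */

      *out_len = aml_build_end();
      return 0;
  }
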
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index aee0a4c374..791c9ad05e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -77,11 +77,12 @@ endif
 
 ACPI_PATH  = $(XEN_ROOT)/tools/libacpi
 DSDT_FILES-$(CONFIG_X86) = dsdt_pvh.c
-ACPI_OBJS  = $(patsubst %.c,%.o,$(DSDT_FILES-y)) build.o static_tables.o
+ACPI_OBJS  = $(patsubst %.c,%.o,$(DSDT_FILES-y)) build.o static_tables.o aml_build.o
 $(DSDT_FILES-y): acpi
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libxl_x86_acpi.h\"
 vpath build.c $(ACPI_PATH)/
 vpath static_tables.c $(ACPI_PATH)/
+vpath aml_build.c $(ACPI_PATH)/
 LIBXL_OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
 
 .PHONY: acpi
-- 
2.14.1



* [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (32 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model Haozhong Zhang
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Jan Beulich, Chao Peng,
	Dan Williams

Some guest ACPI tables and namespace devices are constructed by Xen
and should not be loaded from the device model. This commit adds their
table signatures and device names to two blacklists, which will be
used to check for collisions between the guest ACPI constructed by Xen
and the guest ACPI passed from the device model.
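
As an aside, a standalone sketch (not part of the patch) of the
first-free-slot behaviour both blacklists rely on; the entry type and
array size below are illustrative:

  #include <stdint.h>

  #define NR_ENTRIES 64

  static uint64_t sig_blacklist[NR_ENTRIES]; /* zero-initialised => empty */

  /* Return 0 on success, -1 if the blacklist is already full. */
  static int blacklist_sig(uint64_t sig)
  {
      unsigned int i;

      for ( i = 0; i < NR_ENTRIES; i++ )
      {
          if ( sig_blacklist[i] == sig )     /* already blacklisted */
              return 0;
          if ( !sig_blacklist[i] )           /* first free slot */
          {
              sig_blacklist[i] = sig;
              return 0;
          }
      }

      return -1;   /* full: the patch then clears ACPI_HAS_DM */
  }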

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libacpi/build.c   | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h |  5 +++
 2 files changed, 98 insertions(+)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9604..493ca48025 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -56,6 +56,76 @@ struct acpi_info {
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+/* ACPI tables of following signatures should not appear in DM ACPI */
+static uint64_t dm_acpi_signature_blacklist[64];
+/* ACPI namespace devices of following names should not appear in DM ACPI */
+static const char *dm_acpi_devname_blacklist[64];
+
+static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig)
+{
+    unsigned int i, nr = ARRAY_SIZE(dm_acpi_signature_blacklist);
+
+    if ( !(config->table_flags & ACPI_HAS_DM) )
+        return 0;
+
+    for ( i = 0; i < nr; i++ )
+    {
+        uint64_t entry = dm_acpi_signature_blacklist[i];
+
+        if ( entry == sig )
+            return 0;
+        else if ( entry == 0 )
+            break;
+    }
+
+    if ( i >= nr )
+    {
+        config->table_flags &= ~ACPI_HAS_DM;
+
+        printf("ERROR: DM ACPI signature blacklist is full (size %u), "
+               "disable DM ACPI\n", nr);
+
+        return -ENOSPC;
+    }
+
+    dm_acpi_signature_blacklist[i] = sig;
+
+    return 0;
+}
+
+static int dm_acpi_blacklist_devname(struct acpi_config *config,
+                                     const char *devname)
+{
+    unsigned int i, nr = ARRAY_SIZE(dm_acpi_devname_blacklist);
+
+    if ( !(config->table_flags & ACPI_HAS_DM) )
+        return 0;
+
+    for ( i = 0; i < nr; i++ )
+    {
+        const char *entry = dm_acpi_devname_blacklist[i];
+
+        if ( !entry )
+            break;
+        if ( !strncmp(entry, devname, 4) )
+            return 0;
+    }
+
+    if ( i >= nr )
+    {
+        config->table_flags &= ~ACPI_HAS_DM;
+
+        printf("ERROR: DM ACPI devname blacklist is full (size %u), "
+               "disable loading DM ACPI\n", nr);
+
+        return -ENOSPC;
+    }
+
+    dm_acpi_devname_blacklist[i] = devname;
+
+    return 0;
+}
+
 static void set_checksum(
     void *table, uint32_t checksum_offset, uint32_t length)
 {
@@ -360,6 +430,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         madt = construct_madt(ctxt, config, info);
         if (!madt) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, madt);
+        dm_acpi_blacklist_signature(config, madt->header.signature);
     }
 
     /* HPET. */
@@ -368,6 +439,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         hpet = construct_hpet(ctxt, config);
         if (!hpet) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, hpet);
+        dm_acpi_blacklist_signature(config, hpet->header.signature);
     }
 
     /* WAET. */
@@ -377,6 +449,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         if ( !waet )
             return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, waet);
+        dm_acpi_blacklist_signature(config, waet->header.signature);
     }
 
     if ( config->table_flags & ACPI_HAS_SSDT_PM )
@@ -385,6 +458,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         if (!ssdt) return -1;
         memcpy(ssdt, ssdt_pm, sizeof(ssdt_pm));
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
+        dm_acpi_blacklist_devname(config, "AC");
+        dm_acpi_blacklist_devname(config, "BAT0");
+        dm_acpi_blacklist_devname(config, "BAT1");
     }
 
     if ( config->table_flags & ACPI_HAS_SSDT_S3 )
@@ -450,6 +526,8 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
                          offsetof(struct acpi_header, checksum),
                          tcpa->header.length);
         }
+        dm_acpi_blacklist_signature(config, tcpa->header.signature);
+        dm_acpi_blacklist_devname(config, "TPM");
     }
 
     /* SRAT and SLIT */
@@ -459,11 +537,17 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         struct acpi_20_slit *slit = construct_slit(ctxt, config);
 
         if ( srat )
+        {
             table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, srat);
+            dm_acpi_blacklist_signature(config, srat->header.signature);
+        }
         else
             printf("Failed to build SRAT, skipping...\n");
         if ( slit )
+        {
             table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, slit);
+            dm_acpi_blacklist_signature(config, slit->header.signature);
+        }
         else
             printf("Failed to build SLIT, skipping...\n");
     }
@@ -543,6 +627,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     facs = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_facs), 16);
     if (!facs) goto oom;
     memcpy(facs, &Facs, sizeof(struct acpi_20_facs));
+    dm_acpi_blacklist_signature(config, facs->signature);
 
     /*
      * Alternative DSDTs we get linked against. A cover-all DSDT for up to the
@@ -564,6 +649,9 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         if (!dsdt) goto oom;
         memcpy(dsdt, config->dsdt_anycpu, config->dsdt_anycpu_len);
     }
+    dm_acpi_blacklist_signature(config, ((struct acpi_header *)dsdt)->signature);
+    dm_acpi_blacklist_devname(config, "MEM0");
+    dm_acpi_blacklist_devname(config, "PCI0");
 
     /*
      * N.B. ACPI 1.0 operating systems may not handle FADT with revision 2
@@ -583,6 +671,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(fadt_10,
                  offsetof(struct acpi_header, checksum),
                  sizeof(struct acpi_10_fadt));
+    dm_acpi_blacklist_signature(config, fadt_10->header.signature);
 
     switch ( config->acpi_revision )
     {
@@ -634,6 +723,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         fadt->iapc_boot_arch |= ACPI_FADT_NO_CMOS_RTC;
     }
     set_checksum(fadt, offsetof(struct acpi_header, checksum), fadt_size);
+    dm_acpi_blacklist_signature(config, fadt->header.signature);
 
     nr_secondaries = construct_secondary_tables(ctxt, secondary_tables,
                  config, acpi_info);
@@ -652,6 +742,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(xsdt,
                  offsetof(struct acpi_header, checksum),
                  xsdt->header.length);
+    dm_acpi_blacklist_signature(config, xsdt->header.signature);
 
     rsdt = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_rsdt) +
                                sizeof(uint32_t) * nr_secondaries,
@@ -665,6 +756,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(rsdt,
                  offsetof(struct acpi_header, checksum),
                  rsdt->header.length);
+    dm_acpi_blacklist_signature(config, rsdt->header.signature);
 
     /*
      * Fill in low-memory data structures: acpi_info and RSDP.
@@ -680,6 +772,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(rsdp,
                  offsetof(struct acpi_20_rsdp, extended_checksum),
                  sizeof(struct acpi_20_rsdp));
+    dm_acpi_blacklist_signature(config, rsdp->signature);
 
     if ( !new_vm_gid(ctxt, config, acpi_info) )
         goto oom;
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index ab86a35509..87f311bfab 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -36,6 +36,11 @@
 #define ACPI_HAS_8042              (1<<13)
 #define ACPI_HAS_CMOS_RTC          (1<<14)
 #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
+#define ACPI_HAS_DM                (1<<16)
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+#endif
 
 struct xen_vmemrange;
 struct acpi_numa {
-- 
2.14.1



* [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (33 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

ACPI tables built by the device model are loaded after the ACPI tables
built by Xen, provided their signatures do not conflict with the
tables built by Xen (SSDT excepted).

ACPI namespace devices built by the device model, whose names do not
conflict with devices built by Xen, are assembled into SSDTs that are
placed after the ACPI tables built by Xen.
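
For illustration only (the authoritative description is the comment
added to tools/libacpi/build.c below), the device model is expected to
publish one XenStore directory per blob, e.g.:

  HVM_XS_DM_ACPI_ROOT/<dir>/type    DM_ACPI_BLOB_TYPE_TABLE or _NSDEV
  HVM_XS_DM_ACPI_ROOT/<dir>/length  number of bytes in the blob
  HVM_XS_DM_ACPI_ROOT/<dir>/offset  offset of the blob from the start
                                    of the buffer at config->dm.addr

For a namespace device, <dir> is also used as the device name.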

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  15 +++
 tools/libacpi/acpi2_0.h         |   2 +
 tools/libacpi/build.c           | 237 ++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h         |   5 +
 4 files changed, 259 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 5b8a4ee9d0..0468fea490 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -1019,6 +1019,21 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1)  )
         config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE;
 
+    s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL);
+    if ( s )
+    {
+        config->dm.addr = strtoll(s, NULL, 0);
+
+        s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL);
+        if ( s )
+        {
+            config->dm.length = strtoll(s, NULL, 0);
+            config->table_flags |= ACPI_HAS_DM;
+        }
+        else
+            config->dm.addr = 0;
+    }
+
     config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC |
                             ACPI_HAS_WAET | ACPI_HAS_PMTIMER |
                             ACPI_HAS_BUTTONS | ACPI_HAS_VGA |
diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba32db..365825e6bc 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -435,6 +435,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
 
 /*
  * Table revision numbers.
@@ -449,6 +450,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_2_0_SSDT_REVISION 0x02
 
 #pragma pack ()
 
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 493ca48025..8ec1dfda5f 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -15,6 +15,7 @@
 
 #include LIBACPI_STDUTILS
 #include "acpi2_0.h"
+#include "aml_build.h"
 #include "libacpi.h"
 #include "ssdt_s3.h"
 #include "ssdt_s4.h"
@@ -56,6 +57,9 @@ struct acpi_info {
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of an ACPI namespace device */
+
 /* ACPI tables of following signatures should not appear in DM ACPI */
 static uint64_t dm_acpi_signature_blacklist[64];
 /* ACPI namespace devices of following names should not appear in DM ACPI */
@@ -141,6 +145,233 @@ static void set_checksum(
     p[checksum_offset] = -sum;
 }
 
+static bool has_dm_tables(struct acpi_ctxt *ctxt,
+                          const struct acpi_config *config)
+{
+    char **dir;
+    unsigned int num;
+
+    if ( !(config->table_flags & ACPI_HAS_DM) || !config->dm.addr )
+        return false;
+
+    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
+    if ( !dir || !num )
+        return false;
+
+    return true;
+}
+
+/* Return true if no collision is found. */
+static bool check_signature_collision(uint64_t sig)
+{
+    unsigned int i;
+    for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ )
+    {
+        if ( sig == dm_acpi_signature_blacklist[i] )
+            return false;
+    }
+    return true;
+}
+
+/* Return true if no collision is found. */
+static bool check_devname_collision(const char *name)
+{
+    unsigned int i;
+    for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ )
+    {
+        if ( !dm_acpi_devname_blacklist[i] )
+            break;
+        if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) )
+            return false;
+    }
+    return true;
+}
+
+static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt,
+                                            const char *name, const char *key)
+{
+/*
+ * @name is supposed to be 4 characters at most, and the longest @key
+ * so far is 'address' (7), so 30 characters is enough to hold the
+ * longest path HVM_XS_DM_ACPI_ROOT/name/key.
+ */
+#define DM_ACPI_BLOB_PATH_MAX_LENGTH   30
+    char path[DM_ACPI_BLOB_PATH_MAX_LENGTH];
+    snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             name, key);
+    return ctxt->xs_ops.read(ctxt, path);
+}
+
+static bool construct_dm_table(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               unsigned int nr_tables,
+                               const void *blob, uint32_t length)
+{
+    const struct acpi_header *header = blob;
+    uint8_t *buffer;
+
+    if ( !check_signature_collision(header->signature) )
+        return false;
+
+    if ( header->length > length || header->length == 0 )
+        return false;
+
+    buffer = ctxt->mem_ops.alloc(ctxt, header->length, 16);
+    if ( !buffer )
+        return false;
+    memcpy(buffer, header, header->length);
+
+    /* some device models (e.g. QEMU) do not set the checksum */
+    set_checksum(buffer, offsetof(struct acpi_header, checksum),
+                 header->length);
+
+    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
+
+    return true;
+}
+
+static bool construct_dm_nsdev(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               unsigned int nr_tables,
+                               const char *dev_name,
+                               const void *blob, uint32_t blob_length)
+{
+    struct acpi_header ssdt, *header;
+    uint8_t *buffer;
+    int rc;
+
+    if ( !check_devname_collision(dev_name) )
+        return false;
+
+#define AML_BUILD(STMT)           \
+    do {                          \
+        rc = STMT;                \
+        if ( rc )                 \
+            goto out;             \
+    } while (0)
+
+    /* build the ACPI namespace device from [name, blob] */
+    buffer = aml_build_begin(ctxt);
+    if ( !buffer )
+        return false;
+
+    AML_BUILD(aml_prepend_blob(buffer, blob, blob_length));
+    AML_BUILD(aml_prepend_device(buffer, dev_name));
+    AML_BUILD((aml_prepend_scope(buffer, "\\_SB")));
+
+    /* build SSDT header */
+    ssdt.signature = ACPI_2_0_SSDT_SIGNATURE;
+    ssdt.revision = ACPI_2_0_SSDT_REVISION;
+    fixed_strcpy(ssdt.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(ssdt.oem_table_id, ACPI_OEM_TABLE_ID);
+    ssdt.oem_revision = ACPI_OEM_REVISION;
+    ssdt.creator_id = ACPI_CREATOR_ID;
+    ssdt.creator_revision = ACPI_CREATOR_REVISION;
+
+    /* prepend SSDT header to ACPI namespace device */
+    AML_BUILD(aml_prepend_blob(buffer, &ssdt, sizeof(ssdt)));
+
+out:
+    header = (struct acpi_header *) buffer;
+    header->length = aml_build_end();
+
+    if ( rc )
+        return false;
+
+    /* calculate checksum of SSDT */
+    set_checksum(header, offsetof(struct acpi_header, checksum),
+                 header->length);
+
+    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
+
+    return true;
+}
+
+/*
+ * All ACPI content built by the device model is placed in the guest
+ * buffer whose address and size are specified by config->dm.{addr, length},
+ * or XenStore keys HVM_XS_DM_ACPI_{ADDRESS, LENGTH}.
+ *
+ * The data layout within the buffer is further specified by XenStore
+ * directories under HVM_XS_DM_ACPI_ROOT. Each directory specifies a
+ * data blob and contains following XenStore keys:
+ *
+ * - "type":
+ *   * DM_ACPI_BLOB_TYPE_TABLE
+ *     The data blob specified by this directory is an ACPI table.
+ *   * DM_ACPI_BLOB_TYPE_NSDEV
+ *     The data blob specified by this directory is an ACPI namespace device.
+ *     Its name is specified by the directory name, while the AML code of the
+ *     body of the AML device structure is in the data blob.
+ *
+ * - "length": the number of bytes in this data blob.
+ *
+ * - "offset": the offset in bytes of this data blob from the beginning of buffer
+ */
+static int construct_dm_tables(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               unsigned int nr_tables,
+                               struct acpi_config *config)
+{
+    const char *s;
+    char **dir;
+    uint8_t type;
+    void *blob;
+    unsigned int num, length, offset, i, nr_added = 0;
+
+    if ( !config->dm.addr )
+        return 0;
+
+    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
+    if ( !dir || !num )
+        return 0;
+
+    if ( num > ACPI_MAX_SECONDARY_TABLES - nr_tables )
+        return 0;
+
+    for ( i = 0; i < num; i++, dir++ )
+    {
+        if ( *dir == NULL )
+            continue;
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "type");
+        if ( !s )
+            continue;
+        type = (uint8_t)strtoll(s, NULL, 0);
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "length");
+        if ( !s )
+            continue;
+        length = (uint32_t)strtoll(s, NULL, 0);
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "offset");
+        if ( !s )
+            continue;
+        offset = (uint32_t)strtoll(s, NULL, 0);
+
+        blob = ctxt->mem_ops.p2v(ctxt, config->dm.addr + offset);
+
+        switch ( type )
+        {
+        case DM_ACPI_BLOB_TYPE_TABLE:
+            nr_added += construct_dm_table(ctxt,
+                                           table_ptrs, nr_tables + nr_added,
+                                           blob, length);
+            break;
+
+        case DM_ACPI_BLOB_TYPE_NSDEV:
+            nr_added += construct_dm_nsdev(ctxt,
+                                           table_ptrs, nr_tables + nr_added,
+                                           *dir, blob, length);
+            break;
+
+        default:
+            /* skip blobs of unknown types */
+            continue;
+        }
+    }
+
+    return nr_added;
+}
+
 static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
                                            const struct acpi_config *config,
                                            struct acpi_info *info)
@@ -556,6 +787,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
                                               nr_tables, config);
 
+    /* Load ACPI passed from device model (e.g. NFIT from QEMU). */
+    nr_tables += construct_dm_tables(ctxt, table_ptrs, nr_tables, config);
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
@@ -620,6 +854,9 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         acpi_info->pci_hi_len = config->pci_hi_len;
     }
 
+    if ( !has_dm_tables(ctxt, config) )
+        config->table_flags &= ~ACPI_HAS_DM;
+
     /*
      * Fill in high-memory data structures, starting at @buf.
      */
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 87f311bfab..cd134ff2cf 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -93,6 +93,11 @@ struct acpi_config {
         uint32_t length;
     } pt;
 
+    struct {
+        uint32_t addr;
+        uint32_t length;
+    } dm;
+
     struct acpi_numa numa;
     const struct hvm_info_table *hvminfo;
 
-- 
2.14.1



* [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (34 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

A new xl domain configuration option
   vnvdimms = [ 'type=mfn, backend=START_PMEM_MFN, nr_pages=N', ... ]

is added to specify virtual NVDIMM devices backed by the given host
PMEM pages. As the kernel PMEM driver currently does not work in Dom0,
the backend has to be specified by MFNs.
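
An illustrative example (the MFN and page count are made up): one
1 GiB vNVDIMM backed by host PMEM starting at MFN 0x4c0000, i.e. host
physical address 0x4c0000000:

  vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x40000' ]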

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 docs/man/xl.cfg.pod.5.in    |  33 +++++++++++++
 tools/libxl/Makefile        |   2 +-
 tools/libxl/libxl.h         |   5 ++
 tools/libxl/libxl_types.idl |  15 ++++++
 tools/libxl/libxl_vnvdimm.c |  49 ++++++++++++++++++++
 tools/xl/xl_parse.c         | 110 +++++++++++++++++++++++++++++++++++++++++++-
 tools/xl/xl_vmcontrol.c     |  15 +++++-
 7 files changed, 226 insertions(+), 3 deletions(-)
 create mode 100644 tools/libxl/libxl_vnvdimm.c

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 79cb2eaea7..092b051561 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1116,6 +1116,39 @@ FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<vnvdimms=[ 'VNVDIMM_SPEC', 'VNVDIMM_SPEC', ... ]>
+
+Specifies the virtual NVDIMM devices which are provided to the guest.
+
+Each B<VNVDIMM_SPEC> is a comma-separated list of C<KEY=VALUE> settings
+from the following list:
+
+=over 4
+
+=item B<type=TYPE>
+
+Specifies the type of the host backend of the virtual NVDIMM device. The
+following is a list of supported types:
+
+=over 4
+
+=item B<mfn>
+
+Backs the virtual NVDIMM device with a contiguous host PMEM region.
+
+=back
+
+=item B<backend=BACKEND>
+
+Specifies the host backend of the virtual NVDIMM device. If C<type=mfn>,
+then B<BACKEND> specifies the start MFN of the host PMEM region.
+
+=item B<nr_pages=NUMBER>
+
+Specifies the number of pages of the host backend.
+
+=back
+
 =back
 
 =head2 Paravirtualised (PV) Guest Specific Options
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 791c9ad05e..b4c2ccb7ff 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -139,7 +139,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom_suspend.o libxl_dom_save.o libxl_usb.o \
 			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
 			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
-			libxl_9pfs.o libxl_domain.o \
+			libxl_9pfs.o libxl_domain.o libxl_vnvdimm.o \
                         $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 91408b47b5..8156c08ed3 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1474,6 +1474,11 @@ int libxl_get_memory_target_0x040700(libxl_ctx *ctx, uint32_t domid,
                                      uint32_t *out_target)
     LIBXL_EXTERNAL_CALLERS_ONLY;
 
+int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
+                              libxl_domain_config *dst,
+                              const libxl_domain_config *src)
+                              LIBXL_EXTERNAL_CALLERS_ONLY;
+
 /*
  * WARNING
  * This memory management API is unstable even in Xen 4.2.
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4acc0457f4..ad236de34a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -240,6 +240,10 @@ libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
     (2, "COLO"),
     ])
 
+libxl_vnvdimm_backend_type = Enumeration("vnvdimm_backend_type", [
+    (0, "mfn"),
+    ])
+
 #
 # Complex libxl types
 #
@@ -780,6 +784,16 @@ libxl_device_channel = Struct("device_channel", [
            ])),
 ])
 
+libxl_device_vnvdimm = Struct("device_vnvdimm", [
+    ("backend_domid",   libxl_domid),
+    ("backend_domname", string),
+    ("devid",           libxl_devid),
+    ("nr_pages",        uint64),
+    ("u", KeyedUnion(None, libxl_vnvdimm_backend_type, "backend_type",
+            [("mfn", uint64),
+            ])),
+])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -798,6 +812,7 @@ libxl_domain_config = Struct("domain_config", [
     ("channels", Array(libxl_device_channel, "num_channels")),
     ("usbctrls", Array(libxl_device_usbctrl, "num_usbctrls")),
     ("usbdevs", Array(libxl_device_usbdev, "num_usbdevs")),
+    ("vnvdimms", Array(libxl_device_vnvdimm, "num_vnvdimms")),
 
     ("on_poweroff", libxl_action_on_shutdown),
     ("on_reboot", libxl_action_on_shutdown),
diff --git a/tools/libxl/libxl_vnvdimm.c b/tools/libxl/libxl_vnvdimm.c
new file mode 100644
index 0000000000..4de8f04303
--- /dev/null
+++ b/tools/libxl/libxl_vnvdimm.c
@@ -0,0 +1,49 @@
+/*
+ * tools/libxl/libxl_vnvdimm.c
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xenctrl.h>
+
+#include "libxl_internal.h"
+
+int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
+                              libxl_domain_config *dst,
+                              const libxl_domain_config *src)
+{
+    GC_INIT(ctx);
+    unsigned int nr = src->num_vnvdimms;
+    libxl_device_vnvdimm *vnvdimms;
+    int rc = 0;
+
+    if (!nr)
+        goto out;
+
+    vnvdimms = libxl__calloc(NOGC, nr, sizeof(*vnvdimms));
+    if (!vnvdimms) {
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+
+    dst->num_vnvdimms = nr;
+    while (nr--)
+        libxl_device_vnvdimm_copy(ctx, &vnvdimms[nr], &src->vnvdimms[nr]);
+    dst->vnvdimms = vnvdimms;
+
+ out:
+    GC_FREE;
+    return rc;
+}
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index ed562a1956..388a135dbf 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -804,13 +804,111 @@ int parse_usbdev_config(libxl_device_usbdev *usbdev, char *token)
     return 0;
 }
 
+static int parse_vnvdimm_config(libxl_device_vnvdimm *vnvdimm, char *token)
+{
+    char *oparg, *endptr;
+    unsigned long val;
+
+    if (MATCH_OPTION("type", token, oparg)) {
+        if (libxl_vnvdimm_backend_type_from_string(oparg,
+                                                   &vnvdimm->backend_type)) {
+            fprintf(stderr,
+                    "ERROR: invalid vNVDIMM backend type '%s'\n",
+                    oparg);
+            return 1;
+        }
+    } else if (MATCH_OPTION("nr_pages", token, oparg)) {
+        val = strtoul(oparg, &endptr, 0);
+        if (endptr == oparg || val == ULONG_MAX)
+        {
+            fprintf(stderr,
+                    "ERROR: invalid number of vNVDIMM backend pages '%s'\n",
+                    oparg);
+            return 1;
+        }
+        vnvdimm->nr_pages = val;
+    } else if (MATCH_OPTION("backend", token, oparg)) {
+        /* Skip: handled by parse_vnvdimms() */
+    } else {
+        fprintf(stderr, "ERROR: unknown string '%s' in vnvdimm spec\n", token);
+        return 1;
+    }
+
+    return 0;
+}
+
+/*
+ * vnvdimms = [ 'type=<mfn>, backend=<base_mfn>, nr_pages=<N>', ... ]
+ */
+static void parse_vnvdimms(XLU_Config *config, libxl_domain_config *d_config)
+{
+    XLU_ConfigList *vnvdimms;
+    const char *buf;
+    int rc;
+
+    rc = xlu_cfg_get_list(config, "vnvdimms", &vnvdimms, 0, 0);
+    if ( rc )
+        return;
+
+#if !defined(__linux__)
+    fprintf(stderr, "ERROR: 'vnvdimms' is only supported on Linux\n");
+    exit(-ERROR_FAIL);
+#endif
+
+    d_config->num_vnvdimms = 0;
+    d_config->vnvdimms = NULL;
+
+    while ((buf = xlu_cfg_get_listitem(vnvdimms,
+                                       d_config->num_vnvdimms)) != NULL) {
+        libxl_device_vnvdimm *vnvdimm =
+            ARRAY_EXTEND_INIT(d_config->vnvdimms, d_config->num_vnvdimms,
+                              libxl_device_vnvdimm_init);
+        char *buf2 = strdup(buf), *backend = NULL, *p, *endptr;
+        unsigned long mfn;
+
+        p = strtok(buf2, ",");
+        if (!p)
+            goto skip_nvdimm;
+
+        do {
+            while (*p == ' ')
+                p++;
+
+            rc = 0;
+            if (!MATCH_OPTION("backend", p, backend))
+                rc = parse_vnvdimm_config(vnvdimm, p);
+            if (rc)
+                exit(-ERROR_FAIL);
+        } while ((p = strtok(NULL, ",")) != NULL);
+
+        switch (vnvdimm->backend_type)
+        {
+        case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+            mfn = strtoul(backend, &endptr, 0);
+            if (endptr == backend || mfn == ULONG_MAX)
+            {
+                fprintf(stderr,
+                        "ERROR: invalid start MFN of host NVDIMM '%s'\n",
+                        backend);
+                exit(-ERROR_FAIL);
+            }
+            vnvdimm->u.mfn = mfn;
+
+            break;
+        }
+
+    skip_nvdimm:
+        free(buf2);
+    }
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
                        libxl_domain_config *d_config)
 {
     const char *buf;
-    long l, vcpus = 0, nr_dm_acpi_pages;
+    long l, vcpus = 0, nr_dm_acpi_pages = 0;
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs, *p9devs;
@@ -1942,6 +2040,16 @@ skip_usbdev:
             exit(-ERROR_FAIL);
         }
         b_info->u.hvm.dm_acpi_pages = nr_dm_acpi_pages;
+
+        /* parse 'vnvdimms' */
+        parse_vnvdimms(config, d_config);
+
+        /*
+         * If 'dm_acpi_pages' is not specified, reserve one DM ACPI
+         * page for vNVDIMM devices.
+         */
+        if (d_config->vnvdimms && !nr_dm_acpi_pages)
+            b_info->u.hvm.dm_acpi_pages = 1;
     }
 
     /* If we've already got vfb=[] for PV guest then ignore top level
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25ded..1bdc173e04 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -381,12 +381,25 @@ static void reload_domain_config(uint32_t domid,
     if (rc) {
         LOG("failed to retrieve guest configuration (rc=%d). "
             "reusing old configuration", rc);
-        libxl_domain_config_dispose(&d_config_new);
+        goto error_out;
     } else {
+        rc = libxl_vnvdimm_copy_config(ctx, &d_config_new, d_config);
+        if (rc) {
+            LOG("failed to copy vnvdimm configuration (rc=%d). "
+                "reusing old configuration", rc);
+            libxl_domain_config_dispose(&d_config_new);
+            goto error_out;
+        }
+
         libxl_domain_config_dispose(d_config);
         /* Steal allocations */
         memcpy(d_config, &d_config_new, sizeof(libxl_domain_config));
     }
+
+    return;
+
+ error_out:
+    libxl_domain_config_dispose(&d_config_new);
 }
 
 /* Can update r_domid if domain is destroyed */
-- 
2.14.1



* [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (35 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If errors that happen during QMP initialization can affect the proper
operation of a domain, it is better to treat them as fatal and abort
the creation of that domain. The existing types of QMP initialization
errors are still treated as non-fatal and, as before, do not abort
domain creation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 tools/libxl/libxl_qmp.c    | 9 ++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9123585b52..3e05ea09e9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1507,7 +1507,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
     if (dcs->sdss.dm.guest_domid) {
         if (d_config->b_info.device_model_version
             == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-            libxl__qmp_initializations(gc, domid, d_config);
+            ret = libxl__qmp_initializations(gc, domid, d_config);
+            if (ret == ERROR_BADFAIL)
+                goto error_out;
         }
     }
 
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index eab993aca9..e1eb47c1d2 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1175,11 +1175,12 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
 {
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     libxl__qmp_handler *qmp = NULL;
-    int ret = 0;
+    bool ignore_error = true;
+    int ret = -1;
 
     qmp = libxl__qmp_initialize(gc, domid);
     if (!qmp)
-        return -1;
+        goto out;
     ret = libxl__qmp_query_serial(qmp);
     if (!ret && vnc && vnc->passwd) {
         ret = qmp_change(gc, qmp, "vnc", "password", vnc->passwd);
@@ -1189,7 +1190,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
         ret = qmp_query_vnc(qmp);
     }
     libxl__qmp_close(qmp);
-    return ret;
+
+ out:
+    return ret ? (ignore_error ? ERROR_FAIL : ERROR_BADFAIL) : 0;
 }
 
 /*
-- 
2.14.1



* [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (36 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The base guest physical address of each vNVDIMM device is decided by
QEMU. Add a QMP callback to get the base address from QEMU and ask the
Xen hypervisor to map the host PMEM pages to that address.
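
For illustration, an abbreviated "query-memory-devices" reply that the
new callback consumes could look like the following; only the fields
read by the callback are shown, and the values are made up:

  { "return": [
      { "data": { "id": "xen_nvdimm1", "slot": 0,
                  "addr": 17179869184, "size": 1073741824 } } ] }

i.e. a 1 GiB vNVDIMM in slot 0 placed at guest physical address
0x400000000, which the callback then maps via
xc_domain_populate_pmem_map().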

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_qmp.c     | 130 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_vnvdimm.c |  30 ++++++++++
 tools/libxl/libxl_vnvdimm.h |  30 ++++++++++
 3 files changed, 190 insertions(+)
 create mode 100644 tools/libxl/libxl_vnvdimm.h

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index e1eb47c1d2..299f9c8260 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -26,6 +26,7 @@
 
 #include "_libxl_list.h"
 #include "libxl_internal.h"
+#include "libxl_vnvdimm.h"
 
 /* #define DEBUG_RECEIVED */
 
@@ -1170,6 +1171,127 @@ int libxl_qemu_monitor_command(libxl_ctx *ctx, uint32_t domid,
     return rc;
 }
 
+#if defined(__linux__)
+
+static int qmp_register_vnvdimm_callback(libxl__qmp_handler *qmp,
+                                         const libxl__json_object *o,
+                                         void *arg)
+{
+    GC_INIT(qmp->ctx);
+    const libxl_domain_config *guest_config = arg;
+    const libxl_device_vnvdimm *vnvdimm;
+    const libxl__json_object *obj, *sub_map, *sub_obj;
+    const char *id, *expected_id;
+    unsigned int i, slot;
+    unsigned long gpa, size, mfn, gpfn, nr_pages;
+    int rc = 0;
+
+    for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
+        if (!libxl__json_object_is_map(obj))
+            continue;
+
+        sub_map = libxl__json_map_get("data", obj, JSON_MAP);
+        if (!sub_map)
+            continue;
+
+        sub_obj = libxl__json_map_get("slot", sub_map, JSON_INTEGER);
+        slot = libxl__json_object_get_integer(sub_obj);
+        if (slot >= guest_config->num_vnvdimms) {
+            LOG(ERROR,
+                "Invalid QEMU memory device slot %u, expecting less than %u",
+                slot, guest_config->num_vnvdimms);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+        vnvdimm = &guest_config->vnvdimms[slot];
+
+        /*
+         * Double check whether it's an NVDIMM memory device, though
+         * all memory devices in QEMU on Xen are for vNVDIMM.
+         */
+        expected_id = libxl__sprintf(gc, "xen_nvdimm%u", slot + 1);
+        if (!expected_id) {
+            LOG(ERROR, "Cannot build device id");
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+        sub_obj = libxl__json_map_get("id", sub_map, JSON_STRING);
+        id = libxl__json_object_get_string(sub_obj);
+        if (!id || strncmp(id, expected_id, strlen(expected_id))) {
+            LOG(ERROR,
+                "Invalid QEMU memory device id %s, expecting %s",
+                id, expected_id);
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+
+        sub_obj = libxl__json_map_get("addr", sub_map, JSON_INTEGER);
+        gpa = libxl__json_object_get_integer(sub_obj);
+        sub_obj = libxl__json_map_get("size", sub_map, JSON_INTEGER);
+        size = libxl__json_object_get_integer(sub_obj);
+        if ((gpa | size) & ~XC_PAGE_MASK) {
+            LOG(ERROR,
+                "Invalid address 0x%lx or size 0x%lx of QEMU memory device %s, "
+                "not aligned to 0x%lx",
+                gpa, size, id, XC_PAGE_SIZE);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+        gpfn = gpa >> XC_PAGE_SHIFT;
+
+        nr_pages = size >> XC_PAGE_SHIFT;
+        if (nr_pages > vnvdimm->nr_pages) {
+            LOG(ERROR,
+                "Invalid size 0x%lx of QEMU memory device %s, "
+                "expecting no larger than 0x%lx",
+                size, id, vnvdimm->nr_pages << XC_PAGE_SHIFT);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+
+        switch (vnvdimm->backend_type) {
+        case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+            mfn = vnvdimm->u.mfn;
+            break;
+
+        default:
+            LOG(ERROR, "Invalid NVDIMM backend type %u", vnvdimm->backend_type);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+
+        rc = libxl_vnvdimm_add_pages(gc, qmp->domid, mfn, gpfn, nr_pages);
+        if (rc) {
+            LOG(ERROR,
+                "Cannot map PMEM pages for QEMU memory device %s, "
+                "mfn 0x%lx, gpfn 0x%lx, nr 0x%lx, rc %d",
+                id, mfn, gpfn, nr_pages, rc);
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+    }
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
+static int libxl__qmp_query_vnvdimms(libxl__qmp_handler *qmp,
+                                     const libxl_domain_config *guest_config)
+{
+    int rc;
+    GC_INIT(qmp->ctx);
+
+    rc = qmp_synchronous_send(qmp, "query-memory-devices", NULL,
+                              qmp_register_vnvdimm_callback,
+                              (void *)guest_config, qmp->timeout);
+
+    GC_FREE;
+    return rc;
+}
+
+#endif /* __linux__ */
+
 int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                const libxl_domain_config *guest_config)
 {
@@ -1189,6 +1311,14 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
     if (!ret) {
         ret = qmp_query_vnc(qmp);
     }
+
+#if defined(__linux__)
+    if (!ret && guest_config->num_vnvdimms) {
+        ignore_error = false;
+        ret = libxl__qmp_query_vnvdimms(qmp, guest_config);
+    }
+#endif /* __linux__ */
+
     libxl__qmp_close(qmp);
 
  out:
diff --git a/tools/libxl/libxl_vnvdimm.c b/tools/libxl/libxl_vnvdimm.c
index 4de8f04303..ff786d4177 100644
--- a/tools/libxl/libxl_vnvdimm.c
+++ b/tools/libxl/libxl_vnvdimm.c
@@ -19,6 +19,7 @@
 #include <xenctrl.h>
 
 #include "libxl_internal.h"
+#include "libxl_vnvdimm.h"
 
 int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
                               libxl_domain_config *dst,
@@ -47,3 +48,32 @@ int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
     GC_FREE;
     return rc;
 }
+
+#if defined(__linux__)
+
+int libxl_vnvdimm_add_pages(libxl__gc *gc, uint32_t domid,
+                            xen_pfn_t mfn, xen_pfn_t gpfn, xen_pfn_t nr_pages)
+{
+    unsigned int nr;
+    int ret = 0;
+
+    while (nr_pages) {
+        nr = min(nr_pages, (unsigned long)UINT_MAX);
+
+        ret = xc_domain_populate_pmem_map(CTX->xch, domid, mfn, gpfn, nr);
+        if (ret && ret != -ERESTART) {
+            LOG(ERROR, "failed to map PMEM pages, mfn 0x%" PRI_xen_pfn ", "
+                "gpfn 0x%" PRI_xen_pfn ", nr_pages %u, err %d",
+                mfn, gpfn, nr, ret);
+            break;
+        }
+
+        nr_pages -= nr;
+        mfn += nr;
+        gpfn += nr;
+    }
+
+    return ret;
+}
+
+#endif /* __linux__ */
diff --git a/tools/libxl/libxl_vnvdimm.h b/tools/libxl/libxl_vnvdimm.h
new file mode 100644
index 0000000000..ec63c95088
--- /dev/null
+++ b/tools/libxl/libxl_vnvdimm.h
@@ -0,0 +1,30 @@
+/*
+ * tools/libxl/libxl_vnvdimm.h
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef LIBXL_VNVDIMM_H
+#define LIBXL_VNVDIMM_H
+
+#include <stdint.h>
+#include "libxl_internal.h"
+
+#if defined(__linux__)
+int libxl_vnvdimm_add_pages(libxl__gc *gc, uint32_t domid,
+                            xen_pfn_t mfn, xen_pfn_t gpfn, xen_pfn_t nr_pages);
+#endif /* __linux__ */
+
+#endif /* !LIBXL_VNVDIMM_H */
-- 
2.14.1



* [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (37 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:41   ` Haozhong Zhang
  2017-10-27  3:26 ` [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Chao Peng
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

For xl configs
  vnvdimms = [ 'type=mfn,backend=$PMEM0_MFN,nr_pages=$N0', ... ]

the following qemu options will be built

  -machine <existing options>,nvdimm
  -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
  -object memory-backend-xen,id=mem1,host-addr=$PMEM0_ADDR,size=$PMEM0_SIZE
  -device nvdimm,id=xen_nvdimm1,memdev=mem1
  ...

in which,
 - NR_SLOTS is the number of entries in vnvdimms + 1,
 - MEM_SIZE is the total size of all RAM and NVDIMM devices,
 - PMEM0_ADDR = PMEM0_MFN * 4096,
 - PMEM0_SIZE = N0 * 4096 (a worked example follows).
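
As a worked example (illustrative numbers), a 4096 MiB guest with

  vnvdimms = [ 'type=mfn, backend=0x480000, nr_pages=0x40000' ]

gives NR_SLOTS = 2, PMEM0_ADDR = 0x480000 * 4096 = 19327352832 and
PMEM0_SIZE = 0x40000 * 4096 = 1073741824 (1 GiB), so roughly:

  -machine <existing options>,nvdimm
  -m 4096,slots=2,maxmem=5368709120
  -object memory-backend-xen,id=mem1,host-addr=19327352832,size=1073741824
  -device nvdimm,id=xen_nvdimm1,memdev=mem1

where maxmem = 4096 MiB + 1 GiB in bytes.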

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_dm.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index e0e6a99e67..9bdb3cdb29 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -910,6 +910,58 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
     return drive;
 }
 
+#if defined(__linux__)
+
+static uint64_t libxl__build_dm_vnvdimm_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *dev, int dev_no)
+{
+    uint64_t addr = 0, size = 0;
+    char *arg;
+
+    switch (dev->backend_type)
+    {
+    case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+        addr = dev->u.mfn << XC_PAGE_SHIFT;
+        size = dev->nr_pages << XC_PAGE_SHIFT;
+        break;
+    }
+
+    if (!size)
+        return 0;
+
+    flexarray_append(dm_args, "-object");
+    arg = GCSPRINTF("memory-backend-xen,id=mem%d,host-addr=%"PRIu64",size=%"PRIu64,
+                    dev_no + 1, addr, size);
+    flexarray_append(dm_args, arg);
+
+    flexarray_append(dm_args, "-device");
+    arg = GCSPRINTF("nvdimm,id=xen_nvdimm%d,memdev=mem%d",
+                    dev_no + 1, dev_no + 1);
+    flexarray_append(dm_args, arg);
+
+    return size;
+}
+
+static uint64_t libxl__build_dm_vnvdimms_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
+{
+    uint64_t total_size = 0, size;
+    unsigned int i;
+
+    for (i = 0; i < num_vnvdimms; i++) {
+        size = libxl__build_dm_vnvdimm_args(gc, dm_args, &vnvdimms[i], i);
+        if (!size)
+            break;
+        total_size += size;
+    }
+
+    return total_size;
+}
+
+#endif /* __linux__ */
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -923,13 +975,18 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     const libxl_device_nic *nics = guest_config->nics;
     const int num_disks = guest_config->num_disks;
     const int num_nics = guest_config->num_nics;
+#if defined(__linux__)
+    const int num_vnvdimms = guest_config->num_vnvdimms;
+#else
+    const int num_vnvdimms = 0;
+#endif
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     const libxl_sdl_info *sdl = dm_sdl(guest_config);
     const char *keymap = dm_keymap(guest_config);
     char *machinearg;
     flexarray_t *dm_args, *dm_envs;
     int i, connection, devid, ret;
-    uint64_t ram_size;
+    uint64_t ram_size, ram_size_in_byte = 0, vnvdimms_size = 0;
     const char *path, *chardev;
     char *user = NULL;
 
@@ -1451,6 +1508,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             }
         }
 
+        if (num_vnvdimms)
+            machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg);
+
         flexarray_append(dm_args, machinearg);
         for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
             flexarray_append(dm_args, b_info->extra_hvm[i]);
@@ -1460,8 +1520,25 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     }
 
     ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
+    if (num_vnvdimms) {
+        ram_size_in_byte = ram_size << 20;
+        vnvdimms_size = libxl__build_dm_vnvdimms_args(gc, dm_args,
+                                                      guest_config->vnvdimms,
+                                                      num_vnvdimms);
+        if (ram_size_in_byte + vnvdimms_size < ram_size_in_byte) {
+            LOG(ERROR,
+                "total size of RAM (%"PRIu64") and NVDIMM (%"PRIu64") overflow",
+                ram_size_in_byte, vnvdimms_size);
+            return ERROR_INVAL;
+        }
+    }
     flexarray_append(dm_args, "-m");
-    flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size));
+    flexarray_append(dm_args,
+                     vnvdimms_size ?
+                     GCSPRINTF("%"PRId64",slots=%d,maxmem=%"PRId64,
+                               ram_size, num_vnvdimms + 1,
+                               ROUNDUP(ram_size_in_byte, 12) + vnvdimms_size) :
+                     GCSPRINTF("%"PRId64, ram_size));
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (b_info->u.hvm.hdtype == LIBXL_HDTYPE_AHCI)
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2017-09-11  4:41   ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() Haozhong Zhang
                     ` (39 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Eduardo Habkost, Igor Mammedov, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson,
	Stefano Stabellini, Anthony Perard

This is the QEMU part of the series; it works with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build the guest NFIT and NVDIMM namespace devices, and to
allocate guest address space for vNVDIMM devices.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3

Patch 1 avoids dereferencing the NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 2 & 3 add a memory backend dedicated to Xen usage and a hotplug
memory region for Xen guests, in order to make the existing nvdimm
device plugging path work on Xen.

Patches 4 - 10 build and copy the NFIT from QEMU to the Xen guest when
QEMU is used as the Xen device model.


Haozhong Zhang (10):
  nvdimm: do not initialize nvdimm->label_data if label size is zero
  hw/xen-hvm: create the hotplug memory region on Xen
  hostmem-xen: add a host memory backend for Xen
  nvdimm acpi: do not use fw_cfg on Xen
  hw/xen-hvm: initialize DM ACPI
  hw/xen-hvm: add function to copy ACPI into guest memory
  nvdimm acpi: copy NFIT to Xen guest
  nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  nvdimm acpi: do not build _FIT method on Xen
  hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled

 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++
 hw/acpi/aml-build.c    |  10 ++-
 hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
 hw/i386/pc.c           | 102 ++++++++++++++-----------
 hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/mem/nvdimm.c        |  10 ++-
 hw/mem/pc-dimm.c       |   6 +-
 include/hw/i386/pc.h   |   1 +
 include/hw/xen/xen.h   |  25 ++++++
 stubs/xen-hvm.c        |  10 +++
 12 files changed, 495 insertions(+), 70 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-09-11  4:41   ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Xiao Guangrong

This is the QEMU part of the series; it works with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build the guest NFIT and NVDIMM namespace devices, and to
allocate guest address space for vNVDIMM devices.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3

Patch 1 avoids dereferencing the NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 2 & 3 add a memory backend dedicated to Xen usage and a hotplug
memory region for Xen guests, in order to make the existing nvdimm
device plugging path work on Xen.

Patches 4 - 10 build and copy the NFIT from QEMU to the Xen guest when
QEMU is used as the Xen device model.


Haozhong Zhang (10):
  nvdimm: do not initialize nvdimm->label_data if label size is zero
  hw/xen-hvm: create the hotplug memory region on Xen
  hostmem-xen: add a host memory backend for Xen
  nvdimm acpi: do not use fw_cfg on Xen
  hw/xen-hvm: initialize DM ACPI
  hw/xen-hvm: add function to copy ACPI into guest memory
  nvdimm acpi: copy NFIT to Xen guest
  nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  nvdimm acpi: do not build _FIT method on Xen
  hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled

 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++
 hw/acpi/aml-build.c    |  10 ++-
 hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
 hw/i386/pc.c           | 102 ++++++++++++++-----------
 hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/mem/nvdimm.c        |  10 ++-
 hw/mem/pc-dimm.c       |   6 +-
 include/hw/i386/pc.h   |   1 +
 include/hw/xen/xen.h   |  25 ++++++
 stubs/xen-hvm.c        |  10 +++
 12 files changed, 495 insertions(+), 70 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 01/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Xiao Guangrong, Michael S. Tsirkin, Igor Mammedov

The memory region of vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size is
0) and every access to labels in QEMU is preceded by a label size
check, do not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);
 
     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), so do not initialize the
+     * pointer to the label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
     if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 01/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

The memory region of vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size is
0) and every access to labels in QEMU is preceded by a label size
check, do not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);
 
     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), so do not initialize the
+     * pointer to the label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
     if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin, Stefano Stabellini, Anthony Perard

The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as Xen device
model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
the code for pc machine type to create the hotplug memory region for
Xen HVM domains.
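
As a rough sketch of the resulting layout (illustrative values only,
assuming above_4g_mem_size is 0 and enforce_aligned_dimm is enabled):
with "-m 2048,slots=2,maxmem=3221225472" the hotplug region starts at
ROUND_UP(0x100000000 + 0, 1 GiB) = 0x100000000 and spans
(maxmem - ram) + 2 slots * 1 GiB = 1 GiB + 2 GiB = 3 GiB, from which
the guest physical addresses of vNVDIMM devices are then allocated.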

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 21081041d5..5cbdce61a7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1347,6 +1347,53 @@ void xen_load_linux(PCMachineState *pcms)
     pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+    MachineState *machine = MACHINE(pcms);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+        return;
+
+    if (memory_region_size(&pcms->hotplug_memory.mr)) {
+        error_report("hotplug memory region has been initialized");
+        exit(EXIT_FAILURE);
+    }
+
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(machine->maxram_size,
+                      TARGET_PAGE_SIZE) != machine->maxram_size) {
+        error_report("maximum memory size must by aligned to multiple of "
+                     "%d bytes", TARGET_PAGE_SIZE);
+        exit(EXIT_FAILURE);
+    }
+
+    pcms->hotplug_memory.base =
+        ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+    if (pcmc->enforce_aligned_dimm) {
+        /* size hotplug region assuming 1G page max alignment per slot */
+        hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+    }
+
+    if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                     machine->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+                       "hotplug-memory", hotplug_mem_size);
+    memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                &pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -1398,44 +1445,7 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     /* initialize hotplug memory address space */
-    if (pcmc->has_reserved_memory &&
-        (machine->ram_size < machine->maxram_size)) {
-        ram_addr_t hotplug_mem_size =
-            machine->maxram_size - machine->ram_size;
-
-        if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-            error_report("unsupported amount of memory slots: %"PRIu64,
-                         machine->ram_slots);
-            exit(EXIT_FAILURE);
-        }
-
-        if (QEMU_ALIGN_UP(machine->maxram_size,
-                          TARGET_PAGE_SIZE) != machine->maxram_size) {
-            error_report("maximum memory size must by aligned to multiple of "
-                         "%d bytes", TARGET_PAGE_SIZE);
-            exit(EXIT_FAILURE);
-        }
-
-        pcms->hotplug_memory.base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-        if (pcmc->enforce_aligned_dimm) {
-            /* size hotplug region assuming 1G page max alignment per slot */
-            hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-        }
-
-        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-            hotplug_mem_size) {
-            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
-                         machine->maxram_size);
-            exit(EXIT_FAILURE);
-        }
-
-        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-                           "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-                                    &pcms->hotplug_memory.mr);
-    }
+    pc_memory_hotplug_init(pcms, system_memory);
 
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d9ccd5d0d6..90163e1a1b 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -235,6 +235,8 @@ static void xen_ram_init(PCMachineState *pcms,
                                  pcms->above_4g_mem_size);
         memory_region_add_subregion(sysmem, 0x100000000ULL, &ram_hi);
     }
+
+    pc_memory_hotplug_init(pcms, sysmem);
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8226904524..b65c5dd5ec 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -249,6 +249,7 @@ void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
                     MemoryRegion **ram_memory);
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory);
 qemu_irq pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as Xen device
model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
the code for pc machine type to create the hotplug memory region for
Xen HVM domains.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 21081041d5..5cbdce61a7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1347,6 +1347,53 @@ void xen_load_linux(PCMachineState *pcms)
     pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+    MachineState *machine = MACHINE(pcms);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+        return;
+
+    if (memory_region_size(&pcms->hotplug_memory.mr)) {
+        error_report("hotplug memory region has been initialized");
+        exit(EXIT_FAILURE);
+    }
+
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(machine->maxram_size,
+                      TARGET_PAGE_SIZE) != machine->maxram_size) {
+        error_report("maximum memory size must by aligned to multiple of "
+                     "%d bytes", TARGET_PAGE_SIZE);
+        exit(EXIT_FAILURE);
+    }
+
+    pcms->hotplug_memory.base =
+        ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+    if (pcmc->enforce_aligned_dimm) {
+        /* size hotplug region assuming 1G page max alignment per slot */
+        hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+    }
+
+    if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                     machine->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+                       "hotplug-memory", hotplug_mem_size);
+    memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                &pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -1398,44 +1445,7 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     /* initialize hotplug memory address space */
-    if (pcmc->has_reserved_memory &&
-        (machine->ram_size < machine->maxram_size)) {
-        ram_addr_t hotplug_mem_size =
-            machine->maxram_size - machine->ram_size;
-
-        if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-            error_report("unsupported amount of memory slots: %"PRIu64,
-                         machine->ram_slots);
-            exit(EXIT_FAILURE);
-        }
-
-        if (QEMU_ALIGN_UP(machine->maxram_size,
-                          TARGET_PAGE_SIZE) != machine->maxram_size) {
-            error_report("maximum memory size must by aligned to multiple of "
-                         "%d bytes", TARGET_PAGE_SIZE);
-            exit(EXIT_FAILURE);
-        }
-
-        pcms->hotplug_memory.base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-        if (pcmc->enforce_aligned_dimm) {
-            /* size hotplug region assuming 1G page max alignment per slot */
-            hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-        }
-
-        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-            hotplug_mem_size) {
-            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
-                         machine->maxram_size);
-            exit(EXIT_FAILURE);
-        }
-
-        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-                           "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-                                    &pcms->hotplug_memory.mr);
-    }
+    pc_memory_hotplug_init(pcms, system_memory);
 
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d9ccd5d0d6..90163e1a1b 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -235,6 +235,8 @@ static void xen_ram_init(PCMachineState *pcms,
                                  pcms->above_4g_mem_size);
         memory_region_add_subregion(sysmem, 0x100000000ULL, &ram_hi);
     }
+
+    pc_memory_hotplug_init(pcms, sysmem);
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8226904524..b65c5dd5ec 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -249,6 +249,7 @@ void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
                     MemoryRegion **ram_memory);
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory);
 qemu_irq pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Eduardo Habkost, Igor Mammedov, Michael S. Tsirkin

vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as Xen device model, the
backend resource allocation of vNVDIMM is managed out of QEMU. A new
host memory backend 'memory-backend-xen' is introduced to represent
the backend resource allocated by Xen. It simply creates a memory
region of the specified size as a placeholder in the guest address
space, which will be mapped by Xen to the actual backend resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x100000000:
   -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++++
 hw/mem/pc-dimm.c       |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 0000000000..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+    OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+    HostMemoryBackend parent_obj;
+
+    uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+    uint64_t value = backend->host_addr;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    xb->host_addr = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+                       backend->size);
+    backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = xen_backend_alloc;
+
+    object_class_property_add(oc, "host-addr", "int",
+                              xen_backend_get_host_addr,
+                              xen_backend_set_host_addr,
+                              NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_XEN,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = xen_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendXen),
+};
+
+static void register_types(void)
+{
+    type_register_static(&xen_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..ba13a52994 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
+#include "hw/xen/xen.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-types.h"
@@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
+        /*
+         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
+         * so no further work in this function is needed.
+         */
+        if (xen_enabled() && !backend->mr.ram_block) {
+            goto out;
+        }
+
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index bdf6649083..7e1fe005ee 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "hw/virtio/vhost.h"
+#include "hw/xen/xen.h"
 
 typedef struct pc_dimms_capacity {
      uint64_t size;
@@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(vmstate_mr, dev);
+    /* memory-backend-xen is not backed by RAM. */
+    if (!xen_enabled()) {
+        vmstate_register_ram(vmstate_mr, dev);
+    }
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
 out:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Eduardo Habkost, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as Xen device model, the
backend resource allocation of vNVDIMM is managed out of QEMU. A new
host memory backend 'memory-backend-xen' is introduced to represent
the backend resource allocated by Xen. It simply creates a memory
region of the specified size as a placeholder in the guest address
space, which will be mapped by Xen to the actual backend resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x100000000:
   -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++++
 hw/mem/pc-dimm.c       |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 0000000000..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+    OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+    HostMemoryBackend parent_obj;
+
+    uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+    uint64_t value = backend->host_addr;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    xb->host_addr = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+                       backend->size);
+    backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = xen_backend_alloc;
+
+    object_class_property_add(oc, "host-addr", "int",
+                              xen_backend_get_host_addr,
+                              xen_backend_set_host_addr,
+                              NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_XEN,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = xen_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendXen),
+};
+
+static void register_types(void)
+{
+    type_register_static(&xen_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..ba13a52994 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
+#include "hw/xen/xen.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-types.h"
@@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
+        /*
+         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
+         * so no further work in this function is needed.
+         */
+        if (xen_enabled() && !backend->mr.ram_block) {
+            goto out;
+        }
+
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index bdf6649083..7e1fe005ee 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "hw/virtio/vhost.h"
+#include "hw/xen/xen.h"
 
 typedef struct pc_dimms_capacity {
      uint64_t size;
@@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(vmstate_mr, dev);
+    /* memory-backend-xen is not backed by RAM. */
+    if (!xen_enabled()) {
+        vmstate_register_ram(vmstate_mr, dev);
+    }
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
 out:
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Xiao Guangrong, Michael S. Tsirkin, Igor Mammedov

Xen relies on QEMU to build guest ACPI for NVDIMM. However, no fw_cfg
is created when QEMU is used as Xen device model, so QEMU should avoid
using fw_cfg on Xen.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..9121a766c6 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -890,8 +891,12 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
-    fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
-                    state->dsm_mem->len);
+
+    /* No fw_cfg is created when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+                        state->dsm_mem->len);
+    }
 
     nvdimm_init_fit_buffer(&state->fit_buf);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen relies on QEMU to build guest ACPI for NVDIMM. However, no fw_cfg
is created when QEMU is used as Xen device model, so QEMU should avoid
using fw_cfg on Xen.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..9121a766c6 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -890,8 +891,12 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
-    fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
-                    state->dsm_mem->len);
+
+    /* No fw_cfg is created when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+                        state->dsm_mem->len);
+    }
 
     nvdimm_init_fit_buffer(&state->fit_buf);
 }
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Stefano Stabellini, Anthony Perard, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

Probe the base address and the length of guest ACPI buffer reserved
for copying ACPI from QEMU.
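
Both values are read from XenStore and parsed as hexadecimal strings
(qemu_strtoul() with base 16). For example, for domain 1 the entries
written by the Xen side might look like (illustrative values only):

    /local/domain/1/hvmloader/dm-acpi/address = "fc000000"
    /local/domain/1/hvmloader/dm-acpi/length  = "20000"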

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 90163e1a1b..ae895aaf03 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -18,6 +18,7 @@
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
 
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "sysemu/xen-mapcache.h"
@@ -86,6 +87,18 @@ typedef struct XenPhysmap {
     QLIST_ENTRY(XenPhysmap) list;
 } XenPhysmap;
 
+#define HVM_XS_DM_ACPI_ROOT    "/hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH  HVM_XS_DM_ACPI_ROOT"/length"
+
+typedef struct XenAcpiBuf {
+    ram_addr_t base;
+    ram_addr_t length;
+    ram_addr_t used;
+} XenAcpiBuf;
+
+static XenAcpiBuf *dm_acpi_buf;
+
 typedef struct XenIOState {
     ioservid_t ioservid;
     shared_iopage_t *shared_page;
@@ -110,6 +123,8 @@ typedef struct XenIOState {
     hwaddr free_phys_offset;
     const XenPhysmap *log_for_dirtybit;
 
+    XenAcpiBuf dm_acpi_buf;
+
     Notifier exit;
     Notifier suspend;
     Notifier wakeup;
@@ -1234,6 +1249,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static int xen_dm_acpi_needed(PCMachineState *pcms)
+{
+    return 0;
+}
+
+static int dm_acpi_buf_init(XenIOState *state)
+{
+    char path[80], *value;
+    unsigned int len;
+
+    dm_acpi_buf = &state->dm_acpi_buf;
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ADDRESS, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->base)) {
+        return -EINVAL;
+    }
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_LENGTH, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->length)) {
+        return -EINVAL;
+    }
+
+    dm_acpi_buf->used = 0;
+
+    return 0;
+}
+
+static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
+{
+    if (!xen_dm_acpi_needed(pcms)) {
+        return 0;
+    }
+
+    return dm_acpi_buf_init(state);
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
@@ -1385,6 +1446,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     /* Disable ACPI build because Xen handles it */
     pcms->acpi_build_enabled = false;
 
+    if (xen_dm_acpi_init(pcms, state)) {
+        error_report("failed to initialize xen ACPI");
+        goto err;
+    }
+
     return;
 
 err:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Probe the base address and the length of guest ACPI buffer reserved
for copying ACPI from QEMU.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 90163e1a1b..ae895aaf03 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -18,6 +18,7 @@
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
 
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "sysemu/xen-mapcache.h"
@@ -86,6 +87,18 @@ typedef struct XenPhysmap {
     QLIST_ENTRY(XenPhysmap) list;
 } XenPhysmap;
 
+#define HVM_XS_DM_ACPI_ROOT    "/hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH  HVM_XS_DM_ACPI_ROOT"/length"
+
+typedef struct XenAcpiBuf {
+    ram_addr_t base;
+    ram_addr_t length;
+    ram_addr_t used;
+} XenAcpiBuf;
+
+static XenAcpiBuf *dm_acpi_buf;
+
 typedef struct XenIOState {
     ioservid_t ioservid;
     shared_iopage_t *shared_page;
@@ -110,6 +123,8 @@ typedef struct XenIOState {
     hwaddr free_phys_offset;
     const XenPhysmap *log_for_dirtybit;
 
+    XenAcpiBuf dm_acpi_buf;
+
     Notifier exit;
     Notifier suspend;
     Notifier wakeup;
@@ -1234,6 +1249,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static int xen_dm_acpi_needed(PCMachineState *pcms)
+{
+    return 0;
+}
+
+static int dm_acpi_buf_init(XenIOState *state)
+{
+    char path[80], *value;
+    unsigned int len;
+
+    dm_acpi_buf = &state->dm_acpi_buf;
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ADDRESS, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->base)) {
+        return -EINVAL;
+    }
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_LENGTH, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->length)) {
+        return -EINVAL;
+    }
+
+    dm_acpi_buf->used = 0;
+
+    return 0;
+}
+
+static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
+{
+    if (!xen_dm_acpi_needed(pcms)) {
+        return 0;
+    }
+
+    return dm_acpi_buf_init(state);
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
@@ -1385,6 +1446,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     /* Disable ACPI build because Xen handles it */
     pcms->acpi_build_enabled = false;
 
+    if (xen_dm_acpi_init(pcms, state)) {
+        error_report("failed to initialize xen ACPI");
+        goto err;
+    }
+
     return;
 
 err:
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Stefano Stabellini, Anthony Perard, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

Xen relies on QEMU to build guest NFIT and NVDIMM namespace devices,
and implements an interface to allow QEMU to copy its ACPI into guest
memory. This commit implements the QEMU side support.

The location of guest memory that can receive QEMU ACPI can be found
from XenStore entries /local/domain/$dom_id/hvmloader/dm-acpi/{address,length},
which were handled by the previous commit.

The QEMU ACPI copied to the guest is organized in blobs. For each
blob, QEMU creates the following XenStore entries under
/local/domain/$dom_id/hvmloader/dm-acpi/$name to indicate its type,
its location in the above guest memory region, and its size.
 - type   the type of the passed ACPI, which can be the following
          values.
    * XEN_DM_ACPI_BLOB_TYPE_TABLE (0) indicates it's a complete ACPI
      table, and its signature is indicated by $name in the XenStore
      path.
    * XEN_DM_ACPI_BLOB_TYPE_NSDEV (1) indicates it's the body of a
      namespace device, and its device name is indicated by $name in
      the XenStore path.
 - offset  offset in bytes from the beginning of the above guest memory region
 - length  size in bytes of the copied ACPI
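
For example (illustrative names and values only), after copying a
224-byte NFIT at the start of the buffer followed by an 80-byte
namespace device body, the entries under
/local/domain/$dom_id/hvmloader/dm-acpi/ could look like:

    NFIT/type   = "0"
    NFIT/offset = "0"
    NFIT/length = "224"
    NV0/type    = "1"
    NV0/offset  = "224"
    NV0/length  = "80"

Type, offset and length are written as decimal strings; blobs are
allocated sequentially from the buffer, so the second blob's offset
equals the first blob's length.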

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/xen/xen.h  |  18 ++++++++
 stubs/xen-hvm.c       |   6 +++
 3 files changed, 137 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index ae895aaf03..b74c4ffb9c 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1286,6 +1286,20 @@ static int dm_acpi_buf_init(XenIOState *state)
     return 0;
 }
 
+static ram_addr_t dm_acpi_buf_alloc(size_t length)
+{
+    ram_addr_t addr;
+
+    if (dm_acpi_buf->length - dm_acpi_buf->used < length) {
+        return 0;
+    }
+
+    addr = dm_acpi_buf->base + dm_acpi_buf->used;
+    dm_acpi_buf->used += length;
+
+    return addr;
+}
+
 static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 {
     if (!xen_dm_acpi_needed(pcms)) {
@@ -1295,6 +1309,105 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static int xs_write_dm_acpi_blob_entry(const char *name,
+                                       const char *entry, const char *value)
+{
+    XenIOState *state = container_of(dm_acpi_buf, XenIOState, dm_acpi_buf);
+    char path[80];
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             xen_domid, name, entry);
+    if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static size_t xen_memcpy_to_guest(ram_addr_t gpa,
+                                  const void *buf, size_t length)
+{
+    size_t copied = 0, size;
+    ram_addr_t s, e, offset, cur = gpa;
+    xen_pfn_t cur_pfn;
+    void *page;
+
+    if (!buf || !length) {
+        return 0;
+    }
+
+    s = gpa & TARGET_PAGE_MASK;
+    e = gpa + length;
+    if (e < s) {
+        return 0;
+    }
+
+    while (cur < e) {
+        cur_pfn = cur >> TARGET_PAGE_BITS;
+        offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+        size = (length >= TARGET_PAGE_SIZE - offset) ?
+               TARGET_PAGE_SIZE - offset : length;
+
+        page = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ | PROT_WRITE,
+                                    1, &cur_pfn, NULL);
+        if (!page) {
+            break;
+        }
+
+        memcpy(page + offset, buf, size);
+        xenforeignmemory_unmap(xen_fmem, page, 1);
+
+        copied += size;
+        buf += size;
+        cur += size;
+        length -= size;
+    }
+
+    return copied;
+}
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    char value[21];
+    ram_addr_t buf_addr;
+    int rc;
+
+    if (type != XEN_DM_ACPI_BLOB_TYPE_TABLE &&
+        type != XEN_DM_ACPI_BLOB_TYPE_NSDEV) {
+        return -EINVAL;
+    }
+
+    buf_addr = dm_acpi_buf_alloc(length);
+    if (!buf_addr) {
+        return -ENOMEM;
+    }
+    if (xen_memcpy_to_guest(buf_addr, blob, length) != length) {
+        return -EIO;
+    }
+
+    snprintf(value, sizeof(value), "%d", type);
+    rc = xs_write_dm_acpi_blob_entry(name, "type", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, buf_addr - dm_acpi_buf->base);
+    rc = xs_write_dm_acpi_blob_entry(name, "offset", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, length);
+    rc = xs_write_dm_acpi_blob_entry(name, "length", value);
+    if (rc) {
+        return rc;
+    }
+
+    return 0;
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..38dcd1a7d4 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -48,4 +48,22 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
 
 void xen_register_framebuffer(struct MemoryRegion *mr);
 
+/*
+ * Copy an ACPI blob from QEMU to HVM guest.
+ *
+ * Parameters:
+ *  name:   a unique name of the data blob; for XEN_DM_ACPI_BLOB_TYPE_NSDEV,
+ *          name should be less than 4 characters
+ *  blob:   the ACPI blob to be copied
+ *  length: the length in bytes of the ACPI blob
+ *  type:   the type of content in the ACPI blob, one of XEN_DM_ACPI_BLOB_TYPE_*
+ *
+ * Return:
+ *   0 on success; a non-zero error code on failures.
+ */
+#define XEN_DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define XEN_DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of ACPI namespace device */
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51b21..58889ae0fb 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -61,3 +61,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    return -1;
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Xen relies on QEMU to build the guest NFIT and NVDIMM namespace
devices, and implements an interface that allows QEMU to copy its ACPI
into guest memory. This commit implements the QEMU side of that
interface.

The location of the guest memory region that can receive QEMU ACPI is
published in the XenStore entries
/local/domain/$dom_id/hvmloader/dm-acpi/{address,length}, which were
set up by the previous commit.

QEMU ACPI copied to the guest is organized in blobs. For each blob,
QEMU creates the following XenStore entries under
/local/domain/$dom_id/hvmloader/dm-acpi/$name to indicate its type,
its location within the guest memory region above, and its size; a
consumer-side sketch of this layout follows the list.
 - type    the type of the passed ACPI, which can be one of the
           following values:
    * XEN_DM_ACPI_BLOB_TYPE_TABLE (0) indicates a complete ACPI table,
      whose signature is given by $name in the XenStore path.
    * XEN_DM_ACPI_BLOB_TYPE_NSDEV (1) indicates the body of an ACPI
      namespace device, whose device name is given by $name in the
      XenStore path.
 - offset  offset in bytes from the beginning of the guest memory region above
 - length  size in bytes of the copied ACPI
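
As a concrete illustration (not part of this patch), the sketch below
reads the buffer base and the entries of a blob assumed to be named
"NFIT" through libxenstore's xs_open()/xs_read(); hvmloader uses its
own xenstore helpers, and the domain id and printed summary are only
assumptions for the example:

/*
 * Consumer-side sketch of the dm-acpi XenStore layout described above.
 * Error handling is minimal; build with -lxenstore.
 */
#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

static char *read_dm_acpi_key(struct xs_handle *xs, int domid,
                              const char *key)
{
    char path[128];
    unsigned int len;

    snprintf(path, sizeof(path),
             "/local/domain/%d/hvmloader/dm-acpi/%s", domid, key);
    return xs_read(xs, XBT_NULL, path, &len);    /* caller frees */
}

int main(void)
{
    int domid = 1;                               /* assumed domain id */
    struct xs_handle *xs = xs_open(0);
    char *base, *type, *offset, *length;

    if (!xs) {
        return 1;
    }

    base   = read_dm_acpi_key(xs, domid, "address");     /* buffer start */
    type   = read_dm_acpi_key(xs, domid, "NFIT/type");   /* 0: complete ACPI table */
    offset = read_dm_acpi_key(xs, domid, "NFIT/offset"); /* bytes from buffer start */
    length = read_dm_acpi_key(xs, domid, "NFIT/length"); /* blob size in bytes */

    if (base && type && offset && length) {
        printf("NFIT blob: type %s at %s + %s, %s bytes\n",
               type, base, offset, length);
    }

    free(base); free(type); free(offset); free(length);
    xs_close(xs);
    return 0;
}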

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/xen/xen.h  |  18 ++++++++
 stubs/xen-hvm.c       |   6 +++
 3 files changed, 137 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index ae895aaf03..b74c4ffb9c 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1286,6 +1286,20 @@ static int dm_acpi_buf_init(XenIOState *state)
     return 0;
 }
 
+static ram_addr_t dm_acpi_buf_alloc(size_t length)
+{
+    ram_addr_t addr;
+
+    if (dm_acpi_buf->length - dm_acpi_buf->used < length) {
+        return 0;
+    }
+
+    addr = dm_acpi_buf->base + dm_acpi_buf->used;
+    dm_acpi_buf->used += length;
+
+    return addr;
+}
+
 static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 {
     if (!xen_dm_acpi_needed(pcms)) {
@@ -1295,6 +1309,105 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static int xs_write_dm_acpi_blob_entry(const char *name,
+                                       const char *entry, const char *value)
+{
+    XenIOState *state = container_of(dm_acpi_buf, XenIOState, dm_acpi_buf);
+    char path[80];
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             xen_domid, name, entry);
+    if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static size_t xen_memcpy_to_guest(ram_addr_t gpa,
+                                  const void *buf, size_t length)
+{
+    size_t copied = 0, size;
+    ram_addr_t s, e, offset, cur = gpa;
+    xen_pfn_t cur_pfn;
+    void *page;
+
+    if (!buf || !length) {
+        return 0;
+    }
+
+    s = gpa & TARGET_PAGE_MASK;
+    e = gpa + length;
+    if (e < s) {
+        return 0;
+    }
+
+    while (cur < e) {
+        cur_pfn = cur >> TARGET_PAGE_BITS;
+        offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+        size = (length >= TARGET_PAGE_SIZE - offset) ?
+               TARGET_PAGE_SIZE - offset : length;
+
+        page = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ | PROT_WRITE,
+                                    1, &cur_pfn, NULL);
+        if (!page) {
+            break;
+        }
+
+        memcpy(page + offset, buf, size);
+        xenforeignmemory_unmap(xen_fmem, page, 1);
+
+        copied += size;
+        buf += size;
+        cur += size;
+        length -= size;
+    }
+
+    return copied;
+}
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    char value[21];
+    ram_addr_t buf_addr;
+    int rc;
+
+    if (type != XEN_DM_ACPI_BLOB_TYPE_TABLE &&
+        type != XEN_DM_ACPI_BLOB_TYPE_NSDEV) {
+        return -EINVAL;
+    }
+
+    buf_addr = dm_acpi_buf_alloc(length);
+    if (!buf_addr) {
+        return -ENOMEM;
+    }
+    if (xen_memcpy_to_guest(buf_addr, blob, length) != length) {
+        return -EIO;
+    }
+
+    snprintf(value, sizeof(value), "%d", type);
+    rc = xs_write_dm_acpi_blob_entry(name, "type", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, buf_addr - dm_acpi_buf->base);
+    rc = xs_write_dm_acpi_blob_entry(name, "offset", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, length);
+    rc = xs_write_dm_acpi_blob_entry(name, "length", value);
+    if (rc) {
+        return rc;
+    }
+
+    return 0;
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..38dcd1a7d4 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -48,4 +48,22 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
 
 void xen_register_framebuffer(struct MemoryRegion *mr);
 
+/*
+ * Copy an ACPI blob from QEMU to HVM guest.
+ *
+ * Parameters:
+ *  name:   a unique name of the data blob; for XEN_DM_ACPI_BLOB_TYPE_NSDEV,
+ *          name should be less than 4 characters
+ *  blob:   the ACPI blob to be copied
+ *  length: the length in bytes of the ACPI blob
+ *  type:   the type of content in the ACPI blob, one of XEN_DM_ACPI_BLOB_TYPE_*
+ *
+ * Return:
+ *   0 on success; a non-zero error code on failure.
+ */
+#define XEN_DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define XEN_DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of ACPI namespace device */
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51b21..58889ae0fb 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -61,3 +61,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    return -1;
+}
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Xen relies on QEMU to build the guest NFIT.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9121a766c6..d9cdc5a531 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -404,6 +404,12 @@ static void nvdimm_build_nfit(AcpiNVDIMMState *state, GArray *table_offsets,
     build_header(linker, table_data,
                  (void *)(table_data->data + header), "NFIT",
                  sizeof(NvdimmNfitHeader) + fit_buf->fit->len, 1, NULL, NULL);
+
+    if (xen_enabled()) {
+        xen_acpi_copy_to_guest("NFIT", table_data->data + header,
+                               sizeof(NvdimmNfitHeader) + fit_buf->fit->len,
+                               XEN_DM_ACPI_BLOB_TYPE_TABLE);
+    }
 }
 
 #define NVDIMM_DSM_MEMORY_SIZE      4096
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen relies on QEMU to build the guest NFIT.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9121a766c6..d9cdc5a531 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -404,6 +404,12 @@ static void nvdimm_build_nfit(AcpiNVDIMMState *state, GArray *table_offsets,
     build_header(linker, table_data,
                  (void *)(table_data->data + header), "NFIT",
                  sizeof(NvdimmNfitHeader) + fit_buf->fit->len, 1, NULL, NULL);
+
+    if (xen_enabled()) {
+        xen_acpi_copy_to_guest("NFIT", table_data->data + header,
+                               sizeof(NvdimmNfitHeader) + fit_buf->fit->len,
+                               XEN_DM_ACPI_BLOB_TYPE_TABLE);
+    }
 }
 
 #define NVDIMM_DSM_MEMORY_SIZE      4096
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Xen relies on QEMU to build the ACPI namespace device of vNVDIMM for
the Xen guest.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 55 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index d9cdc5a531..bf887512ad 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1226,22 +1226,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
     }
 }
 
-static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-                              BIOSLinker *linker, GArray *dsm_dma_arrea,
-                              uint32_t ram_slots)
+static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 {
-    Aml *ssdt, *sb_scope, *dev;
-    int mem_addr_offset, nvdimm_ssdt;
-
-    acpi_add_table(table_offsets, table_data);
-
-    ssdt = init_aml_allocator();
-    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
-
-    sb_scope = aml_scope("\\_SB");
-
-    dev = aml_device("NVDR");
-
     /*
      * ACPI 6.0: 9.20 NVDIMM Devices:
      *
@@ -1262,6 +1248,25 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     nvdimm_build_fit(dev);
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
+}
+
+static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
+                              BIOSLinker *linker, GArray *dsm_dma_arrea,
+                              uint32_t ram_slots)
+{
+    Aml *ssdt, *sb_scope, *dev;
+    int mem_addr_offset, nvdimm_ssdt;
+
+    acpi_add_table(table_offsets, table_data);
+
+    ssdt = init_aml_allocator();
+    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+    sb_scope = aml_scope("\\_SB");
+
+    dev = aml_device("NVDR");
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
 
     aml_append(sb_scope, dev);
     aml_append(ssdt, sb_scope);
@@ -1285,6 +1290,18 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     free_aml_allocator();
 }
 
+static void nvdimm_build_xen_ssdt(uint32_t ram_slots)
+{
+    Aml *dev = init_aml_allocator();
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
+    build_append_named_dword(dev->buf, NVDIMM_ACPI_MEM_ADDR);
+    xen_acpi_copy_to_guest("NVDR", dev->buf->data, dev->buf->len,
+                           XEN_DM_ACPI_BLOB_TYPE_NSDEV);
+
+    free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        BIOSLinker *linker, AcpiNVDIMMState *state,
                        uint32_t ram_slots)
@@ -1296,8 +1313,12 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
         return;
     }
 
-    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
-                      ram_slots);
+    if (!xen_enabled()) {
+        nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+                          ram_slots);
+    } else {
+        nvdimm_build_xen_ssdt(ram_slots);
+    }
 
     device_list = nvdimm_get_device_list();
     /* no NVDIMM device is plugged. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen relies on QEMU to build the ACPI namespace device of vNVDIMM for
the Xen guest.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 55 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index d9cdc5a531..bf887512ad 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1226,22 +1226,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
     }
 }
 
-static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-                              BIOSLinker *linker, GArray *dsm_dma_arrea,
-                              uint32_t ram_slots)
+static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 {
-    Aml *ssdt, *sb_scope, *dev;
-    int mem_addr_offset, nvdimm_ssdt;
-
-    acpi_add_table(table_offsets, table_data);
-
-    ssdt = init_aml_allocator();
-    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
-
-    sb_scope = aml_scope("\\_SB");
-
-    dev = aml_device("NVDR");
-
     /*
      * ACPI 6.0: 9.20 NVDIMM Devices:
      *
@@ -1262,6 +1248,25 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     nvdimm_build_fit(dev);
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
+}
+
+static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
+                              BIOSLinker *linker, GArray *dsm_dma_arrea,
+                              uint32_t ram_slots)
+{
+    Aml *ssdt, *sb_scope, *dev;
+    int mem_addr_offset, nvdimm_ssdt;
+
+    acpi_add_table(table_offsets, table_data);
+
+    ssdt = init_aml_allocator();
+    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+    sb_scope = aml_scope("\\_SB");
+
+    dev = aml_device("NVDR");
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
 
     aml_append(sb_scope, dev);
     aml_append(ssdt, sb_scope);
@@ -1285,6 +1290,18 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     free_aml_allocator();
 }
 
+static void nvdimm_build_xen_ssdt(uint32_t ram_slots)
+{
+    Aml *dev = init_aml_allocator();
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
+    build_append_named_dword(dev->buf, NVDIMM_ACPI_MEM_ADDR);
+    xen_acpi_copy_to_guest("NVDR", dev->buf->data, dev->buf->len,
+                           XEN_DM_ACPI_BLOB_TYPE_NSDEV);
+
+    free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        BIOSLinker *linker, AcpiNVDIMMState *state,
                        uint32_t ram_slots)
@@ -1296,8 +1313,12 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
         return;
     }
 
-    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
-                      ram_slots);
+    if (!xen_enabled()) {
+        nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+                          ram_slots);
+    } else {
+        nvdimm_build_xen_ssdt(ram_slots);
+    }
 
     device_list = nvdimm_get_device_list();
     /* no NVDIMM device is plugged. */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Xen currently does not support vNVDIMM hotplug and always sets the QEMU
option "maxmem" to be just enough for RAM and vNVDIMM, so it's not
necessary to build the _FIT method when QEMU is used as the Xen device
model.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bf887512ad..61789c3966 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1245,7 +1245,14 @@ static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 
     /* 0 is reserved for root device. */
     nvdimm_build_device_dsm(dev, 0);
-    nvdimm_build_fit(dev);
+    /*
+     * Xen does not support vNVDIMM hotplug, and always sets the QEMU
+     * option "maxmem" to be just enough for RAM and statically plugged
+     * vNVDIMM, so it's unnecessary to build the _FIT method on Xen.
+     */
+    if (!xen_enabled()) {
+        nvdimm_build_fit(dev);
+    }
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen currently does not support vNVDIMM hotplug and always sets the QEMU
option "maxmem" to be just enough for RAM and vNVDIMM, so it's not
necessary to build the _FIT method when QEMU is used as the Xen device
model.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bf887512ad..61789c3966 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1245,7 +1245,14 @@ static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 
     /* 0 is reserved for root device. */
     nvdimm_build_device_dsm(dev, 0);
-    nvdimm_build_fit(dev);
+    /*
+     * Xen does not support vNVDIMM hotplug, and always sets the QEMU
+     * option "maxmem" to be just enough for RAM and statically plugged
+     * vNVDIMM, so it's unnecessary to build the _FIT method on Xen.
+     */
+    if (!xen_enabled()) {
+        nvdimm_build_fit(dev);
+    }
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
 }
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Stefano Stabellini,
	Anthony Perard

If the machine option 'nvdimm' is enabled and QEMU is used as the Xen
device model, construct the guest NFIT and ACPI namespace devices of
vNVDIMM and copy them into guest memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/acpi/aml-build.c   | 10 +++++++---
 hw/i386/pc.c          | 16 ++++++++++------
 hw/i386/xen/xen-hvm.c | 25 +++++++++++++++++++++++--
 include/hw/xen/xen.h  |  7 +++++++
 stubs/xen-hvm.c       |  4 ++++
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc450e..5f57c1bef3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -22,6 +22,7 @@
 #include "qemu/osdep.h"
 #include <glib/gprintf.h>
 #include "hw/acpi/aml-build.h"
+#include "hw/xen/xen.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
 #include "sysemu/numa.h"
@@ -1531,9 +1532,12 @@ build_header(BIOSLinker *linker, GArray *table_data,
     h->oem_revision = cpu_to_le32(1);
     memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
     h->asl_compiler_revision = cpu_to_le32(1);
-    /* Checksum to be filled in by Guest linker */
-    bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
-        tbl_offset, len, checksum_offset);
+    /* No linker is used when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        /* Checksum to be filled in by Guest linker */
+        bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
+                                        tbl_offset, len, checksum_offset);
+    }
 }
 
 void *acpi_data_push(GArray *table_data, unsigned size)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5cbdce61a7..7101d380a0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1252,12 +1252,16 @@ void pc_machine_done(Notifier *notifier, void *data)
         }
     }
 
-    acpi_setup();
-    if (pcms->fw_cfg) {
-        pc_build_smbios(pcms);
-        pc_build_feature_control_file(pcms);
-        /* update FW_CFG_NB_CPUS to account for -device added CPUs */
-        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    if (!xen_enabled()) {
+        acpi_setup();
+        if (pcms->fw_cfg) {
+            pc_build_smbios(pcms);
+            pc_build_feature_control_file(pcms);
+            /* update FW_CFG_NB_CPUS to account for -device added CPUs */
+            fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+        }
+    } else {
+        xen_dm_acpi_setup(pcms);
     }
 
     if (pcms->apic_id_limit > 255) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index b74c4ffb9c..d81cc7dbbc 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -265,7 +265,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
         /* RAM already populated in Xen */
         fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
                 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-                __func__, size, ram_addr); 
+                __func__, size, ram_addr);
         return;
     }
 
@@ -1251,7 +1251,7 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 
 static int xen_dm_acpi_needed(PCMachineState *pcms)
 {
-    return 0;
+    return pcms->acpi_nvdimm_state.is_enabled;
 }
 
 static int dm_acpi_buf_init(XenIOState *state)
@@ -1309,6 +1309,20 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static void xen_dm_acpi_nvdimm_setup(PCMachineState *pcms)
+{
+    GArray *table_offsets = g_array_new(false, true /* clear */,
+                                        sizeof(uint32_t));
+    GArray *table_data = g_array_new(false, true /* clear */, 1);
+
+    nvdimm_build_acpi(table_offsets, table_data,
+                      NULL, &pcms->acpi_nvdimm_state,
+                      MACHINE(pcms)->ram_slots);
+
+    g_array_free(table_offsets, true);
+    g_array_free(table_data, true);
+}
+
 static int xs_write_dm_acpi_blob_entry(const char *name,
                                        const char *entry, const char *value)
 {
@@ -1408,6 +1422,13 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
     return 0;
 }
 
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        xen_dm_acpi_nvdimm_setup(pcms);
+    }
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 38dcd1a7d4..8c48195e12 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -66,4 +66,11 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
 int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
                            int type);
 
+/*
+ * Build guest ACPI (i.e. DM ACPI, the ACPI built by the device model)
+ * and copy it into guest memory. Xen hvmloader will load the DM ACPI
+ * and merge it with the guest ACPI it builds itself.
+ */
+void xen_dm_acpi_setup(PCMachineState *pcms);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 58889ae0fb..c1a6d21efa 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -67,3 +67,7 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
 {
     return -1;
 }
+
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson

If the machine option 'nvdimm' is enabled and QEMU is used as the Xen
device model, construct the guest NFIT and ACPI namespace devices of
vNVDIMM and copy them into guest memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/acpi/aml-build.c   | 10 +++++++---
 hw/i386/pc.c          | 16 ++++++++++------
 hw/i386/xen/xen-hvm.c | 25 +++++++++++++++++++++++--
 include/hw/xen/xen.h  |  7 +++++++
 stubs/xen-hvm.c       |  4 ++++
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc450e..5f57c1bef3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -22,6 +22,7 @@
 #include "qemu/osdep.h"
 #include <glib/gprintf.h>
 #include "hw/acpi/aml-build.h"
+#include "hw/xen/xen.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
 #include "sysemu/numa.h"
@@ -1531,9 +1532,12 @@ build_header(BIOSLinker *linker, GArray *table_data,
     h->oem_revision = cpu_to_le32(1);
     memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
     h->asl_compiler_revision = cpu_to_le32(1);
-    /* Checksum to be filled in by Guest linker */
-    bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
-        tbl_offset, len, checksum_offset);
+    /* No linker is used when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        /* Checksum to be filled in by Guest linker */
+        bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
+                                        tbl_offset, len, checksum_offset);
+    }
 }
 
 void *acpi_data_push(GArray *table_data, unsigned size)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5cbdce61a7..7101d380a0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1252,12 +1252,16 @@ void pc_machine_done(Notifier *notifier, void *data)
         }
     }
 
-    acpi_setup();
-    if (pcms->fw_cfg) {
-        pc_build_smbios(pcms);
-        pc_build_feature_control_file(pcms);
-        /* update FW_CFG_NB_CPUS to account for -device added CPUs */
-        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    if (!xen_enabled()) {
+        acpi_setup();
+        if (pcms->fw_cfg) {
+            pc_build_smbios(pcms);
+            pc_build_feature_control_file(pcms);
+            /* update FW_CFG_NB_CPUS to account for -device added CPUs */
+            fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+        }
+    } else {
+        xen_dm_acpi_setup(pcms);
     }
 
     if (pcms->apic_id_limit > 255) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index b74c4ffb9c..d81cc7dbbc 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -265,7 +265,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
         /* RAM already populated in Xen */
         fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
                 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-                __func__, size, ram_addr); 
+                __func__, size, ram_addr);
         return;
     }
 
@@ -1251,7 +1251,7 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 
 static int xen_dm_acpi_needed(PCMachineState *pcms)
 {
-    return 0;
+    return pcms->acpi_nvdimm_state.is_enabled;
 }
 
 static int dm_acpi_buf_init(XenIOState *state)
@@ -1309,6 +1309,20 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static void xen_dm_acpi_nvdimm_setup(PCMachineState *pcms)
+{
+    GArray *table_offsets = g_array_new(false, true /* clear */,
+                                        sizeof(uint32_t));
+    GArray *table_data = g_array_new(false, true /* clear */, 1);
+
+    nvdimm_build_acpi(table_offsets, table_data,
+                      NULL, &pcms->acpi_nvdimm_state,
+                      MACHINE(pcms)->ram_slots);
+
+    g_array_free(table_offsets, true);
+    g_array_free(table_data, true);
+}
+
 static int xs_write_dm_acpi_blob_entry(const char *name,
                                        const char *entry, const char *value)
 {
@@ -1408,6 +1422,13 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
     return 0;
 }
 
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        xen_dm_acpi_nvdimm_setup(pcms);
+    }
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 38dcd1a7d4..8c48195e12 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -66,4 +66,11 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
 int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
                            int type);
 
+/*
+ * Build guest ACPI (i.e. DM ACPI, the ACPI built by the device model)
+ * and copy it into guest memory. Xen hvmloader will load the DM ACPI
+ * and merge it with the guest ACPI it builds itself.
+ */
+void xen_dm_acpi_setup(PCMachineState *pcms);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 58889ae0fb..c1a6d21efa 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -67,3 +67,7 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
 {
     return -1;
 }
+
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+}
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:53     ` no-reply
  -1 siblings, 0 replies; 128+ messages in thread
From: no-reply @ 2017-09-11  4:53 UTC (permalink / raw)
  To: haozhong.zhang
  Cc: famz, qemu-devel, xen-devel, sstabellini, ehabkost, konrad.wilk,
	mst, pbonzini, imammedo, anthony.perard, chao.p.peng,
	dan.j.williams, rth, xiaoguangrong.eric

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
Message-id: 20170911044157.15403-1-haozhong.zhang@intel.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20170911044157.15403-1-haozhong.zhang@intel.com -> patchew/20170911044157.15403-1-haozhong.zhang@intel.com
Switched to a new branch 'test'
d5f5b8faf2 hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
73b52971f5 nvdimm acpi: do not build _FIT method on Xen
1c6eeac40e nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
f2d6097366 nvdimm acpi: copy NFIT to Xen guest
69ddac3d65 hw/xen-hvm: add function to copy ACPI into guest memory
cae88474b2 hw/xen-hvm: initialize DM ACPI
23a0e4204a nvdimm acpi: do not use fw_cfg on Xen
e298be5d96 hostmem-xen: add a host memory backend for Xen
f069bbb659 hw/xen-hvm: create the hotplug memory region on Xen
69b6b6e9fa nvdimm: do not intiailize nvdimm->label_data if label size is zero

=== OUTPUT BEGIN ===
Checking PATCH 1/10: nvdimm: do not intiailize nvdimm->label_data if label size is zero...
ERROR: braces {} are necessary for all arms of this statement
#33: FILE: hw/mem/nvdimm.c:97:
+    if (nvdimm->label_size)
[...]

total: 1 errors, 0 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/10: hw/xen-hvm: create the hotplug memory region on Xen...
ERROR: braces {} are necessary for all arms of this statement
#29: FILE: hw/i386/pc.c:1356:
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
[...]

total: 1 errors, 0 warnings, 113 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/10: hostmem-xen: add a host memory backend for Xen...
Checking PATCH 4/10: nvdimm acpi: do not use fw_cfg on Xen...
Checking PATCH 5/10: hw/xen-hvm: initialize DM ACPI...
Checking PATCH 6/10: hw/xen-hvm: add function to copy ACPI into guest memory...
Checking PATCH 7/10: nvdimm acpi: copy NFIT to Xen guest...
Checking PATCH 8/10: nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest...
Checking PATCH 9/10: nvdimm acpi: do not build _FIT method on Xen...
Checking PATCH 10/10: hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled...
=== OUTPUT END ===

Test command exited with code: 1
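
For reference, the two checkpatch errors above are about QEMU's brace
style: every arm of an if statement must be braced, even a
single-statement one. A generic before/after sketch (placeholder names,
not the actual code from the flagged patches):

static void do_something(void) { }

static void example(int cond)
{
    /* Flagged form: a single-statement arm without braces. */
    if (cond)
        do_something();

    /* Form checkpatch.pl expects: braces on every arm. */
    if (cond) {
        do_something();
    }
}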


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-09-11  4:53     ` no-reply
  0 siblings, 0 replies; 128+ messages in thread
From: no-reply @ 2017-09-11  4:53 UTC (permalink / raw)
  Cc: haozhong.zhang, sstabellini, famz, ehabkost, mst, qemu-devel,
	xen-devel, chao.p.peng, imammedo, anthony.perard, pbonzini,
	dan.j.williams, xiaoguangrong.eric, rth

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
Message-id: 20170911044157.15403-1-haozhong.zhang@intel.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20170911044157.15403-1-haozhong.zhang@intel.com -> patchew/20170911044157.15403-1-haozhong.zhang@intel.com
Switched to a new branch 'test'
d5f5b8faf2 hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
73b52971f5 nvdimm acpi: do not build _FIT method on Xen
1c6eeac40e nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
f2d6097366 nvdimm acpi: copy NFIT to Xen guest
69ddac3d65 hw/xen-hvm: add function to copy ACPI into guest memory
cae88474b2 hw/xen-hvm: initialize DM ACPI
23a0e4204a nvdimm acpi: do not use fw_cfg on Xen
e298be5d96 hostmem-xen: add a host memory backend for Xen
f069bbb659 hw/xen-hvm: create the hotplug memory region on Xen
69b6b6e9fa nvdimm: do not intiailize nvdimm->label_data if label size is zero

=== OUTPUT BEGIN ===
Checking PATCH 1/10: nvdimm: do not intiailize nvdimm->label_data if label size is zero...
ERROR: braces {} are necessary for all arms of this statement
#33: FILE: hw/mem/nvdimm.c:97:
+    if (nvdimm->label_size)
[...]

total: 1 errors, 0 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/10: hw/xen-hvm: create the hotplug memory region on Xen...
ERROR: braces {} are necessary for all arms of this statement
#29: FILE: hw/i386/pc.c:1356:
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
[...]

total: 1 errors, 0 warnings, 113 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/10: hostmem-xen: add a host memory backend for Xen...
Checking PATCH 4/10: nvdimm acpi: do not use fw_cfg on Xen...
Checking PATCH 5/10: hw/xen-hvm: initialize DM ACPI...
Checking PATCH 6/10: hw/xen-hvm: add function to copy ACPI into guest memory...
Checking PATCH 7/10: nvdimm acpi: copy NFIT to Xen guest...
Checking PATCH 8/10: nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest...
Checking PATCH 9/10: nvdimm acpi: do not build _FIT method on Xen...
Checking PATCH 10/10: hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
@ 2017-09-11  5:10   ` Dan Williams
  2017-09-11  5:39     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Dan Williams @ 2017-09-11  5:10 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
<haozhong.zhang@intel.com> wrote:
> The kernel NVDIMM driver and the traditional NVDIMM management
> utilities in Dom0 does not work now. 'xen-ndctl' is added as an
> alternatively, which manages NVDIMM via Xen hypercalls.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  .gitignore             |   1 +
>  tools/misc/Makefile    |   4 ++
>  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 177 insertions(+)
>  create mode 100644 tools/misc/xen-ndctl.c

What about my offer to move this functionality into the upstream ndctl
utility [1]? I think it is thoroughly confusing that you are reusing
the name 'ndctl' and avoiding integration with the upstream ndctl
utility.

[1]: https://patchwork.kernel.org/patch/9632865/


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  5:10   ` Dan Williams
@ 2017-09-11  5:39     ` Haozhong Zhang
  2017-09-11 16:35       ` Dan Williams
  0 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  5:39 UTC (permalink / raw)
  To: Dan Williams; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On 09/10/17 22:10 -0700, Dan Williams wrote:
> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
> <haozhong.zhang@intel.com> wrote:
> > The kernel NVDIMM driver and the traditional NVDIMM management
> > utilities in Dom0 does not work now. 'xen-ndctl' is added as an
> > alternatively, which manages NVDIMM via Xen hypercalls.
> >
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  .gitignore             |   1 +
> >  tools/misc/Makefile    |   4 ++
> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 177 insertions(+)
> >  create mode 100644 tools/misc/xen-ndctl.c
> 
> What about my offer to move this functionality into the upstream ndctl
> utility [1]? I think it is thoroughly confusing that you are reusing
> the name 'ndctl' and avoiding integration with the upstream ndctl
> utility.
> 
> [1]: https://patchwork.kernel.org/patch/9632865/

I don't object to integrating it with ndctl.

My only concern is that the integration will introduce two types of
user interfaces. The upstream ndctl works with the kernel driver and
provides easily used *names* (e.g., namespace0.0, region0, nmem0,
etc.) for user input. However, this version of the patchset hides NFIT
from Dom0 (to simplify the first implementation), so the kernel driver
does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has
to take *the physical address* for users to specify the NVDIMM region
they are interested in, which is different from upstream ndctl.


Haozhong


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11 14:08     ` Igor Mammedov
  -1 siblings, 0 replies; 128+ messages in thread
From: Igor Mammedov @ 2017-09-11 14:08 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, xen-devel, Konrad Rzeszutek Wilk, Dan Williams,
	Chao Peng, Eduardo Habkost, Michael S. Tsirkin, Xiao Guangrong,
	Paolo Bonzini, Richard Henderson, Stefano Stabellini,
	Anthony Perard

On Mon, 11 Sep 2017 12:41:47 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> This is the QEMU part patches that works with the associated Xen
> patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> guest address space for vNVDIMM devices.
> 
> All patches can be found at
>   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
>   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> 
> Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> label data, as the Xen side support for labels is not implemented yet.
> 
> Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> memory region for Xen guest, in order to make the existing nvdimm
> device plugging path work on Xen.
> 
> Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> used as the Xen device model.

I've skimmed over the patch set and can't say that I'm happy with the
number of xen_enabled() checks it introduces, nor with the partial
blobs it creates.

I'd like to reduce the above, and a way to do this might be making Xen
 1. use fw_cfg
 2. fetch the QEMU-built ACPI tables from fw_cfg
 3. extract the NVDIMM tables (which is trivial) and use them

Looking at xen_load_linux(), it seems possible to use fw_cfg.

So what's stopping Xen from using it elsewhere, instead of adding more
Xen-specific code to do 'the same' job and not reusing/sharing common
code with tcg/kvm?
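
As a rough sketch of what steps 1-3 above could look like from the
guest firmware side (purely illustrative: hvmloader has no fw_cfg code
today, the I/O ports 0x510/0x511, the 0x19 file-directory key and the
"etc/acpi/tables" romfile name come from QEMU's fw_cfg interface, and
real code would rather use the DMA interface):

#include <stdint.h>
#include <string.h>

#define FW_CFG_PORT_SEL   0x510      /* 16-bit selector register */
#define FW_CFG_PORT_DATA  0x511      /* 8-bit data register */
#define FW_CFG_FILE_DIR   0x0019     /* selects the romfile directory */

static inline void fw_outw(uint16_t port, uint16_t val)
{
    asm volatile("outw %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint8_t fw_inb(uint16_t port)
{
    uint8_t val;

    asm volatile("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

/* Stream bytes from the currently selected fw_cfg item. */
static void fw_cfg_read_bytes(void *buf, uint32_t len)
{
    uint8_t *p = buf;

    while (len--) {
        *p++ = fw_inb(FW_CFG_PORT_DATA);
    }
}

/* Romfile directory entry; all multi-byte fields are big-endian. */
struct fw_cfg_file {
    uint32_t size;
    uint16_t select;
    uint16_t reserved;
    char name[56];
};

/* Return the selector key of a named romfile, or 0 if not found. */
static uint16_t fw_cfg_find_file(const char *name, uint32_t *size)
{
    uint32_t i, count;
    struct fw_cfg_file f;

    fw_outw(FW_CFG_PORT_SEL, FW_CFG_FILE_DIR);
    fw_cfg_read_bytes(&count, sizeof(count));
    count = __builtin_bswap32(count);

    for (i = 0; i < count; i++) {
        fw_cfg_read_bytes(&f, sizeof(f));
        if (!strncmp(f.name, name, sizeof(f.name))) {
            *size = __builtin_bswap32(f.size);
            return __builtin_bswap16(f.select);
        }
    }
    return 0;
}

fw_cfg_find_file("etc/acpi/tables", &size) would then return the key to
select and stream out the tables QEMU built, and the NVDIMM pieces
(NFIT, SSDT) could be picked out by walking the ACPI table headers in
that blob.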


> Haozhong Zhang (10):
>   nvdimm: do not intiailize nvdimm->label_data if label size is zero
>   hw/xen-hvm: create the hotplug memory region on Xen
>   hostmem-xen: add a host memory backend for Xen
>   nvdimm acpi: do not use fw_cfg on Xen
>   hw/xen-hvm: initialize DM ACPI
>   hw/xen-hvm: add function to copy ACPI into guest memory
>   nvdimm acpi: copy NFIT to Xen guest
>   nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
>   nvdimm acpi: do not build _FIT method on Xen
>   hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
> 
>  backends/Makefile.objs |   1 +
>  backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
>  backends/hostmem.c     |   9 +++
>  hw/acpi/aml-build.c    |  10 ++-
>  hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
>  hw/i386/pc.c           | 102 ++++++++++++++-----------
>  hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/mem/nvdimm.c        |  10 ++-
>  hw/mem/pc-dimm.c       |   6 +-
>  include/hw/i386/pc.h   |   1 +
>  include/hw/xen/xen.h   |  25 ++++++
>  stubs/xen-hvm.c        |  10 +++
>  12 files changed, 495 insertions(+), 70 deletions(-)
>  create mode 100644 backends/hostmem-xen.c
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-09-11 14:08     ` Igor Mammedov
  0 siblings, 0 replies; 128+ messages in thread
From: Igor Mammedov @ 2017-09-11 14:08 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, xen-devel, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Xiao Guangrong

On Mon, 11 Sep 2017 12:41:47 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> This is the QEMU part patches that works with the associated Xen
> patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> guest address space for vNVDIMM devices.
> 
> All patches can be found at
>   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
>   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> 
> Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> label data, as the Xen side support for labels is not implemented yet.
> 
> Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> memory region for Xen guest, in order to make the existing nvdimm
> device plugging path work on Xen.
> 
> Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> used as the Xen device model.

I've skimmed over the patch set and can't say that I'm happy with the
number of xen_enabled() checks it introduces, nor with the partial
blobs it creates.

I'd like to reduce the above, and a way to do this might be making Xen
 1. use fw_cfg
 2. fetch the QEMU-built ACPI tables from fw_cfg
 3. extract the NVDIMM tables (which is trivial) and use them

Looking at xen_load_linux(), it seems possible to use fw_cfg.

So what's stopping Xen from using it elsewhere, instead of adding more
Xen-specific code to do 'the same' job and not reusing/sharing common
code with tcg/kvm?


> Haozhong Zhang (10):
>   nvdimm: do not intiailize nvdimm->label_data if label size is zero
>   hw/xen-hvm: create the hotplug memory region on Xen
>   hostmem-xen: add a host memory backend for Xen
>   nvdimm acpi: do not use fw_cfg on Xen
>   hw/xen-hvm: initialize DM ACPI
>   hw/xen-hvm: add function to copy ACPI into guest memory
>   nvdimm acpi: copy NFIT to Xen guest
>   nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
>   nvdimm acpi: do not build _FIT method on Xen
>   hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
> 
>  backends/Makefile.objs |   1 +
>  backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
>  backends/hostmem.c     |   9 +++
>  hw/acpi/aml-build.c    |  10 ++-
>  hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
>  hw/i386/pc.c           | 102 ++++++++++++++-----------
>  hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/mem/nvdimm.c        |  10 ++-
>  hw/mem/pc-dimm.c       |   6 +-
>  include/hw/i386/pc.h   |   1 +
>  include/hw/xen/xen.h   |  25 ++++++
>  stubs/xen-hvm.c        |  10 +++
>  12 files changed, 495 insertions(+), 70 deletions(-)
>  create mode 100644 backends/hostmem-xen.c
> 



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  5:39     ` Haozhong Zhang
@ 2017-09-11 16:35       ` Dan Williams
  2017-09-11 21:24         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 128+ messages in thread
From: Dan Williams @ 2017-09-11 16:35 UTC (permalink / raw)
  To: Dan Williams, xen-devel, Konrad Rzeszutek Wilk, Chao Peng,
	Ian Jackson, Wei Liu

On Sun, Sep 10, 2017 at 10:39 PM, Haozhong Zhang
<haozhong.zhang@intel.com> wrote:
> On 09/10/17 22:10 -0700, Dan Williams wrote:
>> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
>> <haozhong.zhang@intel.com> wrote:
>> > The kernel NVDIMM driver and the traditional NVDIMM management
>> > utilities in Dom0 does not work now. 'xen-ndctl' is added as an
>> > alternatively, which manages NVDIMM via Xen hypercalls.
>> >
>> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> > ---
>> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> > Cc: Wei Liu <wei.liu2@citrix.com>
>> > ---
>> >  .gitignore             |   1 +
>> >  tools/misc/Makefile    |   4 ++
>> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 177 insertions(+)
>> >  create mode 100644 tools/misc/xen-ndctl.c
>>
>> What about my offer to move this functionality into the upstream ndctl
>> utility [1]? I think it is thoroughly confusing that you are reusing
>> the name 'ndctl' and avoiding integration with the upstream ndctl
>> utility.
>>
>> [1]: https://patchwork.kernel.org/patch/9632865/
>
> I'm not object to integrate it with ndctl.
>
> My only concern is that the integration will introduces two types of
> user interface. The upstream ndctl works with the kernel driver and
> provides easily used *names* (e.g., namespace0.0, region0, nmem0,
> etc.) for user input. However, this version patchset hides NFIT from
> Dom0 (to simplify the first implementation), so the kernel driver does
> not work in Dom0, neither does ndctl. Instead, xen-ndctl has to use
> *the physical address* for users to specify their interested NVDIMM
> region, which is different from upstream ndctl.

Ok, I think this means that xen-ndctl should be renamed (xen-nvdimm?)
so that the distinction between the 2 tools is clear.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11 14:08     ` Igor Mammedov
@ 2017-09-11 18:52       ` Stefano Stabellini
  -1 siblings, 0 replies; 128+ messages in thread
From: Stefano Stabellini @ 2017-09-11 18:52 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Haozhong Zhang, qemu-devel, xen-devel, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson,
	Stefano Stabellini, Anthony Perard, xen-devel, ian.jackson,
	wei.liu2, george.dunlap, JBeulich, andrew.cooper3

CC'ing xen-devel, and the Xen tools and x86 maintainers.

On Mon, 11 Sep 2017, Igor Mammedov wrote:
> On Mon, 11 Sep 2017 12:41:47 +0800
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> > This is the QEMU part patches that works with the associated Xen
> > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > guest address space for vNVDIMM devices.
> > 
> > All patches can be found at
> >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > 
> > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > label data, as the Xen side support for labels is not implemented yet.
> > 
> > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > memory region for Xen guest, in order to make the existing nvdimm
> > device plugging path work on Xen.
> > 
> > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> > used as the Xen device model.
> 
> I've skimmed over patch-set and can't say that I'm happy with
> number of xen_enabled() invariants it introduced as well as
> with partial blobs it creates.

I have not read the series (Haozhong, please CC me, Anthony and
xen-devel on the whole series next time), but yes, indeed. Let's not add
more xen_enabled() checks if possible.

Haozhong, was there a design document thread on xen-devel about this? If
so, did it reach a conclusion? Was the design accepted? If so, please
add a link to the design doc in the introductory email, so that
everybody can read it and be on the same page.


> I'd like to reduce the above, and a way to do this might be making Xen
>  1. use fw_cfg
>  2. fetch QEMU-built ACPI tables from fw_cfg
>  3. extract the nvdimm tables (which is trivial) and use them
> 
> Looking at xen_load_linux(), it seems possible to use fw_cfg.
> 
> So what's stopping Xen from using it elsewhere,
> instead of adding more Xen-specific code to do 'the same'
> job and not reusing/sharing common code with tcg/kvm?

So far, ACPI tables have not been generated by QEMU. Xen HVM machines
rely on a firmware-like application called "hvmloader" that runs in
guest context and generates the ACPI tables. I have no opinions on
hvmloader and I'll let the Xen maintainers talk about it. However, keep
in mind that with an HVM guest some devices are emulated by Xen and/or
by other device emulators that can run alongside QEMU. QEMU doesn't have
a full view of the system.

Here the question is: does it have to be QEMU the one to generate the
ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
like the rest, instead of introducing this split-brain design about
ACPI. We need to see a design doc to fully understand this.

If the design doc thread led into thinking that it has to be QEMU to
generate them, then would it make the code nicer if we used fw_cfg to
get the (full or partial) tables from QEMU, as Igor suggested?


> > Haozhong Zhang (10):
> >   nvdimm: do not intiailize nvdimm->label_data if label size is zero
> >   hw/xen-hvm: create the hotplug memory region on Xen
> >   hostmem-xen: add a host memory backend for Xen
> >   nvdimm acpi: do not use fw_cfg on Xen
> >   hw/xen-hvm: initialize DM ACPI
> >   hw/xen-hvm: add function to copy ACPI into guest memory
> >   nvdimm acpi: copy NFIT to Xen guest
> >   nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
> >   nvdimm acpi: do not build _FIT method on Xen
> >   hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
> > 
> >  backends/Makefile.objs |   1 +
> >  backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
> >  backends/hostmem.c     |   9 +++
> >  hw/acpi/aml-build.c    |  10 ++-
> >  hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
> >  hw/i386/pc.c           | 102 ++++++++++++++-----------
> >  hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  hw/mem/nvdimm.c        |  10 ++-
> >  hw/mem/pc-dimm.c       |   6 +-
> >  include/hw/i386/pc.h   |   1 +
> >  include/hw/xen/xen.h   |  25 ++++++
> >  stubs/xen-hvm.c        |  10 +++
> >  12 files changed, 495 insertions(+), 70 deletions(-)
> >  create mode 100644 backends/hostmem-xen.c
> > 
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11 16:35       ` Dan Williams
@ 2017-09-11 21:24         ` Konrad Rzeszutek Wilk
  2017-09-13 17:45           ` Dan Williams
  0 siblings, 1 reply; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-09-11 21:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On Mon, Sep 11, 2017 at 09:35:08AM -0700, Dan Williams wrote:
> On Sun, Sep 10, 2017 at 10:39 PM, Haozhong Zhang
> <haozhong.zhang@intel.com> wrote:
> > On 09/10/17 22:10 -0700, Dan Williams wrote:
> >> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
> >> <haozhong.zhang@intel.com> wrote:
> >> > The kernel NVDIMM driver and the traditional NVDIMM management
> >> > utilities in Dom0 do not work now. 'xen-ndctl' is added as an
> >> > alternative, which manages NVDIMM via Xen hypercalls.
> >> >
> >> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> > ---
> >> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >> > Cc: Wei Liu <wei.liu2@citrix.com>
> >> > ---
> >> >  .gitignore             |   1 +
> >> >  tools/misc/Makefile    |   4 ++
> >> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
> >> >  3 files changed, 177 insertions(+)
> >> >  create mode 100644 tools/misc/xen-ndctl.c
> >>
> >> What about my offer to move this functionality into the upstream ndctl
> >> utility [1]? I think it is thoroughly confusing that you are reusing
> >> the name 'ndctl' and avoiding integration with the upstream ndctl
> >> utility.
> >>
> >> [1]: https://patchwork.kernel.org/patch/9632865/
> >
> > I don't object to integrating it with ndctl.
> >
> > My only concern is that the integration will introduce two types of
> > user interface. The upstream ndctl works with the kernel driver and
> > provides easy-to-use *names* (e.g., namespace0.0, region0, nmem0,
> > etc.) for user input. However, this version of the patchset hides NFIT
> > from Dom0 (to simplify the first implementation), so the kernel driver
> > does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has
> > to use *physical addresses* for users to specify the NVDIMM regions
> > they are interested in, which is different from upstream ndctl.
> 
> Ok, I think this means that xen-ndctl should be renamed (xen-nvdimm?)
> so that the distinction between the 2 tools is clear.

I think it makes much more sense to integrate this into the upstream
version of ndctl. Surely in the future ndctl will need to work with
other OSes too, such as FreeBSD?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11 18:52       ` Stefano Stabellini
@ 2017-09-12  3:15         ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-12  3:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Igor Mammedov, qemu-devel, xen-devel, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> CC'ing xen-devel, and the Xen tools and x86 maintainers.
> 
> On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > On Mon, 11 Sep 2017 12:41:47 +0800
> > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > 
> > > These are the QEMU-side patches that work with the associated Xen
> > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > guest address space for vNVDIMM devices.
> > > 
> > > All patches can be found at
> > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > 
> > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > label data, as the Xen side support for labels is not implemented yet.
> > > 
> > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > memory region for Xen guest, in order to make the existing nvdimm
> > > device plugging path work on Xen.
> > > 
> > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > used as the Xen device model.
> > 
> > I've skimmed over patch-set and can't say that I'm happy with
> > number of xen_enabled() invariants it introduced as well as
> > with partial blobs it creates.
> 
> I have not read the series (Haozhong, please CC me, Anthony and
> xen-devel to the whole series next time), but yes, indeed. Let's not add
> more xen_enabled() if possible.
> 
> Haozhong, was there a design document thread on xen-devel about this? If
> so, did it reach a conclusion? Was the design accepted? If so, please
> add a link to the design doc in the introductory email, so that
> everybody can read it and be on the same page.

Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
the guest ACPI.

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html

> 
> 
> > I'd like to reduce the above, and a way to do this might be making Xen
> >  1. use fw_cfg
> >  2. fetch QEMU-built ACPI tables from fw_cfg
> >  3. extract the nvdimm tables (which is trivial) and use them
> > 
> > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> > 
> > So what's stopping Xen from using it elsewhere,
> > instead of adding more Xen-specific code to do 'the same'
> > job and not reusing/sharing common code with tcg/kvm?
> 
> So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> rely on a firmware-like application called "hvmloader" that runs in
> guest context and generates the ACPI tables. I have no opinions on
> hvmloader and I'll let the Xen maintainers talk about it. However, keep
> in mind that with an HVM guest some devices are emulated by Xen and/or
> by other device emulators that can run alongside QEMU. QEMU doesn't have
> a full view of the system.
> 
> Here the question is: does it have to be QEMU the one to generate the
> ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> like the rest, instead of introducing this split-brain design about
> ACPI. We need to see a design doc to fully understand this.
>

hvmloader runs in the guest and is responsible for building/loading
guest ACPI. However, it is not capable of building AML at runtime (for
lack of an AML builder). If any guest ACPI object is needed (e.g. by
the guest DSDT), it has to be generated from ASL by iasl at Xen compile
time and then loaded by hvmloader at runtime.

Xen includes an OperationRegion "BIOS" in the statically generated
guest DSDT, whose address is hardcoded and which contains a list of
values filled in by hvmloader at runtime. Other ACPI objects can refer
to those values (e.g., the number of vCPUs). But this is not enough for
generating guest NVDIMM ACPI objects at compile time and then having
them customized and loaded by hvmloader, because their structure (i.e.,
the number of namespace devices) cannot be decided until the guest
config is known.
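
As a purely illustrative sketch of that mechanism (the address, field
names and layout below are invented and do not match the real Xen
structure), the "BIOS" block can be pictured as a C struct that
hvmloader fills in and that the precompiled ASL reads by fixed offset:

    /* Illustrative only -- not the real Xen/hvmloader layout. */
    #include <stdint.h>

    #define BIOS_INFO_PADDR 0xFC000000u        /* example hardcoded address */

    struct bios_info {
        uint8_t  vcpu_online[16];              /* bitmap of online vCPUs   */
        uint32_t pci_hole_start;               /* values the static ASL    */
        uint32_t pci_hole_end;                 /* reads by fixed offset    */
    };

    /* hvmloader fills the block before the guest OS parses the DSDT. */
    static void fill_bios_info(unsigned int nr_vcpus)
    {
        struct bios_info *bi = (struct bios_info *)BIOS_INFO_PADDR;

        for (unsigned int i = 0; i < nr_vcpus; i++)
            bi->vcpu_online[i / 8] |= 1u << (i % 8);
        /* The DSDT's OperationRegion "BIOS" points at BIOS_INFO_PADDR, so
         * ASL methods can read these values -- but no new AML objects
         * (e.g. per-vNVDIMM namespace devices) can be created this way. */
    }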

Alternatively, we may introduce an AML builder in hvmloader and build
all guest ACPI completely in hvmloader. Looking at the similar
implementation in QEMU, it would not be small, compared to the current
size of hvmloader. Besides, I'm still going to let QEMU handle guest
NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
build NVDIMM ACPI.

> If the design doc thread led into thinking that it has to be QEMU to
> generate them, then would it make the code nicer if we used fw_cfg to
> get the (full or partial) tables from QEMU, as Igor suggested?

I'll have a look at the code pointed out by Igor (which I hadn't noticed).

One possible issue with using fw_cfg is how to avoid conflicts between
ACPI built by QEMU and ACPI built by hvmloader (e.g., both may use the
same table signature / device names / ...). In my current design, QEMU
will pass the table signatures and device names used in its ACPI to
Xen, and Xen can check for conflicts with its own ACPI. Perhaps we can
add the necessary functions to fw_cfg as well. Anyway, let me first
look at the code.
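
A minimal sketch of the kind of check this implies on the Xen side;
every signature and device name below is invented for the example:

    #include <stdbool.h>
    #include <string.h>

    /* Table signatures and \_SB device names the Xen side already
     * provides itself (illustrative lists, not the actual ones). */
    static const char *const builtin_sigs[] = { "FACP", "APIC", "HPET" };
    static const char *const builtin_devs[] = { "PCI0", "MEM0" };

    /* Reject a QEMU-provided table/device that clashes with a built-in one. */
    bool dm_acpi_conflicts(const char sig[4], const char *dev_name)
    {
        for (size_t i = 0; i < sizeof(builtin_sigs) / sizeof(*builtin_sigs); i++)
            if (!memcmp(sig, builtin_sigs[i], 4))
                return true;
        for (size_t i = 0; i < sizeof(builtin_devs) / sizeof(*builtin_devs); i++)
            if (!strcmp(dev_name, builtin_devs[i]))
                return true;
        return false;
    }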

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11 21:24         ` Konrad Rzeszutek Wilk
@ 2017-09-13 17:45           ` Dan Williams
  0 siblings, 0 replies; 128+ messages in thread
From: Dan Williams @ 2017-09-13 17:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On Mon, Sep 11, 2017 at 2:24 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Mon, Sep 11, 2017 at 09:35:08AM -0700, Dan Williams wrote:
>> On Sun, Sep 10, 2017 at 10:39 PM, Haozhong Zhang
>> <haozhong.zhang@intel.com> wrote:
>> > On 09/10/17 22:10 -0700, Dan Williams wrote:
>> >> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
>> >> <haozhong.zhang@intel.com> wrote:
>> >> > The kernel NVDIMM driver and the traditional NVDIMM management
>> >> > utilities in Dom0 do not work now. 'xen-ndctl' is added as an
>> >> > alternative, which manages NVDIMM via Xen hypercalls.
>> >> >
>> >> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> >> > ---
>> >> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> >> > Cc: Wei Liu <wei.liu2@citrix.com>
>> >> > ---
>> >> >  .gitignore             |   1 +
>> >> >  tools/misc/Makefile    |   4 ++
>> >> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >  3 files changed, 177 insertions(+)
>> >> >  create mode 100644 tools/misc/xen-ndctl.c
>> >>
>> >> What about my offer to move this functionality into the upstream ndctl
>> >> utility [1]? I think it is thoroughly confusing that you are reusing
>> >> the name 'ndctl' and avoiding integration with the upstream ndctl
>> >> utility.
>> >>
>> >> [1]: https://patchwork.kernel.org/patch/9632865/
>> >
>> > I don't object to integrating it with ndctl.
>> >
>> > My only concern is that the integration will introduce two types of
>> > user interface. The upstream ndctl works with the kernel driver and
>> > provides easy-to-use *names* (e.g., namespace0.0, region0, nmem0,
>> > etc.) for user input. However, this version of the patchset hides NFIT
>> > from Dom0 (to simplify the first implementation), so the kernel driver
>> > does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has
>> > to use *physical addresses* for users to specify the NVDIMM regions
>> > they are interested in, which is different from upstream ndctl.
>>
>> Ok, I think this means that xen-ndctl should be renamed (xen-nvdimm?)
>> so that the distinction between the 2 tools is clear.
>
> I think it makes much more sense to integrate this into the upstream
> version of ndctl. Surely in the future ndctl will need to work with
> other OSes too, such as FreeBSD?

I'm receptive to carrying Xen-specific enabling and / or a FreeBSD
compat layer in ndctl.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-12  3:15         ` Haozhong Zhang
@ 2017-10-10 16:05           ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-10 16:05 UTC (permalink / raw)
  To: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > 
> > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > 
> > > > These are the QEMU-side patches that work with the associated Xen
> > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > guest address space for vNVDIMM devices.
> > > > 
> > > > All patches can be found at
> > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > 
> > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > label data, as the Xen side support for labels is not implemented yet.
> > > > 
> > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > device plugging path work on Xen.
> > > > 
> > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > > used as the Xen device model.
> > > 
> > > I've skimmed over patch-set and can't say that I'm happy with
> > > number of xen_enabled() invariants it introduced as well as
> > > with partial blobs it creates.
> > 
> > I have not read the series (Haozhong, please CC me, Anthony and
> > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > more xen_enabled() if possible.
> > 
> > Haozhong, was there a design document thread on xen-devel about this? If
> > so, did it reach a conclusion? Was the design accepted? If so, please
> > add a link to the design doc in the introductory email, so that
> > everybody can read it and be on the same page.
> 
> Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> the guest ACPI.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html

Igor, did you have a chance to read it?

.. see below
> 
> > 
> > 
> > > I'd like to reduce the above, and a way to do this might be making Xen
> > >  1. use fw_cfg
> > >  2. fetch QEMU-built ACPI tables from fw_cfg
> > >  3. extract the nvdimm tables (which is trivial) and use them
> > > 
> > > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > 
> > > So what's stopping Xen from using it elsewhere,
> > > instead of adding more Xen-specific code to do 'the same'
> > > job and not reusing/sharing common code with tcg/kvm?
> > 
> > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > rely on a firmware-like application called "hvmloader" that runs in
> > guest context and generates the ACPI tables. I have no opinions on
> > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > in mind that with an HVM guest some devices are emulated by Xen and/or
> > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > a full view of the system.
> > 
> > Here the question is: does it have to be QEMU the one to generate the
> > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > like the rest, instead of introducing this split-brain design about
> > ACPI. We need to see a design doc to fully understand this.
> >
> 
> hvmloader runs in the guest and is responsible for building/loading
> guest ACPI. However, it is not capable of building AML at runtime (for
> lack of an AML builder). If any guest ACPI object is needed (e.g. by
> the guest DSDT), it has to be generated from ASL by iasl at Xen compile
> time and then loaded by hvmloader at runtime.
> 
> Xen includes an OperationRegion "BIOS" in the statically generated
> guest DSDT, whose address is hardcoded and which contains a list of
> values filled in by hvmloader at runtime. Other ACPI objects can refer
> to those values (e.g., the number of vCPUs). But this is not enough for
> generating guest NVDIMM ACPI objects at compile time and then having
> them customized and loaded by hvmloader, because their structure (i.e.,
> the number of namespace devices) cannot be decided until the guest
> config is known.
> 
> Alternatively, we may introduce an AML builder in hvmloader and build
> all guest ACPI completely in hvmloader. Looking at the similar
> implementation in QEMU, it would not be small, compared to the current
> size of hvmloader. Besides, I'm still going to let QEMU handle guest
> NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> build NVDIMM ACPI.
> 
> > If the design doc thread led into thinking that it has to be QEMU to
> > generate them, then would it make the code nicer if we used fw_cfg to
> > get the (full or partial) tables from QEMU, as Igor suggested?
> 
> I'll have a look at the code pointed out by Igor (which I hadn't noticed).

And there is a spec too!

https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt

Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
ACPI AML code?
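
Purely as a sketch of what that would involve on the hvmloader side --
the outw/inb helpers, the integration point, and everything else here
are assumptions, with error handling omitted -- a minimal fw_cfg client
over the x86 I/O-port interface from that spec could look like:

    #include <stdint.h>
    #include <string.h>

    #define FW_CFG_PORT_SEL   0x510     /* 16-bit selector register       */
    #define FW_CFG_PORT_DATA  0x511     /* 8-bit data register            */
    #define FW_CFG_FILE_DIR   0x0019    /* selector of the file directory */

    struct fw_cfg_file {                /* directory entry, big-endian fields */
        uint32_t size;
        uint16_t select;
        uint16_t reserved;
        char     name[56];
    };

    /* Assumed to exist in hvmloader's I/O helpers. */
    extern void outw(uint16_t addr, uint16_t val);
    extern uint8_t inb(uint16_t addr);

    static void fw_cfg_read(void *buf, uint32_t len)
    {
        uint8_t *p = buf;

        while (len--)
            *p++ = inb(FW_CFG_PORT_DATA);
    }

    /* Look up e.g. "etc/acpi/tables" or "etc/table-loader"; return the
     * selector key to read the blob with, or 0 if it is not exported. */
    static uint16_t fw_cfg_find_file(const char *name, uint32_t *size)
    {
        uint32_t count;
        struct fw_cfg_file f;

        outw(FW_CFG_PORT_SEL, FW_CFG_FILE_DIR);
        fw_cfg_read(&count, sizeof(count));
        count = __builtin_bswap32(count);

        for (uint32_t i = 0; i < count; i++) {
            fw_cfg_read(&f, sizeof(f));
            if (!strcmp(f.name, name)) {
                *size = __builtin_bswap32(f.size);
                return __builtin_bswap16(f.select);
            }
        }
        return 0;
    }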

> 
> One possible issue with using fw_cfg is how to avoid conflicts between
> ACPI built by QEMU and ACPI built by hvmloader (e.g., both may use the
> same table signature / device names / ...). In my current design, QEMU
> will pass the table signatures and device names used in its ACPI to
> Xen, and Xen can check for conflicts with its own ACPI. Perhaps we can
> add the necessary functions to fw_cfg as well. Anyway, let me first
> look at the code.
> 
> Thanks,
> Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-10 16:05           ` Konrad Rzeszutek Wilk
@ 2017-10-12 12:45             ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-12 12:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > 
> > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > > 
> > > > > These are the QEMU-side patches that work with the associated Xen
> > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > guest address space for vNVDIMM devices.
> > > > > 
> > > > > All patches can be found at
> > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > 
> > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > 
> > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > device plugging path work on Xen.
> > > > > 
> > > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > > > used as the Xen device model.
> > > > 
> > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > number of xen_enabled() invariants it introduced as well as
> > > > with partial blobs it creates.
> > > 
> > > I have not read the series (Haozhong, please CC me, Anthony and
> > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > more xen_enabled() if possible.
> > > 
> > > Haozhong, was there a design document thread on xen-devel about this? If
> > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > add a link to the design doc in the introductory email, so that
> > > everybody can read it and be on the same page.
> > 
> > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > the guest ACPI.
> > 
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> 
> Igor, did you have a chance to read it?
> 
> .. see below
> > 
> > > 
> > > 
> > > > I'd like to reduce the above, and a way to do this might be making Xen
> > > >  1. use fw_cfg
> > > >  2. fetch QEMU-built ACPI tables from fw_cfg
> > > >  3. extract the nvdimm tables (which is trivial) and use them
> > > > 
> > > > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > 
> > > > So what's stopping Xen from using it elsewhere,
> > > > instead of adding more Xen-specific code to do 'the same'
> > > > job and not reusing/sharing common code with tcg/kvm?
> > > 
> > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > rely on a firmware-like application called "hvmloader" that runs in
> > > guest context and generates the ACPI tables. I have no opinions on
> > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > a full view of the system.
> > > 
> > > Here the question is: does it have to be QEMU the one to generate the
> > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > like the rest, instead of introducing this split-brain design about
> > > ACPI. We need to see a design doc to fully understand this.
> > >
> > 
> > hvmloader runs in the guest and is responsible for building/loading
> > guest ACPI. However, it is not capable of building AML at runtime (for
> > lack of an AML builder). If any guest ACPI object is needed (e.g. by
> > the guest DSDT), it has to be generated from ASL by iasl at Xen compile
> > time and then loaded by hvmloader at runtime.
> > 
> > Xen includes an OperationRegion "BIOS" in the statically generated
> > guest DSDT, whose address is hardcoded and which contains a list of
> > values filled in by hvmloader at runtime. Other ACPI objects can refer
> > to those values (e.g., the number of vCPUs). But this is not enough for
> > generating guest NVDIMM ACPI objects at compile time and then having
> > them customized and loaded by hvmloader, because their structure (i.e.,
> > the number of namespace devices) cannot be decided until the guest
> > config is known.
> > 
> > Alternatively, we may introduce an AML builder in hvmloader and build
> > all guest ACPI completely in hvmloader. Looking at the similar
> > implementation in QEMU, it would not be small, compared to the current
> > size of hvmloader. Besides, I'm still going to let QEMU handle guest
> > NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> > build NVDIMM ACPI.
> > 
> > > If the design doc thread led into thinking that it has to be QEMU to
> > > generate them, then would it make the code nicer if we used fw_cfg to
> > > get the (full or partial) tables from QEMU, as Igor suggested?
> > 
> > I'll have a look at the code pointed out by Igor (which I hadn't noticed).
> 
> And there is a spec too!
> 
> https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt
> 
> Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
> ACPI AML code?
> 

Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
/rom@etc/table-loader. The former is unstructured to guest, and
contains all data of guest ACPI. The latter is a BIOSLinkerLoader
organized as a set of commands, which direct the guest (e.g., SeaBIOS
on KVM/QEMU) to relocate data in the former file, recalculate checksum
of specified area, and fill guest address in specified ACPI field.

One part of my patches is to implement a mechanism to tell Xen which
part of ACPI data is a table (NFIT), and which part defines a
namespace device and what the device name is. I can add two new loader
commands for them respectively.

Because they just provide information and SeaBIOS in non-xen
environment ignores unrecognized commands, they will not break SeaBIOS
in non-xen environment.

On QEMU side, most Xen-specific hacks in ACPI builder could be
dropped, and replaced by adding the new loader commands (though they
may be used only by Xen).

On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
are needed in, perhaps, hvmloader.
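
To make that concrete, one hypothetical shape for such an informational
entry, mimicking the fixed-size records of the existing table-loader
format; the command numbers, field layout and names are invented for
the sketch, not a proposed interface:

    #include <stdint.h>

    #define LOADER_ENTRY_SIZE  128      /* assumed record size of the format  */
    #define LOADER_FILE_SZ     56       /* fw_cfg file name length            */

    /* Hypothetical new commands; unrecognized commands are ignored by
     * SeaBIOS, which is what makes purely informational ones safe. */
    #define LOADER_CMD_DM_ACPI_TABLE  0x80000001u  /* "this range is a table, e.g. NFIT"  */
    #define LOADER_CMD_DM_ACPI_NSDEV  0x80000002u  /* "this range is namespace device X"  */

    struct loader_entry_dm_acpi {
        uint32_t command;                   /* one of the codes above            */
        char     file[LOADER_FILE_SZ];      /* fw_cfg blob holding the data      */
        uint32_t offset;                    /* where the data starts in the blob */
        uint32_t length;                    /* how long it is                    */
        char     dev_name[8];               /* for NSDEV: ACPI name, e.g. "NVDR" */
        uint8_t  pad[LOADER_ENTRY_SIZE - 4 - LOADER_FILE_SZ - 4 - 4 - 8];
    };

    _Static_assert(sizeof(struct loader_entry_dm_acpi) == LOADER_ENTRY_SIZE,
                   "entry must match the loader record size");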


Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 12:45             ` Haozhong Zhang
@ 2017-10-12 15:45               ` Paolo Bonzini
  -1 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-12 15:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Stefano Stabellini, Igor Mammedov,
	qemu-devel, xen-devel, Dan Williams, Chao Peng, Eduardo Habkost,
	Michael S. Tsirkin, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On 12/10/2017 14:45, Haozhong Zhang wrote:
> Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> /rom@etc/table-loader. The former is unstructured to guest, and
> contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> organized as a set of commands, which direct the guest (e.g., SeaBIOS
> on KVM/QEMU) to relocate data in the former file, recalculate checksum
> of specified area, and fill guest address in specified ACPI field.
> 
> One part of my patches is to implement a mechanism to tell Xen which
> part of ACPI data is a table (NFIT), and which part defines a
> namespace device and what the device name is. I can add two new loader
> commands for them respectively.
> 
> Because they just provide information and SeaBIOS in non-xen
> environment ignores unrecognized commands, they will not break SeaBIOS
> in non-xen environment.
> 
> On QEMU side, most Xen-specific hacks in ACPI builder could be
> dropped, and replaced by adding the new loader commands (though they
> may be used only by Xen).
> 
> On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> are needed in, perhaps, hvmloader.

If Xen has to parse BIOSLinkerLoader, it can use the existing commands
to process a reduced set of ACPI tables.  In other words,
etc/acpi/tables would only include the NFIT, the SSDT with namespace
devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.

hvmloader can then:

1) allocate some memory for where the XSDT will go

2) process the BIOSLinkerLoader like SeaBIOS would do

3) find the RSDP in low memory, since the loader script must have placed
it there.  If it cannot find it, allocate some low memory, fill it with
the RSDP header and revision, and jump to step 6

4) If it found QEMU's RSDP, use it to find QEMU's XSDT

5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.

6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.

7) overwrite the RSDP in low memory with a pointer to hvmloader's own
RSDT and/or XSDT, and update the checksums

QEMU's XSDT remains there somewhere in memory, unused but harmless.
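
A compressed sketch of that flow, with every type and helper below
being a placeholder invented for the illustration rather than an
existing hvmloader function:

    /* Placeholder types and helpers; none of these exist today. */
    struct acpi_rsdp;
    struct acpi_xsdt;
    extern void *fw_cfg_map(const char *name);
    extern void run_bios_linker_loader(void *script, void *tables);
    extern struct acpi_rsdp *find_rsdp_in_low_memory(void);
    extern struct acpi_xsdt *map_xsdt(struct acpi_rsdp *rsdp);
    extern struct acpi_xsdt *build_hvmloader_tables(void);
    extern void copy_table_pointers(struct acpi_xsdt *from, struct acpi_xsdt *to);
    extern void publish_rsdp(struct acpi_xsdt *xsdt);

    void integrate_qemu_acpi(void)
    {
        /* Steps 1-2: run QEMU's loader script against the fw_cfg blobs. */
        void *tables = fw_cfg_map("etc/acpi/tables");
        run_bios_linker_loader(fw_cfg_map("etc/table-loader"), tables);

        /* Steps 3-4: find QEMU's RSDP in low memory (the loader script
         * put it there) and, through it, QEMU's XSDT. */
        struct acpi_rsdp *qemu_rsdp = find_rsdp_in_low_memory();
        struct acpi_xsdt *qemu_xsdt = qemu_rsdp ? map_xsdt(qemu_rsdp) : 0;

        /* Steps 5-6: build hvmloader's usual tables, then pull QEMU's
         * table pointers (NFIT, the NVDIMM SSDT, ...) into them. */
        struct acpi_xsdt *xsdt = build_hvmloader_tables();
        if (qemu_xsdt)
            copy_table_pointers(qemu_xsdt, xsdt);

        /* Step 7: point the low-memory RSDP at hvmloader's RSDT/XSDT and
         * recompute checksums; QEMU's XSDT stays in memory, unused. */
        publish_rsdp(xsdt);
    }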

Paolo

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 12:45             ` Haozhong Zhang
@ 2017-10-12 17:39               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-12 17:39 UTC (permalink / raw)
  To: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > 
> > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > > > 
> > > > > > This is the QEMU part patches that works with the associated Xen
> > > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > > guest address space for vNVDIMM devices.
> > > > > > 
> > > > > > All patches can be found at
> > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > 
> > > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > > 
> > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > > device plugging path work on Xen.
> > > > > > 
> > > > > > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> > > > > > used as the Xen device model.
> > > > > 
> > > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > > number of xen_enabled() invariants it introduced as well as
> > > > > with partial blobs it creates.
> > > > 
> > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > > more xen_enabled() if possible.
> > > > 
> > > > Haozhong, was there a design document thread on xen-devel about this? If
> > > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > > add a link to the design doc in the introductory email, so that
> > > > everybody can read it and be on the same page.
> > > 
> > > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > > the guest ACPI.
> > > 
> > > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > 
> > Igor, did you have a chance to read it?
> > 
> > .. see below
> > > 
> > > > 
> > > > 
> > > > > I'd like to reduce above and a way to do this might be making xen 
> > > > >  1. use fw_cfg
> > > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > > >  3. extract nvdim tables (which is trivial) and use them
> > > > > 
> > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > 
> > > > > So what's stopping xen from using it elsewhere?,
> > > > > instead of adding more xen specific code to do 'the same'
> > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > 
> > > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > > rely on a firmware-like application called "hvmloader" that runs in
> > > > guest context and generates the ACPI tables. I have no opinions on
> > > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > > a full few of the system.
> > > > 
> > > > Here the question is: does it have to be QEMU the one to generate the
> > > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > > like the rest, instead of introducing this split-brain design about
> > > > ACPI. We need to see a design doc to fully understand this.
> > > >
> > > 
> > > hvmloader runs in the guest and is responsible to build/load guest
> > > ACPI. However, it's not capable to build AML at runtime (for the lack
> > > of AML builder). If any guest ACPI object is needed (e.g. by guest
> > > DSDT), it has to be generated from ASL by iasl at Xen compile time and
> > > then be loaded by hvmloader at runtime.
> > > 
> > > Xen includes an OperationRegion "BIOS" in the static generated guest
> > > DSDT, whose address is hardcoded and which contains a list of values
> > > filled by hvmloader at runtime. Other ACPI objects can refer to those
> > > values (e.g., the number of vCPUs). But it's not enough for generating
> > > guest NVDIMM ACPI objects at compile time and then being customized
> > > and loaded by hvmload, because its structure (i.e., the number of
> > > namespace devices) cannot be decided util the guest config is known.
> > > 
> > > Alternatively, we may introduce an AML builder in hvmloader and build
> > > all guest ACPI completely in hvmloader. Looking at the similar
> > > implementation in QEMU, it would not be small, compared to the current
> > > size of hvmloader. Besides, I'm still going to let QEMU handle guest
> > > NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> > > build NVDIMM ACPI.
> > > 
> > > > If the design doc thread led into thinking that it has to be QEMU to
> > > > generate them, then would it make the code nicer if we used fw_cfg to
> > > > get the (full or partial) tables from QEMU, as Igor suggested?
> > > 
> > > I'll have a look at the code (which I didn't notice) pointed by Igor.
> > 
> > And there is a spec too!
> > 
> > https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt
> > 
> > Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
> > ACPI AML code?
> > 
> 
> Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> /rom@etc/table-loader. The former is unstructured to guest, and
> contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> organized as a set of commands, which direct the guest (e.g., SeaBIOS
> on KVM/QEMU) to relocate data in the former file, recalculate checksum
> of specified area, and fill guest address in specified ACPI field.
> 
> One part of my patches is to implement a mechanism to tell Xen which
> part of ACPI data is a table (NFIT), and which part defines a
> namespace device and what the device name is. I can add two new loader
> commands for them respectively.

<nods>
> 
> Because they just provide information and SeaBIOS in non-xen
> environment ignores unrecognized commands, they will not break SeaBIOS
> in non-xen environment.
> 
> On QEMU side, most Xen-specific hacks in ACPI builder could be

Wooot!
> dropped, and replaced by adding the new loader commands (though they
> may be used only by Xen).

And eventually all of the hvmloader built-in ACPI code could be dropped
and replaced by use of the loader commands?

> 
> On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> are needed in, perhaps, hvmloader.

<nods>
> 
> 
> Haozhong
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 15:45               ` Paolo Bonzini
@ 2017-10-13  7:53                 ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13  7:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini, Igor Mammedov,
	qemu-devel, xen-devel, Dan Williams, Chao Peng, Eduardo Habkost,
	Michael S. Tsirkin, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> On 12/10/2017 14:45, Haozhong Zhang wrote:
> > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > /rom@etc/table-loader. The former is unstructured to guest, and
> > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > of specified area, and fill guest address in specified ACPI field.
> > 
> > One part of my patches is to implement a mechanism to tell Xen which
> > part of ACPI data is a table (NFIT), and which part defines a
> > namespace device and what the device name is. I can add two new loader
> > commands for them respectively.
> > 
> > Because they just provide information and SeaBIOS in non-xen
> > environment ignores unrecognized commands, they will not break SeaBIOS
> > in non-xen environment.
> > 
> > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > dropped, and replaced by adding the new loader commands (though they
> > may be used only by Xen).
> > 
> > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > are needed in, perhaps, hvmloader.
> 
> If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> to process a reduced set of ACPI tables.  In other words,
> etc/acpi/tables would only include the NFIT, the SSDT with namespace
> devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
>
> hvmloader can then:
> 
> 1) allocate some memory for where the XSDT will go
> 
> 2) process the BIOSLinkerLoader like SeaBIOS would do
> 
> 3) find the RSDP in low memory, since the loader script must have placed
> it there.  If it cannot find it, allocate some low memory, fill it with
> the RSDP header and revision, and and jump to step 6
> 
> 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> 
> 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> 
> 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> 
> 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> RSDT and/or XSDT, and updated the checksums
> 
> QEMU's XSDT remains there somewhere in memory, unused but harmless.
> 

It can work for plain tables which do not contain AML.

However, for a namespace device, Xen needs to know its name in order
to detect a potential name conflict with the names used in the
Xen-built ACPI. Xen does not (and is not going to) introduce an AML
parser, so it cannot extract those device names from the QEMU-built
ACPI on its own.

The idea of either this patch series or the new BIOSLinkerLoader
command is to let QEMU tell Xen where the definition body of a
namespace device (i.e. the part within the outermost "Device(NAME)")
is and what the device name is. Xen, after the name conflict check,
can re-package the definition body in a namespace device (with
minimal AML builder code added in Xen) and then in an SSDT.


Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 17:39               ` Konrad Rzeszutek Wilk
@ 2017-10-13  8:00                 ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13  8:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On 10/12/17 13:39 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> > On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > > 
> > > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > > > > 
> > > > > > > This is the QEMU part patches that works with the associated Xen
> > > > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > > > guest address space for vNVDIMM devices.
> > > > > > > 
> > > > > > > All patches can be found at
> > > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > > 
> > > > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > > > 
> > > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > > > device plugging path work on Xen.
> > > > > > > 
> > > > > > > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> > > > > > > used as the Xen device model.
> > > > > > 
> > > > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > > > number of xen_enabled() invariants it introduced as well as
> > > > > > with partial blobs it creates.
> > > > > 
> > > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > > > more xen_enabled() if possible.
> > > > > 
> > > > > Haozhong, was there a design document thread on xen-devel about this? If
> > > > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > > > add a link to the design doc in the introductory email, so that
> > > > > everybody can read it and be on the same page.
> > > > 
> > > > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > > > the guest ACPI.
> > > > 
> > > > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > > 
> > > Igor, did you have a chance to read it?
> > > 
> > > .. see below
> > > > 
> > > > > 
> > > > > 
> > > > > > I'd like to reduce above and a way to do this might be making xen 
> > > > > >  1. use fw_cfg
> > > > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > > > >  3. extract nvdim tables (which is trivial) and use them
> > > > > > 
> > > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > > 
> > > > > > So what's stopping xen from using it elsewhere?,
> > > > > > instead of adding more xen specific code to do 'the same'
> > > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > > 
> > > > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > > > rely on a firmware-like application called "hvmloader" that runs in
> > > > > guest context and generates the ACPI tables. I have no opinions on
> > > > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > > > a full few of the system.
> > > > > 
> > > > > Here the question is: does it have to be QEMU the one to generate the
> > > > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > > > like the rest, instead of introducing this split-brain design about
> > > > > ACPI. We need to see a design doc to fully understand this.
> > > > >
> > > > 
> > > > hvmloader runs in the guest and is responsible to build/load guest
> > > > ACPI. However, it's not capable to build AML at runtime (for the lack
> > > > of AML builder). If any guest ACPI object is needed (e.g. by guest
> > > > DSDT), it has to be generated from ASL by iasl at Xen compile time and
> > > > then be loaded by hvmloader at runtime.
> > > > 
> > > > Xen includes an OperationRegion "BIOS" in the static generated guest
> > > > DSDT, whose address is hardcoded and which contains a list of values
> > > > filled by hvmloader at runtime. Other ACPI objects can refer to those
> > > > values (e.g., the number of vCPUs). But it's not enough for generating
> > > > guest NVDIMM ACPI objects at compile time and then being customized
> > > > and loaded by hvmload, because its structure (i.e., the number of
> > > > namespace devices) cannot be decided util the guest config is known.
> > > > 
> > > > Alternatively, we may introduce an AML builder in hvmloader and build
> > > > all guest ACPI completely in hvmloader. Looking at the similar
> > > > implementation in QEMU, it would not be small, compared to the current
> > > > size of hvmloader. Besides, I'm still going to let QEMU handle guest
> > > > NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> > > > build NVDIMM ACPI.
> > > > 
> > > > > If the design doc thread led into thinking that it has to be QEMU to
> > > > > generate them, then would it make the code nicer if we used fw_cfg to
> > > > > get the (full or partial) tables from QEMU, as Igor suggested?
> > > > 
> > > > I'll have a look at the code (which I didn't notice) pointed by Igor.
> > > 
> > > And there is a spec too!
> > > 
> > > https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt
> > > 
> > > Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
> > > ACPI AML code?
> > > 
> > 
> > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > /rom@etc/table-loader. The former is unstructured to guest, and
> > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > of specified area, and fill guest address in specified ACPI field.
> > 
> > One part of my patches is to implement a mechanism to tell Xen which
> > part of ACPI data is a table (NFIT), and which part defines a
> > namespace device and what the device name is. I can add two new loader
> > commands for them respectively.
> 
> <nods>
> > 
> > Because they just provide information and SeaBIOS in non-xen
> > environment ignores unrecognized commands, they will not break SeaBIOS
> > in non-xen environment.
> > 
> > On QEMU side, most Xen-specific hacks in ACPI builder could be
> 
> Wooot!
> > dropped, and replaced by adding the new loader commands (though they
> > may be used only by Xen).
> 
> And eventually all of the hvmloader ACPI built in code could be dropped
> and use all of the loader commands?

If Xen is going to rely on QEMU to build the entire ACPI for an HVM
guest, then there would be no need to check for signature/name
conflicts, so the new BIOSLinkerLoader commands here would not be
necessary in that case (or only for backwards compatibility). I don't
know how much work would be needed; that could be another project.

Haozhong

> 
> > 
> > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > are needed in, perhaps, hvmloader.
> 
> <nods>
> > 
> > 
> > Haozhong
> > 
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13  7:53                 ` Haozhong Zhang
@ 2017-10-13  8:44                   ` Igor Mammedov
  -1 siblings, 0 replies; 128+ messages in thread
From: Igor Mammedov @ 2017-10-13  8:44 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Paolo Bonzini, Konrad Rzeszutek Wilk, Stefano Stabellini,
	qemu-devel, xen-devel, Dan Williams, Chao Peng, Eduardo Habkost,
	Michael S. Tsirkin, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On Fri, 13 Oct 2017 15:53:26 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > of specified area, and fill guest address in specified ACPI field.
> > > 
> > > One part of my patches is to implement a mechanism to tell Xen which
> > > part of ACPI data is a table (NFIT), and which part defines a
> > > namespace device and what the device name is. I can add two new loader
> > > commands for them respectively.
> > > 
> > > Because they just provide information and SeaBIOS in non-xen
> > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > in non-xen environment.
> > > 
> > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > dropped, and replaced by adding the new loader commands (though they
> > > may be used only by Xen).
> > > 
> > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > are needed in, perhaps, hvmloader.  
> > 
> > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > to process a reduced set of ACPI tables.  In other words,
> > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> >
> > hvmloader can then:
> > 
> > 1) allocate some memory for where the XSDT will go
> > 
> > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > 
> > 3) find the RSDP in low memory, since the loader script must have placed
> > it there.  If it cannot find it, allocate some low memory, fill it with
> > the RSDP header and revision, and and jump to step 6
> > 
> > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > 
> > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > 
> > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > 
> > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > RSDT and/or XSDT, and updated the checksums
> > 
> > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> >   
+1 to Paolo's suggestion, i.e.
 1. add BIOSLinkerLoader into hvmloader
 2. load/process QEMU's tables with #1
 3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
    and put them in hvmloader's RSDT
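
For step 3, that can be as small as a signature walk over QEMU's XSDT,
roughly along these lines ('add_table' stands in for whatever hvmloader
already uses to link a table into its RSDT/XSDT; names and details are
illustrative only):

    #include <stdint.h>
    #include <string.h>

    /* Standard 36-byte ACPI SDT header. */
    struct acpi_header {
        char     signature[4];
        uint32_t length;
        uint8_t  revision;
        uint8_t  checksum;
        char     oem_id[6];
        char     oem_table_id[8];
        uint32_t oem_revision;
        uint32_t creator_id;
        uint32_t creator_revision;
    } __attribute__((packed));

    /* Adopt QEMU's NFIT and NVDIMM SSDT entries into hvmloader's tables. */
    static void adopt_qemu_tables(const struct acpi_header *qemu_xsdt,
                                  void (*add_table)(uint64_t addr))
    {
        /* XSDT payload: 64-bit physical pointers after the header. */
        const uint64_t *entry = (const uint64_t *)(qemu_xsdt + 1);
        unsigned int i, n;

        n = (qemu_xsdt->length - sizeof(*qemu_xsdt)) / sizeof(uint64_t);
        for (i = 0; i < n; i++) {
            /* Guest tables live below 4GB, so the cast is safe here. */
            const struct acpi_header *t =
                (const struct acpi_header *)(uintptr_t)entry[i];

            if (!memcmp(t->signature, "NFIT", 4) ||
                !memcmp(t->signature, "SSDT", 4))
                add_table(entry[i]);
        }
    }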

> It can work for plan tables which do not contain AML.
> 
> However, for a namespace device, Xen needs to know its name in order
> to detect the potential name conflict with those used in Xen built
> ACPI. Xen does not (and is not going to) introduce an AML parser, so
> it cannot get those device names from QEMU built ACPI by its own.
> 
> The idea of either this patch series or the new BIOSLinkerLoader
> command is to let QEMU tell Xen where the definition body of a
> namespace device (i.e. that part within the outmost "Device(NAME)") is
> and what the device name is. Xen, after the name conflict check, can
> re-package the definition body in a namespace device (w/ minimal AML
> builder code added in Xen) and then in SSDT.

I'd skip the conflict check at runtime, as hvmloader doesn't currently
have a "\\_SB\NVDR" device. Instead of a runtime check, it might do a
primitive build-time check that the ASL sources in hvmloader do not
contain the "NVDR" name reserved for QEMU, to avoid its accidental
addition in the future (the check might also be reused later if some
other tables from QEMU are reused).
It's a bit hackish, but at least it does the job and keeps the
BIOSLinkerLoader interface the same for all supported firmwares
(I'd consider it a temporary hack on the way to ACPI tables fully
built by QEMU for Xen).
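
A trivial sketch of such a build-time check, written as a small
host-side C tool purely for illustration (a one-line grep over the
hvmloader ASL sources in the Makefile would do the same job):

    #include <stdio.h>
    #include <string.h>

    /*
     * Fail the build if any of the given ASL source files mentions
     * "NVDR", which would be reserved for the QEMU-provided NVDIMM
     * namespace device.
     */
    int main(int argc, char **argv)
    {
        char line[1024];
        int i, rc = 0;

        for (i = 1; i < argc; i++) {
            FILE *f = fopen(argv[i], "r");
            int lineno = 0;

            if (!f) {
                perror(argv[i]);
                return 2;
            }
            while (fgets(line, sizeof(line), f)) {
                lineno++;
                if (strstr(line, "NVDR")) {
                    fprintf(stderr, "%s:%d: \"NVDR\" is reserved for QEMU\n",
                            argv[i], lineno);
                    rc = 1;
                }
            }
            fclose(f);
        }
        return rc;
    }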

Ideally it would be better for QEMU to build all ACPI tables for
hvmloader, to avoid split-brain issues and the need to invent extra
interfaces every time a feature is added that has to pass
configuration data from QEMU to firmware.
But that's probably out of scope for this project; it could be
done on top of it if the Xen folks would like to. Adding
BIOSLinkerLoader support to hvmloader would be a good starting point
for that future effort.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13  8:44                   ` Igor Mammedov
@ 2017-10-13 11:13                     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13 11:13 UTC (permalink / raw)
  To: Igor Mammedov, JBeulich, andrew.cooper3, Stefano Stabellini,
	Anthony Perard
  Cc: Paolo Bonzini, Konrad Rzeszutek Wilk, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Richard Henderson, xen-devel, ian.jackson,
	wei.liu2, george.dunlap

On 10/13/17 10:44 +0200, Igor Mammedov wrote:
> On Fri, 13 Oct 2017 15:53:26 +0800
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> > On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > > of specified area, and fill guest address in specified ACPI field.
> > > > 
> > > > One part of my patches is to implement a mechanism to tell Xen which
> > > > part of ACPI data is a table (NFIT), and which part defines a
> > > > namespace device and what the device name is. I can add two new loader
> > > > commands for them respectively.
> > > > 
> > > > Because they just provide information and SeaBIOS in non-xen
> > > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > > in non-xen environment.
> > > > 
> > > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > > dropped, and replaced by adding the new loader commands (though they
> > > > may be used only by Xen).
> > > > 
> > > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > > are needed in, perhaps, hvmloader.  
> > > 
> > > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > > to process a reduced set of ACPI tables.  In other words,
> > > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> > >
> > > hvmloader can then:
> > > 
> > > 1) allocate some memory for where the XSDT will go
> > > 
> > > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > > 
> > > 3) find the RSDP in low memory, since the loader script must have placed
> > > it there.  If it cannot find it, allocate some low memory, fill it with
> > > the RSDP header and revision, and and jump to step 6
> > > 
> > > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > > 
> > > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > > 
> > > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > > 
> > > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > > RSDT and/or XSDT, and updated the checksums
> > > 
> > > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > >   
> +1 to Paolo's suggestion, i.e.
>  1. add BIOSLinkerLoader into hvmloader
>  2. load/process QEMU's tables with #1
>  3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
>     and put them in hvmloader's RSDT
> 
> > It can work for plan tables which do not contain AML.
> > 
> > However, for a namespace device, Xen needs to know its name in order
> > to detect the potential name conflict with those used in Xen built
> > ACPI. Xen does not (and is not going to) introduce an AML parser, so
> > it cannot get those device names from QEMU built ACPI by its own.
> > 
> > The idea of either this patch series or the new BIOSLinkerLoader
> > command is to let QEMU tell Xen where the definition body of a
> > namespace device (i.e. that part within the outmost "Device(NAME)") is
> > and what the device name is. Xen, after the name conflict check, can
> > re-package the definition body in a namespace device (w/ minimal AML
> > builder code added in Xen) and then in SSDT.
> 
> I'd skip conflict check at runtime as hvmloader doesn't currently
> have "\\_SB\NVDR" device so instead of doing runtime check it might
> do primitive check at build time that ASL sources in hvmloader do
> not contain reserved for QEMU "NVDR" keyword to avoid its addition
> by accident in future. (it also might be reused in future if some
> other tables from QEMU will be reused).
> It's a bit hackinsh but at least it does the job and keeps
> BIOSLinkerLoader interface the same for all supported firmwares
> (I'd consider it as a temporary hack on the way to fully build
> by QEMU ACPI tables for Xen).
> 
> Ideally it would be better for QEMU to build all ACPI tables for
> hvmloader to avoid split brain issues and need to invent extra
> interfaces every time a feature is added to pass configuration
> data from QEMU to firmware.
> But that's probably out of scope of this project, it could be
> done on top of this if Xen folks would like to do it. Adding
> BIOSLinkerLoader to hvmloader would be a good starting point
> for that future effort.

If we can let QEMU build the entire guest ACPI, we may not even need
to introduce fw_cfg and BIOSLinkerLoader code into hvmloader.  SeaBIOS
is currently loaded after hvmloader and could be used to load the
QEMU-built ACPI.

To Jan, Andrew, Stefano and Anthony,

what do you think about allowing QEMU to build the entire guest ACPI
and letting SeaBIOS load it? The ACPI builder code in hvmloader would
still be there, just bypassed in this case.


Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-13 11:13                     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13 11:13 UTC (permalink / raw)
  To: Igor Mammedov, JBeulich, andrew.cooper3, Stefano Stabellini,
	Anthony Perard
  Cc: wei.liu2, Eduardo Habkost, Konrad Rzeszutek Wilk,
	Michael S. Tsirkin, ian.jackson, qemu-devel, xen-devel,
	xen-devel, Chao Peng, Paolo Bonzini, Dan Williams,
	Richard Henderson, george.dunlap, Xiao Guangrong

On 10/13/17 10:44 +0200, Igor Mammedov wrote:
> On Fri, 13 Oct 2017 15:53:26 +0800
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> > On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > > of specified area, and fill guest address in specified ACPI field.
> > > > 
> > > > One part of my patches is to implement a mechanism to tell Xen which
> > > > part of ACPI data is a table (NFIT), and which part defines a
> > > > namespace device and what the device name is. I can add two new loader
> > > > commands for them respectively.
> > > > 
> > > > Because they just provide information and SeaBIOS in non-xen
> > > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > > in non-xen environment.
> > > > 
> > > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > > dropped, and replaced by adding the new loader commands (though they
> > > > may be used only by Xen).
> > > > 
> > > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > > are needed in, perhaps, hvmloader.  
> > > 
> > > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > > to process a reduced set of ACPI tables.  In other words,
> > > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> > >
> > > hvmloader can then:
> > > 
> > > 1) allocate some memory for where the XSDT will go
> > > 
> > > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > > 
> > > 3) find the RSDP in low memory, since the loader script must have placed
> > > it there.  If it cannot find it, allocate some low memory, fill it with
> > > the RSDP header and revision, and jump to step 6
> > > 
> > > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > > 
> > > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > > 
> > > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > > 
> > > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > > RSDT and/or XSDT, and update the checksums
> > > 
> > > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > >   
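To make steps 1)-7) above concrete, here is a minimal sketch of what
the glue in hvmloader could look like; apart from the standard ACPI
layouts, every function name below is a placeholder invented for
illustration, not an existing hvmloader symbol:

    #include <stdint.h>

    #define ACPI_SDT_HDR_LEN 36                   /* standard ACPI table header */

    struct acpi_rsdp {                             /* ACPI 2.0+ RSDP layout */
        char sig[8]; uint8_t csum; char oemid[6]; uint8_t rev;
        uint32_t rsdt; uint32_t len; uint64_t xsdt;
        uint8_t xcsum; uint8_t rsvd[3];
    };

    void process_qemu_linker_script(void);                     /* steps 1) + 2) */
    struct acpi_rsdp *find_rsdp_in_low_memory(void);           /* step 3) */
    struct acpi_rsdp *alloc_and_init_rsdp(void);               /* step 3) fallback */
    void xsdt_add_entry(uint64_t table_pa);                    /* step 5) */
    void build_hvmloader_tables(void);                         /* step 6) */
    void point_rsdp_at_hvmloader_xsdt(struct acpi_rsdp *rsdp); /* step 7) */

    static void merge_qemu_acpi(void)
    {
        struct acpi_rsdp *rsdp;
        uint8_t *xsdt;
        uint64_t *entry;
        uint32_t len, i;

        process_qemu_linker_script();

        rsdp = find_rsdp_in_low_memory();                      /* step 3) */
        if ( !rsdp )
            rsdp = alloc_and_init_rsdp();      /* not found: fresh RSDP, go to 6) */
        else if ( rsdp->xsdt )                                 /* step 4) */
        {
            xsdt  = (uint8_t *)(unsigned long)rsdp->xsdt;
            len   = *(uint32_t *)(xsdt + 4);   /* 'length' field of the header */
            entry = (uint64_t *)(xsdt + ACPI_SDT_HDR_LEN);
            for ( i = 0; i < (len - ACPI_SDT_HDR_LEN) / 8; i++ )
                xsdt_add_entry(entry[i]);                      /* step 5) */
        }

        build_hvmloader_tables();                              /* step 6) */
        point_rsdp_at_hvmloader_xsdt(rsdp);                    /* step 7) */
    }
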
> +1 to Paolo's suggestion, i.e.
>  1. add BIOSLinkerLoader into hvmloader
>  2. load/process QEMU's tables with #1
>  3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
>     and put them in hvmloader's RSDT
> 
> > It can work for plain tables which do not contain AML.
> > 
> > However, for a namespace device, Xen needs to know its name in order
> > to detect the potential name conflict with those used in Xen built
> > ACPI. Xen does not (and is not going to) introduce an AML parser, so
> > it cannot get those device names from the QEMU-built ACPI on its own.
> > 
> > The idea of either this patch series or the new BIOSLinkerLoader
> > command is to let QEMU tell Xen where the definition body of a
> > namespace device (i.e. that part within the outmost "Device(NAME)") is
> > and what the device name is. Xen, after the name conflict check, can
> > re-package the definition body in a namespace device (w/ minimal AML
> > builder code added in Xen) and then in SSDT.
> 
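For concreteness, the information such a new loader command would have
to carry might look like the sketch below; the command value, the
struct and its field names are all hypothetical, invented here for
illustration, and do not exist in QEMU or Xen:

    #include <stdint.h>

    #define LOADER_COMMAND_DECLARE_AML_DEVICE 0x80    /* made-up value */

    struct loader_declare_aml_device {
        uint32_t command;       /* LOADER_COMMAND_DECLARE_AML_DEVICE */
        char     file[56];      /* fw_cfg blob that holds the ACPI data */
        char     name[4];       /* device name to check, e.g. "NVDR" */
        uint32_t body_offset;   /* where the Device() definition body starts */
        uint32_t body_length;   /* length of the definition body in bytes */
    };

After the name-conflict check, Xen (or hvmloader) would wrap the
body_offset/body_length bytes in a Device(name) of its own and emit
that as an extra SSDT.
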
> I'd skip the conflict check at runtime, as hvmloader doesn't currently
> have a "\\_SB\NVDR" device. Instead of a runtime check, it might do a
> primitive check at build time that the ASL sources in hvmloader do
> not contain the "NVDR" name reserved for QEMU, to avoid it being
> added by accident in the future (this check might also be reused if
> some other tables from QEMU are reused later on).
> It's a bit hackish, but at least it does the job and keeps the
> BIOSLinkerLoader interface the same for all supported firmwares
> (I'd consider it a temporary hack on the way to having QEMU fully
> build the ACPI tables for Xen).
> 
> Ideally it would be better for QEMU to build all ACPI tables for
> hvmloader, to avoid split-brain issues and the need to invent extra
> interfaces every time a feature is added just to pass configuration
> data from QEMU to firmware.
> But that's probably out of the scope of this project; it could be
> done on top of this if the Xen folks would like to do it. Adding
> BIOSLinkerLoader to hvmloader would be a good starting point
> for that future effort.

If we can let QEMU build the entire guest ACPI, we may not even need
to introduce fw_cfg and BIOSLinkerLoader code into hvmloader.  SeaBIOS
is currently loaded after hvmloader and can be used to load the
QEMU-built ACPI.

To Jan, Andrew, Stefano and Anthony,

what do you think about allowing QEMU to build the entire guest ACPI
and letting SeaBIOS load it? The ACPI builder code in hvmloader would
still be there and would just be bypassed in this case.


Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 11:13                     ` Haozhong Zhang
@ 2017-10-13 12:13                       ` Jan Beulich
  -1 siblings, 0 replies; 128+ messages in thread
From: Jan Beulich @ 2017-10-13 12:13 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: andrew.cooper3, Anthony Perard, george.dunlap, wei.liu2,
	ian.jackson, Xiao Guangrong, Dan Williams, Stefano Stabellini,
	Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Paolo Bonzini, Richard Henderson

>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> To Jan, Andrew, Stefano and Anthony,
> 
> what do you think about allowing QEMU to build the entire guest ACPI
> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> still there and just bypassed in this case.

Well, if that can be made to work in a non-quirky way and without
loss of functionality, I'd probably be fine. I do think, however,
that there's a reason this is being handled in hvmloader right now.

Jan

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-13 12:13                       ` Jan Beulich
  0 siblings, 0 replies; 128+ messages in thread
From: Jan Beulich @ 2017-10-13 12:13 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Paolo Bonzini, xen-devel, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Eduardo Habkost

>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> To Jan, Andrew, Stefano and Anthony,
> 
> what do you think about allowing QEMU to build the entire guest ACPI
> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> still there and just bypassed in this case.

Well, if that can be made to work in a non-quirky way and without
loss of functionality, I'd probably be fine. I do think, however,
that there's a reason this is being handled in hvmloader right now.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 12:13                       ` Jan Beulich
@ 2017-10-13 22:46                         ` Stefano Stabellini
  -1 siblings, 0 replies; 128+ messages in thread
From: Stefano Stabellini @ 2017-10-13 22:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, andrew.cooper3, Anthony Perard, george.dunlap,
	wei.liu2, ian.jackson, Xiao Guangrong, Dan Williams,
	Stefano Stabellini, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Paolo Bonzini, Richard Henderson

On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > To Jan, Andrew, Stefano and Anthony,
> > 
> > what do you think about allowing QEMU to build the entire guest ACPI
> > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > still there and just bypassed in this case.
> 
> Well, if that can be made work in a non-quirky way and without
> loss of functionality, I'd probably be fine. I do think, however,
> that there's a reason this is being handled in hvmloader right now.

And not to discourage you, just as a clarification, you'll also need to
consider backward compatibility: unless the tables are identical, I
imagine we'll have to keep using the old tables for already installed
virtual machines.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-13 22:46                         ` Stefano Stabellini
  0 siblings, 0 replies; 128+ messages in thread
From: Stefano Stabellini @ 2017-10-13 22:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Paolo Bonzini, xen-devel, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Eduardo Habkost

On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > To Jan, Andrew, Stefano and Anthony,
> > 
> > what do you think about allowing QEMU to build the entire guest ACPI
> > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > still there and just bypassed in this case.
> 
> Well, if that can be made work in a non-quirky way and without
> loss of functionality, I'd probably be fine. I do think, however,
> that there's a reason this is being handled in hvmloader right now.

And not to discourage you, just as a clarification, you'll also need to
consider backward compatibility: unless the tables are identical, I
imagine we'll have to keep using the old tables for already installed
virtual machines.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 22:46                         ` Stefano Stabellini
@ 2017-10-15  0:31                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Jan Beulich, Haozhong Zhang, andrew.cooper3, Anthony Perard,
	george.dunlap, wei.liu2, ian.jackson, Xiao Guangrong,
	Dan Williams, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > To Jan, Andrew, Stefano and Anthony,
> > > 
> > > what do you think about allowing QEMU to build the entire guest ACPI
> > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > still there and just bypassed in this case.
> > 
> > Well, if that can be made work in a non-quirky way and without
> > loss of functionality, I'd probably be fine. I do think, however,
> > that there's a reason this is being handled in hvmloader right now.
> 
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

Maybe you can handle this using machine type versioning.
Installed guests would use the old type.
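
As a concrete illustration (KVM-side syntax; whether and how this maps
onto the xenfv machine type used for Xen HVM guests is an assumption
to be checked): a guest installed today would stay pinned to a
versioned machine type such as "pc-i440fx-2.9" and keep its current
ACPI tables across QEMU upgrades, while newly created guests would get
the newer type, e.g. "pc-i440fx-2.10", with the QEMU-built tables.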

-- 
MST

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-15  0:31                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Haozhong Zhang, wei.liu2, Xiao Guangrong, Konrad Rzeszutek Wilk,
	qemu-devel, andrew.cooper3, ian.jackson, george.dunlap,
	xen-devel, Igor Mammedov, Paolo Bonzini, Jan Beulich, xen-devel,
	Anthony Perard, Chao Peng, Dan Williams, Richard Henderson,
	Eduardo Habkost

On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > To Jan, Andrew, Stefano and Anthony,
> > > 
> > > what do you think about allowing QEMU to build the entire guest ACPI
> > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > still there and just bypassed in this case.
> > 
> > Well, if that can be made work in a non-quirky way and without
> > loss of functionality, I'd probably be fine. I do think, however,
> > that there's a reason this is being handled in hvmloader right now.
> 
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

Maybe you can handle this using machine type versioning.
Installed guests would use the old type.

-- 
MST

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13  7:53                 ` Haozhong Zhang
@ 2017-10-15  0:35                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:35 UTC (permalink / raw)
  To: Paolo Bonzini, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Igor Mammedov, qemu-devel, xen-devel, Dan Williams, Chao Peng,
	Eduardo Habkost, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On Fri, Oct 13, 2017 at 03:53:26PM +0800, Haozhong Zhang wrote:
> On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > On 12/10/2017 14:45, Haozhong Zhang wrote:
> > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > of specified area, and fill guest address in specified ACPI field.
> > > 
> > > One part of my patches is to implement a mechanism to tell Xen which
> > > part of ACPI data is a table (NFIT), and which part defines a
> > > namespace device and what the device name is. I can add two new loader
> > > commands for them respectively.
> > > 
> > > Because they just provide information and SeaBIOS in non-xen
> > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > in non-xen environment.
> > > 
> > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > dropped, and replaced by adding the new loader commands (though they
> > > may be used only by Xen).
> > > 
> > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > are needed in, perhaps, hvmloader.
> > 
> > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > to process a reduced set of ACPI tables.  In other words,
> > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> >
> > hvmloader can then:
> > 
> > 1) allocate some memory for where the XSDT will go
> > 
> > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > 
> > 3) find the RSDP in low memory, since the loader script must have placed
> > it there.  If it cannot find it, allocate some low memory, fill it with
> > the RSDP header and revision, and jump to step 6
> > 
> > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > 
> > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > 
> > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > 
> > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > RSDT and/or XSDT, and update the checksums
> > 
> > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > 
> 
> It can work for plain tables which do not contain AML.
> 
> However, for a namespace device, Xen needs to know its name in order
> to detect the potential name conflict with those used in Xen built
> ACPI. Xen does not (and is not going to) introduce an AML parser, so
> it cannot get those device names from the QEMU-built ACPI on its own.
> 
> The idea of either this patch series or the new BIOSLinkerLoader
> command is to let QEMU tell Xen where the definition body of a
> namespace device (i.e. that part within the outmost "Device(NAME)") is
> and what the device name is. Xen, after the name conflict check, can
> re-package the definition body in a namespace device (w/ minimal AML
> builder code added in Xen) and then in SSDT.
> 
> 
> Haozhong

You most likely can do this without a new command.
You can use something similar to build_append_named_dword
in combination with BIOS_LINKER_LOADER_COMMAND_ADD_POINTER,
like vmgenid does.
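
A rough sketch of that pattern, modelled on what vmgenid does; the
fw_cfg file name and the AML object name are made up for this example,
and the helper signatures are quoted from memory, so check them
against the QEMU tree before relying on any of the details:

    /* Emit a named integer into the SSDT and have the guest firmware
     * patch it, via ADD_POINTER, with the guest address at which the
     * blob "etc/xen/nvdimm-extra" (a made-up name) was downloaded. */
    static void add_patched_addr(GArray *table_data, BIOSLinker *linker,
                                 GArray *blob)
    {
        unsigned addr_offset;

        /* addr_offset must be the offset, inside etc/acpi/tables, of the
         * 32-bit integer literal appended below; computing it exactly
         * depends on the AML encoding and is glossed over here. */
        addr_offset = table_data->len;                 /* placeholder */
        build_append_named_dword(table_data, "XADR");  /* made-up name */

        bios_linker_loader_alloc(linker, "etc/xen/nvdimm-extra", blob,
                                 4096, false);
        bios_linker_loader_add_pointer(linker,
            ACPI_BUILD_TABLE_FILE, addr_offset, sizeof(uint32_t),
            "etc/xen/nvdimm-extra", 0);
    }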

-- 
MST

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-15  0:35                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:35 UTC (permalink / raw)
  To: Paolo Bonzini, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Igor Mammedov, qemu-devel, xen-devel, Dan Williams, Chao Peng,
	Eduardo Habkost, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On Fri, Oct 13, 2017 at 03:53:26PM +0800, Haozhong Zhang wrote:
> On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > On 12/10/2017 14:45, Haozhong Zhang wrote:
> > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > of specified area, and fill guest address in specified ACPI field.
> > > 
> > > One part of my patches is to implement a mechanism to tell Xen which
> > > part of ACPI data is a table (NFIT), and which part defines a
> > > namespace device and what the device name is. I can add two new loader
> > > commands for them respectively.
> > > 
> > > Because they just provide information and SeaBIOS in non-xen
> > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > in non-xen environment.
> > > 
> > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > dropped, and replaced by adding the new loader commands (though they
> > > may be used only by Xen).
> > > 
> > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > are needed in, perhaps, hvmloader.
> > 
> > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > to process a reduced set of ACPI tables.  In other words,
> > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> >
> > hvmloader can then:
> > 
> > 1) allocate some memory for where the XSDT will go
> > 
> > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > 
> > 3) find the RSDP in low memory, since the loader script must have placed
> > it there.  If it cannot find it, allocate some low memory, fill it with
> > the RSDP header and revision, and jump to step 6
> > 
> > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > 
> > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > 
> > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > 
> > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > RSDT and/or XSDT, and update the checksums
> > 
> > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > 
> 
> It can work for plain tables which do not contain AML.
> 
> However, for a namespace device, Xen needs to know its name in order
> to detect the potential name conflict with those used in Xen built
> ACPI. Xen does not (and is not going to) introduce an AML parser, so
> it cannot get those device names from the QEMU-built ACPI on its own.
> 
> The idea of either this patch series or the new BIOSLinkerLoader
> command is to let QEMU tell Xen where the definition body of a
> namespace device (i.e. that part within the outmost "Device(NAME)") is
> and what the device name is. Xen, after the name conflict check, can
> re-package the definition body in a namespace device (w/ minimal AML
> builder code added in Xen) and then in SSDT.
> 
> 
> Haozhong

You most likely can do this without a new command.
You can use something similar to build_append_named_dword
in combination with BIOS_LINKER_LOADER_COMMAND_ADD_POINTER,
like vmgenid does.

-- 
MST

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-15  0:31                           ` Michael S. Tsirkin
@ 2017-10-16 14:49                             ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-16 14:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Stabellini, Haozhong Zhang, wei.liu2, Xiao Guangrong,
	qemu-devel, andrew.cooper3, ian.jackson, george.dunlap,
	xen-devel, Igor Mammedov, Paolo Bonzini, Jan Beulich, xen-devel,
	Anthony Perard, Chao Peng, Dan Williams, Richard Henderson,
	Eduardo Habkost

On Sun, Oct 15, 2017 at 03:31:15AM +0300, Michael S. Tsirkin wrote:
> On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > > To Jan, Andrew, Stefano and Anthony,
> > > > 
> > > > what do you think about allowing QEMU to build the entire guest ACPI
> > > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > > still there and just bypassed in this case.
> > > 
> > > Well, if that can be made work in a non-quirky way and without
> > > loss of functionality, I'd probably be fine. I do think, however,
> > > that there's a reason this is being handled in hvmloader right now.
> > 
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> Maybe you can handle this using machine type versioning.

<nods> And the type could be v2 if nvdimm was provided (which is
something that the toolstack would figure out).

The toolstack could also have a separate 'v2' config flag if somebody
wanted to play with this _outside_ of having NVDIMM in the guest?


> Installed guests would use the old type.

<nods>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-16 14:49                             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-16 14:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Haozhong Zhang, Stefano Stabellini, wei.liu2, Xiao Guangrong,
	andrew.cooper3, ian.jackson, qemu-devel, Eduardo Habkost,
	george.dunlap, xen-devel, Chao Peng, Jan Beulich, Paolo Bonzini,
	Anthony Perard, Igor Mammedov, Dan Williams, xen-devel,
	Richard Henderson

On Sun, Oct 15, 2017 at 03:31:15AM +0300, Michael S. Tsirkin wrote:
> On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > > To Jan, Andrew, Stefano and Anthony,
> > > > 
> > > > what do you think about allowing QEMU to build the entire guest ACPI
> > > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > > still there and just bypassed in this case.
> > > 
> > > Well, if that can be made work in a non-quirky way and without
> > > loss of functionality, I'd probably be fine. I do think, however,
> > > that there's a reason this is being handled in hvmloader right now.
> > 
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> Maybe you can handle this using machine type versioning.

<nods> And the type could be v2 if nvdimm was provided (which is
something that the toolstack would figure out).

The toolstack could also have a separate 'v2' config flag if somebody
wanted to play with this _outside_ of having NVDIMM in the guest?


> Installed guests would use the old type.

<nods>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 22:46                         ` Stefano Stabellini
@ 2017-10-17 11:45                           ` Paolo Bonzini
  -1 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-17 11:45 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Haozhong Zhang, andrew.cooper3, Anthony Perard, george.dunlap,
	wei.liu2, ian.jackson, Xiao Guangrong, Dan Williams, Chao Peng,
	xen-devel, xen-devel, qemu-devel, Konrad Rzeszutek Wilk,
	Eduardo Habkost, Igor Mammedov, Michael S. Tsirkin,
	Richard Henderson

On 14/10/2017 00:46, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
>>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
>>> To Jan, Andrew, Stefano and Anthony,
>>>
>>> what do you think about allowing QEMU to build the entire guest ACPI
>>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
>>> still there and just bypassed in this case.
>> Well, if that can be made work in a non-quirky way and without
>> loss of functionality, I'd probably be fine. I do think, however,
>> that there's a reason this is being handled in hvmloader right now.
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

I agree.  Some of them are already identical; for some others the QEMU
version differs but should be okay; and for yet others it's probably
better to keep the Xen-specific parts in hvmloader.

The good thing is that it's possible to proceed incrementally once you
have the hvmloader support for merging the QEMU and hvmloader RSDT or
XSDT (whatever you are using), starting with just NVDIMM and proceeding
later with whatever you see fit.

Paolo

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-17 11:45                           ` Paolo Bonzini
  0 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-17 11:45 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Haozhong Zhang, wei.liu2, Xiao Guangrong, Konrad Rzeszutek Wilk,
	qemu-devel, andrew.cooper3, Michael S. Tsirkin, ian.jackson,
	george.dunlap, xen-devel, Igor Mammedov, xen-devel,
	Anthony Perard, Chao Peng, Dan Williams, Richard Henderson,
	Eduardo Habkost

On 14/10/2017 00:46, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
>>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
>>> To Jan, Andrew, Stefano and Anthony,
>>>
>>> what do you think about allowing QEMU to build the entire guest ACPI
>>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
>>> still there and just bypassed in this case.
>> Well, if that can be made work in a non-quirky way and without
>> loss of functionality, I'd probably be fine. I do think, however,
>> that there's a reason this is being handled in hvmloader right now.
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

I agree.  Some of them are already identical; for some others the QEMU
version differs but should be okay; and for yet others it's probably
better to keep the Xen-specific parts in hvmloader.

The good thing is that it's possible to proceed incrementally once you
have the hvmloader support for merging the QEMU and hvmloader RSDT or
XSDT (whatever you are using), starting with just NVDIMM and proceeding
later with whatever you see fit.

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-17 11:45                           ` Paolo Bonzini
@ 2017-10-17 12:16                             ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-17 12:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefano Stabellini, Jan Beulich, andrew.cooper3, Anthony Perard,
	george.dunlap, wei.liu2, ian.jackson, Xiao Guangrong,
	Dan Williams, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Richard Henderson

On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> On 14/10/2017 00:46, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> >>> To Jan, Andrew, Stefano and Anthony,
> >>>
> >>> what do you think about allowing QEMU to build the entire guest ACPI
> >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> >>> still there and just bypassed in this case.
> >> Well, if that can be made work in a non-quirky way and without
> >> loss of functionality, I'd probably be fine. I do think, however,
> >> that there's a reason this is being handled in hvmloader right now.
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> I agree.  Some of them are already identical, some are not but the QEMU
> version should be okay, and for yet more it's probably better to keep
> the Xen-specific parts in hvmloader.
> 
> The good thing is that it's possible to proceed incrementally once you
> have the hvmloader support for merging the QEMU and hvmloader RSDT or
> XSDT (whatever you are using), starting with just NVDIMM and proceeding
> later with whatever you see fit.
> 

I'll have a try and check how much the differences would matter. If it
would not take too much work, I'd like to adapt the Xen NVDIMM enabling
patches to the fully QEMU-built ACPI. Otherwise, I'll fall back to
Paolo's and MST's suggestions.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-17 12:16                             ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-17 12:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Jan Beulich, xen-devel, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Eduardo Habkost

On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> On 14/10/2017 00:46, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> >>> To Jan, Andrew, Stefano and Anthony,
> >>>
> >>> what do you think about allowing QEMU to build the entire guest ACPI
> >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> >>> still there and just bypassed in this case.
> >> Well, if that can be made work in a non-quirky way and without
> >> loss of functionality, I'd probably be fine. I do think, however,
> >> that there's a reason this is being handled in hvmloader right now.
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> I agree.  Some of them are already identical, some are not but the QEMU
> version should be okay, and for yet more it's probably better to keep
> the Xen-specific parts in hvmloader.
> 
> The good thing is that it's possible to proceed incrementally once you
> have the hvmloader support for merging the QEMU and hvmloader RSDT or
> XSDT (whatever you are using), starting with just NVDIMM and proceeding
> later with whatever you see fit.
> 

I'll have a try and check how much the differences would matter. If it
would not take too much work, I'd like to adapt the Xen NVDIMM enabling
patches to the fully QEMU-built ACPI. Otherwise, I'll fall back to
Paolo's and MST's suggestions.

Thanks,
Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-17 12:16                             ` Haozhong Zhang
@ 2017-10-18  8:32                               ` Roger Pau Monné
  -1 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:32 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Paolo Bonzini, Stefano Stabellini, Jan Beulich, andrew.cooper3,
	Anthony Perard, george.dunlap, wei.liu2, ian.jackson,
	Xiao Guangrong, Dan Williams, Chao Peng, xen-devel, xen-devel,
	qemu-devel, Konrad Rzeszutek Wilk, Eduardo Habkost,
	Igor Mammedov, Michael S. Tsirkin, Richard Henderson

On Tue, Oct 17, 2017 at 08:16:47PM +0800, Haozhong Zhang wrote:
> On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> > On 14/10/2017 00:46, Stefano Stabellini wrote:
> > > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > >>> To Jan, Andrew, Stefano and Anthony,
> > >>>
> > >>> what do you think about allowing QEMU to build the entire guest ACPI
> > >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > >>> still there and just bypassed in this case.
> > >> Well, if that can be made work in a non-quirky way and without
> > >> loss of functionality, I'd probably be fine. I do think, however,
> > >> that there's a reason this is being handled in hvmloader right now.
> > > And not to discourage you, just as a clarification, you'll also need to
> > > consider backward compatibility: unless the tables are identical, I
> > > imagine we'll have to keep using the old tables for already installed
> > > virtual machines.
> > 
> > I agree.  Some of them are already identical, some are not but the QEMU
> > version should be okay, and for yet more it's probably better to keep
> > the Xen-specific parts in hvmloader.
> > 
> > The good thing is that it's possible to proceed incrementally once you
> > have the hvmloader support for merging the QEMU and hvmloader RSDT or
> > XSDT (whatever you are using), starting with just NVDIMM and proceeding
> > later with whatever you see fit.
> > 
> 
> I'll have a try to check how much the differences would affect. If it
> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> and MST's suggestions.

I don't agree with the end goal of fully switching to the QEMU-built
ACPI tables. First of all, the only entity that has all the
information about the guest is the toolstack, and so it should be
the one in control of the ACPI tables.

Also, Xen guests can use several device models concurrently (via the
ioreq server interface), and each should be able to contribute to the
information presented in the ACPI tables. Intel is also working on
adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
ACPI tables should be created by the toolstack and not QEMU. And
finally keep in mind that there are Xen guests (PVH) that use ACPI
tables but not QEMU.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-18  8:32                               ` Roger Pau Monné
  0 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:32 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	xen-devel, Igor Mammedov, Jan Beulich, Chao Peng, Anthony Perard,
	Paolo Bonzini, Dan Williams, Richard Henderson, Eduardo Habkost

On Tue, Oct 17, 2017 at 08:16:47PM +0800, Haozhong Zhang wrote:
> On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> > On 14/10/2017 00:46, Stefano Stabellini wrote:
> > > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > >>> To Jan, Andrew, Stefano and Anthony,
> > >>>
> > >>> what do you think about allowing QEMU to build the entire guest ACPI
> > >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > >>> still there and just bypassed in this case.
> > >> Well, if that can be made work in a non-quirky way and without
> > >> loss of functionality, I'd probably be fine. I do think, however,
> > >> that there's a reason this is being handled in hvmloader right now.
> > > And not to discourage you, just as a clarification, you'll also need to
> > > consider backward compatibility: unless the tables are identical, I
> > > imagine we'll have to keep using the old tables for already installed
> > > virtual machines.
> > 
> > I agree.  Some of them are already identical, some are not but the QEMU
> > version should be okay, and for yet more it's probably better to keep
> > the Xen-specific parts in hvmloader.
> > 
> > The good thing is that it's possible to proceed incrementally once you
> > have the hvmloader support for merging the QEMU and hvmloader RSDT or
> > XSDT (whatever you are using), starting with just NVDIMM and proceeding
> > later with whatever you see fit.
> > 
> 
> I'll have a try to check how much the differences would affect. If it
> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> and MST's suggestions.

I don't agree with the end goal of fully switching to the QEMU-built
ACPI tables. First of all, the only entity that has all the
information about the guest is the toolstack, and so it should be
the one in control of the ACPI tables.

Also, Xen guests can use several device models concurrently (via the
ioreq server interface), and each should be able to contribute to the
information presented in the ACPI tables. Intel is also working on
adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
ACPI tables should be created by the toolstack and not QEMU. And
finally keep in mind that there are Xen guests (PVH) that use ACPI
tables but not QEMU.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-18  8:32                               ` [Qemu-devel] " Roger Pau Monné
@ 2017-10-18  8:46                                 ` Paolo Bonzini
  -1 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-18  8:46 UTC (permalink / raw)
  To: Roger Pau Monné, Haozhong Zhang
  Cc: Stefano Stabellini, Jan Beulich, andrew.cooper3, Anthony Perard,
	george.dunlap, wei.liu2, ian.jackson, Xiao Guangrong,
	Dan Williams, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Richard Henderson

On 18/10/2017 10:32, Roger Pau Monné wrote:
>> I'll have a try to check how much the differences would affect. If it
>> would not take too much work, I'd like to adapt Xen NVDIMM enabling
>> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
>> and MST's suggestions.
> I don't agree with the end goal of fully switching to the QEMU build
> ACPI tables. First of all, the only entity that has all the
> information about the guest it's the toolstack, and so it should be
> the one in control of the ACPI tables.
> 
> Also, Xen guests can use several device models concurrently (via the
> ioreq server interface), and each should be able to contribute to the
> information presented in the ACPI tables. Intel is also working on
> adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> ACPI tables should be created by the toolstack and not QEMU. And
> finally keep in mind that there are Xen guests (PVH) that use ACPI
> tables but not QEMU.

I agree with this in fact; QEMU has a view of _most_ of the emulated
hardware, but not all.

However, I disagree that the toolstack should be alone in controlling
the ACPI tables; rather, each involved part of the stack should be
providing its own part of the tables.  For example, QEMU (in addition to
NVDIMM information) should be the one providing an SSDT for southbridge
devices (floppy, COMx, LPTx, etc.).

The Xen stack (or more likely, hvmloader itself) would provide all the
bits that are provided by the hypervisor (MADT for the IOAPIC, another
SSDT for the HPET and RTC, DMAR tables for IOMMU, and so on).  This
should also work just fine for PVH.  Of course backwards compatibility
is the enemy of simplification, but in the end things _should_ actually
be simpler and I think it's a good idea if a prerequisite for Xen
vNVDIMM is to move AML code for QEMU devices out of hvmloader.

Paolo

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-18  8:46                                 ` Paolo Bonzini
  0 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-18  8:46 UTC (permalink / raw)
  To: Roger Pau Monné, Haozhong Zhang
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Jan Beulich, xen-devel, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Eduardo Habkost

On 18/10/2017 10:32, Roger Pau Monné wrote:
>> I'll have a try to check how much the differences would affect. If it
>> would not take too much work, I'd like to adapt Xen NVDIMM enabling
>> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
>> and MST's suggestions.
> I don't agree with the end goal of fully switching to the QEMU build
> ACPI tables. First of all, the only entity that has all the
> information about the guest it's the toolstack, and so it should be
> the one in control of the ACPI tables.
> 
> Also, Xen guests can use several device models concurrently (via the
> ioreq server interface), and each should be able to contribute to the
> information presented in the ACPI tables. Intel is also working on
> adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> ACPI tables should be created by the toolstack and not QEMU. And
> finally keep in mind that there are Xen guests (PVH) that use ACPI
> tables but not QEMU.

I agree with this in fact; QEMU has a view of _most_ of the emulated
hardware, but not all.

However, I disagree that the toolstack should be alone in controlling
the ACPI tables; rather, each involved part of the stack should be
providing its own part of the tables.  For example, QEMU (in addition to
NVDIMM information) should be the one providing an SSDT for southbridge
devices (floppy, COMx, LPTx, etc.).

The Xen stack (or more likely, hvmloader itself) would provide all the
bits that are provided by the hypervisor (MADT for the IOAPIC, another
SSDT for the HPET and RTC, DMAR tables for IOMMU, and so on).  This
should also work just fine for PVH.  Of course backwards compatibility
is the enemy of simplification, but in the end things _should_ actually
be simpler and I think it's a good idea if a prerequisite for Xen
vNVDIMM is to move AML code for QEMU devices out of hvmloader.

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-18  8:46                                 ` [Qemu-devel] " Paolo Bonzini
@ 2017-10-18  8:55                                   ` Roger Pau Monné
  -1 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, Stefano Stabellini, Jan Beulich, andrew.cooper3,
	Anthony Perard, george.dunlap, wei.liu2, ian.jackson,
	Xiao Guangrong, Dan Williams, Chao Peng, xen-devel, xen-devel,
	qemu-devel, Konrad Rzeszutek Wilk, Eduardo Habkost,
	Igor Mammedov, Michael S. Tsirkin, Richard Henderson

On Wed, Oct 18, 2017 at 10:46:57AM +0200, Paolo Bonzini wrote:
> On 18/10/2017 10:32, Roger Pau Monné wrote:
> >> I'll have a try to check how much the differences would affect. If it
> >> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> >> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> >> and MST's suggestions.
> > I don't agree with the end goal of fully switching to the QEMU build
> > ACPI tables. First of all, the only entity that has all the
> > information about the guest it's the toolstack, and so it should be
> > the one in control of the ACPI tables.
> > 
> > Also, Xen guests can use several device models concurrently (via the
> > ioreq server interface), and each should be able to contribute to the
> > information presented in the ACPI tables. Intel is also working on
> > adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> > ACPI tables should be created by the toolstack and not QEMU. And
> > finally keep in mind that there are Xen guests (PVH) that use ACPI
> > tables but not QEMU.
> 
> I agree with this in fact; QEMU has a view of _most_ of the emulated
> hardware, but not all.
> 
> However, I disagree that the toolstack should be alone in controlling
> the ACPI tables; rather, each involved part of the stack should be
> providing its own part of the tables.  For example, QEMU (in addition to
> NVDIMM information) should be the one providing an SSDT for southbridge
> devices (floppy, COMx, LPTx, etc.).

Yes, that's what I wanted to say, rather than the toolstack providing
all the ACPI tables by itself. Every component should provide the
tables of the devices under its control, and that should be glued
together by the toolstack (i.e. hvmloader).

Roger.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-18  8:55                                   ` Roger Pau Monné
  0 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Jan Beulich, xen-devel, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Eduardo Habkost

On Wed, Oct 18, 2017 at 10:46:57AM +0200, Paolo Bonzini wrote:
> On 18/10/2017 10:32, Roger Pau Monné wrote:
> >> I'll have a try to check how much the differences would affect. If it
> >> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> >> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> >> and MST's suggestions.
> > I don't agree with the end goal of fully switching to the QEMU build
> > ACPI tables. First of all, the only entity that has all the
> > information about the guest it's the toolstack, and so it should be
> > the one in control of the ACPI tables.
> > 
> > Also, Xen guests can use several device models concurrently (via the
> > ioreq server interface), and each should be able to contribute to the
> > information presented in the ACPI tables. Intel is also working on
> > adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> > ACPI tables should be created by the toolstack and not QEMU. And
> > finally keep in mind that there are Xen guests (PVH) that use ACPI
> > tables but not QEMU.
> 
> I agree with this in fact; QEMU has a view of _most_ of the emulated
> hardware, but not all.
> 
> However, I disagree that the toolstack should be alone in controlling
> the ACPI tables; rather, each involved part of the stack should be
> providing its own part of the tables.  For example, QEMU (in addition to
> NVDIMM information) should be the one providing an SSDT for southbridge
> devices (floppy, COMx, LPTx, etc.).

Yes, that's what I wanted to say, rather than the toolstack providing
all the ACPI tables by itself. Every component should provide the
tables of the devices under its control, and that should be glued
together by the toolstack (i.e. hvmloader).

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (39 preceding siblings ...)
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-10-27  3:26 ` Chao Peng
  2017-10-27  4:25   ` Haozhong Zhang
  40 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-10-27  3:26 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Dan Williams


On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> Overview
> ==================
> 
> (RFC v2 can be found at
> https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> 
> Well, this RFC v3 changes and inflates a lot from previous versions.
> The primary changes are listed below, most of which are to simplify
> the first implementation and avoid additional inflation.
> 
> 1. Drop the support to maintain the frametable and M2P table of PMEM
>    in RAM. In the future, we may add this support back.

I don't find any discussion about this in v2, but I think putting
those Xen data structures in RAM is sometimes useful (e.g. when
performance is important). It's better not to make a hard restriction
on this.

> 
> 2. Hide host NFIT and deny access to host PMEM from Dom0. In other
>    words, the kernel NVDIMM driver is loaded in Dom 0 and existing
>    management utilities (e.g. ndctl) do not work in Dom0 anymore. This
>    is to workaround the inferences of PMEM access between Dom0 and Xen
>    hypervisor. In the future, we may add a stub driver in Dom0 which
>    will hold the PMEM pages being used by Xen hypervisor and/or other
>    domains.
> 
> 3. As there is no NVDIMM driver and management utilities in Dom0 now,
>    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
>    and manage NVDIMM in Dom0 (e.g., creating labels).  Instead, we
>    have to specify the exact MFNs of host PMEM pages in xl domain
>    configuration files and the newly added Xen NVDIMM management
>    utility xen-ndctl.
> 
>    If there are indeed some tasks that have to be handled by existing
>    driver and management utilities, such as recovery from hardware
>    failures, they have to be accomplished out of Xen environment.

What kind of recovery can happen, and can the recovery happen at
runtime? For example, can we recover a portion of NVDIMM assigned to a
certain VM while keeping other VMs still using NVDIMM?

> 
>    After 2. is solved in the future, we would be able to make existing
>    driver and management utilities work in Dom0 again.

Is there any reason why we can't do it now? If the existing ndctl
(with additional patches) can work, then we don't need to introduce
xen-ndctl at all. I think that keeps the user interface clearer.

Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
  2017-10-27  3:26 ` [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Chao Peng
@ 2017-10-27  4:25   ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-27  4:25 UTC (permalink / raw)
  To: Chao Peng
  Cc: Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Dan Williams

On 10/27/17 11:26 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > Overview
> > ==================
> > 
> > (RFC v2 can be found at
> > https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> > 
> > Well, this RFC v3 changes and inflates a lot from previous versions.
> > The primary changes are listed below, most of which are to simplify
> > the first implementation and avoid additional inflation.
> > 
> > 1. Drop the support to maintain the frametable and M2P table of PMEM
> >    in RAM. In the future, we may add this support back.
> 
> I don't find any discussion in v2 about this, but I'm thinking putting
> those Xen data structures in RAM sometimes is useful (e.g. when
> performance is important). It's better not making hard restriction on
> this.

Well, this is to reduce the complexity; as you can see, the current
patch series is already too big. In addition, the size of NVDIMM can
be very large, e.g. several terabytes or even more, which would
require a large amount of RAM to store its frametable and M2P
(~10 MB per 1 GB) and leave less RAM for guest usage.
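
As a rough check on that figure, assuming the usual x86-64 sizes of 32
bytes of frametable entry plus 8 bytes of M2P entry per 4 KiB page:
40 / 4096 is about 1%, i.e. roughly 10 MB of metadata per 1 GB of
PMEM, and on the order of 10 GB of RAM for a 1 TB NVDIMM if those
structures were kept in RAM.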

> 
> > 
> > 2. Hide host NFIT and deny access to host PMEM from Dom0. In other
> >    words, the kernel NVDIMM driver is loaded in Dom 0 and existing
> >    management utilities (e.g. ndctl) do not work in Dom0 anymore. This
> >    is to workaround the inferences of PMEM access between Dom0 and Xen
> >    hypervisor. In the future, we may add a stub driver in Dom0 which
> >    will hold the PMEM pages being used by Xen hypervisor and/or other
> >    domains.
> > 
> > 3. As there is no NVDIMM driver and management utilities in Dom0 now,
> >    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
> >    and manage NVDIMM in Dom0 (e.g., creating labels).  Instead, we
> >    have to specify the exact MFNs of host PMEM pages in xl domain
> >    configuration files and the newly added Xen NVDIMM management
> >    utility xen-ndctl.
> > 
> >    If there are indeed some tasks that have to be handled by existing
> >    driver and management utilities, such as recovery from hardware
> >    failures, they have to be accomplished out of Xen environment.
> 
> What kind of recovery can happen and does the recovery can happen at
> runtime? For example, can we recover a portion of NVDIMM assigned to a
> certain VM while keep other VMs still using NVDIMM?

For example, evaluating ACPI _DSM methods (maybe vendor specific) for
error recovery and/or scrubbing bad blocks, etc.
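
(For instance, the NVDIMM root device _DSM interface used with NFIT
exposes an Address Range Scrub flow -- Query ARS Capabilities, Start
ARS, Query ARS Status -- and it is that kind of flow which, in this
design, would have to be driven from outside the Xen environment.)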

> 
> > 
> >    After 2. is solved in the future, we would be able to make existing
> >    driver and management utilities work in Dom0 again.
> 
> Is there any reason why we can't do it now? If existing ndctl (with
> additional patches) can work, then we don't need to introduce xen-ndctl
> anymore? I think that keeps the user interface clearer.

The simple reason is that I want to reduce the number of components
(Xen/kernel/QEMU) touched by the first patchset (whose primary target
is to implement the basic functionality, i.e. mapping host NVDIMM into
a guest as a virtual NVDIMM). As you said, leaving a driver (the nvdimm
driver and/or a stub driver) in Dom0 would make the user interface
clearer. Let's see what I can get done in the next version.

Thanks,
Haozhong


* Re: [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
@ 2017-10-27  6:49   ` Chao Peng
  2017-10-27  7:02     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-10-27  6:49 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> The current check refuses the hot-plugged memory that falls in one
> unused PDX group, which should be allowed.

Looks reasonable to me. The only thing I can think of is that you could
double-check whether the following find_next_zero_bit()/find_next_bit()
calls still work.

Chao
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 11746730b4..6c5221f90c 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn,
> unsigned long epfn)
>          return 0;
>  
>      /* Make sure the new range is not present now */
> -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  &
> ~(PDX_GROUP_COUNT - 1))
> -            / PDX_GROUP_COUNT;
> +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) /
> PDX_GROUP_COUNT;
>      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) /
> PDX_GROUP_COUNT;
> -    if (sidx >= eidx)
> -        return 0;
> -
>      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
>      if ( s > eidx )
>          return 0;


* Re: [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() Haozhong Zhang
@ 2017-10-27  6:58   ` Chao Peng
  2017-10-27  9:24     ` Andrew Cooper
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-10-27  6:58 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel; +Cc: Andrew Cooper, Dan Williams, Jan Beulich

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
> identical to the former.

Looks good to me.

Chao
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 6c5221f90c..c93383d7d9 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -720,12 +720,11 @@ static void cleanup_frame_table(struct
> mem_hotadd_info *info)
>      spfn = info->spfn;
>      epfn = info->epfn;
>  
> -    sva = (unsigned long)pdx_to_page(pfn_to_pdx(spfn));
> -    eva = (unsigned long)pdx_to_page(pfn_to_pdx(epfn));
> +    sva = (unsigned long)mfn_to_page(spfn);
> +    eva = (unsigned long)mfn_to_page(epfn);
>  
>      /* Intialize all page */
> -    memset(mfn_to_page(spfn), -1,
> -           (unsigned long)mfn_to_page(epfn) - (unsigned
> long)mfn_to_page(spfn));
> +    memset((void *)sva, -1, eva - sva);
>  
>      while (sva < eva)
>      {


* Re: [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-10-27  6:49   ` Chao Peng
@ 2017-10-27  7:02     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-27  7:02 UTC (permalink / raw)
  To: Chao Peng
  Cc: Andrew Cooper, Dan Williams, Konrad Rzeszutek Wilk, Jan Beulich,
	xen-devel

On 10/27/17 14:49 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > The current check refuses the hot-plugged memory that falls in one
> > unused PDX group, which should be allowed.
> 
> Looks reasonable to me. The only thing I can think of is you can double
> check if the following find_next_zero_bit/find_next_bit will still
> work. 

The first check in mem_hotadd_check() ensures spfn < epfn, so sidx <=
eidx here. Compared with the previous code, the only newly allowed case
is sidx == eidx, which is exactly what this patch intends to permit and
which has been tested.
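
For example (treating pfn_to_pdx() as an identity map and using a purely
illustrative PDX_GROUP_COUNT of 0x1000), take spfn = 0x480100 and
epfn = 0x480200, which fall inside a single unused PDX group:

    old: sidx = roundup(0x480100, 0x1000)   / 0x1000 = 0x481
         eidx = rounddown(0x4801ff, 0x1000) / 0x1000 = 0x480
         -> sidx >= eidx, so the range was refused

    new: sidx = rounddown(0x480100, 0x1000) / 0x1000 = 0x480 = eidx
         -> the find_next_zero_bit()/find_next_bit() checks run as usual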

Haozhong

> 
> Chao
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  xen/arch/x86/x86_64/mm.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> > index 11746730b4..6c5221f90c 100644
> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn,
> > unsigned long epfn)
> >          return 0;
> >  
> >      /* Make sure the new range is not present now */
> > -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  &
> > ~(PDX_GROUP_COUNT - 1))
> > -            / PDX_GROUP_COUNT;
> > +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) /
> > PDX_GROUP_COUNT;
> >      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) /
> > PDX_GROUP_COUNT;
> > -    if (sidx >= eidx)
> > -        return 0;
> > -
> >      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
> >      if ( s > eidx )
> >          return 0;


* Re: [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table
  2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
@ 2017-10-27  8:10   ` Chao Peng
  0 siblings, 0 replies; 128+ messages in thread
From: Chao Peng @ 2017-10-27  8:10 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> cleanup_frame_table() initializes the entire newly added frame table
> to all -1's. If it's called after extend_frame_table() failed to map
> the entire frame table, the initialization will hit a page fault.
> 
> Move the cleanup of partially mapped frametable to
> extend_frame_table(),
> which has enough knowledge of the mapping status.

Overall the patch fixes the issue. But I guess you could achieve this
with a smaller change. For example, you could use info->cur to pass the
last mapped pfn and only memset the chunks that were actually mapped.
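
Something like the following, perhaps (untested sketch; it assumes
extend_frame_table() leaves info->cur at the last pfn whose frame-table
page was actually mapped):

    static void cleanup_frame_table(struct mem_hotadd_info *info)
    {
        unsigned long sva, eva;

        /* Only the frame table backing [spfn, cur) was mapped, so only
         * that part can safely be initialized to all -1's. */
        sva = (unsigned long)mfn_to_page(info->spfn);
        eva = (unsigned long)mfn_to_page(info->cur);

        memset((void *)sva, -1, eva - sva);

        /* ... the existing 'while ( sva < eva )' teardown loop ... */
    }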

Chao


* Re: [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
  2017-10-27  6:58   ` Chao Peng
@ 2017-10-27  9:24     ` Andrew Cooper
  2017-10-30  2:21       ` Chao Peng
  0 siblings, 1 reply; 128+ messages in thread
From: Andrew Cooper @ 2017-10-27  9:24 UTC (permalink / raw)
  To: Chao Peng, Haozhong Zhang, xen-devel; +Cc: Dan Williams, Jan Beulich

On 27/10/17 07:58, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
>> Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
>> identical to the former.
> Looks good to me.

Is that a Reviewed-by: then?

>
> Chao
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Given that this is a trivial cleanup patch, I will include it in the
x86-next branch I am maintaining until the 4.11 release window opens.

~Andrew


* Re: [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
  2017-10-27  9:24     ` Andrew Cooper
@ 2017-10-30  2:21       ` Chao Peng
  0 siblings, 0 replies; 128+ messages in thread
From: Chao Peng @ 2017-10-30  2:21 UTC (permalink / raw)
  To: Andrew Cooper, Haozhong Zhang, xen-devel; +Cc: Dan Williams, Jan Beulich

On Fri, 2017-10-27 at 10:24 +0100, Andrew Cooper wrote:
> On 27/10/17 07:58, Chao Peng wrote:
> > 
> > On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > > 
> > > Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
> > > identical to the former.
> > Looks good to me.
> 
> Is that a Reviewed-by: then?

Yes, Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com>

> 
> > 
> > 
> > Chao
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > ---
> > > Cc: Jan Beulich <jbeulich@suse.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Given that this is a trivial cleanup patch, I will include it in the
> x86-next branch I am maintaining until the 4.11 release window opens.
> 
> ~Andrew


* Re: [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
  2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
@ 2017-11-03  5:58   ` Chao Peng
  2017-11-03  6:39     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  5:58 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: George Dunlap, Andrew Cooper, Dan Williams, Jan Beulich,
	Konrad Rzeszutek Wilk


> +#ifdef CONFIG_NVDIMM_PMEM
> +static void __init init_frametable_pmem_chunk(unsigned long s,
> unsigned long e)
> +{
> +    static unsigned long pmem_init_frametable_mfn;
> +
> +    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
> +
> +    if ( !pmem_init_frametable_mfn )
> +    {
> +        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
> +        if ( !pmem_init_frametable_mfn )
> +            panic("Not enough memory for pmem initial frame table
> page");
> +        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
> +    }

Can zero_page be used instead?

> +
> +    while ( s < e )
> +    {
> +        /*
> +         * The real frame table entries of a pmem region will be
> +         * created when the pmem region is registered to hypervisor.
> +         * Any write attempt to the initial entries of that pmem
> +         * region implies potential hypervisor bugs. In order to make
> +         * those bugs explicit, map those initial entries as read-
> only.
> +         */
> +        map_pages_to_xen(s, pmem_init_frametable_mfn, 1,
> PAGE_HYPERVISOR_RO);
> +        s += PAGE_SIZE;

I don't know how much impact the 4K mappings have on boot time when pmem
is very large. Perhaps we need to get such data on real hardware.

Another question: do we really need to map it at all, i.e. can we just
skip the range here?

Chao


* Re: [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
  2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
@ 2017-11-03  6:15   ` Chao Peng
  2017-11-03  7:14     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  6:15 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk


> +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc
> *desc)
> +{
> +    struct nfit_spa_desc *spa_desc;
> +    struct nfit_memdev_desc *memdev_desc;
> +    struct acpi_nfit_system_address *spa;
> +    unsigned long smfn, emfn;
> +
> +    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
> +    {
> +        spa_desc = memdev_desc->spa_desc;
> +
> +        if ( !spa_desc ||
> +             (memdev_desc->acpi_table->flags &
> +              (ACPI_NFIT_MEM_SAVE_FAILED |
> ACPI_NFIT_MEM_RESTORE_FAILED |
> +               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
> +               ACPI_NFIT_MEM_MAP_FAILED)) )
> +            continue;

If a failure is detected, is it reasonable to continue? I think we
should at least print some messages.

Chao
> +
> +        spa = spa_desc->acpi_table;
> +        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
> +            continue;
> +        smfn = paddr_to_pfn(spa->address);
> +        emfn = paddr_to_pfn(spa->address + spa->length);
> +        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn,
> emfn);
> +    }
> +}


* Re: [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
  2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
@ 2017-11-03  6:26   ` Chao Peng
  0 siblings, 0 replies; 128+ messages in thread
From: Chao Peng @ 2017-11-03  6:26 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk

> +
> +/**
> + * Add a PMEM region to a list. All PMEM regions in the list are
> + * sorted in the ascending order of the start address. A PMEM region,
> + * whose range is overlapped with anyone in the list, cannot be added
> + * to the list.
> + *
> + * Parameters:
> + *  list:       the list to which a new PMEM region will be added
> + *  smfn, emfn: the range of the new PMEM region
> + *  entry:      return the new entry added to the list
> + *
> + * Return:
> + *  On success, return 0 and the new entry added to the list is
> + *  returned via @entry. Otherwise, return an error number and the
> + *  value of @entry is undefined.
> + */
> +static int pmem_list_add(struct list_head *list,
> +                         unsigned long smfn, unsigned long emfn,
> +                         struct pmem **entry)
> +{
> +    struct list_head *cur;
> +    struct pmem *new_pmem;
> +    int rc = 0;
> +
> +    list_for_each_prev(cur, list)
> +    {
> +        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
> +        unsigned long cur_smfn = cur_pmem->smfn;
> +        unsigned long cur_emfn = cur_pmem->emfn;
> +
> +        if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) )
> +        {
> +            rc = -EEXIST;
> +            goto out;
> +        }
> +
> +        if ( cur_smfn < smfn )
> +            break;
> +    }
> +
> +    new_pmem = xzalloc(struct pmem);
> +    if ( !new_pmem )
> +    {
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +    new_pmem->smfn = smfn;
> +    new_pmem->emfn = emfn;
> +    list_add(&new_pmem->link, cur);
> +
> + out:
> +    if ( !rc && entry )
> +        *entry = new_pmem;
> +
> +    return rc;

It's not necessary to introduce 'out' and 'rc'. You can return directly
in the failure cases.
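
E.g. something like this (untested, reusing the declarations and helpers
from the quoted hunk):

    static int pmem_list_add(struct list_head *list,
                             unsigned long smfn, unsigned long emfn,
                             struct pmem **entry)
    {
        struct list_head *cur;
        struct pmem *new_pmem;

        list_for_each_prev(cur, list)
        {
            struct pmem *cur_pmem = list_entry(cur, struct pmem, link);

            /* Reject any overlap with an existing region. */
            if ( check_overlap(smfn, emfn, cur_pmem->smfn, cur_pmem->emfn) )
                return -EEXIST;

            /* Found the insertion point that keeps the list sorted. */
            if ( cur_pmem->smfn < smfn )
                break;
        }

        new_pmem = xzalloc(struct pmem);
        if ( !new_pmem )
            return -ENOMEM;

        new_pmem->smfn = smfn;
        new_pmem->emfn = emfn;
        list_add(&new_pmem->link, cur);

        if ( entry )
            *entry = new_pmem;

        return 0;
    }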

Chao


* Re: [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
  2017-11-03  5:58   ` Chao Peng
@ 2017-11-03  6:39     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  6:39 UTC (permalink / raw)
  To: Chao Peng
  Cc: Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, xen-devel,
	Jan Beulich, Dan Williams

On 11/03/17 13:58 +0800, Chao Peng wrote:
> 
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +static void __init init_frametable_pmem_chunk(unsigned long s,
> > unsigned long e)
> > +{
> > +    static unsigned long pmem_init_frametable_mfn;
> > +
> > +    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
> > +
> > +    if ( !pmem_init_frametable_mfn )
> > +    {
> > +        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
> > +        if ( !pmem_init_frametable_mfn )
> > +            panic("Not enough memory for pmem initial frame table
> > page");
> > +        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
> > +    }
> 
> Can zero_page be used instead?

No. I intend to mark the frametable entries for NVDIMM as invalid at
boot time, in order to avoid/detect accidental accesses to NVDIMM
pages before they are registered to the Xen hypervisor later (by part 2,
patches 14 - 25).

> 
> > +
> > +    while ( s < e )
> > +    {
> > +        /*
> > +         * The real frame table entries of a pmem region will be
> > +         * created when the pmem region is registered to hypervisor.
> > +         * Any write attempt to the initial entries of that pmem
> > +         * region implies potential hypervisor bugs. In order to make
> > +         * those bugs explicit, map those initial entries as read-
> > only.
> > +         */
> > +        map_pages_to_xen(s, pmem_init_frametable_mfn, 1,
> > PAGE_HYPERVISOR_RO);
> > +        s += PAGE_SIZE;
> 
> Don't know how much the impact of 4K mapping on boot time when pmem is
> very large. Perhaps we need get such data on hardware.
>

Well, it will be very slow because NVDIMM sizes are usually very large
(e.g. from hundreds of gigabytes to several terabytes). I can make it
use huge pages where possible.
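
A rough sketch of what the huge-page variant could look like (untested;
it assumes the chunk boundaries handed in are 2MB-aligned, that a
contiguous 2MB boot allocation for the shared poison block is
acceptable, and that map_pages_to_xen() coalesces suitably aligned
ranges into 2MB mappings):

    #define PMEM_INIT_FT_PAGES  (1UL << PAGETABLE_ORDER)  /* 512 * 4K = 2MB */

    static void __init init_frametable_pmem_chunk(unsigned long s,
                                                  unsigned long e)
    {
        static unsigned long pmem_init_frametable_mfn;

        /* Assumed here; fall back to 4K mappings for unaligned head/tail. */
        ASSERT(!((s | e) & ((PMEM_INIT_FT_PAGES << PAGE_SHIFT) - 1)));

        if ( !pmem_init_frametable_mfn )
        {
            /* One 2MB-aligned, physically contiguous poison block. */
            pmem_init_frametable_mfn =
                alloc_boot_pages(PMEM_INIT_FT_PAGES, PMEM_INIT_FT_PAGES);
            if ( !pmem_init_frametable_mfn )
                panic("Not enough memory for pmem initial frame table");
            memset(mfn_to_virt(pmem_init_frametable_mfn), -1,
                   PMEM_INIT_FT_PAGES << PAGE_SHIFT);
        }

        while ( s < e )
        {
            /* virt/mfn/nr are all 2MB-aligned, so each call can be
             * installed as a single read-only 2MB superpage. */
            map_pages_to_xen(s, pmem_init_frametable_mfn,
                             PMEM_INIT_FT_PAGES, PAGE_HYPERVISOR_RO);
            s += PMEM_INIT_FT_PAGES << PAGE_SHIFT;
        }
    }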

> Another question is do we really need to map it, e.g. can we just skip
> the range here?

Sadly, I cannot recall why I did this. Maybe I can just leave the
frametable of NVDIMM unmapped, so that accidental accesses to it would
simply trigger a page fault in the hypervisor, which makes bugs explicit
as well.


Haozhong


* Re: [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
@ 2017-11-03  6:51   ` Chao Peng
  2017-11-03  7:24     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  6:51 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Konrad Rzeszutek Wilk, Andrew Cooper, Jan Beulich, Shane Wang,
	Dan Williams, Gang Wei

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> ... to avoid the interference with the PMEM driver and management
> utilities in Dom0.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Gang Wei <gang.wei@intel.com>
> Cc: Shane Wang <shane.wang@intel.com>
> ---
>  xen/arch/x86/acpi/power.c |  7 +++++++
>  xen/arch/x86/dom0_build.c |  5 +++++
>  xen/arch/x86/shutdown.c   |  3 +++
>  xen/arch/x86/tboot.c      |  4 ++++
>  xen/common/kexec.c        |  3 +++
>  xen/common/pmem.c         | 21 +++++++++++++++++++++
>  xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
>  xen/include/xen/acpi.h    |  2 ++
>  xen/include/xen/pmem.h    | 13 +++++++++++++
>  9 files changed, 79 insertions(+)
> 
> diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> index 1e4e5680a7..d135715a49 100644
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -178,6 +178,10 @@ static int enter_state(u32 state)
>  
>      freeze_domains();
>  
> +#ifdef CONFIG_NVDIMM_PMEM
> +    acpi_nfit_reinstate();
> +#endif

I don't understand why a reinstate is needed for the NFIT table. Will it
be searched by firmware on shutdown / when entering a power state?

Chao


* Re: [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
  2017-11-03  6:15   ` Chao Peng
@ 2017-11-03  7:14     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  7:14 UTC (permalink / raw)
  To: Chao Peng
  Cc: Andrew Cooper, Dan Williams, Konrad Rzeszutek Wilk, Jan Beulich,
	xen-devel

On 11/03/17 14:15 +0800, Chao Peng wrote:
> 
> > +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc
> > *desc)
> > +{
> > +    struct nfit_spa_desc *spa_desc;
> > +    struct nfit_memdev_desc *memdev_desc;
> > +    struct acpi_nfit_system_address *spa;
> > +    unsigned long smfn, emfn;
> > +
> > +    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
> > +    {
> > +        spa_desc = memdev_desc->spa_desc;
> > +
> > +        if ( !spa_desc ||
> > +             (memdev_desc->acpi_table->flags &
> > +              (ACPI_NFIT_MEM_SAVE_FAILED |
> > ACPI_NFIT_MEM_RESTORE_FAILED |
> > +               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
> > +               ACPI_NFIT_MEM_MAP_FAILED)) )
> > +            continue;
> 
> If failure is detected, is it reasonable to continue? We can print some
> messages at least I think.

I got something wrong here. I should iterate over the SPA structures
and check all memdevs in each SPA range. If any memdev carries failure
flags, then skip the whole SPA range and print an error message.
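
Roughly like this, perhaps (untested sketch; desc->spa_list and the
'link' member of struct nfit_spa_desc are assumed here, mirroring the
existing memdev_list; the rest reuses the structures from the quoted
code):

    static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
    {
        struct nfit_spa_desc *spa_desc;
        struct nfit_memdev_desc *memdev_desc;

        list_for_each_entry(spa_desc, &desc->spa_list, link)
        {
            struct acpi_nfit_system_address *spa = spa_desc->acpi_table;
            unsigned long smfn, emfn;
            bool failed = false;

            if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
                continue;

            smfn = paddr_to_pfn(spa->address);
            emfn = paddr_to_pfn(spa->address + spa->length);

            /* A single failed memdev disqualifies the whole SPA range. */
            list_for_each_entry(memdev_desc, &desc->memdev_list, link)
                if ( memdev_desc->spa_desc == spa_desc &&
                     (memdev_desc->acpi_table->flags &
                      (ACPI_NFIT_MEM_SAVE_FAILED |
                       ACPI_NFIT_MEM_RESTORE_FAILED |
                       ACPI_NFIT_MEM_FLUSH_FAILED |
                       ACPI_NFIT_MEM_NOT_ARMED |
                       ACPI_NFIT_MEM_MAP_FAILED)) )
                    failed = true;

            if ( failed )
            {
                printk(XENLOG_ERR "NFIT: skip PMEM MFNs 0x%lx - 0x%lx: "
                       "memdev failure flags set\n", smfn, emfn);
                continue;
            }

            printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
        }
    }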

Haozhong

> 
> Chao
> > +
> > +        spa = spa_desc->acpi_table;
> > +        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
> > +            continue;
> > +        smfn = paddr_to_pfn(spa->address);
> > +        emfn = paddr_to_pfn(spa->address + spa->length);
> > +        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn,
> > emfn);
> > +    }
> > +}


* Re: [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-11-03  6:51   ` Chao Peng
@ 2017-11-03  7:24     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  7:24 UTC (permalink / raw)
  To: Chao Peng
  Cc: Konrad Rzeszutek Wilk, Andrew Cooper, xen-devel, Jan Beulich,
	Shane Wang, Dan Williams, Gang Wei

On 11/03/17 14:51 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > ... to avoid the inference with the PMEM driver and management
> > utilities in Dom0.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Gang Wei <gang.wei@intel.com>
> > Cc: Shane Wang <shane.wang@intel.com>
> > ---
> >  xen/arch/x86/acpi/power.c |  7 +++++++
> >  xen/arch/x86/dom0_build.c |  5 +++++
> >  xen/arch/x86/shutdown.c   |  3 +++
> >  xen/arch/x86/tboot.c      |  4 ++++
> >  xen/common/kexec.c        |  3 +++
> >  xen/common/pmem.c         | 21 +++++++++++++++++++++
> >  xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
> >  xen/include/xen/acpi.h    |  2 ++
> >  xen/include/xen/pmem.h    | 13 +++++++++++++
> >  9 files changed, 79 insertions(+)
> > 
> > diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> > index 1e4e5680a7..d135715a49 100644
> > --- a/xen/arch/x86/acpi/power.c
> > +++ b/xen/arch/x86/acpi/power.c
> > @@ -178,6 +178,10 @@ static int enter_state(u32 state)
> >  
> >      freeze_domains();
> >  
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +    acpi_nfit_reinstate();
> > +#endif
> 
> I don't understand why reinstate is needed for NFIT table? Will it  be
> searched by firmware on shutdown / entering power state?

I added these acpi_nfit_reinstate() calls akin to acpi_dmar_reinstate().
There are no public documents stating that the NFIT is not rebuilt during
power state changes.

Haozhong


* Re: [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
@ 2017-11-03  7:40   ` Chao Peng
  2017-11-03  8:54     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  7:40 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Daniel De Graaf, Jan Beulich,
	Konrad Rzeszutek Wilk


> +/*
> + * Interface for NVDIMM management.
> + */
> +
> +struct xen_sysctl_nvdimm_op {
> +    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented
> yet. */
> +    uint32_t pad; /* IN: Always zero. */

If alignment is the only concern, then 'err' can be moved here.

If it's reserved for the future and does not get used now, then it's
better to check its value explicitly.

Chao


* Re: [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-11-03  7:40   ` Chao Peng
@ 2017-11-03  8:54     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  8:54 UTC (permalink / raw)
  To: Chao Peng
  Cc: Konrad Rzeszutek Wilk, Andrew Cooper, xen-devel, Jan Beulich,
	Dan Williams, Daniel De Graaf

On 11/03/17 15:40 +0800, Chao Peng wrote:
> 
> > +/*
> > + * Interface for NVDIMM management.
> > + */
> > +
> > +struct xen_sysctl_nvdimm_op {
> > +    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented
> > yet. */
> > +    uint32_t pad; /* IN: Always zero. */
> 
> If alignment is the only concern, then err can be moved to here.
> 
> If it's designed for future and does not get used now, then it's better
> to check its value explicitly.
> 

I'll move 'err' to the position of 'pad'.
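
That is, roughly (the command-specific tail below is just an
illustrative placeholder):

    struct xen_sysctl_nvdimm_op {
        uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
        uint32_t err; /* OUT: error code on failure; takes the old pad slot
                       * and keeps any following 64-bit members aligned. */
        /* ... command-specific parameters (e.g. a union) would follow ... */
    };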


end of thread (newest message: 2017-11-03  8:54 UTC)

Thread overview: 128+ messages
2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
2017-10-27  6:49   ` Chao Peng
2017-10-27  7:02     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() Haozhong Zhang
2017-10-27  6:58   ` Chao Peng
2017-10-27  9:24     ` Andrew Cooper
2017-10-30  2:21       ` Chao Peng
2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
2017-10-27  8:10   ` Chao Peng
2017-09-11  4:37 ` [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
2017-11-03  5:58   ` Chao Peng
2017-11-03  6:39     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
2017-11-03  6:15   ` Chao Peng
2017-11-03  7:14     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
2017-11-03  6:26   ` Chao Peng
2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
2017-11-03  6:51   ` Chao Peng
2017-11-03  7:24     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
2017-11-03  7:40   ` Chao Peng
2017-11-03  8:54     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
2017-09-11  5:10   ` Dan Williams
2017-09-11  5:39     ` Haozhong Zhang
2017-09-11 16:35       ` Dan Williams
2017-09-11 21:24         ` Konrad Rzeszutek Wilk
2017-09-13 17:45           ` Dan Williams
2017-09-11  4:37 ` [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list' Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add() Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt' Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list' Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data' Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list' Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
2017-09-11  4:41 ` [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest Haozhong Zhang
2017-09-11  4:41   ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 01/10] nvdimm: do not intiailize nvdimm->label_data if label size is zero Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM " Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:53   ` [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest no-reply
2017-09-11  4:53     ` no-reply
2017-09-11 14:08   ` Igor Mammedov
2017-09-11 14:08     ` Igor Mammedov
2017-09-11 18:52     ` [Qemu-devel] " Stefano Stabellini
2017-09-11 18:52       ` Stefano Stabellini
2017-09-12  3:15       ` [Qemu-devel] " Haozhong Zhang
2017-09-12  3:15         ` Haozhong Zhang
2017-10-10 16:05         ` [Qemu-devel] " Konrad Rzeszutek Wilk
2017-10-10 16:05           ` Konrad Rzeszutek Wilk
2017-10-12 12:45           ` [Qemu-devel] " Haozhong Zhang
2017-10-12 12:45             ` Haozhong Zhang
2017-10-12 15:45             ` Paolo Bonzini
2017-10-12 15:45               ` Paolo Bonzini
2017-10-13  7:53               ` Haozhong Zhang
2017-10-13  7:53                 ` Haozhong Zhang
2017-10-13  8:44                 ` Igor Mammedov
2017-10-13  8:44                   ` Igor Mammedov
2017-10-13 11:13                   ` Haozhong Zhang
2017-10-13 11:13                     ` Haozhong Zhang
2017-10-13 12:13                     ` Jan Beulich
2017-10-13 12:13                       ` Jan Beulich
2017-10-13 22:46                       ` Stefano Stabellini
2017-10-13 22:46                         ` Stefano Stabellini
2017-10-15  0:31                         ` Michael S. Tsirkin
2017-10-15  0:31                           ` Michael S. Tsirkin
2017-10-16 14:49                           ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
2017-10-16 14:49                             ` [Qemu-devel] " Konrad Rzeszutek Wilk
2017-10-17 11:45                         ` Paolo Bonzini
2017-10-17 11:45                           ` Paolo Bonzini
2017-10-17 12:16                           ` Haozhong Zhang
2017-10-17 12:16                             ` Haozhong Zhang
2017-10-18  8:32                             ` [Qemu-devel] [Xen-devel] " Roger Pau Monné
2017-10-18  8:32                               ` [Qemu-devel] " Roger Pau Monné
2017-10-18  8:46                               ` [Qemu-devel] [Xen-devel] " Paolo Bonzini
2017-10-18  8:46                                 ` [Qemu-devel] " Paolo Bonzini
2017-10-18  8:55                                 ` [Qemu-devel] [Xen-devel] " Roger Pau Monné
2017-10-18  8:55                                   ` [Qemu-devel] " Roger Pau Monné
2017-10-15  0:35                 ` Michael S. Tsirkin
2017-10-15  0:35                   ` Michael S. Tsirkin
2017-10-12 17:39             ` Konrad Rzeszutek Wilk
2017-10-12 17:39               ` Konrad Rzeszutek Wilk
2017-10-13  8:00               ` Haozhong Zhang
2017-10-13  8:00                 ` Haozhong Zhang
2017-10-27  3:26 ` [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Chao Peng
2017-10-27  4:25   ` Haozhong Zhang
