* [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
@ 2017-09-11  4:37 Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
                   ` (40 more replies)
  0 siblings, 41 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Chao Peng, Dan Williams

Overview
==================

(RFC v2 can be found at https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)

This RFC v3 changes and grows a lot compared to the previous version.
The primary changes are listed below; most of them are intended to
simplify the first implementation and avoid further growth.

1. Drop the support for maintaining the frametable and M2P table of
   PMEM in RAM. In the future, we may add this support back.

2. Hide the host NFIT and deny access to host PMEM from Dom0. In
   other words, the kernel NVDIMM driver is not loaded in Dom0, and
   the existing management utilities (e.g. ndctl) do not work in Dom0
   anymore. This is to work around the interference of PMEM accesses
   between Dom0 and the Xen hypervisor. In the future, we may add a
   stub driver in Dom0 which will hold the PMEM pages being used by
   the Xen hypervisor and/or other domains.

3. As there is no NVDIMM driver and no management utilities in Dom0
   now, we cannot easily specify an area of host NVDIMM (e.g., by
   /dev/pmem0) or manage NVDIMM in Dom0 (e.g., creating labels).
   Instead, we have to specify the exact MFNs of host PMEM pages in
   xl domain configuration files and in the newly added Xen NVDIMM
   management utility xen-ndctl.

   If there are tasks that can only be handled by the existing driver
   and management utilities, such as recovery from hardware failures,
   they have to be accomplished outside of the Xen environment.

   Once item 2 is solved in the future, we will be able to make the
   existing driver and management utilities work in Dom0 again.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3


How to Test
==================

1. Build and install this patchset with the associated QEMU patches.

2. Use xen-ndctl to get a list of PMEM regions detected by the Xen
   hypervisor, e.g.
       
     # xen-ndctl list --raw
     Raw PMEM regions:
      0: MFN 0x480000 - 0x880000, PXM 3

   which indicates that a PMEM region is present at MFN 0x480000 -
   0x880000 in proximity domain (PXM) 3.

3. Set up a management area to manage the guest data areas.

     # xen-ndctl setup-mgmt 0x480000 0x4c0000
     # xen-ndctl list --mgmt
     Management PMEM regions:
      0: MFN 0x480000 - 0x4c0000, used 0xc00
 
   The first command sets up the PMEM area at MFN 0x480000 - 0x4c0000
   (1 GB) as a management area, which is also used to manage itself.
   The second command lists all management areas; the 'used' field
   shows the number of pages that have been used from the beginning
   of that area.

   The size ratio between a management area and the areas that it
   manages (including itself) should be at least 1 : 100 (i.e., 32
   bytes of frametable and 8 bytes of M2P table per 4 KB page); see
   the sizing sketch below.

   The size of a management area, as well as of a data area below, is
   currently restricted to a multiple of 256 MB. The alignment is
   restricted to a multiple of 2 MB.
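
   As a rough, illustrative sizing check (not part of this patch
   series), the sketch below applies the per-page figures quoted
   above (32 bytes of frametable + 8 bytes of M2P entry) to the
   example data area in step 4, rounding up to the 2 MB alignment.
   The actual 'used' value reported by xen-ndctl will be somewhat
   larger, since the management area also covers itself and is
   subject to further rounding inside the hypervisor.

     /* Illustrative sizing helper only; not part of this series. */
     #include <stdio.h>

     #define PAGE_SIZE      4096UL
     #define BYTES_PER_PAGE (32UL + 8UL)   /* frametable + M2P entry */
     #define PAGES_PER_2MB  512UL          /* 2 MB / 4 KB */

     static unsigned long mgmt_pages_needed(unsigned long nr_data_pages)
     {
         unsigned long bytes = nr_data_pages * BYTES_PER_PAGE;
         unsigned long pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;

         /* Round up to the 2 MB alignment mentioned above. */
         return (pages + PAGES_PER_2MB - 1) & ~(PAGES_PER_2MB - 1);
     }

     int main(void)
     {
         /* Data area from the example in step 4: MFN 0x4c0000 - 0x880000. */
         unsigned long nr = 0x880000UL - 0x4c0000UL;

         printf("~0x%lx management pages for 0x%lx data pages\n",
                mgmt_pages_needed(nr), nr);
         return 0;
     }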

4. Set up a data area that can be used by guests.

     # xen-ndctl setup-data 0x4c0000 0x880000 0x480c00 0x4c0000
     # xen-ndctl list --data
     Data PMEM regions:
      0: MFN 0x4c0000 - 0x880000, MGMT MFN 0x480c00 - 0x48b000

   The first command sets up the remaining PMEM pages, from MFN
   0x4c0000 to 0x880000, as a data area. The management pages from
   MFN 0x480c00 to 0x4c0000 are specified to manage this data area.
   The management pages actually used can be found with the second
   command.

5. Assign data pages to an HVM domain by adding the following line to
   the domain configuration.

     vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x100000' ]

   which assigns 4 GB of PMEM starting from MFN 0x4c0000 to that
   domain. A 4 GB PMEM device should be present in the guest (e.g.,
   as /dev/pmem0) after the above setup steps.

   There can be one or multiple entries in vnvdimms, which must not
   overlap with each other. Sharing PMEM pages between domains is not
   supported, so the PMEM pages assigned to different domains must
   not overlap with each other either (see the example below).
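
   For instance, the following hypothetical configuration (values
   made up for illustration, using the same 'type=mfn' syntax as
   above) assigns two non-overlapping 2 GB data regions to one
   domain:

     vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x80000',
                  'type=mfn, backend=0x540000, nr_pages=0x80000' ]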


Patch Organization
==================

This RFC v3 is composed of the following 6 parts, according to the
tasks they solve. The tool stack patches are collected and placed
into the corresponding parts.

- Part 0. Bug fix and code cleanup
    [01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
    [02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table()
    [03/39] x86_64/mm: avoid cleaning the unmapped frame table

- Part 1. Detect host PMEM
  Detect host PMEM via NFIT. No frametable or M2P table is created
  for the detected regions in this part.

    [04/39] xen/common: add Kconfig item for pmem support
    [05/39] x86/mm: exclude PMEM regions from initial frametable
    [06/39] acpi: probe valid PMEM regions via NFIT
    [07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
    [08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
    [09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
    [10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
    [11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
    [12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
    [13/39] tools/xen-ndctl: add command 'list'

- Part 2. Setup host PMEM for management and guest data usage
  Allow users or admins in Dom0 to set up host PMEM pages for
  management and guest data usage.
   * Management PMEM pages are used to store the frametable and M2P of
     PMEM pages (including themselves), and are never mapped to guests.
   * Guest data PMEM pages can be mapped to guests and used as the
     backend storage of virtual NVDIMM devices.

    [14/39] x86_64/mm: refactor memory_add()
    [15/39] x86_64/mm: allow customized location of extended frametable and M2P table
    [16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
    [17/39] tools/xen-ndctl: add command 'setup-mgmt'
    [18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
    [19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
    [20/39] tools/xen-ndctl: add option '--mgmt' to command 'list'
    [21/39] xen/pmem: support setup PMEM region for guest data usage
    [22/39] tools/xen-ndctl: add command 'setup-data'
    [23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
    [24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
    [25/39] tools/xen-ndctl: add option '--data' to command 'list'

- Part 3. Hypervisor support to map host PMEM pages to HVM domain
    [26/39] xen/pmem: add function to map PMEM pages to HVM domain
    [27/39] xen/pmem: release PMEM pages on HVM domain destruction
    [28/39] xen: add hypercall XENMEM_populate_pmem_map

- Part 4. Pass ACPI from QEMU to Xen
  Guest NFIT and NVDIMM namespace devices are built by QEMU. This part
  implements the interface for the device model to pass its ACPI (DM
  ACPI) to Xen, and loads DM ACPI. A simple blacklist mechanism is
  added to reject DM ACPI tables and namespace devices that may
  conflict with those built by Xen itself.

    [29/39] tools: reserve guest memory for ACPI from device model
    [30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc
    [31/39] tools/libacpi: add callback to translate GPA to GVA
    [32/39] tools/libacpi: add callbacks to access XenStore
    [33/39] tools/libacpi: add a simple AML builder
    [34/39] tools/libacpi: add DM ACPI blacklists
    [35/39] tools/libacpi: load ACPI built by the device model

- Part 5. Remaining tool stack changes
  Add xl domain configuration and generate new QEMU options for vNVDIMM.

    [36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices
    [37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors
    [38/39] tools/libxl: initiate PMEM mapping via QMP callback
    [39/39] tools/libxl: build qemu options from xl vNVDIMM configs


 .gitignore                              |   1 +
 docs/man/xl.cfg.pod.5.in                |  33 ++
 tools/firmware/hvmloader/Makefile       |   3 +-
 tools/firmware/hvmloader/util.c         |  75 ++++
 tools/firmware/hvmloader/util.h         |  10 +
 tools/firmware/hvmloader/xenbus.c       |  44 +-
 tools/flask/policy/modules/dom0.te      |   2 +-
 tools/flask/policy/modules/xen.if       |   2 +-
 tools/libacpi/acpi2_0.h                 |   2 +
 tools/libacpi/aml_build.c               | 326 ++++++++++++++
 tools/libacpi/aml_build.h               | 116 +++++
 tools/libacpi/build.c                   | 330 ++++++++++++++
 tools/libacpi/libacpi.h                 |  23 +
 tools/libxc/include/xc_dom.h            |   1 +
 tools/libxc/include/xenctrl.h           |  88 ++++
 tools/libxc/xc_dom_x86.c                |  13 +
 tools/libxc/xc_domain.c                 |  15 +
 tools/libxc/xc_misc.c                   | 157 +++++++
 tools/libxl/Makefile                    |   5 +-
 tools/libxl/libxl.h                     |   5 +
 tools/libxl/libxl_create.c              |   4 +-
 tools/libxl/libxl_dm.c                  |  81 +++-
 tools/libxl/libxl_dom.c                 |  25 ++
 tools/libxl/libxl_qmp.c                 | 139 +++++-
 tools/libxl/libxl_types.idl             |  16 +
 tools/libxl/libxl_vnvdimm.c             |  79 ++++
 tools/libxl/libxl_vnvdimm.h             |  30 ++
 tools/libxl/libxl_x86_acpi.c            |  36 ++
 tools/misc/Makefile                     |   4 +
 tools/misc/xen-ndctl.c                  | 399 +++++++++++++++++
 tools/xl/xl_parse.c                     | 125 +++++-
 tools/xl/xl_vmcontrol.c                 |  15 +-
 xen/arch/x86/acpi/boot.c                |   4 +
 xen/arch/x86/acpi/power.c               |   7 +
 xen/arch/x86/dom0_build.c               |   5 +
 xen/arch/x86/domain.c                   |  32 +-
 xen/arch/x86/mm.c                       | 123 ++++-
 xen/arch/x86/setup.c                    |   4 +
 xen/arch/x86/shutdown.c                 |   3 +
 xen/arch/x86/tboot.c                    |   4 +
 xen/arch/x86/x86_64/mm.c                | 309 +++++++++----
 xen/common/Kconfig                      |   8 +
 xen/common/Makefile                     |   1 +
 xen/common/compat/memory.c              |   1 +
 xen/common/domain.c                     |   3 +
 xen/common/kexec.c                      |   3 +
 xen/common/memory.c                     |  44 ++
 xen/common/pmem.c                       | 769 ++++++++++++++++++++++++++++++++
 xen/common/sysctl.c                     |   9 +
 xen/drivers/acpi/Makefile               |   2 +
 xen/drivers/acpi/nfit.c                 | 298 +++++++++++++
 xen/include/acpi/actbl1.h               |  69 +++
 xen/include/asm-x86/domain.h            |   1 +
 xen/include/asm-x86/mm.h                |  10 +-
 xen/include/public/hvm/hvm_xs_strings.h |   8 +
 xen/include/public/memory.h             |  14 +-
 xen/include/public/sysctl.h             | 100 ++++-
 xen/include/xen/acpi.h                  |  10 +
 xen/include/xen/pmem.h                  |  76 ++++
 xen/include/xen/sched.h                 |   3 +
 xen/include/xsm/dummy.h                 |  11 +
 xen/include/xsm/xsm.h                   |  12 +
 xen/xsm/dummy.c                         |   4 +
 xen/xsm/flask/hooks.c                   |  17 +
 xen/xsm/flask/policy/access_vectors     |   4 +
 65 files changed, 4044 insertions(+), 128 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h
 create mode 100644 tools/libxl/libxl_vnvdimm.c
 create mode 100644 tools/libxl/libxl_vnvdimm.h
 create mode 100644 tools/misc/xen-ndctl.c
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/drivers/acpi/nfit.c
 create mode 100644 xen/include/xen/pmem.h

-- 
2.14.1



* [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-10-27  6:49   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table() Haozhong Zhang
                   ` (39 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

The current check refuses hot-plugged memory that falls within a
single unused PDX group, which should be allowed.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 11746730b4..6c5221f90c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
         return 0;
 
     /* Make sure the new range is not present now */
-    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  & ~(PDX_GROUP_COUNT - 1))
-            / PDX_GROUP_COUNT;
+    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
     eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
-    if (sidx >= eidx)
-        return 0;
-
     s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
     if ( s > eidx )
         return 0;
-- 
2.14.1



* [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table()
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-10-27  6:58   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
identical to the former.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 6c5221f90c..c93383d7d9 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -720,12 +720,11 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
     spfn = info->spfn;
     epfn = info->epfn;
 
-    sva = (unsigned long)pdx_to_page(pfn_to_pdx(spfn));
-    eva = (unsigned long)pdx_to_page(pfn_to_pdx(epfn));
+    sva = (unsigned long)mfn_to_page(spfn);
+    eva = (unsigned long)mfn_to_page(epfn);
 
     /* Intialize all page */
-    memset(mfn_to_page(spfn), -1,
-           (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
+    memset((void *)sva, -1, eva - sva);
 
     while (sva < eva)
     {
-- 
2.14.1



* [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table() Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-10-27  8:10   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support Haozhong Zhang
                   ` (37 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

cleanup_frame_table() initializes the entire newly added frame table
to all -1's. If it's called after extend_frame_table() failed to map
the entire frame table, the initialization will hit a page fault.

Move the cleanup of partially mapped frametable to extend_frame_table(),
which has enough knowledge of the mapping status.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 51 ++++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c93383d7d9..f635e4bf70 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -710,15 +710,12 @@ void free_compat_arg_xlat(struct vcpu *v)
                               PFN_UP(COMPAT_ARG_XLAT_SIZE));
 }
 
-static void cleanup_frame_table(struct mem_hotadd_info *info)
+static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 {
+    struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
     unsigned long sva, eva;
     l3_pgentry_t l3e;
     l2_pgentry_t l2e;
-    unsigned long spfn, epfn;
-
-    spfn = info->spfn;
-    epfn = info->epfn;
 
     sva = (unsigned long)mfn_to_page(spfn);
     eva = (unsigned long)mfn_to_page(epfn);
@@ -744,7 +741,7 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
         if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) ==
               (_PAGE_PSE | _PAGE_PRESENT) )
         {
-            if (hotadd_mem_valid(l2e_get_pfn(l2e), info))
+            if ( hotadd_mem_valid(l2e_get_pfn(l2e), &info) )
                 destroy_xen_mappings(sva & ~((1UL << L2_PAGETABLE_SHIFT) - 1),
                          ((sva & ~((1UL << L2_PAGETABLE_SHIFT) -1 )) +
                             (1UL << L2_PAGETABLE_SHIFT) - 1));
@@ -769,28 +766,33 @@ static int setup_frametable_chunk(void *start, void *end,
 {
     unsigned long s = (unsigned long)start;
     unsigned long e = (unsigned long)end;
-    unsigned long mfn;
-    int err;
+    unsigned long cur, mfn;
+    int err = 0;
 
     ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
     ASSERT(!(e & ((1 << L2_PAGETABLE_SHIFT) - 1)));
 
-    for ( ; s < e; s += (1UL << L2_PAGETABLE_SHIFT))
+    for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
     {
         mfn = alloc_hotadd_mfn(info);
-        err = map_pages_to_xen(s, mfn, 1UL << PAGETABLE_ORDER,
+        err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
-            return err;
+            break;
     }
-    memset(start, -1, s - (unsigned long)start);
 
-    return 0;
+    if ( !err )
+        memset(start, -1, cur - s);
+    else
+        destroy_xen_mappings(s, cur);
+
+    return err;
 }
 
 static int extend_frame_table(struct mem_hotadd_info *info)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
+    int err = 0;
 
     spfn = info->spfn;
     epfn = info->epfn;
@@ -809,8 +811,6 @@ static int extend_frame_table(struct mem_hotadd_info *info)
 
     while ( cidx < eidx )
     {
-        int err;
-
         nidx = find_next_bit(pdx_group_valid, eidx, cidx);
         if ( nidx >= eidx )
             nidx = eidx;
@@ -818,14 +818,19 @@ static int extend_frame_table(struct mem_hotadd_info *info)
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
                                      info);
         if ( err )
-            return err;
+            break;
 
         cidx = find_next_zero_bit(pdx_group_valid, eidx, nidx);
     }
 
-    memset(mfn_to_page(spfn), 0,
-           (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
-    return 0;
+    if ( !err )
+        memset(mfn_to_page(spfn), 0,
+               (unsigned long)mfn_to_page(epfn) -
+               (unsigned long)mfn_to_page(spfn));
+    else
+        cleanup_frame_table(spfn, pdx_to_pfn(cidx * PDX_GROUP_COUNT));
+
+    return err;
 }
 
 void __init subarch_init_memory(void)
@@ -1404,8 +1409,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     info.cur = spfn;
 
     ret = extend_frame_table(&info);
-    if (ret)
-        goto destroy_frametable;
+    if ( ret )
+        goto restore_node_status;
 
     /* Set max_page as setup_m2p_table will use it*/
     if (max_page < epfn)
@@ -1448,8 +1453,8 @@ destroy_m2p:
     max_page = old_max;
     total_pages = old_total;
     max_pdx = pfn_to_pdx(max_page - 1) + 1;
-destroy_frametable:
-    cleanup_frame_table(&info);
+    cleanup_frame_table(spfn, epfn);
+restore_node_status:
     if ( !orig_online )
         node_set_offline(node);
     NODE_DATA(node)->node_start_pfn = old_node_start;
-- 
2.14.1



* [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (2 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Add CONFIG_NVDIMM_PMEM to enable NVDIMM persistent memory support. By
default, it is disabled (N).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dc8e876439..d4565b1c7b 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -279,4 +279,12 @@ config CMDLINE_OVERRIDE
 
 	  This is used to work around broken bootloaders. This should
 	  be set to 'N' under normal conditions.
+
+config NVDIMM_PMEM
+	bool "Persistent memory support"
+	default n
+	---help---
+	  Enable support for NVDIMM in the persistent memory mode.
+
+	  If unsure, say N.
 endmenu
-- 
2.14.1



* [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (3 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  5:58   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
                   ` (35 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, George Dunlap, Andrew Cooper, Jan Beulich,
	Chao Peng, Dan Williams

No specification forbids PMEM regions from appearing in the gaps
between RAM regions. If that does happen, init_frametable() would
need to allocate RAM for the part of the frametable covering those
PMEM regions. However, PMEM regions can be very large (several
terabytes or more), so init_frametable() may fail.

Because Xen does not use PMEM at boot time, we can defer the actual
resource allocation of the frametable of PMEM regions. At boot time,
all frametable pages of PMEM regions appearing between RAM regions
are mapped to a single RAM page filled with 0xff.

Any attempt to write to those frametable pages before their actual
resources are allocated implies a bug in Xen. Therefore, a read-only
mapping is used here to make such bugs explicit.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c         | 117 +++++++++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/setup.c      |   4 ++
 xen/drivers/acpi/Makefile |   2 +
 xen/drivers/acpi/nfit.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/acpi/actbl1.h |  43 +++++++++++++++++
 xen/include/xen/acpi.h    |   7 +++
 6 files changed, 278 insertions(+), 11 deletions(-)
 create mode 100644 xen/drivers/acpi/nfit.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e5a029c9be..2fdf609805 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -83,6 +83,9 @@
  * an application-supplied buffer).
  */
 
+#ifdef CONFIG_NVDIMM_PMEM
+#include <xen/acpi.h>
+#endif
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -196,31 +199,123 @@ static int __init parse_mmio_relax(const char *s)
 }
 custom_param("mmio-relax", parse_mmio_relax);
 
-static void __init init_frametable_chunk(void *start, void *end)
+static void __init init_frametable_ram_chunk(unsigned long s, unsigned long e)
 {
-    unsigned long s = (unsigned long)start;
-    unsigned long e = (unsigned long)end;
-    unsigned long step, mfn;
+    unsigned long cur, step, mfn;
 
-    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
-    for ( ; s < e; s += step << PAGE_SHIFT )
+    for ( cur = s; cur < e; cur += step << PAGE_SHIFT )
     {
         step = 1UL << (cpu_has_page1gb &&
-                       !(s & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
+                       !(cur & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
                        L3_PAGETABLE_SHIFT - PAGE_SHIFT :
                        L2_PAGETABLE_SHIFT - PAGE_SHIFT);
         /*
          * The hardcoded 4 below is arbitrary - just pick whatever you think
          * is reasonable to waste as a trade-off for using a large page.
          */
-        while ( step && s + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
+        while ( step && cur + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
             step >>= PAGETABLE_ORDER;
         mfn = alloc_boot_pages(step, step);
-        map_pages_to_xen(s, mfn, step, PAGE_HYPERVISOR);
+        map_pages_to_xen(cur, mfn, step, PAGE_HYPERVISOR);
     }
 
-    memset(start, 0, end - start);
-    memset(end, -1, s - e);
+    memset((void *)s, 0, e - s);
+    memset((void *)e, -1, cur - e);
+}
+
+#ifdef CONFIG_NVDIMM_PMEM
+static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e)
+{
+    static unsigned long pmem_init_frametable_mfn;
+
+    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
+
+    if ( !pmem_init_frametable_mfn )
+    {
+        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
+        if ( !pmem_init_frametable_mfn )
+            panic("Not enough memory for pmem initial frame table page");
+        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
+    }
+
+    while ( s < e )
+    {
+        /*
+         * The real frame table entries of a pmem region will be
+         * created when the pmem region is registered to hypervisor.
+         * Any write attempt to the initial entries of that pmem
+         * region implies potential hypervisor bugs. In order to make
+         * those bugs explicit, map those initial entries as read-only.
+         */
+        map_pages_to_xen(s, pmem_init_frametable_mfn, 1, PAGE_HYPERVISOR_RO);
+        s += PAGE_SIZE;
+    }
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
+static void __init init_frametable_chunk(void *start, void *end)
+{
+    unsigned long s = (unsigned long)start;
+    unsigned long e = (unsigned long)end;
+#ifdef CONFIG_NVDIMM_PMEM
+    unsigned long pmem_smfn, pmem_emfn;
+    unsigned long pmem_spage = s, pmem_epage = s;
+    unsigned long pmem_page_aligned;
+    bool found = false;
+#endif /* CONFIG_NVDIMM_PMEM */
+
+    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
+
+#ifndef CONFIG_NVDIMM_PMEM
+    init_frametable_ram_chunk(s, e);
+#else
+    while ( s < e )
+    {
+        /* No previous found pmem region overlaps with s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            found = acpi_nfit_boot_search_pmem(
+                mfn_x(page_to_mfn((struct page_info *)s)),
+                mfn_x(page_to_mfn((struct page_info *)e)),
+                &pmem_smfn, &pmem_emfn);
+            if ( found )
+            {
+                pmem_spage = (unsigned long)mfn_to_page(_mfn(pmem_smfn));
+                pmem_epage = (unsigned long)mfn_to_page(_mfn(pmem_emfn));
+            }
+        }
+
+        /* No pmem region found in s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            init_frametable_ram_chunk(s, e);
+            break;
+        }
+
+        if ( s < pmem_spage )
+        {
+            init_frametable_ram_chunk(s, pmem_spage);
+            pmem_page_aligned = (pmem_spage + PAGE_SIZE - 1) & PAGE_MASK;
+            if ( pmem_page_aligned > pmem_epage )
+                memset((void *)pmem_epage, -1, pmem_page_aligned - pmem_epage);
+            s = pmem_page_aligned;
+        }
+        else
+        {
+            pmem_page_aligned = pmem_epage & PAGE_MASK;
+            if ( pmem_page_aligned > s )
+                init_frametable_pmem_chunk(s, pmem_page_aligned);
+            if ( pmem_page_aligned < pmem_epage )
+            {
+                init_frametable_ram_chunk(pmem_page_aligned,
+                                          min(pmem_page_aligned + PAGE_SIZE, e));
+                memset((void *)pmem_page_aligned, -1,
+                       pmem_epage - pmem_page_aligned);
+            }
+            s = (pmem_epage + PAGE_SIZE - 1) & PAGE_MASK;
+        }
+    }
+#endif
 }
 
 void __init init_frametable(void)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3cbe305202..b9ebda8f4e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1358,6 +1358,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     BUILD_BUG_ON(MACH2PHYS_VIRT_START != RO_MPT_VIRT_START);
     BUILD_BUG_ON(MACH2PHYS_VIRT_END   != RO_MPT_VIRT_END);
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_boot_init();
+#endif
+
     init_frametable();
 
     if ( !acpi_boot_table_init_done )
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 444b11d583..c8bb869cb8 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o
 
 obj-$(CONFIG_X86) += hwregs.o
 obj-$(CONFIG_X86) += reboot.o
+
+obj-$(CONFIG_NVDIMM_PMEM) += nfit.o
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
new file mode 100644
index 0000000000..e099378ee0
--- /dev/null
+++ b/xen/drivers/acpi/nfit.c
@@ -0,0 +1,116 @@
+/*
+ * xen/drivers/acpi/nfit.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/acpi.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+
+/*
+ * GUID of a byte addressable persistent memory region
+ * (ref. ACPI 6.2, Section 5.2.25.2)
+ */
+static const uint8_t nfit_spa_pmem_guid[] =
+{
+    0x79, 0xd3, 0xf0, 0x66, 0xf3, 0xb4, 0x74, 0x40,
+    0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
+};
+
+struct acpi_nfit_desc {
+    struct acpi_table_nfit *acpi_table;
+};
+
+static struct acpi_nfit_desc nfit_desc;
+
+void __init acpi_nfit_boot_init(void)
+{
+    acpi_status status;
+    acpi_physical_address nfit_addr;
+    acpi_native_uint nfit_len;
+
+    status = acpi_get_table_phys(ACPI_SIG_NFIT, 0, &nfit_addr, &nfit_len);
+    if ( ACPI_FAILURE(status) )
+        return;
+
+    nfit_desc.acpi_table = (struct acpi_table_nfit *)__va(nfit_addr);
+    map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
+                     PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
+                     PAGE_HYPERVISOR);
+}
+
+/**
+ * Search pmem regions overlapped with the specified address range.
+ *
+ * Parameters:
+ *  @smfn, @emfn: the start and end MFN of address range to search
+ *  @ret_smfn, @ret_emfn: return the address range of the first pmem region
+ *                        in above range
+ *
+ * Return:
+ *  Return true if a pmem region is overlapped with @smfn - @emfn. The
+ *  start and end MFN of the lowest pmem region are returned via
+ *  @ret_smfn and @ret_emfn respectively.
+ *
+ *  Return false if no pmem region is overlapped with @smfn - @emfn.
+ */
+bool __init acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                       unsigned long *ret_smfn,
+                                       unsigned long *ret_emfn)
+{
+    struct acpi_table_nfit *nfit_table = nfit_desc.acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    unsigned long saddr = pfn_to_paddr(smfn), eaddr = pfn_to_paddr(emfn);
+    unsigned long ret_saddr = 0, ret_eaddr = 0;
+
+    if ( !nfit_table )
+        return false;
+
+    while ( hdr_offset < nfit_table->header.length )
+    {
+        struct acpi_nfit_header *hdr = (void *)nfit_table + hdr_offset;
+        struct acpi_nfit_system_address *spa;
+        unsigned long pmem_saddr, pmem_eaddr;
+
+        hdr_offset += hdr->length;
+
+        if ( hdr->type != ACPI_NFIT_TYPE_SYSTEM_ADDRESS )
+            continue;
+
+        spa = (struct acpi_nfit_system_address *)hdr;
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+
+        pmem_saddr = spa->address;
+        pmem_eaddr = pmem_saddr + spa->length;
+        if ( pmem_saddr >= eaddr || pmem_eaddr <= saddr )
+            continue;
+
+        if ( ret_saddr < pmem_saddr )
+            continue;
+        ret_saddr = pmem_saddr;
+        ret_eaddr = pmem_eaddr;
+    }
+
+    if ( ret_saddr == ret_eaddr )
+        return false;
+
+    *ret_smfn = paddr_to_pfn(ret_saddr);
+    *ret_emfn = paddr_to_pfn(ret_eaddr);
+
+    return true;
+}
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index e1991362dc..94d8d7775c 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -71,6 +71,7 @@
 #define ACPI_SIG_SBST           "SBST"	/* Smart Battery Specification Table */
 #define ACPI_SIG_SLIT           "SLIT"	/* System Locality Distance Information Table */
 #define ACPI_SIG_SRAT           "SRAT"	/* System Resource Affinity Table */
+#define ACPI_SIG_NFIT           "NFIT"	/* NVDIMM Firmware Interface Table */
 
 /*
  * All tables must be byte-packed to match the ACPI specification, since
@@ -903,6 +904,48 @@ struct acpi_msct_proximity {
 	u64 memory_capacity;	/* In bytes */
 };
 
+/*******************************************************************************
+ *
+ * NFIT - NVDIMM Interface Table (ACPI 6.0+)
+ *		  Version 1
+ *
+ ******************************************************************************/
+
+struct acpi_table_nfit {
+	struct acpi_table_header header;	/* Common ACPI table header */
+	u32 reserved;						/* Reserved, must be zero */
+};
+
+/* Subtable header for NFIT */
+
+struct acpi_nfit_header {
+	u16 type;
+	u16 length;
+};
+
+/* Values for subtable type in struct acpi_nfit_header */
+enum acpi_nfit_type {
+	ACPI_NFIT_TYPE_SYSTEM_ADDRESS = 0,
+	ACPI_NFIT_TYPE_MEMORY_MAP = 1,
+};
+
+/*
+ * NFIT Subtables
+ */
+
+/* 0: System Physical Address Range Structure */
+struct acpi_nfit_system_address {
+	struct acpi_nfit_header header;
+	u16 range_index;
+	u16 flags;
+	u32 reserved;		/* Reseved, must be zero */
+	u32 proximity_domain;
+	u8	range_guid[16];
+	u64 address;
+	u64 length;
+	u64 memory_mapping;
+};
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 9409350f05..1bd8f9f4e4 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -180,4 +180,11 @@ void acpi_reboot(void);
 void acpi_dmar_zap(void);
 void acpi_dmar_reinstate(void);
 
+#ifdef CONFIG_NVDIMM_PMEM
+void acpi_nfit_boot_init(void);
+bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                unsigned long *ret_smfn,
+                                unsigned long *ret_emfn);
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /*_LINUX_ACPI_H*/
-- 
2.14.1



* [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (4 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  6:15   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
                   ` (34 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

A PMEM region with failures (e.g., it was not properly flushed in the
last power cycle, or some blocks within it are broken) cannot be
safely used by Xen or guests. Scan the state flags of the NVDIMM
region mapping structures in the NFIT to check whether any failures
have happened to a PMEM region. Recovery from those failures is left
out of Xen (e.g., left to the firmware or other management utilities
on the bare metal).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/acpi/boot.c  |   4 ++
 xen/drivers/acpi/nfit.c   | 153 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/acpi/actbl1.h |  26 ++++++++
 xen/include/xen/acpi.h    |   1 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 8e6c96dcf6..f52a2c6dc5 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -732,5 +732,9 @@ int __init acpi_boot_init(void)
 
 	acpi_table_parse(ACPI_SIG_BGRT, acpi_invalidate_bgrt);
 
+#ifdef CONFIG_NVDIMM_PMEM
+	acpi_nfit_init();
+#endif
+
 	return 0;
 }
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index e099378ee0..b88a587b8d 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -31,11 +31,143 @@ static const uint8_t nfit_spa_pmem_guid[] =
     0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
 };
 
+struct nfit_spa_desc {
+    struct list_head link;
+    struct acpi_nfit_system_address *acpi_table;
+};
+
+struct nfit_memdev_desc {
+    struct list_head link;
+    struct acpi_nfit_memory_map *acpi_table;
+    struct nfit_spa_desc *spa_desc;
+};
+
 struct acpi_nfit_desc {
     struct acpi_table_nfit *acpi_table;
+    struct list_head spa_list;
+    struct list_head memdev_list;
 };
 
-static struct acpi_nfit_desc nfit_desc;
+static struct acpi_nfit_desc nfit_desc = {
+    .spa_list = LIST_HEAD_INIT(nfit_desc.spa_list),
+    .memdev_list = LIST_HEAD_INIT(nfit_desc.memdev_list),
+};
+
+static void __init acpi_nfit_del_subtables(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa, *spa_next;
+    struct nfit_memdev_desc *memdev, *memdev_next;
+
+    list_for_each_entry_safe(spa, spa_next, &desc->spa_list, link)
+    {
+        list_del(&spa->link);
+        xfree(spa);
+    }
+    list_for_each_entry_safe (memdev, memdev_next, &desc->memdev_list, link)
+    {
+        list_del(&memdev->link);
+        xfree(memdev);
+    }
+}
+
+static int __init acpi_nfit_add_subtables(struct acpi_nfit_desc *desc)
+{
+    struct acpi_table_nfit *nfit_table = desc->acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    uint32_t nfit_length = nfit_table->header.length;
+    struct acpi_nfit_header *hdr;
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    int ret = 0;
+
+#define INIT_DESC(desc, acpi_hdr, acpi_type, desc_list) \
+    do {                                                \
+        (desc) = xzalloc(typeof(*(desc)));              \
+        if ( unlikely(!(desc)) ) {                      \
+            ret = -ENOMEM;                              \
+            goto nomem;                                 \
+        }                                               \
+        (desc)->acpi_table = (acpi_type *)(acpi_hdr);   \
+        INIT_LIST_HEAD(&(desc)->link);                  \
+        list_add_tail(&(desc)->link, (desc_list));      \
+    } while ( 0 )
+
+    while ( hdr_offset < nfit_length )
+    {
+        hdr = (void *)nfit_table + hdr_offset;
+        hdr_offset += hdr->length;
+
+        switch ( hdr->type )
+        {
+        case ACPI_NFIT_TYPE_SYSTEM_ADDRESS:
+            INIT_DESC(spa_desc, hdr, struct acpi_nfit_system_address,
+                      &desc->spa_list);
+            break;
+
+        case ACPI_NFIT_TYPE_MEMORY_MAP:
+            INIT_DESC(memdev_desc, hdr, struct acpi_nfit_memory_map,
+                      &desc->memdev_list);
+            break;
+
+        default:
+            continue;
+        }
+    }
+
+#undef INIT_DESC
+
+    return 0;
+
+ nomem:
+    acpi_nfit_del_subtables(desc);
+
+    return ret;
+}
+
+static void __init acpi_nfit_link_subtables(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    uint16_t spa_idx;
+
+    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
+    {
+        spa_idx = memdev_desc->acpi_table->range_index;
+        list_for_each_entry(spa_desc, &desc->spa_list, link)
+        {
+            if ( spa_desc->acpi_table->range_index == spa_idx )
+                break;
+        }
+        memdev_desc->spa_desc = spa_desc;
+    }
+}
+
+static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    struct acpi_nfit_system_address *spa;
+    unsigned long smfn, emfn;
+
+    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
+    {
+        spa_desc = memdev_desc->spa_desc;
+
+        if ( !spa_desc ||
+             (memdev_desc->acpi_table->flags &
+              (ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED |
+               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
+               ACPI_NFIT_MEM_MAP_FAILED)) )
+            continue;
+
+        spa = spa_desc->acpi_table;
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+        smfn = paddr_to_pfn(spa->address);
+        emfn = paddr_to_pfn(spa->address + spa->length);
+        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
+    }
+}
 
 void __init acpi_nfit_boot_init(void)
 {
@@ -53,6 +185,25 @@ void __init acpi_nfit_boot_init(void)
                      PAGE_HYPERVISOR);
 }
 
+void __init acpi_nfit_init(void)
+{
+    if ( !nfit_desc.acpi_table )
+        return;
+
+    /* Collect all SPA and memory map sub-tables. */
+    if ( acpi_nfit_add_subtables(&nfit_desc) )
+    {
+        printk(XENLOG_ERR "NFIT: no memory for NFIT management\n");
+        return;
+    }
+
+    /* Link descriptors of SPA and memory map sub-tables. */
+    acpi_nfit_link_subtables(&nfit_desc);
+
+    /* Register valid pmem regions to Xen hypervisor. */
+    acpi_nfit_register_pmem(&nfit_desc);
+}
+
 /**
  * Search pmem regions overlapped with the specified address range.
  *
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index 94d8d7775c..037652916a 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -946,6 +946,32 @@ struct acpi_nfit_system_address {
 	u64 memory_mapping;
 };
 
+/* 1: Memory Device to System Address Range Map Structure */
+struct acpi_nfit_memory_map {
+	struct acpi_nfit_header header;
+	u32 device_handle;
+	u16 physical_id;
+	u16 region_id;
+	u16 range_index;
+	u16 region_index;
+	u64 region_size;
+	u64 region_offset;
+	u64 address;
+	u16 interleave_index;
+	u16 interleave_ways;
+	u16 flags;
+	u16 reserved;		/* Reserved, must be zero */
+};
+
+/* Flags in struct acpi_nfit_memory_map */
+#define ACPI_NFIT_MEM_SAVE_FAILED		(1)	/* 00: Last SAVE to Memory Device failed */
+#define ACPI_NFIT_MEM_RESTORE_FAILED	(1<<1)	/* 01: Last RESTORE from Memory Device failed */
+#define ACPI_NFIT_MEM_FLUSH_FAILED		(1<<2)	/* 02: Platform flush failed */
+#define ACPI_NFIT_MEM_NOT_ARMED			(1<<3)	/* 03: Memory Device is not armed */
+#define ACPI_NFIT_MEM_HEALTH_OBSERVED	(1<<4)	/* 04: Memory Device observed SMART/health events */
+#define ACPI_NFIT_MEM_HEALTH_ENABLED	(1<<5)	/* 05: SMART/health events enabled */
+#define ACPI_NFIT_MEM_MAP_FAILED		(1<<6)	/* 06: Mapping to SPA failed */
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 1bd8f9f4e4..088f01255d 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -185,6 +185,7 @@ void acpi_nfit_boot_init(void);
 bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
                                 unsigned long *ret_smfn,
                                 unsigned long *ret_emfn);
+void acpi_nfit_init(void);
 #endif /* CONFIG_NVDIMM_PMEM */
 
 #endif /*_LINUX_ACPI_H*/
-- 
2.14.1



* [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (5 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  6:26   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
                   ` (33 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Register valid PMEM regions probed via NFIT to the Xen hypervisor. No
frametable or M2P table is created for those PMEM regions at this
stage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/Makefile     |   1 +
 xen/common/pmem.c       | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/drivers/acpi/nfit.c |  12 ++++-
 xen/include/xen/pmem.h  |  28 +++++++++++
 4 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/include/xen/pmem.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 39e2614546..46f9d1f57f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -29,6 +29,7 @@ obj-y += notifier.o
 obj-y += page_alloc.o
 obj-$(CONFIG_HAS_PDX) += pdx.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
+obj-${CONFIG_NVDIMM_PMEM} += pmem.o
 obj-y += preempt.o
 obj-y += random.o
 obj-y += rangeset.o
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
new file mode 100644
index 0000000000..49648222a6
--- /dev/null
+++ b/xen/common/pmem.c
@@ -0,0 +1,130 @@
+/*
+ * xen/common/pmem.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/list.h>
+#include <xen/pmem.h>
+
+/*
+ * All PMEM regions presenting in NFIT SPA range structures are linked
+ * in this list.
+ */
+static LIST_HEAD(pmem_raw_regions);
+static unsigned int nr_raw_regions;
+
+struct pmem {
+    struct list_head link; /* link to one of PMEM region list */
+    unsigned long smfn;    /* start MFN of the PMEM region */
+    unsigned long emfn;    /* end MFN of the PMEM region */
+
+    union {
+        struct {
+            unsigned int pxm; /* proximity domain of the PMEM region */
+        } raw;
+    } u;
+};
+
+static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
+                          unsigned long smfn2, unsigned long emfn2)
+{
+    return (smfn1 >= smfn2 && smfn1 < emfn2) ||
+           (emfn1 > smfn2 && emfn1 <= emfn2);
+}
+
+/**
+ * Add a PMEM region to a list. All PMEM regions in the list are
+ * sorted in the ascending order of the start address. A PMEM region,
+ * whose range is overlapped with anyone in the list, cannot be added
+ * to the list.
+ *
+ * Parameters:
+ *  list:       the list to which a new PMEM region will be added
+ *  smfn, emfn: the range of the new PMEM region
+ *  entry:      return the new entry added to the list
+ *
+ * Return:
+ *  On success, return 0 and the new entry added to the list is
+ *  returned via @entry. Otherwise, return an error number and the
+ *  value of @entry is undefined.
+ */
+static int pmem_list_add(struct list_head *list,
+                         unsigned long smfn, unsigned long emfn,
+                         struct pmem **entry)
+{
+    struct list_head *cur;
+    struct pmem *new_pmem;
+    int rc = 0;
+
+    list_for_each_prev(cur, list)
+    {
+        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
+        unsigned long cur_smfn = cur_pmem->smfn;
+        unsigned long cur_emfn = cur_pmem->emfn;
+
+        if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) )
+        {
+            rc = -EEXIST;
+            goto out;
+        }
+
+        if ( cur_smfn < smfn )
+            break;
+    }
+
+    new_pmem = xzalloc(struct pmem);
+    if ( !new_pmem )
+    {
+        rc = -ENOMEM;
+        goto out;
+    }
+    new_pmem->smfn = smfn;
+    new_pmem->emfn = emfn;
+    list_add(&new_pmem->link, cur);
+
+ out:
+    if ( !rc && entry )
+        *entry = new_pmem;
+
+    return rc;
+}
+
+/**
+ * Register a pmem region to Xen.
+ *
+ * Parameters:
+ *  smfn, emfn: start and end MFNs of the pmem region
+ *  pxm:        the proximity domain of the pmem region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, an error number is returned.
+ */
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
+{
+    int rc;
+    struct pmem *pmem;
+
+    if ( smfn >= emfn )
+        return -EINVAL;
+
+    rc = pmem_list_add(&pmem_raw_regions, smfn, emfn, &pmem);
+    if ( !rc )
+        pmem->u.raw.pxm = pxm;
+    nr_raw_regions++;
+
+    return rc;
+}
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index b88a587b8d..68750c2edc 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -20,6 +20,7 @@
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
+#include <xen/pmem.h>
 
 /*
  * GUID of a byte addressable persistent memory region
@@ -148,6 +149,7 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
     struct nfit_memdev_desc *memdev_desc;
     struct acpi_nfit_system_address *spa;
     unsigned long smfn, emfn;
+    int rc;
 
     list_for_each_entry(memdev_desc, &desc->memdev_list, link)
     {
@@ -165,7 +167,15 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
             continue;
         smfn = paddr_to_pfn(spa->address);
         emfn = paddr_to_pfn(spa->address + spa->length);
-        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
+        rc = pmem_register(smfn, emfn, spa->proximity_domain);
+        if ( !rc )
+            printk(XENLOG_INFO
+                   "NFIT: PMEM MFNs 0x%lx - 0x%lx on PXM %u registered\n",
+                   smfn, emfn, spa->proximity_domain);
+        else
+            printk(XENLOG_ERR
+                   "NFIT: failed to register PMEM MFNs 0x%lx - 0x%lx on PXM %u, err %d\n",
+                   smfn, emfn, spa->proximity_domain, rc);
     }
 }
 
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
new file mode 100644
index 0000000000..41cb9bb04f
--- /dev/null
+++ b/xen/include/xen/pmem.h
@@ -0,0 +1,28 @@
+/*
+ * xen/include/xen/pmem.h
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_PMEM_H__
+#define __XEN_PMEM_H__
+#ifdef CONFIG_NVDIMM_PMEM
+
+#include <xen/types.h>
+
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+
+#endif /* CONFIG_NVDIMM_PMEM */
+#endif /* __XEN_PMEM_H__ */
-- 
2.14.1



* [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (6 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  6:51   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
                   ` (32 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Shane Wang,
	Chao Peng, Dan Williams, Gang Wei

... to avoid the interference with the PMEM driver and management
utilities in Dom0.
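
For reference, the hiding works by overwriting the NFIT signature in
place (see acpi_nfit_zap() below), the same approach Xen already uses
for DMAR, and by denying Dom0 access to the PMEM MFN ranges. The two
magic constants encode the zapped and the real signature on a
little-endian x86 host; the following standalone sketch (not part of
this patch) merely demonstrates that byte order:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      uint32_t zapped = 0x4e494654, real = 0x5449464e;
      char sig[5] = "";

      memcpy(sig, &zapped, 4);
      printf("zapped signature:     %s\n", sig); /* "TFIN": not looked up by Dom0 */
      memcpy(sig, &real, 4);
      printf("reinstated signature: %s\n", sig); /* "NFIT" */
      return 0;
  }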

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
---
 xen/arch/x86/acpi/power.c |  7 +++++++
 xen/arch/x86/dom0_build.c |  5 +++++
 xen/arch/x86/shutdown.c   |  3 +++
 xen/arch/x86/tboot.c      |  4 ++++
 xen/common/kexec.c        |  3 +++
 xen/common/pmem.c         | 21 +++++++++++++++++++++
 xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
 xen/include/xen/acpi.h    |  2 ++
 xen/include/xen/pmem.h    | 13 +++++++++++++
 9 files changed, 79 insertions(+)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 1e4e5680a7..d135715a49 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -178,6 +178,10 @@ static int enter_state(u32 state)
 
     freeze_domains();
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_reinstate();
+#endif
+
     acpi_dmar_reinstate();
 
     if ( (error = disable_nonboot_cpus()) )
@@ -260,6 +264,9 @@ static int enter_state(u32 state)
     mtrr_aps_sync_end();
     adjust_vtd_irq_affinities();
     acpi_dmar_zap();
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_zap();
+#endif
     thaw_domains();
     system_state = SYS_STATE_active;
     spin_unlock(&pm_lock);
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index f616b99ddc..10741e865a 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -8,6 +8,7 @@
 #include <xen/iocap.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/pmem.h>
 #include <xen/sched.h>
 #include <xen/sched-if.h>
 #include <xen/softirq.h>
@@ -452,6 +453,10 @@ int __init dom0_setup_permissions(struct domain *d)
             rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+#ifdef CONFIG_NVDIMM_PMEM
+    rc |= pmem_dom0_setup_permission(d);
+#endif
+
     return rc;
 }
 
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index a87aa60add..1902dfe73e 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -550,6 +550,9 @@ void machine_restart(unsigned int delay_millisecs)
 
     if ( tboot_in_measured_env() )
     {
+#ifdef CONFIG_NVDIMM_PMEM
+        acpi_nfit_reinstate();
+#endif
         acpi_dmar_reinstate();
         tboot_shutdown(TB_SHUTDOWN_REBOOT);
     }
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 59d7c477f4..24e3b81ff1 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -488,6 +488,10 @@ int __init tboot_parse_dmar_table(acpi_table_handler dmar_handler)
     /* but dom0 will read real table, so must zap it there too */
     acpi_dmar_zap();
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_zap();
+#endif
+
     return rc;
 }
 
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index fcc68bd4d8..c8c6138e71 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -366,6 +366,9 @@ static int kexec_common_shutdown(void)
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_reinstate();
+#endif
     acpi_dmar_reinstate();
 
     return 0;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 49648222a6..c9f5f6e904 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -18,6 +18,8 @@
 
 #include <xen/errno.h>
 #include <xen/list.h>
+#include <xen/iocap.h>
+#include <xen/paging.h>
 #include <xen/pmem.h>
 
 /*
@@ -128,3 +130,22 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
 
     return rc;
 }
+
+#ifdef CONFIG_X86
+
+int __init pmem_dom0_setup_permission(struct domain *d)
+{
+    struct list_head *cur;
+    struct pmem *pmem;
+    int rc = 0;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        pmem = list_entry(cur, struct pmem, link);
+        rc |= iomem_deny_access(d, pmem->smfn, pmem->emfn - 1);
+    }
+
+    return rc;
+}
+
+#endif /* CONFIG_X86 */
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index 68750c2edc..5f34cf2464 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -179,6 +179,24 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
     }
 }
 
+void acpi_nfit_zap(void)
+{
+    uint32_t sig = 0x4e494654; /* "TFIN" */
+
+    if ( nfit_desc.acpi_table )
+        write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+                     sig);
+}
+
+void acpi_nfit_reinstate(void)
+{
+    uint32_t sig = 0x5449464e; /* "NFIT" */
+
+    if ( nfit_desc.acpi_table )
+        write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+                     sig);
+}
+
 void __init acpi_nfit_boot_init(void)
 {
     acpi_status status;
@@ -193,6 +211,9 @@ void __init acpi_nfit_boot_init(void)
     map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
                      PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
                      PAGE_HYPERVISOR);
+
+    /* Hide NFIT from Dom0. */
+    acpi_nfit_zap();
 }
 
 void __init acpi_nfit_init(void)
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 088f01255d..77188193d0 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -186,6 +186,8 @@ bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
                                 unsigned long *ret_smfn,
                                 unsigned long *ret_emfn);
 void acpi_nfit_init(void);
+void acpi_nfit_zap(void);
+void acpi_nfit_reinstate(void);
 #endif /* CONFIG_NVDIMM_PMEM */
 
 #endif /*_LINUX_ACPI_H*/
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 41cb9bb04f..d5bd54ff19 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -24,5 +24,18 @@
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
 
+#ifdef CONFIG_X86
+
+int pmem_dom0_setup_permission(struct domain *d);
+
+#else /* !CONFIG_X86 */
+
+static inline int pmem_dom0_setup_permission(struct domain *d)
+{
+    return -ENOSYS;
+}
+
+#endif /* CONFIG_X86 */
+
 #endif /* CONFIG_NVDIMM_PMEM */
 #endif /* __XEN_PMEM_H__ */
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (7 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-11-03  7:40   ` Chao Peng
  2017-09-11  4:37 ` [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng,
	Dan Williams, Daniel De Graaf

XEN_SYSCTL_nvdimm_op will support a set of sub-commands to manage the
physical NVDIMM devices. This commit just adds the framework for this
hypercall, and does not implement any sub-commands.
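
As an illustration of the intended calling convention (a sketch only;
xc_nvdimm_do_op() is hypothetical and not part of this series), a
libxc-internal wrapper would fill the cmd/pad fields, issue the sysctl,
and translate the 'err' field, which the hypervisor sets to a positive
errno value, back into a negative return code:

  /* Requires libxc-internal headers (xc_private.h) for DECLARE_SYSCTL
   * and do_sysctl(). */
  static int xc_nvdimm_do_op(xc_interface *xch, uint32_t cmd)
  {
      DECLARE_SYSCTL;
      xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
      int rc;

      sysctl.cmd = XEN_SYSCTL_nvdimm_op;
      nvdimm->cmd = cmd;    /* a XEN_SYSCTL_nvdimm_pmem_* sub-command */
      nvdimm->pad = 0;
      nvdimm->err = 0;

      rc = do_sysctl(xch, &sysctl);
      if ( rc && nvdimm->err )
          rc = -nvdimm->err;    /* sub-command failure reported via 'err' */

      return rc;
  }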

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/flask/policy/modules/dom0.te  |  2 +-
 xen/common/pmem.c                   | 18 ++++++++++++++++++
 xen/common/sysctl.c                 |  9 +++++++++
 xen/include/public/sysctl.h         | 19 ++++++++++++++++++-
 xen/include/xen/pmem.h              |  2 ++
 xen/xsm/flask/hooks.c               |  4 ++++
 xen/xsm/flask/policy/access_vectors |  2 ++
 7 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index 338caaf41e..8a817b0b55 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -16,7 +16,7 @@ allow dom0_t xen_t:xen {
 allow dom0_t xen_t:xen2 {
 	resource_op psr_cmt_op psr_cat_op pmu_ctrl get_symbol
 	get_cpu_levelling_caps get_cpu_featureset livepatch_op
-	gcov_op set_parameter
+	gcov_op set_parameter nvdimm_op
 };
 
 # Allow dom0 to use all XENVER_ subops that have checks.
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index c9f5f6e904..d67f237cd5 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -131,6 +131,24 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
     return rc;
 }
 
+/**
+ * Top-level hypercall handler of XEN_SYSCTL_nvdimm_pmem_*.
+ *
+ * Parameters:
+ *  nvdimm: the hypercall parameters
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
+{
+    int rc = -ENOSYS;
+
+    nvdimm->err = -rc;
+
+    return rc;
+}
+
 #ifdef CONFIG_X86
 
 int __init pmem_dom0_setup_permission(struct domain *d)
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index a6882d1c9d..33c8fca081 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/pmstat.h>
 #include <xen/livepatch.h>
 #include <xen/gcov.h>
+#include <xen/pmem.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -503,6 +504,14 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         break;
     }
 
+#ifdef CONFIG_NVDIMM_PMEM
+    case XEN_SYSCTL_nvdimm_op:
+        ret = pmem_do_sysctl(&op->u.nvdimm);
+        if ( ret != -ENOSYS )
+            copyback = 1;
+        break;
+#endif
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7830b987da..e8272ae968 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -36,7 +36,7 @@
 #include "physdev.h"
 #include "tmem.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000F
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000010
 
 /*
  * Read console content from Xen buffer ring.
@@ -1114,6 +1114,21 @@ struct xen_sysctl_set_parameter {
 typedef struct xen_sysctl_set_parameter xen_sysctl_set_parameter_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 
+/*
+ * Interface for NVDIMM management.
+ */
+
+struct xen_sysctl_nvdimm_op {
+    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+    uint32_t pad; /* IN: Always zero. */
+    union {
+        /* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+    } u;
+    uint32_t err; /* OUT: error code */
+};
+typedef struct xen_sysctl_nvdimm_op xen_sysctl_nvdimm_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -1143,6 +1158,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_get_cpu_featureset            26
 #define XEN_SYSCTL_livepatch_op                  27
 #define XEN_SYSCTL_set_parameter                 28
+#define XEN_SYSCTL_nvdimm_op                     29
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -1172,6 +1188,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpu_featureset    cpu_featureset;
         struct xen_sysctl_livepatch_op      livepatch;
         struct xen_sysctl_set_parameter     set_parameter;
+        struct xen_sysctl_nvdimm_op         nvdimm;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index d5bd54ff19..922b12f570 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -20,9 +20,11 @@
 #define __XEN_PMEM_H__
 #ifdef CONFIG_NVDIMM_PMEM
 
+#include <public/sysctl.h>
 #include <xen/types.h>
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 
 #ifdef CONFIG_X86
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 56dc5b0ab9..edfe529495 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -832,6 +832,10 @@ static int flask_sysctl(int cmd)
         return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
                                     XEN2__SET_PARAMETER, NULL);
 
+    case XEN_SYSCTL_nvdimm_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__NVDIMM_OP, NULL);
+
     default:
         return avc_unknown_permission("sysctl", cmd);
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index da9f3dfb2e..af05826064 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -103,6 +103,8 @@ class xen2
     gcov_op
 # XEN_SYSCTL_set_parameter
     set_parameter
+# XEN_SYSCTL_nvdimm_op
+    nvdimm_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (8 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

XEN_SYSCTL_nvdimm_pmem_get_regions_nr, which is a command of the
hypercall XEN_SYSCTL_nvdimm_op, is used to get the number of PMEM
regions of the specified type (see PMEM_REGION_TYPE_*).
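
As an illustration (a sketch, not part of this patch), a minimal caller
of the new libxc wrapper could look like:

  #include <stdio.h>
  #include <xenctrl.h>

  int main(void)
  {
      xc_interface *xch = xc_interface_open(0, 0, 0);
      uint32_t nr = 0;
      int rc;

      if ( !xch )
          return 1;

      rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
      if ( rc )
          fprintf(stderr, "query failed: %d\n", rc);
      else
          printf("%u raw PMEM region(s) detected\n", nr);

      xc_interface_close(xch);
      return rc ? 1 : 0;
  }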

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h | 15 +++++++++++++++
 tools/libxc/xc_misc.c         | 24 ++++++++++++++++++++++++
 xen/common/pmem.c             | 29 ++++++++++++++++++++++++++++-
 xen/include/public/sysctl.h   | 16 ++++++++++++++--
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 43151cb415..e4d26967ba 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2572,6 +2572,21 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
                          xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+/*
+ * Get the number of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:  xc interface handle
+ *  type: the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  nr:   the number of PMEM regions is returned via this parameter
+ *
+ * Return:
+ *  On success, return 0 and the number of PMEM regions is returned via @nr.
+ *  Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
+                                  uint8_t type, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 7e15e904e3..fa66410869 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -888,6 +888,30 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout)
     return _xc_livepatch_action(xch, name, LIVEPATCH_ACTION_REPLACE, timeout);
 }
 
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
+{
+    DECLARE_SYSCTL;
+    xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
+    int rc;
+
+    if ( !nr || type != PMEM_REGION_TYPE_RAW )
+        return -EINVAL;
+
+    sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions_nr;
+    nvdimm->pad = 0;
+    nvdimm->u.pmem_regions_nr.type = type;
+    nvdimm->err = 0;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        *nr = nvdimm->u.pmem_regions_nr.num_regions;
+    else if ( nvdimm->err )
+        rc = -nvdimm->err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index d67f237cd5..995dfcb867 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -105,6 +105,23 @@ static int pmem_list_add(struct list_head *list,
     return rc;
 }
 
+static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
+{
+    int rc = 0;
+
+    switch ( regions_nr->type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        regions_nr->num_regions = nr_raw_regions;
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -142,7 +159,17 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
  */
 int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 {
-    int rc = -ENOSYS;
+    int rc;
+
+    switch ( nvdimm->cmd )
+    {
+    case XEN_SYSCTL_nvdimm_pmem_get_regions_nr:
+        rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
 
     nvdimm->err = -rc;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index e8272ae968..cf308bbc45 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1118,11 +1118,23 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
  * Interface for NVDIMM management.
  */
 
+/* Types of PMEM regions */
+#define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
+
+/* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
+struct xen_sysctl_nvdimm_pmem_regions_nr {
+    uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
+    uint32_t num_regions; /* OUT: the number of PMEM regions of type @type */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
+
 struct xen_sysctl_nvdimm_op {
-    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
+#define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
     uint32_t pad; /* IN: Always zero. */
     union {
-        /* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+        xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
     } u;
     uint32_t err; /* OUT: error code */
 };
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (9 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

XEN_SYSCTL_nvdimm_pmem_get_regions, which is a command of the hypercall
XEN_SYSCTL_nvdimm_op, is used to get a list of PMEM regions of the
specified type (see PMEM_REGION_TYPE_*).
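
As an illustration (a sketch, not part of this patch; essentially what
the xen-ndctl 'list' command added later in this series does), a caller
sizes the buffer with xc_nvdimm_pmem_get_regions_nr() and then fetches
the array:

  #include <errno.h>
  #include <inttypes.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <xenctrl.h>

  static int dump_raw_regions(xc_interface *xch)
  {
      xen_sysctl_nvdimm_pmem_raw_region_t *regions;
      uint32_t nr = 0, i;
      int rc;

      rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
      if ( rc || !nr )
          return rc;

      regions = calloc(nr, sizeof(*regions));
      if ( !regions )
          return -ENOMEM;

      rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, regions, &nr);
      if ( !rc )
          for ( i = 0; i < nr; i++ )
              printf("%u: MFN 0x%"PRIx64" - 0x%"PRIx64", PXM %u\n",
                     i, regions[i].smfn, regions[i].emfn, regions[i].pxm);

      free(regions);
      return rc;
  }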

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h | 18 ++++++++++++
 tools/libxc/xc_misc.c         | 63 ++++++++++++++++++++++++++++++++++++++++
 xen/common/pmem.c             | 67 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h   | 27 +++++++++++++++++
 4 files changed, 175 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index e4d26967ba..d750e67460 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2587,6 +2587,24 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
                                   uint8_t type, uint32_t *nr);
 
+/*
+ * Get an array of information of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:    xc interface handle
+ *  type:   the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  buffer: the buffer where the information of PMEM regions is returned,
+ *          the caller should allocate enough memory for it.
+ *  nr :    IN: the maximum number of PMEM regions that can be returned
+ *              in @buffer
+ *          OUT: the actual number of returned PMEM regions in @buffer
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+                               void *buffer, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index fa66410869..f9ce802eda 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -912,6 +912,69 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
     return rc;
 }
 
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+                               void *buffer, uint32_t *nr)
+{
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(buffer, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
+    xen_sysctl_nvdimm_pmem_regions_t *regions = &nvdimm->u.pmem_regions;
+    unsigned int max;
+    unsigned long size;
+    int rc;
+
+    if ( !buffer || !nr )
+        return -EINVAL;
+
+    max = *nr;
+    if ( !max )
+        return 0;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(buffer, size);
+    if ( xc_hypercall_bounce_pre(xch, buffer) )
+        return -EFAULT;
+
+    sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions;
+    nvdimm->pad = 0;
+    nvdimm->err = 0;
+    regions->type = type;
+    regions->num_regions = max;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
+        break;
+
+    default:
+        rc = -EINVAL;
+        goto out;
+    }
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        *nr = regions->num_regions;
+    else if ( nvdimm->err )
+        rc = -nvdimm->err;
+
+out:
+    xc_hypercall_bounce_post(xch, buffer);
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 995dfcb867..a737e7dc71 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -22,6 +22,8 @@
 #include <xen/paging.h>
 #include <xen/pmem.h>
 
+#include <asm/guest_access.h>
+
 /*
  * All PMEM regions presenting in NFIT SPA range structures are linked
  * in this list.
@@ -122,6 +124,67 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
     return rc;
 }
 
+static int pmem_get_raw_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_raw_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.pxm = pmem->u.raw.pxm;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    *num_regions = nr;
+
+    return rc;
+}
+
+static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
+{
+    unsigned int type = regions->type, max = regions->num_regions;
+    int rc = 0;
+
+    if ( !max )
+        return 0;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    if ( !rc )
+        regions->num_regions = max;
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -167,6 +230,10 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
         rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
         break;
 
+    case XEN_SYSCTL_nvdimm_pmem_get_regions:
+        rc = pmem_get_regions(&nvdimm->u.pmem_regions);
+        break;
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index cf308bbc45..2635b1c911 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1121,6 +1121,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
 
+/* PMEM_REGION_TYPE_RAW */
+struct xen_sysctl_nvdimm_pmem_raw_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint32_t pxm;
+};
+typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1129,12 +1138,30 @@ struct xen_sysctl_nvdimm_pmem_regions_nr {
 typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
 
+/* XEN_SYSCTL_nvdimm_pmem_get_regions */
+struct xen_sysctl_nvdimm_pmem_regions {
+    uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
+    uint32_t num_regions; /* IN: the maximum number of entries that can be
+                                 returned via the guest handle in @u_buffer
+                             OUT: the actual number of entries returned via
+                                  the guest handle in @u_buffer */
+    union {
+        /* if type == PMEM_REGION_TYPE_RAW */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+    } u_buffer;           /* IN: the guest handle via which the entries of
+                                 PMEM regions of the type @type are returned */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions xen_sysctl_nvdimm_pmem_regions_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
 #define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
+#define XEN_SYSCTL_nvdimm_pmem_get_regions        1
     uint32_t pad; /* IN: Always zero. */
     union {
         xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
+        xen_sysctl_nvdimm_pmem_regions_t pmem_regions;
     } u;
     uint32_t err; /* OUT: error code */
 };
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (10 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  5:10   ` Dan Williams
  2017-09-11  4:37 ` [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list' Haozhong Zhang
                   ` (28 subsequent siblings)
  40 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The kernel NVDIMM driver and the traditional NVDIMM management
utilities do not work in Dom0 now. 'xen-ndctl' is added as an
alternative, which manages NVDIMM via Xen hypercalls.
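
As an illustration (a hypothetical entry: the real 'setup-mgmt' command
is added by a later patch in this series, and its exact syntax and
handler name here are assumptions), extending the tool only requires a
new entry in the cmds[] table added below; main() then opens an xc
handle automatically for commands that set .need_xc before dispatching
to the handler:

      {
          .name    = "setup-mgmt",
          .syntax  = "<smfn> <emfn>",
          .help    = "Set up the given PMEM MFN range for management usage.\n",
          .handler = handle_setup_mgmt,  /* hypothetical handler */
          .need_xc = true,
      },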

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 .gitignore             |   1 +
 tools/misc/Makefile    |   4 ++
 tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 177 insertions(+)
 create mode 100644 tools/misc/xen-ndctl.c

diff --git a/.gitignore b/.gitignore
index ecb198f914..30655673f7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -216,6 +216,7 @@ tools/misc/xen-hvmctx
 tools/misc/xenlockprof
 tools/misc/lowmemd
 tools/misc/xencov
+tools/misc/xen-ndctl
 tools/pkg-config/*
 tools/qemu-xen-build
 tools/xentrace/xenalyze
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index eaa28793ef..124775b7f4 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -32,6 +32,7 @@ INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
 INSTALL_SBIN                   += xen-livepatch
 INSTALL_SBIN                   += xen-diag
+INSTALL_SBIN                   += xen-ndctl
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -118,4 +119,7 @@ xen-lowmemd: xen-lowmemd.o
 xencov: xencov.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-ndctl: xen-ndctl.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 -include $(DEPS_INCLUDE)
diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
new file mode 100644
index 0000000000..de40e29ff6
--- /dev/null
+++ b/tools/misc/xen-ndctl.c
@@ -0,0 +1,172 @@
+/*
+ * xen-ndctl.c
+ *
+ * Xen NVDIMM management tool
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without restriction,
+ * including without limitation the rights to use, copy, modify, merge,
+ * publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so,
+ * subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <xenctrl.h>
+
+static xc_interface *xch;
+
+static int handle_help(int argc, char *argv[]);
+static int handle_list_cmds(int argc, char *argv[]);
+
+static const struct xen_ndctl_cmd
+{
+    const char *name;
+    const char *syntax;
+    const char *help;
+    int (*handler)(int argc, char **argv);
+    bool need_xc;
+} cmds[] =
+{
+    {
+        .name    = "help",
+        .syntax  = "[command]",
+        .help    = "Show this message or the help message of 'command'.\n"
+                   "Use command 'list-cmds' to list all supported commands.\n",
+        .handler = handle_help,
+    },
+
+    {
+        .name    = "list-cmds",
+        .syntax  = "",
+        .help    = "List all supported commands.\n",
+        .handler = handle_list_cmds,
+    },
+};
+
+static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]);
+
+static void show_help(const char *cmd)
+{
+    unsigned int i;
+
+    if ( !cmd )
+    {
+        fprintf(stderr,
+                "Usage: xen-ndctl <command> [args]\n\n"
+                "List all supported commands by 'xen-ndctl list-cmds'.\n"
+                "Get help of a command by 'xen-ndctl help <command>'.\n");
+        return;
+    }
+
+    for ( i = 0; i < nr_cmds; i++ )
+        if ( !strcmp(cmd, cmds[i].name) )
+        {
+            fprintf(stderr, "Usage: xen-ndctl %s %s\n\n%s",
+                    cmds[i].name, cmds[i].syntax, cmds[i].help);
+            break;
+        }
+
+    if ( i == nr_cmds )
+        fprintf(stderr, "Unsupported command '%s'.\n"
+                "List all supported commands by 'xen-ndctl list-cmds'.\n",
+                cmd);
+}
+
+static int handle_unrecognized_argument(const char *cmd, const char *argv)
+{
+    fprintf(stderr, "Unrecognized argument: %s.\n\n", argv);
+    show_help(cmd);
+
+    return -EINVAL;
+}
+
+static int handle_help(int argc, char *argv[])
+{
+    if ( argc == 1 )
+        show_help(NULL);
+    else if ( argc == 2 )
+        show_help(argv[1]);
+    else
+        return handle_unrecognized_argument(argv[0], argv[2]);
+
+    return 0;
+}
+
+static int handle_list_cmds(int argc, char *argv[])
+{
+    unsigned int i;
+
+    if ( argc > 1 )
+        return handle_unrecognized_argument(argv[0], argv[1]);
+
+    for ( i = 0; i < nr_cmds; i++ )
+        fprintf(stderr, "%s\n", cmds[i].name);
+
+    return 0;
+}
+
+int main(int argc, char *argv[])
+{
+    unsigned int i;
+    int rc = 0;
+    const char *cmd;
+
+    if ( argc <= 1 )
+    {
+        show_help(NULL);
+        return 0;
+    }
+
+    cmd = argv[1];
+
+    for ( i = 0; i < nr_cmds; i++ )
+        if ( !strcmp(cmd, cmds[i].name) )
+        {
+            if ( cmds[i].need_xc )
+            {
+                xch = xc_interface_open(0, 0, 0);
+                if ( !xch )
+                {
+                    rc = -errno;
+                    fprintf(stderr, "Cannot get xc handler: %s\n",
+                            strerror(errno));
+                    break;
+                }
+            }
+            rc = cmds[i].handler(argc - 1, &argv[1]);
+            if ( rc )
+                fprintf(stderr, "\n'%s' failed: %s\n",
+                        cmds[i].name, strerror(-rc));
+            break;
+        }
+
+    if ( i == nr_cmds )
+    {
+        fprintf(stderr, "Unsupported command '%s'. "
+                "List all supported commands by 'xen-ndctl list-cmds'.\n",
+                cmd);
+        rc = -ENOSYS;
+    }
+
+    if ( xch )
+        xc_interface_close(xch);
+
+    return rc;
+}
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (11 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add() Haozhong Zhang
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

Two options are supported by the command 'list'. '--raw' lists all
PMEM regions detected by the Xen hypervisor, which can later be
configured for other usages. '--all', the default, implies all other
options (i.e. '--raw' and any options added in the future).
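
As an illustration (a sketch; '--mgmt' and handle_list_mgmt() are
hypothetical here), a new list option only needs another entry in the
list_hndrs[] table added below, and 'list --all' then covers it
automatically because handle_list() iterates over the whole table:

  static const struct list_handlers list_hndrs[] =
  {
      { "--raw",  handle_list_raw },
      { "--mgmt", handle_list_mgmt },  /* hypothetical future option */
  };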

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index de40e29ff6..6277a1eda2 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -27,12 +27,14 @@
 
 #include <errno.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <xenctrl.h>
 
 static xc_interface *xch;
 
 static int handle_help(int argc, char *argv[]);
+static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
@@ -52,6 +54,15 @@ static const struct xen_ndctl_cmd
         .handler = handle_help,
     },
 
+    {
+        .name    = "list",
+        .syntax  = "[--all | --raw ]",
+        .help    = "--all: the default option, list all PMEM regions of following types.\n"
+                   "--raw: list all PMEM regions detected by Xen hypervisor.\n",
+        .handler = handle_list,
+        .need_xc = true,
+    },
+
     {
         .name    = "list-cmds",
         .syntax  = "",
@@ -109,6 +120,70 @@ static int handle_help(int argc, char *argv[])
     return 0;
 }
 
+static int handle_list_raw(void)
+{
+    int rc;
+    unsigned int nr = 0, i;
+    xen_sysctl_nvdimm_pmem_raw_region_t *raw_list;
+
+    rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
+    if ( rc )
+    {
+        fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+                strerror(-rc));
+        return rc;
+    }
+
+    raw_list = malloc(nr * sizeof(*raw_list));
+    if ( !raw_list )
+        return -ENOMEM;
+
+    rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, raw_list, &nr);
+    if ( rc )
+        goto out;
+
+    printf("Raw PMEM regions:\n");
+    for ( i = 0; i < nr; i++ )
+        printf(" %u: MFN 0x%lx - 0x%lx, PXM %u\n",
+               i, raw_list[i].smfn, raw_list[i].emfn, raw_list[i].pxm);
+
+ out:
+    free(raw_list);
+
+    return rc;
+}
+
+static const struct list_handlers {
+    const char *option;
+    int (*handler)(void);
+} list_hndrs[] =
+{
+    { "--raw", handle_list_raw },
+};
+
+static const unsigned int nr_list_hndrs =
+    sizeof(list_hndrs) / sizeof(list_hndrs[0]);
+
+static int handle_list(int argc, char *argv[])
+{
+    bool list_all = argc <= 1 || !strcmp(argv[1], "--all");
+    unsigned int i;
+    bool handled = false;
+    int rc = 0;
+
+    for ( i = 0; i < nr_list_hndrs && !rc; i++)
+        if ( list_all || !strcmp(argv[1], list_hndrs[i].option) )
+        {
+            rc = list_hndrs[i].handler();
+            handled = true;
+        }
+
+    if ( !handled )
+        return handle_unrecognized_argument(argv[0], argv[1]);
+
+    return rc;
+}
+
 static int handle_list_cmds(int argc, char *argv[])
 {
     unsigned int i;
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add()
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (12 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list' Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Separate the revertible part of memory_add() into a new function
memory_add_common(), which will also be used in PMEM management. The
separation will ease the failure recovery in PMEM management. Several
coding-style issues in the touched code are fixed as well.

No functional change is introduced.
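
As an illustration of how the split is meant to be reused (a sketch
under assumptions; pmem_setup_region() is hypothetical, and the
allocator hook added by the next patch is omitted here), a
PMEM-management caller could invoke only the revertible part and skip
the direct map:

  static int pmem_setup_region(unsigned long smfn, unsigned long emfn,
                               unsigned int pxm)
  {
      struct mem_hotadd_info info =
          { .spfn = smfn, .epfn = emfn, .cur = smfn };

      /*
       * No share_hotadd_m2p_table()/transfer_pages_to_heap() here: if
       * this fails, memory_add_common() has already reverted its own
       * changes, and PMEM pages are not handed to the heap anyway.
       */
      return memory_add_common(&info, pxm, false /* no direct map */);
  }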

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 98 +++++++++++++++++++++++++++---------------------
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index f635e4bf70..c8ffafe8a8 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1337,21 +1337,16 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
     return 1;
 }
 
-/*
- * A bit paranoid for memory allocation failure issue since
- * it may be reason for memory add
- */
-int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+static int memory_add_common(struct mem_hotadd_info *info,
+                             unsigned int pxm, bool direct_map)
 {
-    struct mem_hotadd_info info;
+    unsigned long spfn = info->spfn, epfn = info->epfn;
     int ret;
     nodeid_t node;
     unsigned long old_max = max_page, old_total = total_pages;
     unsigned long old_node_start, old_node_span, orig_online;
     unsigned long i;
 
-    dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
-
     if ( !mem_hotadd_check(spfn, epfn) )
         return -EINVAL;
 
@@ -1366,22 +1361,25 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
         return -EINVAL;
     }
 
-    i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
-    if ( spfn < i )
-    {
-        ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
-                               min(epfn, i) - spfn, PAGE_HYPERVISOR);
-        if ( ret )
-            goto destroy_directmap;
-    }
-    if ( i < epfn )
+    if ( direct_map )
     {
-        if ( i < spfn )
-            i = spfn;
-        ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
-                               epfn - i, __PAGE_HYPERVISOR_RW);
-        if ( ret )
-            goto destroy_directmap;
+        i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
+        if ( spfn < i )
+        {
+            ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
+                                   min(epfn, i) - spfn, PAGE_HYPERVISOR);
+            if ( ret )
+                goto destroy_directmap;
+        }
+        if ( i < epfn )
+        {
+            if ( i < spfn )
+                i = spfn;
+            ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
+                                   epfn - i, __PAGE_HYPERVISOR_RW);
+            if ( ret )
+                goto destroy_directmap;
+        }
     }
 
     old_node_start = node_start_pfn(node);
@@ -1398,22 +1396,18 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     }
     else
     {
-        if (node_start_pfn(node) > spfn)
+        if ( node_start_pfn(node) > spfn )
             NODE_DATA(node)->node_start_pfn = spfn;
-        if (node_end_pfn(node) < epfn)
+        if ( node_end_pfn(node) < epfn )
             NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
     }
 
-    info.spfn = spfn;
-    info.epfn = epfn;
-    info.cur = spfn;
-
-    ret = extend_frame_table(&info);
+    ret = extend_frame_table(info);
     if ( ret )
         goto restore_node_status;
 
     /* Set max_page as setup_m2p_table will use it*/
-    if (max_page < epfn)
+    if ( max_page < epfn )
     {
         max_page = epfn;
         max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1421,7 +1415,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(&info);
+    ret = setup_m2p_table(info);
 
     if ( ret )
         goto destroy_m2p;
@@ -1429,11 +1423,12 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
     {
         for ( i = spfn; i < epfn; i++ )
-            if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
+            if ( iommu_map_page(hardware_domain, i, i,
+                                IOMMUF_readable|IOMMUF_writable) )
                 break;
         if ( i != epfn )
         {
-            while (i-- > old_max)
+            while ( i-- > old_max )
                 /* If statement to satisfy __must_check. */
                 if ( iommu_unmap_page(hardware_domain, i) )
                     continue;
@@ -1442,14 +1437,10 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
         }
     }
 
-    /* We can't revert any more */
-    share_hotadd_m2p_table(&info);
-    transfer_pages_to_heap(&info);
-
     return 0;
 
 destroy_m2p:
-    destroy_m2p_mapping(&info);
+    destroy_m2p_mapping(info);
     max_page = old_max;
     total_pages = old_total;
     max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1459,9 +1450,32 @@ restore_node_status:
         node_set_offline(node);
     NODE_DATA(node)->node_start_pfn = old_node_start;
     NODE_DATA(node)->node_spanned_pages = old_node_span;
- destroy_directmap:
-    destroy_xen_mappings((unsigned long)mfn_to_virt(spfn),
-                         (unsigned long)mfn_to_virt(epfn));
+destroy_directmap:
+    if ( direct_map )
+        destroy_xen_mappings((unsigned long)mfn_to_virt(spfn),
+                             (unsigned long)mfn_to_virt(epfn));
+
+    return ret;
+}
+
+/*
+ * A bit paranoid for memory allocation failure issue since
+ * it may be reason for memory add
+ */
+int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+{
+    struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    int ret;
+
+    dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
+
+    ret = memory_add_common(&info, pxm, true);
+    if ( !ret )
+    {
+        /* We can't revert any more */
+        share_hotadd_m2p_table(&info);
+        transfer_pages_to_heap(&info);
+    }
 
     return ret;
 }
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (13 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add() Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

As the existing data in a PMEM region is persistent, the Xen hypervisor
has no knowledge of which part is free to be used for the frame table
and M2P table of that PMEM region. Instead, we allow users or system
admins to specify the location of the frame table and M2P table. The
location is not necessarily at the beginning of the PMEM region, which
is different from the case of hotplugged RAM.

This commit adds support for a customized page allocation function,
which is used to allocate the memory for the frame table and M2P
table. No page free function is added; we require that all allocated
pages can be reclaimed, or have no effect outside of
memory_add_common(), if memory_add_common() fails.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 83 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 69 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c8ffafe8a8..d92307ca0b 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -106,13 +106,44 @@ struct mem_hotadd_info
     unsigned long cur;
 };
 
+struct mem_hotadd_alloc
+{
+    /*
+     * Allocate 2^PAGETABLE_ORDER pages.
+     *
+     * No free function is added right now, so we require that all
+     * allocated pages can be reclaimed easily or have no effect outside of
+     * memory_add_common(), if memory_add_common() fails.
+     *
+     * For example, alloc_hotadd_mfn(), which is used in RAM hotplug,
+     * allocates pages from the hotplugged RAM. If memory_add_common()
+     * fails, the hotplugged RAM will not be available to Xen, so
+     * pages allocated by alloc_hotadd_mfn() will never be used and
+     * have no effect.
+     *
+     * Parameters:
+     *  opaque:   arguments of the allocator (depending on the implementation)
+     *
+     * Return:
+     *  On success, return MFN of the first page.
+     *  Otherwise, return mfn_x(INVALID_MFN).
+     */
+    unsigned long (*alloc_mfns)(void *opaque);
+
+    /*
+     * Additional arguments passed to @alloc_mfns().
+     */
+    void *opaque;
+};
+
 static int hotadd_mem_valid(unsigned long pfn, struct mem_hotadd_info *info)
 {
     return (pfn < info->epfn && pfn >= info->spfn);
 }
 
-static unsigned long alloc_hotadd_mfn(struct mem_hotadd_info *info)
+static unsigned long alloc_hotadd_mfn(void *opaque)
 {
+    struct mem_hotadd_info *info = opaque;
     unsigned mfn;
 
     ASSERT((info->cur + ( 1UL << PAGETABLE_ORDER) < info->epfn) &&
@@ -315,7 +346,8 @@ static void destroy_m2p_mapping(struct mem_hotadd_info *info)
  * spfn/epfn: the pfn ranges to be setup
  * free_s/free_e: the pfn ranges that is free still
  */
-static int setup_compat_m2p_table(struct mem_hotadd_info *info)
+static int setup_compat_m2p_table(struct mem_hotadd_info *info,
+                                  struct mem_hotadd_alloc *alloc)
 {
     unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
     unsigned int n;
@@ -369,7 +401,13 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
         if ( n == CNT )
             continue;
 
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc->alloc_mfns(alloc->opaque);
+        if ( mfn == mfn_x(INVALID_MFN) )
+        {
+            err = -ENOMEM;
+            break;
+        }
+
         err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -389,7 +427,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
  * Allocate and map the machine-to-phys table.
  * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
  */
-static int setup_m2p_table(struct mem_hotadd_info *info)
+static int setup_m2p_table(struct mem_hotadd_info *info,
+                           struct mem_hotadd_alloc *alloc)
 {
     unsigned long i, va, smap, emap;
     unsigned int n;
@@ -438,7 +477,13 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
                 break;
         if ( n < CNT )
         {
-            unsigned long mfn = alloc_hotadd_mfn(info);
+            unsigned long mfn = alloc->alloc_mfns(alloc->opaque);
+
+            if ( mfn == mfn_x(INVALID_MFN) )
+            {
+                ret = -ENOMEM;
+                goto error;
+            }
 
             ret = map_pages_to_xen(
                         RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
@@ -483,7 +528,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 #undef CNT
 #undef MFN
 
-    ret = setup_compat_m2p_table(info);
+    ret = setup_compat_m2p_table(info, alloc);
 error:
     return ret;
 }
@@ -762,7 +807,7 @@ static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 }
 
 static int setup_frametable_chunk(void *start, void *end,
-                                  struct mem_hotadd_info *info)
+                                  struct mem_hotadd_alloc *alloc)
 {
     unsigned long s = (unsigned long)start;
     unsigned long e = (unsigned long)end;
@@ -774,7 +819,13 @@ static int setup_frametable_chunk(void *start, void *end,
 
     for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
     {
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc->alloc_mfns(alloc->opaque);
+        if ( mfn == mfn_x(INVALID_MFN) )
+        {
+            err = -ENOMEM;
+            break;
+        }
+
         err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -789,7 +840,8 @@ static int setup_frametable_chunk(void *start, void *end,
     return err;
 }
 
-static int extend_frame_table(struct mem_hotadd_info *info)
+static int extend_frame_table(struct mem_hotadd_info *info,
+                              struct mem_hotadd_alloc *alloc)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
     int err = 0;
@@ -816,7 +868,7 @@ static int extend_frame_table(struct mem_hotadd_info *info)
             nidx = eidx;
         err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
-                                     info);
+                                     alloc);
         if ( err )
             break;
 
@@ -1338,7 +1390,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
 }
 
 static int memory_add_common(struct mem_hotadd_info *info,
-                             unsigned int pxm, bool direct_map)
+                             unsigned int pxm, bool direct_map,
+                             struct mem_hotadd_alloc *alloc)
 {
     unsigned long spfn = info->spfn, epfn = info->epfn;
     int ret;
@@ -1402,7 +1455,7 @@ static int memory_add_common(struct mem_hotadd_info *info,
             NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
     }
 
-    ret = extend_frame_table(info);
+    ret = extend_frame_table(info, alloc);
     if ( ret )
         goto restore_node_status;
 
@@ -1415,7 +1468,7 @@ static int memory_add_common(struct mem_hotadd_info *info,
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(info);
+    ret = setup_m2p_table(info, alloc);
 
     if ( ret )
         goto destroy_m2p;
@@ -1465,11 +1518,13 @@ destroy_directmap:
 int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 {
     struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    struct mem_hotadd_alloc alloc =
+        { .alloc_mfns = alloc_hotadd_mfn, .opaque = &info };
     int ret;
 
     dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
 
-    ret = memory_add_common(&info, pxm, true);
+    ret = memory_add_common(&info, pxm, true, &alloc);
     if ( !ret )
     {
         /* We can't revert any more */
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (14 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt' Haozhong Zhang
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Chao Peng, Dan Williams

Add a command XEN_SYSCTL_nvdimm_pmem_setup to hypercall
XEN_SYSCTL_nvdimm_op to set up the frame table and M2P table of a PMEM
region. This command is currently used to set up the management PMEM
region, which is used to store the frame table and M2P table of other
PMEM regions and of itself. The management PMEM region should not be
mapped to any guest.

PMEM pages are not added to any Xen or domain heap. A new flag
PGC_pmem_page is used to indicate whether a page is from PMEM and to
avoid returning PMEM pages to the heaps.
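
As a rough illustration (not part of this patch), a toolstack program
could drive the new libxc wrapper as sketched below; the MFN range is a
made-up placeholder and error handling is minimal:

    #include <stdio.h>
    #include <xenctrl.h>

    /* Sketch only: the MFN range passed in is a placeholder. */
    static int example_setup_mgmt(unsigned long smfn, unsigned long emfn)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        int rc;

        if ( !xch )
            return -1;

        /* Ask Xen to use [smfn, emfn) for frame table / M2P storage. */
        rc = xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
        if ( rc )
            fprintf(stderr, "setup-mgmt failed: %d\n", rc);

        xc_interface_close(xch);
        return rc;
    }

This is essentially what the xen-ndctl 'setup-mgmt' command added later
in this series does after parsing its MFN arguments.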

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h |  16 +++++
 tools/libxc/xc_misc.c         |  34 ++++++++++
 xen/arch/x86/mm.c             |   3 +-
 xen/arch/x86/x86_64/mm.c      |  72 +++++++++++++++++++++
 xen/common/pmem.c             | 142 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h      |  10 ++-
 xen/include/public/sysctl.h   |  18 ++++++
 xen/include/xen/pmem.h        |   8 +++
 8 files changed, 301 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d750e67460..7c5707fe11 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2605,6 +2605,22 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
 int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
                                void *buffer, uint32_t *nr);
 
+/*
+ * Set up the specified PMEM pages for management usage. On success,
+ * these PMEM pages can be used to store the frametable and M2P table
+ * of themselves and of other PMEM pages. These management PMEM pages
+ * will never be mapped to a guest.
+ *
+ * Parameters:
+ *  xch:        xc interface handle
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index f9ce802eda..bebe6d04c8 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -975,6 +975,40 @@ out:
     return rc;
 }
 
+static void xc_nvdimm_pmem_setup_common(struct xen_sysctl *sysctl,
+                                        unsigned long smfn, unsigned long emfn,
+                                        unsigned long mgmt_smfn,
+                                        unsigned long mgmt_emfn)
+{
+    xen_sysctl_nvdimm_op_t *nvdimm = &sysctl->u.nvdimm;
+    xen_sysctl_nvdimm_pmem_setup_t *setup = &nvdimm->u.pmem_setup;
+
+    sysctl->cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_setup;
+    nvdimm->pad = 0;
+    nvdimm->err = 0;
+    setup->smfn = smfn;
+    setup->emfn = emfn;
+    setup->mgmt_smfn = mgmt_smfn;
+    setup->mgmt_emfn = mgmt_emfn;
+}
+
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn)
+{
+    DECLARE_SYSCTL;
+    int rc;
+
+    xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, smfn, emfn);
+    sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_MGMT;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( rc && sysctl.u.nvdimm.err )
+        rc = -sysctl.u.nvdimm.err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2fdf609805..93ccf198c9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2341,7 +2341,8 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
+             && cleanup_page_cacheattr(page) == 0 )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index d92307ca0b..7dbc5e966c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1535,6 +1535,78 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     return ret;
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static void pmem_init_frame_table(unsigned long smfn, unsigned long emfn)
+{
+    struct page_info *page = mfn_to_page(smfn), *epage = mfn_to_page(emfn);
+
+    while ( page < epage )
+    {
+        page->count_info = PGC_state_free | PGC_pmem_page;
+        page++;
+    }
+}
+
+/**
+ * Initialize frametable and M2P for the specified PMEM region.
+ *
+ * Parameters:
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and end MFN of the PMEM region used to store
+ *              the frame table and M2P table of above PMEM region. If
+ *              @smfn - @emfn is going to be mapped to guest, it should
+ *              not overlap with @mgmt_smfn - @mgmt_emfn. If @smfn - @emfn
+ *              is going to be used for management purpose, it should
+ *              be identical to @mgmt_smfn - @mgmt_emfn.
+ *  used_mgmt_mfns: return the number of pages used in @mgmt_smfn - @mgmt_emfn
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
+                    unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                    unsigned long *used_mgmt_mfns)
+{
+    struct mem_hotadd_info info =
+        { .spfn = smfn, .epfn = emfn, .cur = smfn };
+    struct mem_hotadd_info mgmt_info =
+        { .spfn = mgmt_smfn, .epfn = mgmt_emfn, .cur = mgmt_smfn };
+    struct mem_hotadd_alloc alloc =
+    {
+        .alloc_mfns = alloc_hotadd_mfn,
+        .opaque     = &mgmt_info
+    };
+    bool is_mgmt = (mgmt_smfn == smfn && mgmt_emfn == emfn);
+    int rc;
+
+    if ( mgmt_smfn == mfn_x(INVALID_MFN) || mgmt_emfn == mfn_x(INVALID_MFN) ||
+         mgmt_smfn >= mgmt_emfn )
+        return -EINVAL;
+
+    if ( !is_mgmt &&
+         ((smfn >= mgmt_smfn && smfn < mgmt_emfn) ||
+          (emfn > mgmt_smfn && emfn <= mgmt_emfn)) )
+        return -EINVAL;
+
+    rc = memory_add_common(&info, pxm, false, &alloc);
+    if ( rc )
+        return rc;
+
+    pmem_init_frame_table(smfn, emfn);
+
+    if ( !is_mgmt )
+        share_hotadd_m2p_table(&info);
+
+    if ( used_mgmt_mfns )
+        *used_mgmt_mfns = mgmt_info.cur - mgmt_info.spfn;
+
+    return 0;
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #include "compat/mm.c"
 
 /*
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index a737e7dc71..7a081c2879 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -31,6 +31,15 @@
 static LIST_HEAD(pmem_raw_regions);
 static unsigned int nr_raw_regions;
 
+/*
+ * All PMEM regions reserved for management purpose are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions.
+ */
+static LIST_HEAD(pmem_mgmt_regions);
+static DEFINE_SPINLOCK(pmem_mgmt_lock);
+static unsigned int nr_mgmt_regions;
+
 struct pmem {
     struct list_head link; /* link to one of PMEM region list */
     unsigned long smfn;    /* start MFN of the PMEM region */
@@ -40,6 +49,10 @@ struct pmem {
         struct {
             unsigned int pxm; /* proximity domain of the PMEM region */
         } raw;
+
+        struct {
+            unsigned long used; /* # of used pages in MGMT PMEM region */
+        } mgmt;
     } u;
 };
 
@@ -107,6 +120,18 @@ static int pmem_list_add(struct list_head *list,
     return rc;
 }
 
+/**
+ * Delete the specified entry from the list to which it's currently linked.
+ *
+ * Parameters:
+ *  entry: the entry to be deleted
+ */
+static void pmem_list_del(struct pmem *entry)
+{
+    list_del(&entry->link);
+    xfree(entry);
+}
+
 static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
 {
     int rc = 0;
@@ -185,6 +210,114 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
     return rc;
 }
 
+static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
+{
+    return mgmt_mfns >=
+        ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
+        ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+}
+
+static bool check_address_and_pxm(unsigned long smfn, unsigned long emfn,
+                                  unsigned int *ret_pxm)
+{
+    struct list_head *cur;
+    long pxm = -1;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        struct pmem *raw = list_entry(cur, struct pmem, link);
+        unsigned long raw_smfn = raw->smfn, raw_emfn = raw->emfn;
+
+        if ( !check_overlap(smfn, emfn, raw_smfn, raw_emfn) )
+            continue;
+
+        if ( smfn < raw_smfn )
+            return false;
+
+        if ( pxm != -1 && pxm != raw->u.raw.pxm )
+            return false;
+        pxm = raw->u.raw.pxm;
+
+        smfn = min(emfn, raw_emfn);
+        if ( smfn == emfn )
+            break;
+    }
+
+    *ret_pxm = pxm;
+
+    return smfn == emfn;
+}
+
+static int pmem_setup_mgmt(unsigned long smfn, unsigned long emfn)
+{
+    struct pmem *mgmt;
+    unsigned long used_mgmt_mfns;
+    unsigned int pxm;
+    int rc;
+
+    if ( smfn == mfn_x(INVALID_MFN) || emfn == mfn_x(INVALID_MFN) ||
+         smfn >= emfn )
+        return -EINVAL;
+
+    /*
+     * Require the PMEM region in one proximity domain, in order to
+     * avoid the error recovery from multiple calls to pmem_arch_setup()
+     * which is not revertible.
+     */
+    if ( !check_address_and_pxm(smfn, emfn, &pxm) )
+        return -EINVAL;
+
+    if ( !check_mgmt_size(emfn - smfn, emfn - smfn) )
+        return -ENOSPC;
+
+    spin_lock(&pmem_mgmt_lock);
+
+    rc = pmem_list_add(&pmem_mgmt_regions, smfn, emfn, &mgmt);
+    if ( rc )
+        goto out;
+
+    rc = pmem_arch_setup(smfn, emfn, pxm, smfn, emfn, &used_mgmt_mfns);
+    if ( rc )
+    {
+        pmem_list_del(mgmt);
+        goto out;
+    }
+
+    mgmt->u.mgmt.used = used_mgmt_mfns;
+    nr_mgmt_regions++;
+
+ out:
+    spin_unlock(&pmem_mgmt_lock);
+
+    return rc;
+}
+
+static int pmem_setup(unsigned long smfn, unsigned long emfn,
+                      unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                      unsigned int type)
+{
+    int rc;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_MGMT:
+        if ( smfn != mgmt_smfn || emfn != mgmt_emfn )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = pmem_setup_mgmt(smfn, emfn);
+
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -234,6 +367,15 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
         rc = pmem_get_regions(&nvdimm->u.pmem_regions);
         break;
 
+    case XEN_SYSCTL_nvdimm_pmem_setup:
+    {
+        struct xen_sysctl_nvdimm_pmem_setup *setup = &nvdimm->u.pmem_setup;
+        rc = pmem_setup(setup->smfn, setup->emfn,
+                        setup->mgmt_smfn, setup->mgmt_emfn,
+                        setup->type);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index bef45e8e9f..33a732846f 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -245,9 +245,11 @@ struct page_info
 #define PGC_state_offlined PG_mask(2, 9)
 #define PGC_state_free    PG_mask(3, 9)
 #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+/* Page is from PMEM? */
+#define PGC_pmem_page     PG_mask(1, 10)
 
  /* Count of references to this frame. */
-#define PGC_count_width   PG_shift(9)
+#define PGC_count_width   PG_shift(10)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
 /*
@@ -264,6 +266,12 @@ struct page_info
     ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) &&  \
      (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
 
+#ifdef CONFIG_NVDIMM_PMEM
+#define is_pmem_page(page) ((page)->count_info & PGC_pmem_page)
+#else
+#define is_pmem_page(page) false
+#endif
+
 #define PRtype_info "016lx"/* should only be used for printk's */
 
 /* The number of out-of-sync shadows we allow per vcpu (prime, please) */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 2635b1c911..5d208033a0 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1120,6 +1120,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
+#define PMEM_REGION_TYPE_MGMT       1 /* PMEM regions for management usage */
 
 /* PMEM_REGION_TYPE_RAW */
 struct xen_sysctl_nvdimm_pmem_raw_region {
@@ -1154,14 +1155,31 @@ struct xen_sysctl_nvdimm_pmem_regions {
 typedef struct xen_sysctl_nvdimm_pmem_regions xen_sysctl_nvdimm_pmem_regions_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_t);
 
+/* XEN_SYSCTL_nvdimm_pmem_setup */
+struct xen_sysctl_nvdimm_pmem_setup {
+    /* IN variables */
+    uint64_t smfn;      /* start MFN of the PMEM region */
+    uint64_t emfn;      /* end MFN of the PMEM region */
+    uint64_t mgmt_smfn;
+    uint64_t mgmt_emfn; /* start and end MFN of PMEM pages used to manage */
+                        /* above PMEM region. If the above PMEM region is */
+                        /* a management region, mgmt_{s,e}mfn is required */
+                        /* to be identical to {s,e}mfn. */
+    uint8_t  type;      /* Only PMEM_REGION_TYPE_MGMT is supported now */
+};
+typedef struct xen_sysctl_nvdimm_pmem_setup xen_sysctl_nvdimm_pmem_setup_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_setup_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
 #define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
 #define XEN_SYSCTL_nvdimm_pmem_get_regions        1
+#define XEN_SYSCTL_nvdimm_pmem_setup              2
     uint32_t pad; /* IN: Always zero. */
     union {
         xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
         xen_sysctl_nvdimm_pmem_regions_t pmem_regions;
+        xen_sysctl_nvdimm_pmem_setup_t pmem_setup;
     } u;
     uint32_t err; /* OUT: error code */
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 922b12f570..9323d679a6 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -29,6 +29,9 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 #ifdef CONFIG_X86
 
 int pmem_dom0_setup_permission(struct domain *d);
+int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
+                    unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                    unsigned long *used_mgmt_mfns);
 
 #else /* !CONFIG_X86 */
 
@@ -37,6 +40,11 @@ static inline int pmem_dom0_setup_permission(...)
     return -ENOSYS;
 }
 
+static inline int pmem_arch_setup(...)
+{
+    return -ENOSYS;
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
-- 
2.14.1



* [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (15 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

This command requests the Xen hypervisor to set up the specified PMEM
range for management usage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 6277a1eda2..1289a83dbe 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -36,6 +36,7 @@ static xc_interface *xch;
 static int handle_help(int argc, char *argv[]);
 static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
+static int handle_setup_mgmt(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
 {
@@ -69,6 +70,14 @@ static const struct xen_ndctl_cmd
         .help    = "List all supported commands.\n",
         .handler = handle_list_cmds,
     },
+
+    {
+        .name    = "setup-mgmt",
+        .syntax  = "<smfn> <emfn>",
+        .help    = "Setup a PMEM region from MFN 'smfn' to 'emfn' for management usage.\n\n",
+        .handler = handle_setup_mgmt,
+        .need_xc = true,
+    },
 };
 
 static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]);
@@ -197,6 +206,42 @@ static int handle_list_cmds(int argc, char *argv[])
     return 0;
 }
 
+static bool string_to_mfn(const char *str, unsigned long *ret)
+{
+    unsigned long l;
+
+    errno = 0;
+    l = strtoul(str, NULL, 0);
+
+    if ( !errno )
+        *ret = l;
+    else
+        fprintf(stderr, "Invalid MFN %s: %s\n", str, strerror(errno));
+
+    return !errno;
+}
+
+static int handle_setup_mgmt(int argc, char **argv)
+{
+    unsigned long smfn, emfn;
+
+    if ( argc < 3 )
+    {
+        fprintf(stderr, "Too few arguments.\n\n");
+        show_help(argv[0]);
+        return -EINVAL;
+    }
+
+    if ( !string_to_mfn(argv[1], &smfn) ||
+         !string_to_mfn(argv[2], &emfn) )
+        return -EINVAL;
+
+    if ( argc > 3 )
+        return handle_unrecognized_argument(argv[0], argv[3]);
+
+    return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
+}
+
 int main(int argc, char *argv[])
 {
     unsigned int i;
-- 
2.14.1



* [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (16 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt' Haozhong Zhang
@ 2017-09-11  4:37 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
management PMEM regions.
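
As a sketch (not part of this patch), a caller holding an open xc
interface handle 'xch' could query the count as below, assuming the
usual <stdio.h>/<string.h> headers and the sysctl type constants are
visible, as they are in xen-ndctl:

    static void show_nr_mgmt_regions(xc_interface *xch)
    {
        uint32_t nr = 0;
        int rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT,
                                               &nr);

        if ( rc )
            fprintf(stderr, "cannot get the number of management regions: %s\n",
                    strerror(-rc));
        else
            printf("%u management PMEM region(s)\n", nr);
    }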

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c | 4 +++-
 xen/common/pmem.c     | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index bebe6d04c8..4b5558aaa5 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -894,7 +894,9 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
     xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
     int rc;
 
-    if ( !nr || type != PMEM_REGION_TYPE_RAW )
+    if ( !nr ||
+         (type != PMEM_REGION_TYPE_RAW &&
+          type != PMEM_REGION_TYPE_MGMT) )
         return -EINVAL;
 
     sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 7a081c2879..54b3e7119a 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -142,6 +142,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
         regions_nr->num_regions = nr_raw_regions;
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        regions_nr->num_regions = nr_mgmt_regions;
+        break;
+
     default:
         rc = -EINVAL;
     }
-- 
2.14.1



* [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (17 preceding siblings ...)
  2017-09-11  4:37 ` [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list' Haozhong Zhang
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of
management PMEM regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c       |  8 ++++++++
 xen/common/pmem.c           | 45 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 11 +++++++++++
 3 files changed, 64 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 4b5558aaa5..3ad254f5ae 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -939,6 +939,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
+        break;
+
     default:
         return -EINVAL;
     }
@@ -960,6 +964,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
+        break;
+
     default:
         rc = -EINVAL;
         goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 54b3e7119a..dcd8160407 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -190,6 +190,47 @@ static int pmem_get_raw_regions(
     return rc;
 }
 
+static int pmem_get_mgmt_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_mgmt_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    spin_lock(&pmem_mgmt_lock);
+
+    list_for_each(cur, &pmem_mgmt_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.used_mfns = pmem->u.mgmt.used;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    spin_unlock(&pmem_mgmt_lock);
+
+    *num_regions = nr;
+
+    return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
     unsigned int type = regions->type, max = regions->num_regions;
@@ -204,6 +245,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
         rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5d208033a0..f825716446 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1131,6 +1131,15 @@ struct xen_sysctl_nvdimm_pmem_raw_region {
 typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
 
+/* PMEM_REGION_TYPE_MGMT */
+struct xen_sysctl_nvdimm_pmem_mgmt_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint64_t used_mfns;
+};
+typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1149,6 +1158,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
     union {
         /* if type == PMEM_REGION_TYPE_RAW */
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+        /* if type == PMEM_REGION_TYPE_MGMT */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
     } u_buffer;           /* IN: the guest handler where the entries of PMEM
                                  regions of the type @type are returned */
 };
-- 
2.14.1



* [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (18 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If the option '--mgmt' is present, the command 'list' will list all
PMEM regions for management usage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 1289a83dbe..058f8ccaf5 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -57,9 +57,10 @@ static const struct xen_ndctl_cmd
 
     {
         .name    = "list",
-        .syntax  = "[--all | --raw ]",
+        .syntax  = "[--all | --raw | --mgmt]",
         .help    = "--all: the default option, list all PMEM regions of following types.\n"
-                   "--raw: list all PMEM regions detected by Xen hypervisor.\n",
+                   "--raw: list all PMEM regions detected by Xen hypervisor.\n"
+                   "--mgmt: list all PMEM regions for management usage.\n",
         .handler = handle_list,
         .need_xc = true,
     },
@@ -162,12 +163,46 @@ static int handle_list_raw(void)
     return rc;
 }
 
+static int handle_list_mgmt(void)
+{
+    int rc;
+    unsigned int nr = 0, i;
+    xen_sysctl_nvdimm_pmem_mgmt_region_t *mgmt_list;
+
+    rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT, &nr);
+    if ( rc )
+    {
+        fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+                strerror(-rc));
+        return rc;
+    }
+
+    mgmt_list = malloc(nr * sizeof(*mgmt_list));
+    if ( !mgmt_list )
+        return -ENOMEM;
+
+    rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_MGMT, mgmt_list, &nr);
+    if ( rc )
+        goto out;
+
+    printf("Management PMEM regions:\n");
+    for ( i = 0; i < nr; i++ )
+        printf(" %u: MFN 0x%lx - 0x%lx, used 0x%lx\n",
+               i, mgmt_list[i].smfn, mgmt_list[i].emfn, mgmt_list[i].used_mfns);
+
+ out:
+    free(mgmt_list);
+
+    return rc;
+}
+
 static const struct list_handlers {
     const char *option;
     int (*handler)(void);
 } list_hndrs[] =
 {
     { "--raw", handle_list_raw },
+    { "--mgmt", handle_list_mgmt },
 };
 
 static const unsigned int nr_list_hndrs =
-- 
2.14.1



* [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (19 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list' Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data' Haozhong Zhang
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow the command XEN_SYSCTL_nvdimm_pmem_setup of hypercall
XEN_SYSCTL_nvdimm_op to set up a PMEM region for guest data
usage. After the setup, that PMEM region can be mapped into
the guest address space.
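
As a sketch (not part of this patch), a toolstack could invoke the new
setup path as below; all MFN values are placeholders, 'xch' is an open
xc interface handle, and the usual <stdio.h>/<string.h> headers are
assumed:

    /*
     * The management range must lie inside a region previously
     * registered with xc_nvdimm_pmem_setup_mgmt(), must start past the
     * pages that region already uses for itself (see the 'used' field
     * reported for management regions), and must not overlap the data
     * range.
     */
    static int example_setup_data(xc_interface *xch)
    {
        unsigned long data_smfn = 0x500000, data_emfn = 0x580000;
        unsigned long mgmt_smfn = 0x140400, mgmt_emfn = 0x150000;
        int rc = xc_nvdimm_pmem_setup_data(xch, data_smfn, data_emfn,
                                           mgmt_smfn, mgmt_emfn);

        if ( rc )
            fprintf(stderr, "setup-data failed: %s\n", strerror(-rc));

        return rc;
    }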

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h |  22 ++++++++
 tools/libxc/xc_misc.c         |  17 ++++++
 xen/common/pmem.c             | 118 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/sysctl.h   |   3 +-
 4 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 7c5707fe11..41e5e3408c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2621,6 +2621,28 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
                               unsigned long smfn, unsigned long emfn);
 
+/*
+ * Set up the specified PMEM pages for guest data usage. On success,
+ * these PMEM pages can be mapped to a guest and used as the backend
+ * of vNVDIMM devices.
+ *
+ * Parameters:
+ *  xch:        xc interface handle
+ *  smfn, emfn: the start and end of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and the end MFN of the PMEM region that is
+ *              used to manage this PMEM region. It must be in one of
+ *              those added by xc_nvdimm_pmem_setup_mgmt() calls, and
+ *              not overlap with @smfn - @emfn.
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn,
+                              unsigned long mgmt_smfn, unsigned long mgmt_emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 3ad254f5ae..ef2e9e0656 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -1019,6 +1019,23 @@ int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
     return rc;
 }
 
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn,
+                              unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    DECLARE_SYSCTL;
+    int rc;
+
+    xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, mgmt_smfn, mgmt_emfn);
+    sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_DATA;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( rc && sysctl.u.nvdimm.err )
+        rc = -sysctl.u.nvdimm.err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index dcd8160407..6891ed7a47 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -34,16 +34,26 @@ static unsigned int nr_raw_regions;
 /*
  * All PMEM regions reserved for management purpose are linked to this
  * list. All of them must be covered by one or multiple PMEM regions
- * in list pmem_raw_regions.
+ * in list pmem_raw_regions, and not appear in list pmem_data_regions.
  */
 static LIST_HEAD(pmem_mgmt_regions);
 static DEFINE_SPINLOCK(pmem_mgmt_lock);
 static unsigned int nr_mgmt_regions;
 
+/*
+ * All PMEM regions that can be mapped to guest are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions, and not appear in list pmem_mgmt_regions.
+ */
+static LIST_HEAD(pmem_data_regions);
+static DEFINE_SPINLOCK(pmem_data_lock);
+static unsigned int nr_data_regions;
+
 struct pmem {
     struct list_head link; /* link to one of PMEM region list */
     unsigned long smfn;    /* start MFN of the PMEM region */
     unsigned long emfn;    /* end MFN of the PMEM region */
+    spinlock_t lock;
 
     union {
         struct {
@@ -53,6 +63,11 @@ struct pmem {
         struct {
             unsigned long used; /* # of used pages in MGMT PMEM region */
         } mgmt;
+
+        struct {
+            unsigned long mgmt_smfn; /* start MFN of management region */
+            unsigned long mgmt_emfn; /* end MFN of management region */
+        } data;
     } u;
 };
 
@@ -111,6 +126,7 @@ static int pmem_list_add(struct list_head *list,
     }
     new_pmem->smfn = smfn;
     new_pmem->emfn = emfn;
+    spin_lock_init(&new_pmem->lock);
     list_add(&new_pmem->link, cur);
 
  out:
@@ -261,9 +277,16 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 
 static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
 {
-    return mgmt_mfns >=
+    unsigned long required =
         ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
         ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+
+    if ( required > mgmt_mfns )
+        printk(XENLOG_DEBUG "PMEM: insufficient management pages, "
+               "0x%lx pages required, 0x%lx pages available\n",
+               required, mgmt_mfns);
+
+    return mgmt_mfns >= required;
 }
 
 static bool check_address_and_pxm(unsigned long smfn, unsigned long emfn,
@@ -341,6 +364,93 @@ static int pmem_setup_mgmt(unsigned long smfn, unsigned long emfn)
     return rc;
 }
 
+static struct pmem *find_mgmt_region(unsigned long smfn, unsigned long emfn)
+{
+    struct list_head *cur;
+
+    ASSERT(spin_is_locked(&pmem_mgmt_lock));
+
+    list_for_each(cur, &pmem_mgmt_regions)
+    {
+        struct pmem *mgmt = list_entry(cur, struct pmem, link);
+
+        if ( smfn >= mgmt->smfn && emfn <= mgmt->emfn )
+            return mgmt;
+    }
+
+    return NULL;
+}
+
+static int pmem_setup_data(unsigned long smfn, unsigned long emfn,
+                           unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    struct pmem *data, *mgmt = NULL;
+    unsigned long used_mgmt_mfns;
+    unsigned int pxm;
+    int rc;
+
+    if ( smfn == mfn_x(INVALID_MFN) || emfn == mfn_x(INVALID_MFN) ||
+         smfn >= emfn )
+        return -EINVAL;
+
+    /*
+     * Require the PMEM region in one proximity domain, in order to
+     * avoid the error recovery from multiple calls to pmem_arch_setup()
+     * which is not revertible.
+     */
+    if ( !check_address_and_pxm(smfn, emfn, &pxm) )
+        return -EINVAL;
+
+    if ( mgmt_smfn == mfn_x(INVALID_MFN) || mgmt_emfn == mfn_x(INVALID_MFN) ||
+         mgmt_smfn >= mgmt_emfn )
+        return -EINVAL;
+
+    spin_lock(&pmem_mgmt_lock);
+    mgmt = find_mgmt_region(mgmt_smfn, mgmt_emfn);
+    if ( !mgmt )
+    {
+        spin_unlock(&pmem_mgmt_lock);
+        return -ENXIO;
+    }
+    spin_unlock(&pmem_mgmt_lock);
+
+    spin_lock(&mgmt->lock);
+
+    if ( mgmt_smfn < mgmt->smfn + mgmt->u.mgmt.used ||
+         !check_mgmt_size(mgmt_emfn - mgmt_smfn, emfn - smfn) )
+    {
+        spin_unlock(&mgmt->lock);
+        return -ENOSPC;
+    }
+
+    spin_lock(&pmem_data_lock);
+
+    rc = pmem_list_add(&pmem_data_regions, smfn, emfn, &data);
+    if ( rc )
+        goto out;
+    data->u.data.mgmt_smfn = data->u.data.mgmt_emfn = mfn_x(INVALID_MFN);
+
+    rc = pmem_arch_setup(smfn, emfn, pxm,
+                         mgmt_smfn, mgmt_emfn, &used_mgmt_mfns);
+    if ( rc )
+    {
+        pmem_list_del(data);
+        goto out;
+    }
+
+    mgmt->u.mgmt.used = mgmt_smfn - mgmt->smfn + used_mgmt_mfns;
+    data->u.data.mgmt_smfn = mgmt_smfn;
+    data->u.data.mgmt_emfn = mgmt->smfn + mgmt->u.mgmt.used;
+
+    nr_data_regions++;
+
+ out:
+    spin_unlock(&pmem_data_lock);
+    spin_unlock(&mgmt->lock);
+
+    return rc;
+}
+
 static int pmem_setup(unsigned long smfn, unsigned long emfn,
                       unsigned long mgmt_smfn, unsigned long mgmt_emfn,
                       unsigned int type)
@@ -360,6 +470,10 @@ static int pmem_setup(unsigned long smfn, unsigned long emfn,
 
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        rc = pmem_setup_data(smfn, emfn, mgmt_smfn, mgmt_emfn);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index f825716446..d7c12f23fb 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1121,6 +1121,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
 #define PMEM_REGION_TYPE_MGMT       1 /* PMEM regions for management usage */
+#define PMEM_REGION_TYPE_DATA       2 /* PMEM regions for guest data */
 
 /* PMEM_REGION_TYPE_RAW */
 struct xen_sysctl_nvdimm_pmem_raw_region {
@@ -1176,7 +1177,7 @@ struct xen_sysctl_nvdimm_pmem_setup {
                         /* above PMEM region. If the above PMEM region is */
                         /* a management region, mgmt_{s,e}mfn is required */
                         /* to be identical to {s,e}mfn. */
-    uint8_t  type;      /* Only PMEM_REGION_TYPE_MGMT is supported now */
+    uint8_t  type;      /* Must be one of PMEM_REGION_TYPE_{MGMT, DATA} */
 };
 typedef struct xen_sysctl_nvdimm_pmem_setup xen_sysctl_nvdimm_pmem_setup_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_setup_t);
-- 
2.14.1



* [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (20 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

This command requests the Xen hypervisor to set up the specified PMEM
range for guest data usage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 058f8ccaf5..320633ae05 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -37,6 +37,7 @@ static int handle_help(int argc, char *argv[]);
 static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
 static int handle_setup_mgmt(int argc, char *argv[]);
+static int handle_setup_data(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
 {
@@ -72,6 +73,18 @@ static const struct xen_ndctl_cmd
         .handler = handle_list_cmds,
     },
 
+    {
+        .name    = "setup-data",
+        .syntax  = "<smfn> <emfn> <mgmt_smfn> <mgmt_emfn>",
+        .help    = "Setup a PMEM region from MFN 'smfn' to 'emfn' for guest data usage,\n"
+                   "which can be used as the backend of the virtual NVDIMM devices.\n\n"
+                   "PMEM pages from MFN 'mgmt_smfn' to 'mgmt_emfn' is used to manage\n"
+                   "the above PMEM region, and should not overlap with MFN from 'smfn'\n"
+                   "to 'emfn'.\n",
+        .handler = handle_setup_data,
+        .need_xc = true,
+    },
+
     {
         .name    = "setup-mgmt",
         .syntax  = "<smfn> <emfn>",
@@ -277,6 +290,29 @@ static int handle_setup_mgmt(int argc, char **argv)
     return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
 }
 
+static int handle_setup_data(int argc, char **argv)
+{
+    unsigned long smfn, emfn, mgmt_smfn, mgmt_emfn;
+
+    if ( argc < 5 )
+    {
+        fprintf(stderr, "Too few arguments.\n\n");
+        show_help(argv[0]);
+        return -EINVAL;
+    }
+
+    if ( !string_to_mfn(argv[1], &smfn) ||
+         !string_to_mfn(argv[2], &emfn) ||
+         !string_to_mfn(argv[3], &mgmt_smfn) ||
+         !string_to_mfn(argv[4], &mgmt_emfn) )
+        return -EINVAL;
+
+    if ( argc > 5 )
+        return handle_unrecognized_argument(argv[0], argv[5]);
+
+    return xc_nvdimm_pmem_setup_data(xch, smfn, emfn, mgmt_smfn, mgmt_emfn);
+}
+
 int main(int argc, char *argv[])
 {
     unsigned int i;
-- 
2.14.1



* [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (21 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data' Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
data PMEM regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c | 3 ++-
 xen/common/pmem.c     | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index ef2e9e0656..db74df853a 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -896,7 +896,8 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
 
     if ( !nr ||
          (type != PMEM_REGION_TYPE_RAW &&
-          type != PMEM_REGION_TYPE_MGMT) )
+          type != PMEM_REGION_TYPE_MGMT &&
+          type != PMEM_REGION_TYPE_DATA) )
         return -EINVAL;
 
     sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 6891ed7a47..cbe557c220 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -162,6 +162,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
         regions_nr->num_regions = nr_mgmt_regions;
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        regions_nr->num_regions = nr_data_regions;
+        break;
+
     default:
         rc = -EINVAL;
     }
-- 
2.14.1



* [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (22 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list' Haozhong Zhang
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of data PMEM
regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_misc.c       |  8 ++++++++
 xen/common/pmem.c           | 46 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 12 ++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index db74df853a..93a1f8fdc5 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -944,6 +944,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        size = sizeof(xen_sysctl_nvdimm_pmem_data_region_t) * max;
+        break;
+
     default:
         return -EINVAL;
     }
@@ -969,6 +973,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        set_xen_guest_handle(regions->u_buffer.data_regions, buffer);
+        break;
+
     default:
         rc = -EINVAL;
         goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index cbe557c220..ed4a014c30 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -251,6 +251,48 @@ static int pmem_get_mgmt_regions(
     return rc;
 }
 
+static int pmem_get_data_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_data_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    spin_lock(&pmem_data_lock);
+
+    list_for_each(cur, &pmem_data_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.mgmt_smfn = pmem->u.data.mgmt_smfn;
+        region.mgmt_emfn = pmem->u.data.mgmt_emfn;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    spin_unlock(&pmem_data_lock);
+
+    *num_regions = nr;
+
+    return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
     unsigned int type = regions->type, max = regions->num_regions;
@@ -269,6 +311,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
         rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        rc = pmem_get_data_regions(regions->u_buffer.data_regions, &max);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index d7c12f23fb..8595ea438a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1141,6 +1141,16 @@ struct xen_sysctl_nvdimm_pmem_mgmt_region {
 typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
 
+/* PMEM_REGION_TYPE_DATA */
+struct xen_sysctl_nvdimm_pmem_data_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint64_t mgmt_smfn;
+    uint64_t mgmt_emfn;
+};
+typedef struct xen_sysctl_nvdimm_pmem_data_region xen_sysctl_nvdimm_pmem_data_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_data_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1161,6 +1171,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
         /* if type == PMEM_REGION_TYPE_MGMT */
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
+        /* if type == PMEM_REGION_TYPE_DATA */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) data_regions;
     } u_buffer;           /* IN: the guest handler where the entries of PMEM
                                  regions of the type @type are returned */
 };
-- 
2.14.1



* [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list'
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (23 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If the option '--data' is present, the command 'list' will list all
PMEM regions for guest data usage.
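
With this option in place, an invocation would print something like the
following; the values are purely illustrative:

    # xen-ndctl list --data
    Data PMEM regions:
     0: MFN 0x500000 - 0x580000, MGMT MFN 0x140400 - 0x141800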

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/misc/xen-ndctl.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 320633ae05..33817863ca 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -58,10 +58,11 @@ static const struct xen_ndctl_cmd
 
     {
         .name    = "list",
-        .syntax  = "[--all | --raw | --mgmt]",
+        .syntax  = "[--all | --raw | --mgmt | --data]",
         .help    = "--all: the default option, list all PMEM regions of following types.\n"
                    "--raw: list all PMEM regions detected by Xen hypervisor.\n"
-                   "--mgmt: list all PMEM regions for management usage.\n",
+                   "--mgmt: list all PMEM regions for management usage.\n"
+                   "--data: list all PMEM regions that can be mapped to guest.\n",
         .handler = handle_list,
         .need_xc = true,
     },
@@ -209,6 +210,40 @@ static int handle_list_mgmt(void)
     return rc;
 }
 
+static int handle_list_data(void)
+{
+    int rc;
+    unsigned int nr = 0, i;
+    xen_sysctl_nvdimm_pmem_data_region_t *data_list;
+
+    rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_DATA, &nr);
+    if ( rc )
+    {
+        fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+                strerror(-rc));
+        return rc;
+    }
+
+    data_list = malloc(nr * sizeof(*data_list));
+    if ( !data_list )
+        return -ENOMEM;
+
+    rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_DATA, data_list, &nr);
+    if ( rc )
+        goto out;
+
+    printf("Data PMEM regions:\n");
+    for ( i = 0; i < nr; i++ )
+        printf(" %u: MFN 0x%lx - 0x%lx, MGMT MFN 0x%lx - 0x%lx\n",
+               i, data_list[i].smfn, data_list[i].emfn,
+               data_list[i].mgmt_smfn, data_list[i].mgmt_emfn);
+
+ out:
+    free(data_list);
+
+    return rc;
+}
+
 static const struct list_handlers {
     const char *option;
     int (*handler)(void);
@@ -216,6 +251,7 @@ static const struct list_handlers {
 {
     { "--raw", handle_list_raw },
     { "--mgmt", handle_list_mgmt },
+    { "--data", handle_list_data },
 };
 
 static const unsigned int nr_list_hndrs =
-- 
2.14.1



* [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (24 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list' Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

pmem_populate() is added to map the specified data PMEM pages to an
HVM domain. No caller is added in this commit.
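
Since no caller exists yet, the following is only a sketch of how a
future call site might drive pmem_populate(); the field names of
struct xen_pmem_map_args are inferred from the accesses made in
pmem_populate() itself and may differ from the final header, and
hypercall continuation handling is elided:

    /* Hypothetical call site; not added by this patch. */
    static int map_data_pmem_to_guest(struct domain *d, unsigned long gfn,
                                      unsigned long mfn, unsigned long nr_mfns)
    {
        struct xen_pmem_map_args args = {
            .domain = d,
            .gfn = gfn,
            .mfn = mfn,
            .nr_mfns = nr_mfns,
            .nr_done = 0,
            .preempted = 0,
        };
        int rc = pmem_populate(&args);

        /*
         * -ERESTART means a hypercall continuation is needed; a real
         * caller would re-invoke pmem_populate() with args.nr_done
         * preserved.
         */
        return rc;
    }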

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/domain.c     |   3 ++
 xen/common/pmem.c       | 141 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pmem.h  |  19 +++++++
 xen/include/xen/sched.h |   3 ++
 4 files changed, 166 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5aebcf265f..4354342b02 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
+    spin_lock_init(&d->pmem_lock);
+    INIT_PAGE_LIST_HEAD(&d->pmem_page_list);
+
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
     d->auto_node_affinity = 1;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index ed4a014c30..2f9ad64a26 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -17,10 +17,12 @@
  */
 
 #include <xen/errno.h>
+#include <xen/event.h>
 #include <xen/list.h>
 #include <xen/iocap.h>
 #include <xen/paging.h>
 #include <xen/pmem.h>
+#include <xen/sched.h>
 
 #include <asm/guest_access.h>
 
@@ -78,6 +80,31 @@ static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
            (emfn1 > smfn2 && emfn1 <= emfn2);
 }
 
+static bool check_cover(struct list_head *list,
+                        unsigned long smfn, unsigned long emfn)
+{
+    struct list_head *cur;
+    struct pmem *pmem;
+    unsigned long pmem_smfn, pmem_emfn;
+
+    list_for_each(cur, list)
+    {
+        pmem = list_entry(cur, struct pmem, link);
+        pmem_smfn = pmem->smfn;
+        pmem_emfn = pmem->emfn;
+
+        if ( smfn < pmem_smfn )
+            return false;
+
+        if ( emfn <= pmem_emfn )
+            return true;
+
+        smfn = max(smfn, pmem_emfn);
+    }
+
+    return false;
+}
+
 /**
  * Add a PMEM region to a list. All PMEM regions in the list are
  * sorted in the ascending order of the start address. A PMEM region,
@@ -600,6 +627,120 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 
 #ifdef CONFIG_X86
 
+static int pmem_assign_page(struct domain *d, struct page_info *pg,
+                            unsigned long gfn)
+{
+    int rc;
+
+    if ( pg->count_info != (PGC_state_free | PGC_pmem_page) )
+        return -EBUSY;
+
+    pg->count_info = PGC_allocated | PGC_state_inuse | PGC_pmem_page | 1;
+    pg->u.inuse.type_info = 0;
+    page_set_owner(pg, d);
+
+    rc = guest_physmap_add_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+    if ( rc )
+    {
+        page_set_owner(pg, NULL);
+        pg->count_info = PGC_state_free | PGC_pmem_page;
+
+        return rc;
+    }
+
+    spin_lock(&d->pmem_lock);
+    page_list_add_tail(pg, &d->pmem_page_list);
+    spin_unlock(&d->pmem_lock);
+
+    return 0;
+}
+
+static int pmem_unassign_page(struct domain *d, struct page_info *pg,
+                              unsigned long gfn)
+{
+    int rc;
+
+    spin_lock(&d->pmem_lock);
+    page_list_del(pg, &d->pmem_page_list);
+    spin_unlock(&d->pmem_lock);
+
+    rc = guest_physmap_remove_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+
+    page_set_owner(pg, NULL);
+    pg->count_info = PGC_state_free | PGC_pmem_page;
+
+    return 0;
+}
+
+int pmem_populate(struct xen_pmem_map_args *args)
+{
+    struct domain *d = args->domain;
+    unsigned long i = args->nr_done;
+    unsigned long mfn = args->mfn + i;
+    unsigned long emfn = args->mfn + args->nr_mfns;
+    unsigned long gfn = args->gfn + i;
+    struct page_info *page;
+    int rc = 0, err = 0;
+
+    if ( unlikely(d->is_dying) )
+        return -EINVAL;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    spin_lock(&pmem_data_lock);
+
+    if ( !check_cover(&pmem_data_regions, mfn, emfn) )
+    {
+        rc = -ENXIO;
+        goto out;
+    }
+
+    for ( ; mfn < emfn; i++, mfn++, gfn++ )
+    {
+        if ( i != args->nr_done && hypercall_preempt_check() )
+        {
+            args->preempted = 1;
+            rc = -ERESTART;
+            break;
+        }
+
+        page = mfn_to_page(mfn);
+        if ( !page_state_is(page, free) )
+        {
+            rc = -EBUSY;
+            break;
+        }
+
+        rc = pmem_assign_page(d, page, gfn);
+        if ( rc )
+            break;
+    }
+
+ out:
+    if ( rc && rc != -ERESTART )
+        while ( i-- && !err )
+            err = pmem_unassign_page(d, mfn_to_page(--mfn), --gfn);
+
+    spin_unlock(&pmem_data_lock);
+
+    if ( unlikely(err) )
+    {
+        /*
+         * If we unfortunately fail to recover from the previous
+         * failure, some PMEM pages may still be mapped to the
+         * domain. As pmem_populate() is now called only during domain
+         * creation, let's crash the domain.
+         */
+        domain_crash(d);
+        rc = err;
+    }
+
+    args->nr_done = i;
+
+    return rc;
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
     struct list_head *cur;
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 9323d679a6..2dab90530b 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -33,6 +33,20 @@ int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
                     unsigned long mgmt_smfn, unsigned long mgmt_emfn,
                     unsigned long *used_mgmt_mfns);
 
+struct xen_pmem_map_args {
+    struct domain *domain;
+
+    unsigned long mfn;     /* start MFN of pmem pages to be mapped */
+    unsigned long gfn;     /* start GFN of target domain */
+    unsigned long nr_mfns; /* number of pmem pages to be mapped */
+
+    /* For preemption ... */
+    unsigned long nr_done; /* number of pmem pages processed so far */
+    int preempted;         /* Is the operation preempted? */
+};
+
+int pmem_populate(struct xen_pmem_map_args *args);
+
 #else /* !CONFIG_X86 */
 
 static inline int pmem_dom0_setup_permission(...)
@@ -45,6 +59,11 @@ static inline int pmem_arch_setup(...)
     return -ENOSYS;
 }
 
+static inline int pmem_populate(...)
+{
+    return -ENOSYS;
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5b8f8c68ea..de5b85b1dd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -323,6 +323,9 @@ struct domain
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
 
+    spinlock_t       pmem_lock;       /* protect all following pmem_ fields */
+    struct page_list_head pmem_page_list; /* linked list of PMEM pages      */
+
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
     struct cpupool  *cpupool;
-- 
2.14.1



* [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (25 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, George Dunlap, Andrew Cooper, Jan Beulich,
	Chao Peng, Dan Williams

A new step RELMEM_pmem is added and taken before RELMEM_xen to release
all PMEM pages mapped to a HVM domain.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
---
 xen/arch/x86/domain.c        | 32 ++++++++++++++++++++++++++++----
 xen/arch/x86/mm.c            |  9 +++++++--
 xen/common/pmem.c            | 10 ++++++++++
 xen/include/asm-x86/domain.h |  1 +
 xen/include/xen/pmem.h       |  6 ++++++
 5 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbddc536d3..1c4e788780 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1755,11 +1755,15 @@ static int relinquish_memory(
 {
     struct page_info  *page;
     unsigned long     x, y;
+    bool              is_pmem_list = (list == &d->pmem_page_list);
     int               ret = 0;
 
     /* Use a recursive lock, as we may enter 'free_domheap_page'. */
     spin_lock_recursive(&d->page_alloc_lock);
 
+    if ( is_pmem_list )
+        spin_lock(&d->pmem_lock);
+
     while ( (page = page_list_remove_head(list)) )
     {
         /* Grab a reference to the page so it won't disappear from under us. */
@@ -1841,8 +1845,9 @@ static int relinquish_memory(
             }
         }
 
-        /* Put the page on the list and /then/ potentially free it. */
-        page_list_add_tail(page, &d->arch.relmem_list);
+        if ( !is_pmem_list )
+            /* Put the page on the list and /then/ potentially free it. */
+            page_list_add_tail(page, &d->arch.relmem_list);
         put_page(page);
 
         if ( hypercall_preempt_check() )
@@ -1852,10 +1857,13 @@ static int relinquish_memory(
         }
     }
 
-    /* list is empty at this point. */
-    page_list_move(list, &d->arch.relmem_list);
+    if ( !is_pmem_list )
+        /* list is empty at this point. */
+        page_list_move(list, &d->arch.relmem_list);
 
  out:
+    if ( is_pmem_list )
+        spin_unlock(&d->pmem_lock);
     spin_unlock_recursive(&d->page_alloc_lock);
     return ret;
 }
@@ -1922,13 +1930,29 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
+#ifndef CONFIG_NVDIMM_PMEM
         d->arch.relmem = RELMEM_xen;
+#else
+        d->arch.relmem = RELMEM_pmem;
+#endif
 
         spin_lock(&d->page_alloc_lock);
         page_list_splice(&d->arch.relmem_list, &d->page_list);
         INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
         spin_unlock(&d->page_alloc_lock);
 
+#ifdef CONFIG_NVDIMM_PMEM
+        /* Fallthrough. Relinquish every page of PMEM. */
+    case RELMEM_pmem:
+        if ( is_hvm_domain(d) )
+        {
+            ret = relinquish_memory(d, &d->pmem_page_list, ~0UL);
+            if ( ret )
+                return ret;
+        }
+        d->arch.relmem = RELMEM_xen;
+#endif
+
         /* Fallthrough. Relinquish every page of memory. */
     case RELMEM_xen:
         ret = relinquish_memory(d, &d->xenpage_list, ~0UL);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 93ccf198c9..26f9e5a13e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -106,6 +106,7 @@
 #include <xen/efi.h>
 #include <xen/grant_table.h>
 #include <xen/hypercall.h>
+#include <xen/pmem.h>
 #include <asm/paging.h>
 #include <asm/shadow.h>
 #include <asm/page.h>
@@ -2341,8 +2342,12 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
-             && cleanup_page_cacheattr(page) == 0 )
+#ifdef CONFIG_NVDIMM_PMEM
+        if ( is_pmem_page(page) )
+            pmem_page_cleanup(page);
+        else
+#endif
+        if ( cleanup_page_cacheattr(page) == 0 )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 2f9ad64a26..8b9378dce6 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -741,6 +741,16 @@ int pmem_populate(struct xen_pmem_map_args *args)
     return rc;
 }
 
+void pmem_page_cleanup(struct page_info *page)
+{
+    ASSERT(is_pmem_page(page));
+    ASSERT((page->count_info & PGC_count_mask) == 0);
+
+    page->count_info = PGC_pmem_page | PGC_state_free;
+    page_set_owner(page, NULL);
+    set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY);
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
     struct list_head *cur;
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index fb8bf17458..8322546b5d 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -303,6 +303,7 @@ struct arch_domain
     enum {
         RELMEM_not_started,
         RELMEM_shared,
+        RELMEM_pmem,
         RELMEM_xen,
         RELMEM_l4,
         RELMEM_l3,
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 2dab90530b..dfbc412065 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NVDIMM_PMEM
 
 #include <public/sysctl.h>
+#include <xen/mm.h>
 #include <xen/types.h>
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
@@ -46,6 +47,7 @@ struct xen_pmem_map_args {
 };
 
 int pmem_populate(struct xen_pmem_map_args *args);
+void pmem_page_cleanup(struct page_info *page);
 
 #else /* !CONFIG_X86 */
 
@@ -64,6 +66,10 @@ static inline int pmem_populate(...)
     return -ENOSYS;
 }
 
+static inline void pmem_page_cleanup(...)
+{
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
-- 
2.14.1



* [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (26 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model Haozhong Zhang
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams, Daniel De Graaf

This hypercall will be used by device models to map host PMEM pages to
the guest.
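As an illustration, a device model could invoke it through the new
libxc wrapper roughly as sketched below; the domain id, MFN, GFN and
page count are placeholders, not values taken from this patch:

    #include <stdio.h>
    #include <xenctrl.h>

    /* Sketch: map 'nr' host PMEM pages starting at 'mfn' to guest frame
     * 'gfn' of domain 'domid'. All values come from the caller. */
    static int map_vnvdimm(uint32_t domid, unsigned long mfn,
                           unsigned long gfn, unsigned long nr)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        int rc;

        if ( !xch )
            return -1;

        rc = xc_domain_populate_pmem_map(xch, domid, mfn, gfn, nr);
        if ( rc )
            fprintf(stderr, "XENMEM_populate_pmem_map failed: %d\n", rc);

        xc_interface_close(xch);
        return rc;
    }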

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/flask/policy/modules/xen.if   |  2 +-
 tools/libxc/include/xenctrl.h       | 17 ++++++++++++++
 tools/libxc/xc_domain.c             | 15 +++++++++++++
 xen/common/compat/memory.c          |  1 +
 xen/common/memory.c                 | 44 +++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h         | 14 +++++++++++-
 xen/include/xsm/dummy.h             | 11 ++++++++++
 xen/include/xsm/xsm.h               | 12 ++++++++++
 xen/xsm/dummy.c                     |  4 ++++
 xen/xsm/flask/hooks.c               | 13 +++++++++++
 xen/xsm/flask/policy/access_vectors |  2 ++
 11 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 912640002e..9634dee25f 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -55,7 +55,7 @@ define(`create_domain_common', `
 			psr_cmt_op psr_cat_op soft_reset };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
-	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
+	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp populate_pmem_map };
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { cacheattr getparam hvmctl sethvmc
 			setparam nested altp2mhvm altp2mhvm_op dm };
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 41e5e3408c..a81dcdbe58 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2643,6 +2643,23 @@ int xc_nvdimm_pmem_setup_data(xc_interface *xch,
                               unsigned long smfn, unsigned long emfn,
                               unsigned long mgmt_smfn, unsigned long mgmt_emfn);
 
+/*
+ * Map specified host PMEM pages to the specified guest address.
+ *
+ * Parameters:
+ *  xch:     xc interface handle
+ *  domid:   the target domain id
+ *  mfn:     the start MFN of the PMEM pages
+ *  gfn:     the start GFN of the target guest physical pages
+ *  nr_mfns: the number of PMEM pages to be mapped
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+                                unsigned long mfn, unsigned long gfn,
+                                unsigned long nr_mfns);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3bab4e8bab..b548da750a 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2397,6 +2397,21 @@ int xc_domain_soft_reset(xc_interface *xch,
     domctl.domain = (domid_t)domid;
     return do_domctl(xch, &domctl);
 }
+
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+                                unsigned long mfn, unsigned long gfn,
+                                unsigned long nr_mfns)
+{
+    struct xen_pmem_map args = {
+        .domid   = domid,
+        .mfn     = mfn,
+        .gfn     = gfn,
+        .nr_mfns = nr_mfns,
+    };
+
+    return do_memory_op(xch, XENMEM_populate_pmem_map, &args, sizeof(args));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 35bb259808..51bec835b9 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -525,6 +525,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
         case XENMEM_add_to_physmap:
         case XENMEM_remove_from_physmap:
         case XENMEM_access_op:
+        case XENMEM_populate_pmem_map:
             break;
 
         case XENMEM_get_vnumainfo:
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 26da6050f6..31ef480562 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -23,6 +23,7 @@
 #include <xen/numa.h>
 #include <xen/mem_access.h>
 #include <xen/trace.h>
+#include <xen/pmem.h>
 #include <asm/current.h>
 #include <asm/hardirq.h>
 #include <asm/p2m.h>
@@ -1379,6 +1380,49 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 #endif
 
+#ifdef CONFIG_NVDIMM_PMEM
+    case XENMEM_populate_pmem_map:
+    {
+        struct xen_pmem_map map;
+        struct xen_pmem_map_args args;
+
+        if ( copy_from_guest(&map, arg, 1) )
+            return -EFAULT;
+
+        if ( map.domid == DOMID_SELF )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( !d )
+            return -EINVAL;
+
+        rc = xsm_populate_pmem_map(XSM_TARGET, curr_d, d);
+        if ( rc )
+        {
+            rcu_unlock_domain(d);
+            return rc;
+        }
+
+        args.domain = d;
+        args.mfn = map.mfn;
+        args.gfn = map.gfn;
+        args.nr_mfns = map.nr_mfns;
+        args.nr_done = start_extent;
+        args.preempted = 0;
+
+        rc = pmem_populate(&args);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART && args.preempted )
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh",
+                op | (args.nr_done << MEMOP_EXTENT_SHIFT), arg);
+
+        break;
+    }
+#endif /* CONFIG_NVDIMM_PMEM */
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 29386df98b..d74436e4b0 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -650,7 +650,19 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 28 */
+#define XENMEM_populate_pmem_map 28
+
+struct xen_pmem_map {
+    /* IN */
+    domid_t domid;
+    unsigned long mfn;
+    unsigned long gfn;
+    unsigned int nr_mfns;
+};
+typedef struct xen_pmem_map xen_pmem_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmem_map_t);
+
+/* Next available subop number is 29 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index ba89ea4bc1..6107da308c 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -724,3 +724,14 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+#ifdef CONFIG_NVDIMM_PMEM
+
+static XSM_INLINE int xsm_populate_pmem_map(XSM_DEFAULT_ARG
+                                            struct domain *d1, struct domain *d2)
+{
+    XSM_ASSERT_ACTION(XSM_TARGET);
+    return xsm_default_action(action, d1, d2);
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 7f7feffc68..e43e79f719 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -180,6 +180,10 @@ struct xsm_operations {
     int (*dm_op) (struct domain *d);
 #endif
     int (*xen_version) (uint32_t cmd);
+
+#ifdef CONFIG_NVDIMM_PMEM
+    int (*populate_pmem_map) (struct domain *d1, struct domain *d2);
+#endif
 };
 
 #ifdef CONFIG_XSM
@@ -692,6 +696,14 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
     return xsm_ops->xen_version(op);
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+static inline int xsm_populate_pmem_map(xsm_default_t def,
+                                        struct domain *d1, struct domain *d2)
+{
+    return xsm_ops->populate_pmem_map(d1, d2);
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 479b103614..4d65eaca61 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -157,4 +157,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, dm_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+
+#ifdef CONFIG_NVDIMM_PMEM
+    set_to_dummy_if_null(ops, populate_pmem_map);
+#endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index edfe529495..d91f246b47 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1719,6 +1719,15 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static int flask_populate_pmem_map(struct domain *d1, struct domain *d2)
+{
+    return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__POPULATE_PMEM_MAP);
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1852,6 +1861,10 @@ static struct xsm_operations flask_ops = {
     .dm_op = flask_dm_op,
 #endif
     .xen_version = flask_xen_version,
+
+#ifdef CONFIG_NVDIMM_PMEM
+    .populate_pmem_map = flask_populate_pmem_map,
+#endif
 };
 
 void __init flask_init(const void *policy_buffer, size_t policy_size)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index af05826064..fe32fd93c8 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -387,6 +387,8 @@ class mmu
 # Allow a privileged domain to install a map of a page it does not own.  Used
 # for stub domain device models with the PV framebuffer.
     target_hack
+# XENMEM_populate_pmem_map
+    populate_pmem_map
 }
 
 # control of the paging_domctl split by subop
-- 
2.14.1



* [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (27 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc Haozhong Zhang
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

Some virtual devices (e.g. NVDIMM) require complex ACPI tables and
definition blocks (in AML), which a device model (e.g. QEMU) is
already able to construct. Instead of introducing a redundant
implementation to Xen, we would like to reuse the device model to
construct those ACPI objects.

This commit allows Xen to reserve an area in guest memory for the
device model to pass its ACPI tables and definition blocks to the
guest, where they will be loaded by hvmloader. The base guest physical
address and the size of the reserved area are passed to the device
model via the XenStore keys hvmloader/dm-acpi/{address, length}. An xl
config option "dm_acpi_pages = N" is added to specify the number of
reserved guest memory pages.
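For example, a device model could locate the reserved area roughly as
sketched below with libxenstore. The absolute paths match the keys
written by libxl in this patch; the helper name and the calling domid
are assumptions for illustration:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    /* Sketch: return the guest physical address and length of the
     * reserved DM ACPI area of domain 'domid'. */
    static int read_dm_acpi_area(unsigned int domid,
                                 uint64_t *addr, uint64_t *length)
    {
        struct xs_handle *xsh = xs_open(0);
        char path[64];
        char *val;
        unsigned int sz;

        if ( !xsh )
            return -1;

        snprintf(path, sizeof(path),
                 "/local/domain/%u/hvmloader/dm-acpi/address", domid);
        val = xs_read(xsh, XBT_NULL, path, &sz);
        *addr = val ? strtoull(val, NULL, 0) : 0;
        free(val);

        snprintf(path, sizeof(path),
                 "/local/domain/%u/hvmloader/dm-acpi/length", domid);
        val = xs_read(xsh, XBT_NULL, path, &sz);
        *length = val ? strtoull(val, NULL, 0) : 0;
        free(val);

        xs_close(xsh);
        return (*addr && *length) ? 0 : -1;
    }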

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h            |  1 +
 tools/libxc/xc_dom_x86.c                | 13 +++++++++++++
 tools/libxl/libxl_dom.c                 | 25 +++++++++++++++++++++++++
 tools/libxl/libxl_types.idl             |  1 +
 tools/xl/xl_parse.c                     | 17 ++++++++++++++++-
 xen/include/public/hvm/hvm_xs_strings.h |  8 ++++++++
 6 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index ce47058c41..7c541576e7 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -93,6 +93,7 @@ struct xc_dom_image {
     struct xc_dom_seg pgtables_seg;
     struct xc_dom_seg devicetree_seg;
     struct xc_dom_seg start_info_seg; /* HVMlite only */
+    struct xc_dom_seg dm_acpi_seg;    /* reserved PFNs for DM ACPI */
     xen_pfn_t start_info_pfn;
     xen_pfn_t console_pfn;
     xen_pfn_t xenstore_pfn;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..8755350295 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -674,6 +674,19 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
                          ioreq_server_pfn(0));
         xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                          NR_IOREQ_SERVER_PAGES);
+
+        if ( dom->dm_acpi_seg.pages )
+        {
+            size_t acpi_size = dom->dm_acpi_seg.pages * XC_DOM_PAGE_SIZE(dom);
+
+            rc = xc_dom_alloc_segment(dom, &dom->dm_acpi_seg, "DM ACPI",
+                                      0, acpi_size);
+            if ( rc != 0 )
+            {
+                DOMPRINTF("Unable to reserve memory for DM ACPI");
+                goto out;
+            }
+        }
     }
 
     rc = xc_dom_alloc_segment(dom, &dom->start_info_seg,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f54fd49a73..bad1719892 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -897,6 +897,29 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
             goto err;
     }
 
+    if (dom->dm_acpi_seg.pages) {
+        uint64_t guest_addr_out = dom->dm_acpi_seg.pfn * XC_DOM_PAGE_SIZE(dom);
+
+        if (guest_addr_out >= 0x100000000ULL) {
+            LOG(ERROR,
+                "Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G",
+                guest_addr_out);
+            goto err;
+        }
+
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid);
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64, guest_addr_out);
+        if (ret)
+            goto err;
+
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid);
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
+                               (uint64_t)(dom->dm_acpi_seg.pages *
+                                          XC_DOM_PAGE_SIZE(dom)));
+        if (ret)
+            goto err;
+    }
+
     return 0;
 
 err:
@@ -1184,6 +1207,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
             dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
     }
 
+    dom->dm_acpi_seg.pages = info->u.hvm.dm_acpi_pages;
+
     rc = libxl__build_dom(gc, domid, info, state, dom);
     if (rc != 0)
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 173d70acec..4acc0457f4 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -565,6 +565,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
                                        ("mca_caps",         uint64),
+                                       ("dm_acpi_pages",    integer),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 02ddd2e90d..ed562a1956 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -810,7 +810,7 @@ void parse_config_data(const char *config_source,
                        libxl_domain_config *d_config)
 {
     const char *buf;
-    long l, vcpus = 0;
+    long l, vcpus = 0, nr_dm_acpi_pages;
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs, *p9devs;
@@ -1929,6 +1929,21 @@ skip_usbdev:
 
 #undef parse_extra_args
 
+    if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
+        b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE) {
+        /* parse 'dm_acpi_pages' */
+        e = xlu_cfg_get_long(config, "dm_acpi_pages", &nr_dm_acpi_pages, 0);
+        if (e && e != ESRCH) {
+            fprintf(stderr, "ERROR: unable to parse dm_acpi_pages.\n");
+            exit(-ERROR_FAIL);
+        }
+        if (!e && nr_dm_acpi_pages <= 0) {
+            fprintf(stderr, "ERROR: require positive dm_acpi_pages.\n");
+            exit(-ERROR_FAIL);
+        }
+        b_info->u.hvm.dm_acpi_pages = nr_dm_acpi_pages;
+    }
+
     /* If we've already got vfb=[] for PV guest then ignore top level
      * VNC config. */
     if (c_info->type == LIBXL_DOMAIN_TYPE_PV && !d_config->num_vfbs) {
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index fea1dd4407..9f04ff2adc 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -80,4 +80,12 @@
  */
 #define HVM_XS_OEM_STRINGS             "bios-strings/oem-%d"
 
+/* If a range of guest memory is reserved to pass ACPI from the device
+ * model (e.g. QEMU), the start address and the size of the reserved
+ * guest memory are specified by following two xenstore values.
+ */
+#define HVM_XS_DM_ACPI_ROOT            "hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS         HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH          HVM_XS_DM_ACPI_ROOT"/length"
+
 #endif /* __XEN_PUBLIC_HVM_HVM_XS_STRINGS_H__ */
-- 
2.14.1



* [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (28 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

The AML builder added later needs to allocate contiguous memory across
multiple calls to mem_ops.alloc(). Therefore, it needs to know the
minimum alignment used by mem_ops.alloc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c | 2 ++
 tools/libacpi/libacpi.h         | 2 ++
 tools/libxl/libxl_x86_acpi.c    | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d24cd..c2218d9fcb 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -990,6 +990,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
 
+    ctxt.min_alloc_byte_align = 16;
+
     acpi_build_tables(&ctxt, config);
 
     hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23b0b..157f63f7bc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -52,6 +52,8 @@ struct acpi_ctxt {
         void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
     } mem_ops;
+
+    uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 176175676f..3b79b2179b 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -183,6 +183,8 @@ int libxl__dom_load_acpi(libxl__gc *gc,
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
+    libxl_ctxt.c.min_alloc_byte_align = 16;
+
     rc = init_acpi_config(gc, dom, b_info, &config);
     if (rc) {
         LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
-- 
2.14.1



* [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (29 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

The location of ACPI blobs passed from the device model is given as a
guest physical address. libacpi needs to convert the guest physical
address to a guest virtual address before it can access those ACPI
blobs.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  6 ++++++
 tools/firmware/hvmloader/util.h |  1 +
 tools/libacpi/libacpi.h         |  1 +
 tools/libxl/libxl_x86_acpi.c    | 10 ++++++++++
 4 files changed, 18 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index c2218d9fcb..2f8a4654b0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -871,6 +871,11 @@ static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
     return virt_to_phys(v);
 }
 
+static void *acpi_p2v(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    return phys_to_virt(p);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
                             uint32_t size, uint32_t align)
 {
@@ -989,6 +994,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.alloc = acpi_mem_alloc;
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
+    ctxt.mem_ops.p2v = acpi_p2v;
 
     ctxt.min_alloc_byte_align = 16;
 
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 2ef854eb8f..e9fe6c6e79 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -200,6 +200,7 @@ xen_pfn_t mem_hole_alloc(uint32_t nr_mfns);
 /* Allocate memory in a reserved region below 4GB. */
 void *mem_alloc(uint32_t size, uint32_t align);
 #define virt_to_phys(v) ((unsigned long)(v))
+#define phys_to_virt(p) ((void *)(p))
 
 /* Allocate memory in a scratch region */
 void *scratch_alloc(uint32_t size, uint32_t align);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 157f63f7bc..f5a1c384bc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -51,6 +51,7 @@ struct acpi_ctxt {
         void *(*alloc)(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align);
         void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
+        void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
     } mem_ops;
 
     uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 3b79b2179b..b14136949c 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -52,6 +52,15 @@ static unsigned long virt_to_phys(struct acpi_ctxt *ctxt, void *v)
             libxl_ctxt->alloc_base_paddr);
 }
 
+static void *phys_to_virt(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    struct libxl_acpi_ctxt *libxl_ctxt =
+        CONTAINER_OF(ctxt, struct libxl_acpi_ctxt, c);
+
+    return (void *)((p - libxl_ctxt->alloc_base_paddr) +
+                    libxl_ctxt->alloc_base_vaddr);
+}
+
 static void *mem_alloc(struct acpi_ctxt *ctxt,
                        uint32_t size, uint32_t align)
 {
@@ -181,6 +190,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
     libxl_ctxt.c.mem_ops.alloc = mem_alloc;
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
+    libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
     libxl_ctxt.c.min_alloc_byte_align = 16;
-- 
2.14.1



* [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (30 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder Haozhong Zhang
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

libacpi needs to access information placed in XenStore in order to
load the ACPI tables and definition blocks built by the device model.
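A later patch is expected to consume these callbacks roughly as
sketched below, walking the entries the device model advertises under
HVM_XS_DM_ACPI_ROOT ("hvmloader/dm-acpi", a path relative to the
domain as seen by hvmloader, introduced in the earlier "reserve guest
memory" patch). The per-entry layout is not defined here, so the read
in the loop is purely illustrative:

    /* Sketch only: enumerate device-model ACPI entries. */
    static void walk_dm_acpi(struct acpi_ctxt *ctxt)
    {
        unsigned int num = 0, i;
        char **entries = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
        const char *val;
        char path[64];

        if ( !entries )
            return;

        for ( i = 0; i < num; i++ )
        {
            snprintf(path, sizeof(path),
                     HVM_XS_DM_ACPI_ROOT"/%s", entries[i]);
            val = ctxt->xs_ops.read(ctxt, path);
            if ( !val )
                continue;
            /* ... parse 'val' and locate/load the corresponding blob ... */
        }
    }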

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c   | 52 +++++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h   |  9 +++++++
 tools/firmware/hvmloader/xenbus.c | 44 +++++++++++++++++++++++----------
 tools/libacpi/libacpi.h           | 10 ++++++++
 tools/libxl/libxl_x86_acpi.c      | 24 ++++++++++++++++++
 5 files changed, 126 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 2f8a4654b0..5b8a4ee9d0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -893,6 +893,53 @@ static uint32_t acpi_lapic_id(unsigned cpu)
     return LAPIC_ID(cpu);
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+    return xenstore_read(path, NULL);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+                         const char *path, const char *value)
+{
+    return xenstore_write(path, value);
+}
+
+static unsigned int count_strings(const char *strings, unsigned int len)
+{
+    const char *p;
+    unsigned int n;
+
+    for ( p = strings, n = 0; p < strings + len; p++ )
+        if ( *p == '\0' )
+            n++;
+
+    return n;
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+                                const char *path, unsigned int *num)
+{
+    const char *strings;
+    char *s, *p, **ret;
+    unsigned int len, n;
+
+    strings = xenstore_directory(path, &len, NULL);
+    if ( !strings )
+        return NULL;
+
+    n = count_strings(strings, len);
+    ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(p) + len, 0);
+    if ( !ret )
+        return NULL;
+    memcpy(&ret[n], strings, len);
+
+    s = (char *)&ret[n];
+    for ( p = s, *num = 0; p < s + len; p += strlen(p) + 1 )
+        ret[(*num)++] = p;
+
+    return ret;
+}
+
 void hvmloader_acpi_build_tables(struct acpi_config *config,
                                  unsigned int physical)
 {
@@ -998,6 +1045,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 
     ctxt.min_alloc_byte_align = 16;
 
+    ctxt.xs_ops.read = acpi_xs_read;
+    ctxt.xs_ops.write = acpi_xs_write;
+    ctxt.xs_ops.directory = acpi_xs_directory;
+    ctxt.xs_opaque = NULL;
+
     acpi_build_tables(&ctxt, config);
 
     hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index e9fe6c6e79..37e62d93c0 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -225,6 +225,15 @@ const char *xenstore_read(const char *path, const char *default_resp);
  */
 int xenstore_write(const char *path, const char *value);
 
+/* Read a xenstore directory. Return NULL, or a nul-terminated string
+ * which contains all names of directory entries. Names are separated
+ * by '\0'. The returned string is in a static buffer, so only valid
+ * until the next xenstore/xenbus operation.  If @default_resp is
+ * specified, it is returned in preference to a NULL or empty string
+ * received from xenstore.
+ */
+const char *xenstore_directory(const char *path, uint32_t *len,
+                               const char *default_resp);
 
 /* Get a HVM param.
  */
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index 2b89a56fce..387c0971e1 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -257,24 +257,16 @@ static int xenbus_recv(uint32_t *reply_len, const char **reply_data,
     return 0;
 }
 
-
-/* Read a xenstore key.  Returns a nul-terminated string (even if the XS
- * data wasn't nul-terminated) or NULL.  The returned string is in a
- * static buffer, so only valid until the next xenstore/xenbus operation.
- * If @default_resp is specified, it is returned in preference to a NULL or
- * empty string received from xenstore.
- */
-const char *xenstore_read(const char *path, const char *default_resp)
+static const char *xenstore_read_common(const char *path, uint32_t *len,
+                                        const char *default_resp, bool is_dir)
 {
-    uint32_t len = 0, type = 0;
+    uint32_t type = 0, expected_type = is_dir ? XS_DIRECTORY : XS_READ;
     const char *answer = NULL;
 
-    xenbus_send(XS_READ,
-                path, strlen(path),
-                "", 1, /* nul separator */
+    xenbus_send(expected_type, path, strlen(path), "", 1, /* nul separator */
                 NULL, 0);
 
-    if ( xenbus_recv(&len, &answer, &type) || (type != XS_READ) )
+    if ( xenbus_recv(len, &answer, &type) || type != expected_type )
         answer = NULL;
 
     if ( (default_resp != NULL) && ((answer == NULL) || (*answer == '\0')) )
@@ -284,6 +276,32 @@ const char *xenstore_read(const char *path, const char *default_resp)
     return answer;
 }
 
+/* Read a xenstore key.  Returns a nul-terminated string (even if the XS
+ * data wasn't nul-terminated) or NULL.  The returned string is in a
+ * static buffer, so only valid until the next xenstore/xenbus operation.
+ * If @default_resp is specified, it is returned in preference to a NULL or
+ * empty string received from xenstore.
+ */
+const char *xenstore_read(const char *path, const char *default_resp)
+{
+    uint32_t len = 0;
+
+    return xenstore_read_common(path, &len, default_resp, false);
+}
+
+/* Read a xenstore directory. Return NULL, or a nul-terminated string
+ * which contains all names of directory entries. Names are separated
+ * by '\0'. The returned string is in a static buffer, so only valid
+ * until the next xenstore/xenbus operation.  If @default_resp is
+ * specified, it is returned in preference to a NULL or empty string
+ * received from xenstore.
+ */
+const char *xenstore_directory(const char *path, uint32_t *len,
+                               const char *default_resp)
+{
+    return xenstore_read_common(path, len, default_resp, true);
+}
+
 /* Write a xenstore key.  @value must be a nul-terminated string. Returns
  * zero on success or a xenstore error code on failure.
  */
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index f5a1c384bc..ab86a35509 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -55,6 +55,16 @@ struct acpi_ctxt {
     } mem_ops;
 
     uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
+
+    struct acpi_xs_ops {
+        const char *(*read)(struct acpi_ctxt *ctxt, const char *path);
+        int (*write)(struct acpi_ctxt *ctxt, const char *path,
+                     const char *value);
+        char **(*directory)(struct acpi_ctxt *ctxt, const char *path,
+                            unsigned int *num);
+    } xs_ops;
+
+    void *xs_opaque;
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index b14136949c..cbfd9a373c 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -98,6 +98,25 @@ static uint32_t acpi_lapic_id(unsigned cpu)
     return cpu * 2;
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+    return libxl__xs_read((libxl__gc *)ctxt->xs_opaque, XBT_NULL, path);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+                         const char *path, const char *value)
+{
+    return libxl__xs_write_checked((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
+                                   path, value);
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+                                const char *path, unsigned int *num)
+{
+    return libxl__xs_directory((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
+                               path, num);
+}
+
 static int init_acpi_config(libxl__gc *gc, 
                             struct xc_dom_image *dom,
                             const libxl_domain_build_info *b_info,
@@ -195,6 +214,11 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
     libxl_ctxt.c.min_alloc_byte_align = 16;
 
+    libxl_ctxt.c.xs_ops.read = acpi_xs_read;
+    libxl_ctxt.c.xs_ops.write = acpi_xs_write;
+    libxl_ctxt.c.xs_ops.directory = acpi_xs_directory;
+    libxl_ctxt.c.xs_opaque = gc;
+
     rc = init_acpi_config(gc, dom, b_info, &config);
     if (rc) {
         LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
-- 
2.14.1



* [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (31 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists Haozhong Zhang
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

It is used by libacpi to generate SSDTs from ACPI namespace devices
built by the device model.
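To illustrate the intended prepend-style usage (the concrete SSDT
generation arrives in a later patch), a device-model provided AML blob
could be wrapped into Scope (\_SB) { Device (NVDR) { ... } } roughly
as below; the blob, its length, the device name and the helper name
are placeholders:

    /* Sketch: wrap 'blob' as Scope (\_SB) { Device (NVDR) { <blob> } }
     * and return the number of bytes built. */
    static int build_nvdimm_ssdt_body(struct acpi_ctxt *ctxt,
                                      const void *blob, uint32_t blob_len,
                                      uint32_t *out_len)
    {
        uint8_t *buf = aml_build_begin(ctxt);

        if ( aml_prepend_blob(buf, blob, blob_len) ||
             aml_prepend_device(buf, "NVDR") ||
             aml_prepend_scope(buf, "\\_SB") )
            return -1;   /* per aml_build.h, the buffer is now inconsistent */

        *out_len = aml_build_end();
        return 0;
    }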

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/Makefile |   3 +-
 tools/libacpi/aml_build.c         | 326 ++++++++++++++++++++++++++++++++++++++
 tools/libacpi/aml_build.h         | 116 ++++++++++++++
 tools/libxl/Makefile              |   3 +-
 4 files changed, 446 insertions(+), 2 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 7c4c0ce535..3e917507c8 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -76,11 +76,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
 DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
-ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
+ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o aml_build.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
 vpath build.c $(ACPI_PATH)
 vpath static_tables.c $(ACPI_PATH)
+vpath aml_build.c $(ACPI_PATH)
 OBJS += $(ACPI_OBJS)
 
 hvmloader: $(OBJS)
diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c
new file mode 100644
index 0000000000..9b4e28ad95
--- /dev/null
+++ b/tools/libacpi/aml_build.c
@@ -0,0 +1,326 @@
+/*
+ * tools/libacpi/aml_build.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+#include "aml_build.h"
+
+#define AML_OP_SCOPE     0x10
+#define AML_OP_EXT       0x5B
+#define AML_OP_DEVICE    0x82
+
+#define ACPI_NAMESEG_LEN 4
+
+struct aml_build_alloctor {
+    struct acpi_ctxt *ctxt;
+    uint8_t *buf;
+    uint32_t capacity;
+    uint32_t used;
+};
+static struct aml_build_alloctor alloc;
+
+static uint8_t *aml_buf_alloc(uint32_t size)
+{
+    uint8_t *buf = NULL;
+    struct acpi_ctxt *ctxt = alloc.ctxt;
+    uint32_t alloc_size, alloc_align = ctxt->min_alloc_byte_align;
+    uint32_t length = alloc.used + size;
+
+    /* Overflow ... */
+    if ( length < alloc.used )
+        return NULL;
+
+    if ( length <= alloc.capacity )
+    {
+        buf = alloc.buf + alloc.used;
+        alloc.used += size;
+    }
+    else
+    {
+        alloc_size = length - alloc.capacity;
+        alloc_size = (alloc_size + alloc_align) & ~(alloc_align - 1);
+        buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align);
+
+        if ( buf &&
+             buf == alloc.buf + alloc.capacity /* cont to existing buf */ )
+        {
+            alloc.capacity += alloc_size;
+            buf = alloc.buf + alloc.used;
+            alloc.used += size;
+        }
+        else
+            buf = NULL;
+    }
+
+    return buf;
+}
+
+static uint32_t get_package_length(uint8_t *pkg)
+{
+    uint32_t len;
+
+    len = pkg - alloc.buf;
+    len = alloc.used - len;
+
+    return len;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   @byte
+ *   the original content in @buf
+ */
+static int build_prepend_byte(uint8_t *buf, uint8_t byte)
+{
+    uint32_t len;
+
+    len = buf - alloc.buf;
+    len = alloc.used - len;
+
+    if ( !aml_buf_alloc(sizeof(uint8_t)) )
+        return -1;
+
+    if ( len )
+        memmove(buf + 1, buf, len);
+    buf[0] = byte;
+
+    return 0;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   AML encoding of four-character @name
+ *   the original content in @buf
+ *
+ * Refer to  ACPI spec 6.1, Sec 20.2.2 "Name Objects Encoding".
+ *
+ * XXX: names of multiple segments (e.g. X.Y.Z) are not supported
+ */
+static int build_prepend_name(uint8_t *buf, const char *name)
+{
+    uint8_t *p = buf;
+    const char *s = name;
+    uint32_t len, name_len;
+
+    while ( *s == '\\' || *s == '^' )
+    {
+        if ( build_prepend_byte(p, (uint8_t) *s) )
+            return -1;
+        ++p;
+        ++s;
+    }
+
+    if ( !*s )
+        return build_prepend_byte(p, 0x00);
+
+    len = p - alloc.buf;
+    len = alloc.used - len;
+    name_len = strlen(s);
+    ASSERT(name_len <= ACPI_NAMESEG_LEN);
+
+    if ( !aml_buf_alloc(ACPI_NAMESEG_LEN) )
+        return -1;
+    if ( len )
+        memmove(p + ACPI_NAMESEG_LEN, p, len);
+    memcpy(p, s, name_len);
+    memcpy(p + name_len, "____", ACPI_NAMESEG_LEN - name_len);
+
+    return 0;
+}
+
+enum {
+    PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
+    PACKAGE_LENGTH_2BYTE_SHIFT = 4,
+    PACKAGE_LENGTH_3BYTE_SHIFT = 12,
+    PACKAGE_LENGTH_4BYTE_SHIFT = 20,
+};
+
+/*
+ * On success, an object in the following form is stored at @pkg.
+ *   AML encoding of package length @length
+ *   the original content in @pkg
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.4 "Package Length Encoding".
+ */
+static int build_prepend_package_length(uint8_t *pkg, uint32_t length)
+{
+    int rc = 0;
+    uint8_t byte;
+    unsigned length_bytes;
+
+    if ( length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT) )
+        length_bytes = 1;
+    else if ( length + 2 < (1 << PACKAGE_LENGTH_3BYTE_SHIFT) )
+        length_bytes = 2;
+    else if ( length + 3 < (1 << PACKAGE_LENGTH_4BYTE_SHIFT) )
+        length_bytes = 3;
+    else
+        length_bytes = 4;
+
+    length += length_bytes;
+
+    switch ( length_bytes )
+    {
+    case 1:
+        byte = length;
+        return build_prepend_byte(pkg, byte);
+
+    case 4:
+        byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
+        if ( build_prepend_byte(pkg, byte) )
+            break;
+        length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
+        /* fall through */
+    case 3:
+        byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
+        if ( build_prepend_byte(pkg, byte) )
+            break;
+        length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
+        /* fall through */
+    case 2:
+        byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
+        if ( build_prepend_byte(pkg, byte) )
+            break;
+        length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
+        /* fall through */
+    }
+
+    if ( !rc )
+    {
+        /*
+         * Most significant two bits of byte zero indicate how many
+         * following bytes are in PkgLength encoding.
+         */
+        byte = ((length_bytes - 1) << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
+        rc = build_prepend_byte(pkg, byte);
+    }
+
+    return rc;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   @op
+ *   AML encoding of package length of @buf
+ *   original content in @buf
+ *
+ * Refer to comments of callers for ACPI spec sections.
+ */
+static int build_prepend_package(uint8_t *buf, uint8_t op)
+{
+    uint32_t length = get_package_length(buf);
+
+    if ( !build_prepend_package_length(buf, length) )
+        return build_prepend_byte(buf, op);
+    else
+        return -1;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   AML_OP_EXT
+ *   @op
+ *   AML encoding of package length of @buf
+ *   original content in @buf
+ *
+ * Refer to comments of callers for ACPI spec sections.
+ */
+static int build_prepend_ext_package(uint8_t *buf, uint8_t op)
+{
+    if ( !build_prepend_package(buf, op) )
+        return build_prepend_byte(buf, AML_OP_EXT);
+    else
+        return -1;
+}
+
+void *aml_build_begin(struct acpi_ctxt *ctxt)
+{
+    uint32_t align = ctxt->min_alloc_byte_align;
+
+    alloc.ctxt = ctxt;
+    alloc.buf = ctxt->mem_ops.alloc(ctxt, align, align);
+    alloc.capacity = align;
+    alloc.used = 0;
+
+    return alloc.buf;
+}
+
+uint32_t aml_build_end(void)
+{
+    return alloc.used;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   the first @length bytes in @blob
+ *   the original content in @buf
+ */
+int aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t blob_length)
+{
+    uint32_t len;
+
+    ASSERT(buf >= alloc.buf);
+    len = buf - alloc.buf;
+    ASSERT(alloc.used >= len);
+    len = alloc.used - len;
+
+    if ( !aml_buf_alloc(blob_length) )
+        return -1;
+    if ( len )
+        memmove(buf + blob_length, buf, len);
+
+    memcpy(buf, blob, blob_length);
+
+    return 0;
+}
+
+/*
+ * On success, an object decoded as below is stored at @buf.
+ *   Device (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.2 "Named Objects Encoding" -
+ * "DefDevice".
+ */
+int aml_prepend_device(uint8_t *buf, const char *name)
+{
+    if ( !build_prepend_name(buf, name) )
+        return build_prepend_ext_package(buf, AML_OP_DEVICE);
+    else
+        return -1;
+}
+
+/*
+ * On success, an object decoded as below is stored at @buf.
+ *   Scope (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.1 "Namespace Modifier Objects
+ * Encoding" - "DefScope".
+ */
+int aml_prepend_scope(uint8_t *buf, const char *name)
+{
+    if ( !build_prepend_name(buf, name) )
+        return build_prepend_package(buf, AML_OP_SCOPE);
+    else
+        return -1;
+}
diff --git a/tools/libacpi/aml_build.h b/tools/libacpi/aml_build.h
new file mode 100644
index 0000000000..30acc0f7a1
--- /dev/null
+++ b/tools/libacpi/aml_build.h
@@ -0,0 +1,116 @@
+/*
+ * tools/libacpi/aml_build.h
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _AML_BUILD_H_
+#define _AML_BUILD_H_
+
+#include <stdint.h>
+#include "libacpi.h"
+
+/*
+ * NB: All aml_prepend_* calls, which build AML code in one ACPI
+ *     table, should be placed between a pair of calls to
+ *     aml_build_begin() and aml_build_end(). Nested aml_build_begin()
+ *     and aml_build_end() are not supported.
+ *
+ * NB: If a call to aml_prepend_*() fails, the AML builder buffer
+ *     will be in an inconsistent state, and any following calls to
+ *     aml_prepend_*() will result in undefined behavior.
+ */
+
+/**
+ * Reset the AML builder and begin a new round of building.
+ *
+ * Parameters:
+ *   ctxt: ACPI context used by the AML builder
+ *
+ * Returns:
+ *   a pointer to the builder buffer where the AML code will be stored
+ */
+void *aml_build_begin(struct acpi_ctxt *ctxt);
+
+/**
+ * Mark the end of a round of AML building.
+ *
+ * Returns:
+ *  the number of bytes in the builder buffer built in this round
+ */
+uint32_t aml_build_end(void);
+
+/**
+ * Prepend a blob, which can contain arbitrary content, to the builder buffer.
+ *
+ * On success, an object in the following form is stored at @buf.
+ *   the first @length bytes in @blob
+ *   the original content in @buf
+ *
+ * Parameters:
+ *   buf:    pointer to the builder buffer
+ *   blob:   pointer to the blob
+ *   length: the number of bytes in the blob
+ *
+ * Return:
+ *   0 on success, -1 on failure.
+ */
+int aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t length);
+
+/**
+ * Prepend an AML device structure to the builder buffer. The existing
+ * data in the builder buffer is included in the AML device.
+ *
+ * On success, an object decoded as below is stored at @buf.
+ *   Device (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.2 "Named Objects Encoding" -
+ * "DefDevice".
+ *
+ * Parameters:
+ *   buf:  pointer to the builder buffer
+ *   name: the name of the device
+ *
+ * Return:
+ *   0 on success, -1 on failure.
+ */
+int aml_prepend_device(uint8_t *buf, const char *name);
+
+/**
+ * Prepend an AML scope structure to the builder buffer. The existing
+ * data in the builder buffer is included in the AML scope.
+ *
+ * On success, an object decoded as below is stored at @buf.
+ *   Scope (@name)
+ *   {
+ *     the original content in @buf
+ *   }
+ *
+ * Refer to ACPI spec 6.1, Sec 20.2.5.1 "Namespace Modifier Objects
+ * Encoding" - "DefScope".
+ *
+ * Parameters:
+ *   buf:  pointer to the builder buffer
+ *   name: the name of the scope
+ *
+ * Return:
+ *   0 on success, -1 on failure.
+ */
+int aml_prepend_scope(uint8_t *buf, const char *name);
+
+#endif /* _AML_BUILD_H_ */
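
For illustration, here is a minimal usage sketch of the API above (not
part of the patch): it wraps a caller-supplied AML blob in
Scope (\_SB) { Device (NAME) { <blob> } }, which mirrors how a later
patch in this series consumes the builder.  @ctxt, @name (a single
ACPI name segment), @blob and @blob_len are assumed to come from the
caller; error handling follows the NB note in aml_build.h.

  #include "aml_build.h"

  static int wrap_nsdev(struct acpi_ctxt *ctxt, const char *name,
                        const void *blob, uint32_t blob_len,
                        uint32_t *out_len)
  {
      uint8_t *buf = aml_build_begin(ctxt);

      if ( !buf )
          return -1;

      if ( aml_prepend_blob(buf, blob, blob_len) ||
           aml_prepend_device(buf, name) ||
           aml_prepend_scope(buf, "\\_SB") )
          return -1;  /* builder buffer is now inconsistent, see NB above */

      *out_len = aml_build_end();
      return 0;
  }
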
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index aee0a4c374..791c9ad05e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -77,11 +77,12 @@ endif
 
 ACPI_PATH  = $(XEN_ROOT)/tools/libacpi
 DSDT_FILES-$(CONFIG_X86) = dsdt_pvh.c
-ACPI_OBJS  = $(patsubst %.c,%.o,$(DSDT_FILES-y)) build.o static_tables.o
+ACPI_OBJS  = $(patsubst %.c,%.o,$(DSDT_FILES-y)) build.o static_tables.o aml_build.o
 $(DSDT_FILES-y): acpi
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libxl_x86_acpi.h\"
 vpath build.c $(ACPI_PATH)/
 vpath static_tables.c $(ACPI_PATH)/
+vpath aml_build.c $(ACPI_PATH)/
 LIBXL_OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
 
 .PHONY: acpi
-- 
2.14.1



* [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (32 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model Haozhong Zhang
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Jan Beulich, Chao Peng,
	Dan Williams

Some guest ACPI tables and namespace devices are constructed by Xen
and should not be loaded from the device model. This commit adds their
table signatures and device names to two blacklists, which will be
used to check for collisions between the guest ACPI constructed by Xen
and the guest ACPI passed from the device model.
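
As an aside, a standalone sketch (not part of the patch) of the
first-free-slot behaviour both blacklists rely on; the entry type and
array size below are illustrative:

  #include <stdint.h>

  #define NR_ENTRIES 64

  static uint64_t sig_blacklist[NR_ENTRIES]; /* zero-initialised => empty */

  /* Return 0 on success, -1 if the blacklist is already full. */
  static int blacklist_sig(uint64_t sig)
  {
      unsigned int i;

      for ( i = 0; i < NR_ENTRIES; i++ )
      {
          if ( sig_blacklist[i] == sig )     /* already blacklisted */
              return 0;
          if ( !sig_blacklist[i] )           /* first free slot */
          {
              sig_blacklist[i] = sig;
              return 0;
          }
      }

      return -1;   /* full: the patch then clears ACPI_HAS_DM */
  }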

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libacpi/build.c   | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h |  5 +++
 2 files changed, 98 insertions(+)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9604..493ca48025 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -56,6 +56,76 @@ struct acpi_info {
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+/* ACPI tables of following signatures should not appear in DM ACPI */
+static uint64_t dm_acpi_signature_blacklist[64];
+/* ACPI namespace devices of following names should not appear in DM ACPI */
+static const char *dm_acpi_devname_blacklist[64];
+
+static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig)
+{
+    unsigned int i, nr = ARRAY_SIZE(dm_acpi_signature_blacklist);
+
+    if ( !(config->table_flags & ACPI_HAS_DM) )
+        return 0;
+
+    for ( i = 0; i < nr; i++ )
+    {
+        uint64_t entry = dm_acpi_signature_blacklist[i];
+
+        if ( entry == sig )
+            return 0;
+        else if ( entry == 0 )
+            break;
+    }
+
+    if ( i >= nr )
+    {
+        config->table_flags &= ~ACPI_HAS_DM;
+
+        printf("ERROR: DM ACPI signature blacklist is full (size %u), "
+               "disable DM ACPI\n", nr);
+
+        return -ENOSPC;
+    }
+
+    dm_acpi_signature_blacklist[i] = sig;
+
+    return 0;
+}
+
+static int dm_acpi_blacklist_devname(struct acpi_config *config,
+                                     const char *devname)
+{
+    unsigned int i, nr = ARRAY_SIZE(dm_acpi_devname_blacklist);
+
+    if ( !(config->table_flags & ACPI_HAS_DM) )
+        return 0;
+
+    for ( i = 0; i < nr; i++ )
+    {
+        const char *entry = dm_acpi_devname_blacklist[i];
+
+        if ( !entry )
+            break;
+        if ( !strncmp(entry, devname, 4) )
+            return 0;
+    }
+
+    if ( i >= nr )
+    {
+        config->table_flags &= ~ACPI_HAS_DM;
+
+        printf("ERROR: DM ACPI devname blacklist is full (size %u), "
+               "disable loading DM ACPI\n", nr);
+
+        return -ENOSPC;
+    }
+
+    dm_acpi_devname_blacklist[i] = devname;
+
+    return 0;
+}
+
 static void set_checksum(
     void *table, uint32_t checksum_offset, uint32_t length)
 {
@@ -360,6 +430,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         madt = construct_madt(ctxt, config, info);
         if (!madt) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, madt);
+        dm_acpi_blacklist_signature(config, madt->header.signature);
     }
 
     /* HPET. */
@@ -368,6 +439,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         hpet = construct_hpet(ctxt, config);
         if (!hpet) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, hpet);
+        dm_acpi_blacklist_signature(config, hpet->header.signature);
     }
 
     /* WAET. */
@@ -377,6 +449,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         if ( !waet )
             return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, waet);
+        dm_acpi_blacklist_signature(config, waet->header.signature);
     }
 
     if ( config->table_flags & ACPI_HAS_SSDT_PM )
@@ -385,6 +458,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         if (!ssdt) return -1;
         memcpy(ssdt, ssdt_pm, sizeof(ssdt_pm));
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
+        dm_acpi_blacklist_devname(config, "AC");
+        dm_acpi_blacklist_devname(config, "BAT0");
+        dm_acpi_blacklist_devname(config, "BAT1");
     }
 
     if ( config->table_flags & ACPI_HAS_SSDT_S3 )
@@ -450,6 +526,8 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
                          offsetof(struct acpi_header, checksum),
                          tcpa->header.length);
         }
+        dm_acpi_blacklist_signature(config, tcpa->header.signature);
+        dm_acpi_blacklist_devname(config, "TPM");
     }
 
     /* SRAT and SLIT */
@@ -459,11 +537,17 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         struct acpi_20_slit *slit = construct_slit(ctxt, config);
 
         if ( srat )
+        {
             table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, srat);
+            dm_acpi_blacklist_signature(config, srat->header.signature);
+        }
         else
             printf("Failed to build SRAT, skipping...\n");
         if ( slit )
+        {
             table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, slit);
+            dm_acpi_blacklist_signature(config, slit->header.signature);
+        }
         else
             printf("Failed to build SLIT, skipping...\n");
     }
@@ -543,6 +627,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     facs = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_facs), 16);
     if (!facs) goto oom;
     memcpy(facs, &Facs, sizeof(struct acpi_20_facs));
+    dm_acpi_blacklist_signature(config, facs->signature);
 
     /*
      * Alternative DSDTs we get linked against. A cover-all DSDT for up to the
@@ -564,6 +649,9 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         if (!dsdt) goto oom;
         memcpy(dsdt, config->dsdt_anycpu, config->dsdt_anycpu_len);
     }
+    dm_acpi_blacklist_signature(config, ((struct acpi_header *)dsdt)->signature);
+    dm_acpi_blacklist_devname(config, "MEM0");
+    dm_acpi_blacklist_devname(config, "PCI0");
 
     /*
      * N.B. ACPI 1.0 operating systems may not handle FADT with revision 2
@@ -583,6 +671,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(fadt_10,
                  offsetof(struct acpi_header, checksum),
                  sizeof(struct acpi_10_fadt));
+    dm_acpi_blacklist_signature(config, fadt_10->header.signature);
 
     switch ( config->acpi_revision )
     {
@@ -634,6 +723,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         fadt->iapc_boot_arch |= ACPI_FADT_NO_CMOS_RTC;
     }
     set_checksum(fadt, offsetof(struct acpi_header, checksum), fadt_size);
+    dm_acpi_blacklist_signature(config, fadt->header.signature);
 
     nr_secondaries = construct_secondary_tables(ctxt, secondary_tables,
                  config, acpi_info);
@@ -652,6 +742,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(xsdt,
                  offsetof(struct acpi_header, checksum),
                  xsdt->header.length);
+    dm_acpi_blacklist_signature(config, xsdt->header.signature);
 
     rsdt = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_rsdt) +
                                sizeof(uint32_t) * nr_secondaries,
@@ -665,6 +756,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(rsdt,
                  offsetof(struct acpi_header, checksum),
                  rsdt->header.length);
+    dm_acpi_blacklist_signature(config, rsdt->header.signature);
 
     /*
      * Fill in low-memory data structures: acpi_info and RSDP.
@@ -680,6 +772,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(rsdp,
                  offsetof(struct acpi_20_rsdp, extended_checksum),
                  sizeof(struct acpi_20_rsdp));
+    dm_acpi_blacklist_signature(config, rsdp->signature);
 
     if ( !new_vm_gid(ctxt, config, acpi_info) )
         goto oom;
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index ab86a35509..87f311bfab 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -36,6 +36,11 @@
 #define ACPI_HAS_8042              (1<<13)
 #define ACPI_HAS_CMOS_RTC          (1<<14)
 #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
+#define ACPI_HAS_DM                (1<<16)
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+#endif
 
 struct xen_vmemrange;
 struct acpi_numa {
-- 
2.14.1



* [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (33 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

ACPI tables built by the device model are loaded after the ACPI tables
built by Xen, provided their signatures do not conflict with the
tables built by Xen (SSDT excepted).

ACPI namespace devices built by the device model, whose names do not
conflict with devices built by Xen, are assembled into SSDTs that are
placed after the ACPI tables built by Xen.
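
For illustration only (the authoritative description is the comment
added to tools/libacpi/build.c below), the device model is expected to
publish one XenStore directory per blob, e.g.:

  HVM_XS_DM_ACPI_ROOT/<dir>/type    DM_ACPI_BLOB_TYPE_TABLE or _NSDEV
  HVM_XS_DM_ACPI_ROOT/<dir>/length  number of bytes in the blob
  HVM_XS_DM_ACPI_ROOT/<dir>/offset  offset of the blob from the start
                                    of the buffer at config->dm.addr

For a namespace device, <dir> is also used as the device name.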

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  15 +++
 tools/libacpi/acpi2_0.h         |   2 +
 tools/libacpi/build.c           | 237 ++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h         |   5 +
 4 files changed, 259 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 5b8a4ee9d0..0468fea490 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -1019,6 +1019,21 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1)  )
         config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE;
 
+    s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL);
+    if ( s )
+    {
+        config->dm.addr = strtoll(s, NULL, 0);
+
+        s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL);
+        if ( s )
+        {
+            config->dm.length = strtoll(s, NULL, 0);
+            config->table_flags |= ACPI_HAS_DM;
+        }
+        else
+            config->dm.addr = 0;
+    }
+
     config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC |
                             ACPI_HAS_WAET | ACPI_HAS_PMTIMER |
                             ACPI_HAS_BUTTONS | ACPI_HAS_VGA |
diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba32db..365825e6bc 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -435,6 +435,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
 
 /*
  * Table revision numbers.
@@ -449,6 +450,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_2_0_SSDT_REVISION 0x02
 
 #pragma pack ()
 
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 493ca48025..8ec1dfda5f 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -15,6 +15,7 @@
 
 #include LIBACPI_STDUTILS
 #include "acpi2_0.h"
+#include "aml_build.h"
 #include "libacpi.h"
 #include "ssdt_s3.h"
 #include "ssdt_s4.h"
@@ -56,6 +57,9 @@ struct acpi_info {
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of an ACPI namespace device */
+
 /* ACPI tables of following signatures should not appear in DM ACPI */
 static uint64_t dm_acpi_signature_blacklist[64];
 /* ACPI namespace devices of following names should not appear in DM ACPI */
@@ -141,6 +145,233 @@ static void set_checksum(
     p[checksum_offset] = -sum;
 }
 
+static bool has_dm_tables(struct acpi_ctxt *ctxt,
+                          const struct acpi_config *config)
+{
+    char **dir;
+    unsigned int num;
+
+    if ( !(config->table_flags & ACPI_HAS_DM) || !config->dm.addr )
+        return false;
+
+    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
+    if ( !dir || !num )
+        return false;
+
+    return true;
+}
+
+/* Return true if no collision is found. */
+static bool check_signature_collision(uint64_t sig)
+{
+    unsigned int i;
+    for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ )
+    {
+        if ( sig == dm_acpi_signature_blacklist[i] )
+            return false;
+    }
+    return true;
+}
+
+/* Return true if no collision is found. */
+static bool check_devname_collision(const char *name)
+{
+    unsigned int i;
+    for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ )
+    {
+        if ( !dm_acpi_devname_blacklist[i] )
+            break;
+        if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) )
+            return false;
+    }
+    return true;
+}
+
+static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt,
+                                            const char *name, const char *key)
+{
+/*
+ * @name is supposed to be 4 characters at most, and the longest @key
+ * so far is 'address' (7), so 30 characters is enough to hold the
+ * longest path HVM_XS_DM_ACPI_ROOT/name/key.
+ */
+#define DM_ACPI_BLOB_PATH_MAX_LENGTH   30
+    char path[DM_ACPI_BLOB_PATH_MAX_LENGTH];
+    snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             name, key);
+    return ctxt->xs_ops.read(ctxt, path);
+}
+
+static bool construct_dm_table(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               unsigned int nr_tables,
+                               const void *blob, uint32_t length)
+{
+    const struct acpi_header *header = blob;
+    uint8_t *buffer;
+
+    if ( !check_signature_collision(header->signature) )
+        return false;
+
+    if ( header->length > length || header->length == 0 )
+        return false;
+
+    buffer = ctxt->mem_ops.alloc(ctxt, header->length, 16);
+    if ( !buffer )
+        return false;
+    memcpy(buffer, header, header->length);
+
+    /* some device models (e.g. QEMU) do not set the checksum */
+    set_checksum(buffer, offsetof(struct acpi_header, checksum),
+                 header->length);
+
+    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
+
+    return true;
+}
+
+static bool construct_dm_nsdev(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               unsigned int nr_tables,
+                               const char *dev_name,
+                               const void *blob, uint32_t blob_length)
+{
+    struct acpi_header ssdt, *header;
+    uint8_t *buffer;
+    int rc;
+
+    if ( !check_devname_collision(dev_name) )
+        return false;
+
+#define AML_BUILD(STMT)           \
+    do {                          \
+        rc = STMT;                \
+        if ( rc )                 \
+            goto out;             \
+    } while (0)
+
+    /* build the ACPI namespace device from [name, blob] */
+    buffer = aml_build_begin(ctxt);
+    if ( !buffer )
+        return false;
+
+    AML_BUILD(aml_prepend_blob(buffer, blob, blob_length));
+    AML_BUILD(aml_prepend_device(buffer, dev_name));
+    AML_BUILD((aml_prepend_scope(buffer, "\\_SB")));
+
+    /* build SSDT header */
+    ssdt.signature = ACPI_2_0_SSDT_SIGNATURE;
+    ssdt.revision = ACPI_2_0_SSDT_REVISION;
+    fixed_strcpy(ssdt.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(ssdt.oem_table_id, ACPI_OEM_TABLE_ID);
+    ssdt.oem_revision = ACPI_OEM_REVISION;
+    ssdt.creator_id = ACPI_CREATOR_ID;
+    ssdt.creator_revision = ACPI_CREATOR_REVISION;
+
+    /* prepend SSDT header to ACPI namespace device */
+    AML_BUILD(aml_prepend_blob(buffer, &ssdt, sizeof(ssdt)));
+
+out:
+    header = (struct acpi_header *) buffer;
+    header->length = aml_build_end();
+
+    if ( rc )
+        return false;
+
+    /* calculate checksum of SSDT */
+    set_checksum(header, offsetof(struct acpi_header, checksum),
+                 header->length);
+
+    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
+
+    return true;
+}
+
+/*
+ * All ACPI content built by the device model is placed in the guest
+ * buffer whose address and size are specified by config->dm.{addr, length},
+ * or XenStore keys HVM_XS_DM_ACPI_{ADDRESS, LENGTH}.
+ *
+ * The data layout within the buffer is further specified by XenStore
+ * directories under HVM_XS_DM_ACPI_ROOT. Each directory specifies a
+ * data blob and contains following XenStore keys:
+ *
+ * - "type":
+ *   * DM_ACPI_BLOB_TYPE_TABLE
+ *     The data blob specified by this directory is an ACPI table.
+ *   * DM_ACPI_BLOB_TYPE_NSDEV
+ *     The data blob specified by this directory is an ACPI namespace device.
+ *     Its name is specified by the directory name, while the AML code of the
+ *     body of the AML device structure is in the data blob.
+ *
+ * - "length": the number of bytes in this data blob.
+ *
+ * - "offset": the offset in bytes of this data blob from the beginning of buffer
+ */
+static int construct_dm_tables(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               unsigned int nr_tables,
+                               struct acpi_config *config)
+{
+    const char *s;
+    char **dir;
+    uint8_t type;
+    void *blob;
+    unsigned int num, length, offset, i, nr_added = 0;
+
+    if ( !config->dm.addr )
+        return 0;
+
+    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
+    if ( !dir || !num )
+        return 0;
+
+    if ( num > ACPI_MAX_SECONDARY_TABLES - nr_tables )
+        return 0;
+
+    for ( i = 0; i < num; i++, dir++ )
+    {
+        if ( *dir == NULL )
+            continue;
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "type");
+        if ( !s )
+            continue;
+        type = (uint8_t)strtoll(s, NULL, 0);
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "length");
+        if ( !s )
+            continue;
+        length = (uint32_t)strtoll(s, NULL, 0);
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "offset");
+        if ( !s )
+            continue;
+        offset = (uint32_t)strtoll(s, NULL, 0);
+
+        blob = ctxt->mem_ops.p2v(ctxt, config->dm.addr + offset);
+
+        switch ( type )
+        {
+        case DM_ACPI_BLOB_TYPE_TABLE:
+            nr_added += construct_dm_table(ctxt,
+                                           table_ptrs, nr_tables + nr_added,
+                                           blob, length);
+            break;
+
+        case DM_ACPI_BLOB_TYPE_NSDEV:
+            nr_added += construct_dm_nsdev(ctxt,
+                                           table_ptrs, nr_tables + nr_added,
+                                           *dir, blob, length);
+            break;
+
+        default:
+            /* skip blobs of unknown types */
+            continue;
+        }
+    }
+
+    return nr_added;
+}
+
 static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
                                            const struct acpi_config *config,
                                            struct acpi_info *info)
@@ -556,6 +787,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
                                               nr_tables, config);
 
+    /* Load ACPI passed from device model (e.g. NFIT from QEMU). */
+    nr_tables += construct_dm_tables(ctxt, table_ptrs, nr_tables, config);
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
@@ -620,6 +854,9 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         acpi_info->pci_hi_len = config->pci_hi_len;
     }
 
+    if ( !has_dm_tables(ctxt, config) )
+        config->table_flags &= ~ACPI_HAS_DM;
+
     /*
      * Fill in high-memory data structures, starting at @buf.
      */
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 87f311bfab..cd134ff2cf 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -93,6 +93,11 @@ struct acpi_config {
         uint32_t length;
     } pt;
 
+    struct {
+        uint32_t addr;
+        uint32_t length;
+    } dm;
+
     struct acpi_numa numa;
     const struct hvm_info_table *hvminfo;
 
-- 
2.14.1



* [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (34 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

A new xl domain configuration option
   vnvdimms = [ 'type=mfn, backend=START_PMEM_MFN, nr_pages=N', ... ]

is added to specify virtual NVDIMM devices backed by the given host
PMEM pages. As the kernel PMEM driver currently does not work in Dom0,
the backend has to be specified by MFNs.
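
An illustrative example (the MFN and page count are made up): one
1 GiB vNVDIMM backed by host PMEM starting at MFN 0x4c0000, i.e. host
physical address 0x4c0000000:

  vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x40000' ]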

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 docs/man/xl.cfg.pod.5.in    |  33 +++++++++++++
 tools/libxl/Makefile        |   2 +-
 tools/libxl/libxl.h         |   5 ++
 tools/libxl/libxl_types.idl |  15 ++++++
 tools/libxl/libxl_vnvdimm.c |  49 ++++++++++++++++++++
 tools/xl/xl_parse.c         | 110 +++++++++++++++++++++++++++++++++++++++++++-
 tools/xl/xl_vmcontrol.c     |  15 +++++-
 7 files changed, 226 insertions(+), 3 deletions(-)
 create mode 100644 tools/libxl/libxl_vnvdimm.c

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 79cb2eaea7..092b051561 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1116,6 +1116,39 @@ FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<vnvdimms=[ 'VNVDIMM_SPEC', 'VNVDIMM_SPEC', ... ]>
+
+Specifies the virtual NVDIMM devices which are provided to the guest.
+
+Each B<VNVDIMM_SPEC> is a comma-separated list of C<KEY=VALUE> settings
+from the following list:
+
+=over 4
+
+=item B<type=TYPE>
+
+Specifies the type of the host backend of the virtual NVDIMM device. The
+following is a list of supported types:
+
+=over 4
+
+=item B<mfn>
+
+Backs the virtual NVDIMM device with a contiguous host PMEM region.
+
+=back
+
+=item B<backend=BACKEND>
+
+Specifies the host backend of the virtual NVDIMM device. If C<type=mfn>,
+then B<BACKEND> specifies the start MFN of the host PMEM region.
+
+=item B<nr_pages=NUMBER>
+
+Specifies the number of pages of the host backend.
+
+=back
+
 =back
 
 =head2 Paravirtualised (PV) Guest Specific Options
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 791c9ad05e..b4c2ccb7ff 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -139,7 +139,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom_suspend.o libxl_dom_save.o libxl_usb.o \
 			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
 			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
-			libxl_9pfs.o libxl_domain.o \
+			libxl_9pfs.o libxl_domain.o libxl_vnvdimm.o \
                         $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 91408b47b5..8156c08ed3 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1474,6 +1474,11 @@ int libxl_get_memory_target_0x040700(libxl_ctx *ctx, uint32_t domid,
                                      uint32_t *out_target)
     LIBXL_EXTERNAL_CALLERS_ONLY;
 
+int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
+                              libxl_domain_config *dst,
+                              const libxl_domain_config *src)
+                              LIBXL_EXTERNAL_CALLERS_ONLY;
+
 /*
  * WARNING
  * This memory management API is unstable even in Xen 4.2.
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4acc0457f4..ad236de34a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -240,6 +240,10 @@ libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
     (2, "COLO"),
     ])
 
+libxl_vnvdimm_backend_type = Enumeration("vnvdimm_backend_type", [
+    (0, "mfn"),
+    ])
+
 #
 # Complex libxl types
 #
@@ -780,6 +784,16 @@ libxl_device_channel = Struct("device_channel", [
            ])),
 ])
 
+libxl_device_vnvdimm = Struct("device_vnvdimm", [
+    ("backend_domid",   libxl_domid),
+    ("backend_domname", string),
+    ("devid",           libxl_devid),
+    ("nr_pages",        uint64),
+    ("u", KeyedUnion(None, libxl_vnvdimm_backend_type, "backend_type",
+            [("mfn", uint64),
+            ])),
+])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -798,6 +812,7 @@ libxl_domain_config = Struct("domain_config", [
     ("channels", Array(libxl_device_channel, "num_channels")),
     ("usbctrls", Array(libxl_device_usbctrl, "num_usbctrls")),
     ("usbdevs", Array(libxl_device_usbdev, "num_usbdevs")),
+    ("vnvdimms", Array(libxl_device_vnvdimm, "num_vnvdimms")),
 
     ("on_poweroff", libxl_action_on_shutdown),
     ("on_reboot", libxl_action_on_shutdown),
diff --git a/tools/libxl/libxl_vnvdimm.c b/tools/libxl/libxl_vnvdimm.c
new file mode 100644
index 0000000000..4de8f04303
--- /dev/null
+++ b/tools/libxl/libxl_vnvdimm.c
@@ -0,0 +1,49 @@
+/*
+ * tools/libxl/libxl_vnvdimm.c
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xenctrl.h>
+
+#include "libxl_internal.h"
+
+int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
+                              libxl_domain_config *dst,
+                              const libxl_domain_config *src)
+{
+    GC_INIT(ctx);
+    unsigned int nr = src->num_vnvdimms;
+    libxl_device_vnvdimm *vnvdimms;
+    int rc = 0;
+
+    if (!nr)
+        goto out;
+
+    vnvdimms = libxl__calloc(NOGC, nr, sizeof(*vnvdimms));
+    if (!vnvdimms) {
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+
+    dst->num_vnvdimms = nr;
+    while (nr--)
+        libxl_device_vnvdimm_copy(ctx, &vnvdimms[nr], &src->vnvdimms[nr]);
+    dst->vnvdimms = vnvdimms;
+
+ out:
+    GC_FREE;
+    return rc;
+}
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index ed562a1956..388a135dbf 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -804,13 +804,111 @@ int parse_usbdev_config(libxl_device_usbdev *usbdev, char *token)
     return 0;
 }
 
+static int parse_vnvdimm_config(libxl_device_vnvdimm *vnvdimm, char *token)
+{
+    char *oparg, *endptr;
+    unsigned long val;
+
+    if (MATCH_OPTION("type", token, oparg)) {
+        if (libxl_vnvdimm_backend_type_from_string(oparg,
+                                                   &vnvdimm->backend_type)) {
+            fprintf(stderr,
+                    "ERROR: invalid vNVDIMM backend type '%s'\n",
+                    oparg);
+            return 1;
+        }
+    } else if (MATCH_OPTION("nr_pages", token, oparg)) {
+        val = strtoul(oparg, &endptr, 0);
+        if (endptr == oparg || val == ULONG_MAX)
+        {
+            fprintf(stderr,
+                    "ERROR: invalid number of vNVDIMM backend pages '%s'\n",
+                    oparg);
+            return 1;
+        }
+        vnvdimm->nr_pages = val;
+    } else if (MATCH_OPTION("backend", token, oparg)) {
+        /* Skip: handled by parse_vnvdimms() */
+    } else {
+        fprintf(stderr, "ERROR: unknown string '%s' in vnvdimm spec\n", token);
+        return 1;
+    }
+
+    return 0;
+}
+
+/*
+ * vnvdimms = [ 'type=<mfn>, backend=<base_mfn>, nr_pages=<N>', ... ]
+ */
+static void parse_vnvdimms(XLU_Config *config, libxl_domain_config *d_config)
+{
+    XLU_ConfigList *vnvdimms;
+    const char *buf;
+    int rc;
+
+    rc = xlu_cfg_get_list(config, "vnvdimms", &vnvdimms, 0, 0);
+    if ( rc )
+        return;
+
+#if !defined(__linux__)
+    fprintf(stderr, "ERROR: 'vnvdimms' is only supported on Linux\n");
+    exit(-ERROR_FAIL);
+#endif
+
+    d_config->num_vnvdimms = 0;
+    d_config->vnvdimms = NULL;
+
+    while ((buf = xlu_cfg_get_listitem(vnvdimms,
+                                       d_config->num_vnvdimms)) != NULL) {
+        libxl_device_vnvdimm *vnvdimm =
+            ARRAY_EXTEND_INIT(d_config->vnvdimms, d_config->num_vnvdimms,
+                              libxl_device_vnvdimm_init);
+        char *buf2 = strdup(buf), *backend = NULL, *p, *endptr;
+        unsigned long mfn;
+
+        p = strtok(buf2, ",");
+        if (!p)
+            goto skip_nvdimm;
+
+        do {
+            while (*p == ' ')
+                p++;
+
+            rc = 0;
+            if (!MATCH_OPTION("backend", p, backend))
+                rc = parse_vnvdimm_config(vnvdimm, p);
+            if (rc)
+                exit(-ERROR_FAIL);
+        } while ((p = strtok(NULL, ",")) != NULL);
+
+        switch (vnvdimm->backend_type)
+        {
+        case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+            mfn = strtoul(backend, &endptr, 0);
+            if (endptr == backend || mfn == ULONG_MAX)
+            {
+                fprintf(stderr,
+                        "ERROR: invalid start MFN of host NVDIMM '%s'\n",
+                        backend);
+                exit(-ERROR_FAIL);
+            }
+            vnvdimm->u.mfn = mfn;
+
+            break;
+        }
+
+    skip_nvdimm:
+        free(buf2);
+    }
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
                        libxl_domain_config *d_config)
 {
     const char *buf;
-    long l, vcpus = 0, nr_dm_acpi_pages;
+    long l, vcpus = 0, nr_dm_acpi_pages = 0;
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs, *p9devs;
@@ -1942,6 +2040,16 @@ skip_usbdev:
             exit(-ERROR_FAIL);
         }
         b_info->u.hvm.dm_acpi_pages = nr_dm_acpi_pages;
+
+        /* parse 'vnvdimms' */
+        parse_vnvdimms(config, d_config);
+
+        /*
+         * If 'dm_acpi_pages' is not specified, reserve one DM ACPI
+         * page for vNVDIMM devices.
+         */
+        if (d_config->vnvdimms && !nr_dm_acpi_pages)
+            b_info->u.hvm.dm_acpi_pages = 1;
     }
 
     /* If we've already got vfb=[] for PV guest then ignore top level
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25ded..1bdc173e04 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -381,12 +381,25 @@ static void reload_domain_config(uint32_t domid,
     if (rc) {
         LOG("failed to retrieve guest configuration (rc=%d). "
             "reusing old configuration", rc);
-        libxl_domain_config_dispose(&d_config_new);
+        goto error_out;
     } else {
+        rc = libxl_vnvdimm_copy_config(ctx, &d_config_new, d_config);
+        if (rc) {
+            LOG("failed to copy vnvdimm configuration (rc=%d). "
+                "reusing old configuration", rc);
+            libxl_domain_config_dispose(&d_config_new);
+            goto error_out;
+        }
+
         libxl_domain_config_dispose(d_config);
         /* Steal allocations */
         memcpy(d_config, &d_config_new, sizeof(libxl_domain_config));
     }
+
+    return;
+
+ error_out:
+    libxl_domain_config_dispose(&d_config_new);
 }
 
 /* Can update r_domid if domain is destroyed */
-- 
2.14.1



* [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (35 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If errors that happen during QMP initialization can affect the proper
operation of a domain, it is better to treat them as fatal and abort
the creation of that domain. The existing types of QMP initialization
errors are still treated as non-fatal and, as before, do not abort
domain creation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 tools/libxl/libxl_qmp.c    | 9 ++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9123585b52..3e05ea09e9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1507,7 +1507,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
     if (dcs->sdss.dm.guest_domid) {
         if (d_config->b_info.device_model_version
             == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-            libxl__qmp_initializations(gc, domid, d_config);
+            ret = libxl__qmp_initializations(gc, domid, d_config);
+            if (ret == ERROR_BADFAIL)
+                goto error_out;
         }
     }
 
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index eab993aca9..e1eb47c1d2 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1175,11 +1175,12 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
 {
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     libxl__qmp_handler *qmp = NULL;
-    int ret = 0;
+    bool ignore_error = true;
+    int ret = -1;
 
     qmp = libxl__qmp_initialize(gc, domid);
     if (!qmp)
-        return -1;
+        goto out;
     ret = libxl__qmp_query_serial(qmp);
     if (!ret && vnc && vnc->passwd) {
         ret = qmp_change(gc, qmp, "vnc", "password", vnc->passwd);
@@ -1189,7 +1190,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
         ret = qmp_query_vnc(qmp);
     }
     libxl__qmp_close(qmp);
-    return ret;
+
+ out:
+    return ret ? (ignore_error ? ERROR_FAIL : ERROR_BADFAIL) : 0;
 }
 
 /*
-- 
2.14.1



* [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (36 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:38 ` [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The base guest physical address of each vNVDIMM device is decided by
QEMU. Add a QMP callback to get the base address from QEMU and ask the
Xen hypervisor to map the host PMEM pages to that address.
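
For illustration, an abbreviated "query-memory-devices" reply that the
new callback consumes could look like the following; only the fields
read by the callback are shown, and the values are made up:

  { "return": [
      { "data": { "id": "xen_nvdimm1", "slot": 0,
                  "addr": 17179869184, "size": 1073741824 } } ] }

i.e. a 1 GiB vNVDIMM in slot 0 placed at guest physical address
0x400000000, which the callback then maps via
xc_domain_populate_pmem_map().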

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_qmp.c     | 130 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_vnvdimm.c |  30 ++++++++++
 tools/libxl/libxl_vnvdimm.h |  30 ++++++++++
 3 files changed, 190 insertions(+)
 create mode 100644 tools/libxl/libxl_vnvdimm.h

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index e1eb47c1d2..299f9c8260 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -26,6 +26,7 @@
 
 #include "_libxl_list.h"
 #include "libxl_internal.h"
+#include "libxl_vnvdimm.h"
 
 /* #define DEBUG_RECEIVED */
 
@@ -1170,6 +1171,127 @@ int libxl_qemu_monitor_command(libxl_ctx *ctx, uint32_t domid,
     return rc;
 }
 
+#if defined(__linux__)
+
+static int qmp_register_vnvdimm_callback(libxl__qmp_handler *qmp,
+                                         const libxl__json_object *o,
+                                         void *arg)
+{
+    GC_INIT(qmp->ctx);
+    const libxl_domain_config *guest_config = arg;
+    const libxl_device_vnvdimm *vnvdimm;
+    const libxl__json_object *obj, *sub_map, *sub_obj;
+    const char *id, *expected_id;
+    unsigned int i, slot;
+    unsigned long gpa, size, mfn, gpfn, nr_pages;
+    int rc = 0;
+
+    for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
+        if (!libxl__json_object_is_map(obj))
+            continue;
+
+        sub_map = libxl__json_map_get("data", obj, JSON_MAP);
+        if (!sub_map)
+            continue;
+
+        sub_obj = libxl__json_map_get("slot", sub_map, JSON_INTEGER);
+        slot = libxl__json_object_get_integer(sub_obj);
+        if (slot >= guest_config->num_vnvdimms) {
+            LOG(ERROR,
+                "Invalid QEMU memory device slot %u, expecting less than %u",
+                slot, guest_config->num_vnvdimms);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+        vnvdimm = &guest_config->vnvdimms[slot];
+
+        /*
+         * Double check whether it's an NVDIMM memory device, though
+         * all memory devices in QEMU on Xen are for vNVDIMM.
+         */
+        expected_id = libxl__sprintf(gc, "xen_nvdimm%u", slot + 1);
+        if (!expected_id) {
+            LOG(ERROR, "Cannot build device id");
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+        sub_obj = libxl__json_map_get("id", sub_map, JSON_STRING);
+        id = libxl__json_object_get_string(sub_obj);
+        if (!id || strncmp(id, expected_id, strlen(expected_id))) {
+            LOG(ERROR,
+                "Invalid QEMU memory device id %s, expecting %s",
+                id, expected_id);
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+
+        sub_obj = libxl__json_map_get("addr", sub_map, JSON_INTEGER);
+        gpa = libxl__json_object_get_integer(sub_obj);
+        sub_obj = libxl__json_map_get("size", sub_map, JSON_INTEGER);
+        size = libxl__json_object_get_integer(sub_obj);
+        if ((gpa | size) & ~XC_PAGE_MASK) {
+            LOG(ERROR,
+                "Invalid address 0x%lx or size 0x%lx of QEMU memory device %s, "
+                "not aligned to 0x%lx",
+                gpa, size, id, XC_PAGE_SIZE);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+        gpfn = gpa >> XC_PAGE_SHIFT;
+
+        nr_pages = size >> XC_PAGE_SHIFT;
+        if (nr_pages > vnvdimm->nr_pages) {
+            LOG(ERROR,
+                "Invalid size 0x%lx of QEMU memory device %s, "
+                "expecting no larger than 0x%lx",
+                size, id, vnvdimm->nr_pages << XC_PAGE_SHIFT);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+
+        switch (vnvdimm->backend_type) {
+        case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+            mfn = vnvdimm->u.mfn;
+            break;
+
+        default:
+            LOG(ERROR, "Invalid NVDIMM backend type %u", vnvdimm->backend_type);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+
+        rc = libxl_vnvdimm_add_pages(gc, qmp->domid, mfn, gpfn, nr_pages);
+        if (rc) {
+            LOG(ERROR,
+                "Cannot map PMEM pages for QEMU memory device %s, "
+                "mfn 0x%lx, gpfn 0x%lx, nr 0x%lx, rc %d",
+                id, mfn, gpfn, nr_pages, rc);
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+    }
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
+static int libxl__qmp_query_vnvdimms(libxl__qmp_handler *qmp,
+                                     const libxl_domain_config *guest_config)
+{
+    int rc;
+    GC_INIT(qmp->ctx);
+
+    rc = qmp_synchronous_send(qmp, "query-memory-devices", NULL,
+                              qmp_register_vnvdimm_callback,
+                              (void *)guest_config, qmp->timeout);
+
+    GC_FREE;
+    return rc;
+}
+
+#endif /* __linux__ */
+
 int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                const libxl_domain_config *guest_config)
 {
@@ -1189,6 +1311,14 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
     if (!ret) {
         ret = qmp_query_vnc(qmp);
     }
+
+#if defined(__linux__)
+    if (!ret && guest_config->num_vnvdimms) {
+        ignore_error = false;
+        ret = libxl__qmp_query_vnvdimms(qmp, guest_config);
+    }
+#endif /* __linux__ */
+
     libxl__qmp_close(qmp);
 
  out:
diff --git a/tools/libxl/libxl_vnvdimm.c b/tools/libxl/libxl_vnvdimm.c
index 4de8f04303..ff786d4177 100644
--- a/tools/libxl/libxl_vnvdimm.c
+++ b/tools/libxl/libxl_vnvdimm.c
@@ -19,6 +19,7 @@
 #include <xenctrl.h>
 
 #include "libxl_internal.h"
+#include "libxl_vnvdimm.h"
 
 int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
                               libxl_domain_config *dst,
@@ -47,3 +48,32 @@ int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
     GC_FREE;
     return rc;
 }
+
+#if defined(__linux__)
+
+int libxl_vnvdimm_add_pages(libxl__gc *gc, uint32_t domid,
+                            xen_pfn_t mfn, xen_pfn_t gpfn, xen_pfn_t nr_pages)
+{
+    unsigned int nr;
+    int ret = 0;
+
+    while (nr_pages) {
+        nr = min(nr_pages, (unsigned long)UINT_MAX);
+
+        ret = xc_domain_populate_pmem_map(CTX->xch, domid, mfn, gpfn, nr);
+        if (ret && ret != -ERESTART) {
+            LOG(ERROR, "failed to map PMEM pages, mfn 0x%" PRI_xen_pfn ", "
+                "gpfn 0x%" PRI_xen_pfn ", nr_pages %u, err %d",
+                mfn, gpfn, nr, ret);
+            break;
+        }
+
+        nr_pages -= nr;
+        mfn += nr;
+        gpfn += nr;
+    }
+
+    return ret;
+}
+
+#endif /* __linux__ */
diff --git a/tools/libxl/libxl_vnvdimm.h b/tools/libxl/libxl_vnvdimm.h
new file mode 100644
index 0000000000..ec63c95088
--- /dev/null
+++ b/tools/libxl/libxl_vnvdimm.h
@@ -0,0 +1,30 @@
+/*
+ * tools/libxl/libxl_vnvdimm.h
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef LIBXL_VNVDIMM_H
+#define LIBXL_VNVDIMM_H
+
+#include <stdint.h>
+#include "libxl_internal.h"
+
+#if defined(__linux__)
+int libxl_vnvdimm_add_pages(libxl__gc *gc, uint32_t domid,
+                            xen_pfn_t mfn, xen_pfn_t gpfn, xen_pfn_t nr_pages);
+#endif /* __linux__ */
+
+#endif /* !LIBXL_VNVDIMM_H */
-- 
2.14.1



* [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (37 preceding siblings ...)
  2017-09-11  4:38 ` [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
@ 2017-09-11  4:38 ` Haozhong Zhang
  2017-09-11  4:41   ` Haozhong Zhang
  2017-10-27  3:26 ` [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Chao Peng
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

For xl configs
  vnvdimms = [ 'type=mfn,backend=$PMEM0_MFN,nr_pages=$N0', ... ]

the following qemu options will be built

  -machine <existing options>,nvdimm
  -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
  -object memory-backend-xen,id=mem1,host-addr=$PMEM0_ADDR,size=$PMEM0_SIZE
  -device nvdimm,id=xen_nvdimm1,memdev=mem1
  ...

in which,
 - NR_SLOTS is the number of entries in vnvdimms + 1,
 - MEM_SIZE is the total size of all RAM and NVDIMM devices,
 - PMEM0_ADDR = PMEM0_MFN * 4096,
 - PMEM0_SIZE = N0 * 4096 (a worked example follows).
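
As a worked example (illustrative numbers), a 4096 MiB guest with

  vnvdimms = [ 'type=mfn, backend=0x480000, nr_pages=0x40000' ]

gives NR_SLOTS = 2, PMEM0_ADDR = 0x480000 * 4096 = 19327352832 and
PMEM0_SIZE = 0x40000 * 4096 = 1073741824 (1 GiB), so roughly:

  -machine <existing options>,nvdimm
  -m 4096,slots=2,maxmem=5368709120
  -object memory-backend-xen,id=mem1,host-addr=19327352832,size=1073741824
  -device nvdimm,id=xen_nvdimm1,memdev=mem1

where maxmem = 4096 MiB + 1 GiB in bytes.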

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_dm.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index e0e6a99e67..9bdb3cdb29 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -910,6 +910,58 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
     return drive;
 }
 
+#if defined(__linux__)
+
+static uint64_t libxl__build_dm_vnvdimm_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *dev, int dev_no)
+{
+    uint64_t addr = 0, size = 0;
+    char *arg;
+
+    switch (dev->backend_type)
+    {
+    case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+        addr = dev->u.mfn << XC_PAGE_SHIFT;
+        size = dev->nr_pages << XC_PAGE_SHIFT;
+        break;
+    }
+
+    if (!size)
+        return 0;
+
+    flexarray_append(dm_args, "-object");
+    arg = GCSPRINTF("memory-backend-xen,id=mem%d,host-addr=%"PRIu64",size=%"PRIu64,
+                    dev_no + 1, addr, size);
+    flexarray_append(dm_args, arg);
+
+    flexarray_append(dm_args, "-device");
+    arg = GCSPRINTF("nvdimm,id=xen_nvdimm%d,memdev=mem%d",
+                    dev_no + 1, dev_no + 1);
+    flexarray_append(dm_args, arg);
+
+    return size;
+}
+
+static uint64_t libxl__build_dm_vnvdimms_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
+{
+    uint64_t total_size = 0, size;
+    unsigned int i;
+
+    for (i = 0; i < num_vnvdimms; i++) {
+        size = libxl__build_dm_vnvdimm_args(gc, dm_args, &vnvdimms[i], i);
+        if (!size)
+            break;
+        total_size += size;
+    }
+
+    return total_size;
+}
+
+#endif /* __linux__ */
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -923,13 +975,18 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     const libxl_device_nic *nics = guest_config->nics;
     const int num_disks = guest_config->num_disks;
     const int num_nics = guest_config->num_nics;
+#if defined(__linux__)
+    const int num_vnvdimms = guest_config->num_vnvdimms;
+#else
+    const int num_vnvdimms = 0;
+#endif
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     const libxl_sdl_info *sdl = dm_sdl(guest_config);
     const char *keymap = dm_keymap(guest_config);
     char *machinearg;
     flexarray_t *dm_args, *dm_envs;
     int i, connection, devid, ret;
-    uint64_t ram_size;
+    uint64_t ram_size, ram_size_in_byte = 0, vnvdimms_size = 0;
     const char *path, *chardev;
     char *user = NULL;
 
@@ -1451,6 +1508,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             }
         }
 
+        if (num_vnvdimms)
+            machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg);
+
         flexarray_append(dm_args, machinearg);
         for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
             flexarray_append(dm_args, b_info->extra_hvm[i]);
@@ -1460,8 +1520,25 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     }
 
     ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
+    if (num_vnvdimms) {
+        ram_size_in_byte = ram_size << 20;
+        vnvdimms_size = libxl__build_dm_vnvdimms_args(gc, dm_args,
+                                                      guest_config->vnvdimms,
+                                                      num_vnvdimms);
+        if (ram_size_in_byte + vnvdimms_size < ram_size_in_byte) {
+            LOG(ERROR,
+                "total size of RAM (%"PRIu64") and NVDIMM (%"PRIu64") overflow",
+                ram_size_in_byte, vnvdimms_size);
+            return ERROR_INVAL;
+        }
+    }
     flexarray_append(dm_args, "-m");
-    flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size));
+    flexarray_append(dm_args,
+                     vnvdimms_size ?
+                     GCSPRINTF("%"PRId64",slots=%d,maxmem=%"PRId64,
+                               ram_size, num_vnvdimms + 1,
+                               ROUNDUP(ram_size_in_byte, 12) + vnvdimms_size) :
+                     GCSPRINTF("%"PRId64, ram_size));
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (b_info->u.hvm.hdtype == LIBXL_HDTYPE_AHCI)
-- 
2.14.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2017-09-11  4:41   ` Haozhong Zhang
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() Haozhong Zhang
                     ` (39 subsequent siblings)
  40 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Eduardo Habkost, Igor Mammedov, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson,
	Stefano Stabellini, Anthony Perard

This is the QEMU part of the series; it works with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build the guest NFIT and NVDIMM namespace devices, and to
allocate guest address space for vNVDIMM devices.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3

Patch 1 avoids dereferencing the NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 2 & 3 add a memory backend dedicated to Xen usage and a hotplug
memory region for Xen guests, in order to make the existing nvdimm
device plugging path work on Xen.

Patches 4 - 10 build and copy the NFIT from QEMU to the Xen guest when
QEMU is used as the Xen device model.


Haozhong Zhang (10):
  nvdimm: do not initialize nvdimm->label_data if label size is zero
  hw/xen-hvm: create the hotplug memory region on Xen
  hostmem-xen: add a host memory backend for Xen
  nvdimm acpi: do not use fw_cfg on Xen
  hw/xen-hvm: initialize DM ACPI
  hw/xen-hvm: add function to copy ACPI into guest memory
  nvdimm acpi: copy NFIT to Xen guest
  nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  nvdimm acpi: do not build _FIT method on Xen
  hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled

 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++
 hw/acpi/aml-build.c    |  10 ++-
 hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
 hw/i386/pc.c           | 102 ++++++++++++++-----------
 hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/mem/nvdimm.c        |  10 ++-
 hw/mem/pc-dimm.c       |   6 +-
 include/hw/i386/pc.h   |   1 +
 include/hw/xen/xen.h   |  25 ++++++
 stubs/xen-hvm.c        |  10 +++
 12 files changed, 495 insertions(+), 70 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-09-11  4:41   ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Xiao Guangrong

This is the QEMU part of the series; it works with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build the guest NFIT and NVDIMM namespace devices, and to
allocate guest address space for vNVDIMM devices.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3

Patch 1 avoids dereferencing the NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 2 & 3 add a memory backend dedicated to Xen usage and a hotplug
memory region for Xen guests, in order to make the existing nvdimm
device plugging path work on Xen.

Patches 4 - 10 build and copy the NFIT from QEMU to the Xen guest when
QEMU is used as the Xen device model.


Haozhong Zhang (10):
  nvdimm: do not initialize nvdimm->label_data if label size is zero
  hw/xen-hvm: create the hotplug memory region on Xen
  hostmem-xen: add a host memory backend for Xen
  nvdimm acpi: do not use fw_cfg on Xen
  hw/xen-hvm: initialize DM ACPI
  hw/xen-hvm: add function to copy ACPI into guest memory
  nvdimm acpi: copy NFIT to Xen guest
  nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  nvdimm acpi: do not build _FIT method on Xen
  hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled

 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++
 hw/acpi/aml-build.c    |  10 ++-
 hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
 hw/i386/pc.c           | 102 ++++++++++++++-----------
 hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/mem/nvdimm.c        |  10 ++-
 hw/mem/pc-dimm.c       |   6 +-
 include/hw/i386/pc.h   |   1 +
 include/hw/xen/xen.h   |  25 ++++++
 stubs/xen-hvm.c        |  10 +++
 12 files changed, 495 insertions(+), 70 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 01/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Xiao Guangrong, Michael S. Tsirkin, Igor Mammedov

The memory region of vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size is
0) and every access to labels in QEMU is preceded by a label size
check, do not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);
 
     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), so do not initialize the
+     * pointer to the label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
     if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 01/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

The memory region of vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size is
0) and every access to labels in QEMU is preceded by a label size
check, do not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);
 
     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), so do not initialize the
+     * pointer to the label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
     if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin, Stefano Stabellini, Anthony Perard

The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as Xen device
model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
the code for pc machine type to create the hotplug memory region for
Xen HVM domains.
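
As a rough sketch of the resulting layout (illustrative values only,
assuming above_4g_mem_size is 0 and enforce_aligned_dimm is enabled):
with "-m 2048,slots=2,maxmem=3221225472" the hotplug region starts at
ROUND_UP(0x100000000 + 0, 1 GiB) = 0x100000000 and spans
(maxmem - ram) + 2 slots * 1 GiB = 1 GiB + 2 GiB = 3 GiB, from which
the guest physical addresses of vNVDIMM devices are then allocated.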

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 21081041d5..5cbdce61a7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1347,6 +1347,53 @@ void xen_load_linux(PCMachineState *pcms)
     pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+    MachineState *machine = MACHINE(pcms);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+        return;
+
+    if (memory_region_size(&pcms->hotplug_memory.mr)) {
+        error_report("hotplug memory region has been initialized");
+        exit(EXIT_FAILURE);
+    }
+
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(machine->maxram_size,
+                      TARGET_PAGE_SIZE) != machine->maxram_size) {
+        error_report("maximum memory size must by aligned to multiple of "
+                     "%d bytes", TARGET_PAGE_SIZE);
+        exit(EXIT_FAILURE);
+    }
+
+    pcms->hotplug_memory.base =
+        ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+    if (pcmc->enforce_aligned_dimm) {
+        /* size hotplug region assuming 1G page max alignment per slot */
+        hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+    }
+
+    if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                     machine->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+                       "hotplug-memory", hotplug_mem_size);
+    memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                &pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -1398,44 +1445,7 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     /* initialize hotplug memory address space */
-    if (pcmc->has_reserved_memory &&
-        (machine->ram_size < machine->maxram_size)) {
-        ram_addr_t hotplug_mem_size =
-            machine->maxram_size - machine->ram_size;
-
-        if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-            error_report("unsupported amount of memory slots: %"PRIu64,
-                         machine->ram_slots);
-            exit(EXIT_FAILURE);
-        }
-
-        if (QEMU_ALIGN_UP(machine->maxram_size,
-                          TARGET_PAGE_SIZE) != machine->maxram_size) {
-            error_report("maximum memory size must by aligned to multiple of "
-                         "%d bytes", TARGET_PAGE_SIZE);
-            exit(EXIT_FAILURE);
-        }
-
-        pcms->hotplug_memory.base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-        if (pcmc->enforce_aligned_dimm) {
-            /* size hotplug region assuming 1G page max alignment per slot */
-            hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-        }
-
-        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-            hotplug_mem_size) {
-            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
-                         machine->maxram_size);
-            exit(EXIT_FAILURE);
-        }
-
-        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-                           "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-                                    &pcms->hotplug_memory.mr);
-    }
+    pc_memory_hotplug_init(pcms, system_memory);
 
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d9ccd5d0d6..90163e1a1b 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -235,6 +235,8 @@ static void xen_ram_init(PCMachineState *pcms,
                                  pcms->above_4g_mem_size);
         memory_region_add_subregion(sysmem, 0x100000000ULL, &ram_hi);
     }
+
+    pc_memory_hotplug_init(pcms, sysmem);
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8226904524..b65c5dd5ec 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -249,6 +249,7 @@ void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
                     MemoryRegion **ram_memory);
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory);
 qemu_irq pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as Xen device
model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
the code for pc machine type to create the hotplug memory region for
Xen HVM domains.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 21081041d5..5cbdce61a7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1347,6 +1347,53 @@ void xen_load_linux(PCMachineState *pcms)
     pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+    MachineState *machine = MACHINE(pcms);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+        return;
+
+    if (memory_region_size(&pcms->hotplug_memory.mr)) {
+        error_report("hotplug memory region has been initialized");
+        exit(EXIT_FAILURE);
+    }
+
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(machine->maxram_size,
+                      TARGET_PAGE_SIZE) != machine->maxram_size) {
+        error_report("maximum memory size must by aligned to multiple of "
+                     "%d bytes", TARGET_PAGE_SIZE);
+        exit(EXIT_FAILURE);
+    }
+
+    pcms->hotplug_memory.base =
+        ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+    if (pcmc->enforce_aligned_dimm) {
+        /* size hotplug region assuming 1G page max alignment per slot */
+        hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+    }
+
+    if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                     machine->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+                       "hotplug-memory", hotplug_mem_size);
+    memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                &pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -1398,44 +1445,7 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     /* initialize hotplug memory address space */
-    if (pcmc->has_reserved_memory &&
-        (machine->ram_size < machine->maxram_size)) {
-        ram_addr_t hotplug_mem_size =
-            machine->maxram_size - machine->ram_size;
-
-        if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-            error_report("unsupported amount of memory slots: %"PRIu64,
-                         machine->ram_slots);
-            exit(EXIT_FAILURE);
-        }
-
-        if (QEMU_ALIGN_UP(machine->maxram_size,
-                          TARGET_PAGE_SIZE) != machine->maxram_size) {
-            error_report("maximum memory size must by aligned to multiple of "
-                         "%d bytes", TARGET_PAGE_SIZE);
-            exit(EXIT_FAILURE);
-        }
-
-        pcms->hotplug_memory.base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-        if (pcmc->enforce_aligned_dimm) {
-            /* size hotplug region assuming 1G page max alignment per slot */
-            hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-        }
-
-        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-            hotplug_mem_size) {
-            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
-                         machine->maxram_size);
-            exit(EXIT_FAILURE);
-        }
-
-        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-                           "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-                                    &pcms->hotplug_memory.mr);
-    }
+    pc_memory_hotplug_init(pcms, system_memory);
 
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d9ccd5d0d6..90163e1a1b 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -235,6 +235,8 @@ static void xen_ram_init(PCMachineState *pcms,
                                  pcms->above_4g_mem_size);
         memory_region_add_subregion(sysmem, 0x100000000ULL, &ram_hi);
     }
+
+    pc_memory_hotplug_init(pcms, sysmem);
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8226904524..b65c5dd5ec 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -249,6 +249,7 @@ void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
                     MemoryRegion **ram_memory);
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory);
 qemu_irq pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Eduardo Habkost, Igor Mammedov, Michael S. Tsirkin

vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as Xen device model, the
backend resource allocation of vNVDIMM is managed out of QEMU. A new
host memory backend 'memory-backend-xen' is introduced to represent
the backend resource allocated by Xen. It simply creates a memory
region of the specified size as a placeholder in the guest address
space, which will be mapped by Xen to the actual backend resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x100000000:
   -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++++
 hw/mem/pc-dimm.c       |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 0000000000..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+    OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+    HostMemoryBackend parent_obj;
+
+    uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+    uint64_t value = backend->host_addr;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    xb->host_addr = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+                       backend->size);
+    backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = xen_backend_alloc;
+
+    object_class_property_add(oc, "host-addr", "int",
+                              xen_backend_get_host_addr,
+                              xen_backend_set_host_addr,
+                              NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_XEN,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = xen_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendXen),
+};
+
+static void register_types(void)
+{
+    type_register_static(&xen_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..ba13a52994 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
+#include "hw/xen/xen.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-types.h"
@@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
+        /*
+         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
+         * so no further work in this function is needed.
+         */
+        if (xen_enabled() && !backend->mr.ram_block) {
+            goto out;
+        }
+
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index bdf6649083..7e1fe005ee 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "hw/virtio/vhost.h"
+#include "hw/xen/xen.h"
 
 typedef struct pc_dimms_capacity {
      uint64_t size;
@@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(vmstate_mr, dev);
+    /* memory-backend-xen is not backed by RAM. */
+    if (!xen_enabled()) {
+        vmstate_register_ram(vmstate_mr, dev);
+    }
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
 out:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Eduardo Habkost, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as Xen device model, the
backend resource allocation of vNVDIMM is managed out of QEMU. A new
host memory backend 'memory-backend-xen' is introduced to represent
the backend resource allocated by Xen. It simply creates a memory
region of the specified size as a placeholder in the guest address
space, which will be mapped by Xen to the actual backend resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x100000000:
   -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++++
 hw/mem/pc-dimm.c       |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 0000000000..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+    OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+    HostMemoryBackend parent_obj;
+
+    uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+    uint64_t value = backend->host_addr;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    xb->host_addr = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+                       backend->size);
+    backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = xen_backend_alloc;
+
+    object_class_property_add(oc, "host-addr", "int",
+                              xen_backend_get_host_addr,
+                              xen_backend_set_host_addr,
+                              NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_XEN,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = xen_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendXen),
+};
+
+static void register_types(void)
+{
+    type_register_static(&xen_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..ba13a52994 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
+#include "hw/xen/xen.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-types.h"
@@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
+        /*
+         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
+         * so no further work in this function is needed.
+         */
+        if (xen_enabled() && !backend->mr.ram_block) {
+            goto out;
+        }
+
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index bdf6649083..7e1fe005ee 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "hw/virtio/vhost.h"
+#include "hw/xen/xen.h"
 
 typedef struct pc_dimms_capacity {
      uint64_t size;
@@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(vmstate_mr, dev);
+    /* memory-backend-xen is not backed by RAM. */
+    if (!xen_enabled()) {
+        vmstate_register_ram(vmstate_mr, dev);
+    }
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
 out:
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Xiao Guangrong, Michael S. Tsirkin, Igor Mammedov

Xen relies on QEMU to build guest ACPI for NVDIMM. However, no fw_cfg
is created when QEMU is used as Xen device model, so QEMU should avoid
using fw_cfg on Xen.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..9121a766c6 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -890,8 +891,12 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
-    fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
-                    state->dsm_mem->len);
+
+    /* No fw_cfg is created when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+                        state->dsm_mem->len);
+    }
 
     nvdimm_init_fit_buffer(&state->fit_buf);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen relies on QEMU to build guest ACPI for NVDIMM. However, no fw_cfg
is created when QEMU is used as Xen device model, so QEMU should avoid
using fw_cfg on Xen.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..9121a766c6 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -890,8 +891,12 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
-    fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
-                    state->dsm_mem->len);
+
+    /* No fw_cfg is created when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+                        state->dsm_mem->len);
+    }
 
     nvdimm_init_fit_buffer(&state->fit_buf);
 }
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Stefano Stabellini, Anthony Perard, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

Probe the base address and the length of guest ACPI buffer reserved
for copying ACPI from QEMU.
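
Both values are read from XenStore and parsed as hexadecimal strings
(qemu_strtoul() with base 16). For example, for domain 1 the entries
written by the Xen side might look like (illustrative values only):

    /local/domain/1/hvmloader/dm-acpi/address = "fc000000"
    /local/domain/1/hvmloader/dm-acpi/length  = "20000"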

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 90163e1a1b..ae895aaf03 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -18,6 +18,7 @@
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
 
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "sysemu/xen-mapcache.h"
@@ -86,6 +87,18 @@ typedef struct XenPhysmap {
     QLIST_ENTRY(XenPhysmap) list;
 } XenPhysmap;
 
+#define HVM_XS_DM_ACPI_ROOT    "/hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH  HVM_XS_DM_ACPI_ROOT"/length"
+
+typedef struct XenAcpiBuf {
+    ram_addr_t base;
+    ram_addr_t length;
+    ram_addr_t used;
+} XenAcpiBuf;
+
+static XenAcpiBuf *dm_acpi_buf;
+
 typedef struct XenIOState {
     ioservid_t ioservid;
     shared_iopage_t *shared_page;
@@ -110,6 +123,8 @@ typedef struct XenIOState {
     hwaddr free_phys_offset;
     const XenPhysmap *log_for_dirtybit;
 
+    XenAcpiBuf dm_acpi_buf;
+
     Notifier exit;
     Notifier suspend;
     Notifier wakeup;
@@ -1234,6 +1249,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static int xen_dm_acpi_needed(PCMachineState *pcms)
+{
+    return 0;
+}
+
+static int dm_acpi_buf_init(XenIOState *state)
+{
+    char path[80], *value;
+    unsigned int len;
+
+    dm_acpi_buf = &state->dm_acpi_buf;
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ADDRESS, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->base)) {
+        return -EINVAL;
+    }
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_LENGTH, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->length)) {
+        return -EINVAL;
+    }
+
+    dm_acpi_buf->used = 0;
+
+    return 0;
+}
+
+static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
+{
+    if (!xen_dm_acpi_needed(pcms)) {
+        return 0;
+    }
+
+    return dm_acpi_buf_init(state);
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
@@ -1385,6 +1446,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     /* Disable ACPI build because Xen handles it */
     pcms->acpi_build_enabled = false;
 
+    if (xen_dm_acpi_init(pcms, state)) {
+        error_report("failed to initialize xen ACPI");
+        goto err;
+    }
+
     return;
 
 err:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Probe the base address and the length of guest ACPI buffer reserved
for copying ACPI from QEMU.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 90163e1a1b..ae895aaf03 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -18,6 +18,7 @@
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
 
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "sysemu/xen-mapcache.h"
@@ -86,6 +87,18 @@ typedef struct XenPhysmap {
     QLIST_ENTRY(XenPhysmap) list;
 } XenPhysmap;
 
+#define HVM_XS_DM_ACPI_ROOT    "/hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH  HVM_XS_DM_ACPI_ROOT"/length"
+
+typedef struct XenAcpiBuf {
+    ram_addr_t base;
+    ram_addr_t length;
+    ram_addr_t used;
+} XenAcpiBuf;
+
+static XenAcpiBuf *dm_acpi_buf;
+
 typedef struct XenIOState {
     ioservid_t ioservid;
     shared_iopage_t *shared_page;
@@ -110,6 +123,8 @@ typedef struct XenIOState {
     hwaddr free_phys_offset;
     const XenPhysmap *log_for_dirtybit;
 
+    XenAcpiBuf dm_acpi_buf;
+
     Notifier exit;
     Notifier suspend;
     Notifier wakeup;
@@ -1234,6 +1249,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static int xen_dm_acpi_needed(PCMachineState *pcms)
+{
+    return 0;
+}
+
+static int dm_acpi_buf_init(XenIOState *state)
+{
+    char path[80], *value;
+    unsigned int len;
+
+    dm_acpi_buf = &state->dm_acpi_buf;
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ADDRESS, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->base)) {
+        return -EINVAL;
+    }
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_LENGTH, xen_domid);
+    value = xs_read(state->xenstore, 0, path, &len);
+    if (!value) {
+        return -EINVAL;
+    }
+    if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->length)) {
+        return -EINVAL;
+    }
+
+    dm_acpi_buf->used = 0;
+
+    return 0;
+}
+
+static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
+{
+    if (!xen_dm_acpi_needed(pcms)) {
+        return 0;
+    }
+
+    return dm_acpi_buf_init(state);
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
@@ -1385,6 +1446,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     /* Disable ACPI build because Xen handles it */
     pcms->acpi_build_enabled = false;
 
+    if (xen_dm_acpi_init(pcms, state)) {
+        error_report("failed to initialize xen ACPI");
+        goto err;
+    }
+
     return;
 
 err:
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Stefano Stabellini, Anthony Perard, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

Xen relies on QEMU to build guest NFIT and NVDIMM namespace devices,
and implements an interface to allow QEMU to copy its ACPI into guest
memory. This commit implements the QEMU side support.

The location of guest memory that can receive QEMU ACPI can be found
from XenStore entries /local/domain/$dom_id/hvmloader/dm-acpi/{address,length},
which were handled by the previous commit.

The QEMU ACPI copied to the guest is organized in blobs. For each
blob, QEMU creates the following XenStore entries under
/local/domain/$dom_id/hvmloader/dm-acpi/$name to indicate its type,
its location in the above guest memory region, and its size.
 - type   the type of the passed ACPI, which can be the following
          values.
    * XEN_DM_ACPI_BLOB_TYPE_TABLE (0) indicates it's a complete ACPI
      table, and its signature is indicated by $name in the XenStore
      path.
    * XEN_DM_ACPI_BLOB_TYPE_NSDEV (1) indicates it's the body of a
      namespace device, and its device name is indicated by $name in
      the XenStore path.
 - offset  offset in bytes from the beginning of the above guest memory region
 - length  size in bytes of the copied ACPI
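
For example (illustrative names and values only), after copying a
224-byte NFIT at the start of the buffer followed by an 80-byte
namespace device body, the entries under
/local/domain/$dom_id/hvmloader/dm-acpi/ could look like:

    NFIT/type   = "0"
    NFIT/offset = "0"
    NFIT/length = "224"
    NV0/type    = "1"
    NV0/offset  = "224"
    NV0/length  = "80"

Type, offset and length are written as decimal strings; blobs are
allocated sequentially from the buffer, so the second blob's offset
equals the first blob's length.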

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/xen/xen.h  |  18 ++++++++
 stubs/xen-hvm.c       |   6 +++
 3 files changed, 137 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index ae895aaf03..b74c4ffb9c 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1286,6 +1286,20 @@ static int dm_acpi_buf_init(XenIOState *state)
     return 0;
 }
 
+static ram_addr_t dm_acpi_buf_alloc(size_t length)
+{
+    ram_addr_t addr;
+
+    if (dm_acpi_buf->length - dm_acpi_buf->used < length) {
+        return 0;
+    }
+
+    addr = dm_acpi_buf->base + dm_acpi_buf->used;
+    dm_acpi_buf->used += length;
+
+    return addr;
+}
+
 static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 {
     if (!xen_dm_acpi_needed(pcms)) {
@@ -1295,6 +1309,105 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static int xs_write_dm_acpi_blob_entry(const char *name,
+                                       const char *entry, const char *value)
+{
+    XenIOState *state = container_of(dm_acpi_buf, XenIOState, dm_acpi_buf);
+    char path[80];
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             xen_domid, name, entry);
+    if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static size_t xen_memcpy_to_guest(ram_addr_t gpa,
+                                  const void *buf, size_t length)
+{
+    size_t copied = 0, size;
+    ram_addr_t s, e, offset, cur = gpa;
+    xen_pfn_t cur_pfn;
+    void *page;
+
+    if (!buf || !length) {
+        return 0;
+    }
+
+    s = gpa & TARGET_PAGE_MASK;
+    e = gpa + length;
+    if (e < s) {
+        return 0;
+    }
+
+    while (cur < e) {
+        cur_pfn = cur >> TARGET_PAGE_BITS;
+        offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+        size = (length >= TARGET_PAGE_SIZE - offset) ?
+               TARGET_PAGE_SIZE - offset : length;
+
+        page = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ | PROT_WRITE,
+                                    1, &cur_pfn, NULL);
+        if (!page) {
+            break;
+        }
+
+        memcpy(page + offset, buf, size);
+        xenforeignmemory_unmap(xen_fmem, page, 1);
+
+        copied += size;
+        buf += size;
+        cur += size;
+        length -= size;
+    }
+
+    return copied;
+}
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    char value[21];
+    ram_addr_t buf_addr;
+    int rc;
+
+    if (type != XEN_DM_ACPI_BLOB_TYPE_TABLE &&
+        type != XEN_DM_ACPI_BLOB_TYPE_NSDEV) {
+        return -EINVAL;
+    }
+
+    buf_addr = dm_acpi_buf_alloc(length);
+    if (!buf_addr) {
+        return -ENOMEM;
+    }
+    if (xen_memcpy_to_guest(buf_addr, blob, length) != length) {
+        return -EIO;
+    }
+
+    snprintf(value, sizeof(value), "%d", type);
+    rc = xs_write_dm_acpi_blob_entry(name, "type", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, buf_addr - dm_acpi_buf->base);
+    rc = xs_write_dm_acpi_blob_entry(name, "offset", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, length);
+    rc = xs_write_dm_acpi_blob_entry(name, "length", value);
+    if (rc) {
+        return rc;
+    }
+
+    return 0;
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..38dcd1a7d4 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -48,4 +48,22 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
 
 void xen_register_framebuffer(struct MemoryRegion *mr);
 
+/*
+ * Copy an ACPI blob from QEMU to HVM guest.
+ *
+ * Parameters:
+ *  name:   a unique name of the data blob; for XEN_DM_ACPI_BLOB_TYPE_NSDEV,
+ *          name should be less than 4 characters
+ *  blob:   the ACPI blob to be copied
+ *  length: the length in bytes of the ACPI blob
+ *  type:   the type of content in the ACPI blob, one of XEN_DM_ACPI_BLOB_TYPE_*
+ *
+ * Return:
+ *   0 on success; a non-zero error code on failures.
+ */
+#define XEN_DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define XEN_DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of ACPI namespace device */
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51b21..58889ae0fb 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -61,3 +61,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    return -1;
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Xen relies on QEMU to build the guest NFIT and NVDIMM namespace
devices, and implements an interface that allows QEMU to copy its ACPI
into guest memory. This commit implements the QEMU side of that
interface.

The location of the guest memory region that can receive QEMU ACPI is
published in the XenStore entries
/local/domain/$dom_id/hvmloader/dm-acpi/{address,length}, which were
set up by the previous commit.

QEMU ACPI copied to the guest is organized in blobs. For each blob,
QEMU creates the following XenStore entries under
/local/domain/$dom_id/hvmloader/dm-acpi/$name to indicate its type,
its location within the guest memory region above, and its size; a
consumer-side sketch of this layout follows the list.
 - type    the type of the passed ACPI, which can be one of the
           following values:
    * XEN_DM_ACPI_BLOB_TYPE_TABLE (0) indicates a complete ACPI table,
      whose signature is given by $name in the XenStore path.
    * XEN_DM_ACPI_BLOB_TYPE_NSDEV (1) indicates the body of an ACPI
      namespace device, whose device name is given by $name in the
      XenStore path.
 - offset  offset in bytes from the beginning of the guest memory region above
 - length  size in bytes of the copied ACPI
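
As a concrete illustration (not part of this patch), the sketch below
reads the buffer base and the entries of a blob assumed to be named
"NFIT" through libxenstore's xs_open()/xs_read(); hvmloader uses its
own xenstore helpers, and the domain id and printed summary are only
assumptions for the example:

/*
 * Consumer-side sketch of the dm-acpi XenStore layout described above.
 * Error handling is minimal; build with -lxenstore.
 */
#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

static char *read_dm_acpi_key(struct xs_handle *xs, int domid,
                              const char *key)
{
    char path[128];
    unsigned int len;

    snprintf(path, sizeof(path),
             "/local/domain/%d/hvmloader/dm-acpi/%s", domid, key);
    return xs_read(xs, XBT_NULL, path, &len);    /* caller frees */
}

int main(void)
{
    int domid = 1;                               /* assumed domain id */
    struct xs_handle *xs = xs_open(0);
    char *base, *type, *offset, *length;

    if (!xs) {
        return 1;
    }

    base   = read_dm_acpi_key(xs, domid, "address");     /* buffer start */
    type   = read_dm_acpi_key(xs, domid, "NFIT/type");   /* 0: complete ACPI table */
    offset = read_dm_acpi_key(xs, domid, "NFIT/offset"); /* bytes from buffer start */
    length = read_dm_acpi_key(xs, domid, "NFIT/length"); /* blob size in bytes */

    if (base && type && offset && length) {
        printf("NFIT blob: type %s at %s + %s, %s bytes\n",
               type, base, offset, length);
    }

    free(base); free(type); free(offset); free(length);
    xs_close(xs);
    return 0;
}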

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/xen/xen.h  |  18 ++++++++
 stubs/xen-hvm.c       |   6 +++
 3 files changed, 137 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index ae895aaf03..b74c4ffb9c 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1286,6 +1286,20 @@ static int dm_acpi_buf_init(XenIOState *state)
     return 0;
 }
 
+static ram_addr_t dm_acpi_buf_alloc(size_t length)
+{
+    ram_addr_t addr;
+
+    if (dm_acpi_buf->length - dm_acpi_buf->used < length) {
+        return 0;
+    }
+
+    addr = dm_acpi_buf->base + dm_acpi_buf->used;
+    dm_acpi_buf->used += length;
+
+    return addr;
+}
+
 static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 {
     if (!xen_dm_acpi_needed(pcms)) {
@@ -1295,6 +1309,105 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static int xs_write_dm_acpi_blob_entry(const char *name,
+                                       const char *entry, const char *value)
+{
+    XenIOState *state = container_of(dm_acpi_buf, XenIOState, dm_acpi_buf);
+    char path[80];
+
+    snprintf(path, sizeof(path),
+             "/local/domain/%d"HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             xen_domid, name, entry);
+    if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static size_t xen_memcpy_to_guest(ram_addr_t gpa,
+                                  const void *buf, size_t length)
+{
+    size_t copied = 0, size;
+    ram_addr_t s, e, offset, cur = gpa;
+    xen_pfn_t cur_pfn;
+    void *page;
+
+    if (!buf || !length) {
+        return 0;
+    }
+
+    s = gpa & TARGET_PAGE_MASK;
+    e = gpa + length;
+    if (e < s) {
+        return 0;
+    }
+
+    while (cur < e) {
+        cur_pfn = cur >> TARGET_PAGE_BITS;
+        offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+        size = (length >= TARGET_PAGE_SIZE - offset) ?
+               TARGET_PAGE_SIZE - offset : length;
+
+        page = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ | PROT_WRITE,
+                                    1, &cur_pfn, NULL);
+        if (!page) {
+            break;
+        }
+
+        memcpy(page + offset, buf, size);
+        xenforeignmemory_unmap(xen_fmem, page, 1);
+
+        copied += size;
+        buf += size;
+        cur += size;
+        length -= size;
+    }
+
+    return copied;
+}
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    char value[21];
+    ram_addr_t buf_addr;
+    int rc;
+
+    if (type != XEN_DM_ACPI_BLOB_TYPE_TABLE &&
+        type != XEN_DM_ACPI_BLOB_TYPE_NSDEV) {
+        return -EINVAL;
+    }
+
+    buf_addr = dm_acpi_buf_alloc(length);
+    if (!buf_addr) {
+        return -ENOMEM;
+    }
+    if (xen_memcpy_to_guest(buf_addr, blob, length) != length) {
+        return -EIO;
+    }
+
+    snprintf(value, sizeof(value), "%d", type);
+    rc = xs_write_dm_acpi_blob_entry(name, "type", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, buf_addr - dm_acpi_buf->base);
+    rc = xs_write_dm_acpi_blob_entry(name, "offset", value);
+    if (rc) {
+        return rc;
+    }
+
+    snprintf(value, sizeof(value), "%"PRIu64, length);
+    rc = xs_write_dm_acpi_blob_entry(name, "length", value);
+    if (rc) {
+        return rc;
+    }
+
+    return 0;
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..38dcd1a7d4 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -48,4 +48,22 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
 
 void xen_register_framebuffer(struct MemoryRegion *mr);
 
+/*
+ * Copy an ACPI blob from QEMU to HVM guest.
+ *
+ * Parameters:
+ *  name:   a unique name of the data blob; for XEN_DM_ACPI_BLOB_TYPE_NSDEV,
+ *          name should be less than 4 characters
+ *  blob:   the ACPI blob to be copied
+ *  length: the length in bytes of the ACPI blob
+ *  type:   the type of content in the ACPI blob, one of XEN_DM_ACPI_BLOB_TYPE_*
+ *
+ * Return:
+ *   0 on success; a non-zero error code on failure.
+ */
+#define XEN_DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define XEN_DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of ACPI namespace device */
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51b21..58889ae0fb 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -61,3 +61,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+                           int type)
+{
+    return -1;
+}
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Xen relies on QEMU to build the guest NFIT.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9121a766c6..d9cdc5a531 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -404,6 +404,12 @@ static void nvdimm_build_nfit(AcpiNVDIMMState *state, GArray *table_offsets,
     build_header(linker, table_data,
                  (void *)(table_data->data + header), "NFIT",
                  sizeof(NvdimmNfitHeader) + fit_buf->fit->len, 1, NULL, NULL);
+
+    if (xen_enabled()) {
+        xen_acpi_copy_to_guest("NFIT", table_data->data + header,
+                               sizeof(NvdimmNfitHeader) + fit_buf->fit->len,
+                               XEN_DM_ACPI_BLOB_TYPE_TABLE);
+    }
 }
 
 #define NVDIMM_DSM_MEMORY_SIZE      4096
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen relies on QEMU to build the guest NFIT.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9121a766c6..d9cdc5a531 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -404,6 +404,12 @@ static void nvdimm_build_nfit(AcpiNVDIMMState *state, GArray *table_offsets,
     build_header(linker, table_data,
                  (void *)(table_data->data + header), "NFIT",
                  sizeof(NvdimmNfitHeader) + fit_buf->fit->len, 1, NULL, NULL);
+
+    if (xen_enabled()) {
+        xen_acpi_copy_to_guest("NFIT", table_data->data + header,
+                               sizeof(NvdimmNfitHeader) + fit_buf->fit->len,
+                               XEN_DM_ACPI_BLOB_TYPE_TABLE);
+    }
 }
 
 #define NVDIMM_DSM_MEMORY_SIZE      4096
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Xen relies on QEMU to build the ACPI namespace device of vNVDIMM for
the Xen guest.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 55 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index d9cdc5a531..bf887512ad 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1226,22 +1226,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
     }
 }
 
-static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-                              BIOSLinker *linker, GArray *dsm_dma_arrea,
-                              uint32_t ram_slots)
+static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 {
-    Aml *ssdt, *sb_scope, *dev;
-    int mem_addr_offset, nvdimm_ssdt;
-
-    acpi_add_table(table_offsets, table_data);
-
-    ssdt = init_aml_allocator();
-    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
-
-    sb_scope = aml_scope("\\_SB");
-
-    dev = aml_device("NVDR");
-
     /*
      * ACPI 6.0: 9.20 NVDIMM Devices:
      *
@@ -1262,6 +1248,25 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     nvdimm_build_fit(dev);
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
+}
+
+static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
+                              BIOSLinker *linker, GArray *dsm_dma_arrea,
+                              uint32_t ram_slots)
+{
+    Aml *ssdt, *sb_scope, *dev;
+    int mem_addr_offset, nvdimm_ssdt;
+
+    acpi_add_table(table_offsets, table_data);
+
+    ssdt = init_aml_allocator();
+    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+    sb_scope = aml_scope("\\_SB");
+
+    dev = aml_device("NVDR");
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
 
     aml_append(sb_scope, dev);
     aml_append(ssdt, sb_scope);
@@ -1285,6 +1290,18 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     free_aml_allocator();
 }
 
+static void nvdimm_build_xen_ssdt(uint32_t ram_slots)
+{
+    Aml *dev = init_aml_allocator();
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
+    build_append_named_dword(dev->buf, NVDIMM_ACPI_MEM_ADDR);
+    xen_acpi_copy_to_guest("NVDR", dev->buf->data, dev->buf->len,
+                           XEN_DM_ACPI_BLOB_TYPE_NSDEV);
+
+    free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        BIOSLinker *linker, AcpiNVDIMMState *state,
                        uint32_t ram_slots)
@@ -1296,8 +1313,12 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
         return;
     }
 
-    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
-                      ram_slots);
+    if (!xen_enabled()) {
+        nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+                          ram_slots);
+    } else {
+        nvdimm_build_xen_ssdt(ram_slots);
+    }
 
     device_list = nvdimm_get_device_list();
     /* no NVDIMM device is plugged. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen relies on QEMU to build the ACPI namespace device of vNVDIMM for
the Xen guest.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 55 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index d9cdc5a531..bf887512ad 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1226,22 +1226,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
     }
 }
 
-static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-                              BIOSLinker *linker, GArray *dsm_dma_arrea,
-                              uint32_t ram_slots)
+static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 {
-    Aml *ssdt, *sb_scope, *dev;
-    int mem_addr_offset, nvdimm_ssdt;
-
-    acpi_add_table(table_offsets, table_data);
-
-    ssdt = init_aml_allocator();
-    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
-
-    sb_scope = aml_scope("\\_SB");
-
-    dev = aml_device("NVDR");
-
     /*
      * ACPI 6.0: 9.20 NVDIMM Devices:
      *
@@ -1262,6 +1248,25 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     nvdimm_build_fit(dev);
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
+}
+
+static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
+                              BIOSLinker *linker, GArray *dsm_dma_arrea,
+                              uint32_t ram_slots)
+{
+    Aml *ssdt, *sb_scope, *dev;
+    int mem_addr_offset, nvdimm_ssdt;
+
+    acpi_add_table(table_offsets, table_data);
+
+    ssdt = init_aml_allocator();
+    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+    sb_scope = aml_scope("\\_SB");
+
+    dev = aml_device("NVDR");
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
 
     aml_append(sb_scope, dev);
     aml_append(ssdt, sb_scope);
@@ -1285,6 +1290,18 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     free_aml_allocator();
 }
 
+static void nvdimm_build_xen_ssdt(uint32_t ram_slots)
+{
+    Aml *dev = init_aml_allocator();
+
+    nvdimm_build_ssdt_device(dev, ram_slots);
+    build_append_named_dword(dev->buf, NVDIMM_ACPI_MEM_ADDR);
+    xen_acpi_copy_to_guest("NVDR", dev->buf->data, dev->buf->len,
+                           XEN_DM_ACPI_BLOB_TYPE_NSDEV);
+
+    free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        BIOSLinker *linker, AcpiNVDIMMState *state,
                        uint32_t ram_slots)
@@ -1296,8 +1313,12 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
         return;
     }
 
-    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
-                      ram_slots);
+    if (!xen_enabled()) {
+        nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+                          ram_slots);
+    } else {
+        nvdimm_build_xen_ssdt(ram_slots);
+    }
 
     device_list = nvdimm_get_device_list();
     /* no NVDIMM device is plugged. */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Xen currently does not support vNVDIMM hotplug and always sets the QEMU
option "maxmem" to be just enough for RAM and vNVDIMM, so it's not
necessary to build the _FIT method when QEMU is used as the Xen device
model.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bf887512ad..61789c3966 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1245,7 +1245,14 @@ static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 
     /* 0 is reserved for root device. */
     nvdimm_build_device_dsm(dev, 0);
-    nvdimm_build_fit(dev);
+    /*
+     * Xen does not support vNVDIMM hotplug, and always sets the QEMU
+     * option "maxmem" to be just enough for RAM and statically plugged
+     * vNVDIMM, so it's unnecessary to build the _FIT method on Xen.
+     */
+    if (!xen_enabled()) {
+        nvdimm_build_fit(dev);
+    }
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Michael S. Tsirkin,
	Igor Mammedov, Chao Peng, Dan Williams

Xen currently does not support vNVDIMM hotplug and always sets the QEMU
option "maxmem" to be just enough for RAM and vNVDIMM, so it's not
necessary to build the _FIT method when QEMU is used as the Xen device
model.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
---
 hw/acpi/nvdimm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bf887512ad..61789c3966 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1245,7 +1245,14 @@ static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 
     /* 0 is reserved for root device. */
     nvdimm_build_device_dsm(dev, 0);
-    nvdimm_build_fit(dev);
+    /*
+     * Xen does not support vNVDIMM hotplug, and always sets the QEMU
+     * option "maxmem" to be just enough for RAM and statically plugged
+     * vNVDIMM, so it's unnecessary to build the _FIT method on Xen.
+     */
+    if (!xen_enabled()) {
+        nvdimm_build_fit(dev);
+    }
 
     nvdimm_build_nvdimm_devices(dev, ram_slots);
 }
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:41     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Dan Williams, Chao Peng, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Stefano Stabellini,
	Anthony Perard

If the machine option 'nvdimm' is enabled and QEMU is used as the Xen
device model, construct the guest NFIT and ACPI namespace devices of
vNVDIMM and copy them into guest memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/acpi/aml-build.c   | 10 +++++++---
 hw/i386/pc.c          | 16 ++++++++++------
 hw/i386/xen/xen-hvm.c | 25 +++++++++++++++++++++++--
 include/hw/xen/xen.h  |  7 +++++++
 stubs/xen-hvm.c       |  4 ++++
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc450e..5f57c1bef3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -22,6 +22,7 @@
 #include "qemu/osdep.h"
 #include <glib/gprintf.h>
 #include "hw/acpi/aml-build.h"
+#include "hw/xen/xen.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
 #include "sysemu/numa.h"
@@ -1531,9 +1532,12 @@ build_header(BIOSLinker *linker, GArray *table_data,
     h->oem_revision = cpu_to_le32(1);
     memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
     h->asl_compiler_revision = cpu_to_le32(1);
-    /* Checksum to be filled in by Guest linker */
-    bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
-        tbl_offset, len, checksum_offset);
+    /* No linker is used when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        /* Checksum to be filled in by Guest linker */
+        bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
+                                        tbl_offset, len, checksum_offset);
+    }
 }
 
 void *acpi_data_push(GArray *table_data, unsigned size)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5cbdce61a7..7101d380a0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1252,12 +1252,16 @@ void pc_machine_done(Notifier *notifier, void *data)
         }
     }
 
-    acpi_setup();
-    if (pcms->fw_cfg) {
-        pc_build_smbios(pcms);
-        pc_build_feature_control_file(pcms);
-        /* update FW_CFG_NB_CPUS to account for -device added CPUs */
-        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    if (!xen_enabled()) {
+        acpi_setup();
+        if (pcms->fw_cfg) {
+            pc_build_smbios(pcms);
+            pc_build_feature_control_file(pcms);
+            /* update FW_CFG_NB_CPUS to account for -device added CPUs */
+            fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+        }
+    } else {
+        xen_dm_acpi_setup(pcms);
     }
 
     if (pcms->apic_id_limit > 255) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index b74c4ffb9c..d81cc7dbbc 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -265,7 +265,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
         /* RAM already populated in Xen */
         fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
                 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-                __func__, size, ram_addr); 
+                __func__, size, ram_addr);
         return;
     }
 
@@ -1251,7 +1251,7 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 
 static int xen_dm_acpi_needed(PCMachineState *pcms)
 {
-    return 0;
+    return pcms->acpi_nvdimm_state.is_enabled;
 }
 
 static int dm_acpi_buf_init(XenIOState *state)
@@ -1309,6 +1309,20 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static void xen_dm_acpi_nvdimm_setup(PCMachineState *pcms)
+{
+    GArray *table_offsets = g_array_new(false, true /* clear */,
+                                        sizeof(uint32_t));
+    GArray *table_data = g_array_new(false, true /* clear */, 1);
+
+    nvdimm_build_acpi(table_offsets, table_data,
+                      NULL, &pcms->acpi_nvdimm_state,
+                      MACHINE(pcms)->ram_slots);
+
+    g_array_free(table_offsets, true);
+    g_array_free(table_data, true);
+}
+
 static int xs_write_dm_acpi_blob_entry(const char *name,
                                        const char *entry, const char *value)
 {
@@ -1408,6 +1422,13 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
     return 0;
 }
 
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        xen_dm_acpi_nvdimm_setup(pcms);
+    }
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 38dcd1a7d4..8c48195e12 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -66,4 +66,11 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
 int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
                            int type);
 
+/*
+ * Build guest ACPI (i.e. DM ACPI, the ACPI built by the device model)
+ * and copy it into guest memory. Xen hvmloader will load the DM ACPI
+ * and merge it with the guest ACPI it builds itself.
+ */
+void xen_dm_acpi_setup(PCMachineState *pcms);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 58889ae0fb..c1a6d21efa 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -67,3 +67,7 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
 {
     return -1;
 }
+
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
@ 2017-09-11  4:41     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  4:41 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson

If the machine option 'nvdimm' is enabled and QEMU is used as the Xen
device model, construct the guest NFIT and ACPI namespace devices of
vNVDIMM and copy them into guest memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/acpi/aml-build.c   | 10 +++++++---
 hw/i386/pc.c          | 16 ++++++++++------
 hw/i386/xen/xen-hvm.c | 25 +++++++++++++++++++++++--
 include/hw/xen/xen.h  |  7 +++++++
 stubs/xen-hvm.c       |  4 ++++
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc450e..5f57c1bef3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -22,6 +22,7 @@
 #include "qemu/osdep.h"
 #include <glib/gprintf.h>
 #include "hw/acpi/aml-build.h"
+#include "hw/xen/xen.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
 #include "sysemu/numa.h"
@@ -1531,9 +1532,12 @@ build_header(BIOSLinker *linker, GArray *table_data,
     h->oem_revision = cpu_to_le32(1);
     memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
     h->asl_compiler_revision = cpu_to_le32(1);
-    /* Checksum to be filled in by Guest linker */
-    bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
-        tbl_offset, len, checksum_offset);
+    /* No linker is used when QEMU is used as Xen device model. */
+    if (!xen_enabled()) {
+        /* Checksum to be filled in by Guest linker */
+        bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
+                                        tbl_offset, len, checksum_offset);
+    }
 }
 
 void *acpi_data_push(GArray *table_data, unsigned size)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5cbdce61a7..7101d380a0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1252,12 +1252,16 @@ void pc_machine_done(Notifier *notifier, void *data)
         }
     }
 
-    acpi_setup();
-    if (pcms->fw_cfg) {
-        pc_build_smbios(pcms);
-        pc_build_feature_control_file(pcms);
-        /* update FW_CFG_NB_CPUS to account for -device added CPUs */
-        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    if (!xen_enabled()) {
+        acpi_setup();
+        if (pcms->fw_cfg) {
+            pc_build_smbios(pcms);
+            pc_build_feature_control_file(pcms);
+            /* update FW_CFG_NB_CPUS to account for -device added CPUs */
+            fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+        }
+    } else {
+        xen_dm_acpi_setup(pcms);
     }
 
     if (pcms->apic_id_limit > 255) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index b74c4ffb9c..d81cc7dbbc 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -265,7 +265,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
         /* RAM already populated in Xen */
         fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
                 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-                __func__, size, ram_addr); 
+                __func__, size, ram_addr);
         return;
     }
 
@@ -1251,7 +1251,7 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 
 static int xen_dm_acpi_needed(PCMachineState *pcms)
 {
-    return 0;
+    return pcms->acpi_nvdimm_state.is_enabled;
 }
 
 static int dm_acpi_buf_init(XenIOState *state)
@@ -1309,6 +1309,20 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
     return dm_acpi_buf_init(state);
 }
 
+static void xen_dm_acpi_nvdimm_setup(PCMachineState *pcms)
+{
+    GArray *table_offsets = g_array_new(false, true /* clear */,
+                                        sizeof(uint32_t));
+    GArray *table_data = g_array_new(false, true /* clear */, 1);
+
+    nvdimm_build_acpi(table_offsets, table_data,
+                      NULL, &pcms->acpi_nvdimm_state,
+                      MACHINE(pcms)->ram_slots);
+
+    g_array_free(table_offsets, true);
+    g_array_free(table_data, true);
+}
+
 static int xs_write_dm_acpi_blob_entry(const char *name,
                                        const char *entry, const char *value)
 {
@@ -1408,6 +1422,13 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
     return 0;
 }
 
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        xen_dm_acpi_nvdimm_setup(pcms);
+    }
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 38dcd1a7d4..8c48195e12 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -66,4 +66,11 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
 int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
                            int type);
 
+/*
+ * Build guest ACPI (i.e. DM ACPI, the ACPI built by the device model)
+ * and copy it into guest memory. Xen hvmloader will load the DM ACPI
+ * and merge it with the guest ACPI it builds itself.
+ */
+void xen_dm_acpi_setup(PCMachineState *pcms);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 58889ae0fb..c1a6d21efa 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -67,3 +67,7 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
 {
     return -1;
 }
+
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+}
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11  4:53     ` no-reply
  -1 siblings, 0 replies; 128+ messages in thread
From: no-reply @ 2017-09-11  4:53 UTC (permalink / raw)
  To: haozhong.zhang
  Cc: famz, qemu-devel, xen-devel, sstabellini, ehabkost, konrad.wilk,
	mst, pbonzini, imammedo, anthony.perard, chao.p.peng,
	dan.j.williams, rth, xiaoguangrong.eric

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
Message-id: 20170911044157.15403-1-haozhong.zhang@intel.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20170911044157.15403-1-haozhong.zhang@intel.com -> patchew/20170911044157.15403-1-haozhong.zhang@intel.com
Switched to a new branch 'test'
d5f5b8faf2 hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
73b52971f5 nvdimm acpi: do not build _FIT method on Xen
1c6eeac40e nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
f2d6097366 nvdimm acpi: copy NFIT to Xen guest
69ddac3d65 hw/xen-hvm: add function to copy ACPI into guest memory
cae88474b2 hw/xen-hvm: initialize DM ACPI
23a0e4204a nvdimm acpi: do not use fw_cfg on Xen
e298be5d96 hostmem-xen: add a host memory backend for Xen
f069bbb659 hw/xen-hvm: create the hotplug memory region on Xen
69b6b6e9fa nvdimm: do not intiailize nvdimm->label_data if label size is zero

=== OUTPUT BEGIN ===
Checking PATCH 1/10: nvdimm: do not intiailize nvdimm->label_data if label size is zero...
ERROR: braces {} are necessary for all arms of this statement
#33: FILE: hw/mem/nvdimm.c:97:
+    if (nvdimm->label_size)
[...]

total: 1 errors, 0 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/10: hw/xen-hvm: create the hotplug memory region on Xen...
ERROR: braces {} are necessary for all arms of this statement
#29: FILE: hw/i386/pc.c:1356:
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
[...]

total: 1 errors, 0 warnings, 113 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/10: hostmem-xen: add a host memory backend for Xen...
Checking PATCH 4/10: nvdimm acpi: do not use fw_cfg on Xen...
Checking PATCH 5/10: hw/xen-hvm: initialize DM ACPI...
Checking PATCH 6/10: hw/xen-hvm: add function to copy ACPI into guest memory...
Checking PATCH 7/10: nvdimm acpi: copy NFIT to Xen guest...
Checking PATCH 8/10: nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest...
Checking PATCH 9/10: nvdimm acpi: do not build _FIT method on Xen...
Checking PATCH 10/10: hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled...
=== OUTPUT END ===

Test command exited with code: 1
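
For reference, the two checkpatch errors above are about QEMU's brace
style: every arm of an if statement must be braced, even a
single-statement one. A generic before/after sketch (placeholder names,
not the actual code from the flagged patches):

static void do_something(void) { }

static void example(int cond)
{
    /* Flagged form: a single-statement arm without braces. */
    if (cond)
        do_something();

    /* Form checkpatch.pl expects: braces on every arm. */
    if (cond) {
        do_something();
    }
}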


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-09-11  4:53     ` no-reply
  0 siblings, 0 replies; 128+ messages in thread
From: no-reply @ 2017-09-11  4:53 UTC (permalink / raw)
  Cc: haozhong.zhang, sstabellini, famz, ehabkost, mst, qemu-devel,
	xen-devel, chao.p.peng, imammedo, anthony.perard, pbonzini,
	dan.j.williams, xiaoguangrong.eric, rth

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
Message-id: 20170911044157.15403-1-haozhong.zhang@intel.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20170911044157.15403-1-haozhong.zhang@intel.com -> patchew/20170911044157.15403-1-haozhong.zhang@intel.com
Switched to a new branch 'test'
d5f5b8faf2 hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
73b52971f5 nvdimm acpi: do not build _FIT method on Xen
1c6eeac40e nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
f2d6097366 nvdimm acpi: copy NFIT to Xen guest
69ddac3d65 hw/xen-hvm: add function to copy ACPI into guest memory
cae88474b2 hw/xen-hvm: initialize DM ACPI
23a0e4204a nvdimm acpi: do not use fw_cfg on Xen
e298be5d96 hostmem-xen: add a host memory backend for Xen
f069bbb659 hw/xen-hvm: create the hotplug memory region on Xen
69b6b6e9fa nvdimm: do not intiailize nvdimm->label_data if label size is zero

=== OUTPUT BEGIN ===
Checking PATCH 1/10: nvdimm: do not intiailize nvdimm->label_data if label size is zero...
ERROR: braces {} are necessary for all arms of this statement
#33: FILE: hw/mem/nvdimm.c:97:
+    if (nvdimm->label_size)
[...]

total: 1 errors, 0 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/10: hw/xen-hvm: create the hotplug memory region on Xen...
ERROR: braces {} are necessary for all arms of this statement
#29: FILE: hw/i386/pc.c:1356:
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
[...]

total: 1 errors, 0 warnings, 113 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/10: hostmem-xen: add a host memory backend for Xen...
Checking PATCH 4/10: nvdimm acpi: do not use fw_cfg on Xen...
Checking PATCH 5/10: hw/xen-hvm: initialize DM ACPI...
Checking PATCH 6/10: hw/xen-hvm: add function to copy ACPI into guest memory...
Checking PATCH 7/10: nvdimm acpi: copy NFIT to Xen guest...
Checking PATCH 8/10: nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest...
Checking PATCH 9/10: nvdimm acpi: do not build _FIT method on Xen...
Checking PATCH 10/10: hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
@ 2017-09-11  5:10   ` Dan Williams
  2017-09-11  5:39     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Dan Williams @ 2017-09-11  5:10 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
<haozhong.zhang@intel.com> wrote:
> The kernel NVDIMM driver and the traditional NVDIMM management
> utilities in Dom0 does not work now. 'xen-ndctl' is added as an
> alternatively, which manages NVDIMM via Xen hypercalls.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  .gitignore             |   1 +
>  tools/misc/Makefile    |   4 ++
>  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 177 insertions(+)
>  create mode 100644 tools/misc/xen-ndctl.c

What about my offer to move this functionality into the upstream ndctl
utility [1]? I think it is thoroughly confusing that you are reusing
the name 'ndctl' and avoiding integration with the upstream ndctl
utility.

[1]: https://patchwork.kernel.org/patch/9632865/


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  5:10   ` Dan Williams
@ 2017-09-11  5:39     ` Haozhong Zhang
  2017-09-11 16:35       ` Dan Williams
  0 siblings, 1 reply; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-11  5:39 UTC (permalink / raw)
  To: Dan Williams; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On 09/10/17 22:10 -0700, Dan Williams wrote:
> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
> <haozhong.zhang@intel.com> wrote:
> > The kernel NVDIMM driver and the traditional NVDIMM management
> > utilities in Dom0 does not work now. 'xen-ndctl' is added as an
> > alternatively, which manages NVDIMM via Xen hypercalls.
> >
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  .gitignore             |   1 +
> >  tools/misc/Makefile    |   4 ++
> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 177 insertions(+)
> >  create mode 100644 tools/misc/xen-ndctl.c
> 
> What about my offer to move this functionality into the upstream ndctl
> utility [1]? I think it is thoroughly confusing that you are reusing
> the name 'ndctl' and avoiding integration with the upstream ndctl
> utility.
> 
> [1]: https://patchwork.kernel.org/patch/9632865/

I don't object to integrating it with ndctl.

My only concern is that the integration will introduce two types of
user interfaces. The upstream ndctl works with the kernel driver and
provides easily used *names* (e.g., namespace0.0, region0, nmem0,
etc.) for user input. However, this version of the patchset hides NFIT
from Dom0 (to simplify the first implementation), so the kernel driver
does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has
to take *the physical address* for users to specify the NVDIMM region
they are interested in, which is different from upstream ndctl.


Haozhong


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-09-11 14:08     ` Igor Mammedov
  -1 siblings, 0 replies; 128+ messages in thread
From: Igor Mammedov @ 2017-09-11 14:08 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, xen-devel, Konrad Rzeszutek Wilk, Dan Williams,
	Chao Peng, Eduardo Habkost, Michael S. Tsirkin, Xiao Guangrong,
	Paolo Bonzini, Richard Henderson, Stefano Stabellini,
	Anthony Perard

On Mon, 11 Sep 2017 12:41:47 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> This is the QEMU part patches that works with the associated Xen
> patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> guest address space for vNVDIMM devices.
> 
> All patches can be found at
>   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
>   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> 
> Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> label data, as the Xen side support for labels is not implemented yet.
> 
> Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> memory region for Xen guest, in order to make the existing nvdimm
> device plugging path work on Xen.
> 
> Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> used as the Xen device model.

I've skimmed over the patch set and can't say that I'm happy with the
number of xen_enabled() checks it introduces, nor with the partial
blobs it creates.

I'd like to reduce the above, and a way to do this might be making Xen
 1. use fw_cfg
 2. fetch the QEMU-built ACPI tables from fw_cfg
 3. extract the NVDIMM tables (which is trivial) and use them

Looking at xen_load_linux(), it seems possible to use fw_cfg.

So what's stopping Xen from using it elsewhere, instead of adding more
Xen-specific code to do 'the same' job and not reusing/sharing common
code with tcg/kvm?
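
As a rough sketch of what steps 1-3 above could look like from the
guest firmware side (purely illustrative: hvmloader has no fw_cfg code
today, the I/O ports 0x510/0x511, the 0x19 file-directory key and the
"etc/acpi/tables" romfile name come from QEMU's fw_cfg interface, and
real code would rather use the DMA interface):

#include <stdint.h>
#include <string.h>

#define FW_CFG_PORT_SEL   0x510      /* 16-bit selector register */
#define FW_CFG_PORT_DATA  0x511      /* 8-bit data register */
#define FW_CFG_FILE_DIR   0x0019     /* selects the romfile directory */

static inline void fw_outw(uint16_t port, uint16_t val)
{
    asm volatile("outw %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint8_t fw_inb(uint16_t port)
{
    uint8_t val;

    asm volatile("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

/* Stream bytes from the currently selected fw_cfg item. */
static void fw_cfg_read_bytes(void *buf, uint32_t len)
{
    uint8_t *p = buf;

    while (len--) {
        *p++ = fw_inb(FW_CFG_PORT_DATA);
    }
}

/* Romfile directory entry; all multi-byte fields are big-endian. */
struct fw_cfg_file {
    uint32_t size;
    uint16_t select;
    uint16_t reserved;
    char name[56];
};

/* Return the selector key of a named romfile, or 0 if not found. */
static uint16_t fw_cfg_find_file(const char *name, uint32_t *size)
{
    uint32_t i, count;
    struct fw_cfg_file f;

    fw_outw(FW_CFG_PORT_SEL, FW_CFG_FILE_DIR);
    fw_cfg_read_bytes(&count, sizeof(count));
    count = __builtin_bswap32(count);

    for (i = 0; i < count; i++) {
        fw_cfg_read_bytes(&f, sizeof(f));
        if (!strncmp(f.name, name, sizeof(f.name))) {
            *size = __builtin_bswap32(f.size);
            return __builtin_bswap16(f.select);
        }
    }
    return 0;
}

fw_cfg_find_file("etc/acpi/tables", &size) would then return the key to
select and stream out the tables QEMU built, and the NVDIMM pieces
(NFIT, SSDT) could be picked out by walking the ACPI table headers in
that blob.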


> Haozhong Zhang (10):
>   nvdimm: do not intiailize nvdimm->label_data if label size is zero
>   hw/xen-hvm: create the hotplug memory region on Xen
>   hostmem-xen: add a host memory backend for Xen
>   nvdimm acpi: do not use fw_cfg on Xen
>   hw/xen-hvm: initialize DM ACPI
>   hw/xen-hvm: add function to copy ACPI into guest memory
>   nvdimm acpi: copy NFIT to Xen guest
>   nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
>   nvdimm acpi: do not build _FIT method on Xen
>   hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
> 
>  backends/Makefile.objs |   1 +
>  backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
>  backends/hostmem.c     |   9 +++
>  hw/acpi/aml-build.c    |  10 ++-
>  hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
>  hw/i386/pc.c           | 102 ++++++++++++++-----------
>  hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/mem/nvdimm.c        |  10 ++-
>  hw/mem/pc-dimm.c       |   6 +-
>  include/hw/i386/pc.h   |   1 +
>  include/hw/xen/xen.h   |  25 ++++++
>  stubs/xen-hvm.c        |  10 +++
>  12 files changed, 495 insertions(+), 70 deletions(-)
>  create mode 100644 backends/hostmem-xen.c
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-09-11 14:08     ` Igor Mammedov
  0 siblings, 0 replies; 128+ messages in thread
From: Igor Mammedov @ 2017-09-11 14:08 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, xen-devel, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Xiao Guangrong

On Mon, 11 Sep 2017 12:41:47 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> This is the QEMU part patches that works with the associated Xen
> patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> guest address space for vNVDIMM devices.
> 
> All patches can be found at
>   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
>   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> 
> Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> label data, as the Xen side support for labels is not implemented yet.
> 
> Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> memory region for Xen guest, in order to make the existing nvdimm
> device plugging path work on Xen.
> 
> Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> used as the Xen device model.

I've skimmed over the patch set and can't say that I'm happy with the
number of xen_enabled() checks it introduces, nor with the partial
blobs it creates.

I'd like to reduce the above, and a way to do this might be making Xen
 1. use fw_cfg
 2. fetch the QEMU-built ACPI tables from fw_cfg
 3. extract the NVDIMM tables (which is trivial) and use them

Looking at xen_load_linux(), it seems possible to use fw_cfg.

So what's stopping Xen from using it elsewhere, instead of adding more
Xen-specific code to do 'the same' job and not reusing/sharing common
code with tcg/kvm?


> Haozhong Zhang (10):
>   nvdimm: do not intiailize nvdimm->label_data if label size is zero
>   hw/xen-hvm: create the hotplug memory region on Xen
>   hostmem-xen: add a host memory backend for Xen
>   nvdimm acpi: do not use fw_cfg on Xen
>   hw/xen-hvm: initialize DM ACPI
>   hw/xen-hvm: add function to copy ACPI into guest memory
>   nvdimm acpi: copy NFIT to Xen guest
>   nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
>   nvdimm acpi: do not build _FIT method on Xen
>   hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
> 
>  backends/Makefile.objs |   1 +
>  backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
>  backends/hostmem.c     |   9 +++
>  hw/acpi/aml-build.c    |  10 ++-
>  hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
>  hw/i386/pc.c           | 102 ++++++++++++++-----------
>  hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/mem/nvdimm.c        |  10 ++-
>  hw/mem/pc-dimm.c       |   6 +-
>  include/hw/i386/pc.h   |   1 +
>  include/hw/xen/xen.h   |  25 ++++++
>  stubs/xen-hvm.c        |  10 +++
>  12 files changed, 495 insertions(+), 70 deletions(-)
>  create mode 100644 backends/hostmem-xen.c
> 



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11  5:39     ` Haozhong Zhang
@ 2017-09-11 16:35       ` Dan Williams
  2017-09-11 21:24         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 128+ messages in thread
From: Dan Williams @ 2017-09-11 16:35 UTC (permalink / raw)
  To: Dan Williams, xen-devel, Konrad Rzeszutek Wilk, Chao Peng,
	Ian Jackson, Wei Liu

On Sun, Sep 10, 2017 at 10:39 PM, Haozhong Zhang
<haozhong.zhang@intel.com> wrote:
> On 09/10/17 22:10 -0700, Dan Williams wrote:
>> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
>> <haozhong.zhang@intel.com> wrote:
>> > The kernel NVDIMM driver and the traditional NVDIMM management
>> > utilities in Dom0 does not work now. 'xen-ndctl' is added as an
>> > alternatively, which manages NVDIMM via Xen hypercalls.
>> >
>> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> > ---
>> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> > Cc: Wei Liu <wei.liu2@citrix.com>
>> > ---
>> >  .gitignore             |   1 +
>> >  tools/misc/Makefile    |   4 ++
>> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 177 insertions(+)
>> >  create mode 100644 tools/misc/xen-ndctl.c
>>
>> What about my offer to move this functionality into the upstream ndctl
>> utility [1]? I think it is thoroughly confusing that you are reusing
>> the name 'ndctl' and avoiding integration with the upstream ndctl
>> utility.
>>
>> [1]: https://patchwork.kernel.org/patch/9632865/
>
> I'm not object to integrate it with ndctl.
>
> My only concern is that the integration will introduces two types of
> user interface. The upstream ndctl works with the kernel driver and
> provides easily used *names* (e.g., namespace0.0, region0, nmem0,
> etc.) for user input. However, this version patchset hides NFIT from
> Dom0 (to simplify the first implementation), so the kernel driver does
> not work in Dom0, neither does ndctl. Instead, xen-ndctl has to use
> *the physical address* for users to specify their interested NVDIMM
> region, which is different from upstream ndctl.

Ok, I think this means that xen-ndctl should be renamed (xen-nvdimm?)
so that the distinction between the 2 tools is clear.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11 14:08     ` Igor Mammedov
@ 2017-09-11 18:52       ` Stefano Stabellini
  -1 siblings, 0 replies; 128+ messages in thread
From: Stefano Stabellini @ 2017-09-11 18:52 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Haozhong Zhang, qemu-devel, xen-devel, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson,
	Stefano Stabellini, Anthony Perard, xen-devel, ian.jackson,
	wei.liu2, george.dunlap, JBeulich, andrew.cooper3

CC'ing xen-devel, and the Xen tools and x86 maintainers.

On Mon, 11 Sep 2017, Igor Mammedov wrote:
> On Mon, 11 Sep 2017 12:41:47 +0800
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> > This is the QEMU part patches that works with the associated Xen
> > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > guest address space for vNVDIMM devices.
> > 
> > All patches can be found at
> >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > 
> > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > label data, as the Xen side support for labels is not implemented yet.
> > 
> > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > memory region for Xen guest, in order to make the existing nvdimm
> > device plugging path work on Xen.
> > 
> > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> > used as the Xen device model.
> 
> I've skimmed over patch-set and can't say that I'm happy with
> number of xen_enabled() invariants it introduced as well as
> with partial blobs it creates.

I have not read the series (Haozhong, please CC me, Anthony and
xen-devel on the whole series next time), but yes, indeed. Let's not add
more xen_enabled() checks if possible.

Haozhong, was there a design document thread on xen-devel about this? If
so, did it reach a conclusion? Was the design accepted? If so, please
add a link to the design doc in the introductory email, so that
everybody can read it and be on the same page.


> I'd like to reduce the above, and a way to do this might be making Xen
>  1. use fw_cfg
>  2. fetch QEMU-built ACPI tables from fw_cfg
>  3. extract the nvdimm tables (which is trivial) and use them
> 
> Looking at xen_load_linux(), it seems possible to use fw_cfg.
> 
> So what's stopping Xen from using it elsewhere,
> instead of adding more Xen-specific code to do 'the same'
> job and not reusing/sharing common code with tcg/kvm?

So far, ACPI tables have not been generated by QEMU. Xen HVM machines
rely on a firmware-like application called "hvmloader" that runs in
guest context and generates the ACPI tables. I have no opinions on
hvmloader and I'll let the Xen maintainers talk about it. However, keep
in mind that with an HVM guest some devices are emulated by Xen and/or
by other device emulators that can run alongside QEMU. QEMU doesn't have
a full view of the system.

Here the question is: does it have to be QEMU the one to generate the
ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
like the rest, instead of introducing this split-brain design about
ACPI. We need to see a design doc to fully understand this.

If the design doc thread led into thinking that it has to be QEMU to
generate them, then would it make the code nicer if we used fw_cfg to
get the (full or partial) tables from QEMU, as Igor suggested?


> > Haozhong Zhang (10):
> >   nvdimm: do not intiailize nvdimm->label_data if label size is zero
> >   hw/xen-hvm: create the hotplug memory region on Xen
> >   hostmem-xen: add a host memory backend for Xen
> >   nvdimm acpi: do not use fw_cfg on Xen
> >   hw/xen-hvm: initialize DM ACPI
> >   hw/xen-hvm: add function to copy ACPI into guest memory
> >   nvdimm acpi: copy NFIT to Xen guest
> >   nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
> >   nvdimm acpi: do not build _FIT method on Xen
> >   hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
> > 
> >  backends/Makefile.objs |   1 +
> >  backends/hostmem-xen.c | 108 ++++++++++++++++++++++++++
> >  backends/hostmem.c     |   9 +++
> >  hw/acpi/aml-build.c    |  10 ++-
> >  hw/acpi/nvdimm.c       |  79 ++++++++++++++-----
> >  hw/i386/pc.c           | 102 ++++++++++++++-----------
> >  hw/i386/xen/xen-hvm.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  hw/mem/nvdimm.c        |  10 ++-
> >  hw/mem/pc-dimm.c       |   6 +-
> >  include/hw/i386/pc.h   |   1 +
> >  include/hw/xen/xen.h   |  25 ++++++
> >  stubs/xen-hvm.c        |  10 +++
> >  12 files changed, 495 insertions(+), 70 deletions(-)
> >  create mode 100644 backends/hostmem-xen.c
> > 
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11 16:35       ` Dan Williams
@ 2017-09-11 21:24         ` Konrad Rzeszutek Wilk
  2017-09-13 17:45           ` Dan Williams
  0 siblings, 1 reply; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-09-11 21:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On Mon, Sep 11, 2017 at 09:35:08AM -0700, Dan Williams wrote:
> On Sun, Sep 10, 2017 at 10:39 PM, Haozhong Zhang
> <haozhong.zhang@intel.com> wrote:
> > On 09/10/17 22:10 -0700, Dan Williams wrote:
> >> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
> >> <haozhong.zhang@intel.com> wrote:
> >> > The kernel NVDIMM driver and the traditional NVDIMM management
> >> > utilities in Dom0 do not work now. 'xen-ndctl' is added as an
> >> > alternative, which manages NVDIMM via Xen hypercalls.
> >> >
> >> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> > ---
> >> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >> > Cc: Wei Liu <wei.liu2@citrix.com>
> >> > ---
> >> >  .gitignore             |   1 +
> >> >  tools/misc/Makefile    |   4 ++
> >> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
> >> >  3 files changed, 177 insertions(+)
> >> >  create mode 100644 tools/misc/xen-ndctl.c
> >>
> >> What about my offer to move this functionality into the upstream ndctl
> >> utility [1]? I think it is thoroughly confusing that you are reusing
> >> the name 'ndctl' and avoiding integration with the upstream ndctl
> >> utility.
> >>
> >> [1]: https://patchwork.kernel.org/patch/9632865/
> >
> > I don't object to integrating it with ndctl.
> >
> > My only concern is that the integration will introduce two types of
> > user interface. The upstream ndctl works with the kernel driver and
> > provides easy-to-use *names* (e.g., namespace0.0, region0, nmem0,
> > etc.) for user input. However, this version of the patchset hides NFIT
> > from Dom0 (to simplify the first implementation), so the kernel driver
> > does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has
> > to use *physical addresses* for users to specify the NVDIMM regions
> > they are interested in, which is different from upstream ndctl.
> 
> Ok, I think this means that xen-ndctl should be renamed (xen-nvdimm?)
> so that the distinction between the 2 tools is clear.

I think it makes much more sense to integrate this into the upstream
version of ndctl. Surely in the future ndctl will need to work with
other OSes too, such as FreeBSD?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-11 18:52       ` Stefano Stabellini
@ 2017-09-12  3:15         ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-09-12  3:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Igor Mammedov, qemu-devel, xen-devel, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> CC'ing xen-devel, and the Xen tools and x86 maintainers.
> 
> On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > On Mon, 11 Sep 2017 12:41:47 +0800
> > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > 
> > > These are the QEMU-side patches that work with the associated Xen
> > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > guest address space for vNVDIMM devices.
> > > 
> > > All patches can be found at
> > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > 
> > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > label data, as the Xen side support for labels is not implemented yet.
> > > 
> > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > memory region for Xen guest, in order to make the existing nvdimm
> > > device plugging path work on Xen.
> > > 
> > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > used as the Xen device model.
> > 
> > I've skimmed over patch-set and can't say that I'm happy with
> > number of xen_enabled() invariants it introduced as well as
> > with partial blobs it creates.
> 
> I have not read the series (Haozhong, please CC me, Anthony and
> xen-devel to the whole series next time), but yes, indeed. Let's not add
> more xen_enabled() if possible.
> 
> Haozhong, was there a design document thread on xen-devel about this? If
> so, did it reach a conclusion? Was the design accepted? If so, please
> add a link to the design doc in the introductory email, so that
> everybody can read it and be on the same page.

Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
the guest ACPI.

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html

> 
> 
> > I'd like to reduce the above, and a way to do this might be making Xen
> >  1. use fw_cfg
> >  2. fetch QEMU-built ACPI tables from fw_cfg
> >  3. extract the nvdimm tables (which is trivial) and use them
> > 
> > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> > 
> > So what's stopping Xen from using it elsewhere,
> > instead of adding more Xen-specific code to do 'the same'
> > job and not reusing/sharing common code with tcg/kvm?
> 
> So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> rely on a firmware-like application called "hvmloader" that runs in
> guest context and generates the ACPI tables. I have no opinions on
> hvmloader and I'll let the Xen maintainers talk about it. However, keep
> in mind that with an HVM guest some devices are emulated by Xen and/or
> by other device emulators that can run alongside QEMU. QEMU doesn't have
> a full view of the system.
> 
> Here the question is: does it have to be QEMU the one to generate the
> ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> like the rest, instead of introducing this split-brain design about
> ACPI. We need to see a design doc to fully understand this.
>

hvmloader runs in the guest and is responsible for building/loading
guest ACPI. However, it is not capable of building AML at runtime (for
lack of an AML builder). If any guest ACPI object is needed (e.g. by
the guest DSDT), it has to be generated from ASL by iasl at Xen compile
time and then loaded by hvmloader at runtime.

Xen includes an OperationRegion "BIOS" in the statically generated
guest DSDT, whose address is hardcoded and which contains a list of
values filled in by hvmloader at runtime. Other ACPI objects can refer
to those values (e.g., the number of vCPUs). But this is not enough for
generating guest NVDIMM ACPI objects at compile time and then having
them customized and loaded by hvmloader, because their structure (i.e.,
the number of namespace devices) cannot be decided until the guest
config is known.
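
As a purely illustrative sketch of that mechanism (the address, field
names and layout below are invented and do not match the real Xen
structure), the "BIOS" block can be pictured as a C struct that
hvmloader fills in and that the precompiled ASL reads by fixed offset:

    /* Illustrative only -- not the real Xen/hvmloader layout. */
    #include <stdint.h>

    #define BIOS_INFO_PADDR 0xFC000000u        /* example hardcoded address */

    struct bios_info {
        uint8_t  vcpu_online[16];              /* bitmap of online vCPUs   */
        uint32_t pci_hole_start;               /* values the static ASL    */
        uint32_t pci_hole_end;                 /* reads by fixed offset    */
    };

    /* hvmloader fills the block before the guest OS parses the DSDT. */
    static void fill_bios_info(unsigned int nr_vcpus)
    {
        struct bios_info *bi = (struct bios_info *)BIOS_INFO_PADDR;

        for (unsigned int i = 0; i < nr_vcpus; i++)
            bi->vcpu_online[i / 8] |= 1u << (i % 8);
        /* The DSDT's OperationRegion "BIOS" points at BIOS_INFO_PADDR, so
         * ASL methods can read these values -- but no new AML objects
         * (e.g. per-vNVDIMM namespace devices) can be created this way. */
    }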

Alternatively, we may introduce an AML builder in hvmloader and build
all guest ACPI completely in hvmloader. Looking at the similar
implementation in QEMU, it would not be small, compared to the current
size of hvmloader. Besides, I'm still going to let QEMU handle guest
NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
build NVDIMM ACPI.

> If the design doc thread led into thinking that it has to be QEMU to
> generate them, then would it make the code nicer if we used fw_cfg to
> get the (full or partial) tables from QEMU, as Igor suggested?

I'll have a look at the code pointed out by Igor (which I hadn't noticed).

One possible issue with using fw_cfg is how to avoid conflicts between
ACPI built by QEMU and ACPI built by hvmloader (e.g., both may use the
same table signature / device names / ...). In my current design, QEMU
will pass the table signatures and device names used in its ACPI to
Xen, and Xen can check for conflicts with its own ACPI. Perhaps we can
add the necessary functions to fw_cfg as well. Anyway, let me first
look at the code.
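
A minimal sketch of the kind of check this implies on the Xen side;
every signature and device name below is invented for the example:

    #include <stdbool.h>
    #include <string.h>

    /* Table signatures and \_SB device names the Xen side already
     * provides itself (illustrative lists, not the actual ones). */
    static const char *const builtin_sigs[] = { "FACP", "APIC", "HPET" };
    static const char *const builtin_devs[] = { "PCI0", "MEM0" };

    /* Reject a QEMU-provided table/device that clashes with a built-in one. */
    bool dm_acpi_conflicts(const char sig[4], const char *dev_name)
    {
        for (size_t i = 0; i < sizeof(builtin_sigs) / sizeof(*builtin_sigs); i++)
            if (!memcmp(sig, builtin_sigs[i], 4))
                return true;
        for (size_t i = 0; i < sizeof(builtin_devs) / sizeof(*builtin_devs); i++)
            if (!strcmp(dev_name, builtin_devs[i]))
                return true;
        return false;
    }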

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
  2017-09-11 21:24         ` Konrad Rzeszutek Wilk
@ 2017-09-13 17:45           ` Dan Williams
  0 siblings, 0 replies; 128+ messages in thread
From: Dan Williams @ 2017-09-13 17:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Chao Peng, Ian Jackson, Wei Liu, xen-devel

On Mon, Sep 11, 2017 at 2:24 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Mon, Sep 11, 2017 at 09:35:08AM -0700, Dan Williams wrote:
>> On Sun, Sep 10, 2017 at 10:39 PM, Haozhong Zhang
>> <haozhong.zhang@intel.com> wrote:
>> > On 09/10/17 22:10 -0700, Dan Williams wrote:
>> >> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
>> >> <haozhong.zhang@intel.com> wrote:
>> >> > The kernel NVDIMM driver and the traditional NVDIMM management
>> >> > utilities in Dom0 do not work now. 'xen-ndctl' is added as an
>> >> > alternative, which manages NVDIMM via Xen hypercalls.
>> >> >
>> >> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> >> > ---
>> >> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> >> > Cc: Wei Liu <wei.liu2@citrix.com>
>> >> > ---
>> >> >  .gitignore             |   1 +
>> >> >  tools/misc/Makefile    |   4 ++
>> >> >  tools/misc/xen-ndctl.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >  3 files changed, 177 insertions(+)
>> >> >  create mode 100644 tools/misc/xen-ndctl.c
>> >>
>> >> What about my offer to move this functionality into the upstream ndctl
>> >> utility [1]? I think it is thoroughly confusing that you are reusing
>> >> the name 'ndctl' and avoiding integration with the upstream ndctl
>> >> utility.
>> >>
>> >> [1]: https://patchwork.kernel.org/patch/9632865/
>> >
>> > I don't object to integrating it with ndctl.
>> >
>> > My only concern is that the integration will introduce two types of
>> > user interface. The upstream ndctl works with the kernel driver and
>> > provides easy-to-use *names* (e.g., namespace0.0, region0, nmem0,
>> > etc.) for user input. However, this version of the patchset hides NFIT
>> > from Dom0 (to simplify the first implementation), so the kernel driver
>> > does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has
>> > to use *physical addresses* for users to specify the NVDIMM regions
>> > they are interested in, which is different from upstream ndctl.
>>
>> Ok, I think this means that xen-ndctl should be renamed (xen-nvdimm?)
>> so that the distinction between the 2 tools is clear.
>
> I think it makes much more sense to integrate this into the upstream
> version of ndctl. Surely in the future ndctl will need to work with
> other OSes too, such as FreeBSD?

I'm receptive to carrying Xen-specific enabling and / or a FreeBSD
compat layer in ndctl.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-09-12  3:15         ` Haozhong Zhang
@ 2017-10-10 16:05           ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-10 16:05 UTC (permalink / raw)
  To: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > 
> > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > 
> > > > These are the QEMU-side patches that work with the associated Xen
> > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > guest address space for vNVDIMM devices.
> > > > 
> > > > All patches can be found at
> > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > 
> > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > label data, as the Xen side support for labels is not implemented yet.
> > > > 
> > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > device plugging path work on Xen.
> > > > 
> > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > > used as the Xen device model.
> > > 
> > > I've skimmed over patch-set and can't say that I'm happy with
> > > number of xen_enabled() invariants it introduced as well as
> > > with partial blobs it creates.
> > 
> > I have not read the series (Haozhong, please CC me, Anthony and
> > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > more xen_enabled() if possible.
> > 
> > Haozhong, was there a design document thread on xen-devel about this? If
> > so, did it reach a conclusion? Was the design accepted? If so, please
> > add a link to the design doc in the introductory email, so that
> > everybody can read it and be on the same page.
> 
> Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> the guest ACPI.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html

Igor, did you have a chance to read it?

.. see below
> 
> > 
> > 
> > > I'd like to reduce the above, and a way to do this might be making Xen
> > >  1. use fw_cfg
> > >  2. fetch QEMU-built ACPI tables from fw_cfg
> > >  3. extract the nvdimm tables (which is trivial) and use them
> > > 
> > > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > 
> > > So what's stopping Xen from using it elsewhere,
> > > instead of adding more Xen-specific code to do 'the same'
> > > job and not reusing/sharing common code with tcg/kvm?
> > 
> > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > rely on a firmware-like application called "hvmloader" that runs in
> > guest context and generates the ACPI tables. I have no opinions on
> > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > in mind that with an HVM guest some devices are emulated by Xen and/or
> > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > a full view of the system.
> > 
> > Here the question is: does it have to be QEMU the one to generate the
> > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > like the rest, instead of introducing this split-brain design about
> > ACPI. We need to see a design doc to fully understand this.
> >
> 
> hvmloader runs in the guest and is responsible for building/loading
> guest ACPI. However, it is not capable of building AML at runtime (for
> lack of an AML builder). If any guest ACPI object is needed (e.g. by
> the guest DSDT), it has to be generated from ASL by iasl at Xen compile
> time and then loaded by hvmloader at runtime.
> 
> Xen includes an OperationRegion "BIOS" in the statically generated
> guest DSDT, whose address is hardcoded and which contains a list of
> values filled in by hvmloader at runtime. Other ACPI objects can refer
> to those values (e.g., the number of vCPUs). But this is not enough for
> generating guest NVDIMM ACPI objects at compile time and then having
> them customized and loaded by hvmloader, because their structure (i.e.,
> the number of namespace devices) cannot be decided until the guest
> config is known.
> 
> Alternatively, we may introduce an AML builder in hvmloader and build
> all guest ACPI completely in hvmloader. Looking at the similar
> implementation in QEMU, it would not be small, compared to the current
> size of hvmloader. Besides, I'm still going to let QEMU handle guest
> NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> build NVDIMM ACPI.
> 
> > If the design doc thread led into thinking that it has to be QEMU to
> > generate them, then would it make the code nicer if we used fw_cfg to
> > get the (full or partial) tables from QEMU, as Igor suggested?
> 
> I'll have a look at the code pointed out by Igor (which I hadn't noticed).

And there is a spec too!

https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt

Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
ACPI AML code?
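
Purely as a sketch of what that would involve on the hvmloader side --
the outw/inb helpers, the integration point, and everything else here
are assumptions, with error handling omitted -- a minimal fw_cfg client
over the x86 I/O-port interface from that spec could look like:

    #include <stdint.h>
    #include <string.h>

    #define FW_CFG_PORT_SEL   0x510     /* 16-bit selector register       */
    #define FW_CFG_PORT_DATA  0x511     /* 8-bit data register            */
    #define FW_CFG_FILE_DIR   0x0019    /* selector of the file directory */

    struct fw_cfg_file {                /* directory entry, big-endian fields */
        uint32_t size;
        uint16_t select;
        uint16_t reserved;
        char     name[56];
    };

    /* Assumed to exist in hvmloader's I/O helpers. */
    extern void outw(uint16_t addr, uint16_t val);
    extern uint8_t inb(uint16_t addr);

    static void fw_cfg_read(void *buf, uint32_t len)
    {
        uint8_t *p = buf;

        while (len--)
            *p++ = inb(FW_CFG_PORT_DATA);
    }

    /* Look up e.g. "etc/acpi/tables" or "etc/table-loader"; return the
     * selector key to read the blob with, or 0 if it is not exported. */
    static uint16_t fw_cfg_find_file(const char *name, uint32_t *size)
    {
        uint32_t count;
        struct fw_cfg_file f;

        outw(FW_CFG_PORT_SEL, FW_CFG_FILE_DIR);
        fw_cfg_read(&count, sizeof(count));
        count = __builtin_bswap32(count);

        for (uint32_t i = 0; i < count; i++) {
            fw_cfg_read(&f, sizeof(f));
            if (!strcmp(f.name, name)) {
                *size = __builtin_bswap32(f.size);
                return __builtin_bswap16(f.select);
            }
        }
        return 0;
    }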

> 
> One possible issue with using fw_cfg is how to avoid conflicts between
> ACPI built by QEMU and ACPI built by hvmloader (e.g., both may use the
> same table signature / device names / ...). In my current design, QEMU
> will pass the table signatures and device names used in its ACPI to
> Xen, and Xen can check for conflicts with its own ACPI. Perhaps we can
> add the necessary functions to fw_cfg as well. Anyway, let me first
> look at the code.
> 
> Thanks,
> Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-10 16:05           ` Konrad Rzeszutek Wilk
@ 2017-10-12 12:45             ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-12 12:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > 
> > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > > 
> > > > > These are the QEMU-side patches that work with the associated Xen
> > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > guest address space for vNVDIMM devices.
> > > > > 
> > > > > All patches can be found at
> > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > 
> > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > 
> > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > device plugging path work on Xen.
> > > > > 
> > > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > > > used as the Xen device model.
> > > > 
> > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > number of xen_enabled() invariants it introduced as well as
> > > > with partial blobs it creates.
> > > 
> > > I have not read the series (Haozhong, please CC me, Anthony and
> > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > more xen_enabled() if possible.
> > > 
> > > Haozhong, was there a design document thread on xen-devel about this? If
> > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > add a link to the design doc in the introductory email, so that
> > > everybody can read it and be on the same page.
> > 
> > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > the guest ACPI.
> > 
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> 
> Igor, did you have a chance to read it?
> 
> .. see below
> > 
> > > 
> > > 
> > > > I'd like to reduce the above, and a way to do this might be making Xen
> > > >  1. use fw_cfg
> > > >  2. fetch QEMU-built ACPI tables from fw_cfg
> > > >  3. extract the nvdimm tables (which is trivial) and use them
> > > > 
> > > > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > 
> > > > So what's stopping Xen from using it elsewhere,
> > > > instead of adding more Xen-specific code to do 'the same'
> > > > job and not reusing/sharing common code with tcg/kvm?
> > > 
> > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > rely on a firmware-like application called "hvmloader" that runs in
> > > guest context and generates the ACPI tables. I have no opinions on
> > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > a full view of the system.
> > > 
> > > Here the question is: does it have to be QEMU the one to generate the
> > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > like the rest, instead of introducing this split-brain design about
> > > ACPI. We need to see a design doc to fully understand this.
> > >
> > 
> > hvmloader runs in the guest and is responsible for building/loading
> > guest ACPI. However, it is not capable of building AML at runtime (for
> > lack of an AML builder). If any guest ACPI object is needed (e.g. by
> > the guest DSDT), it has to be generated from ASL by iasl at Xen compile
> > time and then loaded by hvmloader at runtime.
> > 
> > Xen includes an OperationRegion "BIOS" in the statically generated
> > guest DSDT, whose address is hardcoded and which contains a list of
> > values filled in by hvmloader at runtime. Other ACPI objects can refer
> > to those values (e.g., the number of vCPUs). But this is not enough for
> > generating guest NVDIMM ACPI objects at compile time and then having
> > them customized and loaded by hvmloader, because their structure (i.e.,
> > the number of namespace devices) cannot be decided until the guest
> > config is known.
> > 
> > Alternatively, we may introduce an AML builder in hvmloader and build
> > all guest ACPI completely in hvmloader. Looking at the similar
> > implementation in QEMU, it would not be small, compared to the current
> > size of hvmloader. Besides, I'm still going to let QEMU handle guest
> > NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> > build NVDIMM ACPI.
> > 
> > > If the design doc thread led into thinking that it has to be QEMU to
> > > generate them, then would it make the code nicer if we used fw_cfg to
> > > get the (full or partial) tables from QEMU, as Igor suggested?
> > 
> > I'll have a look at the code pointed out by Igor (which I hadn't noticed).
> 
> And there is a spec too!
> 
> https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt
> 
> Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
> ACPI AML code?
> 

Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
/rom@etc/table-loader. The former is unstructured to guest, and
contains all data of guest ACPI. The latter is a BIOSLinkerLoader
organized as a set of commands, which direct the guest (e.g., SeaBIOS
on KVM/QEMU) to relocate data in the former file, recalculate checksum
of specified area, and fill guest address in specified ACPI field.

One part of my patches is to implement a mechanism to tell Xen which
part of ACPI data is a table (NFIT), and which part defines a
namespace device and what the device name is. I can add two new loader
commands for them respectively.

Because they just provide information and SeaBIOS in non-xen
environment ignores unrecognized commands, they will not break SeaBIOS
in non-xen environment.

On QEMU side, most Xen-specific hacks in ACPI builder could be
dropped, and replaced by adding the new loader commands (though they
may be used only by Xen).

On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
are needed in, perhaps, hvmloader.
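
To make that concrete, one hypothetical shape for such an informational
entry, mimicking the fixed-size records of the existing table-loader
format; the command numbers, field layout and names are invented for
the sketch, not a proposed interface:

    #include <stdint.h>

    #define LOADER_ENTRY_SIZE  128      /* assumed record size of the format  */
    #define LOADER_FILE_SZ     56       /* fw_cfg file name length            */

    /* Hypothetical new commands; unrecognized commands are ignored by
     * SeaBIOS, which is what makes purely informational ones safe. */
    #define LOADER_CMD_DM_ACPI_TABLE  0x80000001u  /* "this range is a table, e.g. NFIT"  */
    #define LOADER_CMD_DM_ACPI_NSDEV  0x80000002u  /* "this range is namespace device X"  */

    struct loader_entry_dm_acpi {
        uint32_t command;                   /* one of the codes above            */
        char     file[LOADER_FILE_SZ];      /* fw_cfg blob holding the data      */
        uint32_t offset;                    /* where the data starts in the blob */
        uint32_t length;                    /* how long it is                    */
        char     dev_name[8];               /* for NSDEV: ACPI name, e.g. "NVDR" */
        uint8_t  pad[LOADER_ENTRY_SIZE - 4 - LOADER_FILE_SZ - 4 - 4 - 8];
    };

    _Static_assert(sizeof(struct loader_entry_dm_acpi) == LOADER_ENTRY_SIZE,
                   "entry must match the loader record size");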


Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 12:45             ` Haozhong Zhang
@ 2017-10-12 15:45               ` Paolo Bonzini
  -1 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-12 15:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Stefano Stabellini, Igor Mammedov,
	qemu-devel, xen-devel, Dan Williams, Chao Peng, Eduardo Habkost,
	Michael S. Tsirkin, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On 12/10/2017 14:45, Haozhong Zhang wrote:
> Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> /rom@etc/table-loader. The former is unstructured to guest, and
> contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> organized as a set of commands, which direct the guest (e.g., SeaBIOS
> on KVM/QEMU) to relocate data in the former file, recalculate checksum
> of specified area, and fill guest address in specified ACPI field.
> 
> One part of my patches is to implement a mechanism to tell Xen which
> part of ACPI data is a table (NFIT), and which part defines a
> namespace device and what the device name is. I can add two new loader
> commands for them respectively.
> 
> Because they just provide information and SeaBIOS in non-xen
> environment ignores unrecognized commands, they will not break SeaBIOS
> in non-xen environment.
> 
> On QEMU side, most Xen-specific hacks in ACPI builder could be
> dropped, and replaced by adding the new loader commands (though they
> may be used only by Xen).
> 
> On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> are needed in, perhaps, hvmloader.

If Xen has to parse BIOSLinkerLoader, it can use the existing commands
to process a reduced set of ACPI tables.  In other words,
etc/acpi/tables would only include the NFIT, the SSDT with namespace
devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.

hvmloader can then:

1) allocate some memory for where the XSDT will go

2) process the BIOSLinkerLoader like SeaBIOS would do

3) find the RSDP in low memory, since the loader script must have placed
it there.  If it cannot find it, allocate some low memory, fill it with
the RSDP header and revision, and jump to step 6

4) If it found QEMU's RSDP, use it to find QEMU's XSDT

5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.

6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.

7) overwrite the RSDP in low memory with a pointer to hvmloader's own
RSDT and/or XSDT, and update the checksums

QEMU's XSDT remains there somewhere in memory, unused but harmless.
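
A compressed sketch of that flow, with every type and helper below
being a placeholder invented for the illustration rather than an
existing hvmloader function:

    /* Placeholder types and helpers; none of these exist today. */
    struct acpi_rsdp;
    struct acpi_xsdt;
    extern void *fw_cfg_map(const char *name);
    extern void run_bios_linker_loader(void *script, void *tables);
    extern struct acpi_rsdp *find_rsdp_in_low_memory(void);
    extern struct acpi_xsdt *map_xsdt(struct acpi_rsdp *rsdp);
    extern struct acpi_xsdt *build_hvmloader_tables(void);
    extern void copy_table_pointers(struct acpi_xsdt *from, struct acpi_xsdt *to);
    extern void publish_rsdp(struct acpi_xsdt *xsdt);

    void integrate_qemu_acpi(void)
    {
        /* Steps 1-2: run QEMU's loader script against the fw_cfg blobs. */
        void *tables = fw_cfg_map("etc/acpi/tables");
        run_bios_linker_loader(fw_cfg_map("etc/table-loader"), tables);

        /* Steps 3-4: find QEMU's RSDP in low memory (the loader script
         * put it there) and, through it, QEMU's XSDT. */
        struct acpi_rsdp *qemu_rsdp = find_rsdp_in_low_memory();
        struct acpi_xsdt *qemu_xsdt = qemu_rsdp ? map_xsdt(qemu_rsdp) : 0;

        /* Steps 5-6: build hvmloader's usual tables, then pull QEMU's
         * table pointers (NFIT, the NVDIMM SSDT, ...) into them. */
        struct acpi_xsdt *xsdt = build_hvmloader_tables();
        if (qemu_xsdt)
            copy_table_pointers(qemu_xsdt, xsdt);

        /* Step 7: point the low-memory RSDP at hvmloader's RSDT/XSDT and
         * recompute checksums; QEMU's XSDT stays in memory, unused. */
        publish_rsdp(xsdt);
    }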

Paolo

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 12:45             ` Haozhong Zhang
@ 2017-10-12 17:39               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-12 17:39 UTC (permalink / raw)
  To: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > 
> > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > > > 
> > > > > > This is the QEMU part patches that works with the associated Xen
> > > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > > guest address space for vNVDIMM devices.
> > > > > > 
> > > > > > All patches can be found at
> > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > 
> > > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > > 
> > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > > device plugging path work on Xen.
> > > > > > 
> > > > > > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> > > > > > used as the Xen device model.
> > > > > 
> > > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > > number of xen_enabled() invariants it introduced as well as
> > > > > with partial blobs it creates.
> > > > 
> > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > > more xen_enabled() if possible.
> > > > 
> > > > Haozhong, was there a design document thread on xen-devel about this? If
> > > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > > add a link to the design doc in the introductory email, so that
> > > > everybody can read it and be on the same page.
> > > 
> > > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > > the guest ACPI.
> > > 
> > > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > 
> > Igor, did you have a chance to read it?
> > 
> > .. see below
> > > 
> > > > 
> > > > 
> > > > > I'd like to reduce above and a way to do this might be making xen 
> > > > >  1. use fw_cfg
> > > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > > >  3. extract nvdim tables (which is trivial) and use them
> > > > > 
> > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > 
> > > > > So what's stopping xen from using it elsewhere?,
> > > > > instead of adding more xen specific code to do 'the same'
> > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > 
> > > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > > rely on a firmware-like application called "hvmloader" that runs in
> > > > guest context and generates the ACPI tables. I have no opinions on
> > > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > > a full few of the system.
> > > > 
> > > > Here the question is: does it have to be QEMU the one to generate the
> > > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > > like the rest, instead of introducing this split-brain design about
> > > > ACPI. We need to see a design doc to fully understand this.
> > > >
> > > 
> > > hvmloader runs in the guest and is responsible to build/load guest
> > > ACPI. However, it's not capable to build AML at runtime (for the lack
> > > of AML builder). If any guest ACPI object is needed (e.g. by guest
> > > DSDT), it has to be generated from ASL by iasl at Xen compile time and
> > > then be loaded by hvmloader at runtime.
> > > 
> > > Xen includes an OperationRegion "BIOS" in the static generated guest
> > > DSDT, whose address is hardcoded and which contains a list of values
> > > filled by hvmloader at runtime. Other ACPI objects can refer to those
> > > values (e.g., the number of vCPUs). But it's not enough for generating
> > > guest NVDIMM ACPI objects at compile time and then being customized
> > > and loaded by hvmload, because its structure (i.e., the number of
> > > namespace devices) cannot be decided util the guest config is known.
> > > 
> > > Alternatively, we may introduce an AML builder in hvmloader and build
> > > all guest ACPI completely in hvmloader. Looking at the similar
> > > implementation in QEMU, it would not be small, compared to the current
> > > size of hvmloader. Besides, I'm still going to let QEMU handle guest
> > > NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> > > build NVDIMM ACPI.
> > > 
> > > > If the design doc thread led into thinking that it has to be QEMU to
> > > > generate them, then would it make the code nicer if we used fw_cfg to
> > > > get the (full or partial) tables from QEMU, as Igor suggested?
> > > 
> > > I'll have a look at the code (which I didn't notice) pointed by Igor.
> > 
> > And there is a spec too!
> > 
> > https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt
> > 
> > Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
> > ACPI AML code?
> > 
> 
> Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> /rom@etc/table-loader. The former is unstructured to guest, and
> contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> organized as a set of commands, which direct the guest (e.g., SeaBIOS
> on KVM/QEMU) to relocate data in the former file, recalculate checksum
> of specified area, and fill guest address in specified ACPI field.
> 
> One part of my patches is to implement a mechanism to tell Xen which
> part of ACPI data is a table (NFIT), and which part defines a
> namespace device and what the device name is. I can add two new loader
> commands for them respectively.

<nods>
> 
> Because they just provide information and SeaBIOS in non-xen
> environment ignores unrecognized commands, they will not break SeaBIOS
> in non-xen environment.
> 
> On QEMU side, most Xen-specific hacks in ACPI builder could be

Wooot!
> dropped, and replaced by adding the new loader commands (though they
> may be used only by Xen).

And eventually all of the hvmloader built-in ACPI code could be dropped
and replaced by use of the loader commands?

> 
> On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> are needed in, perhaps, hvmloader.

<nods>
> 
> 
> Haozhong
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 15:45               ` Paolo Bonzini
@ 2017-10-13  7:53                 ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13  7:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini, Igor Mammedov,
	qemu-devel, xen-devel, Dan Williams, Chao Peng, Eduardo Habkost,
	Michael S. Tsirkin, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> On 12/10/2017 14:45, Haozhong Zhang wrote:
> > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > /rom@etc/table-loader. The former is unstructured to guest, and
> > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > of specified area, and fill guest address in specified ACPI field.
> > 
> > One part of my patches is to implement a mechanism to tell Xen which
> > part of ACPI data is a table (NFIT), and which part defines a
> > namespace device and what the device name is. I can add two new loader
> > commands for them respectively.
> > 
> > Because they just provide information and SeaBIOS in non-xen
> > environment ignores unrecognized commands, they will not break SeaBIOS
> > in non-xen environment.
> > 
> > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > dropped, and replaced by adding the new loader commands (though they
> > may be used only by Xen).
> > 
> > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > are needed in, perhaps, hvmloader.
> 
> If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> to process a reduced set of ACPI tables.  In other words,
> etc/acpi/tables would only include the NFIT, the SSDT with namespace
> devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
>
> hvmloader can then:
> 
> 1) allocate some memory for where the XSDT will go
> 
> 2) process the BIOSLinkerLoader like SeaBIOS would do
> 
> 3) find the RSDP in low memory, since the loader script must have placed
> it there.  If it cannot find it, allocate some low memory, fill it with
> the RSDP header and revision, and and jump to step 6
> 
> 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> 
> 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> 
> 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> 
> 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> RSDT and/or XSDT, and updated the checksums
> 
> QEMU's XSDT remains there somewhere in memory, unused but harmless.
> 

It can work for plain tables which do not contain AML.

However, for a namespace device, Xen needs to know its name in order
to detect a potential name conflict with the names used in the
Xen-built ACPI. Xen does not (and is not going to) introduce an AML
parser, so it cannot extract those device names from the QEMU-built
ACPI on its own.

The idea of either this patch series or the new BIOSLinkerLoader
command is to let QEMU tell Xen where the definition body of a
namespace device (i.e. the part within the outermost "Device(NAME)")
is and what the device name is. Xen, after the name conflict check,
can re-package the definition body in a namespace device (with
minimal AML builder code added in Xen) and then in an SSDT.


Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-12 17:39               ` Konrad Rzeszutek Wilk
@ 2017-10-13  8:00                 ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13  8:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Igor Mammedov, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Paolo Bonzini, Richard Henderson, Anthony Perard,
	xen-devel, ian.jackson, wei.liu2, george.dunlap, JBeulich,
	andrew.cooper3

On 10/12/17 13:39 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> > On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > > 
> > > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > > > > 
> > > > > > > This is the QEMU part patches that works with the associated Xen
> > > > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > > > guest address space for vNVDIMM devices.
> > > > > > > 
> > > > > > > All patches can be found at
> > > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > > 
> > > > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > > > 
> > > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > > > device plugging path work on Xen.
> > > > > > > 
> > > > > > > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU is
> > > > > > > used as the Xen device model.
> > > > > > 
> > > > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > > > number of xen_enabled() invariants it introduced as well as
> > > > > > with partial blobs it creates.
> > > > > 
> > > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > > > more xen_enabled() if possible.
> > > > > 
> > > > > Haozhong, was there a design document thread on xen-devel about this? If
> > > > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > > > add a link to the design doc in the introductory email, so that
> > > > > everybody can read it and be on the same page.
> > > > 
> > > > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > > > the guest ACPI.
> > > > 
> > > > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > > 
> > > Igor, did you have a chance to read it?
> > > 
> > > .. see below
> > > > 
> > > > > 
> > > > > 
> > > > > > I'd like to reduce above and a way to do this might be making xen 
> > > > > >  1. use fw_cfg
> > > > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > > > >  3. extract nvdim tables (which is trivial) and use them
> > > > > > 
> > > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > > 
> > > > > > So what's stopping xen from using it elsewhere?,
> > > > > > instead of adding more xen specific code to do 'the same'
> > > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > > 
> > > > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > > > rely on a firmware-like application called "hvmloader" that runs in
> > > > > guest context and generates the ACPI tables. I have no opinions on
> > > > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > > > a full few of the system.
> > > > > 
> > > > > Here the question is: does it have to be QEMU the one to generate the
> > > > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > > > like the rest, instead of introducing this split-brain design about
> > > > > ACPI. We need to see a design doc to fully understand this.
> > > > >
> > > > 
> > > > hvmloader runs in the guest and is responsible to build/load guest
> > > > ACPI. However, it's not capable to build AML at runtime (for the lack
> > > > of AML builder). If any guest ACPI object is needed (e.g. by guest
> > > > DSDT), it has to be generated from ASL by iasl at Xen compile time and
> > > > then be loaded by hvmloader at runtime.
> > > > 
> > > > Xen includes an OperationRegion "BIOS" in the static generated guest
> > > > DSDT, whose address is hardcoded and which contains a list of values
> > > > filled by hvmloader at runtime. Other ACPI objects can refer to those
> > > > values (e.g., the number of vCPUs). But it's not enough for generating
> > > > guest NVDIMM ACPI objects at compile time and then being customized
> > > > and loaded by hvmload, because its structure (i.e., the number of
> > > > namespace devices) cannot be decided util the guest config is known.
> > > > 
> > > > Alternatively, we may introduce an AML builder in hvmloader and build
> > > > all guest ACPI completely in hvmloader. Looking at the similar
> > > > implementation in QEMU, it would not be small, compared to the current
> > > > size of hvmloader. Besides, I'm still going to let QEMU handle guest
> > > > NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
> > > > build NVDIMM ACPI.
> > > > 
> > > > > If the design doc thread led into thinking that it has to be QEMU to
> > > > > generate them, then would it make the code nicer if we used fw_cfg to
> > > > > get the (full or partial) tables from QEMU, as Igor suggested?
> > > > 
> > > > I'll have a look at the code (which I didn't notice) pointed by Igor.
> > > 
> > > And there is a spec too!
> > > 
> > > https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt
> > > 
> > > Igor, did you have in mind to use FW_CFG_FILE_DIR to retrieve the
> > > ACPI AML code?
> > > 
> > 
> > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > /rom@etc/table-loader. The former is unstructured to guest, and
> > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > of specified area, and fill guest address in specified ACPI field.
> > 
> > One part of my patches is to implement a mechanism to tell Xen which
> > part of ACPI data is a table (NFIT), and which part defines a
> > namespace device and what the device name is. I can add two new loader
> > commands for them respectively.
> 
> <nods>
> > 
> > Because they just provide information and SeaBIOS in non-xen
> > environment ignores unrecognized commands, they will not break SeaBIOS
> > in non-xen environment.
> > 
> > On QEMU side, most Xen-specific hacks in ACPI builder could be
> 
> Wooot!
> > dropped, and replaced by adding the new loader commands (though they
> > may be used only by Xen).
> 
> And eventually all of the hvmloader ACPI built in code could be dropped
> and use all of the loader commands?

If Xen is going to rely on QEMU to build the entire ACPI for an HVM
guest, then there would be no need to check for signature/name
conflicts, so the new BIOSLinkerLoader commands here would not be
necessary in that case (or only for backwards compatibility). I don't
know how much work would be needed; that could be another project.

Haozhong

> 
> > 
> > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > are needed in, perhaps, hvmloader.
> 
> <nods>
> > 
> > 
> > Haozhong
> > 
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13  7:53                 ` Haozhong Zhang
@ 2017-10-13  8:44                   ` Igor Mammedov
  -1 siblings, 0 replies; 128+ messages in thread
From: Igor Mammedov @ 2017-10-13  8:44 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Paolo Bonzini, Konrad Rzeszutek Wilk, Stefano Stabellini,
	qemu-devel, xen-devel, Dan Williams, Chao Peng, Eduardo Habkost,
	Michael S. Tsirkin, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On Fri, 13 Oct 2017 15:53:26 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > of specified area, and fill guest address in specified ACPI field.
> > > 
> > > One part of my patches is to implement a mechanism to tell Xen which
> > > part of ACPI data is a table (NFIT), and which part defines a
> > > namespace device and what the device name is. I can add two new loader
> > > commands for them respectively.
> > > 
> > > Because they just provide information and SeaBIOS in non-xen
> > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > in non-xen environment.
> > > 
> > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > dropped, and replaced by adding the new loader commands (though they
> > > may be used only by Xen).
> > > 
> > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > are needed in, perhaps, hvmloader.  
> > 
> > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > to process a reduced set of ACPI tables.  In other words,
> > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> >
> > hvmloader can then:
> > 
> > 1) allocate some memory for where the XSDT will go
> > 
> > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > 
> > 3) find the RSDP in low memory, since the loader script must have placed
> > it there.  If it cannot find it, allocate some low memory, fill it with
> > the RSDP header and revision, and and jump to step 6
> > 
> > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > 
> > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > 
> > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > 
> > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > RSDT and/or XSDT, and updated the checksums
> > 
> > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> >   
+1 to Paolo's suggestion, i.e.
 1. add BIOSLinkerLoader into hvmloader
 2. load/process QEMU's tables with #1
 3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
    and put them in hvmloader's RSDT
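
For step 3, that can be as small as a signature walk over QEMU's XSDT,
roughly along these lines ('add_table' stands in for whatever hvmloader
already uses to link a table into its RSDT/XSDT; names and details are
illustrative only):

    #include <stdint.h>
    #include <string.h>

    /* Standard 36-byte ACPI SDT header. */
    struct acpi_header {
        char     signature[4];
        uint32_t length;
        uint8_t  revision;
        uint8_t  checksum;
        char     oem_id[6];
        char     oem_table_id[8];
        uint32_t oem_revision;
        uint32_t creator_id;
        uint32_t creator_revision;
    } __attribute__((packed));

    /* Adopt QEMU's NFIT and NVDIMM SSDT entries into hvmloader's tables. */
    static void adopt_qemu_tables(const struct acpi_header *qemu_xsdt,
                                  void (*add_table)(uint64_t addr))
    {
        /* XSDT payload: 64-bit physical pointers after the header. */
        const uint64_t *entry = (const uint64_t *)(qemu_xsdt + 1);
        unsigned int i, n;

        n = (qemu_xsdt->length - sizeof(*qemu_xsdt)) / sizeof(uint64_t);
        for (i = 0; i < n; i++) {
            /* Guest tables live below 4GB, so the cast is safe here. */
            const struct acpi_header *t =
                (const struct acpi_header *)(uintptr_t)entry[i];

            if (!memcmp(t->signature, "NFIT", 4) ||
                !memcmp(t->signature, "SSDT", 4))
                add_table(entry[i]);
        }
    }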

> It can work for plan tables which do not contain AML.
> 
> However, for a namespace device, Xen needs to know its name in order
> to detect the potential name conflict with those used in Xen built
> ACPI. Xen does not (and is not going to) introduce an AML parser, so
> it cannot get those device names from QEMU built ACPI by its own.
> 
> The idea of either this patch series or the new BIOSLinkerLoader
> command is to let QEMU tell Xen where the definition body of a
> namespace device (i.e. that part within the outmost "Device(NAME)") is
> and what the device name is. Xen, after the name conflict check, can
> re-package the definition body in a namespace device (w/ minimal AML
> builder code added in Xen) and then in SSDT.

I'd skip the conflict check at runtime, as hvmloader doesn't currently
have a "\\_SB\NVDR" device. Instead of a runtime check, it might do a
primitive build-time check that the ASL sources in hvmloader do not
contain the "NVDR" name reserved for QEMU, to avoid its accidental
addition in the future (the check might also be reused later if some
other tables from QEMU are reused).
It's a bit hackish, but at least it does the job and keeps the
BIOSLinkerLoader interface the same for all supported firmwares
(I'd consider it a temporary hack on the way to ACPI tables fully
built by QEMU for Xen).
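
A trivial sketch of such a build-time check, written as a small
host-side C tool purely for illustration (a one-line grep over the
hvmloader ASL sources in the Makefile would do the same job):

    #include <stdio.h>
    #include <string.h>

    /*
     * Fail the build if any of the given ASL source files mentions
     * "NVDR", which would be reserved for the QEMU-provided NVDIMM
     * namespace device.
     */
    int main(int argc, char **argv)
    {
        char line[1024];
        int i, rc = 0;

        for (i = 1; i < argc; i++) {
            FILE *f = fopen(argv[i], "r");
            int lineno = 0;

            if (!f) {
                perror(argv[i]);
                return 2;
            }
            while (fgets(line, sizeof(line), f)) {
                lineno++;
                if (strstr(line, "NVDR")) {
                    fprintf(stderr, "%s:%d: \"NVDR\" is reserved for QEMU\n",
                            argv[i], lineno);
                    rc = 1;
                }
            }
            fclose(f);
        }
        return rc;
    }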

Ideally it would be better for QEMU to build all ACPI tables for
hvmloader, to avoid split-brain issues and the need to invent extra
interfaces every time a feature is added that has to pass
configuration data from QEMU to firmware.
But that's probably out of scope for this project; it could be
done on top of it if the Xen folks would like to. Adding
BIOSLinkerLoader support to hvmloader would be a good starting point
for that future effort.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13  8:44                   ` Igor Mammedov
@ 2017-10-13 11:13                     ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13 11:13 UTC (permalink / raw)
  To: Igor Mammedov, JBeulich, andrew.cooper3, Stefano Stabellini,
	Anthony Perard
  Cc: Paolo Bonzini, Konrad Rzeszutek Wilk, qemu-devel, xen-devel,
	Dan Williams, Chao Peng, Eduardo Habkost, Michael S. Tsirkin,
	Xiao Guangrong, Richard Henderson, xen-devel, ian.jackson,
	wei.liu2, george.dunlap

On 10/13/17 10:44 +0200, Igor Mammedov wrote:
> On Fri, 13 Oct 2017 15:53:26 +0800
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> > On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > > of specified area, and fill guest address in specified ACPI field.
> > > > 
> > > > One part of my patches is to implement a mechanism to tell Xen which
> > > > part of ACPI data is a table (NFIT), and which part defines a
> > > > namespace device and what the device name is. I can add two new loader
> > > > commands for them respectively.
> > > > 
> > > > Because they just provide information and SeaBIOS in non-xen
> > > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > > in non-xen environment.
> > > > 
> > > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > > dropped, and replaced by adding the new loader commands (though they
> > > > may be used only by Xen).
> > > > 
> > > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > > are needed in, perhaps, hvmloader.  
> > > 
> > > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > > to process a reduced set of ACPI tables.  In other words,
> > > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> > >
> > > hvmloader can then:
> > > 
> > > 1) allocate some memory for where the XSDT will go
> > > 
> > > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > > 
> > > 3) find the RSDP in low memory, since the loader script must have placed
> > > it there.  If it cannot find it, allocate some low memory, fill it with
> > > the RSDP header and revision, and and jump to step 6
> > > 
> > > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > > 
> > > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > > 
> > > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > > 
> > > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > > RSDT and/or XSDT, and updated the checksums
> > > 
> > > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > >   
> +1 to Paolo's suggestion, i.e.
>  1. add BIOSLinkerLoader into hvmloader
>  2. load/process QEMU's tables with #1
>  3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
>     and put them in hvmloader's RSDT
> 
> > It can work for plan tables which do not contain AML.
> > 
> > However, for a namespace device, Xen needs to know its name in order
> > to detect the potential name conflict with those used in Xen built
> > ACPI. Xen does not (and is not going to) introduce an AML parser, so
> > it cannot get those device names from QEMU built ACPI by its own.
> > 
> > The idea of either this patch series or the new BIOSLinkerLoader
> > command is to let QEMU tell Xen where the definition body of a
> > namespace device (i.e. that part within the outmost "Device(NAME)") is
> > and what the device name is. Xen, after the name conflict check, can
> > re-package the definition body in a namespace device (w/ minimal AML
> > builder code added in Xen) and then in SSDT.
> 
> I'd skip conflict check at runtime as hvmloader doesn't currently
> have "\\_SB\NVDR" device so instead of doing runtime check it might
> do primitive check at build time that ASL sources in hvmloader do
> not contain reserved for QEMU "NVDR" keyword to avoid its addition
> by accident in future. (it also might be reused in future if some
> other tables from QEMU will be reused).
> It's a bit hackinsh but at least it does the job and keeps
> BIOSLinkerLoader interface the same for all supported firmwares
> (I'd consider it as a temporary hack on the way to fully build
> by QEMU ACPI tables for Xen).
> 
> Ideally it would be better for QEMU to build all ACPI tables for
> hvmloader to avoid split brain issues and need to invent extra
> interfaces every time a feature is added to pass configuration
> data from QEMU to firmware.
> But that's probably out of scope of this project, it could be
> done on top of this if Xen folks would like to do it. Adding
> BIOSLinkerLoader to hvmloader would be a good starting point
> for that future effort.

If we can let QEMU build the entire guest ACPI, we may not even need
to introduce fw_cfg and BIOSLinkerLoader code into hvmloader.  SeaBIOS
is currently loaded after hvmloader and could be used to load the
QEMU-built ACPI.

To Jan, Andrew, Stefano and Anthony,

what do you think about allowing QEMU to build the entire guest ACPI
and letting SeaBIOS load it? The ACPI builder code in hvmloader would
still be there, just bypassed in this case.


Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-13 11:13                     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-13 11:13 UTC (permalink / raw)
  To: Igor Mammedov, JBeulich, andrew.cooper3, Stefano Stabellini,
	Anthony Perard
  Cc: wei.liu2, Eduardo Habkost, Konrad Rzeszutek Wilk,
	Michael S. Tsirkin, ian.jackson, qemu-devel, xen-devel,
	xen-devel, Chao Peng, Paolo Bonzini, Dan Williams,
	Richard Henderson, george.dunlap, Xiao Guangrong

On 10/13/17 10:44 +0200, Igor Mammedov wrote:
> On Fri, 13 Oct 2017 15:53:26 +0800
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> > On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > > of specified area, and fill guest address in specified ACPI field.
> > > > 
> > > > One part of my patches is to implement a mechanism to tell Xen which
> > > > part of ACPI data is a table (NFIT), and which part defines a
> > > > namespace device and what the device name is. I can add two new loader
> > > > commands for them respectively.
> > > > 
> > > > Because they just provide information and SeaBIOS in non-xen
> > > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > > in non-xen environment.
> > > > 
> > > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > > dropped, and replaced by adding the new loader commands (though they
> > > > may be used only by Xen).
> > > > 
> > > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > > are needed in, perhaps, hvmloader.  
> > > 
> > > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > > to process a reduced set of ACPI tables.  In other words,
> > > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> > >
> > > hvmloader can then:
> > > 
> > > 1) allocate some memory for where the XSDT will go
> > > 
> > > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > > 
> > > 3) find the RSDP in low memory, since the loader script must have placed
> > > it there.  If it cannot find it, allocate some low memory, fill it with
> > > the RSDP header and revision, and jump to step 6
> > > 
> > > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > > 
> > > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > > 
> > > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > > 
> > > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > > RSDT and/or XSDT, and update the checksums
> > > 
> > > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > >   
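To make steps 1)-7) above concrete, here is a minimal sketch of what
the glue in hvmloader could look like; apart from the standard ACPI
layouts, every function name below is a placeholder invented for
illustration, not an existing hvmloader symbol:

    #include <stdint.h>

    #define ACPI_SDT_HDR_LEN 36                   /* standard ACPI table header */

    struct acpi_rsdp {                             /* ACPI 2.0+ RSDP layout */
        char sig[8]; uint8_t csum; char oemid[6]; uint8_t rev;
        uint32_t rsdt; uint32_t len; uint64_t xsdt;
        uint8_t xcsum; uint8_t rsvd[3];
    };

    void process_qemu_linker_script(void);                     /* steps 1) + 2) */
    struct acpi_rsdp *find_rsdp_in_low_memory(void);           /* step 3) */
    struct acpi_rsdp *alloc_and_init_rsdp(void);               /* step 3) fallback */
    void xsdt_add_entry(uint64_t table_pa);                    /* step 5) */
    void build_hvmloader_tables(void);                         /* step 6) */
    void point_rsdp_at_hvmloader_xsdt(struct acpi_rsdp *rsdp); /* step 7) */

    static void merge_qemu_acpi(void)
    {
        struct acpi_rsdp *rsdp;
        uint8_t *xsdt;
        uint64_t *entry;
        uint32_t len, i;

        process_qemu_linker_script();

        rsdp = find_rsdp_in_low_memory();                      /* step 3) */
        if ( !rsdp )
            rsdp = alloc_and_init_rsdp();      /* not found: fresh RSDP, go to 6) */
        else if ( rsdp->xsdt )                                 /* step 4) */
        {
            xsdt  = (uint8_t *)(unsigned long)rsdp->xsdt;
            len   = *(uint32_t *)(xsdt + 4);   /* 'length' field of the header */
            entry = (uint64_t *)(xsdt + ACPI_SDT_HDR_LEN);
            for ( i = 0; i < (len - ACPI_SDT_HDR_LEN) / 8; i++ )
                xsdt_add_entry(entry[i]);                      /* step 5) */
        }

        build_hvmloader_tables();                              /* step 6) */
        point_rsdp_at_hvmloader_xsdt(rsdp);                    /* step 7) */
    }
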
> +1 to Paolo's suggestion, i.e.
>  1. add BIOSLinkerLoader into hvmloader
>  2. load/process QEMU's tables with #1
>  3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
>     and put them in hvmloader's RSDT
> 
> > It can work for plain tables which do not contain AML.
> > 
> > However, for a namespace device, Xen needs to know its name in order
> > to detect the potential name conflict with those used in Xen built
> > ACPI. Xen does not (and is not going to) introduce an AML parser, so
> > it cannot get those device names from the QEMU-built ACPI on its own.
> > 
> > The idea of either this patch series or the new BIOSLinkerLoader
> > command is to let QEMU tell Xen where the definition body of a
> > namespace device (i.e. that part within the outmost "Device(NAME)") is
> > and what the device name is. Xen, after the name conflict check, can
> > re-package the definition body in a namespace device (w/ minimal AML
> > builder code added in Xen) and then in SSDT.
> 
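For concreteness, the information such a new loader command would have
to carry might look like the sketch below; the command value, the
struct and its field names are all hypothetical, invented here for
illustration, and do not exist in QEMU or Xen:

    #include <stdint.h>

    #define LOADER_COMMAND_DECLARE_AML_DEVICE 0x80    /* made-up value */

    struct loader_declare_aml_device {
        uint32_t command;       /* LOADER_COMMAND_DECLARE_AML_DEVICE */
        char     file[56];      /* fw_cfg blob that holds the ACPI data */
        char     name[4];       /* device name to check, e.g. "NVDR" */
        uint32_t body_offset;   /* where the Device() definition body starts */
        uint32_t body_length;   /* length of the definition body in bytes */
    };

After the name-conflict check, Xen (or hvmloader) would wrap the
body_offset/body_length bytes in a Device(name) of its own and emit
that as an extra SSDT.
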
> I'd skip the conflict check at runtime, as hvmloader doesn't currently
> have a "\\_SB\NVDR" device. Instead of a runtime check, it might do a
> primitive check at build time that the ASL sources in hvmloader do
> not contain the "NVDR" name reserved for QEMU, to avoid it being
> added by accident in the future (this check might also be reused if
> some other tables from QEMU are reused later on).
> It's a bit hackish, but at least it does the job and keeps the
> BIOSLinkerLoader interface the same for all supported firmwares
> (I'd consider it a temporary hack on the way to having QEMU fully
> build the ACPI tables for Xen).
> 
> Ideally it would be better for QEMU to build all ACPI tables for
> hvmloader, to avoid split-brain issues and the need to invent extra
> interfaces every time a feature is added just to pass configuration
> data from QEMU to firmware.
> But that's probably out of the scope of this project; it could be
> done on top of this if the Xen folks would like to do it. Adding
> BIOSLinkerLoader to hvmloader would be a good starting point
> for that future effort.

If we can let QEMU build the entire guest ACPI, we may not even need
to introduce fw_cfg and BIOSLinkerLoader code into hvmloader.  SeaBIOS
is currently loaded after hvmloader and can be used to load the
QEMU-built ACPI.

To Jan, Andrew, Stefano and Anthony,

what do you think about allowing QEMU to build the entire guest ACPI
and letting SeaBIOS load it? The ACPI builder code in hvmloader would
still be there and would just be bypassed in this case.


Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 11:13                     ` Haozhong Zhang
@ 2017-10-13 12:13                       ` Jan Beulich
  -1 siblings, 0 replies; 128+ messages in thread
From: Jan Beulich @ 2017-10-13 12:13 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: andrew.cooper3, Anthony Perard, george.dunlap, wei.liu2,
	ian.jackson, Xiao Guangrong, Dan Williams, Stefano Stabellini,
	Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Paolo Bonzini, Richard Henderson

>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> To Jan, Andrew, Stefano and Anthony,
> 
> what do you think about allowing QEMU to build the entire guest ACPI
> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> still there and just bypassed in this case.

Well, if that can be made to work in a non-quirky way and without
loss of functionality, I'd probably be fine. I do think, however,
that there's a reason this is being handled in hvmloader right now.

Jan

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-13 12:13                       ` Jan Beulich
  0 siblings, 0 replies; 128+ messages in thread
From: Jan Beulich @ 2017-10-13 12:13 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Paolo Bonzini, xen-devel, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Eduardo Habkost

>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> To Jan, Andrew, Stefano and Anthony,
> 
> what do you think about allowing QEMU to build the entire guest ACPI
> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> still there and just bypassed in this case.

Well, if that can be made to work in a non-quirky way and without
loss of functionality, I'd probably be fine. I do think, however,
that there's a reason this is being handled in hvmloader right now.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 12:13                       ` Jan Beulich
@ 2017-10-13 22:46                         ` Stefano Stabellini
  -1 siblings, 0 replies; 128+ messages in thread
From: Stefano Stabellini @ 2017-10-13 22:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, andrew.cooper3, Anthony Perard, george.dunlap,
	wei.liu2, ian.jackson, Xiao Guangrong, Dan Williams,
	Stefano Stabellini, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Paolo Bonzini, Richard Henderson

On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > To Jan, Andrew, Stefano and Anthony,
> > 
> > what do you think about allowing QEMU to build the entire guest ACPI
> > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > still there and just bypassed in this case.
> 
> Well, if that can be made work in a non-quirky way and without
> loss of functionality, I'd probably be fine. I do think, however,
> that there's a reason this is being handled in hvmloader right now.

And not to discourage you, just as a clarification, you'll also need to
consider backward compatibility: unless the tables are identical, I
imagine we'll have to keep using the old tables for already installed
virtual machines.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-13 22:46                         ` Stefano Stabellini
  0 siblings, 0 replies; 128+ messages in thread
From: Stefano Stabellini @ 2017-10-13 22:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Paolo Bonzini, xen-devel, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Eduardo Habkost

On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > To Jan, Andrew, Stefano and Anthony,
> > 
> > what do you think about allowing QEMU to build the entire guest ACPI
> > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > still there and just bypassed in this case.
> 
> Well, if that can be made work in a non-quirky way and without
> loss of functionality, I'd probably be fine. I do think, however,
> that there's a reason this is being handled in hvmloader right now.

And not to discourage you, just as a clarification, you'll also need to
consider backward compatibility: unless the tables are identical, I
imagine we'll have to keep using the old tables for already installed
virtual machines.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 22:46                         ` Stefano Stabellini
@ 2017-10-15  0:31                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Jan Beulich, Haozhong Zhang, andrew.cooper3, Anthony Perard,
	george.dunlap, wei.liu2, ian.jackson, Xiao Guangrong,
	Dan Williams, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > To Jan, Andrew, Stefano and Anthony,
> > > 
> > > what do you think about allowing QEMU to build the entire guest ACPI
> > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > still there and just bypassed in this case.
> > 
> > Well, if that can be made work in a non-quirky way and without
> > loss of functionality, I'd probably be fine. I do think, however,
> > that there's a reason this is being handled in hvmloader right now.
> 
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

Maybe you can handle this using machine type versioning.
Installed guests would use the old type.
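
As a concrete illustration (KVM-side syntax; whether and how this maps
onto the xenfv machine type used for Xen HVM guests is an assumption
to be checked): a guest installed today would stay pinned to a
versioned machine type such as "pc-i440fx-2.9" and keep its current
ACPI tables across QEMU upgrades, while newly created guests would get
the newer type, e.g. "pc-i440fx-2.10", with the QEMU-built tables.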

-- 
MST

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-15  0:31                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Haozhong Zhang, wei.liu2, Xiao Guangrong, Konrad Rzeszutek Wilk,
	qemu-devel, andrew.cooper3, ian.jackson, george.dunlap,
	xen-devel, Igor Mammedov, Paolo Bonzini, Jan Beulich, xen-devel,
	Anthony Perard, Chao Peng, Dan Williams, Richard Henderson,
	Eduardo Habkost

On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > To Jan, Andrew, Stefano and Anthony,
> > > 
> > > what do you think about allowing QEMU to build the entire guest ACPI
> > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > still there and just bypassed in this case.
> > 
> > Well, if that can be made work in a non-quirky way and without
> > loss of functionality, I'd probably be fine. I do think, however,
> > that there's a reason this is being handled in hvmloader right now.
> 
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

Maybe you can handle this using machine type versioning.
Installed guests would use the old type.

-- 
MST

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13  7:53                 ` Haozhong Zhang
@ 2017-10-15  0:35                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:35 UTC (permalink / raw)
  To: Paolo Bonzini, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Igor Mammedov, qemu-devel, xen-devel, Dan Williams, Chao Peng,
	Eduardo Habkost, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On Fri, Oct 13, 2017 at 03:53:26PM +0800, Haozhong Zhang wrote:
> On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > On 12/10/2017 14:45, Haozhong Zhang wrote:
> > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > of specified area, and fill guest address in specified ACPI field.
> > > 
> > > One part of my patches is to implement a mechanism to tell Xen which
> > > part of ACPI data is a table (NFIT), and which part defines a
> > > namespace device and what the device name is. I can add two new loader
> > > commands for them respectively.
> > > 
> > > Because they just provide information and SeaBIOS in non-xen
> > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > in non-xen environment.
> > > 
> > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > dropped, and replaced by adding the new loader commands (though they
> > > may be used only by Xen).
> > > 
> > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > are needed in, perhaps, hvmloader.
> > 
> > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > to process a reduced set of ACPI tables.  In other words,
> > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> >
> > hvmloader can then:
> > 
> > 1) allocate some memory for where the XSDT will go
> > 
> > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > 
> > 3) find the RSDP in low memory, since the loader script must have placed
> > it there.  If it cannot find it, allocate some low memory, fill it with
> > the RSDP header and revision, and jump to step 6
> > 
> > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > 
> > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > 
> > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > 
> > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > RSDT and/or XSDT, and update the checksums
> > 
> > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > 
> 
> It can work for plain tables which do not contain AML.
> 
> However, for a namespace device, Xen needs to know its name in order
> to detect the potential name conflict with those used in Xen built
> ACPI. Xen does not (and is not going to) introduce an AML parser, so
> it cannot get those device names from the QEMU-built ACPI on its own.
> 
> The idea of either this patch series or the new BIOSLinkerLoader
> command is to let QEMU tell Xen where the definition body of a
> namespace device (i.e. that part within the outmost "Device(NAME)") is
> and what the device name is. Xen, after the name conflict check, can
> re-package the definition body in a namespace device (w/ minimal AML
> builder code added in Xen) and then in SSDT.
> 
> 
> Haozhong

You most likely can do this without a new command.
You can use something similar to build_append_named_dword
in combination with BIOS_LINKER_LOADER_COMMAND_ADD_POINTER,
like vmgenid does.
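
A rough sketch of that pattern, modelled on what vmgenid does; the
fw_cfg file name and the AML object name are made up for this example,
and the helper signatures are quoted from memory, so check them
against the QEMU tree before relying on any of the details:

    /* Emit a named integer into the SSDT and have the guest firmware
     * patch it, via ADD_POINTER, with the guest address at which the
     * blob "etc/xen/nvdimm-extra" (a made-up name) was downloaded. */
    static void add_patched_addr(GArray *table_data, BIOSLinker *linker,
                                 GArray *blob)
    {
        unsigned addr_offset;

        /* addr_offset must be the offset, inside etc/acpi/tables, of the
         * 32-bit integer literal appended below; computing it exactly
         * depends on the AML encoding and is glossed over here. */
        addr_offset = table_data->len;                 /* placeholder */
        build_append_named_dword(table_data, "XADR");  /* made-up name */

        bios_linker_loader_alloc(linker, "etc/xen/nvdimm-extra", blob,
                                 4096, false);
        bios_linker_loader_add_pointer(linker,
            ACPI_BUILD_TABLE_FILE, addr_offset, sizeof(uint32_t),
            "etc/xen/nvdimm-extra", 0);
    }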

-- 
MST

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-15  0:35                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 128+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:35 UTC (permalink / raw)
  To: Paolo Bonzini, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Igor Mammedov, qemu-devel, xen-devel, Dan Williams, Chao Peng,
	Eduardo Habkost, Xiao Guangrong, Richard Henderson,
	Anthony Perard, xen-devel, ian.jackson, wei.liu2, george.dunlap,
	JBeulich, andrew.cooper3

On Fri, Oct 13, 2017 at 03:53:26PM +0800, Haozhong Zhang wrote:
> On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > On 12/10/2017 14:45, Haozhong Zhang wrote:
> > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > of specified area, and fill guest address in specified ACPI field.
> > > 
> > > One part of my patches is to implement a mechanism to tell Xen which
> > > part of ACPI data is a table (NFIT), and which part defines a
> > > namespace device and what the device name is. I can add two new loader
> > > commands for them respectively.
> > > 
> > > Because they just provide information and SeaBIOS in non-xen
> > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > in non-xen environment.
> > > 
> > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > dropped, and replaced by adding the new loader commands (though they
> > > may be used only by Xen).
> > > 
> > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > are needed in, perhaps, hvmloader.
> > 
> > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > to process a reduced set of ACPI tables.  In other words,
> > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
> >
> > hvmloader can then:
> > 
> > 1) allocate some memory for where the XSDT will go
> > 
> > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > 
> > 3) find the RSDP in low memory, since the loader script must have placed
> > it there.  If it cannot find it, allocate some low memory, fill it with
> > the RSDP header and revision, and jump to step 6
> > 
> > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > 
> > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > 
> > 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> > 
> > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > RSDT and/or XSDT, and update the checksums
> > 
> > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > 
> 
> It can work for plain tables which do not contain AML.
> 
> However, for a namespace device, Xen needs to know its name in order
> to detect the potential name conflict with those used in Xen built
> ACPI. Xen does not (and is not going to) introduce an AML parser, so
> it cannot get those device names from the QEMU-built ACPI on its own.
> 
> The idea of either this patch series or the new BIOSLinkerLoader
> command is to let QEMU tell Xen where the definition body of a
> namespace device (i.e. that part within the outmost "Device(NAME)") is
> and what the device name is. Xen, after the name conflict check, can
> re-package the definition body in a namespace device (w/ minimal AML
> builder code added in Xen) and then in SSDT.
> 
> 
> Haozhong

You most likely can do this without a new command.
You can use something similar to build_append_named_dword
in combination with BIOS_LINKER_LOADER_COMMAND_ADD_POINTER,
like vmgenid does.

-- 
MST

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-15  0:31                           ` Michael S. Tsirkin
@ 2017-10-16 14:49                             ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-16 14:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Stabellini, Haozhong Zhang, wei.liu2, Xiao Guangrong,
	qemu-devel, andrew.cooper3, ian.jackson, george.dunlap,
	xen-devel, Igor Mammedov, Paolo Bonzini, Jan Beulich, xen-devel,
	Anthony Perard, Chao Peng, Dan Williams, Richard Henderson,
	Eduardo Habkost

On Sun, Oct 15, 2017 at 03:31:15AM +0300, Michael S. Tsirkin wrote:
> On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > > To Jan, Andrew, Stefano and Anthony,
> > > > 
> > > > what do you think about allowing QEMU to build the entire guest ACPI
> > > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > > still there and just bypassed in this case.
> > > 
> > > Well, if that can be made work in a non-quirky way and without
> > > loss of functionality, I'd probably be fine. I do think, however,
> > > that there's a reason this is being handled in hvmloader right now.
> > 
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> Maybe you can handle this using machine type versioning.

<nods> And the type could be v2 if nvdimm was provided (which is
something that the toolstack would figure out).

The toolstack could also have a separate 'v2' config flag if somebody
wanted to play with this _outside_ of having NVDIMM in the guest?


> Installed guests would use the old type.

<nods>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-16 14:49                             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 128+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-16 14:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Haozhong Zhang, Stefano Stabellini, wei.liu2, Xiao Guangrong,
	andrew.cooper3, ian.jackson, qemu-devel, Eduardo Habkost,
	george.dunlap, xen-devel, Chao Peng, Jan Beulich, Paolo Bonzini,
	Anthony Perard, Igor Mammedov, Dan Williams, xen-devel,
	Richard Henderson

On Sun, Oct 15, 2017 at 03:31:15AM +0300, Michael S. Tsirkin wrote:
> On Fri, Oct 13, 2017 at 03:46:39PM -0700, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > > >>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > > > To Jan, Andrew, Stefano and Anthony,
> > > > 
> > > > what do you think about allowing QEMU to build the entire guest ACPI
> > > > and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > > > still there and just bypassed in this case.
> > > 
> > > Well, if that can be made work in a non-quirky way and without
> > > loss of functionality, I'd probably be fine. I do think, however,
> > > that there's a reason this is being handled in hvmloader right now.
> > 
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> Maybe you can handle this using machine type versioning.

<nods> And the type could be v2 if nvdimm was provided (which is
something that the toolstack would figure out).

The toolstack could also have a separate 'v2' config flag if somebody
wanted to play with this _outside_ of having NVDIMM in the guest?


> Installed guests would use the old type.

<nods>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-13 22:46                         ` Stefano Stabellini
@ 2017-10-17 11:45                           ` Paolo Bonzini
  -1 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-17 11:45 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Haozhong Zhang, andrew.cooper3, Anthony Perard, george.dunlap,
	wei.liu2, ian.jackson, Xiao Guangrong, Dan Williams, Chao Peng,
	xen-devel, xen-devel, qemu-devel, Konrad Rzeszutek Wilk,
	Eduardo Habkost, Igor Mammedov, Michael S. Tsirkin,
	Richard Henderson

On 14/10/2017 00:46, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
>>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
>>> To Jan, Andrew, Stefano and Anthony,
>>>
>>> what do you think about allowing QEMU to build the entire guest ACPI
>>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
>>> still there and just bypassed in this case.
>> Well, if that can be made work in a non-quirky way and without
>> loss of functionality, I'd probably be fine. I do think, however,
>> that there's a reason this is being handled in hvmloader right now.
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

I agree.  Some of them are already identical; for some others the QEMU
version differs but should be okay; and for yet others it's probably
better to keep the Xen-specific parts in hvmloader.

The good thing is that it's possible to proceed incrementally once you
have the hvmloader support for merging the QEMU and hvmloader RSDT or
XSDT (whatever you are using), starting with just NVDIMM and proceeding
later with whatever you see fit.

Paolo

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-17 11:45                           ` Paolo Bonzini
  0 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-17 11:45 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Haozhong Zhang, wei.liu2, Xiao Guangrong, Konrad Rzeszutek Wilk,
	qemu-devel, andrew.cooper3, Michael S. Tsirkin, ian.jackson,
	george.dunlap, xen-devel, Igor Mammedov, xen-devel,
	Anthony Perard, Chao Peng, Dan Williams, Richard Henderson,
	Eduardo Habkost

On 14/10/2017 00:46, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
>>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
>>> To Jan, Andrew, Stefano and Anthony,
>>>
>>> what do you think about allowing QEMU to build the entire guest ACPI
>>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
>>> still there and just bypassed in this case.
>> Well, if that can be made work in a non-quirky way and without
>> loss of functionality, I'd probably be fine. I do think, however,
>> that there's a reason this is being handled in hvmloader right now.
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

I agree.  Some of them are already identical; for some others the QEMU
version differs but should be okay; and for yet others it's probably
better to keep the Xen-specific parts in hvmloader.

The good thing is that it's possible to proceed incrementally once you
have the hvmloader support for merging the QEMU and hvmloader RSDT or
XSDT (whatever you are using), starting with just NVDIMM and proceeding
later with whatever you see fit.

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-17 11:45                           ` Paolo Bonzini
@ 2017-10-17 12:16                             ` Haozhong Zhang
  -1 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-17 12:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefano Stabellini, Jan Beulich, andrew.cooper3, Anthony Perard,
	george.dunlap, wei.liu2, ian.jackson, Xiao Guangrong,
	Dan Williams, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Richard Henderson

On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> On 14/10/2017 00:46, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> >>> To Jan, Andrew, Stefano and Anthony,
> >>>
> >>> what do you think about allowing QEMU to build the entire guest ACPI
> >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> >>> still there and just bypassed in this case.
> >> Well, if that can be made work in a non-quirky way and without
> >> loss of functionality, I'd probably be fine. I do think, however,
> >> that there's a reason this is being handled in hvmloader right now.
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> I agree.  Some of them are already identical, some are not but the QEMU
> version should be okay, and for yet more it's probably better to keep
> the Xen-specific parts in hvmloader.
> 
> The good thing is that it's possible to proceed incrementally once you
> have the hvmloader support for merging the QEMU and hvmloader RSDT or
> XSDT (whatever you are using), starting with just NVDIMM and proceeding
> later with whatever you see fit.
> 

I'll have a try and check how much the differences would matter. If it
would not take too much work, I'd like to adapt the Xen NVDIMM enabling
patches to the fully QEMU-built ACPI. Otherwise, I'll fall back to
Paolo's and MST's suggestions.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-17 12:16                             ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-17 12:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Jan Beulich, xen-devel, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Eduardo Habkost

On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> On 14/10/2017 00:46, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> >>> To Jan, Andrew, Stefano and Anthony,
> >>>
> >>> what do you think about allowing QEMU to build the entire guest ACPI
> >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> >>> still there and just bypassed in this case.
> >> Well, if that can be made work in a non-quirky way and without
> >> loss of functionality, I'd probably be fine. I do think, however,
> >> that there's a reason this is being handled in hvmloader right now.
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> I agree.  Some of them are already identical, some are not but the QEMU
> version should be okay, and for yet more it's probably better to keep
> the Xen-specific parts in hvmloader.
> 
> The good thing is that it's possible to proceed incrementally once you
> have the hvmloader support for merging the QEMU and hvmloader RSDT or
> XSDT (whatever you are using), starting with just NVDIMM and proceeding
> later with whatever you see fit.
> 

I'll have a try and check how much the differences would matter. If it
would not take too much work, I'd like to adapt the Xen NVDIMM enabling
patches to the fully QEMU-built ACPI. Otherwise, I'll fall back to
Paolo's and MST's suggestions.

Thanks,
Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-17 12:16                             ` Haozhong Zhang
@ 2017-10-18  8:32                               ` Roger Pau Monné
  -1 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:32 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Paolo Bonzini, Stefano Stabellini, Jan Beulich, andrew.cooper3,
	Anthony Perard, george.dunlap, wei.liu2, ian.jackson,
	Xiao Guangrong, Dan Williams, Chao Peng, xen-devel, xen-devel,
	qemu-devel, Konrad Rzeszutek Wilk, Eduardo Habkost,
	Igor Mammedov, Michael S. Tsirkin, Richard Henderson

On Tue, Oct 17, 2017 at 08:16:47PM +0800, Haozhong Zhang wrote:
> On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> > On 14/10/2017 00:46, Stefano Stabellini wrote:
> > > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > >>> To Jan, Andrew, Stefano and Anthony,
> > >>>
> > >>> what do you think about allowing QEMU to build the entire guest ACPI
> > >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > >>> still there and just bypassed in this case.
> > >> Well, if that can be made work in a non-quirky way and without
> > >> loss of functionality, I'd probably be fine. I do think, however,
> > >> that there's a reason this is being handled in hvmloader right now.
> > > And not to discourage you, just as a clarification, you'll also need to
> > > consider backward compatibility: unless the tables are identical, I
> > > imagine we'll have to keep using the old tables for already installed
> > > virtual machines.
> > 
> > I agree.  Some of them are already identical, some are not but the QEMU
> > version should be okay, and for yet more it's probably better to keep
> > the Xen-specific parts in hvmloader.
> > 
> > The good thing is that it's possible to proceed incrementally once you
> > have the hvmloader support for merging the QEMU and hvmloader RSDT or
> > XSDT (whatever you are using), starting with just NVDIMM and proceeding
> > later with whatever you see fit.
> > 
> 
> I'll have a try to check how much the differences would affect. If it
> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> and MST's suggestions.

I don't agree with the end goal of fully switching to the QEMU-built
ACPI tables. First of all, the only entity that has all the
information about the guest is the toolstack, and so it should be
the one in control of the ACPI tables.

Also, Xen guests can use several device models concurrently (via the
ioreq server interface), and each should be able to contribute to the
information presented in the ACPI tables. Intel is also working on
adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
ACPI tables should be created by the toolstack and not QEMU. And
finally keep in mind that there are Xen guests (PVH) that use ACPI
tables but not QEMU.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-18  8:32                               ` Roger Pau Monné
  0 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:32 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	xen-devel, Igor Mammedov, Jan Beulich, Chao Peng, Anthony Perard,
	Paolo Bonzini, Dan Williams, Richard Henderson, Eduardo Habkost

On Tue, Oct 17, 2017 at 08:16:47PM +0800, Haozhong Zhang wrote:
> On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> > On 14/10/2017 00:46, Stefano Stabellini wrote:
> > > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > >>>>> On 13.10.17 at 13:13, <haozhong.zhang@intel.com> wrote:
> > >>> To Jan, Andrew, Stefano and Anthony,
> > >>>
> > >>> what do you think about allowing QEMU to build the entire guest ACPI
> > >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > >>> still there and just bypassed in this case.
> > >> Well, if that can be made work in a non-quirky way and without
> > >> loss of functionality, I'd probably be fine. I do think, however,
> > >> that there's a reason this is being handled in hvmloader right now.
> > > And not to discourage you, just as a clarification, you'll also need to
> > > consider backward compatibility: unless the tables are identical, I
> > > imagine we'll have to keep using the old tables for already installed
> > > virtual machines.
> > 
> > I agree.  Some of them are already identical, some are not but the QEMU
> > version should be okay, and for yet more it's probably better to keep
> > the Xen-specific parts in hvmloader.
> > 
> > The good thing is that it's possible to proceed incrementally once you
> > have the hvmloader support for merging the QEMU and hvmloader RSDT or
> > XSDT (whatever you are using), starting with just NVDIMM and proceeding
> > later with whatever you see fit.
> > 
> 
> I'll have a try to check how much the differences would affect. If it
> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> and MST's suggestions.

I don't agree with the end goal of fully switching to the QEMU-built
ACPI tables. First of all, the only entity that has all the
information about the guest is the toolstack, and so it should be
the one in control of the ACPI tables.

Also, Xen guests can use several device models concurrently (via the
ioreq server interface), and each should be able to contribute to the
information presented in the ACPI tables. Intel is also working on
adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
ACPI tables should be created by the toolstack and not QEMU. And
finally keep in mind that there are Xen guests (PVH) that use ACPI
tables but not QEMU.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-18  8:32                               ` [Qemu-devel] " Roger Pau Monné
@ 2017-10-18  8:46                                 ` Paolo Bonzini
  -1 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-18  8:46 UTC (permalink / raw)
  To: Roger Pau Monné, Haozhong Zhang
  Cc: Stefano Stabellini, Jan Beulich, andrew.cooper3, Anthony Perard,
	george.dunlap, wei.liu2, ian.jackson, Xiao Guangrong,
	Dan Williams, Chao Peng, xen-devel, xen-devel, qemu-devel,
	Konrad Rzeszutek Wilk, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Richard Henderson

On 18/10/2017 10:32, Roger Pau Monné wrote:
>> I'll have a try to check how much the differences would affect. If it
>> would not take too much work, I'd like to adapt Xen NVDIMM enabling
>> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
>> and MST's suggestions.
> I don't agree with the end goal of fully switching to the QEMU build
> ACPI tables. First of all, the only entity that has all the
> information about the guest it's the toolstack, and so it should be
> the one in control of the ACPI tables.
> 
> Also, Xen guests can use several device models concurrently (via the
> ioreq server interface), and each should be able to contribute to the
> information presented in the ACPI tables. Intel is also working on
> adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> ACPI tables should be created by the toolstack and not QEMU. And
> finally keep in mind that there are Xen guests (PVH) that use ACPI
> tables but not QEMU.

I agree with this in fact; QEMU has a view of _most_ of the emulated
hardware, but not all.

However, I disagree that the toolstack should be alone in controlling
the ACPI tables; rather, each involved part of the stack should be
providing its own part of the tables.  For example, QEMU (in addition to
NVDIMM information) should be the one providing an SSDT for southbridge
devices (floppy, COMx, LPTx, etc.).

The Xen stack (or more likely, hvmloader itself) would provide all the
bits that are provided by the hypervisor (MADT for the IOAPIC, another
SSDT for the HPET and RTC, DMAR tables for IOMMU, and so on).  This
should also work just fine for PVH.  Of course backwards compatibility
is the enemy of simplification, but in the end things _should_ actually
be simpler and I think it's a good idea if a prerequisite for Xen
vNVDIMM is to move AML code for QEMU devices out of hvmloader.

Paolo

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-18  8:46                                 ` Paolo Bonzini
  0 siblings, 0 replies; 128+ messages in thread
From: Paolo Bonzini @ 2017-10-18  8:46 UTC (permalink / raw)
  To: Roger Pau Monné, Haozhong Zhang
  Cc: Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Jan Beulich, xen-devel, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Eduardo Habkost

On 18/10/2017 10:32, Roger Pau Monné wrote:
>> I'll have a try to check how much the differences would affect. If it
>> would not take too much work, I'd like to adapt Xen NVDIMM enabling
>> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
>> and MST's suggestions.
> I don't agree with the end goal of fully switching to the QEMU build
> ACPI tables. First of all, the only entity that has all the
> information about the guest it's the toolstack, and so it should be
> the one in control of the ACPI tables.
> 
> Also, Xen guests can use several device models concurrently (via the
> ioreq server interface), and each should be able to contribute to the
> information presented in the ACPI tables. Intel is also working on
> adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> ACPI tables should be created by the toolstack and not QEMU. And
> finally keep in mind that there are Xen guests (PVH) that use ACPI
> tables but not QEMU.

I agree with this in fact; QEMU has a view of _most_ of the emulated
hardware, but not all.

However, I disagree that the toolstack should be alone in controlling
the ACPI tables; rather, each involved part of the stack should be
providing its own part of the tables.  For example, QEMU (in addition to
NVDIMM information) should be the one providing an SSDT for southbridge
devices (floppy, COMx, LPTx, etc.).

The Xen stack (or more likely, hvmloader itself) would provide all the
bits that are provided by the hypervisor (MADT for the IOAPIC, another
SSDT for the HPET and RTC, DMAR tables for IOMMU, and so on).  This
should also work just fine for PVH.  Of course backwards compatibility
is the enemy of simplification, but in the end things _should_ actually
be simpler and I think it's a good idea if a prerequisite for Xen
vNVDIMM is to move AML code for QEMU devices out of hvmloader.

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
  2017-10-18  8:46                                 ` [Qemu-devel] " Paolo Bonzini
@ 2017-10-18  8:55                                   ` Roger Pau Monné
  -1 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, Stefano Stabellini, Jan Beulich, andrew.cooper3,
	Anthony Perard, george.dunlap, wei.liu2, ian.jackson,
	Xiao Guangrong, Dan Williams, Chao Peng, xen-devel, xen-devel,
	qemu-devel, Konrad Rzeszutek Wilk, Eduardo Habkost,
	Igor Mammedov, Michael S. Tsirkin, Richard Henderson

On Wed, Oct 18, 2017 at 10:46:57AM +0200, Paolo Bonzini wrote:
> On 18/10/2017 10:32, Roger Pau Monné wrote:
> >> I'll have a try to check how much the differences would affect. If it
> >> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> >> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> >> and MST's suggestions.
> > I don't agree with the end goal of fully switching to the QEMU build
> > ACPI tables. First of all, the only entity that has all the
> > information about the guest it's the toolstack, and so it should be
> > the one in control of the ACPI tables.
> > 
> > Also, Xen guests can use several device models concurrently (via the
> > ioreq server interface), and each should be able to contribute to the
> > information presented in the ACPI tables. Intel is also working on
> > adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> > ACPI tables should be created by the toolstack and not QEMU. And
> > finally keep in mind that there are Xen guests (PVH) that use ACPI
> > tables but not QEMU.
> 
> I agree with this in fact; QEMU has a view of _most_ of the emulated
> hardware, but not all.
> 
> However, I disagree that the toolstack should be alone in controlling
> the ACPI tables; rather, each involved part of the stack should be
> providing its own part of the tables.  For example, QEMU (in addition to
> NVDIMM information) should be the one providing an SSDT for southbridge
> devices (floppy, COMx, LPTx, etc.).

Yes, that's what I wanted to say, rather than the toolstack providing
all the ACPI tables by itself. Every component should provide the
tables of the devices under its control, and that should be glued
together by the toolstack (i.e. hvmloader).

Roger.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-10-18  8:55                                   ` Roger Pau Monné
  0 siblings, 0 replies; 128+ messages in thread
From: Roger Pau Monné @ 2017-10-18  8:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, Stefano Stabellini, wei.liu2, Xiao Guangrong,
	Konrad Rzeszutek Wilk, qemu-devel, andrew.cooper3,
	Michael S. Tsirkin, ian.jackson, george.dunlap, xen-devel,
	Igor Mammedov, Jan Beulich, xen-devel, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson, Eduardo Habkost

On Wed, Oct 18, 2017 at 10:46:57AM +0200, Paolo Bonzini wrote:
> On 18/10/2017 10:32, Roger Pau Monné wrote:
> >> I'll have a try to check how much the differences would affect. If it
> >> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> >> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> >> and MST's suggestions.
> > I don't agree with the end goal of fully switching to the QEMU build
> > ACPI tables. First of all, the only entity that has all the
> > information about the guest it's the toolstack, and so it should be
> > the one in control of the ACPI tables.
> > 
> > Also, Xen guests can use several device models concurrently (via the
> > ioreq server interface), and each should be able to contribute to the
> > information presented in the ACPI tables. Intel is also working on
> > adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> > ACPI tables should be created by the toolstack and not QEMU. And
> > finally keep in mind that there are Xen guests (PVH) that use ACPI
> > tables but not QEMU.
> 
> I agree with this in fact; QEMU has a view of _most_ of the emulated
> hardware, but not all.
> 
> However, I disagree that the toolstack should be alone in controlling
> the ACPI tables; rather, each involved part of the stack should be
> providing its own part of the tables.  For example, QEMU (in addition to
> NVDIMM information) should be the one providing an SSDT for southbridge
> devices (floppy, COMx, LPTx, etc.).

Yes, that's what I wanted to say, rather than the toolstack providing
all the ACPI tables by itself. Every component should provide the
tables of the devices under its control, and that should be glued
together by the toolstack (i.e. hvmloader).

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
  2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (39 preceding siblings ...)
  2017-09-11  4:41   ` Haozhong Zhang
@ 2017-10-27  3:26 ` Chao Peng
  2017-10-27  4:25   ` Haozhong Zhang
  40 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-10-27  3:26 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Dan Williams


On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> Overview
> ==================
> 
> (RFC v2 can be found at
> https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> 
> Well, this RFC v3 changes and inflates a lot from previous versions.
> The primary changes are listed below, most of which are to simplify
> the first implementation and avoid additional inflation.
> 
> 1. Drop the support to maintain the frametable and M2P table of PMEM
>    in RAM. In the future, we may add this support back.

I don't find any discussion about this in v2, but I think putting
those Xen data structures in RAM is sometimes useful (e.g. when
performance is important). It's better not to make a hard restriction
on this.

> 
> 2. Hide host NFIT and deny access to host PMEM from Dom0. In other
>    words, the kernel NVDIMM driver is loaded in Dom 0 and existing
>    management utilities (e.g. ndctl) do not work in Dom0 anymore. This
>    is to workaround the inferences of PMEM access between Dom0 and Xen
>    hypervisor. In the future, we may add a stub driver in Dom0 which
>    will hold the PMEM pages being used by Xen hypervisor and/or other
>    domains.
> 
> 3. As there is no NVDIMM driver and management utilities in Dom0 now,
>    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
>    and manage NVDIMM in Dom0 (e.g., creating labels).  Instead, we
>    have to specify the exact MFNs of host PMEM pages in xl domain
>    configuration files and the newly added Xen NVDIMM management
>    utility xen-ndctl.
> 
>    If there are indeed some tasks that have to be handled by existing
>    driver and management utilities, such as recovery from hardware
>    failures, they have to be accomplished out of Xen environment.

What kind of recovery can happen, and can the recovery happen at
runtime? For example, can we recover a portion of NVDIMM assigned to a
certain VM while keeping other VMs still using NVDIMM?

> 
>    After 2. is solved in the future, we would be able to make existing
>    driver and management utilities work in Dom0 again.

Is there any reason why we can't do it now? If the existing ndctl
(with additional patches) can work, then we don't need to introduce
xen-ndctl at all. I think that keeps the user interface clearer.

Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
  2017-10-27  3:26 ` [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Chao Peng
@ 2017-10-27  4:25   ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-27  4:25 UTC (permalink / raw)
  To: Chao Peng
  Cc: Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Dan Williams

On 10/27/17 11:26 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > Overview
> > ==================
> > 
> > (RFC v2 can be found at
> > https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> > 
> > Well, this RFC v3 changes and inflates a lot from previous versions.
> > The primary changes are listed below, most of which are to simplify
> > the first implementation and avoid additional inflation.
> > 
> > 1. Drop the support to maintain the frametable and M2P table of PMEM
> >    in RAM. In the future, we may add this support back.
> 
> I don't find any discussion in v2 about this, but I'm thinking putting
> those Xen data structures in RAM sometimes is useful (e.g. when
> performance is important). It's better not making hard restriction on
> this.

Well, this is to reduce the complexity; as you can see, the current
patch series is already too big. In addition, the size of NVDIMM can
be very large, e.g. several terabytes or even more, which would
require a large amount of RAM to store its frametable and M2P
(~10 MB per 1 GB) and leave less RAM for guest usage.
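
As a rough check on that figure, assuming the usual x86-64 sizes of 32
bytes of frametable entry plus 8 bytes of M2P entry per 4 KiB page:
40 / 4096 is about 1%, i.e. roughly 10 MB of metadata per 1 GB of
PMEM, and on the order of 10 GB of RAM for a 1 TB NVDIMM if those
structures were kept in RAM.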

> 
> > 
> > 2. Hide host NFIT and deny access to host PMEM from Dom0. In other
> >    words, the kernel NVDIMM driver is loaded in Dom 0 and existing
> >    management utilities (e.g. ndctl) do not work in Dom0 anymore. This
> >    is to workaround the inferences of PMEM access between Dom0 and Xen
> >    hypervisor. In the future, we may add a stub driver in Dom0 which
> >    will hold the PMEM pages being used by Xen hypervisor and/or other
> >    domains.
> > 
> > 3. As there is no NVDIMM driver and management utilities in Dom0 now,
> >    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
> >    and manage NVDIMM in Dom0 (e.g., creating labels).  Instead, we
> >    have to specify the exact MFNs of host PMEM pages in xl domain
> >    configuration files and the newly added Xen NVDIMM management
> >    utility xen-ndctl.
> > 
> >    If there are indeed some tasks that have to be handled by existing
> >    driver and management utilities, such as recovery from hardware
> >    failures, they have to be accomplished out of Xen environment.
> 
> What kind of recovery can happen and does the recovery can happen at
> runtime? For example, can we recover a portion of NVDIMM assigned to a
> certain VM while keep other VMs still using NVDIMM?

For example, evaluating ACPI _DSM methods (maybe vendor specific) for
error recovery and/or scrubbing bad blocks, etc.
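
(For instance, the NVDIMM root device _DSM interface used with NFIT
exposes an Address Range Scrub flow -- Query ARS Capabilities, Start
ARS, Query ARS Status -- and it is that kind of flow which, in this
design, would have to be driven from outside the Xen environment.)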

> 
> > 
> >    After 2. is solved in the future, we would be able to make existing
> >    driver and management utilities work in Dom0 again.
> 
> Is there any reason why we can't do it now? If existing ndctl (with
> additional patches) can work, then we don't need to introduce xen-ndctl
> anymore? I think that keeps the user interface clearer.

The simple reason is that I want to reduce the number of components
(Xen/kernel/QEMU) touched by the first patchset (whose primary target
is to implement the basic functionality, i.e. mapping host NVDIMM into
a guest as a virtual NVDIMM). As you said, leaving a driver (the nvdimm
driver and/or a stub driver) in Dom0 would make the user interface
clearer. Let's see what I can get done in the next version.

Thanks,
Haozhong


* Re: [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
@ 2017-10-27  6:49   ` Chao Peng
  2017-10-27  7:02     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-10-27  6:49 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> The current check refuses the hot-plugged memory that falls in one
> unused PDX group, which should be allowed.

Looks reasonable to me. The only thing I can think of is that you could
double-check whether the following find_next_zero_bit()/find_next_bit()
calls still work.

Chao
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 11746730b4..6c5221f90c 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn,
> unsigned long epfn)
>          return 0;
>  
>      /* Make sure the new range is not present now */
> -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  &
> ~(PDX_GROUP_COUNT - 1))
> -            / PDX_GROUP_COUNT;
> +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) /
> PDX_GROUP_COUNT;
>      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) /
> PDX_GROUP_COUNT;
> -    if (sidx >= eidx)
> -        return 0;
> -
>      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
>      if ( s > eidx )
>          return 0;


* Re: [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
  2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() Haozhong Zhang
@ 2017-10-27  6:58   ` Chao Peng
  2017-10-27  9:24     ` Andrew Cooper
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-10-27  6:58 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel; +Cc: Andrew Cooper, Dan Williams, Jan Beulich

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
> identical to the former.

Looks good to me.

Chao
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 6c5221f90c..c93383d7d9 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -720,12 +720,11 @@ static void cleanup_frame_table(struct
> mem_hotadd_info *info)
>      spfn = info->spfn;
>      epfn = info->epfn;
>  
> -    sva = (unsigned long)pdx_to_page(pfn_to_pdx(spfn));
> -    eva = (unsigned long)pdx_to_page(pfn_to_pdx(epfn));
> +    sva = (unsigned long)mfn_to_page(spfn);
> +    eva = (unsigned long)mfn_to_page(epfn);
>  
>      /* Intialize all page */
> -    memset(mfn_to_page(spfn), -1,
> -           (unsigned long)mfn_to_page(epfn) - (unsigned
> long)mfn_to_page(spfn));
> +    memset((void *)sva, -1, eva - sva);
>  
>      while (sva < eva)
>      {


* Re: [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-10-27  6:49   ` Chao Peng
@ 2017-10-27  7:02     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-10-27  7:02 UTC (permalink / raw)
  To: Chao Peng
  Cc: Andrew Cooper, Dan Williams, Konrad Rzeszutek Wilk, Jan Beulich,
	xen-devel

On 10/27/17 14:49 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > The current check refuses the hot-plugged memory that falls in one
> > unused PDX group, which should be allowed.
> 
> Looks reasonable to me. The only thing I can think of is you can double
> check if the following find_next_zero_bit/find_next_bit will still
> work. 

The first check in mem_hotadd_check() ensures spfn < epfn, so sidx <=
eidx here. Compared with the previous code, the only newly allowed case
is sidx == eidx, which is exactly what this patch intends to permit and
which has been tested.
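
For example (treating pfn_to_pdx() as an identity map and using a purely
illustrative PDX_GROUP_COUNT of 0x1000), take spfn = 0x480100 and
epfn = 0x480200, which fall inside a single unused PDX group:

    old: sidx = roundup(0x480100, 0x1000)   / 0x1000 = 0x481
         eidx = rounddown(0x4801ff, 0x1000) / 0x1000 = 0x480
         -> sidx >= eidx, so the range was refused

    new: sidx = rounddown(0x480100, 0x1000) / 0x1000 = 0x480 = eidx
         -> the find_next_zero_bit()/find_next_bit() checks run as usual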

Haozhong

> 
> Chao
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  xen/arch/x86/x86_64/mm.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> > index 11746730b4..6c5221f90c 100644
> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn,
> > unsigned long epfn)
> >          return 0;
> >  
> >      /* Make sure the new range is not present now */
> > -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  &
> > ~(PDX_GROUP_COUNT - 1))
> > -            / PDX_GROUP_COUNT;
> > +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) /
> > PDX_GROUP_COUNT;
> >      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) /
> > PDX_GROUP_COUNT;
> > -    if (sidx >= eidx)
> > -        return 0;
> > -
> >      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
> >      if ( s > eidx )
> >          return 0;


* Re: [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table
  2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
@ 2017-10-27  8:10   ` Chao Peng
  0 siblings, 0 replies; 128+ messages in thread
From: Chao Peng @ 2017-10-27  8:10 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> cleanup_frame_table() initializes the entire newly added frame table
> to all -1's. If it's called after extend_frame_table() failed to map
> the entire frame table, the initialization will hit a page fault.
> 
> Move the cleanup of partially mapped frametable to
> extend_frame_table(),
> which has enough knowledge of the mapping status.

Overall the patch fixes the issue. But I guess you could achieve this
with a smaller change. For example, you could use info->cur to pass the
last mapped pfn and only memset the chunks that were actually mapped.
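
Something like the following, perhaps (untested sketch; it assumes
extend_frame_table() leaves info->cur at the last pfn whose frame-table
page was actually mapped):

    static void cleanup_frame_table(struct mem_hotadd_info *info)
    {
        unsigned long sva, eva;

        /* Only the frame table backing [spfn, cur) was mapped, so only
         * that part can safely be initialized to all -1's. */
        sva = (unsigned long)mfn_to_page(info->spfn);
        eva = (unsigned long)mfn_to_page(info->cur);

        memset((void *)sva, -1, eva - sva);

        /* ... the existing 'while ( sva < eva )' teardown loop ... */
    }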

Chao


* Re: [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
  2017-10-27  6:58   ` Chao Peng
@ 2017-10-27  9:24     ` Andrew Cooper
  2017-10-30  2:21       ` Chao Peng
  0 siblings, 1 reply; 128+ messages in thread
From: Andrew Cooper @ 2017-10-27  9:24 UTC (permalink / raw)
  To: Chao Peng, Haozhong Zhang, xen-devel; +Cc: Dan Williams, Jan Beulich

On 27/10/17 07:58, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
>> Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
>> identical to the former.
> Looks good to me.

Is that a Reviewed-by: then?

>
> Chao
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Given that this is a trivial cleanup patch, I will include it in the
x86-next branch I am maintaining until the 4.11 release window opens.

~Andrew


* Re: [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
  2017-10-27  9:24     ` Andrew Cooper
@ 2017-10-30  2:21       ` Chao Peng
  0 siblings, 0 replies; 128+ messages in thread
From: Chao Peng @ 2017-10-30  2:21 UTC (permalink / raw)
  To: Andrew Cooper, Haozhong Zhang, xen-devel; +Cc: Dan Williams, Jan Beulich

On Fri, 2017-10-27 at 10:24 +0100, Andrew Cooper wrote:
> On 27/10/17 07:58, Chao Peng wrote:
> > 
> > On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > > 
> > > Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
> > > identical to the former.
> > Looks good to me.
> 
> Is that a Reviewed-by: then?

Yes, Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com>

> 
> > 
> > 
> > Chao
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > ---
> > > Cc: Jan Beulich <jbeulich@suse.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Given that this is a trivial cleanup patch, I will include it in the
> x86-next branch I am maintaining until the 4.11 release window opens.
> 
> ~Andrew


* Re: [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
  2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
@ 2017-11-03  5:58   ` Chao Peng
  2017-11-03  6:39     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  5:58 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: George Dunlap, Andrew Cooper, Dan Williams, Jan Beulich,
	Konrad Rzeszutek Wilk


> +#ifdef CONFIG_NVDIMM_PMEM
> +static void __init init_frametable_pmem_chunk(unsigned long s,
> unsigned long e)
> +{
> +    static unsigned long pmem_init_frametable_mfn;
> +
> +    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
> +
> +    if ( !pmem_init_frametable_mfn )
> +    {
> +        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
> +        if ( !pmem_init_frametable_mfn )
> +            panic("Not enough memory for pmem initial frame table
> page");
> +        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
> +    }

Can zero_page be used instead?

> +
> +    while ( s < e )
> +    {
> +        /*
> +         * The real frame table entries of a pmem region will be
> +         * created when the pmem region is registered to hypervisor.
> +         * Any write attempt to the initial entries of that pmem
> +         * region implies potential hypervisor bugs. In order to make
> +         * those bugs explicit, map those initial entries as read-
> only.
> +         */
> +        map_pages_to_xen(s, pmem_init_frametable_mfn, 1,
> PAGE_HYPERVISOR_RO);
> +        s += PAGE_SIZE;

I don't know how much impact the 4K mappings have on boot time when pmem
is very large. Perhaps we need to get such data on real hardware.

Another question: do we really need to map it at all, i.e. can we just
skip the range here?

Chao


* Re: [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
  2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
@ 2017-11-03  6:15   ` Chao Peng
  2017-11-03  7:14     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  6:15 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk


> +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc
> *desc)
> +{
> +    struct nfit_spa_desc *spa_desc;
> +    struct nfit_memdev_desc *memdev_desc;
> +    struct acpi_nfit_system_address *spa;
> +    unsigned long smfn, emfn;
> +
> +    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
> +    {
> +        spa_desc = memdev_desc->spa_desc;
> +
> +        if ( !spa_desc ||
> +             (memdev_desc->acpi_table->flags &
> +              (ACPI_NFIT_MEM_SAVE_FAILED |
> ACPI_NFIT_MEM_RESTORE_FAILED |
> +               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
> +               ACPI_NFIT_MEM_MAP_FAILED)) )
> +            continue;

If a failure is detected, is it reasonable to continue? I think we
should at least print some messages.

Chao
> +
> +        spa = spa_desc->acpi_table;
> +        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
> +            continue;
> +        smfn = paddr_to_pfn(spa->address);
> +        emfn = paddr_to_pfn(spa->address + spa->length);
> +        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn,
> emfn);
> +    }
> +}


* Re: [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
  2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
@ 2017-11-03  6:26   ` Chao Peng
  0 siblings, 0 replies; 128+ messages in thread
From: Chao Peng @ 2017-11-03  6:26 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Jan Beulich, Konrad Rzeszutek Wilk

> +
> +/**
> + * Add a PMEM region to a list. All PMEM regions in the list are
> + * sorted in the ascending order of the start address. A PMEM region,
> + * whose range is overlapped with anyone in the list, cannot be added
> + * to the list.
> + *
> + * Parameters:
> + *  list:       the list to which a new PMEM region will be added
> + *  smfn, emfn: the range of the new PMEM region
> + *  entry:      return the new entry added to the list
> + *
> + * Return:
> + *  On success, return 0 and the new entry added to the list is
> + *  returned via @entry. Otherwise, return an error number and the
> + *  value of @entry is undefined.
> + */
> +static int pmem_list_add(struct list_head *list,
> +                         unsigned long smfn, unsigned long emfn,
> +                         struct pmem **entry)
> +{
> +    struct list_head *cur;
> +    struct pmem *new_pmem;
> +    int rc = 0;
> +
> +    list_for_each_prev(cur, list)
> +    {
> +        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
> +        unsigned long cur_smfn = cur_pmem->smfn;
> +        unsigned long cur_emfn = cur_pmem->emfn;
> +
> +        if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) )
> +        {
> +            rc = -EEXIST;
> +            goto out;
> +        }
> +
> +        if ( cur_smfn < smfn )
> +            break;
> +    }
> +
> +    new_pmem = xzalloc(struct pmem);
> +    if ( !new_pmem )
> +    {
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +    new_pmem->smfn = smfn;
> +    new_pmem->emfn = emfn;
> +    list_add(&new_pmem->link, cur);
> +
> + out:
> +    if ( !rc && entry )
> +        *entry = new_pmem;
> +
> +    return rc;

It's not necessary to introduce 'out' and 'rc'. You can return directly
in the failure cases.
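
E.g. something like this (untested, reusing the declarations and helpers
from the quoted hunk):

    static int pmem_list_add(struct list_head *list,
                             unsigned long smfn, unsigned long emfn,
                             struct pmem **entry)
    {
        struct list_head *cur;
        struct pmem *new_pmem;

        list_for_each_prev(cur, list)
        {
            struct pmem *cur_pmem = list_entry(cur, struct pmem, link);

            /* Reject any overlap with an existing region. */
            if ( check_overlap(smfn, emfn, cur_pmem->smfn, cur_pmem->emfn) )
                return -EEXIST;

            /* Found the insertion point that keeps the list sorted. */
            if ( cur_pmem->smfn < smfn )
                break;
        }

        new_pmem = xzalloc(struct pmem);
        if ( !new_pmem )
            return -ENOMEM;

        new_pmem->smfn = smfn;
        new_pmem->emfn = emfn;
        list_add(&new_pmem->link, cur);

        if ( entry )
            *entry = new_pmem;

        return 0;
    }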

Chao


* Re: [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
  2017-11-03  5:58   ` Chao Peng
@ 2017-11-03  6:39     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  6:39 UTC (permalink / raw)
  To: Chao Peng
  Cc: Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, xen-devel,
	Jan Beulich, Dan Williams

On 11/03/17 13:58 +0800, Chao Peng wrote:
> 
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +static void __init init_frametable_pmem_chunk(unsigned long s,
> > unsigned long e)
> > +{
> > +    static unsigned long pmem_init_frametable_mfn;
> > +
> > +    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
> > +
> > +    if ( !pmem_init_frametable_mfn )
> > +    {
> > +        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
> > +        if ( !pmem_init_frametable_mfn )
> > +            panic("Not enough memory for pmem initial frame table
> > page");
> > +        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
> > +    }
> 
> Can zero_page be used instead?

No. I intend to mark the frametable entries for NVDIMM as invalid at
boot time, in order to avoid/detect accidental accesses to NVDIMM
pages before they are registered to the Xen hypervisor later (by part 2,
patches 14 - 25).

> 
> > +
> > +    while ( s < e )
> > +    {
> > +        /*
> > +         * The real frame table entries of a pmem region will be
> > +         * created when the pmem region is registered to hypervisor.
> > +         * Any write attempt to the initial entries of that pmem
> > +         * region implies potential hypervisor bugs. In order to make
> > +         * those bugs explicit, map those initial entries as read-
> > only.
> > +         */
> > +        map_pages_to_xen(s, pmem_init_frametable_mfn, 1,
> > PAGE_HYPERVISOR_RO);
> > +        s += PAGE_SIZE;
> 
> Don't know how much the impact of 4K mapping on boot time when pmem is
> very large. Perhaps we need get such data on hardware.
>

Well, it will be very slow because NVDIMM sizes are usually very large
(e.g. from hundreds of gigabytes to several terabytes). I can make it
use huge pages where possible.
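
A rough sketch of what the huge-page variant could look like (untested;
it assumes the chunk boundaries handed in are 2MB-aligned, that a
contiguous 2MB boot allocation for the shared poison block is
acceptable, and that map_pages_to_xen() coalesces suitably aligned
ranges into 2MB mappings):

    #define PMEM_INIT_FT_PAGES  (1UL << PAGETABLE_ORDER)  /* 512 * 4K = 2MB */

    static void __init init_frametable_pmem_chunk(unsigned long s,
                                                  unsigned long e)
    {
        static unsigned long pmem_init_frametable_mfn;

        /* Assumed here; fall back to 4K mappings for unaligned head/tail. */
        ASSERT(!((s | e) & ((PMEM_INIT_FT_PAGES << PAGE_SHIFT) - 1)));

        if ( !pmem_init_frametable_mfn )
        {
            /* One 2MB-aligned, physically contiguous poison block. */
            pmem_init_frametable_mfn =
                alloc_boot_pages(PMEM_INIT_FT_PAGES, PMEM_INIT_FT_PAGES);
            if ( !pmem_init_frametable_mfn )
                panic("Not enough memory for pmem initial frame table");
            memset(mfn_to_virt(pmem_init_frametable_mfn), -1,
                   PMEM_INIT_FT_PAGES << PAGE_SHIFT);
        }

        while ( s < e )
        {
            /* virt/mfn/nr are all 2MB-aligned, so each call can be
             * installed as a single read-only 2MB superpage. */
            map_pages_to_xen(s, pmem_init_frametable_mfn,
                             PMEM_INIT_FT_PAGES, PAGE_HYPERVISOR_RO);
            s += PMEM_INIT_FT_PAGES << PAGE_SHIFT;
        }
    }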

> Another question is do we really need to map it, e.g. can we just skip
> the range here?

Sadly, I cannot recall why I did this. Maybe I can just leave the
frametable of NVDIMM unmapped, so that accidental accesses to it would
simply trigger a page fault in the hypervisor, which makes bugs explicit
as well.


Haozhong


* Re: [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
@ 2017-11-03  6:51   ` Chao Peng
  2017-11-03  7:24     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  6:51 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Konrad Rzeszutek Wilk, Andrew Cooper, Jan Beulich, Shane Wang,
	Dan Williams, Gang Wei

On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> ... to avoid the interference with the PMEM driver and management
> utilities in Dom0.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Gang Wei <gang.wei@intel.com>
> Cc: Shane Wang <shane.wang@intel.com>
> ---
>  xen/arch/x86/acpi/power.c |  7 +++++++
>  xen/arch/x86/dom0_build.c |  5 +++++
>  xen/arch/x86/shutdown.c   |  3 +++
>  xen/arch/x86/tboot.c      |  4 ++++
>  xen/common/kexec.c        |  3 +++
>  xen/common/pmem.c         | 21 +++++++++++++++++++++
>  xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
>  xen/include/xen/acpi.h    |  2 ++
>  xen/include/xen/pmem.h    | 13 +++++++++++++
>  9 files changed, 79 insertions(+)
> 
> diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> index 1e4e5680a7..d135715a49 100644
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -178,6 +178,10 @@ static int enter_state(u32 state)
>  
>      freeze_domains();
>  
> +#ifdef CONFIG_NVDIMM_PMEM
> +    acpi_nfit_reinstate();
> +#endif

I don't understand why a reinstate is needed for the NFIT table. Will it
be searched by firmware on shutdown / when entering a power state?

Chao


* Re: [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
  2017-11-03  6:15   ` Chao Peng
@ 2017-11-03  7:14     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  7:14 UTC (permalink / raw)
  To: Chao Peng
  Cc: Andrew Cooper, Dan Williams, Konrad Rzeszutek Wilk, Jan Beulich,
	xen-devel

On 11/03/17 14:15 +0800, Chao Peng wrote:
> 
> > +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc
> > *desc)
> > +{
> > +    struct nfit_spa_desc *spa_desc;
> > +    struct nfit_memdev_desc *memdev_desc;
> > +    struct acpi_nfit_system_address *spa;
> > +    unsigned long smfn, emfn;
> > +
> > +    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
> > +    {
> > +        spa_desc = memdev_desc->spa_desc;
> > +
> > +        if ( !spa_desc ||
> > +             (memdev_desc->acpi_table->flags &
> > +              (ACPI_NFIT_MEM_SAVE_FAILED |
> > ACPI_NFIT_MEM_RESTORE_FAILED |
> > +               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
> > +               ACPI_NFIT_MEM_MAP_FAILED)) )
> > +            continue;
> 
> If failure is detected, is it reasonable to continue? We can print some
> messages at least I think.

I got something wrong here. I should iterate over the SPA structures
and check all memdevs in each SPA range. If any memdev carries failure
flags, then skip the whole SPA range and print an error message.
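
Roughly like this, perhaps (untested sketch; desc->spa_list and the
'link' member of struct nfit_spa_desc are assumed here, mirroring the
existing memdev_list; the rest reuses the structures from the quoted
code):

    static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
    {
        struct nfit_spa_desc *spa_desc;
        struct nfit_memdev_desc *memdev_desc;

        list_for_each_entry(spa_desc, &desc->spa_list, link)
        {
            struct acpi_nfit_system_address *spa = spa_desc->acpi_table;
            unsigned long smfn, emfn;
            bool failed = false;

            if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
                continue;

            smfn = paddr_to_pfn(spa->address);
            emfn = paddr_to_pfn(spa->address + spa->length);

            /* A single failed memdev disqualifies the whole SPA range. */
            list_for_each_entry(memdev_desc, &desc->memdev_list, link)
                if ( memdev_desc->spa_desc == spa_desc &&
                     (memdev_desc->acpi_table->flags &
                      (ACPI_NFIT_MEM_SAVE_FAILED |
                       ACPI_NFIT_MEM_RESTORE_FAILED |
                       ACPI_NFIT_MEM_FLUSH_FAILED |
                       ACPI_NFIT_MEM_NOT_ARMED |
                       ACPI_NFIT_MEM_MAP_FAILED)) )
                    failed = true;

            if ( failed )
            {
                printk(XENLOG_ERR "NFIT: skip PMEM MFNs 0x%lx - 0x%lx: "
                       "memdev failure flags set\n", smfn, emfn);
                continue;
            }

            printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
        }
    }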

Haozhong

> 
> Chao
> > +
> > +        spa = spa_desc->acpi_table;
> > +        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
> > +            continue;
> > +        smfn = paddr_to_pfn(spa->address);
> > +        emfn = paddr_to_pfn(spa->address + spa->length);
> > +        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn,
> > emfn);
> > +    }
> > +}


* Re: [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-11-03  6:51   ` Chao Peng
@ 2017-11-03  7:24     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  7:24 UTC (permalink / raw)
  To: Chao Peng
  Cc: Konrad Rzeszutek Wilk, Andrew Cooper, xen-devel, Jan Beulich,
	Shane Wang, Dan Williams, Gang Wei

On 11/03/17 14:51 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > ... to avoid the inference with the PMEM driver and management
> > utilities in Dom0.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Gang Wei <gang.wei@intel.com>
> > Cc: Shane Wang <shane.wang@intel.com>
> > ---
> >  xen/arch/x86/acpi/power.c |  7 +++++++
> >  xen/arch/x86/dom0_build.c |  5 +++++
> >  xen/arch/x86/shutdown.c   |  3 +++
> >  xen/arch/x86/tboot.c      |  4 ++++
> >  xen/common/kexec.c        |  3 +++
> >  xen/common/pmem.c         | 21 +++++++++++++++++++++
> >  xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
> >  xen/include/xen/acpi.h    |  2 ++
> >  xen/include/xen/pmem.h    | 13 +++++++++++++
> >  9 files changed, 79 insertions(+)
> > 
> > diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> > index 1e4e5680a7..d135715a49 100644
> > --- a/xen/arch/x86/acpi/power.c
> > +++ b/xen/arch/x86/acpi/power.c
> > @@ -178,6 +178,10 @@ static int enter_state(u32 state)
> >  
> >      freeze_domains();
> >  
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +    acpi_nfit_reinstate();
> > +#endif
> 
> I don't understand why reinstate is needed for NFIT table? Will it  be
> searched by firmware on shutdown / entering power state?

I added these acpi_nfit_reinstate() calls akin to acpi_dmar_reinstate().
There are no public documents stating that the NFIT is not rebuilt during
power state changes.

Haozhong


* Re: [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
@ 2017-11-03  7:40   ` Chao Peng
  2017-11-03  8:54     ` Haozhong Zhang
  0 siblings, 1 reply; 128+ messages in thread
From: Chao Peng @ 2017-11-03  7:40 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Andrew Cooper, Dan Williams, Daniel De Graaf, Jan Beulich,
	Konrad Rzeszutek Wilk


> +/*
> + * Interface for NVDIMM management.
> + */
> +
> +struct xen_sysctl_nvdimm_op {
> +    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented
> yet. */
> +    uint32_t pad; /* IN: Always zero. */

If alignment is the only concern, then 'err' can be moved here.

If it's reserved for the future and does not get used now, then it's
better to check its value explicitly.

Chao


* Re: [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-11-03  7:40   ` Chao Peng
@ 2017-11-03  8:54     ` Haozhong Zhang
  0 siblings, 0 replies; 128+ messages in thread
From: Haozhong Zhang @ 2017-11-03  8:54 UTC (permalink / raw)
  To: Chao Peng
  Cc: Konrad Rzeszutek Wilk, Andrew Cooper, xen-devel, Jan Beulich,
	Dan Williams, Daniel De Graaf

On 11/03/17 15:40 +0800, Chao Peng wrote:
> 
> > +/*
> > + * Interface for NVDIMM management.
> > + */
> > +
> > +struct xen_sysctl_nvdimm_op {
> > +    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented
> > yet. */
> > +    uint32_t pad; /* IN: Always zero. */
> 
> If alignment is the only concern, then err can be moved to here.
> 
> If it's designed for future and does not get used now, then it's better
> to check its value explicitly.
> 

I'll move 'err' to the position of 'pad'.
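
That is, roughly (the command-specific tail below is just an
illustrative placeholder):

    struct xen_sysctl_nvdimm_op {
        uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
        uint32_t err; /* OUT: error code on failure; takes the old pad slot
                       * and keeps any following 64-bit members aligned. */
        /* ... command-specific parameters (e.g. a union) would follow ... */
    };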


end of thread (newest message: 2017-11-03  8:54 UTC)

Thread overview: 128+ messages
2017-09-11  4:37 [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
2017-10-27  6:49   ` Chao Peng
2017-10-27  7:02     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() Haozhong Zhang
2017-10-27  6:58   ` Chao Peng
2017-10-27  9:24     ` Andrew Cooper
2017-10-30  2:21       ` Chao Peng
2017-09-11  4:37 ` [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
2017-10-27  8:10   ` Chao Peng
2017-09-11  4:37 ` [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
2017-11-03  5:58   ` Chao Peng
2017-11-03  6:39     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
2017-11-03  6:15   ` Chao Peng
2017-11-03  7:14     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
2017-11-03  6:26   ` Chao Peng
2017-09-11  4:37 ` [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
2017-11-03  6:51   ` Chao Peng
2017-11-03  7:24     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
2017-11-03  7:40   ` Chao Peng
2017-11-03  8:54     ` Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl' Haozhong Zhang
2017-09-11  5:10   ` Dan Williams
2017-09-11  5:39     ` Haozhong Zhang
2017-09-11 16:35       ` Dan Williams
2017-09-11 21:24         ` Konrad Rzeszutek Wilk
2017-09-13 17:45           ` Dan Williams
2017-09-11  4:37 ` [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list' Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add() Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt' Haozhong Zhang
2017-09-11  4:37 ` [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list' Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data' Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list' Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
2017-09-11  4:38 ` [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
2017-09-11  4:41 ` [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest Haozhong Zhang
2017-09-11  4:41   ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 01/10] nvdimm: do not intiailize nvdimm->label_data if label size is zero Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM " Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:41   ` [Qemu-devel] [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled Haozhong Zhang
2017-09-11  4:41     ` Haozhong Zhang
2017-09-11  4:53   ` [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest no-reply
2017-09-11  4:53     ` no-reply
2017-09-11 14:08   ` Igor Mammedov
2017-09-11 14:08     ` Igor Mammedov
2017-09-11 18:52     ` [Qemu-devel] " Stefano Stabellini
2017-09-11 18:52       ` Stefano Stabellini
2017-09-12  3:15       ` [Qemu-devel] " Haozhong Zhang
2017-09-12  3:15         ` Haozhong Zhang
2017-10-10 16:05         ` [Qemu-devel] " Konrad Rzeszutek Wilk
2017-10-10 16:05           ` Konrad Rzeszutek Wilk
2017-10-12 12:45           ` [Qemu-devel] " Haozhong Zhang
2017-10-12 12:45             ` Haozhong Zhang
2017-10-12 15:45             ` Paolo Bonzini
2017-10-12 15:45               ` Paolo Bonzini
2017-10-13  7:53               ` Haozhong Zhang
2017-10-13  7:53                 ` Haozhong Zhang
2017-10-13  8:44                 ` Igor Mammedov
2017-10-13  8:44                   ` Igor Mammedov
2017-10-13 11:13                   ` Haozhong Zhang
2017-10-13 11:13                     ` Haozhong Zhang
2017-10-13 12:13                     ` Jan Beulich
2017-10-13 12:13                       ` Jan Beulich
2017-10-13 22:46                       ` Stefano Stabellini
2017-10-13 22:46                         ` Stefano Stabellini
2017-10-15  0:31                         ` Michael S. Tsirkin
2017-10-15  0:31                           ` Michael S. Tsirkin
2017-10-16 14:49                           ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
2017-10-16 14:49                             ` [Qemu-devel] " Konrad Rzeszutek Wilk
2017-10-17 11:45                         ` Paolo Bonzini
2017-10-17 11:45                           ` Paolo Bonzini
2017-10-17 12:16                           ` Haozhong Zhang
2017-10-17 12:16                             ` Haozhong Zhang
2017-10-18  8:32                             ` [Qemu-devel] [Xen-devel] " Roger Pau Monné
2017-10-18  8:32                               ` [Qemu-devel] " Roger Pau Monné
2017-10-18  8:46                               ` [Qemu-devel] [Xen-devel] " Paolo Bonzini
2017-10-18  8:46                                 ` [Qemu-devel] " Paolo Bonzini
2017-10-18  8:55                                 ` [Qemu-devel] [Xen-devel] " Roger Pau Monné
2017-10-18  8:55                                   ` [Qemu-devel] " Roger Pau Monné
2017-10-15  0:35                 ` Michael S. Tsirkin
2017-10-15  0:35                   ` Michael S. Tsirkin
2017-10-12 17:39             ` Konrad Rzeszutek Wilk
2017-10-12 17:39               ` Konrad Rzeszutek Wilk
2017-10-13  8:00               ` Haozhong Zhang
2017-10-13  8:00                 ` Haozhong Zhang
2017-10-27  3:26 ` [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains Chao Peng
2017-10-27  4:25   ` Haozhong Zhang
