* [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
@ 2017-12-07 10:09 Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
                   ` (42 more replies)
  0 siblings, 43 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Gang Wei, Jan Beulich,
	Shane Wang, Chao Peng, Dan Williams, Daniel De Graaf

All patches can also be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4

RFC v3 can be found at
  https://lists.xen.org/archives/html/xen-devel/2017-09/msg00964.html

Changes in v4:
  * Move the functionality of the management utility 'xen-ndctl' to the
    Xen management tool 'xl'.
  * Load QEMU ACPI via the QEMU fw_cfg and BIOSLinkerLoader interfaces.
  * Other changes are documented in patches separately.


- Part 0. Bug fix and code cleanup
  [01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  [02/41] x86_64/mm: avoid cleaning the unmapped frame table
  [03/41] hvmloader/util: do not compare characters after '\0' in strncmp

- Part 1. Detect host PMEM
  Detect host PMEM via NFIT. No frame table or M2P table is created
  for the detected regions in this part.

  [04/41] xen/common: add Kconfig item for pmem support
  [05/41] x86/mm: exclude PMEM regions from initial frametable
  [06/41] acpi: probe valid PMEM regions via NFIT
  [07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
  [08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  [09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  [10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  [11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
  [12/41] tools/xl: add xl command 'pmem-list'

- Part 2. Setup host PMEM for management and guest data usage
  Allow users or admins in Dom0 to set up host PMEM pages for
  management and guest data usage.
   * Management PMEM pages are used to store the frametable and M2P of
     PMEM pages (including themselves), and never mapped to guest.
   * Guest data PMEM pages can be mapped to guest and used as the
     backend storage of virtual NVDIMM devices.

  [13/41] x86_64/mm: refactor memory_add()
  [14/41] x86_64/mm: allow customized location of extended frametable and M2P table
  [15/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
  [16/41] tools/xl: accept all bases in parse_ulong()
  [17/41] tools/xl: expose parse_ulong()
  [18/41] tools/xl: add xl command 'pmem-setup'
  [19/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  [20/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
  [21/41] tools/xl: add option '--mgmt | -m' to xl command pmem-list
  [22/41] xen/pmem: support setup PMEM region for guest data usage
  [23/41] tools/xl: add option '--data | -d' to xl command pmem-setup
  [24/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  [25/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
  [26/41] tools/xl: add option '--data | -d' to xl command pmem-list

- Part 3. Hypervisor support to map host PMEM pages to HVM domain
  [27/41] xen/pmem: add function to map PMEM pages to HVM domain
  [28/41] xen/pmem: release PMEM pages on HVM domain destruction
  [29/41] xen: add hypercall XENMEM_populate_pmem_map

- Part 4. Load QEMU ACPI
  Guest NFIT and NVDIMM namespace devices are built by QEMU. This part
  loads QEMU ACPI via the QEMU fw_cfg and BIOSLinkerLoader interfaces. A
  simple blacklist mechanism is added to reject DM ACPI tables that
  may conflict with those built by Xen itself.

  [30/41] tools: reserve extra guest memory for ACPI from device model
  [31/41] tools/libacpi: add callback to translate GPA to GVA
  [32/41] tools/libacpi: build a DM ACPI signature blacklist
  [33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  [34/41] tools/libacpi: probe QEMU ACPI ROMs via fw_cfg interface
  [35/41] tools/libacpi: add a QEMU BIOSLinkLoader executor
  [36/41] tools/libacpi: add function to get the data of QEMU RSDP
  [37/41] tools/libacpi: load QEMU ACPI

- Part 5. Remaining tool stack changes
  Add xl domain configuration and generate new QEMU options for vNVDIMM.

  [38/41] tools/xl: add xl domain configuration for virtual NVDIMM devices
  [39/41] tools/libxl: allow aborting domain creation on fatal QMP init errors
  [40/41] tools/libxl: initiate PMEM mapping via QMP callback
  [41/41] tools/libxl: build qemu options from xl vNVDIMM configs


 docs/man/xl.cfg.pod.5.in            |  40 ++
 tools/firmware/hvmloader/Makefile   |   4 +-
 tools/firmware/hvmloader/util.c     |  16 +
 tools/firmware/hvmloader/util.h     |  14 +
 tools/flask/policy/modules/dom0.te  |   2 +-
 tools/flask/policy/modules/xen.if   |   3 +-
 tools/libacpi/acpi2_0.h             |   1 +
 tools/libacpi/build.c               | 176 ++++++++-
 tools/libacpi/libacpi.h             |  10 +
 tools/libacpi/qemu.h                |  56 +++
 tools/libacpi/qemu_fw_cfg.c         |  99 +++++
 tools/libacpi/qemu_loader.c         | 392 +++++++++++++++++++
 tools/libacpi/qemu_stub.c           |  64 +++
 tools/libxc/include/xenctrl.h       |  88 +++++
 tools/libxc/xc_domain.c             |  15 +
 tools/libxc/xc_misc.c               | 154 ++++++++
 tools/libxl/Makefile                |   5 +-
 tools/libxl/libxl.h                 |  55 +++
 tools/libxl/libxl_create.c          |   4 +-
 tools/libxl/libxl_dm.c              |  81 +++-
 tools/libxl/libxl_internal.h        |   6 +
 tools/libxl/libxl_nvdimm.c          | 227 +++++++++++
 tools/libxl/libxl_qmp.c             | 138 ++++++-
 tools/libxl/libxl_types.idl         |  49 +++
 tools/libxl/libxl_x86.c             |   7 +-
 tools/libxl/libxl_x86_acpi.c        |  10 +
 tools/xl/Makefile                   |   2 +-
 tools/xl/xl.h                       |   2 +
 tools/xl/xl_cmdtable.c              |  19 +
 tools/xl/xl_nvdimm.c                | 205 ++++++++++
 tools/xl/xl_parse.c                 | 130 +++++-
 tools/xl/xl_parse.h                 |   1 +
 tools/xl/xl_vmcontrol.c             |  15 +-
 xen/arch/x86/acpi/boot.c            |   4 +
 xen/arch/x86/acpi/power.c           |   7 +
 xen/arch/x86/dom0_build.c           |   5 +
 xen/arch/x86/domain.c               |  32 +-
 xen/arch/x86/mm.c                   | 124 +++++-
 xen/arch/x86/setup.c                |   4 +
 xen/arch/x86/shutdown.c             |   3 +
 xen/arch/x86/tboot.c                |   4 +
 xen/arch/x86/x86_64/mm.c            | 302 ++++++++++----
 xen/common/Kconfig                  |   8 +
 xen/common/Makefile                 |   1 +
 xen/common/compat/memory.c          |   1 +
 xen/common/domain.c                 |   3 +
 xen/common/kexec.c                  |   3 +
 xen/common/memory.c                 |  44 +++
 xen/common/pmem.c                   | 761 ++++++++++++++++++++++++++++++++++++
 xen/common/sysctl.c                 |   9 +
 xen/drivers/acpi/Makefile           |   2 +
 xen/drivers/acpi/nfit.c             | 321 +++++++++++++++
 xen/include/acpi/actbl1.h           |  69 ++++
 xen/include/asm-x86/domain.h        |   1 +
 xen/include/asm-x86/mm.h            |  10 +-
 xen/include/public/memory.h         |  14 +-
 xen/include/public/sysctl.h         |  97 ++++-
 xen/include/xen/acpi.h              |  10 +
 xen/include/xen/pmem.h              |  76 ++++
 xen/include/xen/sched.h             |   3 +
 xen/include/xsm/dummy.h             |  11 +
 xen/include/xsm/xsm.h               |  12 +
 xen/xsm/dummy.c                     |   4 +
 xen/xsm/flask/hooks.c               |  17 +
 xen/xsm/flask/policy/access_vectors |   4 +
 65 files changed, 3939 insertions(+), 117 deletions(-)
 create mode 100644 tools/libacpi/qemu.h
 create mode 100644 tools/libacpi/qemu_fw_cfg.c
 create mode 100644 tools/libacpi/qemu_loader.c
 create mode 100644 tools/libacpi/qemu_stub.c
 create mode 100644 tools/libxl/libxl_nvdimm.c
 create mode 100644 tools/xl/xl_nvdimm.c
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/drivers/acpi/nfit.c
 create mode 100644 xen/include/xen/pmem.h

-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2018-01-04  6:12   ` Chao Peng
  2018-05-07 15:59   ` Jan Beulich
  2017-12-07 10:09 ` [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
                   ` (41 subsequent siblings)
  42 siblings, 2 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

The current check refuses hot-plugged memory that falls entirely within
a single, currently unused PDX group, though such memory should be allowed.
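
A worked example with illustrative numbers (assuming PDX_GROUP_COUNT is
512; any hot-added range lying entirely inside one unused group behaves
the same way):

    /* hot-added range covers PDX 100 .. 199, all inside unused group 0 */

    /* before this patch */
    sidx = ((100 + 511) & ~511) / 512;   /* = 1 */
    eidx = (199 & ~511) / 512;           /* = 0 */
    /* sidx >= eidx, so mem_hotadd_check() returns 0 and the range is
     * refused although group 0 is free */

    /* with this patch */
    sidx = (100 & ~511) / 512;           /* = 0, equal to eidx */
    /* the early rejection is gone; the range is checked against
     * pdx_group_valid as intended */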

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 9b37da6698..839038b6c3 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1295,12 +1295,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
         return 0;
 
     /* Make sure the new range is not present now */
-    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  & ~(PDX_GROUP_COUNT - 1))
-            / PDX_GROUP_COUNT;
+    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
     eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
-    if (sidx >= eidx)
-        return 0;
-
     s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
     if ( s > eidx )
         return 0;
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2018-01-04  6:20   ` Chao Peng
  2017-12-07 10:09 ` [RFC XEN PATCH v4 03/41] hvmloader/util: do not compare characters after '\0' in strncmp Haozhong Zhang
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

cleanup_frame_table() initializes the entire newly added frame table
to all -1's. If it's called after extend_frame_table() failed to map
the entire frame table, the initialization will hit a page fault.

Move the cleanup of a partially mapped frame table into
extend_frame_table(), which has enough knowledge of the mapping status.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

@Chao: I did not modify this patch per your comment, because I feel it's
better to handle errors locally in each function (rather than handling
all of them at the top level), which makes each function easier to use.
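
A minimal sketch of that pattern (setup_one()/teardown_one() are
hypothetical stand-ins, not Xen APIs): each function unwinds exactly the
state it set up itself, so callers need no knowledge of its partial
progress.

    /* Illustrative stand-ins only. */
    static int setup_one(unsigned int i) { return i < 4 ? 0 : -1; }
    static void teardown_one(unsigned int i) { (void)i; }

    static int setup_many(unsigned int n)
    {
        unsigned int i;
        int err = 0;

        for ( i = 0; i < n; i++ )
        {
            err = setup_one(i);
            if ( err )
                break;
        }

        if ( err )
            while ( i-- )          /* undo only what we set up here */
                teardown_one(i);

        return err;
    }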
---
 xen/arch/x86/x86_64/mm.c | 51 ++++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 839038b6c3..35a1535c1e 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -710,15 +710,12 @@ void free_compat_arg_xlat(struct vcpu *v)
                               PFN_UP(COMPAT_ARG_XLAT_SIZE));
 }
 
-static void cleanup_frame_table(struct mem_hotadd_info *info)
+static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 {
+    struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
     unsigned long sva, eva;
     l3_pgentry_t l3e;
     l2_pgentry_t l2e;
-    unsigned long spfn, epfn;
-
-    spfn = info->spfn;
-    epfn = info->epfn;
 
     sva = (unsigned long)mfn_to_page(spfn);
     eva = (unsigned long)mfn_to_page(epfn);
@@ -744,7 +741,7 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
         if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) ==
               (_PAGE_PSE | _PAGE_PRESENT) )
         {
-            if (hotadd_mem_valid(l2e_get_pfn(l2e), info))
+            if ( hotadd_mem_valid(l2e_get_pfn(l2e), &info) )
                 destroy_xen_mappings(sva & ~((1UL << L2_PAGETABLE_SHIFT) - 1),
                          ((sva & ~((1UL << L2_PAGETABLE_SHIFT) -1 )) +
                             (1UL << L2_PAGETABLE_SHIFT) - 1));
@@ -769,28 +766,33 @@ static int setup_frametable_chunk(void *start, void *end,
 {
     unsigned long s = (unsigned long)start;
     unsigned long e = (unsigned long)end;
-    unsigned long mfn;
-    int err;
+    unsigned long cur, mfn;
+    int err = 0;
 
     ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
     ASSERT(!(e & ((1 << L2_PAGETABLE_SHIFT) - 1)));
 
-    for ( ; s < e; s += (1UL << L2_PAGETABLE_SHIFT))
+    for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
     {
         mfn = alloc_hotadd_mfn(info);
-        err = map_pages_to_xen(s, mfn, 1UL << PAGETABLE_ORDER,
+        err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
-            return err;
+            break;
     }
-    memset(start, -1, s - (unsigned long)start);
 
-    return 0;
+    if ( !err )
+        memset(start, -1, cur - s);
+    else
+        destroy_xen_mappings(s, cur);
+
+    return err;
 }
 
 static int extend_frame_table(struct mem_hotadd_info *info)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
+    int err = 0;
 
     spfn = info->spfn;
     epfn = info->epfn;
@@ -809,8 +811,6 @@ static int extend_frame_table(struct mem_hotadd_info *info)
 
     while ( cidx < eidx )
     {
-        int err;
-
         nidx = find_next_bit(pdx_group_valid, eidx, cidx);
         if ( nidx >= eidx )
             nidx = eidx;
@@ -818,14 +818,19 @@ static int extend_frame_table(struct mem_hotadd_info *info)
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
                                      info);
         if ( err )
-            return err;
+            break;
 
         cidx = find_next_zero_bit(pdx_group_valid, eidx, nidx);
     }
 
-    memset(mfn_to_page(spfn), 0,
-           (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
-    return 0;
+    if ( !err )
+        memset(mfn_to_page(spfn), 0,
+               (unsigned long)mfn_to_page(epfn) -
+               (unsigned long)mfn_to_page(spfn));
+    else
+        cleanup_frame_table(spfn, pdx_to_pfn(cidx * PDX_GROUP_COUNT));
+
+    return err;
 }
 
 void __init subarch_init_memory(void)
@@ -1404,8 +1409,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     info.cur = spfn;
 
     ret = extend_frame_table(&info);
-    if (ret)
-        goto destroy_frametable;
+    if ( ret )
+        goto restore_node_status;
 
     /* Set max_page as setup_m2p_table will use it*/
     if (max_page < epfn)
@@ -1448,8 +1453,8 @@ destroy_m2p:
     max_page = old_max;
     total_pages = old_total;
     max_pdx = pfn_to_pdx(max_page - 1) + 1;
-destroy_frametable:
-    cleanup_frame_table(&info);
+    cleanup_frame_table(spfn, epfn);
+restore_node_status:
     if ( !orig_online )
         node_set_offline(node);
     NODE_DATA(node)->node_start_pfn = old_node_start;
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 03/41] hvmloader/util: do not compare characters after '\0' in strncmp
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2018-01-04  6:23   ` Chao Peng
  2017-12-07 10:09 ` [RFC XEN PATCH v4 04/41] xen/common: add Kconfig item for pmem support Haozhong Zhang
                   ` (39 subsequent siblings)
  42 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

... to make its behavior match the C standard (e.g., C99 and C11),
which requires that characters after the first '\0' are not compared.
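
For illustration, a standalone test (not part of the patch) where the
old hvmloader implementation and a standard-conforming strncmp()
disagree:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /*
         * Both strings match up to and including the '\0'.  C99/C11
         * require strncmp() to stop there and return 0; the old
         * hvmloader version kept comparing and returned 'x' - 'y'.
         */
        printf("%d\n", strncmp("ab\0x", "ab\0y", 4));   /* prints 0 */
        return 0;
    }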

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d24cd..76a61ee052 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -141,9 +141,16 @@ int strcmp(const char *cs, const char *ct)
 int strncmp(const char *s1, const char *s2, uint32_t n)
 {
     uint32_t ctr;
+
     for (ctr = 0; ctr < n; ctr++)
+    {
         if (s1[ctr] != s2[ctr])
             return (int)(s1[ctr] - s2[ctr]);
+
+        if (!s1[ctr])
+            break;
+    }
+
     return 0;
 }
 
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 04/41] xen/common: add Kconfig item for pmem support
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (2 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 03/41] hvmloader/util: do not compare characters after '\0' in strncmp Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 05/41] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Add CONFIG_NVDIMM_PMEM to enable NVDIMM persistent memory support. It
defaults to N.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 103ef44cb5..1a4d7d93bb 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -282,4 +282,12 @@ config CMDLINE_OVERRIDE
 
 	  This is used to work around broken bootloaders. This should
 	  be set to 'N' under normal conditions.
+
+config NVDIMM_PMEM
+	bool "Persistent memory support"
+	default n
+	---help---
+	  Enable support for NVDIMM in the persistent memory mode.
+
+	  If unsure, say N.
 endmenu
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 05/41] x86/mm: exclude PMEM regions from initial frametable
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (3 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 04/41] xen/common: add Kconfig item for pmem support Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 06/41] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

No specification forbids PMEM regions from appearing in the gaps
between RAM regions. If that does happen, init_frametable() would need
to allocate RAM for the frame table entries covering those PMEM
regions. However, PMEM regions can be very large (several terabytes or
more), so init_frametable() may fail.
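
For a sense of scale (illustrative arithmetic, assuming a 32-byte
struct page_info): 4 TiB of PMEM is 2^30 4-KiB pages, whose frame table
alone would need 2^30 * 32 bytes = 32 GiB of RAM.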

Because Xen does not use PMEM at boot time, we can defer the actual
resource allocation for the frame table of PMEM regions. At boot time,
all frame table pages of PMEM regions appearing between RAM regions are
mapped to a single RAM page filled with 0xff.

Any attempt to write to those frame table pages before their actual
resource is allocated implies a bug in Xen. Therefore, a read-only
mapping is used here to make such bugs explicit.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c         | 118 +++++++++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/setup.c      |   4 ++
 xen/drivers/acpi/Makefile |   2 +
 xen/drivers/acpi/nfit.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/acpi/actbl1.h |  43 +++++++++++++++++
 xen/include/xen/acpi.h    |   7 +++
 6 files changed, 279 insertions(+), 11 deletions(-)
 create mode 100644 xen/drivers/acpi/nfit.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 155e42569b..b2046ca2f0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -83,6 +83,9 @@
  * an application-supplied buffer).
  */
 
+#ifdef CONFIG_NVDIMM_PMEM
+#include <xen/acpi.h>
+#endif
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -188,34 +191,127 @@ static int __init parse_mmio_relax(const char *s)
 }
 custom_param("mmio-relax", parse_mmio_relax);
 
-static void __init init_frametable_chunk(void *start, void *end)
+static void __init init_frametable_ram_chunk(unsigned long s, unsigned long e)
 {
-    unsigned long s = (unsigned long)start;
-    unsigned long e = (unsigned long)end;
-    unsigned long step;
+    unsigned long cur, step;
     mfn_t mfn;
 
-    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
-    for ( ; s < e; s += step << PAGE_SHIFT )
+    for ( cur = s; cur < e; cur += step << PAGE_SHIFT )
     {
         step = 1UL << (cpu_has_page1gb &&
-                       !(s & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
+                       !(cur & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
                        L3_PAGETABLE_SHIFT - PAGE_SHIFT :
                        L2_PAGETABLE_SHIFT - PAGE_SHIFT);
         /*
          * The hardcoded 4 below is arbitrary - just pick whatever you think
          * is reasonable to waste as a trade-off for using a large page.
          */
-        while ( step && s + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
+        while ( step && cur + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
             step >>= PAGETABLE_ORDER;
         mfn = alloc_boot_pages(step, step);
-        map_pages_to_xen(s, mfn_x(mfn), step, PAGE_HYPERVISOR);
+        map_pages_to_xen(cur, mfn_x(mfn), step, PAGE_HYPERVISOR);
     }
 
-    memset(start, 0, end - start);
-    memset(end, -1, s - e);
+    memset((void *)s, 0, e - s);
+    memset((void *)e, -1, cur - e);
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e)
+{
+    static mfn_t pmem_init_frametable_mfn = INVALID_MFN_INITIALIZER;
+
+    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
+
+    if ( mfn_eq(pmem_init_frametable_mfn, INVALID_MFN) )
+    {
+        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
+        if ( mfn_eq(pmem_init_frametable_mfn, INVALID_MFN) )
+            panic("Not enough memory for pmem initial frame table page");
+        memset(mfn_to_virt(mfn_x(pmem_init_frametable_mfn)), -1, PAGE_SIZE);
+    }
+
+    while ( s < e )
+    {
+        /*
+         * The real frame table entries of a pmem region will be
+         * created when the pmem region is registered to hypervisor.
+         * Any write attempt to the initial entries of that pmem
+         * region implies potential hypervisor bugs. In order to make
+         * those bugs explicit, map those initial entries as read-only.
+         */
+        map_pages_to_xen(s, mfn_x(pmem_init_frametable_mfn),
+                         1, PAGE_HYPERVISOR_RO);
+        s += PAGE_SIZE;
+    }
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
+static void __init init_frametable_chunk(void *start, void *end)
+{
+    unsigned long s = (unsigned long)start;
+    unsigned long e = (unsigned long)end;
+#ifdef CONFIG_NVDIMM_PMEM
+    unsigned long pmem_smfn, pmem_emfn;
+    unsigned long pmem_spage = s, pmem_epage = s;
+    unsigned long pmem_page_aligned;
+    bool found = false;
+#endif /* CONFIG_NVDIMM_PMEM */
+
+    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
+
+#ifndef CONFIG_NVDIMM_PMEM
+    init_frametable_ram_chunk(s, e);
+#else
+    while ( s < e )
+    {
+        /* No previously found pmem region overlaps with s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            found = acpi_nfit_boot_search_pmem(
+                mfn_x(page_to_mfn((struct page_info *)s)),
+                mfn_x(page_to_mfn((struct page_info *)e)),
+                &pmem_smfn, &pmem_emfn);
+            if ( found )
+            {
+                pmem_spage = (unsigned long)mfn_to_page(_mfn(pmem_smfn));
+                pmem_epage = (unsigned long)mfn_to_page(_mfn(pmem_emfn));
+            }
+        }
+
+        /* No pmem region found in s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            init_frametable_ram_chunk(s, e);
+            break;
+        }
+
+        if ( s < pmem_spage )
+        {
+            init_frametable_ram_chunk(s, pmem_spage);
+            pmem_page_aligned = (pmem_spage + PAGE_SIZE - 1) & PAGE_MASK;
+            if ( pmem_page_aligned > pmem_epage )
+                memset((void *)pmem_epage, -1, pmem_page_aligned - pmem_epage);
+            s = pmem_page_aligned;
+        }
+        else
+        {
+            pmem_page_aligned = pmem_epage & PAGE_MASK;
+            if ( pmem_page_aligned > s )
+                init_frametable_pmem_chunk(s, pmem_page_aligned);
+            if ( pmem_page_aligned < pmem_epage )
+            {
+                init_frametable_ram_chunk(pmem_page_aligned,
+                                          min(pmem_page_aligned + PAGE_SIZE, e));
+                memset((void *)pmem_page_aligned, -1,
+                       pmem_epage - pmem_page_aligned);
+            }
+            s = (pmem_epage + PAGE_SIZE - 1) & PAGE_MASK;
+        }
+    }
+#endif
+}
+
 void __init init_frametable(void)
 {
     unsigned int sidx, eidx, nidx;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 32bb02e3a5..d7c41e2e5d 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1358,6 +1358,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     BUILD_BUG_ON(MACH2PHYS_VIRT_START != RO_MPT_VIRT_START);
     BUILD_BUG_ON(MACH2PHYS_VIRT_END   != RO_MPT_VIRT_END);
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_boot_init();
+#endif
+
     init_frametable();
 
     if ( !acpi_boot_table_init_done )
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 444b11d583..c8bb869cb8 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o
 
 obj-$(CONFIG_X86) += hwregs.o
 obj-$(CONFIG_X86) += reboot.o
+
+obj-$(CONFIG_NVDIMM_PMEM) += nfit.o
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
new file mode 100644
index 0000000000..e099378ee0
--- /dev/null
+++ b/xen/drivers/acpi/nfit.c
@@ -0,0 +1,116 @@
+/*
+ * xen/drivers/acpi/nfit.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/acpi.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+
+/*
+ * GUID of a byte addressable persistent memory region
+ * (ref. ACPI 6.2, Section 5.2.25.2)
+ */
+static const uint8_t nfit_spa_pmem_guid[] =
+{
+    0x79, 0xd3, 0xf0, 0x66, 0xf3, 0xb4, 0x74, 0x40,
+    0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
+};
+
+struct acpi_nfit_desc {
+    struct acpi_table_nfit *acpi_table;
+};
+
+static struct acpi_nfit_desc nfit_desc;
+
+void __init acpi_nfit_boot_init(void)
+{
+    acpi_status status;
+    acpi_physical_address nfit_addr;
+    acpi_native_uint nfit_len;
+
+    status = acpi_get_table_phys(ACPI_SIG_NFIT, 0, &nfit_addr, &nfit_len);
+    if ( ACPI_FAILURE(status) )
+        return;
+
+    nfit_desc.acpi_table = (struct acpi_table_nfit *)__va(nfit_addr);
+    map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
+                     PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
+                     PAGE_HYPERVISOR);
+}
+
+/**
+ * Search pmem regions overlapped with the specified address range.
+ *
+ * Parameters:
+ *  @smfn, @emfn: the start and end MFN of address range to search
+ *  @ret_smfn, @ret_emfn: return the address range of the first pmem region
+ *                        in above range
+ *
+ * Return:
+ *  Return true if a pmem region is overlapped with @smfn - @emfn. The
+ *  start and end MFN of the lowest pmem region are returned via
+ *  @ret_smfn and @ret_emfn respectively.
+ *
+ *  Return false if no pmem region is overlapped with @smfn - @emfn.
+ */
+bool __init acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                       unsigned long *ret_smfn,
+                                       unsigned long *ret_emfn)
+{
+    struct acpi_table_nfit *nfit_table = nfit_desc.acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    unsigned long saddr = pfn_to_paddr(smfn), eaddr = pfn_to_paddr(emfn);
+    unsigned long ret_saddr = 0, ret_eaddr = 0;
+
+    if ( !nfit_table )
+        return false;
+
+    while ( hdr_offset < nfit_table->header.length )
+    {
+        struct acpi_nfit_header *hdr = (void *)nfit_table + hdr_offset;
+        struct acpi_nfit_system_address *spa;
+        unsigned long pmem_saddr, pmem_eaddr;
+
+        hdr_offset += hdr->length;
+
+        if ( hdr->type != ACPI_NFIT_TYPE_SYSTEM_ADDRESS )
+            continue;
+
+        spa = (struct acpi_nfit_system_address *)hdr;
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+
+        pmem_saddr = spa->address;
+        pmem_eaddr = pmem_saddr + spa->length;
+        if ( pmem_saddr >= eaddr || pmem_eaddr <= saddr )
+            continue;
+
+        /* Keep the lowest overlapping region found so far. */
+        if ( ret_eaddr && ret_saddr < pmem_saddr )
+            continue;
+        ret_saddr = pmem_saddr;
+        ret_eaddr = pmem_eaddr;
+    }
+
+    if ( ret_saddr == ret_eaddr )
+        return false;
+
+    *ret_smfn = paddr_to_pfn(ret_saddr);
+    *ret_emfn = paddr_to_pfn(ret_eaddr);
+
+    return true;
+}
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index e1991362dc..94d8d7775c 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -71,6 +71,7 @@
 #define ACPI_SIG_SBST           "SBST"	/* Smart Battery Specification Table */
 #define ACPI_SIG_SLIT           "SLIT"	/* System Locality Distance Information Table */
 #define ACPI_SIG_SRAT           "SRAT"	/* System Resource Affinity Table */
+#define ACPI_SIG_NFIT           "NFIT"	/* NVDIMM Firmware Interface Table */
 
 /*
  * All tables must be byte-packed to match the ACPI specification, since
@@ -903,6 +904,48 @@ struct acpi_msct_proximity {
 	u64 memory_capacity;	/* In bytes */
 };
 
+/*******************************************************************************
+ *
+ * NFIT - NVDIMM Interface Table (ACPI 6.0+)
+ *		  Version 1
+ *
+ ******************************************************************************/
+
+struct acpi_table_nfit {
+	struct acpi_table_header header;	/* Common ACPI table header */
+	u32 reserved;						/* Reserved, must be zero */
+};
+
+/* Subtable header for NFIT */
+
+struct acpi_nfit_header {
+	u16 type;
+	u16 length;
+};
+
+/* Values for subtable type in struct acpi_nfit_header */
+enum acpi_nfit_type {
+	ACPI_NFIT_TYPE_SYSTEM_ADDRESS = 0,
+	ACPI_NFIT_TYPE_MEMORY_MAP = 1,
+};
+
+/*
+ * NFIT Subtables
+ */
+
+/* 0: System Physical Address Range Structure */
+struct acpi_nfit_system_address {
+	struct acpi_nfit_header header;
+	u16 range_index;
+	u16 flags;
+	u32 reserved;		/* Reserved, must be zero */
+	u32 proximity_domain;
+	u8	range_guid[16];
+	u64 address;
+	u64 length;
+	u64 memory_mapping;
+};
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 9409350f05..1bd8f9f4e4 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -180,4 +180,11 @@ void acpi_reboot(void);
 void acpi_dmar_zap(void);
 void acpi_dmar_reinstate(void);
 
+#ifdef CONFIG_NVDIMM_PMEM
+void acpi_nfit_boot_init(void);
+bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                unsigned long *ret_smfn,
+                                unsigned long *ret_emfn);
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /*_LINUX_ACPI_H*/
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 06/41] acpi: probe valid PMEM regions via NFIT
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (4 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 05/41] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 07/41] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

A PMEM region with failures (e.g., it was not properly flushed in the
last power cycle, or some blocks within it are broken) cannot be safely
used by Xen and guests. Scan the state flags of the NVDIMM region
mapping structures in NFIT to check whether any failures happened to a
PMEM region. Recovery from those failures is left out of Xen (e.g.,
left to the firmware or other management utilities on the bare metal).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>

Changes in v4:
 * Scan memory mapping tables from SPA tables in acpi_nfit_register_pmem(),
   rather than in the reverse order.
---
 xen/arch/x86/acpi/boot.c  |   4 ++
 xen/drivers/acpi/nfit.c   | 176 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/acpi/actbl1.h |  26 +++++++
 xen/include/xen/acpi.h    |   1 +
 4 files changed, 206 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 8e6c96dcf6..f52a2c6dc5 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -732,5 +732,9 @@ int __init acpi_boot_init(void)
 
 	acpi_table_parse(ACPI_SIG_BGRT, acpi_invalidate_bgrt);
 
+#ifdef CONFIG_NVDIMM_PMEM
+	acpi_nfit_init();
+#endif
+
 	return 0;
 }
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index e099378ee0..0a44983aad 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -31,11 +31,166 @@ static const uint8_t nfit_spa_pmem_guid[] =
     0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
 };
 
+struct nfit_spa_desc {
+    struct list_head link;
+    struct acpi_nfit_system_address *acpi_table;
+    struct list_head memdev_list;
+};
+
+struct nfit_memdev_desc {
+    struct list_head link;
+    struct acpi_nfit_memory_map *acpi_table;
+    struct list_head memdev_link;
+};
+
 struct acpi_nfit_desc {
     struct acpi_table_nfit *acpi_table;
+    struct list_head spa_list;
+    struct list_head memdev_list;
 };
 
-static struct acpi_nfit_desc nfit_desc;
+static struct acpi_nfit_desc nfit_desc = {
+    .spa_list = LIST_HEAD_INIT(nfit_desc.spa_list),
+    .memdev_list = LIST_HEAD_INIT(nfit_desc.memdev_list),
+};
+
+static void __init acpi_nfit_del_subtables(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa, *spa_next;
+    struct nfit_memdev_desc *memdev, *memdev_next;
+
+    list_for_each_entry_safe(spa, spa_next, &desc->spa_list, link)
+    {
+        list_del(&spa->link);
+        xfree(spa);
+    }
+    list_for_each_entry_safe(memdev, memdev_next, &desc->memdev_list, link)
+    {
+        list_del(&memdev->link);
+        xfree(memdev);
+    }
+}
+
+static int __init acpi_nfit_add_subtables(struct acpi_nfit_desc *desc)
+{
+    struct acpi_table_nfit *nfit_table = desc->acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    uint32_t nfit_length = nfit_table->header.length;
+    struct acpi_nfit_header *hdr;
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    int ret = 0;
+
+#define INIT_DESC(desc, acpi_hdr, acpi_type, desc_list) \
+    do {                                                \
+        (desc) = xzalloc(typeof(*(desc)));              \
+        if ( unlikely(!(desc)) ) {                      \
+            ret = -ENOMEM;                              \
+            goto nomem;                                 \
+        }                                               \
+        (desc)->acpi_table = (acpi_type *)(acpi_hdr);   \
+        INIT_LIST_HEAD(&(desc)->link);                  \
+        list_add_tail(&(desc)->link, (desc_list));      \
+    } while ( 0 )
+
+    while ( hdr_offset < nfit_length )
+    {
+        hdr = (void *)nfit_table + hdr_offset;
+        hdr_offset += hdr->length;
+
+        switch ( hdr->type )
+        {
+        case ACPI_NFIT_TYPE_SYSTEM_ADDRESS:
+            INIT_DESC(spa_desc, hdr, struct acpi_nfit_system_address,
+                      &desc->spa_list);
+            break;
+
+        case ACPI_NFIT_TYPE_MEMORY_MAP:
+            INIT_DESC(memdev_desc, hdr, struct acpi_nfit_memory_map,
+                      &desc->memdev_list);
+            break;
+
+        default:
+            continue;
+        }
+    }
+
+#undef INIT_DESC
+
+    return 0;
+
+ nomem:
+    acpi_nfit_del_subtables(desc);
+
+    return ret;
+}
+
+static void __init acpi_nfit_link_subtables(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    uint16_t spa_idx;
+
+    list_for_each_entry(spa_desc, &desc->spa_list, link)
+    {
+        INIT_LIST_HEAD(&spa_desc->memdev_list);
+
+        spa_idx = spa_desc->acpi_table->range_index;
+
+        list_for_each_entry(memdev_desc, &desc->memdev_list, link)
+        {
+            if ( memdev_desc->acpi_table->range_index == spa_idx )
+                list_add_tail(&memdev_desc->memdev_link,
+                              &spa_desc->memdev_list);
+        }
+    }
+}
+
+static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
+{
+    struct nfit_spa_desc *spa_desc;
+    struct nfit_memdev_desc *memdev_desc;
+    struct acpi_nfit_system_address *spa;
+    unsigned long smfn, emfn;
+    bool failed;
+
+    list_for_each_entry(spa_desc, &desc->spa_list, link)
+    {
+        spa = spa_desc->acpi_table;
+
+        /* Skip non-pmem entry. */
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+
+        smfn = paddr_to_pfn(spa->address);
+        emfn = paddr_to_pfn(spa->address + spa->length);
+        failed = false;
+
+        list_for_each_entry(memdev_desc, &spa_desc->memdev_list, memdev_link)
+        {
+            if ( memdev_desc->acpi_table->flags &
+                 (ACPI_NFIT_MEM_SAVE_FAILED |
+                  ACPI_NFIT_MEM_RESTORE_FAILED |
+                  ACPI_NFIT_MEM_FLUSH_FAILED |
+                  ACPI_NFIT_MEM_NOT_ARMED |
+                  ACPI_NFIT_MEM_MAP_FAILED) )
+            {
+                failed = true;
+                break;
+            }
+        }
+
+        if ( failed )
+        {
+            printk(XENLOG_INFO
+                   "NFIT: detected failures on PMEM MFNs 0x%lx - 0x%lx, skipped\n",
+                   smfn, emfn);
+            continue;
+        }
+
+        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
+    }
+}
 
 void __init acpi_nfit_boot_init(void)
 {
@@ -53,6 +208,25 @@ void __init acpi_nfit_boot_init(void)
                      PAGE_HYPERVISOR);
 }
 
+void __init acpi_nfit_init(void)
+{
+    if ( !nfit_desc.acpi_table )
+        return;
+
+    /* Collect all SPA and memory map sub-tables. */
+    if ( acpi_nfit_add_subtables(&nfit_desc) )
+    {
+        printk(XENLOG_ERR "NFIT: no memory for NFIT management\n");
+        return;
+    }
+
+    /* Link descriptors of SPA and memory map sub-tables. */
+    acpi_nfit_link_subtables(&nfit_desc);
+
+    /* Register valid pmem regions to Xen hypervisor. */
+    acpi_nfit_register_pmem(&nfit_desc);
+}
+
 /**
  * Search pmem regions overlapped with the specified address range.
  *
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index 94d8d7775c..037652916a 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -946,6 +946,32 @@ struct acpi_nfit_system_address {
 	u64 memory_mapping;
 };
 
+/* 1: Memory Device to System Address Range Map Structure */
+struct acpi_nfit_memory_map {
+	struct acpi_nfit_header header;
+	u32 device_handle;
+	u16 physical_id;
+	u16 region_id;
+	u16 range_index;
+	u16 region_index;
+	u64 region_size;
+	u64 region_offset;
+	u64 address;
+	u16 interleave_index;
+	u16 interleave_ways;
+	u16 flags;
+	u16 reserved;		/* Reserved, must be zero */
+};
+
+/* Flags in struct acpi_nfit_memory_map */
+#define ACPI_NFIT_MEM_SAVE_FAILED		(1)	/* 00: Last SAVE to Memory Device failed */
+#define ACPI_NFIT_MEM_RESTORE_FAILED	(1<<1)	/* 01: Last RESTORE from Memory Device failed */
+#define ACPI_NFIT_MEM_FLUSH_FAILED		(1<<2)	/* 02: Platform flush failed */
+#define ACPI_NFIT_MEM_NOT_ARMED			(1<<3)	/* 03: Memory Device is not armed */
+#define ACPI_NFIT_MEM_HEALTH_OBSERVED	(1<<4)	/* 04: Memory Device observed SMART/health events */
+#define ACPI_NFIT_MEM_HEALTH_ENABLED	(1<<5)	/* 05: SMART/health events enabled */
+#define ACPI_NFIT_MEM_MAP_FAILED		(1<<6)	/* 06: Mapping to SPA failed */
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 1bd8f9f4e4..088f01255d 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -185,6 +185,7 @@ void acpi_nfit_boot_init(void);
 bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
                                 unsigned long *ret_smfn,
                                 unsigned long *ret_emfn);
+void acpi_nfit_init(void);
 #endif /* CONFIG_NVDIMM_PMEM */
 
 #endif /*_LINUX_ACPI_H*/
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (5 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 06/41] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Register the valid PMEM regions probed via NFIT with the Xen
hypervisor. No frame table or M2P table is created for those PMEM
regions at this stage.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>

Changes in v4:
  * Simplify return paths of pmem_list_add().
---
 xen/common/Makefile     |   1 +
 xen/common/pmem.c       | 122 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/drivers/acpi/nfit.c |  12 ++++-
 xen/include/xen/pmem.h  |  28 +++++++++++
 4 files changed, 162 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/include/xen/pmem.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 66cc2c8995..57fa4601b8 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -29,6 +29,7 @@ obj-y += notifier.o
 obj-y += page_alloc.o
 obj-$(CONFIG_HAS_PDX) += pdx.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
+obj-${CONFIG_NVDIMM_PMEM} += pmem.o
 obj-y += preempt.o
 obj-y += random.o
 obj-y += rangeset.o
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
new file mode 100644
index 0000000000..aa0a1d166d
--- /dev/null
+++ b/xen/common/pmem.c
@@ -0,0 +1,122 @@
+/*
+ * xen/common/pmem.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/list.h>
+#include <xen/pmem.h>
+
+/*
+ * All PMEM regions present in NFIT SPA range structures are linked
+ * in this list.
+ */
+static LIST_HEAD(pmem_raw_regions);
+static unsigned int nr_raw_regions;
+
+struct pmem {
+    struct list_head link; /* link to one of PMEM region list */
+    unsigned long smfn;    /* start MFN of the PMEM region */
+    unsigned long emfn;    /* end MFN of the PMEM region */
+
+    union {
+        struct {
+            unsigned int pxm; /* proximity domain of the PMEM region */
+        } raw;
+    } u;
+};
+
+static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
+                          unsigned long smfn2, unsigned long emfn2)
+{
+    /* Half-open ranges [smfn1, emfn1) and [smfn2, emfn2) overlap. */
+    return smfn1 < emfn2 && smfn2 < emfn1;
+}
+
+/**
+ * Add a PMEM region to a list. All PMEM regions in the list are
+ * sorted in the ascending order of the start address. A PMEM region,
+ * whose range is overlapped with anyone in the list, cannot be added
+ * to the list.
+ *
+ * Parameters:
+ *  list:       the list to which a new PMEM region will be added
+ *  smfn, emfn: the range of the new PMEM region
+ *  entry:      return the new entry added to the list
+ *
+ * Return:
+ *  On success, return 0 and the new entry added to the list is
+ *  returned via @entry. Otherwise, return an error number and the
+ *  value of @entry is undefined.
+ */
+static int pmem_list_add(struct list_head *list,
+                         unsigned long smfn, unsigned long emfn,
+                         struct pmem **entry)
+{
+    struct list_head *cur;
+    struct pmem *new_pmem;
+
+    list_for_each_prev(cur, list)
+    {
+        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
+        unsigned long cur_smfn = cur_pmem->smfn;
+        unsigned long cur_emfn = cur_pmem->emfn;
+
+        if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) )
+            return -EEXIST;
+
+        if ( cur_smfn < smfn )
+            break;
+    }
+
+    new_pmem = xzalloc(struct pmem);
+    if ( !new_pmem )
+        return -ENOMEM;
+
+    new_pmem->smfn = smfn;
+    new_pmem->emfn = emfn;
+    list_add(&new_pmem->link, cur);
+    if ( entry )
+        *entry = new_pmem;
+
+    return 0;
+}
+
+/**
+ * Register a pmem region to Xen.
+ *
+ * Parameters:
+ *  smfn, emfn: start and end MFNs of the pmem region
+ *  pxm:        the proximity domain of the pmem region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, an error number is returned.
+ */
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
+{
+    int rc;
+    struct pmem *pmem;
+
+    if ( smfn >= emfn )
+        return -EINVAL;
+
+    rc = pmem_list_add(&pmem_raw_regions, smfn, emfn, &pmem);
+    if ( !rc )
+    {
+        pmem->u.raw.pxm = pxm;
+        nr_raw_regions++;
+    }
+
+    return rc;
+}
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index 0a44983aad..6f85d4d911 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -20,6 +20,7 @@
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
+#include <xen/pmem.h>
 
 /*
  * GUID of a byte addressable persistent memory region
@@ -153,6 +154,7 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
     struct acpi_nfit_system_address *spa;
     unsigned long smfn, emfn;
     bool failed;
+    int rc;
 
     list_for_each_entry(spa_desc, &desc->spa_list, link)
     {
@@ -188,7 +190,15 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
             continue;
         }
 
-        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
+        rc = pmem_register(smfn, emfn, spa->proximity_domain);
+        if ( !rc )
+            printk(XENLOG_INFO
+                   "NFIT: PMEM MFNs 0x%lx - 0x%lx on PXM %u registered\n",
+                   smfn, emfn, spa->proximity_domain);
+        else
+            printk(XENLOG_ERR
+                   "NFIT: failed to register PMEM MFNs 0x%lx - 0x%lx on PXM %u, err %d\n",
+                   smfn, emfn, spa->proximity_domain, rc);
     }
 }
 
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
new file mode 100644
index 0000000000..41cb9bb04f
--- /dev/null
+++ b/xen/include/xen/pmem.h
@@ -0,0 +1,28 @@
+/*
+ * xen/include/xen/pmem.h
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_PMEM_H__
+#define __XEN_PMEM_H__
+#ifdef CONFIG_NVDIMM_PMEM
+
+#include <xen/types.h>
+
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+
+#endif /* CONFIG_NVDIMM_PMEM */
+#endif /* __XEN_PMEM_H__ */
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC XEN PATCH v4 08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (6 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 07/41] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Shane Wang,
	Chao Peng, Dan Williams, Gang Wei

... to avoid interference with the PMEM driver and management
utilities in Dom0.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
---
 xen/arch/x86/acpi/power.c |  7 +++++++
 xen/arch/x86/dom0_build.c |  5 +++++
 xen/arch/x86/shutdown.c   |  3 +++
 xen/arch/x86/tboot.c      |  4 ++++
 xen/common/kexec.c        |  3 +++
 xen/common/pmem.c         | 21 +++++++++++++++++++++
 xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
 xen/include/xen/acpi.h    |  2 ++
 xen/include/xen/pmem.h    | 13 +++++++++++++
 9 files changed, 79 insertions(+)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 1e4e5680a7..d135715a49 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -178,6 +178,10 @@ static int enter_state(u32 state)
 
     freeze_domains();
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_reinstate();
+#endif
+
     acpi_dmar_reinstate();
 
     if ( (error = disable_nonboot_cpus()) )
@@ -260,6 +264,9 @@ static int enter_state(u32 state)
     mtrr_aps_sync_end();
     adjust_vtd_irq_affinities();
     acpi_dmar_zap();
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_zap();
+#endif
     thaw_domains();
     system_state = SYS_STATE_active;
     spin_unlock(&pm_lock);
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index bf992fef6d..3e4be7c571 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -8,6 +8,7 @@
 #include <xen/iocap.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/pmem.h>
 #include <xen/sched.h>
 #include <xen/sched-if.h>
 #include <xen/softirq.h>
@@ -458,6 +459,10 @@ int __init dom0_setup_permissions(struct domain *d)
             rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+#ifdef CONFIG_NVDIMM_PMEM
+    rc |= pmem_dom0_setup_permission(d);
+#endif
+
     return rc;
 }
 
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index a87aa60add..1902dfe73e 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -550,6 +550,9 @@ void machine_restart(unsigned int delay_millisecs)
 
     if ( tboot_in_measured_env() )
     {
+#ifdef CONFIG_NVDIMM_PMEM
+        acpi_nfit_reinstate();
+#endif
         acpi_dmar_reinstate();
         tboot_shutdown(TB_SHUTDOWN_REBOOT);
     }
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 59d7c477f4..24e3b81ff1 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -488,6 +488,10 @@ int __init tboot_parse_dmar_table(acpi_table_handler dmar_handler)
     /* but dom0 will read real table, so must zap it there too */
     acpi_dmar_zap();
 
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_zap();
+#endif
+
     return rc;
 }
 
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index c14cbb2b9c..8e9ea131e3 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -366,6 +366,9 @@ static int kexec_common_shutdown(void)
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_reinstate();
+#endif
     acpi_dmar_reinstate();
 
     return 0;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index aa0a1d166d..699f8a3322 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -18,6 +18,8 @@
 
 #include <xen/errno.h>
 #include <xen/list.h>
+#include <xen/iocap.h>
+#include <xen/paging.h>
 #include <xen/pmem.h>
 
 /*
@@ -120,3 +122,22 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
 
     return rc;
 }
+
+#ifdef CONFIG_X86
+
+int __init pmem_dom0_setup_permission(struct domain *d)
+{
+    struct list_head *cur;
+    struct pmem *pmem;
+    int rc = 0;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        pmem = list_entry(cur, struct pmem, link);
+        rc |= iomem_deny_access(d, pmem->smfn, pmem->emfn - 1);
+    }
+
+    return rc;
+}
+
+#endif /* CONFIG_X86 */
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index 6f85d4d911..e15d47b352 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -202,6 +202,24 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
     }
 }
 
+void acpi_nfit_zap(void)
+{
+    uint32_t sig = 0x4e494654; /* "TFIN" */
+
+    if ( nfit_desc.acpi_table )
+        write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+                     sig);
+}
+
+void acpi_nfit_reinstate(void)
+{
+    uint32_t sig = 0x5449464e; /* "NFIT" */
+
+    if ( nfit_desc.acpi_table )
+        write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+                     sig);
+}
+
 void __init acpi_nfit_boot_init(void)
 {
     acpi_status status;
@@ -216,6 +234,9 @@ void __init acpi_nfit_boot_init(void)
     map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
                      PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
                      PAGE_HYPERVISOR);
+
+    /* Hide NFIT from Dom0. */
+    acpi_nfit_zap();
 }
 
 void __init acpi_nfit_init(void)
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 088f01255d..77188193d0 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -186,6 +186,8 @@ bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
                                 unsigned long *ret_smfn,
                                 unsigned long *ret_emfn);
 void acpi_nfit_init(void);
+void acpi_nfit_zap(void);
+void acpi_nfit_reinstate(void);
 #endif /* CONFIG_NVDIMM_PMEM */
 
 #endif /*_LINUX_ACPI_H*/
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 41cb9bb04f..d5bd54ff19 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -24,5 +24,18 @@
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
 
+#ifdef CONFIG_X86
+
+int pmem_dom0_setup_permission(struct domain *d);
+
+#else /* !CONFIG_X86 */
+
+static inline int pmem_dom0_setup_permission(struct domain *d)
+{
+    return -ENOSYS;
+}
+
+#endif /* CONFIG_X86 */
+
 #endif /* CONFIG_NVDIMM_PMEM */
 #endif /* __XEN_PMEM_H__ */
-- 
2.15.1



* [RFC XEN PATCH v4 09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (7 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams, Daniel De Graaf

XEN_SYSCTL_nvdimm_op will support a set of sub-commands to manage the
physical NVDIMM devices. This commit just adds the framework for this
hypercall, and does not implement any sub-commands.
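
As an illustration of how later patches in this series plug sub-commands
into this framework, the handler ends up dispatching on nvdimm->cmd
roughly as below (XEN_SYSCTL_nvdimm_pmem_get_regions_nr and its helper
are only added by a later patch; this is a sketch, not part of this
patch):

    int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
    {
        int rc;

        switch ( nvdimm->cmd )
        {
        case XEN_SYSCTL_nvdimm_pmem_get_regions_nr:
            rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
            break;

        default:
            rc = -ENOSYS;
        }

        /* Report a positive errno value back to the caller. */
        nvdimm->err = -rc;

        return rc;
    }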

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>

Changes in v4:
 * Remove unnecessary 'pad' field in struct xen_sysctl_nvdimm_op.
---
 tools/flask/policy/modules/dom0.te  |  2 +-
 xen/common/pmem.c                   | 18 ++++++++++++++++++
 xen/common/sysctl.c                 |  9 +++++++++
 xen/include/public/sysctl.h         | 16 +++++++++++++++-
 xen/include/xen/pmem.h              |  2 ++
 xen/xsm/flask/hooks.c               |  4 ++++
 xen/xsm/flask/policy/access_vectors |  2 ++
 7 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index 1643b400f0..7379222f18 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -16,7 +16,7 @@ allow dom0_t xen_t:xen {
 allow dom0_t xen_t:xen2 {
 	resource_op psr_cmt_op psr_cat_op pmu_ctrl get_symbol
 	get_cpu_levelling_caps get_cpu_featureset livepatch_op
-	gcov_op set_parameter
+	gcov_op set_parameter nvdimm_op
 };
 
 # Allow dom0 to use all XENVER_ subops that have checks.
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 699f8a3322..c3b26dd02d 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -123,6 +123,24 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
     return rc;
 }
 
+/**
+ * Top-level hypercall handler of XEN_SYSCTL_nvdimm_pmem_*.
+ *
+ * Parameters:
+ *  nvdimm: the hypercall parameters
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
+{
+    int rc = -ENOSYS;
+
+    nvdimm->err = -rc;
+
+    return rc;
+}
+
 #ifdef CONFIG_X86
 
 int __init pmem_dom0_setup_permission(struct domain *d)
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 08198b7150..f533875c5c 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/pmstat.h>
 #include <xen/livepatch.h>
 #include <xen/gcov.h>
+#include <xen/pmem.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -504,6 +505,14 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         break;
     }
 
+#ifdef CONFIG_NVDIMM_PMEM
+    case XEN_SYSCTL_nvdimm_op:
+        ret = pmem_do_sysctl(&op->u.nvdimm);
+        if ( ret != -ENOSYS )
+            copyback = 1;
+        break;
+#endif
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 6140f1a059..7f0e56f73a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -36,7 +36,7 @@
 #include "physdev.h"
 #include "tmem.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000010
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000011
 
 /*
  * Read console content from Xen buffer ring.
@@ -1045,6 +1045,18 @@ struct xen_sysctl_set_parameter {
     uint16_t pad[3];                        /* IN: MUST be zero. */
 };
 
+/*
+ * Interface for NVDIMM management.
+ */
+
+struct xen_sysctl_nvdimm_op {
+    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+    uint32_t err; /* OUT: error code */
+    union {
+        /* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+    } u;
+};
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -1074,6 +1086,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_get_cpu_featureset            26
 #define XEN_SYSCTL_livepatch_op                  27
 #define XEN_SYSCTL_set_parameter                 28
+#define XEN_SYSCTL_nvdimm_op                     29
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -1103,6 +1116,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpu_featureset    cpu_featureset;
         struct xen_sysctl_livepatch_op      livepatch;
         struct xen_sysctl_set_parameter     set_parameter;
+        struct xen_sysctl_nvdimm_op         nvdimm;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index d5bd54ff19..922b12f570 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -20,9 +20,11 @@
 #define __XEN_PMEM_H__
 #ifdef CONFIG_NVDIMM_PMEM
 
+#include <public/sysctl.h>
 #include <xen/types.h>
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 
 #ifdef CONFIG_X86
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index f01b4cfaaa..f677755512 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -835,6 +835,10 @@ static int flask_sysctl(int cmd)
         return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
                                     XEN2__SET_PARAMETER, NULL);
 
+    case XEN_SYSCTL_nvdimm_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__NVDIMM_OP, NULL);
+
     default:
         return avc_unknown_permission("sysctl", cmd);
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 3a2d863b8f..3bfbb892c7 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -103,6 +103,8 @@ class xen2
     gcov_op
 # XEN_SYSCTL_set_parameter
     set_parameter
+# XEN_SYSCTL_nvdimm_op
+    nvdimm_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.15.1



* [RFC XEN PATCH v4 10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (8 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
@ 2017-12-07 10:09 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

XEN_SYSCTL_nvdimm_pmem_get_regions_nr, which is a command of hypercall
XEN_SYSCTL_nvdimm_op, is used to get the number of PMEM regions of the
specified type (see PMEM_REGION_TYPE_*).
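
A minimal Dom0 caller of the new libxc wrapper could look roughly like
the sketch below (illustrative only, not part of the patch; error
handling trimmed):

    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        uint32_t nr = 0;
        int rc;

        if ( !xch )
            return 1;

        rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
        if ( !rc )
            printf("%u raw PMEM region(s) detected by Xen\n", nr);

        xc_interface_close(xch);

        return rc ? 1 : 0;
    }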

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/include/xenctrl.h | 15 +++++++++++++++
 tools/libxc/xc_misc.c         | 23 +++++++++++++++++++++++
 tools/libxl/libxl_nvdimm.c    |  0
 xen/common/pmem.c             | 29 ++++++++++++++++++++++++++++-
 xen/include/public/sysctl.h   | 14 +++++++++++++-
 5 files changed, 79 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxl_nvdimm.c

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 666db0b919..195ff69846 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2607,6 +2607,21 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
                          xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+/*
+ * Get the number of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:  xc interface handle
+ *  type: the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  nr:   the number of PMEM regions is returned via this parameter
+ *
+ * Return:
+ *  On success, return 0 and the number of PMEM regions is returned via @nr.
+ *  Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
+                                  uint8_t type, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 5e6714ae2b..a3c6cfe2f6 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -888,6 +888,29 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout)
     return _xc_livepatch_action(xch, name, LIVEPATCH_ACTION_REPLACE, timeout);
 }
 
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
+{
+    DECLARE_SYSCTL;
+    struct xen_sysctl_nvdimm_op *nvdimm = &sysctl.u.nvdimm;
+    int rc;
+
+    if ( !nr || type != PMEM_REGION_TYPE_RAW )
+        return -EINVAL;
+
+    sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions_nr;
+    nvdimm->err = 0;
+    nvdimm->u.pmem_regions_nr.type = type;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        *nr = nvdimm->u.pmem_regions_nr.num_regions;
+    else if ( nvdimm->err )
+        rc = -nvdimm->err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index c3b26dd02d..b196b256bb 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -97,6 +97,23 @@ static int pmem_list_add(struct list_head *list,
     return 0;
 }
 
+static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
+{
+    int rc = 0;
+
+    switch ( regions_nr->type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        regions_nr->num_regions = nr_raw_regions;
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -134,7 +151,17 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
  */
 int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 {
-    int rc = -ENOSYS;
+    int rc;
+
+    switch ( nvdimm->cmd )
+    {
+    case XEN_SYSCTL_nvdimm_pmem_get_regions_nr:
+        rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
 
     nvdimm->err = -rc;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7f0e56f73a..c3c992225a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1049,11 +1049,23 @@ struct xen_sysctl_set_parameter {
  * Interface for NVDIMM management.
  */
 
+/* Types of PMEM regions */
+#define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
+
+/* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
+struct xen_sysctl_nvdimm_pmem_regions_nr {
+    uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
+    uint32_t num_regions; /* OUT: the number of PMEM regions of type @type */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+#define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
     uint32_t err; /* OUT: error code */
     union {
-        /* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+        xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
     } u;
 };
 
-- 
2.15.1



* [RFC XEN PATCH v4 11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (9 preceding siblings ...)
  2017-12-07 10:09 ` [RFC XEN PATCH v4 10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 12/41] tools/xl: add xl command 'pmem-list' Haozhong Zhang
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

XEN_SYSCTL_nvdimm_pmem_get_regions, which is a command of hypercall
XEN_SYSCTL_nvdimm_op, is used to get a list of PMEM regions of the
specified type (see PMEM_REGION_TYPE_*).
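
Combined with the previous command, a caller can first query the region
count, size a buffer accordingly and then fetch the region information.
A rough sketch (illustrative only, not part of the patch; error handling
trimmed):

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <xenctrl.h>

    static void list_raw_regions(xc_interface *xch)
    {
        xen_sysctl_nvdimm_pmem_raw_region_t *buf;
        uint32_t nr = 0, i;

        if ( xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr) ||
             !nr )
            return;

        buf = calloc(nr, sizeof(*buf));
        if ( !buf )
            return;

        if ( !xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, buf, &nr) )
            for ( i = 0; i < nr; i++ )
                printf("mfn 0x%"PRIx64" - 0x%"PRIx64", pxm %u\n",
                       buf[i].smfn, buf[i].emfn, buf[i].pxm);

        free(buf);
    }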

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/include/xenctrl.h | 18 ++++++++++++
 tools/libxc/xc_misc.c         | 62 +++++++++++++++++++++++++++++++++++++++
 xen/common/pmem.c             | 67 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h   | 27 +++++++++++++++++
 4 files changed, 174 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 195ff69846..e0adad1cf8 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2622,6 +2622,24 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
                                   uint8_t type, uint32_t *nr);
 
+/*
+ * Get an array of information of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:    xc interface handle
+ *  type:   the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  buffer: the buffer where the information of PMEM regions is returned,
+ *          the caller should allocate enough memory for it.
+ *  nr :    IN: the maximum number of PMEM regions that can be returned
+ *              in @buffer
+ *          OUT: the actual number of returned PMEM regions in @buffer
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+                               void *buffer, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index a3c6cfe2f6..11befa444f 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -911,6 +911,68 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
     return rc;
 }
 
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+                               void *buffer, uint32_t *nr)
+{
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(buffer, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    struct xen_sysctl_nvdimm_op *nvdimm = &sysctl.u.nvdimm;
+    xen_sysctl_nvdimm_pmem_regions_t *regions = &nvdimm->u.pmem_regions;
+    unsigned int max;
+    unsigned long size;
+    int rc;
+
+    if ( !buffer || !nr )
+        return -EINVAL;
+
+    max = *nr;
+    if ( !max )
+        return 0;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(buffer, size);
+    if ( xc_hypercall_bounce_pre(xch, buffer) )
+        return -EFAULT;
+
+    sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions;
+    nvdimm->err = 0;
+    regions->type = type;
+    regions->num_regions = max;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
+        break;
+
+    default:
+        rc = -EINVAL;
+        goto out;
+    }
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        *nr = regions->num_regions;
+    else if ( nvdimm->err )
+        rc = -nvdimm->err;
+
+out:
+    xc_hypercall_bounce_post(xch, buffer);
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index b196b256bb..0afc1573c6 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -22,6 +22,8 @@
 #include <xen/paging.h>
 #include <xen/pmem.h>
 
+#include <asm/guest_access.h>
+
 /*
  * All PMEM regions presenting in NFIT SPA range structures are linked
  * in this list.
@@ -114,6 +116,67 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
     return rc;
 }
 
+static int pmem_get_raw_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_raw_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max) )
+        return -EINVAL;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.pxm = pmem->u.raw.pxm;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    *num_regions = nr;
+
+    return rc;
+}
+
+static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
+{
+    unsigned int type = regions->type, max = regions->num_regions;
+    int rc = 0;
+
+    if ( !max )
+        return 0;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_RAW:
+        rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    if ( !rc )
+        regions->num_regions = max;
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -159,6 +222,10 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
         rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
         break;
 
+    case XEN_SYSCTL_nvdimm_pmem_get_regions:
+        rc = pmem_get_regions(&nvdimm->u.pmem_regions);
+        break;
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index c3c992225a..9b2a65fcb9 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1052,6 +1052,15 @@ struct xen_sysctl_set_parameter {
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
 
+/* PMEM_REGION_TYPE_RAW */
+struct xen_sysctl_nvdimm_pmem_raw_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint32_t pxm;
+};
+typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1060,12 +1069,30 @@ struct xen_sysctl_nvdimm_pmem_regions_nr {
 typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
 
+/* XEN_SYSCTL_nvdimm_pmem_get_regions */
+struct xen_sysctl_nvdimm_pmem_regions {
+    uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
+    uint32_t num_regions; /* IN: the maximum number of entries that can be
+                                 returned via the guest handler in @u_buffer
+                             OUT: the actual number of entries returned via
+                                  the guest handler in @u_buffer */
+    union {
+        /* if type == PMEM_REGION_TYPE_RAW */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+    } u_buffer;           /* IN: the guest handler where the entries of PMEM
+                                 regions of the type @type are returned */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions xen_sysctl_nvdimm_pmem_regions_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
 #define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
+#define XEN_SYSCTL_nvdimm_pmem_get_regions        1
     uint32_t err; /* OUT: error code */
     union {
         xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
+        xen_sysctl_nvdimm_pmem_regions_t pmem_regions;
     } u;
 };
 
-- 
2.15.1



* [RFC XEN PATCH v4 12/41] tools/xl: add xl command 'pmem-list'
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (10 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 13/41] x86_64/mm: refactor memory_add() Haozhong Zhang
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

The new xl command 'pmem-list' is used to list the information of PMEM
regions.
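
Besides the xl command, the same information is reachable through the
new libxl API added in this patch. A rough sketch of such a caller
(illustrative only, not part of the patch; it simply mirrors what
'xl pmem-list' does):

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <xentoollog.h>
    #include <libxl.h>

    int main(void)
    {
        xentoollog_logger_stdiostream *lg =
            xtl_createlogger_stdiostream(stderr, XTL_PROGRESS, 0);
        libxl_ctx *ctx = NULL;
        libxl_nvdimm_pmem_region *regions = NULL;
        unsigned int nr = 0, i;

        if (!lg || libxl_ctx_alloc(&ctx, LIBXL_VERSION, 0,
                                   (xentoollog_logger *)lg))
            return 1;

        if (!libxl_nvdimm_pmem_get_regions(ctx,
                                           LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW,
                                           &regions, &nr)) {
            for (i = 0; i < nr; i++)
                printf(" %u: mfn 0x%"PRIx64" - 0x%"PRIx64", pxm %u\n", i,
                       regions[i].u.raw.smfn, regions[i].u.raw.emfn,
                       regions[i].u.raw.pxm);
            free(regions);
        }

        libxl_ctx_free(ctx);
        xtl_logger_destroy((xentoollog_logger *)lg);
        return 0;
    }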

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/Makefile        |   2 +-
 tools/libxl/libxl.h         |  20 +++++++
 tools/libxl/libxl_nvdimm.c  | 138 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |  16 +++++
 tools/xl/Makefile           |   2 +-
 tools/xl/xl.h               |   1 +
 tools/xl/xl_cmdtable.c      |   6 ++
 tools/xl/xl_nvdimm.c        |  92 +++++++++++++++++++++++++++++
 8 files changed, 275 insertions(+), 2 deletions(-)
 create mode 100644 tools/xl/xl_nvdimm.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 5a861f72cb..a6f2dbd1cf 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -139,7 +139,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom_suspend.o libxl_dom_save.o libxl_usb.o \
 			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
 			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
-			libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
+			libxl_9pfs.o libxl_domain.o libxl_vdispl.o libxl_nvdimm.o \
                         $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5e9aed739d..9ce487e79f 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2304,6 +2304,26 @@ int libxl_psr_cat_get_l3_info(libxl_ctx *ctx, libxl_psr_cat_info **info,
 void libxl_psr_cat_info_list_free(libxl_psr_cat_info *list, int nr);
 #endif
 
+/* NVDIMM */
+
+/*
+ * Get a list of information of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  ctx:       libxl context
+ *  type:      type of the PMEM regions
+ *  regions_r: return the information list (one entry per region) on success;
+ *             the list is dynamically allocated and shall be freed by callers
+ *  nr_r:      return the number of entries in regions_r on success
+ *
+ * Return:
+ *  0 on success; otherwise, ERROR_*, and leave errno valid.
+ */
+int libxl_nvdimm_pmem_get_regions(libxl_ctx *ctx,
+                                  libxl_nvdimm_pmem_region_type type,
+                                  libxl_nvdimm_pmem_region **regions_r,
+                                  unsigned int *nr_r);
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index e69de29bb2..70da18f11f 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -0,0 +1,138 @@
+/*
+ * tools/libxl/libxl_nvdimm.c
+ *
+ * libxl functions for managing NVDIMM PMEM regions.
+ *
+ * Copyright (C) 2017  Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xenctrl.h>
+#include <xen-tools/libs.h>
+
+#include "libxl_internal.h"
+
+/*
+ * Sizes of the xen_sysctl_nvdimm_pmem_*_region_t structures, which are
+ * required to match the corresponding libxl_nvdimm_pmem_*_region sizes.
+ *
+ * Indexed by LIBXL_NVDIMM_PMEM_REGION_TYPE_*.
+ */
+static size_t xc_pmem_region_struct_size[] = {
+    [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = sizeof(libxl_nvdimm_pmem_raw_region),
+};
+
+static int get_xc_region_type(libxl_nvdimm_pmem_region_type type,
+                               uint8_t *xc_type_r)
+{
+    static uint8_t xc_region_types[] = {
+        [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = PMEM_REGION_TYPE_RAW,
+    };
+    static unsigned int nr_types =
+        sizeof(xc_region_types) / sizeof(xc_region_types[0]);
+
+    if (type >= nr_types)
+        return -EINVAL;
+
+    *xc_type_r = xc_region_types[type];
+
+    return 0;
+}
+
+static void copy_from_xc_regions(libxl_nvdimm_pmem_region *tgt_regions,
+                                 void *src_xc_regions, uint8_t xc_type,
+                                 unsigned int nr)
+{
+    static size_t offset = offsetof(libxl_nvdimm_pmem_region, u);
+    libxl_nvdimm_pmem_region *tgt = tgt_regions;
+    libxl_nvdimm_pmem_region *end = tgt_regions + nr;
+    void *src = src_xc_regions;
+    size_t size = xc_pmem_region_struct_size[xc_type];
+
+    BUILD_BUG_ON(sizeof(libxl_nvdimm_pmem_raw_region) !=
+                 sizeof(xen_sysctl_nvdimm_pmem_raw_region_t));
+
+    while (tgt < end) {
+        memcpy((void *)tgt + offset, src, size);
+        tgt += 1;
+        src += size;
+    }
+}
+
+int libxl_nvdimm_pmem_get_regions(libxl_ctx *ctx,
+                                  libxl_nvdimm_pmem_region_type type,
+                                  libxl_nvdimm_pmem_region **regions_r,
+                                  unsigned int *nr_r)
+{
+    GC_INIT(ctx);
+    libxl_nvdimm_pmem_region *regions;
+    uint8_t xc_type;
+    unsigned int nr;
+    void *xc_regions;
+    int rc = 0, err;
+
+    err = get_xc_region_type(type, &xc_type);
+    if (err) {
+        LOGE(ERROR, "invalid PMEM region type %d required", type);
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    err = xc_nvdimm_pmem_get_regions_nr(ctx->xch, xc_type, &nr);
+    if (err) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!nr) {
+        *nr_r = 0;
+        goto out;
+    }
+
+    xc_regions = libxl__malloc(gc, nr * xc_pmem_region_struct_size[xc_type]);
+    if (!xc_regions) {
+        LOGE(ERROR, "cannot allocate xc buffer for %d regions", nr);
+        err = -ENOMEM;
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+
+    err = xc_nvdimm_pmem_get_regions(ctx->xch, xc_type, xc_regions, &nr);
+    if (err) {
+        LOGE(ERROR, "cannot get information of PMEM regions of type %d, err %d",
+             type, err);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    regions = libxl__malloc(NOGC, sizeof(*regions) * nr);
+    if (!regions) {
+        LOGE(ERROR, "cannot allocate return buffer for %d regions", nr);
+        err = -ENOMEM;
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+    copy_from_xc_regions(regions, xc_regions, xc_type, nr);
+
+    *regions_r = regions;
+    *nr_r = nr;
+
+ out:
+    GC_FREE;
+
+    if (rc)
+        errno = -err;
+
+    return rc;
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a239324341..1c7b8998e9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -1041,3 +1041,19 @@ libxl_psr_cat_info = Struct("psr_cat_info", [
     ("cbm_len", uint32),
     ("cdp_enabled", bool),
     ])
+
+libxl_nvdimm_pmem_region_type = Enumeration("nvdimm_pmem_region_type", [
+    (0, "RAW"),
+    ])
+
+libxl_nvdimm_pmem_raw_region = Struct("nvdimm_pmem_raw_region", [
+    ("smfn", uint64),
+    ("emfn", uint64),
+    ("pxm", uint32),
+    ])
+
+libxl_nvdimm_pmem_region = Struct("nvdimm_pmem_region", [
+    ("u", KeyedUnion(None, libxl_nvdimm_pmem_region_type, "type",
+                     [("raw", libxl_nvdimm_pmem_raw_region),
+                     ])),
+    ])
diff --git a/tools/xl/Makefile b/tools/xl/Makefile
index a5117ab3fb..0c374b3c2a 100644
--- a/tools/xl/Makefile
+++ b/tools/xl/Makefile
@@ -22,7 +22,7 @@ XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o
 XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o
 XL_OBJS += xl_info.o xl_console.o xl_misc.o
 XL_OBJS += xl_vmcontrol.o xl_saverestore.o xl_migrate.o
-XL_OBJS += xl_vdispl.o
+XL_OBJS += xl_vdispl.o xl_nvdimm.o
 
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
 $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 6b60d1db50..9359a3d9c7 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -210,6 +210,7 @@ int main_psr_cat_cbm_set(int argc, char **argv);
 int main_psr_cat_show(int argc, char **argv);
 #endif
 int main_qemu_monitor_command(int argc, char **argv);
+int main_pmem_list(int argc, char **argv);
 
 void help(const char *command);
 
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 5546cf66e7..f525cafcdf 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -616,6 +616,12 @@ struct cmd_spec cmd_table[] = {
       "Issue a qemu monitor command to the device model of a domain",
       "<Domain> <Command>",
     },
+    { "pmem-list",
+      &main_pmem_list, 0, 0,
+      "List PMEM regions of specified types, or all PMEM regions if no type is specified",
+      "[options]",
+      "-r, --raw   List PMEM regions detected by Xen hypervisor\n"
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
diff --git a/tools/xl/xl_nvdimm.c b/tools/xl/xl_nvdimm.c
new file mode 100644
index 0000000000..799c76e4c2
--- /dev/null
+++ b/tools/xl/xl_nvdimm.c
@@ -0,0 +1,92 @@
+/*
+ * tools/xl/xl_nvdimm.c
+ *
+ * xl command implementation for listing NVDIMM PMEM regions.
+ *
+ * Copyright (C) 2017  Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <libxl.h>
+
+#include "xl.h"
+#include "xl_utils.h"
+
+typedef void (*show_region_fn_t)(libxl_nvdimm_pmem_region *region,
+                                 unsigned int idx);
+
+static void show_raw_region(libxl_nvdimm_pmem_region *region, unsigned int idx)
+{
+    libxl_nvdimm_pmem_raw_region *raw = &region->u.raw;
+
+    printf(" %u: mfn 0x%lx - 0x%lx, pxm %u\n",
+           idx, raw->smfn, raw->emfn, raw->pxm);
+}
+
+static show_region_fn_t show_region_fn[] = {
+    [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = show_raw_region,
+};
+
+static int list_regions(libxl_nvdimm_pmem_region_type type)
+{
+    int rc;
+    libxl_nvdimm_pmem_region *regions = NULL;
+    unsigned int nr, i;
+
+    rc = libxl_nvdimm_pmem_get_regions(ctx, type, &regions, &nr);
+    if (rc || !nr)
+        goto out;
+
+    printf("List of %s PMEM regions:\n",
+           libxl_nvdimm_pmem_region_type_to_string(type));
+    for (i = 0; i < nr; i++)
+        show_region_fn[type](&regions[i], i);
+
+ out:
+    if (regions)
+        free(regions);
+
+    if (rc)
+        fprintf(stderr, "Error: pmem-list failed: %s\n", strerror(errno));
+
+    return rc;
+}
+
+int main_pmem_list(int argc, char **argv)
+{
+    static struct option opts[] = {
+        { "raw", 0, 0, 'r' },
+        COMMON_LONG_OPTS
+    };
+
+    bool all = true, raw = false;
+    int opt, ret = 0;
+
+    SWITCH_FOREACH_OPT(opt, "r", opts, "pmem-list", 0) {
+    case 'r':
+        all = false;
+        raw = true;
+        break;
+    }
+
+    if (all || raw)
+        ret = list_regions(LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW);
+
+    return ret;
+}
-- 
2.15.1



* [RFC XEN PATCH v4 13/41] x86_64/mm: refactor memory_add()
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (11 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 12/41] tools/xl: add xl command 'pmem-list' Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 14/41] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

Separate the revertible part of memory_add() into a new function
memory_add_common(), which will also be used in PMEM management. The
separation will ease failure recovery in PMEM management. Several
coding-style issues in the touched code are fixed as well.

No functional change is introduced.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 98 +++++++++++++++++++++++++++---------------------
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 35a1535c1e..90341267d9 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1337,21 +1337,16 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
     return 1;
 }
 
-/*
- * A bit paranoid for memory allocation failure issue since
- * it may be reason for memory add
- */
-int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+static int memory_add_common(struct mem_hotadd_info *info,
+                             unsigned int pxm, bool direct_map)
 {
-    struct mem_hotadd_info info;
+    unsigned long spfn = info->spfn, epfn = info->epfn;
     int ret;
     nodeid_t node;
     unsigned long old_max = max_page, old_total = total_pages;
     unsigned long old_node_start, old_node_span, orig_online;
     unsigned long i;
 
-    dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
-
     if ( !mem_hotadd_check(spfn, epfn) )
         return -EINVAL;
 
@@ -1366,22 +1361,25 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
         return -EINVAL;
     }
 
-    i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
-    if ( spfn < i )
-    {
-        ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
-                               min(epfn, i) - spfn, PAGE_HYPERVISOR);
-        if ( ret )
-            goto destroy_directmap;
-    }
-    if ( i < epfn )
+    if ( direct_map )
     {
-        if ( i < spfn )
-            i = spfn;
-        ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
-                               epfn - i, __PAGE_HYPERVISOR_RW);
-        if ( ret )
-            goto destroy_directmap;
+        i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
+        if ( spfn < i )
+        {
+            ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
+                                   min(epfn, i) - spfn, PAGE_HYPERVISOR);
+            if ( ret )
+                goto destroy_directmap;
+        }
+        if ( i < epfn )
+        {
+            if ( i < spfn )
+                i = spfn;
+            ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
+                                   epfn - i, __PAGE_HYPERVISOR_RW);
+            if ( ret )
+                goto destroy_directmap;
+        }
     }
 
     old_node_start = node_start_pfn(node);
@@ -1398,22 +1396,18 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     }
     else
     {
-        if (node_start_pfn(node) > spfn)
+        if ( node_start_pfn(node) > spfn )
             NODE_DATA(node)->node_start_pfn = spfn;
-        if (node_end_pfn(node) < epfn)
+        if ( node_end_pfn(node) < epfn )
             NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
     }
 
-    info.spfn = spfn;
-    info.epfn = epfn;
-    info.cur = spfn;
-
-    ret = extend_frame_table(&info);
+    ret = extend_frame_table(info);
     if ( ret )
         goto restore_node_status;
 
     /* Set max_page as setup_m2p_table will use it*/
-    if (max_page < epfn)
+    if ( max_page < epfn )
     {
         max_page = epfn;
         max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1421,7 +1415,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(&info);
+    ret = setup_m2p_table(info);
 
     if ( ret )
         goto destroy_m2p;
@@ -1429,11 +1423,12 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
     {
         for ( i = spfn; i < epfn; i++ )
-            if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
+            if ( iommu_map_page(hardware_domain, i, i,
+                                IOMMUF_readable|IOMMUF_writable) )
                 break;
         if ( i != epfn )
         {
-            while (i-- > old_max)
+            while ( i-- > old_max )
                 /* If statement to satisfy __must_check. */
                 if ( iommu_unmap_page(hardware_domain, i) )
                     continue;
@@ -1442,14 +1437,10 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
         }
     }
 
-    /* We can't revert any more */
-    share_hotadd_m2p_table(&info);
-    transfer_pages_to_heap(&info);
-
     return 0;
 
 destroy_m2p:
-    destroy_m2p_mapping(&info);
+    destroy_m2p_mapping(info);
     max_page = old_max;
     total_pages = old_total;
     max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1459,9 +1450,32 @@ restore_node_status:
         node_set_offline(node);
     NODE_DATA(node)->node_start_pfn = old_node_start;
     NODE_DATA(node)->node_spanned_pages = old_node_span;
- destroy_directmap:
-    destroy_xen_mappings((unsigned long)mfn_to_virt(spfn),
-                         (unsigned long)mfn_to_virt(epfn));
+destroy_directmap:
+    if ( direct_map )
+        destroy_xen_mappings((unsigned long)mfn_to_virt(spfn),
+                             (unsigned long)mfn_to_virt(epfn));
+
+    return ret;
+}
+
+/*
+ * A bit paranoid for memory allocation failure issue since
+ * it may be reason for memory add
+ */
+int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+{
+    struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    int ret;
+
+    dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
+
+    ret = memory_add_common(&info, pxm, true);
+    if ( !ret )
+    {
+        /* We can't revert any more */
+        share_hotadd_m2p_table(&info);
+        transfer_pages_to_heap(&info);
+    }
 
     return ret;
 }
-- 
2.15.1



* [RFC XEN PATCH v4 14/41] x86_64/mm: allow customized location of extended frametable and M2P table
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (12 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 13/41] x86_64/mm: refactor memory_add() Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 15/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams

As the existing data in a PMEM region is persistent, the Xen hypervisor
has no knowledge of which part is free to be used for the frame table and
M2P table of that PMEM region. Instead, we will allow users or system
admins to specify the location of those frame table and M2P table.
The location is not necessarily at the beginning of the PMEM region,
which is different from the case of hotplugged RAM.

This commit adds support for a customized page allocation function,
which is used to allocate the memory for the frame table and M2P
table. No page free function is added; instead, we require that, if
memory_add_common() fails, all allocated pages either can be reclaimed
or have no effect outside memory_add_common().
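
To illustrate the contract that @alloc_mfns must follow, a hypothetical
allocator that hands out 2^PAGETABLE_ORDER-page chunks from a
caller-chosen PMEM window could look like the sketch below (the names
pmem_alloc_info and alloc_pmem_mfns are made up for this example and are
not part of the patch; the window is assumed to be suitably aligned):

    /* Hypothetical bookkeeping for a PMEM window [cur, emfn). */
    struct pmem_alloc_info {
        unsigned long cur, emfn;
    };

    static unsigned long alloc_pmem_mfns(void *opaque)
    {
        struct pmem_alloc_info *info = opaque;
        unsigned long mfn = info->cur;

        /* Fail if fewer than 2^PAGETABLE_ORDER pages are left. */
        if ( mfn + (1UL << PAGETABLE_ORDER) > info->emfn )
            return mfn_x(INVALID_MFN);

        info->cur += 1UL << PAGETABLE_ORDER;

        return mfn;
    }

A struct mem_hotadd_alloc would then be initialised with
.alloc_mfns = alloc_pmem_mfns and .opaque pointing at the bookkeeping
structure before being passed to memory_add_common().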

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 83 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 69 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 90341267d9..36dcb3f1cb 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -106,13 +106,44 @@ struct mem_hotadd_info
     unsigned long cur;
 };
 
+struct mem_hotadd_alloc
+{
+    /*
+     * Allocate 2^PAGETABLE_ORDER pages.
+     *
+     * No free function is added right now, so we require that, if
+     * memory_add_common() fails, all allocated pages either can be
+     * reclaimed easily or have no effect outside memory_add_common().
+     *
+     * For example, alloc_hotadd_mfn(), which is used in RAM hotplug,
+     * allocates pages from the hotplugged RAM. If memory_add_common()
+     * fails, the hotplugged RAM will not be available to Xen, so
+     * pages allocated by alloc_hotadd_mfn() will never be used and
+     * have no effect.
+     *
+     * Parameters:
+     *  opaque:   arguments of the allocator (depending on the implementation)
+     *
+     * Return:
+     *  On success, return MFN of the first page.
+     *  Otherwise, return mfn_x(INVALID_MFN).
+     */
+    unsigned long (*alloc_mfns)(void *opaque);
+
+    /*
+     * Additional arguments passed to @alloc_mfns().
+     */
+    void *opaque;
+};
+
 static int hotadd_mem_valid(unsigned long pfn, struct mem_hotadd_info *info)
 {
     return (pfn < info->epfn && pfn >= info->spfn);
 }
 
-static unsigned long alloc_hotadd_mfn(struct mem_hotadd_info *info)
+static unsigned long alloc_hotadd_mfn(void *opaque)
 {
+    struct mem_hotadd_info *info = opaque;
     unsigned mfn;
 
     ASSERT((info->cur + ( 1UL << PAGETABLE_ORDER) < info->epfn) &&
@@ -315,7 +346,8 @@ static void destroy_m2p_mapping(struct mem_hotadd_info *info)
  * spfn/epfn: the pfn ranges to be setup
  * free_s/free_e: the pfn ranges that is free still
  */
-static int setup_compat_m2p_table(struct mem_hotadd_info *info)
+static int setup_compat_m2p_table(struct mem_hotadd_info *info,
+                                  struct mem_hotadd_alloc *alloc)
 {
     unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
     unsigned int n;
@@ -369,7 +401,13 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
         if ( n == CNT )
             continue;
 
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc->alloc_mfns(alloc->opaque);
+        if ( mfn == mfn_x(INVALID_MFN) )
+        {
+            err = -ENOMEM;
+            break;
+        }
+
         err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -389,7 +427,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
  * Allocate and map the machine-to-phys table.
  * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
  */
-static int setup_m2p_table(struct mem_hotadd_info *info)
+static int setup_m2p_table(struct mem_hotadd_info *info,
+                           struct mem_hotadd_alloc *alloc)
 {
     unsigned long i, va, smap, emap;
     unsigned int n;
@@ -438,7 +477,13 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
                 break;
         if ( n < CNT )
         {
-            unsigned long mfn = alloc_hotadd_mfn(info);
+            unsigned long mfn = alloc->alloc_mfns(alloc->opaque);
+
+            if ( mfn == mfn_x(INVALID_MFN) )
+            {
+                ret = -ENOMEM;
+                goto error;
+            }
 
             ret = map_pages_to_xen(
                         RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
@@ -483,7 +528,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 #undef CNT
 #undef MFN
 
-    ret = setup_compat_m2p_table(info);
+    ret = setup_compat_m2p_table(info, alloc);
 error:
     return ret;
 }
@@ -762,7 +807,7 @@ static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 }
 
 static int setup_frametable_chunk(void *start, void *end,
-                                  struct mem_hotadd_info *info)
+                                  struct mem_hotadd_alloc *alloc)
 {
     unsigned long s = (unsigned long)start;
     unsigned long e = (unsigned long)end;
@@ -774,7 +819,13 @@ static int setup_frametable_chunk(void *start, void *end,
 
     for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
     {
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc->alloc_mfns(alloc->opaque);
+        if ( mfn == mfn_x(INVALID_MFN) )
+        {
+            err = -ENOMEM;
+            break;
+        }
+
         err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -789,7 +840,8 @@ static int setup_frametable_chunk(void *start, void *end,
     return err;
 }
 
-static int extend_frame_table(struct mem_hotadd_info *info)
+static int extend_frame_table(struct mem_hotadd_info *info,
+                              struct mem_hotadd_alloc *alloc)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
     int err = 0;
@@ -816,7 +868,7 @@ static int extend_frame_table(struct mem_hotadd_info *info)
             nidx = eidx;
         err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
-                                     info);
+                                     alloc);
         if ( err )
             break;
 
@@ -1338,7 +1390,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
 }
 
 static int memory_add_common(struct mem_hotadd_info *info,
-                             unsigned int pxm, bool direct_map)
+                             unsigned int pxm, bool direct_map,
+                             struct mem_hotadd_alloc *alloc)
 {
     unsigned long spfn = info->spfn, epfn = info->epfn;
     int ret;
@@ -1402,7 +1455,7 @@ static int memory_add_common(struct mem_hotadd_info *info,
             NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
     }
 
-    ret = extend_frame_table(info);
+    ret = extend_frame_table(info, alloc);
     if ( ret )
         goto restore_node_status;
 
@@ -1415,7 +1468,7 @@ static int memory_add_common(struct mem_hotadd_info *info,
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(info);
+    ret = setup_m2p_table(info, alloc);
 
     if ( ret )
         goto destroy_m2p;
@@ -1465,11 +1518,13 @@ destroy_directmap:
 int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 {
     struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    struct mem_hotadd_alloc alloc =
+        { .alloc_mfns = alloc_hotadd_mfn, .opaque = &info };
     int ret;
 
     dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
 
-    ret = memory_add_common(&info, pxm, true);
+    ret = memory_add_common(&info, pxm, true, &alloc);
     if ( !ret )
     {
         /* We can't revert any more */
-- 
2.15.1



* [RFC XEN PATCH v4 15/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (13 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 14/41] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 16/41] tools/xl: accept all bases in parse_ulong() Haozhong Zhang
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Add a command XEN_SYSCTL_nvdimm_pmem_setup to hypercall
XEN_SYSCTL_nvdimm_op to set up the frame table and M2P table of a PMEM
region. This command is currently used to set up the management PMEM
region, which stores the frame table and M2P table of other PMEM
regions and of itself. The management PMEM region should not be mapped
to guest.

PMEM pages are not added to any Xen or domain heap. A new flag
PGC_pmem_page is used to indicate whether a page is from PMEM and to
avoid returning PMEM pages to the heaps.
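
As a rough illustration of the sizing this implies (a sketch only,
mirroring the check_mgmt_size() check added below; the 64-byte
struct page_info and 8-byte M2P entry sizes are assumed x86-64 values,
and no rounding or alignment is applied), the management region must
hold one frame-table entry plus one M2P entry per PMEM page it manages:

    /*
     * Sketch: rough lower bound on the number of management PMEM pages
     * needed to cover 'total_mfns' PMEM pages.  PAGE_INFO_SIZE and
     * M2P_ENTRY_SIZE are assumed values, not taken from Xen headers.
     */
    #include <stdio.h>

    #define PAGE_SHIFT      12
    #define PAGE_INFO_SIZE  64UL  /* assumed sizeof(struct page_info) */
    #define M2P_ENTRY_SIZE   8UL  /* assumed M2P entry size */

    static unsigned long mgmt_pages_needed(unsigned long total_mfns)
    {
        return ((PAGE_INFO_SIZE * total_mfns) >> PAGE_SHIFT) +
               ((M2P_ENTRY_SIZE * total_mfns) >> PAGE_SHIFT);
    }

    int main(void)
    {
        unsigned long total_mfns = 0x400000UL; /* e.g. 16 GiB of 4 KiB pages */

        printf("~0x%lx management pages for 0x%lx PMEM pages\n",
               mgmt_pages_needed(total_mfns), total_mfns);
        return 0;
    }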

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/include/xenctrl.h |  16 +++++
 tools/libxc/xc_misc.c         |  33 ++++++++++
 xen/arch/x86/mm.c             |   3 +-
 xen/arch/x86/x86_64/mm.c      |  72 +++++++++++++++++++++
 xen/common/pmem.c             | 142 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h      |  10 ++-
 xen/include/public/sysctl.h   |  18 ++++++
 xen/include/xen/pmem.h        |   8 +++
 8 files changed, 300 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index e0adad1cf8..935885d6a7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2640,6 +2640,22 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
 int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
                                void *buffer, uint32_t *nr);
 
+/*
+ * Set up the specified PMEM pages for management usage. On success,
+ * these PMEM pages can be used to store the frametable and M2P table
+ * of themselves and other PMEM pages. These management PMEM pages will
+ * never be mapped to guest.
+ *
+ * Parameters:
+ *  xch:        xc interface handle
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 11befa444f..d6b30534d2 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -973,6 +973,39 @@ out:
     return rc;
 }
 
+static void xc_nvdimm_pmem_setup_common(struct xen_sysctl *sysctl,
+                                        unsigned long smfn, unsigned long emfn,
+                                        unsigned long mgmt_smfn,
+                                        unsigned long mgmt_emfn)
+{
+    struct xen_sysctl_nvdimm_op *nvdimm = &sysctl->u.nvdimm;
+    xen_sysctl_nvdimm_pmem_setup_t *setup = &nvdimm->u.pmem_setup;
+
+    sysctl->cmd = XEN_SYSCTL_nvdimm_op;
+    nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_setup;
+    nvdimm->err = 0;
+    setup->smfn = smfn;
+    setup->emfn = emfn;
+    setup->mgmt_smfn = mgmt_smfn;
+    setup->mgmt_emfn = mgmt_emfn;
+}
+
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn)
+{
+    DECLARE_SYSCTL;
+    int rc;
+
+    xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, smfn, emfn);
+    sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_MGMT;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( rc && sysctl.u.nvdimm.err )
+        rc = -sysctl.u.nvdimm.err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b2046ca2f0..9a224cf1bb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2306,7 +2306,8 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
+             && cleanup_page_cacheattr(page) == 0 )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 36dcb3f1cb..7bd2c9a9af 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1535,6 +1535,78 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     return ret;
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static void pmem_init_frame_table(unsigned long smfn, unsigned long emfn)
+{
+    struct page_info *page = mfn_to_page(smfn), *epage = mfn_to_page(emfn);
+
+    while ( page < epage )
+    {
+        page->count_info = PGC_state_free | PGC_pmem_page;
+        page++;
+    }
+}
+
+/**
+ * Initialize frametable and M2P for the specified PMEM region.
+ *
+ * Parameters:
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and end MFN of the PMEM region used to store
+ *              the frame table and M2P table of above PMEM region. If
+ *              @smfn - @emfn is going to be mapped to guest, it should
+ *              not overlap with @mgmt_smfn - @mgmt_emfn. If @smfn - @emfn
+ *              is going to be used for management purpose, it should
+ *              be identical to @mgmt_smfn - @mgmt_emfn.
+ *  used_mgmt_mfns: return the number of pages used in @mgmt_smfn - @mgmt_emfn
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
+                    unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                    unsigned long *used_mgmt_mfns)
+{
+    struct mem_hotadd_info info =
+        { .spfn = smfn, .epfn = emfn, .cur = smfn };
+    struct mem_hotadd_info mgmt_info =
+        { .spfn = mgmt_smfn, .epfn = mgmt_emfn, .cur = mgmt_smfn };
+    struct mem_hotadd_alloc alloc =
+    {
+        .alloc_mfns = alloc_hotadd_mfn,
+        .opaque     = &mgmt_info
+    };
+    bool is_mgmt = (mgmt_smfn == smfn && mgmt_emfn == emfn);
+    int rc;
+
+    if ( mgmt_smfn == mfn_x(INVALID_MFN) || mgmt_emfn == mfn_x(INVALID_MFN) ||
+         mgmt_smfn >= mgmt_emfn )
+        return -EINVAL;
+
+    if ( !is_mgmt &&
+         ((smfn >= mgmt_smfn && smfn < mgmt_emfn) ||
+          (emfn > mgmt_smfn && emfn <= mgmt_emfn)) )
+        return -EINVAL;
+
+    rc = memory_add_common(&info, pxm, false, &alloc);
+    if ( rc )
+        return rc;
+
+    pmem_init_frame_table(smfn, emfn);
+
+    if ( !is_mgmt )
+        share_hotadd_m2p_table(&info);
+
+    if ( used_mgmt_mfns )
+        *used_mgmt_mfns = mgmt_info.cur - mgmt_info.spfn;
+
+    return 0;
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #include "compat/mm.c"
 
 /*
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 0afc1573c6..936cf1423f 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -31,6 +31,15 @@
 static LIST_HEAD(pmem_raw_regions);
 static unsigned int nr_raw_regions;
 
+/*
+ * All PMEM regions reserved for management purpose are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions.
+ */
+static LIST_HEAD(pmem_mgmt_regions);
+static DEFINE_SPINLOCK(pmem_mgmt_lock);
+static unsigned int nr_mgmt_regions;
+
 struct pmem {
     struct list_head link; /* link to one of PMEM region list */
     unsigned long smfn;    /* start MFN of the PMEM region */
@@ -40,6 +49,10 @@ struct pmem {
         struct {
             unsigned int pxm; /* proximity domain of the PMEM region */
         } raw;
+
+        struct {
+            unsigned long used; /* # of used pages in MGMT PMEM region */
+        } mgmt;
     } u;
 };
 
@@ -99,6 +112,18 @@ static int pmem_list_add(struct list_head *list,
     return 0;
 }
 
+/**
+ * Delete the specified entry from the list to which it's currently linked.
+ *
+ * Parameters:
+ *  entry: the entry to be deleted
+ */
+static void pmem_list_del(struct pmem *entry)
+{
+    list_del(&entry->link);
+    xfree(entry);
+}
+
 static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
 {
     int rc = 0;
@@ -177,6 +202,114 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
     return rc;
 }
 
+static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
+{
+    return mgmt_mfns >=
+        ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
+        ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+}
+
+static bool check_address_and_pxm(unsigned long smfn, unsigned long emfn,
+                                  unsigned int *ret_pxm)
+{
+    struct list_head *cur;
+    long pxm = -1;
+
+    list_for_each(cur, &pmem_raw_regions)
+    {
+        struct pmem *raw = list_entry(cur, struct pmem, link);
+        unsigned long raw_smfn = raw->smfn, raw_emfn = raw->emfn;
+
+        if ( !check_overlap(smfn, emfn, raw_smfn, raw_emfn) )
+            continue;
+
+        if ( smfn < raw_smfn )
+            return false;
+
+        if ( pxm != -1 && pxm != raw->u.raw.pxm )
+            return false;
+        pxm = raw->u.raw.pxm;
+
+        smfn = min(emfn, raw_emfn);
+        if ( smfn == emfn )
+            break;
+    }
+
+    *ret_pxm = pxm;
+
+    return smfn == emfn;
+}
+
+static int pmem_setup_mgmt(unsigned long smfn, unsigned long emfn)
+{
+    struct pmem *mgmt;
+    unsigned long used_mgmt_mfns;
+    unsigned int pxm;
+    int rc;
+
+    if ( smfn == mfn_x(INVALID_MFN) || emfn == mfn_x(INVALID_MFN) ||
+         smfn >= emfn )
+        return -EINVAL;
+
+    /*
+     * Require the PMEM region in one proximity domain, in order to
+     * avoid the error recovery from multiple calls to pmem_arch_setup()
+     * which is not revertible.
+     */
+    if ( !check_address_and_pxm(smfn, emfn, &pxm) )
+        return -EINVAL;
+
+    if ( !check_mgmt_size(emfn - smfn, emfn - smfn) )
+        return -ENOSPC;
+
+    spin_lock(&pmem_mgmt_lock);
+
+    rc = pmem_list_add(&pmem_mgmt_regions, smfn, emfn, &mgmt);
+    if ( rc )
+        goto out;
+
+    rc = pmem_arch_setup(smfn, emfn, pxm, smfn, emfn, &used_mgmt_mfns);
+    if ( rc )
+    {
+        pmem_list_del(mgmt);
+        goto out;
+    }
+
+    mgmt->u.mgmt.used = used_mgmt_mfns;
+    nr_mgmt_regions++;
+
+ out:
+    spin_unlock(&pmem_mgmt_lock);
+
+    return rc;
+}
+
+static int pmem_setup(unsigned long smfn, unsigned long emfn,
+                      unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                      unsigned int type)
+{
+    int rc;
+
+    switch ( type )
+    {
+    case PMEM_REGION_TYPE_MGMT:
+        if ( smfn != mgmt_smfn || emfn != mgmt_emfn )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = pmem_setup_mgmt(smfn, emfn);
+
+        break;
+
+    default:
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -226,6 +359,15 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
         rc = pmem_get_regions(&nvdimm->u.pmem_regions);
         break;
 
+    case XEN_SYSCTL_nvdimm_pmem_setup:
+    {
+        struct xen_sysctl_nvdimm_pmem_setup *setup = &nvdimm->u.pmem_setup;
+        rc = pmem_setup(setup->smfn, setup->emfn,
+                        setup->mgmt_smfn, setup->mgmt_emfn,
+                        setup->type);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 83626085e0..19472e324f 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -256,9 +256,11 @@ struct page_info
 #define PGC_state_offlined PG_mask(2, 9)
 #define PGC_state_free    PG_mask(3, 9)
 #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+/* Page is from PMEM? */
+#define PGC_pmem_page     PG_mask(1, 10)
 
  /* Count of references to this frame. */
-#define PGC_count_width   PG_shift(9)
+#define PGC_count_width   PG_shift(10)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
 /*
@@ -275,6 +277,12 @@ struct page_info
     ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) &&  \
      (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
 
+#ifdef CONFIG_NVDIMM_PMEM
+#define is_pmem_page(page) ((page)->count_info & PGC_pmem_page)
+#else
+#define is_pmem_page(page) false
+#endif
+
 #define PRtype_info "016lx"/* should only be used for printk's */
 
 /* The number of out-of-sync shadows we allow per vcpu (prime, please) */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 9b2a65fcb9..7c889cad58 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1051,6 +1051,7 @@ struct xen_sysctl_set_parameter {
 
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
+#define PMEM_REGION_TYPE_MGMT       1 /* PMEM regions for management usage */
 
 /* PMEM_REGION_TYPE_RAW */
 struct xen_sysctl_nvdimm_pmem_raw_region {
@@ -1085,14 +1086,31 @@ struct xen_sysctl_nvdimm_pmem_regions {
 typedef struct xen_sysctl_nvdimm_pmem_regions xen_sysctl_nvdimm_pmem_regions_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_t);
 
+/* XEN_SYSCTL_nvdimm_pmem_setup */
+struct xen_sysctl_nvdimm_pmem_setup {
+    /* IN variables */
+    uint64_t smfn;      /* start MFN of the PMEM region */
+    uint64_t emfn;      /* end MFN of the PMEM region */
+    uint64_t mgmt_smfn;
+    uint64_t mgmt_emfn; /* start and end MFN of PMEM pages used to manage */
+                        /* above PMEM region. If the above PMEM region is */
+                        /* a management region, mgmt_{s,e}mfn is required */
+                        /* to be identical to {s,e}mfn. */
+    uint8_t  type;      /* Only PMEM_REGION_TYPE_MGMT is supported now */
+};
+typedef struct xen_sysctl_nvdimm_pmem_setup xen_sysctl_nvdimm_pmem_setup_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_setup_t);
+
 struct xen_sysctl_nvdimm_op {
     uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
 #define XEN_SYSCTL_nvdimm_pmem_get_regions_nr     0
 #define XEN_SYSCTL_nvdimm_pmem_get_regions        1
+#define XEN_SYSCTL_nvdimm_pmem_setup              2
     uint32_t err; /* OUT: error code */
     union {
         xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
         xen_sysctl_nvdimm_pmem_regions_t pmem_regions;
+        xen_sysctl_nvdimm_pmem_setup_t pmem_setup;
     } u;
 };
 
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 922b12f570..9323d679a6 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -29,6 +29,9 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 #ifdef CONFIG_X86
 
 int pmem_dom0_setup_permission(struct domain *d);
+int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
+                    unsigned long mgmt_smfn, unsigned long mgmt_emfn,
+                    unsigned long *used_mgmt_mfns);
 
 #else /* !CONFIG_X86 */
 
@@ -37,6 +40,11 @@ static inline int pmem_dom0_setup_permission(...)
     return -ENOSYS;
 }
 
+static inline int pmem_arch_setup(...)
+{
+    return -ENOSYS;
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 16/41] tools/xl: accept all bases in parse_ulong()
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (14 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 15/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 17/41] tools/xl: expose parse_ulong() Haozhong Zhang
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/xl/xl_parse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 9a692d5ae6..81a50f2edb 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -387,7 +387,7 @@ static unsigned long parse_ulong(const char *str)
     char *endptr;
     unsigned long val;
 
-    val = strtoul(str, &endptr, 10);
+    val = strtoul(str, &endptr, 0);
     if (endptr == str || val == ULONG_MAX) {
         fprintf(stderr, "xl: failed to convert \"%s\" to number\n", str);
         exit(EXIT_FAILURE);
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 17/41] tools/xl: expose parse_ulong()
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (15 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 16/41] tools/xl: accept all bases in parse_ulong() Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 18/41] tools/xl: add xl command 'pmem-setup' Haozhong Zhang
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/xl/xl_parse.c | 2 +-
 tools/xl/xl_parse.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 81a50f2edb..993b754c0a 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -382,7 +382,7 @@ static void parse_vcpu_affinity(libxl_domain_build_info *b_info,
     }
 }
 
-static unsigned long parse_ulong(const char *str)
+unsigned long parse_ulong(const char *str)
 {
     char *endptr;
     unsigned long val;
diff --git a/tools/xl/xl_parse.h b/tools/xl/xl_parse.h
index cc459fb43f..1a4f12b0f3 100644
--- a/tools/xl/xl_parse.h
+++ b/tools/xl/xl_parse.h
@@ -21,6 +21,7 @@ void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
                        libxl_domain_config *d_config);
+unsigned long parse_ulong(const char *str);
 int parse_range(const char *str, unsigned long *a, unsigned long *b);
 int64_t parse_mem_size_kb(const char *mem);
 void parse_disk_config(XLU_Config **config, const char *spec,
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 18/41] tools/xl: add xl command 'pmem-setup'
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (16 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 17/41] tools/xl: expose parse_ulong() Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 19/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The new xl command 'pmem-setup' with the '-m' option is used to set up
the specified PMEM region for management usage.
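
For reference, a minimal sketch of driving the underlying libxl call
directly (the MFN range below is a placeholder; a real caller would
take it from the output of 'xl pmem-list'):

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <libxl.h>

    /* Sketch: mark the PMEM pages [smfn, emfn) for management usage.
     * The MFN values are placeholders, not from a real system. */
    static int setup_mgmt_example(libxl_ctx *ctx)
    {
        unsigned long smfn = 0x480000UL, emfn = 0x4a0000UL;
        int rc = libxl_nvdimm_pmem_setup_mgmt(ctx, smfn, emfn);

        if (rc)
            fprintf(stderr, "pmem-setup -m failed: %s\n", strerror(errno));
        return rc;
    }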

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl.h        | 13 ++++++++++++
 tools/libxl/libxl_nvdimm.c | 11 ++++++++++
 tools/xl/xl.h              |  1 +
 tools/xl/xl_cmdtable.c     |  6 ++++++
 tools/xl/xl_nvdimm.c       | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 82 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 9ce487e79f..e13a911cb4 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2324,6 +2324,19 @@ int libxl_nvdimm_pmem_get_regions(libxl_ctx *ctx,
                                   libxl_nvdimm_pmem_region **regions_r,
                                   unsigned int *nr_r);
 
+/*
+ * Setup the specified PMEM region for management usage.
+ *
+ * Parameters:
+ *  ctx:        libxl context
+ *  smfn, emfn: start and end MFN's of the PMEM region
+ *
+ * Return:
+ *  0 on success; otherwise, ERROR_*, and leave errno valid.
+ */
+int libxl_nvdimm_pmem_setup_mgmt(libxl_ctx *ctx,
+                                 unsigned long smfn, unsigned long emfn);
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index 70da18f11f..c0024298ec 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -136,3 +136,14 @@ int libxl_nvdimm_pmem_get_regions(libxl_ctx *ctx,
 
     return rc;
 }
+
+int libxl_nvdimm_pmem_setup_mgmt(libxl_ctx *ctx,
+                                 unsigned long smfn, unsigned long emfn)
+{
+    int rc = xc_nvdimm_pmem_setup_mgmt(ctx->xch, smfn, emfn);
+
+    if (rc)
+        errno = -rc;
+
+    return errno ? ERROR_FAIL : 0;
+}
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 9359a3d9c7..8995f64a6f 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -211,6 +211,7 @@ int main_psr_cat_show(int argc, char **argv);
 #endif
 int main_qemu_monitor_command(int argc, char **argv);
 int main_pmem_list(int argc, char **argv);
+int main_pmem_setup(int argc, char **argv);
 
 void help(const char *command);
 
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index f525cafcdf..12a2c2d601 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -622,6 +622,12 @@ struct cmd_spec cmd_table[] = {
       "[options]",
       "-r, --raw   List PMEM regions detected by Xen hypervisor\n"
     },
+    { "pmem-setup",
+      &main_pmem_setup, 0, 1,
+      "Setup a PMEM region for specified usage purpose",
+      "[options]",
+      "-m, --mgmt <smfn> <emfn>  Set PMEM pages smfn - emfn for management usage\n"
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
diff --git a/tools/xl/xl_nvdimm.c b/tools/xl/xl_nvdimm.c
index 799c76e4c2..25dc6350da 100644
--- a/tools/xl/xl_nvdimm.c
+++ b/tools/xl/xl_nvdimm.c
@@ -24,8 +24,10 @@
 #include <string.h>
 
 #include <libxl.h>
+#include <libxlutil.h>
 
 #include "xl.h"
+#include "xl_parse.h"
 #include "xl_utils.h"
 
 typedef void (*show_region_fn_t)(libxl_nvdimm_pmem_region *region,
@@ -90,3 +92,52 @@ int main_pmem_list(int argc, char **argv)
 
     return ret;
 }
+
+int main_pmem_setup(int argc, char **argv)
+{
+    static struct option opts[] = {
+        { "mgmt", 1, 0, 'm' },
+        COMMON_LONG_OPTS
+    };
+
+    bool mgmt = false;
+    unsigned long mgmt_smfn, mgmt_emfn;
+    int opt, rc = 0;
+
+#define CHECK_NR_ARGS(expected, option)                                 \
+    do {                                                                \
+        if (argc + 1 != optind + (expected)) {                          \
+            fprintf(stderr,                                             \
+                    "Error: 'xl pmem-setup %s' requires %u arguments\n\n", \
+                    (option), (expected));                              \
+            help("pmem-setup");                                         \
+                                                                        \
+            rc = ERROR_INVAL;                                           \
+            errno = EINVAL;                                             \
+                                                                        \
+            goto out;                                                   \
+        }                                                               \
+    } while (0)
+
+    SWITCH_FOREACH_OPT(opt, "m:", opts, "pmem-setup", 0) {
+    case 'm':
+        CHECK_NR_ARGS(2, "-m");
+
+        mgmt = true;
+        mgmt_smfn = parse_ulong(optarg);
+        mgmt_emfn = parse_ulong(argv[optind]);
+
+        break;
+    }
+
+#undef CHECK_NR_ARGS
+
+    if (mgmt)
+        rc = libxl_nvdimm_pmem_setup_mgmt(ctx, mgmt_smfn, mgmt_emfn);
+
+ out:
+    if (rc)
+        fprintf(stderr, "Error: pmem-setup failed, %s\n", strerror(errno));
+
+    return rc;
+}
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 19/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (17 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 18/41] tools/xl: add xl command 'pmem-setup' Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 20/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
management PMEM regions.
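
A minimal sketch of the corresponding libxc call (assuming a caller
that already holds an open xc_interface handle and sees the public
sysctl definitions, as libxc itself does):

    #include <stdio.h>
    #include <stdint.h>
    #include <xenctrl.h>

    /* Sketch: print how many management PMEM regions Xen reports. */
    static void print_mgmt_region_count(xc_interface *xch)
    {
        uint32_t nr = 0;

        if (xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT, &nr))
            fprintf(stderr, "failed to query management region count\n");
        else
            printf("%u management PMEM region(s)\n", nr);
    }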

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/xc_misc.c | 4 +++-
 xen/common/pmem.c     | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index d6b30534d2..bc0be2e1ae 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -894,7 +894,9 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
     struct xen_sysctl_nvdimm_op *nvdimm = &sysctl.u.nvdimm;
     int rc;
 
-    if ( !nr || type != PMEM_REGION_TYPE_RAW )
+    if ( !nr ||
+         (type != PMEM_REGION_TYPE_RAW &&
+          type != PMEM_REGION_TYPE_MGMT) )
         return -EINVAL;
 
     sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 936cf1423f..4de03f6f2d 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -134,6 +134,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
         regions_nr->num_regions = nr_raw_regions;
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        regions_nr->num_regions = nr_mgmt_regions;
+        break;
+
     default:
         rc = -EINVAL;
     }
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 20/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (18 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 19/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 21/41] tools/xl: add option '--mgmt | -m' to xl command pmem-list Haozhong Zhang
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of
management PMEM regions.
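
A minimal sketch of retrieving and printing the management regions
through the extended libxc wrapper (the fixed upper bound of 16 entries
is an assumption for the sketch; a real caller would size the buffer
from xc_nvdimm_pmem_get_regions_nr()):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <xenctrl.h>

    /* Sketch: dump smfn/emfn/used_mfns of each management PMEM region. */
    static void dump_mgmt_regions(xc_interface *xch)
    {
        uint32_t i, nr = 16; /* assumed upper bound for this sketch */
        xen_sysctl_nvdimm_pmem_mgmt_region_t *buf = calloc(nr, sizeof(*buf));

        if (!buf)
            return;

        if (!xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_MGMT, buf, &nr))
            for (i = 0; i < nr; i++)
                printf("%u: mfn 0x%"PRIx64" - 0x%"PRIx64", used 0x%"PRIx64"\n",
                       i, buf[i].smfn, buf[i].emfn, buf[i].used_mfns);

        free(buf);
    }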

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/xc_misc.c       |  8 ++++++++
 xen/common/pmem.c           | 45 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 11 +++++++++++
 3 files changed, 64 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index bc0be2e1ae..77f93ffd9a 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -938,6 +938,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
+        break;
+
     default:
         return -EINVAL;
     }
@@ -958,6 +962,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
+        break;
+
     default:
         rc = -EINVAL;
         goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 4de03f6f2d..e286d033f2 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -182,6 +182,47 @@ static int pmem_get_raw_regions(
     return rc;
 }
 
+static int pmem_get_mgmt_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_mgmt_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    spin_lock(&pmem_mgmt_lock);
+
+    list_for_each(cur, &pmem_mgmt_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.used_mfns = pmem->u.mgmt.used;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    spin_unlock(&pmem_mgmt_lock);
+
+    *num_regions = nr;
+
+    return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
     unsigned int type = regions->type, max = regions->num_regions;
@@ -196,6 +237,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
         rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
         break;
 
+    case PMEM_REGION_TYPE_MGMT:
+        rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7c889cad58..703dd860e7 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1062,6 +1062,15 @@ struct xen_sysctl_nvdimm_pmem_raw_region {
 typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
 
+/* PMEM_REGION_TYPE_MGMT */
+struct xen_sysctl_nvdimm_pmem_mgmt_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint64_t used_mfns;
+};
+typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1080,6 +1089,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
     union {
         /* if type == PMEM_REGION_TYPE_RAW */
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+        /* if type == PMEM_REGION_TYPE_MGMT */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
     } u_buffer;           /* IN: the guest handler where the entries of PMEM
                                  regions of the type @type are returned */
 };
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 21/41] tools/xl: add option '--mgmt | -m' to xl command pmem-list
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (19 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 20/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 22/41] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

'xl pmem-list --mgmt | -m' is used to list all management regions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_nvdimm.c  |  4 ++++
 tools/libxl/libxl_types.idl |  8 ++++++++
 tools/xl/xl_cmdtable.c      |  1 +
 tools/xl/xl_nvdimm.c        | 22 ++++++++++++++++++++--
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index c0024298ec..896b5632ac 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -31,6 +31,7 @@
  */
 static size_t xc_pmem_region_struct_size[] = {
     [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = sizeof(libxl_nvdimm_pmem_raw_region),
+    [LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT] = sizeof(libxl_nvdimm_pmem_mgmt_region),
 };
 
 static int get_xc_region_type(libxl_nvdimm_pmem_region_type type,
@@ -38,6 +39,7 @@ static int get_xc_region_type(libxl_nvdimm_pmem_region_type type,
 {
     static uint8_t xc_region_types[] = {
         [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = PMEM_REGION_TYPE_RAW,
+        [LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT] = PMEM_REGION_TYPE_MGMT,
     };
     static unsigned int nr_types =
         sizeof(xc_region_types) / sizeof(xc_region_types[0]);
@@ -62,6 +64,8 @@ static void copy_from_xc_regions(libxl_nvdimm_pmem_region *tgt_regions,
 
     BUILD_BUG_ON(sizeof(libxl_nvdimm_pmem_raw_region) !=
                  sizeof(xen_sysctl_nvdimm_pmem_raw_region_t));
+    BUILD_BUG_ON(sizeof(libxl_nvdimm_pmem_mgmt_region) !=
+                 sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t));
 
     while (tgt < end) {
         memcpy((void *)tgt + offset, src, size);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1c7b8998e9..22478657ff 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -1044,6 +1044,7 @@ libxl_psr_cat_info = Struct("psr_cat_info", [
 
 libxl_nvdimm_pmem_region_type = Enumeration("nvdimm_pmem_region_type", [
     (0, "RAW"),
+    (1, "MGMT"),
     ])
 
 libxl_nvdimm_pmem_raw_region = Struct("nvdimm_pmem_raw_region", [
@@ -1052,8 +1053,15 @@ libxl_nvdimm_pmem_raw_region = Struct("nvdimm_pmem_raw_region", [
     ("pxm", uint32),
     ])
 
+libxl_nvdimm_pmem_mgmt_region = Struct("nvdimm_pmem_mgmt_region", [
+    ("smfn", uint64),
+    ("emfn", uint64),
+    ("used", uint64),
+    ])
+
 libxl_nvdimm_pmem_region = Struct("nvdimm_pmem_region", [
     ("u", KeyedUnion(None, libxl_nvdimm_pmem_region_type, "type",
                      [("raw", libxl_nvdimm_pmem_raw_region),
+                      ("mgmt", libxl_nvdimm_pmem_mgmt_region),
                      ])),
     ])
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 12a2c2d601..8a0b58493d 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -621,6 +621,7 @@ struct cmd_spec cmd_table[] = {
       "List PMEM regions of specified types, or all PMEM regions if no type is specified",
       "[options]",
       "-r, --raw   List PMEM regions detected by Xen hypervisor\n"
+      "-m, --mgmt  List PMEM regions used fro management\n"
     },
     { "pmem-setup",
       &main_pmem_setup, 0, 1,
diff --git a/tools/xl/xl_nvdimm.c b/tools/xl/xl_nvdimm.c
index 25dc6350da..e42e7a3640 100644
--- a/tools/xl/xl_nvdimm.c
+++ b/tools/xl/xl_nvdimm.c
@@ -41,8 +41,17 @@ static void show_raw_region(libxl_nvdimm_pmem_region *region, unsigned int idx)
            idx, raw->smfn, raw->emfn, raw->pxm);
 }
 
+static void show_mgmt_region(libxl_nvdimm_pmem_region *region, unsigned int idx)
+{
+    libxl_nvdimm_pmem_mgmt_region *mgmt = &region->u.mgmt;
+
+    printf(" %u: mfn 0x%lx - 0x%lx, used 0x%lx pages\n",
+           idx, mgmt->smfn, mgmt->emfn, mgmt->used);
+}
+
 static show_region_fn_t show_region_fn[] = {
     [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = show_raw_region,
+    [LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT] = show_mgmt_region,
 };
 
 static int list_regions(libxl_nvdimm_pmem_region_type type)
@@ -74,22 +83,31 @@ int main_pmem_list(int argc, char **argv)
 {
     static struct option opts[] = {
         { "raw", 0, 0, 'r' },
+        { "mgmt", 0, 0, 'm' },
         COMMON_LONG_OPTS
     };
 
-    bool all = true, raw = false;
+    bool all = true, raw = false, mgmt = false;
     int opt, ret = 0;
 
-    SWITCH_FOREACH_OPT(opt, "r", opts, "pmem-list", 0) {
+    SWITCH_FOREACH_OPT(opt, "rm", opts, "pmem-list", 0) {
     case 'r':
         all = false;
         raw = true;
         break;
+
+    case 'm':
+        all = false;
+        mgmt = true;
+        break;
     }
 
     if (all || raw)
         ret = list_regions(LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW);
 
+    if (!ret && (all || mgmt))
+        ret = list_regions(LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT);
+
     return ret;
 }
 
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 22/41] xen/pmem: support setup PMEM region for guest data usage
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (20 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 21/41] tools/xl: add option '--mgmt | -m' to xl command pmem-list Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 23/41] tools/xl: add option '--data | -d' to xl command pmem-setup Haozhong Zhang
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Allow the command XEN_SYSCTL_nvdimm_pmem_setup of hypercall
XEN_SYSCTL_nvdimm_op to set up a PMEM region for guest data
usage. After the setup, that PMEM region can be mapped to the
guest address space.
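
A minimal sketch of the corresponding libxc call (all MFN values are
placeholders; the management range must already have been set up via
xc_nvdimm_pmem_setup_mgmt() and must not overlap the data range):

    #include <stdio.h>
    #include <xenctrl.h>

    /* Sketch: set up [data_smfn, data_emfn) as a guest data PMEM region,
     * managed from the already-registered range [mgmt_smfn, mgmt_emfn). */
    static int setup_data_example(xc_interface *xch)
    {
        unsigned long data_smfn = 0x500000UL, data_emfn = 0x900000UL;
        unsigned long mgmt_smfn = 0x480000UL, mgmt_emfn = 0x4a0000UL;
        int rc = xc_nvdimm_pmem_setup_data(xch, data_smfn, data_emfn,
                                           mgmt_smfn, mgmt_emfn);

        if (rc)
            fprintf(stderr, "data region setup failed: %d\n", rc);
        return rc;
    }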

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/include/xenctrl.h |  22 ++++++++
 tools/libxc/xc_misc.c         |  17 ++++++
 xen/common/pmem.c             | 118 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/sysctl.h   |   3 +-
 4 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 935885d6a7..5194d3ff5e 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2656,6 +2656,28 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
                               unsigned long smfn, unsigned long emfn);
 
+/*
+ * Set up the specified PMEM pages for guest data usage. On success,
+ * these PMEM pages can be mapped to guest and be used as the backend
+ * of vNVDIMM devices.
+ *
+ * Parameters:
+ *  xch:        xc interface handle
+ *  smfn, emfn: the start and end of the PMEM region
+ *  mgmt_smfn,
+
+ *              used to manage this PMEM region. It must be in one of
+ *              those added by xc_nvdimm_pmem_setup_mgmt() calls, and
+ *              not overlap with @smfn - @emfn.
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn,
+                              unsigned long mgmt_smfn, unsigned long mgmt_emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 77f93ffd9a..940bf61931 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -1016,6 +1016,23 @@ int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
     return rc;
 }
 
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+                              unsigned long smfn, unsigned long emfn,
+                              unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    DECLARE_SYSCTL;
+    int rc;
+
+    xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, mgmt_smfn, mgmt_emfn);
+    sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_DATA;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( rc && sysctl.u.nvdimm.err )
+        rc = -sysctl.u.nvdimm.err;
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index e286d033f2..ed4eba7f64 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -34,16 +34,26 @@ static unsigned int nr_raw_regions;
 /*
  * All PMEM regions reserved for management purpose are linked to this
  * list. All of them must be covered by one or multiple PMEM regions
- * in list pmem_raw_regions.
+ * in list pmem_raw_regions, and not appear in list pmem_data_regions.
  */
 static LIST_HEAD(pmem_mgmt_regions);
 static DEFINE_SPINLOCK(pmem_mgmt_lock);
 static unsigned int nr_mgmt_regions;
 
+/*
+ * All PMEM regions that can be mapped to guest are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions, and not appear in list pmem_mgmt_regions.
+ */
+static LIST_HEAD(pmem_data_regions);
+static DEFINE_SPINLOCK(pmem_data_lock);
+static unsigned int nr_data_regions;
+
 struct pmem {
     struct list_head link; /* link to one of PMEM region list */
     unsigned long smfn;    /* start MFN of the PMEM region */
     unsigned long emfn;    /* end MFN of the PMEM region */
+    spinlock_t lock;
 
     union {
         struct {
@@ -53,6 +63,11 @@ struct pmem {
         struct {
             unsigned long used; /* # of used pages in MGMT PMEM region */
         } mgmt;
+
+        struct {
+            unsigned long mgmt_smfn; /* start MFN of management region */
+            unsigned long mgmt_emfn; /* end MFN of management region */
+        } data;
     } u;
 };
 
@@ -105,6 +120,7 @@ static int pmem_list_add(struct list_head *list,
 
     new_pmem->smfn = smfn;
     new_pmem->emfn = emfn;
+    spin_lock_init(&new_pmem->lock);
     list_add(&new_pmem->link, cur);
     if ( entry )
         *entry = new_pmem;
@@ -253,9 +269,16 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 
 static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
 {
-    return mgmt_mfns >=
+    unsigned long required =
         ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
         ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+
+    if ( required > mgmt_mfns )
+        printk(XENLOG_DEBUG "PMEM: insufficient management pages, "
+               "0x%lx pages required, 0x%lx pages available\n",
+               required, mgmt_mfns);
+
+    return mgmt_mfns >= required;
 }
 
 static bool check_address_and_pxm(unsigned long smfn, unsigned long emfn,
@@ -333,6 +356,93 @@ static int pmem_setup_mgmt(unsigned long smfn, unsigned long emfn)
     return rc;
 }
 
+static struct pmem *find_mgmt_region(unsigned long smfn, unsigned long emfn)
+{
+    struct list_head *cur;
+
+    ASSERT(spin_is_locked(&pmem_mgmt_lock));
+
+    list_for_each(cur, &pmem_mgmt_regions)
+    {
+        struct pmem *mgmt = list_entry(cur, struct pmem, link);
+
+        if ( smfn >= mgmt->smfn && emfn <= mgmt->emfn )
+            return mgmt;
+    }
+
+    return NULL;
+}
+
+static int pmem_setup_data(unsigned long smfn, unsigned long emfn,
+                           unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    struct pmem *data, *mgmt = NULL;
+    unsigned long used_mgmt_mfns;
+    unsigned int pxm;
+    int rc;
+
+    if ( smfn == mfn_x(INVALID_MFN) || emfn == mfn_x(INVALID_MFN) ||
+         smfn >= emfn )
+        return -EINVAL;
+
+    /*
+     * Require the PMEM region in one proximity domain, in order to
+     * avoid the error recovery from multiple calls to pmem_arch_setup()
+     * which is not revertible.
+     */
+    if ( !check_address_and_pxm(smfn, emfn, &pxm) )
+        return -EINVAL;
+
+    if ( mgmt_smfn == mfn_x(INVALID_MFN) || mgmt_emfn == mfn_x(INVALID_MFN) ||
+         mgmt_smfn >= mgmt_emfn )
+        return -EINVAL;
+
+    spin_lock(&pmem_mgmt_lock);
+    mgmt = find_mgmt_region(mgmt_smfn, mgmt_emfn);
+    if ( !mgmt )
+    {
+        spin_unlock(&pmem_mgmt_lock);
+        return -ENXIO;
+    }
+    spin_unlock(&pmem_mgmt_lock);
+
+    spin_lock(&mgmt->lock);
+
+    mgmt_smfn = mgmt->smfn + mgmt->u.mgmt.used;
+    if ( !check_mgmt_size(mgmt_emfn - mgmt_smfn, emfn - smfn) )
+    {
+        spin_unlock(&mgmt->lock);
+        return -ENOSPC;
+    }
+
+    spin_lock(&pmem_data_lock);
+
+    rc = pmem_list_add(&pmem_data_regions, smfn, emfn, &data);
+    if ( rc )
+        goto out;
+    data->u.data.mgmt_smfn = data->u.data.mgmt_emfn = mfn_x(INVALID_MFN);
+
+    rc = pmem_arch_setup(smfn, emfn, pxm,
+                         mgmt_smfn, mgmt_emfn, &used_mgmt_mfns);
+    if ( rc )
+    {
+        pmem_list_del(data);
+        goto out;
+    }
+
+    mgmt->u.mgmt.used = mgmt_smfn - mgmt->smfn + used_mgmt_mfns;
+    data->u.data.mgmt_smfn = mgmt_smfn;
+    data->u.data.mgmt_emfn = mgmt->smfn + mgmt->u.mgmt.used;
+
+    nr_data_regions++;
+
+ out:
+    spin_unlock(&pmem_data_lock);
+    spin_unlock(&mgmt->lock);
+
+    return rc;
+}
+
 static int pmem_setup(unsigned long smfn, unsigned long emfn,
                       unsigned long mgmt_smfn, unsigned long mgmt_emfn,
                       unsigned int type)
@@ -352,6 +462,10 @@ static int pmem_setup(unsigned long smfn, unsigned long emfn,
 
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        rc = pmem_setup_data(smfn, emfn, mgmt_smfn, mgmt_emfn);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 703dd860e7..d1fbb30247 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1052,6 +1052,7 @@ struct xen_sysctl_set_parameter {
 /* Types of PMEM regions */
 #define PMEM_REGION_TYPE_RAW        0 /* PMEM regions detected by Xen */
 #define PMEM_REGION_TYPE_MGMT       1 /* PMEM regions for management usage */
+#define PMEM_REGION_TYPE_DATA       2 /* PMEM regions for guest data */
 
 /* PMEM_REGION_TYPE_RAW */
 struct xen_sysctl_nvdimm_pmem_raw_region {
@@ -1107,7 +1108,7 @@ struct xen_sysctl_nvdimm_pmem_setup {
                         /* above PMEM region. If the above PMEM region is */
                         /* a management region, mgmt_{s,e}mfn is required */
                         /* to be identical to {s,e}mfn. */
-    uint8_t  type;      /* Only PMEM_REGION_TYPE_MGMT is supported now */
+    uint8_t  type;      /* Must be one of PMEM_REGION_TYPE_{MGMT, DATA} */
 };
 typedef struct xen_sysctl_nvdimm_pmem_setup xen_sysctl_nvdimm_pmem_setup_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_setup_t);
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 23/41] tools/xl: add option '--data | -d' to xl command pmem-setup
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (21 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 22/41] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 24/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

'xl pmem-setup --data | -d' is used to set up the specified PMEM region
for guest data usage.
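
A minimal sketch of the same flow at the libxl level (placeholder MFN
ranges; the management range is set up first and must not overlap the
data range):

    #include <libxl.h>

    /* Sketch: register a management range, then a data range managed by
     * it.  All MFN values are placeholders. */
    static int setup_data_region(libxl_ctx *ctx)
    {
        unsigned long mgmt_smfn = 0x480000UL, mgmt_emfn = 0x4a0000UL;
        unsigned long data_smfn = 0x500000UL, data_emfn = 0x900000UL;
        int rc = libxl_nvdimm_pmem_setup_mgmt(ctx, mgmt_smfn, mgmt_emfn);

        if (rc)
            return rc;

        return libxl_nvdimm_pmem_setup_data(ctx, data_smfn, data_emfn,
                                            mgmt_smfn, mgmt_emfn);
    }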

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl.h        | 17 +++++++++++++++++
 tools/libxl/libxl_nvdimm.c | 13 +++++++++++++
 tools/xl/xl_cmdtable.c     |  5 +++++
 tools/xl/xl_nvdimm.c       | 32 +++++++++++++++++++++++++++++---
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index e13a911cb4..c390bf227b 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2337,6 +2337,23 @@ int libxl_nvdimm_pmem_get_regions(libxl_ctx *ctx,
 int libxl_nvdimm_pmem_setup_mgmt(libxl_ctx *ctx,
                                  unsigned long smfn, unsigned long emfn);
 
+/*
+ * Setup the specified PMEM region for guest data usage.
+ *
+ * Parameters:
+ *  ctx:              libxl context
+ *  data_{smfn,emfn}: start and end MFNs of the data PMEM region
+ *  mgmt_{smfn,emfn}: start and end MFNs of the management PMEM region used to
+ *                    manage the above data PMEM region; it cannot overlap with
+ *                    the above data PMEM region
+ *
+ * Return:
+ *  0 on success; otherwise, ERROR_*, and leave errno valid.
+ */
+int libxl_nvdimm_pmem_setup_data(libxl_ctx *ctx,
+                                 unsigned long data_smfn, unsigned long data_emfn,
+                                 unsigned long mgmt_smfn, unsigned long mgmt_emfn);
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index 896b5632ac..33eb4007ec 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -151,3 +151,16 @@ int libxl_nvdimm_pmem_setup_mgmt(libxl_ctx *ctx,
 
     return errno ? ERROR_FAIL : 0;
 }
+
+int libxl_nvdimm_pmem_setup_data(libxl_ctx *ctx,
+                                 unsigned long data_smfn, unsigned long data_emfn,
+                                 unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+    int rc = xc_nvdimm_pmem_setup_data(ctx->xch, data_smfn, data_emfn,
+                                       mgmt_smfn, mgmt_emfn);
+
+    if (rc)
+        errno = -rc;
+
+    return errno ? ERROR_FAIL : 0;
+}
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 8a0b58493d..e5d117d3b9 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -628,6 +628,11 @@ struct cmd_spec cmd_table[] = {
       "Setup a PMEM region for specified usage purpose",
       "[options]",
       "-m, --mgmt <smfn> <emfn>  Set PMEM pages smfn - emfn for management usage\n"
+      "-d, --data <smfn> <emfn> <mgmt_smfn> <mgmt_emfn>\n"
+      "                          Set PMEM pages smfn - emfn for guest data usage.\n"
+      "                          PMEM pages mgmt_smfn - mgmt_emfn are used to manage\n"
+      "                          above PMEM pages. The two types of PMEM pages cannot\n"
+      "                          overlap with each other\n"
     },
 };
 
diff --git a/tools/xl/xl_nvdimm.c b/tools/xl/xl_nvdimm.c
index e42e7a3640..ac01039144 100644
--- a/tools/xl/xl_nvdimm.c
+++ b/tools/xl/xl_nvdimm.c
@@ -118,8 +118,8 @@ int main_pmem_setup(int argc, char **argv)
         COMMON_LONG_OPTS
     };
 
-    bool mgmt = false;
-    unsigned long mgmt_smfn, mgmt_emfn;
+    bool mgmt = false, data = false;
+    unsigned long mgmt_smfn, mgmt_emfn, data_smfn, data_emfn;
     int opt, rc = 0;
 
 #define CHECK_NR_ARGS(expected, option)                                 \
@@ -137,7 +137,7 @@ int main_pmem_setup(int argc, char **argv)
         }                                                               \
     } while (0)
 
-    SWITCH_FOREACH_OPT(opt, "m:", opts, "pmem-setup", 0) {
+    SWITCH_FOREACH_OPT(opt, "m:d:", opts, "pmem-setup", 0) {
     case 'm':
         CHECK_NR_ARGS(2, "-m");
 
@@ -145,14 +145,40 @@ int main_pmem_setup(int argc, char **argv)
         mgmt_smfn = parse_ulong(optarg);
         mgmt_emfn = parse_ulong(argv[optind]);
 
+        break;
+
+    case 'd':
+        CHECK_NR_ARGS(4, "-d");
+
+        data = true;
+        data_smfn = parse_ulong(optarg);
+        data_emfn = parse_ulong(argv[optind]);
+        mgmt_smfn = parse_ulong(argv[optind + 1]);
+        mgmt_emfn = parse_ulong(argv[optind + 2]);
+
         break;
     }
 
 #undef CHECK_NR_ARGS
 
+    if (mgmt && data) {
+        fprintf(stderr,
+                "Error: '-m' and '-d' cannot be used simultaneously\n\n");
+        help("pmem-setup");
+
+        rc = ERROR_INVAL;
+        errno = EINVAL;
+
+        goto out;
+    }
+
     if (mgmt)
         rc = libxl_nvdimm_pmem_setup_mgmt(ctx, mgmt_smfn, mgmt_emfn);
 
+    if (data)
+        rc = libxl_nvdimm_pmem_setup_data(ctx, data_smfn, data_emfn,
+                                          mgmt_smfn, mgmt_emfn);
+
  out:
     if (rc)
         fprintf(stderr, "Error: pmem-setup failed, %s\n", strerror(errno));
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 24/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (22 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 23/41] tools/xl: add option '--data | -d' to xl command pmem-setup Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 25/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
data PMEM regions.
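
As a rough illustration (not part of this patch), a toolstack caller
could query the count of data regions via the libxc wrapper touched
below; error handling is simplified and the helper name is made up:

    #include <xenctrl.h>

    /* Returns the number of data PMEM regions, or -1 on failure. */
    static int count_data_regions(xc_interface *xch)
    {
        uint32_t nr = 0;

        /* PMEM_REGION_TYPE_DATA becomes a valid type after this patch. */
        if ( xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_DATA, &nr) )
            return -1;

        return nr;
    }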

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/xc_misc.c | 3 ++-
 xen/common/pmem.c     | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 940bf61931..b535f83df6 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -896,7 +896,8 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
 
     if ( !nr ||
          (type != PMEM_REGION_TYPE_RAW &&
-          type != PMEM_REGION_TYPE_MGMT) )
+          type != PMEM_REGION_TYPE_MGMT &&
+          type != PMEM_REGION_TYPE_DATA) )
         return -EINVAL;
 
     sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index ed4eba7f64..b1cefc3d70 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -154,6 +154,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
         regions_nr->num_regions = nr_mgmt_regions;
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        regions_nr->num_regions = nr_data_regions;
+        break;
+
     default:
         rc = -EINVAL;
     }
-- 
2.15.1



* [RFC XEN PATCH v4 25/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (23 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 24/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 26/41] tools/xl: add option '--data | -d' to xl command pmem-list Haozhong Zhang
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of data PMEM
regions.
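
Each returned entry describes a data range together with the management
range that backs it. A minimal sketch of how a consumer might print a
buffer already filled in by this sysctl (illustrative only; obtaining
the buffer via the libxc wrapper is omitted):

    #include <inttypes.h>
    #include <stdio.h>

    /* 'regions' holds 'nr' entries returned for PMEM_REGION_TYPE_DATA. */
    static void dump_data_regions(const xen_sysctl_nvdimm_pmem_data_region_t *regions,
                                  unsigned int nr)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            printf("data mfn 0x%"PRIx64" - 0x%"PRIx64
                   ", mgmt mfn 0x%"PRIx64" - 0x%"PRIx64"\n",
                   regions[i].smfn, regions[i].emfn,
                   regions[i].mgmt_smfn, regions[i].mgmt_emfn);
    }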

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/libxc/xc_misc.c       |  8 ++++++++
 xen/common/pmem.c           | 46 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 12 ++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index b535f83df6..f425fb01e0 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -943,6 +943,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        size = sizeof(xen_sysctl_nvdimm_pmem_data_region_t) * max;
+        break;
+
     default:
         return -EINVAL;
     }
@@ -967,6 +971,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
         set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        set_xen_guest_handle(regions->u_buffer.data_regions, buffer);
+        break;
+
     default:
         rc = -EINVAL;
         goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index b1cefc3d70..cd557c7851 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -243,6 +243,48 @@ static int pmem_get_mgmt_regions(
     return rc;
 }
 
+static int pmem_get_data_regions(
+    XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) regions,
+    unsigned int *num_regions)
+{
+    struct list_head *cur;
+    unsigned int nr = 0, max = *num_regions;
+    xen_sysctl_nvdimm_pmem_data_region_t region;
+    int rc = 0;
+
+    if ( !guest_handle_okay(regions, max * sizeof(region)) )
+        return -EINVAL;
+
+    spin_lock(&pmem_data_lock);
+
+    list_for_each(cur, &pmem_data_regions)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( nr >= max )
+            break;
+
+        region.smfn = pmem->smfn;
+        region.emfn = pmem->emfn;
+        region.mgmt_smfn = pmem->u.data.mgmt_smfn;
+        region.mgmt_emfn = pmem->u.data.mgmt_emfn;
+
+        if ( copy_to_guest_offset(regions, nr, &region, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        nr++;
+    }
+
+    spin_unlock(&pmem_data_lock);
+
+    *num_regions = nr;
+
+    return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
     unsigned int type = regions->type, max = regions->num_regions;
@@ -261,6 +303,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
         rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
         break;
 
+    case PMEM_REGION_TYPE_DATA:
+        rc = pmem_get_data_regions(regions->u_buffer.data_regions, &max);
+        break;
+
     default:
         rc = -EINVAL;
     }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index d1fbb30247..c3555ced5c 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1072,6 +1072,16 @@ struct xen_sysctl_nvdimm_pmem_mgmt_region {
 typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
 
+/* PMEM_REGION_TYPE_DATA */
+struct xen_sysctl_nvdimm_pmem_data_region {
+    uint64_t smfn;
+    uint64_t emfn;
+    uint64_t mgmt_smfn;
+    uint64_t mgmt_emfn;
+};
+typedef struct xen_sysctl_nvdimm_pmem_data_region xen_sysctl_nvdimm_pmem_data_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_data_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
     uint8_t type;         /* IN: one of PMEM_REGION_TYPE_* */
@@ -1092,6 +1102,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
         /* if type == PMEM_REGION_TYPE_MGMT */
         XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
+        /* if type == PMEM_REGION_TYPE_DATA */
+        XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) data_regions;
     } u_buffer;           /* IN: the guest handler where the entries of PMEM
                                  regions of the type @type are returned */
 };
-- 
2.15.1



* [RFC XEN PATCH v4 26/41] tools/xl: add option '--data | -d' to xl command pmem-list
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (24 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 25/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 27/41] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

'xl pmem-list --data | -d' is used to list all data PMEM regions.
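
For example, with one data region configured, the new option would
produce output along these lines (the MFN values are made up for
illustration):

    # xl pmem-list --data
     0: mfn 0x480000 - 0x500000, mgmt mfn 0x460000 - 0x480000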

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_nvdimm.c  |  4 ++++
 tools/libxl/libxl_types.idl |  9 +++++++++
 tools/xl/xl_cmdtable.c      |  1 +
 tools/xl/xl_nvdimm.c        | 22 ++++++++++++++++++++--
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index 33eb4007ec..0d51036794 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -32,6 +32,7 @@
 static size_t xc_pmem_region_struct_size[] = {
     [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = sizeof(libxl_nvdimm_pmem_raw_region),
     [LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT] = sizeof(libxl_nvdimm_pmem_mgmt_region),
+    [LIBXL_NVDIMM_PMEM_REGION_TYPE_DATA] = sizeof(libxl_nvdimm_pmem_data_region),
 };
 
 static int get_xc_region_type(libxl_nvdimm_pmem_region_type type,
@@ -40,6 +41,7 @@ static int get_xc_region_type(libxl_nvdimm_pmem_region_type type,
     static uint8_t xc_region_types[] = {
         [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = PMEM_REGION_TYPE_RAW,
         [LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT] = PMEM_REGION_TYPE_MGMT,
+        [LIBXL_NVDIMM_PMEM_REGION_TYPE_DATA] = PMEM_REGION_TYPE_DATA,
     };
     static unsigned int nr_types =
         sizeof(xc_region_types) / sizeof(xc_region_types[0]);
@@ -66,6 +68,8 @@ static void copy_from_xc_regions(libxl_nvdimm_pmem_region *tgt_regions,
                  sizeof(xen_sysctl_nvdimm_pmem_raw_region_t));
     BUILD_BUG_ON(sizeof(libxl_nvdimm_pmem_mgmt_region) !=
                  sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t));
+    BUILD_BUG_ON(sizeof(libxl_nvdimm_pmem_data_region) !=
+                 sizeof(xen_sysctl_nvdimm_pmem_data_region_t));
 
     while (tgt < end) {
         memcpy((void *)tgt + offset, src, size);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 22478657ff..e65bcbbb4f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -1045,6 +1045,7 @@ libxl_psr_cat_info = Struct("psr_cat_info", [
 libxl_nvdimm_pmem_region_type = Enumeration("nvdimm_pmem_region_type", [
     (0, "RAW"),
     (1, "MGMT"),
+    (2, "DATA")
     ])
 
 libxl_nvdimm_pmem_raw_region = Struct("nvdimm_pmem_raw_region", [
@@ -1059,9 +1060,17 @@ libxl_nvdimm_pmem_mgmt_region = Struct("nvdimm_pmem_mgmt_region", [
     ("used", uint64),
     ])
 
+libxl_nvdimm_pmem_data_region = Struct("nvdimm_pmem_data_region", [
+    ("smfn", uint64),
+    ("emfn", uint64),
+    ("mgmt_smfn", uint64),
+    ("mgmt_emfn", uint64),
+    ])
+
 libxl_nvdimm_pmem_region = Struct("nvdimm_pmem_region", [
     ("u", KeyedUnion(None, libxl_nvdimm_pmem_region_type, "type",
                      [("raw", libxl_nvdimm_pmem_raw_region),
                       ("mgmt", libxl_nvdimm_pmem_mgmt_region),
+                      ("data", libxl_nvdimm_pmem_data_region),
                      ])),
     ])
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index e5d117d3b9..74297e8188 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -622,6 +622,7 @@ struct cmd_spec cmd_table[] = {
       "[options]",
       "-r, --raw   List PMEM regions detected by Xen hypervisor\n"
       "-m, --mgmt  List PMEM regions used fro management\n"
+      "-d, --data  List PMEM regions used for guest data\n"
     },
     { "pmem-setup",
       &main_pmem_setup, 0, 1,
diff --git a/tools/xl/xl_nvdimm.c b/tools/xl/xl_nvdimm.c
index ac01039144..45770157ba 100644
--- a/tools/xl/xl_nvdimm.c
+++ b/tools/xl/xl_nvdimm.c
@@ -49,9 +49,18 @@ static void show_mgmt_region(libxl_nvdimm_pmem_region *region, unsigned int idx)
            idx, mgmt->smfn, mgmt->emfn, mgmt->used);
 }
 
+static void show_data_region(libxl_nvdimm_pmem_region *region, unsigned int idx)
+{
+    libxl_nvdimm_pmem_data_region *data = &region->u.data;
+
+    printf(" %u: mfn 0x%lx - 0x%lx, mgmt mfn 0x%lx - 0x%lx\n",
+           idx, data->smfn, data->emfn, data->mgmt_smfn, data->mgmt_emfn);
+}
+
 static show_region_fn_t show_region_fn[] = {
     [LIBXL_NVDIMM_PMEM_REGION_TYPE_RAW] = show_raw_region,
     [LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT] = show_mgmt_region,
+    [LIBXL_NVDIMM_PMEM_REGION_TYPE_DATA] = show_data_region,
 };
 
 static int list_regions(libxl_nvdimm_pmem_region_type type)
@@ -84,13 +93,14 @@ int main_pmem_list(int argc, char **argv)
     static struct option opts[] = {
         { "raw", 0, 0, 'r' },
         { "mgmt", 0, 0, 'm' },
+        { "data", 0, 0, 'd' },
         COMMON_LONG_OPTS
     };
 
-    bool all = true, raw = false, mgmt = false;
+    bool all = true, raw = false, mgmt = false, data = false;
     int opt, ret = 0;
 
-    SWITCH_FOREACH_OPT(opt, "rm", opts, "pmem-list", 0) {
+    SWITCH_FOREACH_OPT(opt, "rmd", opts, "pmem-list", 0) {
     case 'r':
         all = false;
         raw = true;
@@ -100,6 +110,11 @@ int main_pmem_list(int argc, char **argv)
         all = false;
         mgmt = true;
         break;
+
+    case 'd':
+        all = false;
+        data = true;
+        break;
     }
 
     if (all || raw)
@@ -108,6 +123,9 @@ int main_pmem_list(int argc, char **argv)
     if (!ret && (all || mgmt))
         ret = list_regions(LIBXL_NVDIMM_PMEM_REGION_TYPE_MGMT);
 
+    if (!ret && (all || data))
+        ret = list_regions(LIBXL_NVDIMM_PMEM_REGION_TYPE_DATA);
+
     return ret;
 }
 
-- 
2.15.1



* [RFC XEN PATCH v4 27/41] xen/pmem: add function to map PMEM pages to HVM domain
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (25 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 26/41] tools/xl: add option '--data | -d' to xl command pmem-list Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 28/41] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

pmem_populate() is added to map the specified data PMEM pages to an HVM
domain. No caller is added in this commit.
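
The expected calling pattern, which a later patch in this series
introduces for the XENMEM_populate_pmem_map hypercall, looks roughly
like this (sketch only):

    struct xen_pmem_map_args args = {
        .domain    = d,        /* target HVM domain */
        .mfn       = mfn,      /* first host PMEM MFN to map */
        .gfn       = gfn,      /* first guest frame to map at */
        .nr_mfns   = nr_mfns,  /* number of PMEM pages */
        .nr_done   = 0,
        .preempted = 0,
    };
    int rc = pmem_populate(&args);

    /*
     * rc == -ERESTART with args.preempted set means the caller should
     * create a hypercall continuation and resume from args.nr_done.
     */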

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/domain.c     |   3 ++
 xen/common/pmem.c       | 141 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pmem.h  |  19 +++++++
 xen/include/xen/sched.h |   3 ++
 4 files changed, 166 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7484693a87..db9226e84b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
+    spin_lock_init(&d->pmem_lock);
+    INIT_PAGE_LIST_HEAD(&d->pmem_page_list);
+
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
     d->auto_node_affinity = 1;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index cd557c7851..d2c5518329 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -17,10 +17,12 @@
  */
 
 #include <xen/errno.h>
+#include <xen/event.h>
 #include <xen/list.h>
 #include <xen/iocap.h>
 #include <xen/paging.h>
 #include <xen/pmem.h>
+#include <xen/sched.h>
 
 #include <asm/guest_access.h>
 
@@ -78,6 +80,31 @@ static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
            (emfn1 > smfn2 && emfn1 <= emfn2);
 }
 
+static bool check_cover(struct list_head *list,
+                        unsigned long smfn, unsigned long emfn)
+{
+    struct list_head *cur;
+    struct pmem *pmem;
+    unsigned long pmem_smfn, pmem_emfn;
+
+    list_for_each(cur, list)
+    {
+        pmem = list_entry(cur, struct pmem, link);
+        pmem_smfn = pmem->smfn;
+        pmem_emfn = pmem->emfn;
+
+        if ( smfn < pmem_smfn )
+            return false;
+
+        if ( emfn <= pmem_emfn )
+            return true;
+
+        smfn = max(smfn, pmem_emfn);
+    }
+
+    return false;
+}
+
 /**
  * Add a PMEM region to a list. All PMEM regions in the list are
  * sorted in the ascending order of the start address. A PMEM region,
@@ -592,6 +619,120 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 
 #ifdef CONFIG_X86
 
+static int pmem_assign_page(struct domain *d, struct page_info *pg,
+                            unsigned long gfn)
+{
+    int rc;
+
+    if ( pg->count_info != (PGC_state_free | PGC_pmem_page) )
+        return -EBUSY;
+
+    pg->count_info = PGC_allocated | PGC_state_inuse | PGC_pmem_page | 1;
+    pg->u.inuse.type_info = 0;
+    page_set_owner(pg, d);
+
+    rc = guest_physmap_add_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+    if ( rc )
+    {
+        page_set_owner(pg, NULL);
+        pg->count_info = PGC_state_free | PGC_pmem_page;
+
+        return rc;
+    }
+
+    spin_lock(&d->pmem_lock);
+    page_list_add_tail(pg, &d->pmem_page_list);
+    spin_unlock(&d->pmem_lock);
+
+    return 0;
+}
+
+static int pmem_unassign_page(struct domain *d, struct page_info *pg,
+                              unsigned long gfn)
+{
+    int rc;
+
+    spin_lock(&d->pmem_lock);
+    page_list_del(pg, &d->pmem_page_list);
+    spin_unlock(&d->pmem_lock);
+
+    rc = guest_physmap_remove_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+
+    page_set_owner(pg, NULL);
+    pg->count_info = PGC_state_free | PGC_pmem_page;
+
+    return 0;
+}
+
+int pmem_populate(struct xen_pmem_map_args *args)
+{
+    struct domain *d = args->domain;
+    unsigned long i = args->nr_done;
+    unsigned long mfn = args->mfn + i;
+    unsigned long emfn = args->mfn + args->nr_mfns;
+    unsigned long gfn = args->gfn + i;
+    struct page_info *page;
+    int rc = 0, err = 0;
+
+    if ( unlikely(d->is_dying) )
+        return -EINVAL;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    spin_lock(&pmem_data_lock);
+
+    if ( !check_cover(&pmem_data_regions, mfn, emfn) )
+    {
+        rc = -ENXIO;
+        goto out;
+    }
+
+    for ( ; mfn < emfn; i++, mfn++, gfn++ )
+    {
+        if ( i != args->nr_done && hypercall_preempt_check() )
+        {
+            args->preempted = 1;
+            rc = -ERESTART;
+            break;
+        }
+
+        page = mfn_to_page(mfn);
+        if ( !page_state_is(page, free) )
+        {
+            rc = -EBUSY;
+            break;
+        }
+
+        rc = pmem_assign_page(d, page, gfn);
+        if ( rc )
+            break;
+    }
+
+ out:
+    if ( rc && rc != -ERESTART )
+        while ( i-- && !err )
+            err = pmem_unassign_page(d, mfn_to_page(--mfn), --gfn);
+
+    spin_unlock(&pmem_data_lock);
+
+    if ( unlikely(err) )
+    {
+        /*
+         * If we unfortunately fail to recover from the previous
+         * failure, some PMEM pages may still be mapped to the
+         * domain. As pmem_populate() is now called only during domain
+         * creation, let's crash the domain.
+         */
+        domain_crash(d);
+        rc = err;
+    }
+
+    args->nr_done = i;
+
+    return rc;
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
     struct list_head *cur;
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 9323d679a6..2dab90530b 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -33,6 +33,20 @@ int pmem_arch_setup(unsigned long smfn, unsigned long emfn, unsigned int pxm,
                     unsigned long mgmt_smfn, unsigned long mgmt_emfn,
                     unsigned long *used_mgmt_mfns);
 
+struct xen_pmem_map_args {
+    struct domain *domain;
+
+    unsigned long mfn;     /* start MFN of pmem pages to be mapped */
+    unsigned long gfn;     /* start GFN of target domain */
+    unsigned long nr_mfns; /* number of pmem pages to be mapped */
+
+    /* For preemption ... */
+    unsigned long nr_done; /* number of pmem pages processed so far */
+    int preempted;         /* Is the operation preempted? */
+};
+
+int pmem_populate(struct xen_pmem_map_args *args);
+
 #else /* !CONFIG_X86 */
 
 static inline int pmem_dom0_setup_permission(...)
@@ -45,6 +59,11 @@ static inline int pmem_arch_setup(...)
     return -ENOSYS;
 }
 
+static inline int pmem_populate(...)
+{
+    return -ENOSYS;
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 002ba29d6d..a4a901d7ea 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -323,6 +323,9 @@ struct domain
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
 
+    spinlock_t       pmem_lock;       /* protect all following pmem_ fields */
+    struct page_list_head pmem_page_list; /* linked list of PMEM pages      */
+
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
     struct cpupool  *cpupool;
-- 
2.15.1



* [RFC XEN PATCH v4 28/41] xen/pmem: release PMEM pages on HVM domain destruction
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (26 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 27/41] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 29/41] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

A new step RELMEM_pmem is added and taken before RELMEM_xen to release
all PMEM pages mapped to an HVM domain.
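
With this change, the relinquish sequence for an HVM domain with PMEM
pages proceeds through the enum roughly as follows (sketch of the order
after this patch; later stages elided):

    RELMEM_not_started -> RELMEM_shared -> RELMEM_pmem -> RELMEM_xen
        -> RELMEM_l4 -> RELMEM_l3 -> ...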

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/domain.c        | 32 ++++++++++++++++++++++++++++----
 xen/arch/x86/mm.c            |  9 +++++++--
 xen/common/pmem.c            | 10 ++++++++++
 xen/include/asm-x86/domain.h |  1 +
 xen/include/xen/pmem.h       |  6 ++++++
 5 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e1bf2d9e9d..613a8b4250 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1801,11 +1801,15 @@ static int relinquish_memory(
 {
     struct page_info  *page;
     unsigned long     x, y;
+    bool              is_pmem_list = (list == &d->pmem_page_list);
     int               ret = 0;
 
     /* Use a recursive lock, as we may enter 'free_domheap_page'. */
     spin_lock_recursive(&d->page_alloc_lock);
 
+    if ( is_pmem_list )
+        spin_lock(&d->pmem_lock);
+
     while ( (page = page_list_remove_head(list)) )
     {
         /* Grab a reference to the page so it won't disappear from under us. */
@@ -1887,8 +1891,9 @@ static int relinquish_memory(
             }
         }
 
-        /* Put the page on the list and /then/ potentially free it. */
-        page_list_add_tail(page, &d->arch.relmem_list);
+        if ( !is_pmem_list )
+            /* Put the page on the list and /then/ potentially free it. */
+            page_list_add_tail(page, &d->arch.relmem_list);
         put_page(page);
 
         if ( hypercall_preempt_check() )
@@ -1898,10 +1903,13 @@ static int relinquish_memory(
         }
     }
 
-    /* list is empty at this point. */
-    page_list_move(list, &d->arch.relmem_list);
+    if ( !is_pmem_list )
+        /* list is empty at this point. */
+        page_list_move(list, &d->arch.relmem_list);
 
  out:
+    if ( is_pmem_list )
+        spin_unlock(&d->pmem_lock);
     spin_unlock_recursive(&d->page_alloc_lock);
     return ret;
 }
@@ -1968,13 +1976,29 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
+#ifndef CONFIG_NVDIMM_PMEM
         d->arch.relmem = RELMEM_xen;
+#else
+        d->arch.relmem = RELMEM_pmem;
+#endif
 
         spin_lock(&d->page_alloc_lock);
         page_list_splice(&d->arch.relmem_list, &d->page_list);
         INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
         spin_unlock(&d->page_alloc_lock);
 
+#ifdef CONFIG_NVDIMM_PMEM
+        /* Fallthrough. Relinquish every page of PMEM. */
+    case RELMEM_pmem:
+        if ( is_hvm_domain(d) )
+        {
+            ret = relinquish_memory(d, &d->pmem_page_list, ~0UL);
+            if ( ret )
+                return ret;
+        }
+        d->arch.relmem = RELMEM_xen;
+#endif
+
         /* Fallthrough. Relinquish every page of memory. */
     case RELMEM_xen:
         ret = relinquish_memory(d, &d->xenpage_list, ~0UL);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9a224cf1bb..9386e88eb1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -106,6 +106,7 @@
 #include <xen/efi.h>
 #include <xen/grant_table.h>
 #include <xen/hypercall.h>
+#include <xen/pmem.h>
 #include <asm/paging.h>
 #include <asm/shadow.h>
 #include <asm/page.h>
@@ -2306,8 +2307,12 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
-             && cleanup_page_cacheattr(page) == 0 )
+#ifdef CONFIG_NVDIMM_PMEM
+        if ( is_pmem_page(page) )
+            pmem_page_cleanup(page);
+        else
+#endif
+        if ( cleanup_page_cacheattr(page) == 0 )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index d2c5518329..a0d23cdfbe 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -733,6 +733,16 @@ int pmem_populate(struct xen_pmem_map_args *args)
     return rc;
 }
 
+void pmem_page_cleanup(struct page_info *page)
+{
+    ASSERT(is_pmem_page(page));
+    ASSERT((page->count_info & PGC_count_mask) == 0);
+
+    page->count_info = PGC_pmem_page | PGC_state_free;
+    page_set_owner(page, NULL);
+    set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY);
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
     struct list_head *cur;
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index f69911918e..e6f575244d 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -305,6 +305,7 @@ struct arch_domain
     enum {
         RELMEM_not_started,
         RELMEM_shared,
+        RELMEM_pmem,
         RELMEM_xen,
         RELMEM_l4,
         RELMEM_l3,
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 2dab90530b..dfbc412065 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NVDIMM_PMEM
 
 #include <public/sysctl.h>
+#include <xen/mm.h>
 #include <xen/types.h>
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
@@ -46,6 +47,7 @@ struct xen_pmem_map_args {
 };
 
 int pmem_populate(struct xen_pmem_map_args *args);
+void pmem_page_cleanup(struct page_info *page);
 
 #else /* !CONFIG_X86 */
 
@@ -64,6 +66,10 @@ static inline int pmem_populate(...)
     return -ENOSYS;
 }
 
+static inline void pmem_page_cleanup(...)
+{
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* CONFIG_NVDIMM_PMEM */
-- 
2.15.1



* [RFC XEN PATCH v4 29/41] xen: add hypercall XENMEM_populate_pmem_map
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (27 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 28/41] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 30/41] tools: reserve extra guest memory for ACPI from device model Haozhong Zhang
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams, Daniel De Graaf

This hypercall will be used by device models to map host PMEM pages to
the guest.
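
From the device model side, the new libxc wrapper added below would be
used roughly as follows (a sketch; the MFN/GFN values and the error
handler are placeholders):

    #include <xenctrl.h>

    /* Map nr_mfns host PMEM pages starting at 'mfn' to the guest,
     * starting at guest frame 'gfn'. */
    int rc = xc_domain_populate_pmem_map(xch, domid, mfn, gfn, nr_mfns);

    if ( rc )
        /* hypothetical handler: fail the virtual NVDIMM setup */
        vnvdimm_setup_failed(rc);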

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 tools/flask/policy/modules/xen.if   |  3 ++-
 tools/libxc/include/xenctrl.h       | 17 ++++++++++++++
 tools/libxc/xc_domain.c             | 15 +++++++++++++
 xen/common/compat/memory.c          |  1 +
 xen/common/memory.c                 | 44 +++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h         | 14 +++++++++++-
 xen/include/xsm/dummy.h             | 11 ++++++++++
 xen/include/xsm/xsm.h               | 12 ++++++++++
 xen/xsm/dummy.c                     |  4 ++++
 xen/xsm/flask/hooks.c               | 13 +++++++++++
 xen/xsm/flask/policy/access_vectors |  2 ++
 11 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 55437496f6..8c2d6776f4 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -55,7 +55,8 @@ define(`create_domain_common', `
 			psr_cmt_op psr_cat_op soft_reset set_gnttab_limits };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
-	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
+	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp
+			populate_pmem_map };
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { cacheattr getparam hvmctl sethvmc
 			setparam nested altp2mhvm altp2mhvm_op dm };
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 5194d3ff5e..4d66cbed0b 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2678,6 +2678,23 @@ int xc_nvdimm_pmem_setup_data(xc_interface *xch,
                               unsigned long smfn, unsigned long emfn,
                               unsigned long mgmt_smfn, unsigned long mgmt_emfn);
 
+/*
+ * Map specified host PMEM pages to the specified guest address.
+ *
+ * Parameters:
+ *  xch:     xc interface handle
+ *  domid:   the target domain id
+ *  mfn:     the start MFN of the PMEM pages
+ *  gfn:     the start GFN of the target guest physical pages
+ *  nr_mfns: the number of PMEM pages to be mapped
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+                                unsigned long mfn, unsigned long gfn,
+                                unsigned long nr_mfns);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3ccd27f101..a62470e6d8 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2435,6 +2435,21 @@ int xc_domain_soft_reset(xc_interface *xch,
     domctl.domain = domid;
     return do_domctl(xch, &domctl);
 }
+
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+                                unsigned long mfn, unsigned long gfn,
+                                unsigned long nr_mfns)
+{
+    struct xen_pmem_map args = {
+        .domid   = domid,
+        .mfn     = mfn,
+        .gfn     = gfn,
+        .nr_mfns = nr_mfns,
+    };
+
+    return do_memory_op(xch, XENMEM_populate_pmem_map, &args, sizeof(args));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 35bb259808..51bec835b9 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -525,6 +525,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
         case XENMEM_add_to_physmap:
         case XENMEM_remove_from_physmap:
         case XENMEM_access_op:
+        case XENMEM_populate_pmem_map:
             break;
 
         case XENMEM_get_vnumainfo:
diff --git a/xen/common/memory.c b/xen/common/memory.c
index a6ba33fdcb..2f870ad2b6 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -23,6 +23,7 @@
 #include <xen/numa.h>
 #include <xen/mem_access.h>
 #include <xen/trace.h>
+#include <xen/pmem.h>
 #include <asm/current.h>
 #include <asm/hardirq.h>
 #include <asm/p2m.h>
@@ -1408,6 +1409,49 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 #endif
 
+#ifdef CONFIG_NVDIMM_PMEM
+    case XENMEM_populate_pmem_map:
+    {
+        struct xen_pmem_map map;
+        struct xen_pmem_map_args args;
+
+        if ( copy_from_guest(&map, arg, 1) )
+            return -EFAULT;
+
+        if ( map.domid == DOMID_SELF )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( !d )
+            return -EINVAL;
+
+        rc = xsm_populate_pmem_map(XSM_TARGET, curr_d, d);
+        if ( rc )
+        {
+            rcu_unlock_domain(d);
+            return rc;
+        }
+
+        args.domain = d;
+        args.mfn = map.mfn;
+        args.gfn = map.gfn;
+        args.nr_mfns = map.nr_mfns;
+        args.nr_done = start_extent;
+        args.preempted = 0;
+
+        rc = pmem_populate(&args);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART && args.preempted )
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh",
+                op | (args.nr_done << MEMOP_EXTENT_SHIFT), arg);
+
+        break;
+    }
+#endif /* CONFIG_NVDIMM_PMEM */
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 29386df98b..d74436e4b0 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -650,7 +650,19 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 28 */
+#define XENMEM_populate_pmem_map 28
+
+struct xen_pmem_map {
+    /* IN */
+    domid_t domid;
+    unsigned long mfn;
+    unsigned long gfn;
+    unsigned int nr_mfns;
+};
+typedef struct xen_pmem_map xen_pmem_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmem_map_t);
+
+/* Next available subop number is 29 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index b2cd56cdc5..1eb6595cfa 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -724,3 +724,14 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+#ifdef CONFIG_NVDIMM_PMEM
+
+static XSM_INLINE int xsm_populate_pmem_map(XSM_DEFAULT_ARG
+                                            struct domain *d1, struct domain *d2)
+{
+    XSM_ASSERT_ACTION(XSM_TARGET);
+    return xsm_default_action(action, d1, d2);
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 7f7feffc68..e43e79f719 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -180,6 +180,10 @@ struct xsm_operations {
     int (*dm_op) (struct domain *d);
 #endif
     int (*xen_version) (uint32_t cmd);
+
+#ifdef CONFIG_NVDIMM_PMEM
+    int (*populate_pmem_map) (struct domain *d1, struct domain *d2);
+#endif
 };
 
 #ifdef CONFIG_XSM
@@ -692,6 +696,14 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
     return xsm_ops->xen_version(op);
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+static inline int xsm_populate_pmem_map(xsm_default_t def,
+                                        struct domain *d1, struct domain *d2)
+{
+    return xsm_ops->populate_pmem_map(d1, d2);
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 479b103614..4d65eaca61 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -157,4 +157,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, dm_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+
+#ifdef CONFIG_NVDIMM_PMEM
+    set_to_dummy_if_null(ops, populate_pmem_map);
+#endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index f677755512..47cfb81d64 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1722,6 +1722,15 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static int flask_populate_pmem_map(struct domain *d1, struct domain *d2)
+{
+    return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__POPULATE_PMEM_MAP);
+}
+
+#endif /* CONFIG_NVDIMM_PMEM */
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1855,6 +1864,10 @@ static struct xsm_operations flask_ops = {
     .dm_op = flask_dm_op,
 #endif
     .xen_version = flask_xen_version,
+
+#ifdef CONFIG_NVDIMM_PMEM
+    .populate_pmem_map = flask_populate_pmem_map,
+#endif /* CONFIG_NVDIMM_PMEM */
 };
 
 void __init flask_init(const void *policy_buffer, size_t policy_size)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 3bfbb892c7..daa6937c22 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -389,6 +389,8 @@ class mmu
 # Allow a privileged domain to install a map of a page it does not own.  Used
 # for stub domain device models with the PV framebuffer.
     target_hack
+# XENMEM_populate_pmem_map
+    populate_pmem_map
 }
 
 # control of the paging_domctl split by subop
-- 
2.15.1



* [RFC XEN PATCH v4 30/41] tools: reserve extra guest memory for ACPI from device model
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (28 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 29/41] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 31/41] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The device model may need extra guest memory to store data generated
while building the guest ACPI. For example, QEMU puts the unpatched
ACPI tables and BIOSLinkerLoader ROMs in guest memory, which can be
patched and loaded by the guest firmware later. Though the default
value can be inferred from the type of device model and the domain
config, we still add an xl domain config option 'dm_acpi_size' to let
users specify the required size when the default is not enough.
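
For instance, a domain config fragment for a guest using qemu-xen as
the device model could override the default like this (the value below
is only an example; the option is given in bytes):

    # reserve 64 KiB of guest memory for device-model-built ACPI
    dm_acpi_size = 65536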

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 docs/man/xl.cfg.pod.5.in    |  7 +++++++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/libxl_x86.c     |  7 ++++++-
 tools/xl/xl_parse.c         | 18 +++++++++++++++++-
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b7b91d8627..1f9538c445 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1607,6 +1607,13 @@ table. True (1) by default.
 Include the Windows laptop/slate mode switch device in the virtual
 firmware ACPI table. False (0) by default.
 
+=item B<dm_acpi_size=NUMBER>
+
+B<(x86 HVM only)> Reserve the specified bytes of guest memory for ACPI
+built by the device model. The default value is determined according
+to which device model is in use and which ACPI tables are provided by
+that device model.
+
 =item B<apic=BOOLEAN>
 
 B<(x86 only)> Include information regarding APIC (Advanced Programmable Interrupt
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e65bcbbb4f..053b1c0b9a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -581,6 +581,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
                                        ("mca_caps",         uint64),
+                                       ("dm_acpi_size",     uint64),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 5f91fe4f92..35ccc7b483 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -364,7 +364,12 @@ int libxl__arch_extra_memory(libxl__gc *gc,
                              const libxl_domain_build_info *info,
                              uint64_t *out)
 {
-    *out = LIBXL_MAXMEM_CONSTANT;
+    uint64_t dm_acpi_size = 0;
+
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM)
+        dm_acpi_size = info->u.hvm.dm_acpi_size;
+
+    *out = LIBXL_MAXMEM_CONSTANT + DIV_ROUNDUP(dm_acpi_size, 1024);
 
     return 0;
 }
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 993b754c0a..0a43a4876e 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -857,7 +857,7 @@ void parse_config_data(const char *config_source,
                        libxl_domain_config *d_config)
 {
     const char *buf;
-    long l, vcpus = 0;
+    long l, vcpus = 0, dm_acpi_size = 0;
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs, *p9devs, *vdispls;
@@ -2118,6 +2118,22 @@ skip_usbdev:
 
 #undef parse_extra_args
 
+    if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
+        b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
+        /* parse 'dm_acpi_size' */
+        e = xlu_cfg_get_long(config, "dm_acpi_size", &dm_acpi_size, 0);
+        if (e && e != ESRCH) {
+            fprintf(stderr, "ERROR: unable to parse dm_acpi_size.\n");
+            exit(-ERROR_FAIL);
+        }
+        if (!e && dm_acpi_size <= 0) {
+            fprintf(stderr, "ERROR: require positive dm_acpi_size.\n");
+            exit(-ERROR_FAIL);
+        }
+
+        b_info->u.hvm.dm_acpi_size = dm_acpi_size;
+    }
+
     /* If we've already got vfb=[] for PV guest then ignore top level
      * VNC config. */
     if (c_info->type == LIBXL_DOMAIN_TYPE_PV && !d_config->num_vfbs) {
-- 
2.15.1



* [RFC XEN PATCH v4 31/41] tools/libacpi: add callback to translate GPA to GVA
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (29 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 30/41] tools: reserve extra guest memory for ACPI from device model Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 32/41] tools/libacpi: build a DM ACPI signature blacklist Haozhong Zhang
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Chao Peng, Dan Williams

The location of ACPI blobs passed from the device model is given as a
guest physical address. libacpi needs to convert the guest physical
address to a guest virtual address before it can access those ACPI
blobs.
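
A later consumer in libacpi can then turn a device-model-provided
physical address back into a pointer before parsing the blob, along
the lines of (hedged sketch; 'blob_paddr' is a placeholder):

    struct acpi_header *hdr = ctxt->mem_ops.p2v(ctxt, blob_paddr);

    /* hdr->signature and hdr->length can now be examined directly. */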

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  6 ++++++
 tools/firmware/hvmloader/util.h |  1 +
 tools/libacpi/libacpi.h         |  1 +
 tools/libxl/libxl_x86_acpi.c    | 10 ++++++++++
 4 files changed, 18 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 76a61ee052..e4a37b3310 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -878,6 +878,11 @@ static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
     return virt_to_phys(v);
 }
 
+static void *acpi_p2v(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    return phys_to_virt(p);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
                             uint32_t size, uint32_t align)
 {
@@ -996,6 +1001,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.alloc = acpi_mem_alloc;
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
+    ctxt.mem_ops.p2v = acpi_p2v;
 
     acpi_build_tables(&ctxt, config);
 
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7bca6418d2..e32b83e721 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -203,6 +203,7 @@ xen_pfn_t mem_hole_alloc(uint32_t nr_mfns);
 /* Allocate memory in a reserved region below 4GB. */
 void *mem_alloc(uint32_t size, uint32_t align);
 #define virt_to_phys(v) ((unsigned long)(v))
+#define phys_to_virt(p) ((void *)(p))
 
 /* Allocate memory in a scratch region */
 void *scratch_alloc(uint32_t size, uint32_t align);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23b0b..5953c7887c 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -51,6 +51,7 @@ struct acpi_ctxt {
         void *(*alloc)(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align);
         void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
+        void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
     } mem_ops;
 };
 
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 9a7c90467d..ce251ffa59 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -52,6 +52,15 @@ static unsigned long virt_to_phys(struct acpi_ctxt *ctxt, void *v)
             libxl_ctxt->alloc_base_paddr);
 }
 
+static void *phys_to_virt(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    struct libxl_acpi_ctxt *libxl_ctxt =
+        CONTAINER_OF(ctxt, struct libxl_acpi_ctxt, c);
+
+    return (void *)((p - libxl_ctxt->alloc_base_paddr) +
+                    libxl_ctxt->alloc_base_vaddr);
+}
+
 static void *mem_alloc(struct acpi_ctxt *ctxt,
                        uint32_t size, uint32_t align)
 {
@@ -180,6 +189,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
     libxl_ctxt.c.mem_ops.alloc = mem_alloc;
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
+    libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
     rc = init_acpi_config(gc, dom, b_info, &config);
-- 
2.15.1



* [RFC XEN PATCH v4 32/41] tools/libacpi: build a DM ACPI signature blacklist
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (30 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 31/41] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface Haozhong Zhang
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Jan Beulich, Chao Peng,
	Dan Williams

Some guest ACPI tables are built by Xen and should not be loaded from
the device model (DM). We add the signatures of Xen-built ACPI tables,
except SSDT, to a blacklist, so that DM-built ACPI tables can be
checked against it later.
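
The check itself is left to later patches; a hypothetical lookup helper
(not part of this patch) would walk the same array, e.g.:

    /* Hypothetical: returns true if 'sig' belongs to a Xen-built table. */
    static bool dm_acpi_signature_blacklisted(uint64_t sig)
    {
        unsigned int i;

        for ( i = 0; i < NR_SIGNATURE_BLACKLIST_ENTS; i++ )
        {
            if ( dm_acpi_signature_blacklist[i] == sig )
                return true;
            if ( dm_acpi_signature_blacklist[i] == 0 )
                break;
        }

        return false;
    }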

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libacpi/build.c   | 67 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h |  3 +++
 2 files changed, 70 insertions(+)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9604..f2185589f8 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -56,6 +56,55 @@ struct acpi_info {
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+/*
+ * DM ACPI table signature blacklist:
+ *
+ * Signatures of ACPI tables that are built by Xen and can have
+ * only one instance are added to this blacklist. Xen refuses to
+ * load ACPI tables from the device model whose signatures are in
+ * the blacklist.
+ *
+ * Because the max number of ACPI tables allowed by libacpi is small,
+ * we statically allocate the space for the blacklist. We reserve a
+ * slightly larger space to also cover non-secondary tables (RSDP,
+ * RSDT, XSDT, DSDT, FADT v1.0 and FADT so far).
+ */
+
+#define NR_SIGNATURE_BLACKLIST_ENTS (8 + ACPI_MAX_SECONDARY_TABLES)
+static uint64_t dm_acpi_signature_blacklist[NR_SIGNATURE_BLACKLIST_ENTS];
+
+static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig)
+{
+    unsigned int i;
+
+    if ( !(config->table_flags & ACPI_HAS_DM) )
+        return 0;
+
+    for ( i = 0; i < NR_SIGNATURE_BLACKLIST_ENTS; i++ )
+    {
+        uint64_t entry = dm_acpi_signature_blacklist[i];
+
+        if ( entry == sig )
+            return 0;
+        else if ( entry == 0 )
+            break;
+    }
+
+    if ( i >= NR_SIGNATURE_BLACKLIST_ENTS )
+    {
+        config->table_flags &= ~ACPI_HAS_DM;
+
+        printf("ERROR: DM ACPI signature blacklist is full (size %u), "
+               "disable DM ACPI\n", NR_SIGNATURE_BLACKLIST_ENTS);
+
+        return -ENOSPC;
+    }
+
+    dm_acpi_signature_blacklist[i] = sig;
+
+    return 0;
+}
+
 static void set_checksum(
     void *table, uint32_t checksum_offset, uint32_t length)
 {
@@ -360,6 +409,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         madt = construct_madt(ctxt, config, info);
         if (!madt) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, madt);
+        dm_acpi_blacklist_signature(config, madt->header.signature);
     }
 
     /* HPET. */
@@ -368,6 +418,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         hpet = construct_hpet(ctxt, config);
         if (!hpet) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, hpet);
+        dm_acpi_blacklist_signature(config, hpet->header.signature);
     }
 
     /* WAET. */
@@ -377,6 +428,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         if ( !waet )
             return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, waet);
+        dm_acpi_blacklist_signature(config, waet->header.signature);
     }
 
     if ( config->table_flags & ACPI_HAS_SSDT_PM )
@@ -450,6 +502,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
                          offsetof(struct acpi_header, checksum),
                          tcpa->header.length);
         }
+        dm_acpi_blacklist_signature(config, tcpa->header.signature);
     }
 
     /* SRAT and SLIT */
@@ -459,11 +512,17 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         struct acpi_20_slit *slit = construct_slit(ctxt, config);
 
         if ( srat )
+        {
             table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, srat);
+            dm_acpi_blacklist_signature(config, srat->header.signature);
+        }
         else
             printf("Failed to build SRAT, skipping...\n");
         if ( slit )
+        {
             table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, slit);
+            dm_acpi_blacklist_signature(config, slit->header.signature);
+        }
         else
             printf("Failed to build SLIT, skipping...\n");
     }
@@ -543,6 +602,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     facs = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_facs), 16);
     if (!facs) goto oom;
     memcpy(facs, &Facs, sizeof(struct acpi_20_facs));
+    dm_acpi_blacklist_signature(config, facs->signature);
 
     /*
      * Alternative DSDTs we get linked against. A cover-all DSDT for up to the
@@ -564,6 +624,8 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         if (!dsdt) goto oom;
         memcpy(dsdt, config->dsdt_anycpu, config->dsdt_anycpu_len);
     }
+    dm_acpi_blacklist_signature(config,
+                                ((struct acpi_header *)dsdt)->signature);
 
     /*
      * N.B. ACPI 1.0 operating systems may not handle FADT with revision 2
@@ -583,6 +645,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(fadt_10,
                  offsetof(struct acpi_header, checksum),
                  sizeof(struct acpi_10_fadt));
+    dm_acpi_blacklist_signature(config, fadt_10->header.signature);
 
     switch ( config->acpi_revision )
     {
@@ -634,6 +697,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         fadt->iapc_boot_arch |= ACPI_FADT_NO_CMOS_RTC;
     }
     set_checksum(fadt, offsetof(struct acpi_header, checksum), fadt_size);
+    dm_acpi_blacklist_signature(config, fadt->header.signature);
 
     nr_secondaries = construct_secondary_tables(ctxt, secondary_tables,
                  config, acpi_info);
@@ -652,6 +716,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(xsdt,
                  offsetof(struct acpi_header, checksum),
                  xsdt->header.length);
+    dm_acpi_blacklist_signature(config, xsdt->header.signature);
 
     rsdt = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_rsdt) +
                                sizeof(uint32_t) * nr_secondaries,
@@ -665,6 +730,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(rsdt,
                  offsetof(struct acpi_header, checksum),
                  rsdt->header.length);
+    dm_acpi_blacklist_signature(config, rsdt->header.signature);
 
     /*
      * Fill in low-memory data structures: acpi_info and RSDP.
@@ -680,6 +746,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(rsdp,
                  offsetof(struct acpi_20_rsdp, extended_checksum),
                  sizeof(struct acpi_20_rsdp));
+    dm_acpi_blacklist_signature(config, rsdp->signature);
 
     if ( !new_vm_gid(ctxt, config, acpi_info) )
         goto oom;
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 5953c7887c..748a7e4a73 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -36,6 +36,9 @@
 #define ACPI_HAS_8042              (1<<13)
 #define ACPI_HAS_CMOS_RTC          (1<<14)
 #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
+#define ACPI_HAS_QEMU_XEN          (1<<16)
+
+#define ACPI_HAS_DM                ACPI_HAS_QEMU_XEN
 
 struct xen_vmemrange;
 struct acpi_numa {
-- 
2.15.1



* [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (31 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 32/41] tools/libacpi: build a DM ACPI signature blacklist Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2018-02-27 17:37   ` Anthony PERARD
  2018-02-27 18:03   ` Anthony PERARD
  2017-12-07 10:10 ` [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via " Haozhong Zhang
                   ` (9 subsequent siblings)
  42 siblings, 2 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Add a function to libacpi to detect the QEMU fw_cfg interface. The
fw_cfg interface is only used by hvmloader for now, so stub functions
are provided for the other libacpi users.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/Makefile |  3 +-
 tools/firmware/hvmloader/util.c   |  3 ++
 tools/libacpi/build.c             |  1 +
 tools/libacpi/libacpi.h           |  4 +++
 tools/libacpi/qemu_fw_cfg.c       | 66 +++++++++++++++++++++++++++++++++++++++
 tools/libacpi/qemu_stub.c         | 39 +++++++++++++++++++++++
 tools/libxl/Makefile              |  3 +-
 7 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 tools/libacpi/qemu_fw_cfg.c
 create mode 100644 tools/libacpi/qemu_stub.c

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index a5b4c32c1a..53b99e2c28 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -76,11 +76,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
 DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
-ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
+ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o qemu_fw_cfg.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
 vpath build.c $(ACPI_PATH)
 vpath static_tables.c $(ACPI_PATH)
+vpath qemu_fw_cfg.c $(ACPI_PATH)
 OBJS += $(ACPI_OBJS)
 
 hvmloader: $(OBJS)
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index e4a37b3310..be4b64a379 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -979,6 +979,9 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1)  )
         config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE;
 
+    if ( fw_cfg_exists() )
+        config->table_flags |= ACPI_HAS_QEMU_XEN;
+
     config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC |
                             ACPI_HAS_WAET | ACPI_HAS_PMTIMER |
                             ACPI_HAS_BUTTONS | ACPI_HAS_VGA |
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f2185589f8..46051c46ac 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -531,6 +531,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
                                               nr_tables, config);
 
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 748a7e4a73..80403f04ab 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -20,6 +20,8 @@
 #ifndef __LIBACPI_H__
 #define __LIBACPI_H__
 
+#include <stdbool.h>
+
 #define ACPI_HAS_COM1              (1<<0)
 #define ACPI_HAS_COM2              (1<<1)
 #define ACPI_HAS_LPT1              (1<<2)
@@ -104,6 +106,8 @@ struct acpi_config {
 
 int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
 
+bool fw_cfg_exists(void);
+
 #endif /* __LIBACPI_H__ */
 
 /*
diff --git a/tools/libacpi/qemu_fw_cfg.c b/tools/libacpi/qemu_fw_cfg.c
new file mode 100644
index 0000000000..254d2f575d
--- /dev/null
+++ b/tools/libacpi/qemu_fw_cfg.c
@@ -0,0 +1,66 @@
+/*
+ * libacpi/qemu_fw_cfg.c
+ *
+ * Driver of QEMU fw_cfg interface. The reference document can be found at
+ * https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt.
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+
+/* QEMU fw_cfg I/O ports on x86 */
+#define FW_CFG_PORT_SEL         0x510
+#define FW_CFG_PORT_DATA        0x511
+
+/* QEMU fw_cfg entries */
+#define FW_CFG_SIGNATURE        0x0000
+
+static inline void fw_cfg_select(uint16_t entry)
+{
+    outw(FW_CFG_PORT_SEL, entry);
+}
+
+static inline void fw_cfg_read(void *buf, uint32_t len)
+{
+    while ( len-- )
+        *(uint8_t *)(buf++) = inb(FW_CFG_PORT_DATA);
+}
+
+static void fw_cfg_read_entry(uint16_t entry, void *buf, uint32_t len)
+{
+    fw_cfg_select(entry);
+    fw_cfg_read(buf, len);
+}
+
+bool fw_cfg_exists(void)
+{
+    uint32_t sig;
+
+    fw_cfg_read_entry(FW_CFG_SIGNATURE, &sig, sizeof(sig));
+
+    return sig == 0x554d4551 /* "QEMU" */;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libacpi/qemu_stub.c b/tools/libacpi/qemu_stub.c
new file mode 100644
index 0000000000..6506de2d9c
--- /dev/null
+++ b/tools/libacpi/qemu_stub.c
@@ -0,0 +1,39 @@
+/*
+ * libacpi/qemu_stub.c
+ *
+ * Stub functions of QEMU drivers. QEMU drivers are only used by
+ * hvmloader now; these stubs ensure that libacpi can still be built
+ * for its other users.
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+
+bool fw_cfg_exists(void)
+{
+    return false;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index a6f2dbd1cf..b6a8f662ed 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -78,11 +78,12 @@ endif
 
 ACPI_PATH  = $(XEN_ROOT)/tools/libacpi
 DSDT_FILES-$(CONFIG_X86) = dsdt_pvh.c
-ACPI_OBJS  = $(patsubst %.c,%.o,$(DSDT_FILES-y)) build.o static_tables.o
+ACPI_OBJS  = $(patsubst %.c,%.o,$(DSDT_FILES-y)) build.o static_tables.o qemu_stub.o
 $(DSDT_FILES-y): acpi
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libxl_x86_acpi.h\"
 vpath build.c $(ACPI_PATH)/
 vpath static_tables.c $(ACPI_PATH)/
+vpath qemu_stub.c $(ACPI_PATH)/
 LIBXL_OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
 
 .PHONY: acpi
-- 
2.15.1



* [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via fw_cfg interface
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (32 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2018-02-27 17:56   ` Anthony PERARD
  2017-12-07 10:10 ` [RFC XEN PATCH v4 35/41] tools/libacpi: add a QEMU BIOSLinkerLoader executor Haozhong Zhang
                   ` (8 subsequent siblings)
  42 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

Probe the following QEMU ACPI ROMs (a short sketch of the fw_cfg file
directory decoding follows the list):
 * etc/acpi/rsdp:       QEMU RSDP, which is used to enumerate the
                        other QEMU ACPI tables in etc/acpi/tables

 * etc/acpi/tables:     other QEMU ACPI tables

 * etc/table-loader:    QEMU BIOSLinkerLoader ROM, which can be
                        executed to load QEMU ACPI tables

 * etc/acpi/nvdimm-mem: RAM used as the NVDIMM ACPI DSM buffer; its
                        exact location is allocated during the
                        execution of etc/table-loader
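
The fw_cfg file directory (selector FW_CFG_FILE_DIR) starts with a
big-endian 32-bit file count followed by 64-byte file entries, which is
why this patch also adds the be32_to_cpu()/be16_to_cpu() helpers to
hvmloader's util.h. A minimal decoding sketch (illustrative only, using
the names introduced by this series, not code from the patch itself):

    uint32_t count;
    struct fw_cfg_file file;

    fw_cfg_read_entry(FW_CFG_FILE_DIR, &count, sizeof(count));
    count = be32_to_cpu(count);                  /* number of file entries */

    if ( count )
    {
        fw_cfg_read(&file, sizeof(file));        /* entries follow the count */
        fw_cfg_select(be16_to_cpu(file.select)); /* select this item...      */
        /* ...then read be32_to_cpu(file.size) bytes from the data port */
    }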

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/Makefile |  3 +-
 tools/firmware/hvmloader/util.h   | 13 +++++++
 tools/libacpi/qemu.h              | 52 +++++++++++++++++++++++++
 tools/libacpi/qemu_fw_cfg.c       | 27 +++++++++++++
 tools/libacpi/qemu_loader.c       | 82 +++++++++++++++++++++++++++++++++++++++
 tools/libacpi/qemu_stub.c         | 11 ++++++
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 tools/libacpi/qemu.h
 create mode 100644 tools/libacpi/qemu_loader.c

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 53b99e2c28..eaa175ece6 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -76,12 +76,13 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
 DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
-ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o qemu_fw_cfg.o
+ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o qemu_fw_cfg.o qemu_loader.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
 vpath build.c $(ACPI_PATH)
 vpath static_tables.c $(ACPI_PATH)
 vpath qemu_fw_cfg.c $(ACPI_PATH)
+vpath qemu_loader.c $(ACPI_PATH)
 OBJS += $(ACPI_OBJS)
 
 hvmloader: $(OBJS)
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index e32b83e721..16244bb0b4 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -66,6 +66,19 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
     return oldbit;
 }
 
+static inline uint32_t be32_to_cpu(uint32_t v)
+{
+    return ((v & 0x000000ffUL) << 24) |
+           ((v & 0x0000ff00UL) << 8)  |
+           ((v & 0x00ff0000UL) >> 8)  |
+           ((v & 0xff000000UL) >> 24);
+}
+
+static inline uint16_t be16_to_cpu(uint16_t v)
+{
+    return ((v & 0x00ff) << 8) | ((v & 0xff00) >> 8);
+}
+
 /* MSR access */
 void wrmsr(uint32_t idx, uint64_t v);
 uint64_t rdmsr(uint32_t idx);
diff --git a/tools/libacpi/qemu.h b/tools/libacpi/qemu.h
new file mode 100644
index 0000000000..940816bf27
--- /dev/null
+++ b/tools/libacpi/qemu.h
@@ -0,0 +1,52 @@
+/*
+ * libacpi/qemu.h
+ *
+ * Header file of QEMU drivers.
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __QEMU_H__
+#define __QEMU_H__
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+
+#define FW_CFG_FILE_PATH_MAX_LENGTH 56
+
+/* An individual file entry, 64 bytes total. */
+struct fw_cfg_file {
+    uint32_t size;      /* size of referenced fw_cfg item, big-endian */
+    uint16_t select;    /* selector key of fw_cfg item, big-endian */
+    uint16_t reserved;
+    char name[FW_CFG_FILE_PATH_MAX_LENGTH]; /* fw_cfg item name,    */
+                                            /* NUL-terminated ascii */
+};
+
+int fw_cfg_probe_roms(struct acpi_ctxt *ctxt);
+
+int loader_add_rom(struct acpi_ctxt* ctxt, const struct fw_cfg_file *file);
+
+#endif /* !__QEMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libacpi/qemu_fw_cfg.c b/tools/libacpi/qemu_fw_cfg.c
index 254d2f575d..458b6eabdc 100644
--- a/tools/libacpi/qemu_fw_cfg.c
+++ b/tools/libacpi/qemu_fw_cfg.c
@@ -21,6 +21,7 @@
 
 #include LIBACPI_STDUTILS
 #include "libacpi.h"
+#include "qemu.h"
 
 /* QEMU fw_cfg I/O ports on x86 */
 #define FW_CFG_PORT_SEL         0x510
@@ -28,6 +29,7 @@
 
 /* QEMU fw_cfg entries */
 #define FW_CFG_SIGNATURE        0x0000
+#define FW_CFG_FILE_DIR         0x0019
 
 static inline void fw_cfg_select(uint16_t entry)
 {
@@ -55,6 +57,31 @@ bool fw_cfg_exists(void)
     return sig == 0x554d4551 /* "QEMU" */;
 }
 
+int fw_cfg_probe_roms(struct acpi_ctxt *ctxt)
+{
+    struct fw_cfg_file file;
+    uint32_t count, i;
+    int rc = 0;
+
+    fw_cfg_read_entry(FW_CFG_FILE_DIR, &count, sizeof(count));
+    count = be32_to_cpu(count);
+
+    for ( i = 0; i < count; i++ )
+    {
+        fw_cfg_read(&file, sizeof(file));
+        rc = loader_add_rom(ctxt, &file);
+        if ( rc )
+        {
+            file.name[FW_CFG_FILE_PATH_MAX_LENGTH - 1] = '\0';
+            printf("ERROR: failed to load QEMU ROM %s, err %d\n",
+                   file.name, rc);
+            break;
+        }
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libacpi/qemu_loader.c b/tools/libacpi/qemu_loader.c
new file mode 100644
index 0000000000..c0ed3b0ad0
--- /dev/null
+++ b/tools/libacpi/qemu_loader.c
@@ -0,0 +1,82 @@
+/*
+ * libacpi/qemu_loader.c
+ *
+ * Driver of QEMU BIOSLinkerLoader interface. The reference document
+ * can be found at
+ * https://github.com/qemu/qemu/blob/master/hw/acpi/bios-linker-loader.c.
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+#include "qemu.h"
+
+struct rom {
+    struct fw_cfg_file file;
+    struct rom *next;
+};
+
+static struct rom *roms = NULL;
+static struct rom *bios_loader = NULL;
+
+static bool rom_needed(const char *file_name)
+{
+    return
+        !strncmp(file_name, "etc/acpi/rsdp", FW_CFG_FILE_PATH_MAX_LENGTH) ||
+        !strncmp(file_name, "etc/acpi/tables", FW_CFG_FILE_PATH_MAX_LENGTH) ||
+        !strncmp(file_name, "etc/table-loader", FW_CFG_FILE_PATH_MAX_LENGTH) ||
+        !strncmp(file_name, "etc/acpi/nvdimm-mem", FW_CFG_FILE_PATH_MAX_LENGTH);
+}
+
+int loader_add_rom(struct acpi_ctxt *ctxt, const struct fw_cfg_file *file)
+{
+    const char *name = file->name;
+    struct rom *rom;
+
+    if ( !rom_needed(name) )
+        return 0;
+
+    rom = roms;
+    while ( rom )
+    {
+        if ( !strncmp(rom->file.name, name, FW_CFG_FILE_PATH_MAX_LENGTH) )
+            return -EEXIST;
+        rom = rom->next;
+    }
+
+    rom = ctxt->mem_ops.alloc(ctxt, sizeof(*rom), 0);
+    if ( !rom )
+        return -ENOMEM;
+
+    memcpy(&rom->file, file, sizeof(*file));
+    rom->next = roms;
+    roms = rom;
+
+    if ( !strncmp(name, "etc/table-loader", FW_CFG_FILE_PATH_MAX_LENGTH) )
+        bios_loader = rom;
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libacpi/qemu_stub.c b/tools/libacpi/qemu_stub.c
index 6506de2d9c..fdba5294e1 100644
--- a/tools/libacpi/qemu_stub.c
+++ b/tools/libacpi/qemu_stub.c
@@ -22,12 +22,23 @@
 
 #include LIBACPI_STDUTILS
 #include "libacpi.h"
+#include "qemu.h"
 
 bool fw_cfg_exists(void)
 {
     return false;
 }
 
+int fw_cfg_probe_roms(struct acpi_ctxt *ctxt)
+{
+    return -ENOSYS;
+}
+
+int loader_add_rom(struct acpi_ctxt* ctxt, const struct fw_cfg_file *file)
+{
+    return -ENOSYS;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.15.1



* [RFC XEN PATCH v4 35/41] tools/libacpi: add a QEMU BIOSLinkerLoader executor
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (33 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via " Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 36/41] tools/libacpi: add function to get the data of QEMU RSDP Haozhong Zhang
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Jan Beulich, Chao Peng,
	Dan Williams

The executor loads and executes the QEMU BIOSLinkerLoader ROM
etc/table-loader. It currently supports three BIOSLinkerLoader
commands, ALLOCATE, ADD_POINTER and ADD_CHECKSUM, which are enough to
load the currently supported QEMU ROMs.
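
For illustration (made-up values, not dumped from a real QEMU
instance), a minimal etc/table-loader sequence for these ROMs could
look like:

    ALLOCATE      file=etc/acpi/rsdp,   align=16, zone=FSEG
    ALLOCATE      file=etc/acpi/tables, align=64, zone=HIGH
    ADD_POINTER   dest=etc/acpi/rsdp, src=etc/acpi/tables, offset=16, size=4
    ADD_CHECKSUM  file=etc/acpi/tables, offset=9, start=0, length=<table len>

i.e. the ROM blobs are first placed in guest memory, then pointers
inside already-allocated files are patched with the chosen
guest-physical addresses, and finally the table checksums are fixed up.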

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libacpi/build.c       |   3 +-
 tools/libacpi/libacpi.h     |   2 +
 tools/libacpi/qemu.h        |   2 +
 tools/libacpi/qemu_fw_cfg.c |   6 +
 tools/libacpi/qemu_loader.c | 302 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/qemu_stub.c   |   9 ++
 6 files changed, 322 insertions(+), 2 deletions(-)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 46051c46ac..f2d65574ff 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -105,8 +105,7 @@ static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig)
     return 0;
 }
 
-static void set_checksum(
-    void *table, uint32_t checksum_offset, uint32_t length)
+void set_checksum(void *table, uint32_t checksum_offset, uint32_t length)
 {
     uint8_t *p, sum = 0;
 
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 80403f04ab..c973311a15 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -108,6 +108,8 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
 
 bool fw_cfg_exists(void);
 
+void set_checksum(void *table, uint32_t checksum_offset, uint32_t length);
+
 #endif /* __LIBACPI_H__ */
 
 /*
diff --git a/tools/libacpi/qemu.h b/tools/libacpi/qemu.h
index 940816bf27..224fc67e02 100644
--- a/tools/libacpi/qemu.h
+++ b/tools/libacpi/qemu.h
@@ -36,8 +36,10 @@ struct fw_cfg_file {
 };
 
 int fw_cfg_probe_roms(struct acpi_ctxt *ctxt);
+void fw_cfg_read_file(const struct fw_cfg_file *file, void *buf);
 
 int loader_add_rom(struct acpi_ctxt* ctxt, const struct fw_cfg_file *file);
+int loader_exec(struct acpi_ctxt *ctxt);
 
 #endif /* !__QEMU_H__ */
 
diff --git a/tools/libacpi/qemu_fw_cfg.c b/tools/libacpi/qemu_fw_cfg.c
index 458b6eabdc..260728ecb3 100644
--- a/tools/libacpi/qemu_fw_cfg.c
+++ b/tools/libacpi/qemu_fw_cfg.c
@@ -82,6 +82,12 @@ int fw_cfg_probe_roms(struct acpi_ctxt *ctxt)
     return rc;
 }
 
+void fw_cfg_read_file(const struct fw_cfg_file *file, void *buf)
+{
+    fw_cfg_read_entry(be16_to_cpu(file->select), buf,
+                      be32_to_cpu(file->size));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libacpi/qemu_loader.c b/tools/libacpi/qemu_loader.c
index c0ed3b0ad0..d041a37246 100644
--- a/tools/libacpi/qemu_loader.c
+++ b/tools/libacpi/qemu_loader.c
@@ -24,9 +24,71 @@
 #include "libacpi.h"
 #include "qemu.h"
 
+/* QEMU BIOSLinkerLoader interface. All fields in little-endian. */
+struct loader_entry {
+    uint32_t command;
+    union {
+        /*
+         * COMMAND_ALLOCATE - allocate a table from @alloc.file
+         * subject to @alloc.align alignment (must be power of 2)
+         * and @alloc.zone (can be HIGH or FSEG) requirements.
+         *
+         * Must appear exactly once for each file, and before
+         * this file is referenced by any other command.
+         */
+        struct {
+            char file[FW_CFG_FILE_PATH_MAX_LENGTH];
+            uint32_t align;
+            uint8_t zone;
+        } alloc;
+
+        /*
+         * COMMAND_ADD_POINTER - patch the table (originating from
+         * @dest_file) at @pointer.offset, by adding a pointer to the table
+         * originating from @src_file. 1,2,4 or 8 byte unsigned
+         * addition is used depending on @pointer.size.
+         */
+        struct {
+            char dest_file[FW_CFG_FILE_PATH_MAX_LENGTH];
+            char src_file[FW_CFG_FILE_PATH_MAX_LENGTH];
+            uint32_t offset;
+            uint8_t size;
+        } pointer;
+
+        /*
+         * COMMAND_ADD_CHECKSUM - calculate checksum of the range specified by
+         * @cksum_start and @cksum_length fields,
+         * and then add the value at @cksum.offset.
+         * Checksum simply sums -X for each byte X in the range
+         * using 8-bit math.
+         */
+        struct {
+            char file[FW_CFG_FILE_PATH_MAX_LENGTH];
+            uint32_t offset;
+            uint32_t start;
+            uint32_t length;
+        } cksum;
+
+        /* padding */
+        char pad[124];
+    };
+} __attribute__ ((packed));
+
+enum {
+    BIOS_LINKER_LOADER_COMMAND_ALLOCATE         = 0x1,
+    BIOS_LINKER_LOADER_COMMAND_ADD_POINTER      = 0x2,
+    BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM     = 0x3,
+};
+
+enum {
+    BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH = 0x1,
+    BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG = 0x2,
+};
+
 struct rom {
     struct fw_cfg_file file;
     struct rom *next;
+    void *data;
 };
 
 static struct rom *roms = NULL;
@@ -41,6 +103,174 @@ static bool rom_needed(const char *file_name)
         !strncmp(file_name, "etc/acpi/nvdimm-mem", FW_CFG_FILE_PATH_MAX_LENGTH);
 }
 
+static int loader_load(struct acpi_ctxt *ctxt, struct rom *loader)
+{
+    struct fw_cfg_file *file = &loader->file;
+    uint32_t size = be32_to_cpu(file->size);
+
+    loader->data = ctxt->mem_ops.alloc(ctxt, size, 0);
+    if ( !loader->data )
+        return -ENOMEM;
+
+    fw_cfg_read_file(file, loader->data);
+
+    return 0;
+}
+
+static struct rom *loader_find_rom(const char *file_name)
+{
+    struct rom *rom = roms;
+
+    while ( rom )
+    {
+        if ( !strncmp(rom->file.name, file_name, FW_CFG_FILE_PATH_MAX_LENGTH) )
+            break;
+        rom = rom->next;
+    }
+
+    if ( !rom )
+        printf("ERROR: File %s not exist\n", file_name);
+
+    return rom;
+}
+
+static void loader_cmd_display(struct loader_entry *entry)
+{
+    switch ( entry->command )
+    {
+    case BIOS_LINKER_LOADER_COMMAND_ALLOCATE:
+        entry->alloc.file[FW_CFG_FILE_PATH_MAX_LENGTH - 1] = '\0';
+        printf("COMMAND_ALLOCATE: file %s, align %u, zone %u\n",
+               entry->alloc.file, entry->alloc.align, entry->alloc.zone);
+        break;
+
+    case BIOS_LINKER_LOADER_COMMAND_ADD_POINTER:
+        entry->pointer.dest_file[FW_CFG_FILE_PATH_MAX_LENGTH - 1] = '\0';
+        entry->pointer.src_file[FW_CFG_FILE_PATH_MAX_LENGTH - 1] = '\0';
+        printf("COMMAND_ADD_POINTER: dst %s, src %s, offset %u, size %u\n",
+               entry->pointer.dest_file, entry->pointer.src_file,
+               entry->pointer.offset, entry->pointer.size);
+        break;
+
+    case BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM:
+        entry->cksum.file[FW_CFG_FILE_PATH_MAX_LENGTH - 1] = '\0';
+        printf("COMMAND_ADD_CHECKSUM: file %s, offset %u, offset %u, len %u\n",
+               entry->cksum.file, entry->cksum.offset,
+               entry->cksum.start, entry->cksum.length);
+        break;
+
+    default:
+        printf("Unsupported command %u\n", entry->command);
+    }
+}
+
+static int loader_exec_allocate(struct acpi_ctxt *ctxt,
+                                const struct loader_entry *entry)
+{
+    uint32_t align = entry->alloc.align;
+    uint8_t zone = entry->alloc.zone;
+    struct rom *rom;
+    struct fw_cfg_file *file;
+
+    rom = loader_find_rom(entry->alloc.file);
+    if ( !rom )
+        return -ENOENT;
+    file = &rom->file;
+
+    if ( align & (align - 1) )
+    {
+        printf("ERROR: Invalid alignment %u, not power of 2\n", align);
+        return -EINVAL;
+    }
+
+    if ( zone != BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH &&
+         zone != BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG )
+    {
+        printf("ERROR: Unsupported zone type %u\n", zone);
+        return -EINVAL;
+    }
+
+    rom->data = ctxt->mem_ops.alloc(ctxt, be32_to_cpu(file->size), align);
+    fw_cfg_read_file(file, rom->data);
+
+    return 0;
+}
+
+static int loader_exec_add_pointer(struct acpi_ctxt *ctxt,
+                                   const struct loader_entry *entry)
+{
+    uint32_t offset = entry->pointer.offset;
+    uint8_t size = entry->pointer.size;
+    struct rom *dst, *src;
+    uint64_t pointer, old_pointer;
+
+    dst = loader_find_rom(entry->pointer.dest_file);
+    src = loader_find_rom(entry->pointer.src_file);
+    if ( !dst || !src )
+        return -ENOENT;
+
+    if ( !dst->data )
+    {
+        printf("ERROR: No space allocated for file %s\n",
+               entry->pointer.dest_file);
+        return -ENOSPC;
+    }
+    if ( !src->data )
+    {
+        printf("ERROR: No space allocated for file %s\n",
+               entry->pointer.src_file);
+        return -ENOSPC;
+    }
+    if ( offset + size < offset ||
+         offset + size > be32_to_cpu(dst->file.size) )
+    {
+        printf("ERROR: Invalid size\n");
+        return -EINVAL;
+    }
+    if ( size != 1 && size != 2 && size != 4 && size != 8 )
+    {
+        printf("ERROR: Invalid pointer size %u\n", size);
+        return -EINVAL;
+    }
+
+    memcpy(&pointer, dst->data + offset, size);
+    old_pointer = pointer;
+    pointer += ctxt->mem_ops.v2p(ctxt, src->data);
+    memcpy(dst->data + offset, &pointer, size);
+
+    return 0;
+}
+
+static int loader_exec_add_checksum(const struct loader_entry *entry)
+{
+    uint32_t offset = entry->cksum.offset;
+    uint32_t start = entry->cksum.start;
+    uint32_t length = entry->cksum.length;
+    uint32_t size;
+    struct rom *rom;
+
+    rom = loader_find_rom(entry->cksum.file);
+    if ( !rom )
+        return -ENOENT;
+
+    if ( !rom->data )
+    {
+        printf("ERROR: No space allocated for file %s\n", entry->cksum.file);
+        return -ENOSPC;
+    }
+
+    size = be32_to_cpu(rom->file.size);
+    if ( offset >= size || start + length < start || start + length > size )
+    {
+        printf("ERROR: Invalid size\n");
+        return -EINVAL;
+    }
+
+    set_checksum(rom->data + start, offset - start, length);
+
+    return 0;
+}
+
 int loader_add_rom(struct acpi_ctxt *ctxt, const struct fw_cfg_file *file)
 {
     const char *name = file->name;
@@ -63,6 +293,7 @@ int loader_add_rom(struct acpi_ctxt *ctxt, const struct fw_cfg_file *file)
 
     memcpy(&rom->file, file, sizeof(*file));
     rom->next = roms;
+    rom->data = NULL;
     roms = rom;
 
     if ( !strncmp(name, "etc/table-loader", FW_CFG_FILE_PATH_MAX_LENGTH) )
@@ -71,6 +302,77 @@ int loader_add_rom(struct acpi_ctxt *ctxt, const struct fw_cfg_file *file)
     return 0;
 }
 
+int loader_exec(struct acpi_ctxt *ctxt)
+{
+    struct loader_entry *entry;
+    struct fw_cfg_file *file;
+    unsigned long size, offset = 0;
+    void *data;
+    int rc = 0;
+
+    if ( !bios_loader )
+    {
+        printf("ERROR: Cannot find BIOSLinkerLoader\n");
+        return -ENODEV;
+    }
+
+    file = &bios_loader->file;
+    size = be32_to_cpu(file->size);
+
+    if ( size % sizeof(*entry) )
+    {
+        printf("ERROR: Invalid BIOSLinkerLoader size %ld, "
+               "not multiples of entry size %ld\n",
+               size, (unsigned long)sizeof(*entry));
+        return -EINVAL;
+    }
+
+    rc = loader_load(ctxt, bios_loader);
+    if ( rc )
+    {
+        printf("ERROR: Failed to load BIOSLinkerLoader, err %d\n", rc);
+        return rc;
+    }
+
+    data = bios_loader->data;
+
+    while ( offset < size )
+    {
+        entry = data + offset;
+
+        switch ( entry->command )
+        {
+        case BIOS_LINKER_LOADER_COMMAND_ALLOCATE:
+            rc = loader_exec_allocate(ctxt, entry);
+            break;
+
+        case BIOS_LINKER_LOADER_COMMAND_ADD_POINTER:
+            rc = loader_exec_add_pointer(ctxt, entry);
+            break;
+
+        case BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM:
+            rc = loader_exec_add_checksum(entry);
+            break;
+
+        default:
+            /* Skip unsupported commands */
+            break;
+        }
+
+        if ( rc )
+        {
+            printf("ERROR: Failed to execute BIOSLinkerLoader command:\n");
+            loader_cmd_display(entry);
+
+            break;
+        }
+
+        offset += sizeof(*entry);
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libacpi/qemu_stub.c b/tools/libacpi/qemu_stub.c
index fdba5294e1..1eedf5466e 100644
--- a/tools/libacpi/qemu_stub.c
+++ b/tools/libacpi/qemu_stub.c
@@ -34,11 +34,20 @@ int fw_cfg_probe_roms(struct acpi_ctxt *ctxt)
     return -ENOSYS;
 }
 
+void fw_cfg_read_file(const struct fw_cfg_file *file, void *buf)
+{
+}
+
 int loader_add_rom(struct acpi_ctxt* ctxt, const struct fw_cfg_file *file)
 {
     return -ENOSYS;
 }
 
+int loader_exec(struct acpi_ctxt *ctxt)
+{
+    return -ENOSYS;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.15.1



* [RFC XEN PATCH v4 36/41] tools/libacpi: add function to get the data of QEMU RSDP
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (34 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 35/41] tools/libacpi: add a QEMU BIOSLinkerLoader executor Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 37/41] tools/libacpi: load QEMU ACPI Haozhong Zhang
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Jan Beulich, Chao Peng,
	Dan Williams

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libacpi/qemu.h        | 2 ++
 tools/libacpi/qemu_loader.c | 8 ++++++++
 tools/libacpi/qemu_stub.c   | 5 +++++
 3 files changed, 15 insertions(+)

diff --git a/tools/libacpi/qemu.h b/tools/libacpi/qemu.h
index 224fc67e02..532c579eca 100644
--- a/tools/libacpi/qemu.h
+++ b/tools/libacpi/qemu.h
@@ -22,6 +22,7 @@
 #define __QEMU_H__
 
 #include LIBACPI_STDUTILS
+#include "acpi2_0.h"
 #include "libacpi.h"
 
 #define FW_CFG_FILE_PATH_MAX_LENGTH 56
@@ -40,6 +41,7 @@ void fw_cfg_read_file(const struct fw_cfg_file *file, void *buf);
 
 int loader_add_rom(struct acpi_ctxt* ctxt, const struct fw_cfg_file *file);
 int loader_exec(struct acpi_ctxt *ctxt);
+struct acpi_20_rsdp *loader_get_rsdp(void);
 
 #endif /* !__QEMU_H__ */
 
diff --git a/tools/libacpi/qemu_loader.c b/tools/libacpi/qemu_loader.c
index d041a37246..660d825df7 100644
--- a/tools/libacpi/qemu_loader.c
+++ b/tools/libacpi/qemu_loader.c
@@ -21,6 +21,7 @@
  */
 
 #include LIBACPI_STDUTILS
+#include "acpi2_0.h"
 #include "libacpi.h"
 #include "qemu.h"
 
@@ -373,6 +374,13 @@ int loader_exec(struct acpi_ctxt *ctxt)
     return rc;
 }
 
+struct acpi_20_rsdp *loader_get_rsdp(void)
+{
+    struct rom *rsdp = loader_find_rom("etc/acpi/rsdp");
+
+    return rsdp ? rsdp->data : NULL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libacpi/qemu_stub.c b/tools/libacpi/qemu_stub.c
index 1eedf5466e..452772682a 100644
--- a/tools/libacpi/qemu_stub.c
+++ b/tools/libacpi/qemu_stub.c
@@ -48,6 +48,11 @@ int loader_exec(struct acpi_ctxt *ctxt)
     return -ENOSYS;
 }
 
+struct acpi_20_rsdp *loader_get_rsdp(void)
+{
+    return NULL;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.15.1



* [RFC XEN PATCH v4 37/41] tools/libacpi: load QEMU ACPI
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (35 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 36/41] tools/libacpi: add function to get the data of QEMU RSDP Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 38/41] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Jan Beulich, Chao Peng,
	Dan Williams

If libacpi detects the QEMU fw_cfg interface, it will try to find and
execute the QEMU BIOSLinkerLoader ROM in order to load the QEMU-built
ACPI tables. If any QEMU ACPI table conflicts with a Xen-built ACPI
table, libacpi refuses to load all of the QEMU ACPI tables.
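
As a minimal illustration (not code from this patch), signatures are
compared as the packed 32-bit values from acpi2_0.h, so e.g. the
QEMU-built NFIT is accepted while a second MADT would collide with the
one Xen already built and blacklisted:

    if ( dm_acpi_check_signature_collision(ASCII32('N','F','I','T')) )
        ; /* no Xen-built NFIT => the QEMU table can be loaded */

    if ( !dm_acpi_check_signature_collision(ASCII32('A','P','I','C')) )
        ; /* MADT is already built by Xen => all QEMU tables are dropped */

Note that SSDT signatures are deliberately not added to the blacklist,
because a guest may legitimately carry several SSDTs.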

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libacpi/acpi2_0.h |   1 +
 tools/libacpi/build.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba32db..733bed684e 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -435,6 +435,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
 
 /*
  * Table revision numbers.
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f2d65574ff..32405a4c77 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -16,6 +16,7 @@
 #include LIBACPI_STDUTILS
 #include "acpi2_0.h"
 #include "libacpi.h"
+#include "qemu.h"
 #include "ssdt_s3.h"
 #include "ssdt_s4.h"
 #include "ssdt_tpm.h"
@@ -105,6 +106,18 @@ static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig)
     return 0;
 }
 
+/* Return true if no collision is found. */
+static bool dm_acpi_check_signature_collision(uint64_t sig)
+{
+    unsigned int i;
+
+    for ( i = 0; i < NR_SIGNATURE_BLACKLIST_ENTS; i++ )
+        if ( sig == dm_acpi_signature_blacklist[i] )
+            return false;
+
+    return true;
+}
+
 void set_checksum(void *table, uint32_t checksum_offset, uint32_t length)
 {
     uint8_t *p, sum = 0;
@@ -388,6 +401,94 @@ static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
     return nr_added;
 }
 
+static int load_qemu_xen_tables(struct acpi_ctxt *ctxt,
+                                unsigned long *table_ptrs,
+                                int nr_tables,
+                                struct acpi_config *config)
+{
+    struct acpi_header *header;
+    struct acpi_20_rsdp *rsdp;
+    struct acpi_20_rsdt *rsdt;
+    uint32_t table_paddr, sig;
+    unsigned int nr_added = 0, nr_rsdt_ents;
+
+    printf("Loading QEMU ACPI tables ...\n");
+
+    if ( fw_cfg_probe_roms(ctxt) )
+        return 0;
+
+    if ( loader_exec(ctxt) )
+        return 0;
+
+    rsdp = loader_get_rsdp();
+    if ( !rsdp )
+    {
+        printf("Cannot find QEMU RSDP\n");
+        return 0;
+    }
+
+    rsdt = (struct acpi_20_rsdt *)ctxt->mem_ops.p2v(ctxt, rsdp->rsdt_address);
+
+    nr_rsdt_ents =
+        (rsdt->header.length - sizeof(struct acpi_header)) / sizeof(uint32_t);
+    if ( nr_rsdt_ents > ACPI_MAX_SECONDARY_TABLES - nr_tables )
+    {
+        printf("Too many tables in QEMU ACPI tables\n");
+        goto exit;
+    }
+
+    for ( nr_added = 0; nr_added < nr_rsdt_ents; nr_added++ )
+    {
+        table_paddr = rsdt->entry[nr_added];
+        header = ctxt->mem_ops.p2v(ctxt, table_paddr);
+        sig = header->signature;
+
+        if ( !dm_acpi_check_signature_collision(sig) )
+        {
+            printf("QEMU ACPI table conflict with Xen ACPI table '%c%c%c%c'\n",
+                   (char)(sig & 0xff),
+                   (char)((sig >> 8) & 0xff),
+                   (char)((sig >> 16) & 0xff),
+                   (char)((sig >> 24) & 0xff));
+            break;
+        }
+
+        if ( sig != ACPI_2_0_SSDT_SIGNATURE )
+            dm_acpi_blacklist_signature(config, sig);
+
+        table_ptrs[nr_tables++] = table_paddr;
+    }
+
+    if ( nr_added < nr_rsdt_ents )
+        while ( nr_added )
+        {
+            table_ptrs[--nr_tables] = 0;
+            nr_added--;
+        }
+
+exit:
+    /* Cleanup unused QEMU RSDP & RSDT. */
+    memset(rsdp, 0,
+           rsdp->revision == ACPI_2_0_RSDP_REVISION ?
+           rsdp->length : sizeof(struct acpi_10_rsdp));
+    memset(rsdt, 0, rsdt->header.length);
+
+    return nr_added;
+}
+
+static int construct_dm_acpi_tables(struct acpi_ctxt *ctxt,
+                                    unsigned long *table_ptrs,
+                                    int nr_tables,
+                                    struct acpi_config *config)
+{
+    int nr_added = 0;
+
+    if ( config->table_flags & ACPI_HAS_QEMU_XEN )
+        nr_added += load_qemu_xen_tables(ctxt, table_ptrs, nr_tables, config);
+
+    return nr_added;
+}
+
 static int construct_secondary_tables(struct acpi_ctxt *ctxt,
                                       unsigned long *table_ptrs,
                                       struct acpi_config *config,
@@ -530,6 +631,10 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
                                               nr_tables, config);
 
+    /* Load ACPI built by device model */
+    if ( config->table_flags & ACPI_HAS_DM )
+        nr_tables += construct_dm_acpi_tables(ctxt, table_ptrs,
+                                              nr_tables, config);
 
     table_ptrs[nr_tables] = 0;
     return nr_tables;
-- 
2.15.1



* [RFC XEN PATCH v4 38/41] tools/xl: add xl domain configuration for virtual NVDIMM devices
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (36 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 37/41] tools/libacpi: load QEMU ACPI Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 39/41] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

A new xl domain configuration option
   vnvdimms = [ 'type=mfn, backend=START_PMEM_MFN, nr_pages=N', ... ]

is added to specify virtual NVDIMM devices backed by the given host
PMEM pages. As the kernel PMEM driver does not work in Dom0 yet, the
backend has to be specified by its start MFN.
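
For example (illustrative values only), a 4 GiB virtual NVDIMM backed
by a host PMEM region starting at MFN 0x480000 would be written as:

    vnvdimms = [ 'type=mfn, backend=0x480000, nr_pages=1048576' ]

where 1048576 4-KiB pages equal 4 GiB, and the MFN range must refer to
a host PMEM region previously set up for guest data usage.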

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 docs/man/xl.cfg.pod.5.in    |  33 ++++++++++++++
 tools/libxl/libxl.h         |   5 ++
 tools/libxl/libxl_nvdimm.c  |  28 ++++++++++++
 tools/libxl/libxl_types.idl |  15 ++++++
 tools/xl/xl_parse.c         | 108 ++++++++++++++++++++++++++++++++++++++++++++
 tools/xl/xl_vmcontrol.c     |  15 +++++-
 6 files changed, 203 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 1f9538c445..1c0119cbcc 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1268,6 +1268,39 @@ connectors=id0:1920x1080;id1:800x600;id2:640x480
 
 =back
 
+=item B<vnvdimms=[ 'VNVDIMM_SPEC', 'VNVDIMM_SPEC', ... ]>
+
+Specifies the virtual NVDIMM devices which are provided to the guest.
+
+Each B<VNVDIMM_SPEC> is a comma-separated list of C<KEY=VALUE> settings
+from the following list:
+
+=over 4
+
+=item B<type=TYPE>
+
+Specifies the type of the host backend of the virtual NVDIMM device. The
+following is a list of supported types:
+
+=over 4
+
+=item B<mfn>
+
+backs the virtual NVDIMM device by a contiguous host PMEM region.
+
+=back
+
+=item B<backend=BACKEND>
+
+Specifies the host backend of the virtual NVDIMM device. If C<type=mfn>,
+then B<BACKEND> specifies the start MFN of the host PMEM region.
+
+=item B<nr_pages=NUMBER>
+
+Specifies the number of pages of the host backend.
+
+=back
+
 =item B<dm_restrict=BOOLEAN>
 
 Restrict the device model after startup,
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c390bf227b..cb4fd84d48 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2354,6 +2354,11 @@ int libxl_nvdimm_pmem_setup_data(libxl_ctx *ctx,
                                  unsigned long data_smfn, unsigned data_emfn,
                                  unsigned long mgmt_smfn, unsigned mgmt_emfn);
 
+int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
+                              libxl_domain_config *dst,
+                              const libxl_domain_config *src)
+                              LIBXL_EXTERNAL_CALLERS_ONLY;
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index 0d51036794..1863d76bbc 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -168,3 +168,31 @@ int libxl_nvdimm_pmem_setup_data(libxl_ctx *ctx,
 
     return errno ? ERROR_FAIL : 0;
 }
+
+int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
+                              libxl_domain_config *dst,
+                              const libxl_domain_config *src)
+{
+    GC_INIT(ctx);
+    unsigned int nr = src->num_vnvdimms;
+    libxl_device_vnvdimm *vnvdimms;
+    int rc = 0;
+
+    if (!nr)
+        goto out;
+
+    vnvdimms = libxl__calloc(NOGC, nr, sizeof(*vnvdimms));
+    if (!vnvdimms) {
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+
+    dst->num_vnvdimms = nr;
+    while (nr--)
+        libxl_device_vnvdimm_copy(ctx, &vnvdimms[nr], &src->vnvdimms[nr]);
+    dst->vnvdimms = vnvdimms;
+
+ out:
+    GC_FREE;
+    return rc;
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 053b1c0b9a..30879d11db 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -247,6 +247,10 @@ libxl_vuart_type = Enumeration("vuart_type", [
     (1, "sbsa_uart"),
     ])
 
+libxl_vnvdimm_backend_type = Enumeration("vnvdimm_backend_type", [
+    (0, "mfn"),
+    ])
+
 #
 # Complex libxl types
 #
@@ -813,6 +817,16 @@ libxl_device_vdispl = Struct("device_vdispl", [
     ("connectors", Array(libxl_connector_param, "num_connectors"))
     ])
 
+libxl_device_vnvdimm = Struct("device_vnvdimm", [
+    ("backend_domid",   libxl_domid),
+    ("backend_domname", string),
+    ("devid",           libxl_devid),
+    ("nr_pages",        uint64),
+    ("u", KeyedUnion(None, libxl_vnvdimm_backend_type, "backend_type",
+            [("mfn", uint64),
+            ])),
+])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -832,6 +846,7 @@ libxl_domain_config = Struct("domain_config", [
     ("channels", Array(libxl_device_channel, "num_channels")),
     ("usbctrls", Array(libxl_device_usbctrl, "num_usbctrls")),
     ("usbdevs", Array(libxl_device_usbdev, "num_usbdevs")),
+    ("vnvdimms", Array(libxl_device_vnvdimm, "num_vnvdimms")),
 
     ("on_poweroff", libxl_action_on_shutdown),
     ("on_reboot", libxl_action_on_shutdown),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 0a43a4876e..0123fcf89e 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -851,6 +851,109 @@ out:
     return rc;
 }
 
+static int parse_vnvdimm_config(libxl_device_vnvdimm *vnvdimm, char *token)
+{
+    char *oparg, *endptr;
+    unsigned long val;
+
+    if (MATCH_OPTION("type", token, oparg)) {
+        if (libxl_vnvdimm_backend_type_from_string(oparg,
+                                                   &vnvdimm->backend_type)) {
+            fprintf(stderr,
+                    "ERROR: invalid vNVDIMM backend type '%s'\n",
+                    oparg);
+            return 1;
+        }
+    } else if (MATCH_OPTION("nr_pages", token, oparg)) {
+        val = strtoul(oparg, &endptr, 0);
+        if (endptr == oparg || val == ULONG_MAX)
+        {
+            fprintf(stderr,
+                    "ERROR: invalid number of vNVDIMM backend pages '%s'\n",
+                    oparg);
+            return 1;
+        }
+        vnvdimm->nr_pages = val;
+    } else if (MATCH_OPTION("backend", token, oparg)) {
+        /* Skip: handled by parse_vnvdimms() */
+    } else {
+        fprintf(stderr, "ERROR: unknown string '%s' in vnvdimm spec\n", token);
+        return 1;
+    }
+
+    return 0;
+}
+
+/*
+ * vnvdimms = [ 'type=<mfn>, backend=<base_mfn>, nr_pages=<N>', ... ]
+ */
+static void parse_vnvdimms(XLU_Config *config, libxl_domain_config *d_config)
+{
+    XLU_ConfigList *vnvdimms;
+    const char *buf;
+    int rc;
+
+    rc = xlu_cfg_get_list(config, "vnvdimms", &vnvdimms, 0, 0);
+    if ( rc )
+        return;
+
+#if !defined(__linux__)
+    fprintf(stderr, "ERROR: 'vnvdimms' is only supported on Linux\n");
+    exit(-ERROR_FAIL);
+#endif
+
+    d_config->num_vnvdimms = 0;
+    d_config->vnvdimms = NULL;
+
+    while ((buf = xlu_cfg_get_listitem(vnvdimms,
+                                       d_config->num_vnvdimms)) != NULL) {
+        libxl_device_vnvdimm *vnvdimm =
+            ARRAY_EXTEND_INIT(d_config->vnvdimms, d_config->num_vnvdimms,
+                              libxl_device_vnvdimm_init);
+        char *buf2 = strdup(buf), *backend = NULL, *p, *endptr;
+        unsigned long mfn;
+
+        p = strtok(buf2, ",");
+        if (!p)
+            goto skip_nvdimm;
+
+        do {
+            while (*p == ' ')
+                p++;
+
+            rc = 0;
+            if (!MATCH_OPTION("backend", p, backend))
+                rc = parse_vnvdimm_config(vnvdimm, p);
+            if (rc)
+                exit(-ERROR_FAIL);
+        } while ((p = strtok(NULL, ",")) != NULL);
+
+        switch (vnvdimm->backend_type)
+        {
+        case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+            mfn = strtoul(backend, &endptr, 0);
+            if (endptr == backend || mfn == ULONG_MAX)
+            {
+                fprintf(stderr,
+                        "ERROR: invalid start MFN of host NVDIMM '%s'\n",
+                        backend);
+                exit(-ERROR_FAIL);
+            }
+            vnvdimm->u.mfn = mfn;
+
+            break;
+        }
+
+    skip_nvdimm:
+        free(buf2);
+    }
+}
+
+/*
+ * RAM space reserved by qemu-xen for guest ACPI.
+ */
+#define QEMU_XEN_ACPI_BUILD_TABLE_MAX_SIZE 0x200000
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
@@ -2131,6 +2234,11 @@ skip_usbdev:
             exit(-ERROR_FAIL);
         }
 
+        /* parse 'vnvdimms' */
+        parse_vnvdimms(config, d_config);
+        if (d_config->num_vnvdimms && !dm_acpi_size)
+            dm_acpi_size = QEMU_XEN_ACPI_BUILD_TABLE_MAX_SIZE;
+
         b_info->u.hvm.dm_acpi_size = dm_acpi_size;
     }
 
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25ded..1bdc173e04 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -381,12 +381,25 @@ static void reload_domain_config(uint32_t domid,
     if (rc) {
         LOG("failed to retrieve guest configuration (rc=%d). "
             "reusing old configuration", rc);
-        libxl_domain_config_dispose(&d_config_new);
+        goto error_out;
     } else {
+        rc = libxl_vnvdimm_copy_config(ctx, &d_config_new, d_config);
+        if (rc) {
+            LOG("failed to copy vnvdimm configuration (rc=%d). "
+                "reusing old configuration", rc);
+            libxl_domain_config_dispose(&d_config_new);
+            goto error_out;
+        }
+
         libxl_domain_config_dispose(d_config);
         /* Steal allocations */
         memcpy(d_config, &d_config_new, sizeof(libxl_domain_config));
     }
+
+    return;
+
+ error_out:
+    libxl_domain_config_dispose(&d_config_new);
 }
 
 /* Can update r_domid if domain is destroyed */
-- 
2.15.1



* [RFC XEN PATCH v4 39/41] tools/libxl: allow aborting domain creation on fatal QMP init errors
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (37 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 38/41] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 40/41] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

If errors during QMP initialization can affect the proper operation of
a domain, it is better to treat them as fatal and abort the creation of
that domain. The existing types of QMP initialization errors remain
non-fatal and, as before, do not abort domain creation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 tools/libxl/libxl_qmp.c    | 9 ++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f15fb215c2..075850b58f 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1542,7 +1542,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
     if (dcs->sdss.dm.guest_domid) {
         if (d_config->b_info.device_model_version
             == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-            libxl__qmp_initializations(gc, domid, d_config);
+            ret = libxl__qmp_initializations(gc, domid, d_config);
+            if (ret == ERROR_BADFAIL)
+                goto error_out;
         }
     }
 
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index eab993aca9..e1eb47c1d2 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1175,11 +1175,12 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
 {
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     libxl__qmp_handler *qmp = NULL;
-    int ret = 0;
+    bool ignore_error = true;
+    int ret = -1;
 
     qmp = libxl__qmp_initialize(gc, domid);
     if (!qmp)
-        return -1;
+        goto out;
     ret = libxl__qmp_query_serial(qmp);
     if (!ret && vnc && vnc->passwd) {
         ret = qmp_change(gc, qmp, "vnc", "password", vnc->passwd);
@@ -1189,7 +1190,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
         ret = qmp_query_vnc(qmp);
     }
     libxl__qmp_close(qmp);
-    return ret;
+
+ out:
+    return ret ? (ignore_error ? ERROR_FAIL : ERROR_BADFAIL) : 0;
 }
 
 /*
-- 
2.15.1



* [RFC XEN PATCH v4 40/41] tools/libxl: initiate PMEM mapping via QMP callback
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (38 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 39/41] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:10 ` [RFC XEN PATCH v4 41/41] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

The base guest physical address of each vNVDIMM device is decided by
QEMU. Add a QMP callback that queries the base address from QEMU and
then asks the Xen hypervisor to map host PMEM pages to that address.
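
A simplified sketch of the flow added below (error handling and the
device-id check are omitted; 'addr' and 'size' stand for the fields
of QEMU's "query-memory-devices" reply for each vNVDIMM device):

    xen_pfn_t gpfn     = addr >> XC_PAGE_SHIFT;  /* guest page frame       */
    xen_pfn_t nr_pages = size >> XC_PAGE_SHIFT;  /* device length in pages */
    /* the host MFN comes from the xl config (backend type 'mfn') */
    rc = libxl_vnvdimm_add_pages(gc, qmp->domid, vnvdimm->u.mfn, gpfn, nr_pages);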

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_internal.h |   6 ++
 tools/libxl/libxl_nvdimm.c   |  29 ++++++++++
 tools/libxl/libxl_qmp.c      | 129 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bfa95d8619..8c66b0e93b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -4349,6 +4349,12 @@ static inline bool libxl__string_is_default(char **s)
 {
     return *s == NULL;
 }
+
+#if defined(__linux__)
+int libxl_vnvdimm_add_pages(libxl__gc *gc, uint32_t domid,
+                            xen_pfn_t mfn, xen_pfn_t gpfn, xen_pfn_t nr_pages);
+#endif /* __linux__ */
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index 1863d76bbc..a2c70fc7e6 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -196,3 +196,32 @@ int libxl_vnvdimm_copy_config(libxl_ctx *ctx,
     GC_FREE;
     return rc;
 }
+
+#if defined(__linux__)
+
+int libxl_vnvdimm_add_pages(libxl__gc *gc, uint32_t domid,
+                            xen_pfn_t mfn, xen_pfn_t gpfn, xen_pfn_t nr_pages)
+{
+    unsigned int nr;
+    int ret = 0;
+
+    while (nr_pages) {
+        nr = min(nr_pages, (unsigned long)UINT_MAX);
+
+        ret = xc_domain_populate_pmem_map(CTX->xch, domid, mfn, gpfn, nr);
+        if (ret && ret != -ERESTART) {
+            LOG(ERROR, "failed to map PMEM pages, mfn 0x%" PRI_xen_pfn ", "
+                "gpfn 0x%" PRI_xen_pfn ", nr_pages %u, err %d",
+                mfn, gpfn, nr, ret);
+            break;
+        }
+
+        nr_pages -= nr;
+        mfn += nr;
+        gpfn += nr;
+    }
+
+    return ret;
+}
+
+#endif /* __linux__ */
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index e1eb47c1d2..d471f36870 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1170,6 +1170,127 @@ int libxl_qemu_monitor_command(libxl_ctx *ctx, uint32_t domid,
     return rc;
 }
 
+#if defined(__linux__)
+
+static int qmp_register_vnvdimm_callback(libxl__qmp_handler *qmp,
+                                         const libxl__json_object *o,
+                                         void *arg)
+{
+    GC_INIT(qmp->ctx);
+    const libxl_domain_config *guest_config = arg;
+    const libxl_device_vnvdimm *vnvdimm;
+    const libxl__json_object *obj, *sub_map, *sub_obj;
+    const char *id, *expected_id;
+    unsigned int i, slot;
+    unsigned long gpa, size, mfn, gpfn, nr_pages;
+    int rc = 0;
+
+    for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
+        if (!libxl__json_object_is_map(obj))
+            continue;
+
+        sub_map = libxl__json_map_get("data", obj, JSON_MAP);
+        if (!sub_map)
+            continue;
+
+        sub_obj = libxl__json_map_get("slot", sub_map, JSON_INTEGER);
+        slot = libxl__json_object_get_integer(sub_obj);
+        if (slot >= guest_config->num_vnvdimms) {
+            LOG(ERROR,
+                "Invalid QEMU memory device slot %u, expecting less than %u",
+                slot, guest_config->num_vnvdimms);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+        vnvdimm = &guest_config->vnvdimms[slot];
+
+        /*
+         * Double check whether it's an NVDIMM memory device, though
+         * all memory devices in QEMU on Xen are for vNVDIMM.
+         */
+        expected_id = libxl__sprintf(gc, "xen_nvdimm%u", slot + 1);
+        if (!expected_id) {
+            LOG(ERROR, "Cannot build device id");
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+        sub_obj = libxl__json_map_get("id", sub_map, JSON_STRING);
+        id = libxl__json_object_get_string(sub_obj);
+        if (!id || strncmp(id, expected_id, strlen(expected_id))) {
+            LOG(ERROR,
+                "Invalid QEMU memory device id %s, expecting %s",
+                id, expected_id);
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+
+        sub_obj = libxl__json_map_get("addr", sub_map, JSON_INTEGER);
+        gpa = libxl__json_object_get_integer(sub_obj);
+        sub_obj = libxl__json_map_get("size", sub_map, JSON_INTEGER);
+        size = libxl__json_object_get_integer(sub_obj);
+        if ((gpa | size) & ~XC_PAGE_MASK) {
+            LOG(ERROR,
+                "Invalid address 0x%lx or size 0x%lx of QEMU memory device %s, "
+                "not aligned to 0x%lx",
+                gpa, size, id, XC_PAGE_SIZE);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+        gpfn = gpa >> XC_PAGE_SHIFT;
+
+        nr_pages = size >> XC_PAGE_SHIFT;
+        if (nr_pages > vnvdimm->nr_pages) {
+            LOG(ERROR,
+                "Invalid size 0x%lx of QEMU memory device %s, "
+                "expecting no larger than 0x%lx",
+                size, id, vnvdimm->nr_pages << XC_PAGE_SHIFT);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+
+        switch (vnvdimm->backend_type) {
+        case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+            mfn = vnvdimm->u.mfn;
+            break;
+
+        default:
+            LOG(ERROR, "Invalid NVDIMM backend type %u", vnvdimm->backend_type);
+            rc = -ERROR_INVAL;
+            goto out;
+        }
+
+        rc = libxl_vnvdimm_add_pages(gc, qmp->domid, mfn, gpfn, nr_pages);
+        if (rc) {
+            LOG(ERROR,
+                "Cannot map PMEM pages for QEMU memory device %s, "
+                "mfn 0x%lx, gpfn 0x%lx, nr 0x%lx, rc %d",
+                id, mfn, gpfn, nr_pages, rc);
+            rc = -ERROR_FAIL;
+            goto out;
+        }
+    }
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
+static int libxl__qmp_query_vnvdimms(libxl__qmp_handler *qmp,
+                                     const libxl_domain_config *guest_config)
+{
+    int rc;
+    GC_INIT(qmp->ctx);
+
+    rc = qmp_synchronous_send(qmp, "query-memory-devices", NULL,
+                              qmp_register_vnvdimm_callback,
+                              (void *)guest_config, qmp->timeout);
+
+    GC_FREE;
+    return rc;
+}
+
+#endif /* __linux__ */
+
 int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                const libxl_domain_config *guest_config)
 {
@@ -1189,6 +1310,14 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
     if (!ret) {
         ret = qmp_query_vnc(qmp);
     }
+
+#if defined(__linux__)
+    if (!ret && guest_config->num_vnvdimms) {
+        ignore_error = false;
+        ret = libxl__qmp_query_vnvdimms(qmp, guest_config);
+    }
+#endif /* __linux__ */
+
     libxl__qmp_close(qmp);
 
  out:
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC XEN PATCH v4 41/41] tools/libxl: build qemu options from xl vNVDIMM configs
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (39 preceding siblings ...)
  2017-12-07 10:10 ` [RFC XEN PATCH v4 40/41] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
@ 2017-12-07 10:10 ` Haozhong Zhang
  2017-12-07 10:18   ` Haozhong Zhang
  2018-02-09 12:33 ` [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Roger Pau Monné
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Ian Jackson, Chao Peng, Dan Williams

For xl configs
  vnvdimms = [ 'type=mfn,backend=$PMEM0_MFN,nr_pages=$N0', ... ]

the following QEMU options will be built:

  -machine <existing options>,nvdimm
  -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
  -object memory-backend-xen,id=mem1,host-addr=$PMEM0_ADDR,size=$PMEM0_SIZE
  -device nvdimm,id=xen_nvdimm1,memdev=mem1
  ...

where:
 - NR_SLOTS is the number of entries in vnvdimms plus 1,
 - MEM_SIZE is the total size of all RAM and NVDIMM devices,
 - PMEM0_ADDR = PMEM0_MFN * 4096,
 - PMEM0_SIZE = N0 * 4096.
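
As a worked example with made-up numbers, a guest with 2 GiB of RAM
and

  vnvdimms = [ 'type=mfn,backend=0x100000,nr_pages=0x100000' ]

(host MFN 0x100000, i.e. host address 4 GiB, and 0x100000 pages, i.e.
4 GiB of PMEM) would get

  -machine <existing options>,nvdimm
  -m 2048,slots=2,maxmem=6442450944
  -object memory-backend-xen,id=mem1,host-addr=4294967296,size=4294967296
  -device nvdimm,id=xen_nvdimm1,memdev=mem1

where maxmem is the 2 GiB of RAM plus the 4 GiB of PMEM, in bytes.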

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_dm.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index a2ea95a9be..aa20078642 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -935,6 +935,58 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
     return drive;
 }
 
+#if defined(__linux__)
+
+static uint64_t libxl__build_dm_vnvdimm_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *dev, int dev_no)
+{
+    uint64_t addr = 0, size = 0;
+    char *arg;
+
+    switch (dev->backend_type)
+    {
+    case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+        addr = dev->u.mfn << XC_PAGE_SHIFT;
+        size = dev->nr_pages << XC_PAGE_SHIFT;
+        break;
+    }
+
+    if (!size)
+        return 0;
+
+    flexarray_append(dm_args, "-object");
+    arg = GCSPRINTF("memory-backend-xen,id=mem%d,host-addr=%"PRIu64",size=%"PRIu64,
+                    dev_no + 1, addr, size);
+    flexarray_append(dm_args, arg);
+
+    flexarray_append(dm_args, "-device");
+    arg = GCSPRINTF("nvdimm,id=xen_nvdimm%d,memdev=mem%d",
+                    dev_no + 1, dev_no + 1);
+    flexarray_append(dm_args, arg);
+
+    return size;
+}
+
+static uint64_t libxl__build_dm_vnvdimms_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
+{
+    uint64_t total_size = 0, size;
+    unsigned int i;
+
+    for (i = 0; i < num_vnvdimms; i++) {
+        size = libxl__build_dm_vnvdimm_args(gc, dm_args, &vnvdimms[i], i);
+        if (!size)
+            break;
+        total_size += size;
+    }
+
+    return total_size;
+}
+
+#endif /* __linux__ */
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -948,13 +1000,18 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     const libxl_device_nic *nics = guest_config->nics;
     const int num_disks = guest_config->num_disks;
     const int num_nics = guest_config->num_nics;
+#if defined(__linux__)
+    const int num_vnvdimms = guest_config->num_vnvdimms;
+#else
+    const int num_vnvdimms = 0;
+#endif
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     const libxl_sdl_info *sdl = dm_sdl(guest_config);
     const char *keymap = dm_keymap(guest_config);
     char *machinearg;
     flexarray_t *dm_args, *dm_envs;
     int i, connection, devid, ret;
-    uint64_t ram_size;
+    uint64_t ram_size, ram_size_in_byte = 0, vnvdimms_size = 0;
     const char *path, *chardev;
     char *user = NULL;
     struct passwd *user_base, user_pwbuf;
@@ -1481,6 +1538,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             }
         }
 
+        if (num_vnvdimms)
+            machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg);
+
         flexarray_append(dm_args, machinearg);
         for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
             flexarray_append(dm_args, b_info->extra_hvm[i]);
@@ -1490,8 +1550,25 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     }
 
     ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
+    if (num_vnvdimms) {
+        ram_size_in_byte = ram_size << 20;
+        vnvdimms_size = libxl__build_dm_vnvdimms_args(gc, dm_args,
+                                                      guest_config->vnvdimms,
+                                                      num_vnvdimms);
+        if (ram_size_in_byte + vnvdimms_size < ram_size_in_byte) {
+            LOG(ERROR,
+                "total size of RAM (%"PRIu64") and NVDIMM (%"PRIu64") overflow",
+                ram_size_in_byte, vnvdimms_size);
+            return ERROR_INVAL;
+        }
+    }
     flexarray_append(dm_args, "-m");
-    flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size));
+    flexarray_append(dm_args,
+                     vnvdimms_size ?
+                     GCSPRINTF("%"PRId64",slots=%d,maxmem=%"PRId64,
+                               ram_size, num_vnvdimms + 1,
+                               ROUNDUP(ram_size_in_byte, 12) + vnvdimms_size) :
+                     GCSPRINTF("%"PRId64, ram_size));
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (b_info->u.hvm.hdtype == LIBXL_HDTYPE_AHCI)
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2017-12-07 10:18   ` Haozhong Zhang
  2017-12-07 10:09 ` [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
                     ` (41 subsequent siblings)
  42 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Eduardo Habkost,
	Igor Mammedov, Michael S. Tsirkin, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson

These are the QEMU-side patches that work with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build the guest NFIT and NVDIMM namespace devices, and to
allocate guest address space for vNVDIMM devices.

All patches can also be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4

RFC v3 can be found at
  https://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg02406.html

Changes in v4:
  * The primary change in this version is to use the existing fw_cfg
    and BIOSLinkerLoader interface to pass ACPI to Xen guest, rather
    than introducing a Xen-specific mechanism. (Patch 5-10)

    The following Xen-specific pieces are still left in the ACPI code:
     (1) (Patch 6) xen_acpi_build() is called in acpi_build() to only
         build Xen guest required ACPI tables. The consequent code
         path in acpi_build() is bypassed.
     (2) (Patch 8) Add Xen-specific functions to access DSM memory,
         because the existing cpu_physical_memory_rw does not work on Xen.
     (3) (Patch 9) Implement a workaround for different AML integer
         widths between ACPI 1.0 (QEMU) and 2.0 (Xen).

Patch 1 is a trivial code cleanup.

Patches 2-3 add a memory backend dedicated to Xen usage and a hotplug
memory region for Xen guests, in order to make the existing nvdimm
device plugging path work on Xen.

Patch 4 avoids dereferencing the NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 5-10 enable building ACPI tables and passing them to Xen HVM
domains.

Haozhong Zhang (10):
  [01/10] xen-hvm: remove a trailing space
  [02/10] xen-hvm: create the hotplug memory region on Xen
  [03/10] hostmem-xen: add a host memory backend for Xen
  [04/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
  [05/10] xen-hvm: initialize fw_cfg interface
  [06/10] hw/acpi-build, xen-hvm: introduce a Xen-specific ACPI builder
  [07/10] xen-hvm: add functions to copy data from/to HVM memory
  [08/10] nvdimm acpi: add functions to access DSM memory on Xen
  [09/10] nvdimm acpi: add compatibility for 64-bit integer in ACPI 2.0 and later
  [10/10] xen-hvm: enable building NFIT and SSDT of vNVDIMM for HVM domains

 backends/Makefile.objs      |   1 +
 backends/hostmem-xen.c      | 108 ++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c          |   9 ++++
 hw/acpi/nvdimm.c            |  51 +++++++++++++++++----
 hw/i386/acpi-build.c        |   9 +++-
 hw/i386/pc.c                |  86 +++++++++++++++++++----------------
 hw/i386/xen/xen-hvm.c       | 105 ++++++++++++++++++++++++++++++++++++++++--
 hw/mem/nvdimm.c             |  10 +++-
 hw/mem/pc-dimm.c            |   6 ++-
 include/hw/acpi/aml-build.h |   4 ++
 include/hw/i386/pc.h        |   1 +
 include/hw/xen/xen.h        |   7 +++
 stubs/xen-hvm.c             |  15 ++++++
 13 files changed, 359 insertions(+), 53 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.15.1

^ permalink raw reply	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
@ 2017-12-07 10:18   ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov, Anthony Perard,
	Chao Peng, Dan Williams, Richard Henderson, Xiao Guangrong

These are the QEMU-side patches that work with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build the guest NFIT and NVDIMM namespace devices, and to
allocate guest address space for vNVDIMM devices.

All patches can also be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4

RFC v3 can be found at
  https://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg02406.html

Changes in v4:
  * The primary change in this version is to use the existing fw_cfg
    and BIOSLinkerLoader interface to pass ACPI to Xen guest, rather
    than introducing a Xen-specific mechanism. (Patch 5-10)

    The following Xen-specific pieces are still left in the ACPI code:
     (1) (Patch 6) xen_acpi_build() is called in acpi_build() to only
         build Xen guest required ACPI tables. The consequent code
         path in acpi_build() is bypassed.
     (2) (Patch 8) Add Xen-specific functions to access DSM memory,
         because the existing cpu_physical_memory_rw does not work on Xen.
     (3) (Patch 9) Implement a workaround for different AML integer
         widths between ACPI 1.0 (QEMU) and 2.0 (Xen).

Patch 1 is a trivial code cleanup.

Patches 2-3 add a memory backend dedicated to Xen usage and a hotplug
memory region for Xen guests, in order to make the existing nvdimm
device plugging path work on Xen.

Patch 4 avoids dereferencing the NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 5-10 enable building ACPI tables and passing them to Xen HVM
domains.

Haozhong Zhang (10):
  [01/10] xen-hvm: remove a trailing space
  [02/10] xen-hvm: create the hotplug memory region on Xen
  [03/10] hostmem-xen: add a host memory backend for Xen
  [04/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
  [05/10] xen-hvm: initialize fw_cfg interface
  [06/10] hw/acpi-build, xen-hvm: introduce a Xen-specific ACPI builder
  [07/10] xen-hvm: add functions to copy data from/to HVM memory
  [08/10] nvdimm acpi: add functions to access DSM memory on Xen
  [09/10] nvdimm acpi: add compatibility for 64-bit integer in ACPI 2.0 and later
  [10/10] xen-hvm: enable building NFIT and SSDT of vNVDIMM for HVM domains

 backends/Makefile.objs      |   1 +
 backends/hostmem-xen.c      | 108 ++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c          |   9 ++++
 hw/acpi/nvdimm.c            |  51 +++++++++++++++++----
 hw/i386/acpi-build.c        |   9 +++-
 hw/i386/pc.c                |  86 +++++++++++++++++++----------------
 hw/i386/xen/xen-hvm.c       | 105 ++++++++++++++++++++++++++++++++++++++++--
 hw/mem/nvdimm.c             |  10 +++-
 hw/mem/pc-dimm.c            |   6 ++-
 include/hw/acpi/aml-build.h |   4 ++
 include/hw/i386/pc.h        |   1 +
 include/hw/xen/xen.h        |   7 +++
 stubs/xen-hvm.c             |  15 ++++++
 13 files changed, 359 insertions(+), 53 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.15.1


^ permalink raw reply	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 01/10] xen-hvm: remove a trailing space
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Michael S. Tsirkin

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 8028bed6fd..02d92fd268 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -248,7 +248,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
         /* RAM already populated in Xen */
         fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
                 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-                __func__, size, ram_addr); 
+                __func__, size, ram_addr);
         return;
     }
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 01/10] xen-hvm: remove a trailing space
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 8028bed6fd..02d92fd268 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -248,7 +248,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
         /* RAM already populated in Xen */
         fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
                 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-                __func__, size, ram_addr); 
+                __func__, size, ram_addr);
         return;
     }
 
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as the Xen
device model. In order to use vNVDIMM for Xen HVM domains, this commit
reuses the pc machine type's code to create the hotplug memory region
for Xen HVM domains.
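
A rough sketch of the layout this produces (sizes are made up: 4 GiB
of RAM and 8 GiB of maxmem; the exact base depends on the lowmem split
and alignment, see pc_memory_hotplug_init() below):

    /* size: maxram_size - ram_size (4 GiB here), plus 1 GiB of
     * alignment slack per slot when enforce_aligned_dimm is set */
    hotplug_mem_size = machine->maxram_size - machine->ram_size;
    /* base: first GiB-aligned address above the high-RAM area;
     * vNVDIMM guest addresses are later allocated from this region */
    hotplug_base = ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);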

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 186545d2a4..9f46c8df79 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1315,6 +1315,53 @@ void xen_load_linux(PCMachineState *pcms)
     pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+    MachineState *machine = MACHINE(pcms);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+        return;
+
+    if (memory_region_size(&pcms->hotplug_memory.mr)) {
+        error_report("hotplug memory region has been initialized");
+        exit(EXIT_FAILURE);
+    }
+
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(machine->maxram_size,
+                      TARGET_PAGE_SIZE) != machine->maxram_size) {
+        error_report("maximum memory size must by aligned to multiple of "
+                     "%d bytes", TARGET_PAGE_SIZE);
+        exit(EXIT_FAILURE);
+    }
+
+    pcms->hotplug_memory.base =
+        ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+    if (pcmc->enforce_aligned_dimm) {
+        /* size hotplug region assuming 1G page max alignment per slot */
+        hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+    }
+
+    if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                     machine->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+                       "hotplug-memory", hotplug_mem_size);
+    memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                &pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -1366,44 +1413,7 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     /* initialize hotplug memory address space */
-    if (pcmc->has_reserved_memory &&
-        (machine->ram_size < machine->maxram_size)) {
-        ram_addr_t hotplug_mem_size =
-            machine->maxram_size - machine->ram_size;
-
-        if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-            error_report("unsupported amount of memory slots: %"PRIu64,
-                         machine->ram_slots);
-            exit(EXIT_FAILURE);
-        }
-
-        if (QEMU_ALIGN_UP(machine->maxram_size,
-                          TARGET_PAGE_SIZE) != machine->maxram_size) {
-            error_report("maximum memory size must by aligned to multiple of "
-                         "%d bytes", TARGET_PAGE_SIZE);
-            exit(EXIT_FAILURE);
-        }
-
-        pcms->hotplug_memory.base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-        if (pcmc->enforce_aligned_dimm) {
-            /* size hotplug region assuming 1G page max alignment per slot */
-            hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-        }
-
-        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-            hotplug_mem_size) {
-            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
-                         machine->maxram_size);
-            exit(EXIT_FAILURE);
-        }
-
-        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-                           "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-                                    &pcms->hotplug_memory.mr);
-    }
+    pc_memory_hotplug_init(pcms, system_memory);
 
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 02d92fd268..fe01b7a025 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -235,6 +235,8 @@ static void xen_ram_init(PCMachineState *pcms,
                                  pcms->above_4g_mem_size);
         memory_region_add_subregion(sysmem, 0x100000000ULL, &ram_hi);
     }
+
+    pc_memory_hotplug_init(pcms, sysmem);
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ef438bd765..86e375b616 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -249,6 +249,7 @@ void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *rom_memory,
                     MemoryRegion **ram_memory);
 uint64_t pc_pci_hole64_start(void);
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory);
 qemu_irq pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as the Xen
device model. In order to use vNVDIMM for Xen HVM domains, this commit
reuses the pc machine type's code to create the hotplug memory region
for Xen HVM domains.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 186545d2a4..9f46c8df79 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1315,6 +1315,53 @@ void xen_load_linux(PCMachineState *pcms)
     pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+    MachineState *machine = MACHINE(pcms);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+        return;
+
+    if (memory_region_size(&pcms->hotplug_memory.mr)) {
+        error_report("hotplug memory region has been initialized");
+        exit(EXIT_FAILURE);
+    }
+
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(machine->maxram_size,
+                      TARGET_PAGE_SIZE) != machine->maxram_size) {
+        error_report("maximum memory size must by aligned to multiple of "
+                     "%d bytes", TARGET_PAGE_SIZE);
+        exit(EXIT_FAILURE);
+    }
+
+    pcms->hotplug_memory.base =
+        ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+    if (pcmc->enforce_aligned_dimm) {
+        /* size hotplug region assuming 1G page max alignment per slot */
+        hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+    }
+
+    if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                     machine->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+                       "hotplug-memory", hotplug_mem_size);
+    memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                &pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -1366,44 +1413,7 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     /* initialize hotplug memory address space */
-    if (pcmc->has_reserved_memory &&
-        (machine->ram_size < machine->maxram_size)) {
-        ram_addr_t hotplug_mem_size =
-            machine->maxram_size - machine->ram_size;
-
-        if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-            error_report("unsupported amount of memory slots: %"PRIu64,
-                         machine->ram_slots);
-            exit(EXIT_FAILURE);
-        }
-
-        if (QEMU_ALIGN_UP(machine->maxram_size,
-                          TARGET_PAGE_SIZE) != machine->maxram_size) {
-            error_report("maximum memory size must by aligned to multiple of "
-                         "%d bytes", TARGET_PAGE_SIZE);
-            exit(EXIT_FAILURE);
-        }
-
-        pcms->hotplug_memory.base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-        if (pcmc->enforce_aligned_dimm) {
-            /* size hotplug region assuming 1G page max alignment per slot */
-            hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-        }
-
-        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-            hotplug_mem_size) {
-            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
-                         machine->maxram_size);
-            exit(EXIT_FAILURE);
-        }
-
-        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-                           "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-                                    &pcms->hotplug_memory.mr);
-    }
+    pc_memory_hotplug_init(pcms, system_memory);
 
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 02d92fd268..fe01b7a025 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -235,6 +235,8 @@ static void xen_ram_init(PCMachineState *pcms,
                                  pcms->above_4g_mem_size);
         memory_region_add_subregion(sysmem, 0x100000000ULL, &ram_hi);
     }
+
+    pc_memory_hotplug_init(pcms, sysmem);
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ef438bd765..86e375b616 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -249,6 +249,7 @@ void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *rom_memory,
                     MemoryRegion **ram_memory);
 uint64_t pc_pci_hole64_start(void);
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory);
 qemu_irq pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Eduardo Habkost,
	Igor Mammedov, Michael S. Tsirkin

vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as the Xen device model, the
backend resource allocation of vNVDIMM is managed outside of QEMU. A
new host memory backend 'memory-backend-xen' is introduced to
represent the backend resource allocated by Xen. It simply creates a
memory region of the specified size as a placeholder in the guest
address space, which will be mapped by Xen to the actual backend
resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x100000000:
   -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1
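
For context (all values made up), the associated Xen toolstack series
is expected to generate such options from an xl config along the lines
of

   vnvdimms = [ 'type=mfn,backend=0x100000,nr_pages=0x100000' ]

where the MFN is in 4 KiB units, so backend=0x100000 corresponds to
host-addr=0x100000000.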

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++++
 hw/mem/pc-dimm.c       |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 0000000000..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+    OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+    HostMemoryBackend parent_obj;
+
+    uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+    uint64_t value = backend->host_addr;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    xb->host_addr = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+                       backend->size);
+    backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = xen_backend_alloc;
+
+    object_class_property_add(oc, "host-addr", "int",
+                              xen_backend_get_host_addr,
+                              xen_backend_set_host_addr,
+                              NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_XEN,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = xen_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendXen),
+};
+
+static void register_types(void)
+{
+    type_register_static(&xen_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..ba13a52994 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
+#include "hw/xen/xen.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-types.h"
@@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
+        /*
+         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
+         * so no further work in this function is needed.
+         */
+        if (xen_enabled() && !backend->mr.ram_block) {
+            goto out;
+        }
+
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 66eace5a5c..dcbfce33d5 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "hw/virtio/vhost.h"
+#include "hw/xen/xen.h"
 
 typedef struct pc_dimms_capacity {
      uint64_t size;
@@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(vmstate_mr, dev);
+    /* memory-backend-xen is not backed by RAM. */
+    if (!xen_enabled()) {
+        vmstate_register_ram(vmstate_mr, dev);
+    }
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
 out:
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Igor Mammedov, Anthony Perard, Chao Peng,
	Dan Williams

vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as the Xen device model, the
backend resource allocation of vNVDIMM is managed outside of QEMU. A
new host memory backend 'memory-backend-xen' is introduced to
represent the backend resource allocated by Xen. It simply creates a
memory region of the specified size as a placeholder in the guest
address space, which will be mapped by Xen to the actual backend
resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x100000000:
   -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++
 backends/hostmem.c     |   9 +++++
 hw/mem/pc-dimm.c       |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 0000000000..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+    OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+    HostMemoryBackend parent_obj;
+
+    uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+    uint64_t value = backend->host_addr;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    xb->host_addr = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+                       backend->size);
+    backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = xen_backend_alloc;
+
+    object_class_property_add(oc, "host-addr", "int",
+                              xen_backend_get_host_addr,
+                              xen_backend_set_host_addr,
+                              NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_XEN,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = xen_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendXen),
+};
+
+static void register_types(void)
+{
+    type_register_static(&xen_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..ba13a52994 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
+#include "hw/xen/xen.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-types.h"
@@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
+        /*
+         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
+         * so no further work in this function is needed.
+         */
+        if (xen_enabled() && !backend->mr.ram_block) {
+            goto out;
+        }
+
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 66eace5a5c..dcbfce33d5 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "hw/virtio/vhost.h"
+#include "hw/xen/xen.h"
 
 typedef struct pc_dimms_capacity {
      uint64_t size;
@@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(vmstate_mr, dev);
+    /* memory-backend-xen is not backed by RAM. */
+    if (!xen_enabled()) {
+        vmstate_register_ram(vmstate_mr, dev);
+    }
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
 out:
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 04/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov

The memory region of vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size
is 0) and every access in QEMU to labels is preceded by a label size
check, let's not initialize nvdimm->label_data if the label size is 0.
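
An illustrative guard (an assumption about how existing nvdimm code
reaches the label area, not a quote of it), showing why a zero label
size keeps the uninitialized pointer from being dereferenced:

    if (nvdimm->label_size) {
        /* only reached when label_data was set up in nvdimm_realize() */
        memcpy(buf, nvdimm->label_data + offset, length);
    }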

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);
 
     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), so let's not initialize the
+     * pointer to label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
     if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 04/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov, Anthony Perard, Chao Peng,
	Dan Williams

The memory region of vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size
is 0) and every access in QEMU to labels is preceded by a label size
check, let's not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);
 
     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), so let's not initialize the
+     * pointer to label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
     if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 05/10] xen-hvm: initialize fw_cfg interface
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

Xen is going to reuse QEMU to build the ACPI tables of some devices
(e.g., NFIT and SSDT for NVDIMM) for HVM domains. The existing QEMU
ACPI build code requires a fw_cfg interface, which will also be used
to pass the QEMU-built ACPI to Xen. Therefore, we need to initialize
fw_cfg whenever any ACPI is going to be built by QEMU.
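
A sketch of what later patches are expected to expose through this
fw_cfg instance (the file names are the ones used by QEMU's generic
ACPI build code; the exact calls shown here are an assumption):

    fw_cfg_add_file(fw_cfg, "etc/acpi/tables", tables_blob, tables_len);
    fw_cfg_add_file(fw_cfg, "etc/acpi/rsdp", rsdp_blob, rsdp_len);
    fw_cfg_add_file(fw_cfg, "etc/table-loader", loader_blob, loader_len);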

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index fe01b7a025..4b29f4052b 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -14,6 +14,7 @@
 #include "hw/pci/pci.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/loader.h"
 #include "hw/xen/xen_common.h"
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
@@ -1234,6 +1235,14 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static void xen_fw_cfg_init(PCMachineState *pcms)
+{
+    FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
+
+    rom_set_fw(fw_cfg);
+    pcms->fw_cfg = fw_cfg;
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
@@ -1384,6 +1393,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 
     /* Disable ACPI build because Xen handles it */
     pcms->acpi_build_enabled = false;
+    if (pcms->acpi_build_enabled) {
+        xen_fw_cfg_init(pcms);
+    }
 
     return;
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 05/10] xen-hvm: initialize fw_cfg interface
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Xen is going to reuse QEMU to build the ACPI tables of some devices
(e.g., NFIT and SSDT for NVDIMM) for HVM domains. The existing QEMU
ACPI build code requires a fw_cfg interface, which will also be used
to pass the QEMU-built ACPI tables to Xen. Therefore, we need to
initialize fw_cfg whenever any ACPI table is going to be built by QEMU.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index fe01b7a025..4b29f4052b 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -14,6 +14,7 @@
 #include "hw/pci/pci.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/loader.h"
 #include "hw/xen/xen_common.h"
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
@@ -1234,6 +1235,14 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static void xen_fw_cfg_init(PCMachineState *pcms)
+{
+    FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
+
+    rom_set_fw(fw_cfg);
+    pcms->fw_cfg = fw_cfg;
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
     int i, rc;
@@ -1384,6 +1393,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 
     /* Disable ACPI build because Xen handles it */
     pcms->acpi_build_enabled = false;
+    if (pcms->acpi_build_enabled) {
+        xen_fw_cfg_init(pcms);
+    }
 
     return;
 
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 06/10] hw/acpi-build, xen-hvm: introduce a Xen-specific ACPI builder
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

QEMU on KVM/TCG and Xen requires different sets of guest ACPI tables.
When QEMU builds ACPI for Xen HVM domains, the new Xen-specific ACPI
build function xen_acpi_build() is called instead of the existing path
from acpi_build().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/acpi-build.c        |  9 ++++++++-
 hw/i386/xen/xen-hvm.c       | 21 +++++++++++++++++++++
 include/hw/acpi/aml-build.h |  4 ++++
 include/hw/xen/xen.h        |  4 ++++
 stubs/xen-hvm.c             |  5 +++++
 5 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 73519ab3ac..9007ecdaed 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -60,6 +60,7 @@
 #include "qom/qom-qobject.h"
 #include "hw/i386/amd_iommu.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/xen/xen.h"
 
 #include "hw/acpi/ipmi.h"
 
@@ -2556,7 +2557,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
                  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
 }
 
-static GArray *
+GArray *
 build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned rsdt_tbl_offset)
 {
     AcpiRsdpDescriptor *rsdp = acpi_data_push(rsdp_table, sizeof *rsdp);
@@ -2646,6 +2647,11 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
                              64 /* Ensure FACS is aligned */,
                              false /* high memory */);
 
+    if (xen_enabled()) {
+        xen_acpi_build(tables, table_offsets, machine);
+        goto done;
+    }
+
     /*
      * FACS is pointed to by FADT.
      * We place it first since it's the only table that has alignment
@@ -2788,6 +2794,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 
     acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
 
+ done:
     /* Cleanup memory that's no longer used. */
     g_array_free(table_offsets, true);
 }
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 4b29f4052b..3df20ff282 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -11,6 +11,7 @@
 #include "qemu/osdep.h"
 
 #include "cpu.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/pci/pci.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic-msidef.h"
@@ -1473,3 +1474,23 @@ void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
         memory_global_dirty_log_stop();
     }
 }
+
+void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
+                    MachineState *machine)
+{
+    PCMachineState *pcms = PC_MACHINE(machine);
+    GArray *tables_blob = tables->table_data;
+    unsigned int rsdt;
+
+    if (!pcms->acpi_build_enabled) {
+        return;
+    }
+
+    /*
+     * QEMU RSDP and RSDT are only used by hvmloader to enumerate
+     * QEMU-built tables. HVM domains still use Xen-built RSDP and RSDT.
+     */
+    rsdt = tables_blob->len;
+    build_rsdt(tables_blob, tables->linker, table_offsets, 0, 0);
+    build_rsdp(tables->rsdp, tables->linker, rsdt);
+}
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 88d0738d76..03369bb7ea 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -393,4 +393,8 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
 void build_slit(GArray *table_data, BIOSLinker *linker);
+
+GArray *build_rsdp(GArray *rsdp_table, BIOSLinker *linker,
+                   unsigned rsdt_tbl_offset);
+
 #endif
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..2785b8fd35 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -10,6 +10,7 @@
 
 #include "qemu-common.h"
 #include "exec/cpu-common.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/irq.h"
 
 /* xen-machine.c */
@@ -48,4 +49,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
 
 void xen_register_framebuffer(struct MemoryRegion *mr);
 
+void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
+                    MachineState *machine);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51b21..58017c1457 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -61,3 +61,8 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
+                    MachineState *machine)
+{
+}
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 06/10] hw/acpi-build, xen-hvm: introduce a Xen-specific ACPI builder
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

QEMU on KVM/TCG and Xen requires different sets of guest ACPI tables.
When QEMU builds ACPI for Xen HVM domains, the new Xen-specific ACPI
build function xen_acpi_build() is called instead of the existing path
from acpi_build().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/acpi-build.c        |  9 ++++++++-
 hw/i386/xen/xen-hvm.c       | 21 +++++++++++++++++++++
 include/hw/acpi/aml-build.h |  4 ++++
 include/hw/xen/xen.h        |  4 ++++
 stubs/xen-hvm.c             |  5 +++++
 5 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 73519ab3ac..9007ecdaed 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -60,6 +60,7 @@
 #include "qom/qom-qobject.h"
 #include "hw/i386/amd_iommu.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/xen/xen.h"
 
 #include "hw/acpi/ipmi.h"
 
@@ -2556,7 +2557,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
                  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
 }
 
-static GArray *
+GArray *
 build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned rsdt_tbl_offset)
 {
     AcpiRsdpDescriptor *rsdp = acpi_data_push(rsdp_table, sizeof *rsdp);
@@ -2646,6 +2647,11 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
                              64 /* Ensure FACS is aligned */,
                              false /* high memory */);
 
+    if (xen_enabled()) {
+        xen_acpi_build(tables, table_offsets, machine);
+        goto done;
+    }
+
     /*
      * FACS is pointed to by FADT.
      * We place it first since it's the only table that has alignment
@@ -2788,6 +2794,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 
     acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
 
+ done:
     /* Cleanup memory that's no longer used. */
     g_array_free(table_offsets, true);
 }
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 4b29f4052b..3df20ff282 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -11,6 +11,7 @@
 #include "qemu/osdep.h"
 
 #include "cpu.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/pci/pci.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic-msidef.h"
@@ -1473,3 +1474,23 @@ void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
         memory_global_dirty_log_stop();
     }
 }
+
+void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
+                    MachineState *machine)
+{
+    PCMachineState *pcms = PC_MACHINE(machine);
+    GArray *tables_blob = tables->table_data;
+    unsigned int rsdt;
+
+    if (!pcms->acpi_build_enabled) {
+        return;
+    }
+
+    /*
+     * QEMU RSDP and RSDT are only used by hvmloader to enumerate
+     * QEMU-built tables. HVM domains still use Xen-built RSDP and RSDT.
+     */
+    rsdt = tables_blob->len;
+    build_rsdt(tables_blob, tables->linker, table_offsets, 0, 0);
+    build_rsdp(tables->rsdp, tables->linker, rsdt);
+}
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 88d0738d76..03369bb7ea 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -393,4 +393,8 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
 void build_slit(GArray *table_data, BIOSLinker *linker);
+
+GArray *build_rsdp(GArray *rsdp_table, BIOSLinker *linker,
+                   unsigned rsdt_tbl_offset);
+
 #endif
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..2785b8fd35 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -10,6 +10,7 @@
 
 #include "qemu-common.h"
 #include "exec/cpu-common.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/irq.h"
 
 /* xen-machine.c */
@@ -48,4 +49,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
 
 void xen_register_framebuffer(struct MemoryRegion *mr);
 
+void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
+                    MachineState *machine);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51b21..58017c1457 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -61,3 +61,8 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
+                    MachineState *machine)
+{
+}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 07/10] xen-hvm: add functions to copy data from/to HVM memory
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Michael S. Tsirkin

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/xen/xen.h  |  3 +++
 stubs/xen-hvm.c       | 10 ++++++++++
 3 files changed, 68 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 3df20ff282..a7e99bd438 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1494,3 +1494,58 @@ void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
     build_rsdt(tables_blob, tables->linker, table_offsets, 0, 0);
     build_rsdp(tables->rsdp, tables->linker, rsdt);
 }
+
+static size_t xen_rw_guest(ram_addr_t gpa,
+                           void *buf, size_t length, bool is_write)
+{
+    size_t copied = 0, size;
+    ram_addr_t s, e, offset, cur = gpa;
+    xen_pfn_t cur_pfn;
+    void *page;
+    int prot = is_write ? PROT_WRITE : PROT_READ;
+
+    if (!buf || !length) {
+        return 0;
+    }
+
+    s = gpa & TARGET_PAGE_MASK;
+    e = gpa + length;
+    if (e < s) {
+        return 0;
+    }
+
+    while (cur < e) {
+        cur_pfn = cur >> TARGET_PAGE_BITS;
+        offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+        size = MIN(length, TARGET_PAGE_SIZE - offset);
+
+        page = xenforeignmemory_map(xen_fmem, xen_domid, prot, 1, &cur_pfn, NULL);
+        if (!page) {
+            break;
+        }
+
+        if (is_write) {
+            memcpy(page + offset, buf, size);
+        } else {
+            memcpy(buf, page + offset, size);
+        }
+        xenforeignmemory_unmap(xen_fmem, page, 1);
+
+        copied += size;
+        buf += size;
+        cur += size;
+        length -= size;
+    }
+
+    return copied;
+}
+
+size_t xen_copy_to_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return xen_rw_guest(gpa, buf, length, true);
+}
+
+size_t xen_copy_from_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return xen_rw_guest(gpa, buf, length, false);
+}
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 2785b8fd35..cc40d45aeb 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -52,4 +52,7 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
 void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
                     MachineState *machine);
 
+size_t xen_copy_to_guest(ram_addr_t gpa, void *buf, size_t length);
+size_t xen_copy_from_guest(ram_addr_t gpa, void *buf, size_t length);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 58017c1457..5de02842a3 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -66,3 +66,13 @@ void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
                     MachineState *machine)
 {
 }
+
+size_t xen_copy_to_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return 0;
+}
+
+size_t xen_copy_from_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return 0;
+}
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 07/10] xen-hvm: add functions to copy data from/to HVM memory
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/xen/xen.h  |  3 +++
 stubs/xen-hvm.c       | 10 ++++++++++
 3 files changed, 68 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 3df20ff282..a7e99bd438 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1494,3 +1494,58 @@ void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
     build_rsdt(tables_blob, tables->linker, table_offsets, 0, 0);
     build_rsdp(tables->rsdp, tables->linker, rsdt);
 }
+
+static size_t xen_rw_guest(ram_addr_t gpa,
+                           void *buf, size_t length, bool is_write)
+{
+    size_t copied = 0, size;
+    ram_addr_t s, e, offset, cur = gpa;
+    xen_pfn_t cur_pfn;
+    void *page;
+    int prot = is_write ? PROT_WRITE : PROT_READ;
+
+    if (!buf || !length) {
+        return 0;
+    }
+
+    s = gpa & TARGET_PAGE_MASK;
+    e = gpa + length;
+    if (e < s) {
+        return 0;
+    }
+
+    while (cur < e) {
+        cur_pfn = cur >> TARGET_PAGE_BITS;
+        offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+        size = MIN(length, TARGET_PAGE_SIZE - offset);
+
+        page = xenforeignmemory_map(xen_fmem, xen_domid, prot, 1, &cur_pfn, NULL);
+        if (!page) {
+            break;
+        }
+
+        if (is_write) {
+            memcpy(page + offset, buf, size);
+        } else {
+            memcpy(buf, page + offset, size);
+        }
+        xenforeignmemory_unmap(xen_fmem, page, 1);
+
+        copied += size;
+        buf += size;
+        cur += size;
+        length -= size;
+    }
+
+    return copied;
+}
+
+size_t xen_copy_to_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return xen_rw_guest(gpa, buf, length, true);
+}
+
+size_t xen_copy_from_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return xen_rw_guest(gpa, buf, length, false);
+}
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 2785b8fd35..cc40d45aeb 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -52,4 +52,7 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
 void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
                     MachineState *machine);
 
+size_t xen_copy_to_guest(ram_addr_t gpa, void *buf, size_t length);
+size_t xen_copy_from_guest(ram_addr_t gpa, void *buf, size_t length);
+
 #endif /* QEMU_HW_XEN_H */
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 58017c1457..5de02842a3 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -66,3 +66,13 @@ void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
                     MachineState *machine)
 {
 }
+
+size_t xen_copy_to_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return 0;
+}
+
+size_t xen_copy_from_guest(ram_addr_t gpa, void *buf, size_t length)
+{
+    return 0;
+}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 08/10] nvdimm acpi: add functions to access DSM memory on Xen
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov

Xen hvmloader can load QEMU-built NVDIMM ACPI tables via the
BIOSLinkerLoader interface, but it allocates memory in an area not
covered by any memory regions in QEMU, i.e., the hvmloader memory
cannot be accessed via the normal cpu_physical_memory_{read,write}().
If QEMU on Xen has to access the hvmloader memory in DSM emulation,
it has to take a different path, i.e., xen_copy_{from,to}_guest().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..7b3062e001 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -497,6 +498,35 @@ struct NvdimmFuncReadFITOut {
 typedef struct NvdimmFuncReadFITOut NvdimmFuncReadFITOut;
 QEMU_BUILD_BUG_ON(sizeof(NvdimmFuncReadFITOut) > NVDIMM_DSM_MEMORY_SIZE);
 
+/*
+ * Xen hvmloader can load QEMU-built NVDIMM ACPI tables via the
+ * BIOSLinkerLoader interface, but it allocates memory in an area not
+ * covered by any memory regions in QEMU, i.e., the hvmloader memory
+ * cannot be accessed via the normal cpu_physical_memory_{read,write}().
+ * If QEMU on Xen has to access the hvmloader memory in DSM emulation,
+ * it has to take a different path, i.e., xen_copy_{from,to}_guest().
+ */
+
+static void
+nvdimm_copy_from_dsm_mem(hwaddr dsm_mem_addr, void *dst, unsigned size)
+{
+    if (xen_enabled()) {
+        xen_copy_from_guest(dsm_mem_addr, dst, size);
+    } else {
+        cpu_physical_memory_read(dsm_mem_addr, dst, size);
+    }
+}
+
+static void
+nvdimm_copy_to_dsm_mem(hwaddr dsm_mem_addr, void *src, unsigned size)
+{
+    if (xen_enabled()) {
+        xen_copy_to_guest(dsm_mem_addr, src, size);
+    } else {
+        cpu_physical_memory_write(dsm_mem_addr, src, size);
+    }
+}
+
 static void
 nvdimm_dsm_function0(uint32_t supported_func, hwaddr dsm_mem_addr)
 {
@@ -504,7 +534,7 @@ nvdimm_dsm_function0(uint32_t supported_func, hwaddr dsm_mem_addr)
         .len = cpu_to_le32(sizeof(func0)),
         .supported_func = cpu_to_le32(supported_func),
     };
-    cpu_physical_memory_write(dsm_mem_addr, &func0, sizeof(func0));
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, &func0, sizeof(func0));
 }
 
 static void
@@ -514,7 +544,7 @@ nvdimm_dsm_no_payload(uint32_t func_ret_status, hwaddr dsm_mem_addr)
         .len = cpu_to_le32(sizeof(out)),
         .func_ret_status = cpu_to_le32(func_ret_status),
     };
-    cpu_physical_memory_write(dsm_mem_addr, &out, sizeof(out));
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, &out, sizeof(out));
 }
 
 #define NVDIMM_DSM_RET_STATUS_SUCCESS        0 /* Success */
@@ -569,7 +599,7 @@ exit:
     read_fit_out->func_ret_status = cpu_to_le32(func_ret_status);
     memcpy(read_fit_out->fit, fit->data + read_fit->offset, read_len);
 
-    cpu_physical_memory_write(dsm_mem_addr, read_fit_out, size);
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, read_fit_out, size);
 
     g_free(read_fit_out);
 }
@@ -655,8 +685,8 @@ static void nvdimm_dsm_label_size(NVDIMMDevice *nvdimm, hwaddr dsm_mem_addr)
     label_size_out.label_size = cpu_to_le32(label_size);
     label_size_out.max_xfer = cpu_to_le32(mxfer);
 
-    cpu_physical_memory_write(dsm_mem_addr, &label_size_out,
-                              sizeof(label_size_out));
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, &label_size_out,
+                           sizeof(label_size_out));
 }
 
 static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice *nvdimm,
@@ -721,7 +751,7 @@ static void nvdimm_dsm_get_label_data(NVDIMMDevice *nvdimm, NvdimmDsmIn *in,
     nvc->read_label_data(nvdimm, get_label_data_out->out_buf,
                          get_label_data->length, get_label_data->offset);
 
-    cpu_physical_memory_write(dsm_mem_addr, get_label_data_out, size);
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, get_label_data_out, size);
     g_free(get_label_data_out);
 }
 
@@ -831,7 +861,7 @@ nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
      * this by copying DSM memory to QEMU local memory.
      */
     in = g_new(NvdimmDsmIn, 1);
-    cpu_physical_memory_read(dsm_mem_addr, in, sizeof(*in));
+    nvdimm_copy_from_dsm_mem(dsm_mem_addr, in, sizeof(*in));
 
     le32_to_cpus(&in->revision);
     le32_to_cpus(&in->function);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 08/10] nvdimm acpi: add functions to access DSM memory on Xen
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov, Anthony Perard, Chao Peng,
	Dan Williams

Xen hvmloader can load QEMU-built NVDIMM ACPI tables via the
BIOSLinkerLoader interface, but it allocates memory in an area not
covered by any memory regions in QEMU, i.e., the hvmloader memory
cannot be accessed via the normal cpu_physical_memory_{read,write}().
If QEMU on Xen has to access the hvmloader memory in DSM emulation,
it has to take a different path, i.e., xen_copy_{from,to}_guest().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..7b3062e001 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -497,6 +498,35 @@ struct NvdimmFuncReadFITOut {
 typedef struct NvdimmFuncReadFITOut NvdimmFuncReadFITOut;
 QEMU_BUILD_BUG_ON(sizeof(NvdimmFuncReadFITOut) > NVDIMM_DSM_MEMORY_SIZE);
 
+/*
+ * Xen hvmloader can load QEMU-built NVDIMM ACPI tables via the
+ * BIOSLinkerLoader interface, but it allocates memory in an area not
+ * covered by any memory regions in QEMU, i.e., the hvmloader memory
+ * cannot be accessed via the normal cpu_physical_memory_{read,write}().
+ * If QEMU on Xen has to access the hvmloader memory in DSM emulation,
+ * it has to take a different path, i.e., xen_copy_{from,to}_guest().
+ */
+
+static void
+nvdimm_copy_from_dsm_mem(hwaddr dsm_mem_addr, void *dst, unsigned size)
+{
+    if (xen_enabled()) {
+        xen_copy_from_guest(dsm_mem_addr, dst, size);
+    } else {
+        cpu_physical_memory_read(dsm_mem_addr, dst, size);
+    }
+}
+
+static void
+nvdimm_copy_to_dsm_mem(hwaddr dsm_mem_addr, void *src, unsigned size)
+{
+    if (xen_enabled()) {
+        xen_copy_to_guest(dsm_mem_addr, src, size);
+    } else {
+        cpu_physical_memory_write(dsm_mem_addr, src, size);
+    }
+}
+
 static void
 nvdimm_dsm_function0(uint32_t supported_func, hwaddr dsm_mem_addr)
 {
@@ -504,7 +534,7 @@ nvdimm_dsm_function0(uint32_t supported_func, hwaddr dsm_mem_addr)
         .len = cpu_to_le32(sizeof(func0)),
         .supported_func = cpu_to_le32(supported_func),
     };
-    cpu_physical_memory_write(dsm_mem_addr, &func0, sizeof(func0));
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, &func0, sizeof(func0));
 }
 
 static void
@@ -514,7 +544,7 @@ nvdimm_dsm_no_payload(uint32_t func_ret_status, hwaddr dsm_mem_addr)
         .len = cpu_to_le32(sizeof(out)),
         .func_ret_status = cpu_to_le32(func_ret_status),
     };
-    cpu_physical_memory_write(dsm_mem_addr, &out, sizeof(out));
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, &out, sizeof(out));
 }
 
 #define NVDIMM_DSM_RET_STATUS_SUCCESS        0 /* Success */
@@ -569,7 +599,7 @@ exit:
     read_fit_out->func_ret_status = cpu_to_le32(func_ret_status);
     memcpy(read_fit_out->fit, fit->data + read_fit->offset, read_len);
 
-    cpu_physical_memory_write(dsm_mem_addr, read_fit_out, size);
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, read_fit_out, size);
 
     g_free(read_fit_out);
 }
@@ -655,8 +685,8 @@ static void nvdimm_dsm_label_size(NVDIMMDevice *nvdimm, hwaddr dsm_mem_addr)
     label_size_out.label_size = cpu_to_le32(label_size);
     label_size_out.max_xfer = cpu_to_le32(mxfer);
 
-    cpu_physical_memory_write(dsm_mem_addr, &label_size_out,
-                              sizeof(label_size_out));
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, &label_size_out,
+                           sizeof(label_size_out));
 }
 
 static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice *nvdimm,
@@ -721,7 +751,7 @@ static void nvdimm_dsm_get_label_data(NVDIMMDevice *nvdimm, NvdimmDsmIn *in,
     nvc->read_label_data(nvdimm, get_label_data_out->out_buf,
                          get_label_data->length, get_label_data->offset);
 
-    cpu_physical_memory_write(dsm_mem_addr, get_label_data_out, size);
+    nvdimm_copy_to_dsm_mem(dsm_mem_addr, get_label_data_out, size);
     g_free(get_label_data_out);
 }
 
@@ -831,7 +861,7 @@ nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
      * this by copying DSM memory to QEMU local memory.
      */
     in = g_new(NvdimmDsmIn, 1);
-    cpu_physical_memory_read(dsm_mem_addr, in, sizeof(*in));
+    nvdimm_copy_from_dsm_mem(dsm_mem_addr, in, sizeof(*in));
 
     le32_to_cpus(&in->revision);
     le32_to_cpus(&in->function);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 09/10] nvdimm acpi: add compatibility for 64-bit integer in ACPI 2.0 and later
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov

When QEMU is used as Xen device model, the QEMU-built NVDIMM ACPI
tables (NFIT and SSDT) may be passed to Xen and merged with Xen-built
ACPI tables. However, different ACPI versions are used between QEMU
(ACPI 1.0) and Xen (ACPI 2.0), and different integer widths are used
between ACPI 1.0 (32 bits) and ACPI 2.0 (64 bits).

Due to the implicit type conversion between ACPI buffer field object
and ACPI integer object (ref. ACPI Spec 6.2, Sect 19.3.5.5, 19.3.5.7 &
19.3.5.8), the following AML in NVDIMM SSDT may behave differently in
ACPI 1.0 and ACPI 2.0:

    Method (NCAL, 5, Serialized)
    {
        Local6 = MEMA /* \MEMA */
        OperationRegion (NPIO, SystemIO, 0x0A18, 0x04)
        OperationRegion (NRAM, SystemMemory, Local6, 0x1000)
        Field (NPIO, DWordAcc, NoLock, Preserve)
        {
            NTFI,   32
        }

        ...

        Field (NRAM, DWordAcc, NoLock, Preserve)
        {
            RLEN,   32,
            ODAT,   32736
        }

        ...

        NTFI = Local6
        Local1 = (RLEN - 0x04)
        Local1 = (Local1 << 0x03)
        CreateField (ODAT, Zero, Local1, OBUF)
        Concatenate (Buffer (Zero){}, OBUF, Local7)
        Return (Local7)
    }

The C layout of the above ODAT is struct NvdimmFuncReadFITOut without
the length field:

    struct {
        uint32_t func_ret_status;
	uint8_t fit[0];
    }

When no error happens and no FIT data is needed to return,
nvdimm_dsm_func_read_fit() fills

    { .func_ret_status = 0 },

i.e., 4 bytes of 0's in ODAT. Because the length of OBUF is no larger
than an integer, OBUF is implicitly converted into an ACPI integer
object during the evaluation of CreateField. Later, when OBUF is
concatenated to another buffer, it needs to be converted to an ACPI
buffer object.  It's converted to a 4-byte buffer in ACPI 1.0, but to
an 8-byte buffer in ACPI 2.0. The extra 4 bytes in ACPI 2.0 actually
correspond to the apparently incorrect case that

    { .func_ret_status = 0, fit = { 0, 0, 0, 0 } }

is filled in ODAT.

In order to mitigate this issue, we add a 32-bit reserved field after
func_ret_status and always fill it with 0. Therefore, the minimum
length of the data filled in ODAT is always 8 bytes in both ACPI 1.0
and ACPI 2.0, so no extra bytes will be added accidentally by the
implicit conversion.
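
For illustration only (not part of the patch), a standalone copy of
the resulting layout; 'read_fit_out' is a local stand-in for
NvdimmFuncReadFITOut:

    #include <stddef.h>
    #include <stdint.h>

    struct read_fit_out {
        uint32_t len;               /* size of the buffer filled by QEMU */
        uint32_t func_ret_status;   /* "STAU" in the SSDT */
        uint32_t reserved;          /* new field, always filled with 0 */
        uint8_t  fit[];             /* the FIT data */
    } __attribute__((packed));

    /* ODAT starts at func_ret_status, so its minimum payload is
     * status + reserved = 8 bytes, and the FIT data begins at byte 8,
     * matching the aml_int(8 * BITS_PER_BYTE) offset used by RFIT. */
    _Static_assert(offsetof(struct read_fit_out, fit) -
                   offsetof(struct read_fit_out, func_ret_status) == 8,
                   "FIT data follows an 8-byte status+reserved header");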

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 7b3062e001..bceb35e75a 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -493,6 +493,7 @@ struct NvdimmFuncReadFITOut {
     /* the size of buffer filled by QEMU. */
     uint32_t len;
     uint32_t func_ret_status; /* return status code. */
+    uint32_t reserved;
     uint8_t fit[0]; /* the FIT data. */
 } QEMU_PACKED;
 typedef struct NvdimmFuncReadFITOut NvdimmFuncReadFITOut;
@@ -597,6 +598,7 @@ exit:
 
     read_fit_out->len = cpu_to_le32(size);
     read_fit_out->func_ret_status = cpu_to_le32(func_ret_status);
+    read_fit_out->reserved = 0;
     memcpy(read_fit_out->fit, fit->data + read_fit->offset, read_len);
 
     nvdimm_copy_to_dsm_mem(dsm_mem_addr, read_fit_out, size);
@@ -1168,7 +1170,8 @@ static void nvdimm_build_fit(Aml *dev)
 
     aml_append(method, aml_store(aml_sizeof(buf), buf_size));
     aml_append(method, aml_subtract(buf_size,
-                                    aml_int(4) /* the size of "STAU" */,
+                                    aml_int(8) /* the size of "STAU" and the
+                                                  consequent reserved field */,
                                     buf_size));
 
     /* if we read the end of fit. */
@@ -1177,7 +1180,7 @@ static void nvdimm_build_fit(Aml *dev)
     aml_append(method, ifctx);
 
     aml_append(method, aml_create_field(buf,
-                            aml_int(4 * BITS_PER_BYTE), /* offset at byte 4.*/
+                            aml_int(8 * BITS_PER_BYTE), /* offset at byte 8. */
                             aml_shiftleft(buf_size, aml_int(3)), "BUFF"));
     aml_append(method, aml_return(aml_name("BUFF")));
     aml_append(dev, method);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 09/10] nvdimm acpi: add compatibility for 64-bit integer in ACPI 2.0 and later
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov, Anthony Perard, Chao Peng,
	Dan Williams

When QEMU is used as Xen device model, the QEMU-built NVDIMM ACPI
tables (NFIT and SSDT) may be passed to Xen and merged with Xen-built
ACPI tables. However, different ACPI versions are used between QEMU
(ACPI 1.0) and Xen (ACPI 2.0), and different integer widths are used
between ACPI 1.0 (32 bits) and ACPI 2.0 (64 bits).

Due to the implicit type conversion between ACPI buffer field object
and ACPI integer object (ref. ACPI Spec 6.2, Sect 19.3.5.5, 19.3.5.7 &
19.3.5.8), the following AML in NVDIMM SSDT may behave differently in
ACPI 1.0 and ACPI 2.0:

    Method (NCAL, 5, Serialized)
    {
        Local6 = MEMA /* \MEMA */
        OperationRegion (NPIO, SystemIO, 0x0A18, 0x04)
        OperationRegion (NRAM, SystemMemory, Local6, 0x1000)
        Field (NPIO, DWordAcc, NoLock, Preserve)
        {
            NTFI,   32
        }

        ...

        Field (NRAM, DWordAcc, NoLock, Preserve)
        {
            RLEN,   32,
            ODAT,   32736
        }

        ...

        NTFI = Local6
        Local1 = (RLEN - 0x04)
        Local1 = (Local1 << 0x03)
        CreateField (ODAT, Zero, Local1, OBUF)
        Concatenate (Buffer (Zero){}, OBUF, Local7)
        Return (Local7)
    }

The C layout of the above ODAT is struct NvdimmFuncReadFITOut without
the length field:

    struct {
        uint32_t func_ret_status;
	uint8_t fit[0];
    }

When no error happens and no FIT data is needed to return,
nvdimm_dsm_func_read_fit() fills

    { .func_ret_status = 0 },

i.e., 4 bytes of 0's in ODAT. Because the length of OBUF is no larger
than an integer, OBUF is implicitly converted into an ACPI integer
object during the evaluation of CreateField. Later, when OBUF is
concatenated to another buffer, it needs to be converted to an ACPI
buffer object.  It's converted to a 4-byte buffer in ACPI 1.0, but to
an 8-byte buffer in ACPI 2.0. The extra 4 bytes in ACPI 2.0 actually
correspond to the apparently incorrect case that

    { .func_ret_status = 0, fit = { 0, 0, 0, 0 } }

is filled in ODAT.

In order to mitigate this issue, we add a 32-bit reserved field after
func_ret_status and always fill it with 0. Therefore, the minimum
length of the data filled in ODAT is always 8 bytes in both ACPI 1.0
and ACPI 2.0, so no extra bytes will be added accidentally by the
implicit conversion.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/nvdimm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 7b3062e001..bceb35e75a 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -493,6 +493,7 @@ struct NvdimmFuncReadFITOut {
     /* the size of buffer filled by QEMU. */
     uint32_t len;
     uint32_t func_ret_status; /* return status code. */
+    uint32_t reserved;
     uint8_t fit[0]; /* the FIT data. */
 } QEMU_PACKED;
 typedef struct NvdimmFuncReadFITOut NvdimmFuncReadFITOut;
@@ -597,6 +598,7 @@ exit:
 
     read_fit_out->len = cpu_to_le32(size);
     read_fit_out->func_ret_status = cpu_to_le32(func_ret_status);
+    read_fit_out->reserved = 0;
     memcpy(read_fit_out->fit, fit->data + read_fit->offset, read_len);
 
     nvdimm_copy_to_dsm_mem(dsm_mem_addr, read_fit_out, size);
@@ -1168,7 +1170,8 @@ static void nvdimm_build_fit(Aml *dev)
 
     aml_append(method, aml_store(aml_sizeof(buf), buf_size));
     aml_append(method, aml_subtract(buf_size,
-                                    aml_int(4) /* the size of "STAU" */,
+                                    aml_int(8) /* the size of "STAU" and the
+                                                  consequent reserved field */,
                                     buf_size));
 
     /* if we read the end of fit. */
@@ -1177,7 +1180,7 @@ static void nvdimm_build_fit(Aml *dev)
     aml_append(method, ifctx);
 
     aml_append(method, aml_create_field(buf,
-                            aml_int(4 * BITS_PER_BYTE), /* offset at byte 4.*/
+                            aml_int(8 * BITS_PER_BYTE), /* offset at byte 8. */
                             aml_shiftleft(buf_size, aml_int(3)), "BUFF"));
     aml_append(method, aml_return(aml_name("BUFF")));
     aml_append(dev, method);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [Qemu-devel] [RFC QEMU PATCH v4 10/10] xen-hvm: enable building NFIT and SSDT of vNVDIMM for HVM domains
  2017-12-07 10:18   ` Haozhong Zhang
@ 2017-12-07 10:18     ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Stefano Stabellini, Anthony Perard, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Haozhong Zhang, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

When QEMU is used as the device model of a Xen HVM domain and vNVDIMM
devices are present, enable building the ACPI tables related to vNVDIMM.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index a7e99bd438..33447fc482 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1236,6 +1236,11 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static bool xen_dm_acpi_build_enabled(PCMachineState *pcms)
+{
+    return pcms->acpi_nvdimm_state.is_enabled;
+}
+
 static void xen_fw_cfg_init(PCMachineState *pcms)
 {
     FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
@@ -1392,8 +1397,7 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     xen_be_register_common();
     xen_read_physmap(state);
 
-    /* Disable ACPI build because Xen handles it */
-    pcms->acpi_build_enabled = false;
+    pcms->acpi_build_enabled = xen_dm_acpi_build_enabled(pcms);
     if (pcms->acpi_build_enabled) {
         xen_fw_cfg_init(pcms);
     }
@@ -1486,6 +1490,11 @@ void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
         return;
     }
 
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
+                          &pcms->acpi_nvdimm_state, machine->ram_slots);
+    }
+
     /*
      * QEMU RSDP and RSDT are only used by hvmloader to enumerate
      * QEMU-built tables. HVM domains still use Xen-built RSDP and RSDT.
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [RFC QEMU PATCH v4 10/10] xen-hvm: enable building NFIT and SSDT of vNVDIMM for HVM domains
@ 2017-12-07 10:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2017-12-07 10:18 UTC (permalink / raw)
  To: qemu-devel, xen-devel
  Cc: Haozhong Zhang, Stefano Stabellini, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Anthony Perard, Chao Peng,
	Dan Williams, Richard Henderson

When QEMU is used as the device model of a Xen HVM domain and vNVDIMM
devices are present, enable building the ACPI tables related to vNVDIMM.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index a7e99bd438..33447fc482 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1236,6 +1236,11 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static bool xen_dm_acpi_build_enabled(PCMachineState *pcms)
+{
+    return pcms->acpi_nvdimm_state.is_enabled;
+}
+
 static void xen_fw_cfg_init(PCMachineState *pcms)
 {
     FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
@@ -1392,8 +1397,7 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     xen_be_register_common();
     xen_read_physmap(state);
 
-    /* Disable ACPI build because Xen handles it */
-    pcms->acpi_build_enabled = false;
+    pcms->acpi_build_enabled = xen_dm_acpi_build_enabled(pcms);
     if (pcms->acpi_build_enabled) {
         xen_fw_cfg_init(pcms);
     }
@@ -1486,6 +1490,11 @@ void xen_acpi_build(AcpiBuildTables *tables, GArray *table_offsets,
         return;
     }
 
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
+                          &pcms->acpi_nvdimm_state, machine->ram_slots);
+    }
+
     /*
      * QEMU RSDP and RSDT are only used by hvmloader to enumerate
      * QEMU-built tables. HVM domains still use Xen-built RSDP and RSDT.
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-12-07 10:09 ` [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
@ 2018-01-04  6:12   ` Chao Peng
  2018-05-07 15:59   ` Jan Beulich
  1 sibling, 0 replies; 113+ messages in thread
From: Chao Peng @ 2018-01-04  6:12 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel; +Cc: Andrew Cooper, Dan Williams, Jan Beulich

On Thu, 2017-12-07 at 18:09 +0800, Haozhong Zhang wrote:
> The current check refuses the hot-plugged memory that falls in one
> unused PDX group, which should be allowed.
> 
Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com>

> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 9b37da6698..839038b6c3 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1295,12 +1295,8 @@ static int mem_hotadd_check(unsigned long spfn,
> unsigned long epfn)
>          return 0;
>  
>      /* Make sure the new range is not present now */
> -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  &
> ~(PDX_GROUP_COUNT - 1))
> -            / PDX_GROUP_COUNT;
> +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) /
> PDX_GROUP_COUNT;
>      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) /
> PDX_GROUP_COUNT;
> -    if (sidx >= eidx)
> -        return 0;
> -
>      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
>      if ( s > eidx )
>          return 0;

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table
  2017-12-07 10:09 ` [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
@ 2018-01-04  6:20   ` Chao Peng
  0 siblings, 0 replies; 113+ messages in thread
From: Chao Peng @ 2018-01-04  6:20 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel; +Cc: Andrew Cooper, Dan Williams, Jan Beulich

On Thu, 2017-12-07 at 18:09 +0800, Haozhong Zhang wrote:
> cleanup_frame_table() initializes the entire newly added frame table
> to all -1's. If it's called after extend_frame_table() failed to map
> the entire frame table, the initialization will hit a page fault.
> 
> Move the cleanup of partially mapped frametable to
> extend_frame_table(),
> which has enough knowledge of the mapping status.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> @Chao: I don't modify this patch per your comment, because I feel it's
> better to handle the errors locally in each function (rather than
> handle
> all of them in the top-level), which will make each function easier to
> use.

I don't insist on this; to me it's a matter of preference.

Chao

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 03/41] hvmloader/util: do not compare characters after '\0' in strncmp
  2017-12-07 10:09 ` [RFC XEN PATCH v4 03/41] hvmloader/util: do not compare characters after '\0' in strncmp Haozhong Zhang
@ 2018-01-04  6:23   ` Chao Peng
  0 siblings, 0 replies; 113+ messages in thread
From: Chao Peng @ 2018-01-04  6:23 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich, Dan Williams

On Thu, 2017-12-07 at 18:09 +0800, Haozhong Zhang wrote:
> ... to make its behavior the same as C standard (e.g., C99 and C11).
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/firmware/hvmloader/util.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c
> b/tools/firmware/hvmloader/util.c
> index 0c3f2d24cd..76a61ee052 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -141,9 +141,16 @@ int strcmp(const char *cs, const char *ct)
>  int strncmp(const char *s1, const char *s2, uint32_t n)
>  {
>      uint32_t ctr;
> +
>      for (ctr = 0; ctr < n; ctr++)
> +    {
>          if (s1[ctr] != s2[ctr])
>              return (int)(s1[ctr] - s2[ctr]);
> +
> +        if (!s1[ctr])

Coding style issue, but the original code above has the same problem.
Besides this, Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com>

> +            break;
> +    }
> +
>      return 0;
>  }
>  
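
For reference, a sketch of how the loop might look with the style nit
above addressed (assuming it refers to the Xen convention of spaces
inside the if/for parentheses; illustrative only, not part of the
posted patch):

    int strncmp(const char *s1, const char *s2, uint32_t n)
    {
        uint32_t ctr;

        for ( ctr = 0; ctr < n; ctr++ )
        {
            if ( s1[ctr] != s2[ctr] )
                return (int)(s1[ctr] - s2[ctr]);

            /* Stop at the terminating '\0', as the C standard requires. */
            if ( !s1[ctr] )
                break;
        }

        return 0;
    }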

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (41 preceding siblings ...)
  2017-12-07 10:18   ` Haozhong Zhang
@ 2018-02-09 12:33 ` Roger Pau Monné
  2018-02-12  1:25   ` Haozhong Zhang
  42 siblings, 1 reply; 113+ messages in thread
From: Roger Pau Monné @ 2018-02-09 12:33 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Jan Beulich, Shane Wang, Chao Peng, Dan Williams, Gang Wei

Thanks for the series, I'm however wondering whether it's appropriate
to post a v4 as RFC. Ie: at v4 the reviewer expects the submitter to
have a clear picture of what needs to be implemented.

On Thu, Dec 07, 2017 at 06:09:49PM +0800, Haozhong Zhang wrote:
> All patches can also be found at
>   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
>   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4
> 
> RFC v3 can be found at
>   https://lists.xen.org/archives/html/xen-devel/2017-09/msg00964.html
> 
> Changes in v4:
>   * Move the functionality of management util 'xen-ndctl' to Xne
>     management tool 'xl'.
>   * Load QEMU ACPI via QEMU fw_cfg and BIOSLinkerLoader interface.
>   * Other changes are documented in patches separately.
> 
> 
> - Part 0. Bug fix and code cleanup
>   [01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
>   [02/41] x86_64/mm: avoid cleaning the unmapped frame table
>   [03/41] hvmloader/util: do not compare characters after '\0' in strncmp
> 
> - Part 1. Detect host PMEM
>   Detect host PMEM via NFIT. No frametable and M2P table for them are
>   created in this part.
> 
>   [04/41] xen/common: add Kconfig item for pmem support
>   [05/41] x86/mm: exclude PMEM regions from initial frametable
>   [06/41] acpi: probe valid PMEM regions via NFIT
>   [07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
>   [08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0

I'm afraid I might ask stupid questions, since I haven't followed the
design discussion of this series very closely.

So you basically hide the NVDIMM from Dom0, and only allow guests to
use it?

What happens when you boot the same system without Xen? Will the
NVDIMM get corrupted because for example Linux will write something to
it?

>   [09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
>   [10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr
>   [11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
>   [12/41] tools/xl: add xl command 'pmem-list'
> 
> - Part 2. Setup host PMEM for management and guest data usage
>   Allow users or admins in Dom0 to setup host PMEM pages for
>   management and guest data usages.
>    * Management PMEM pages are used to store the frametable and M2P of
>      PMEM pages (including themselves), and never mapped to guest.
>    * Guest data PMEM pages can be mapped to guest and used as the
>      backend storage of virtual NVDIMM devices.

So this is basically tied to a PV Dom0, but I would like to also think
about what would happen with a PVH Dom0. In that case AFAICT Xen could
map the full NVDIMM to the Dom0 p2m as MMIO using 1GB pages, at which
point Dom0 could manage the NVDIMM as desired? Ie: Dom0 could map
parts of the NVDIMM to DomU as it maps other MMIO regions.

I'm not sure Xen needs to know anything else apart from how to map the
full NVDIMM to Dom0 as MMIO, which would greatly simplify this series.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-09 12:33 ` [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Roger Pau Monné
@ 2018-02-12  1:25   ` Haozhong Zhang
  2018-02-12 10:05     ` Roger Pau Monné
  0 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-12  1:25 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Jan Beulich, Shane Wang, Chao Peng, Dan Williams, Gang Wei

On 02/09/18 12:33 +0000, Roger Pau Monné wrote:
> Thanks for the series, I'm however wondering whether it's appropriate
> to post a v4 as RFC. Ie: at v4 the reviewer expects the submitter to
> have a clear picture of what needs to be implemented.
> 
> On Thu, Dec 07, 2017 at 06:09:49PM +0800, Haozhong Zhang wrote:
> > All patches can also be found at
> >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
> >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4
> > 
> > RFC v3 can be found at
> >   https://lists.xen.org/archives/html/xen-devel/2017-09/msg00964.html
> > 
> > Changes in v4:
> >   * Move the functionality of management util 'xen-ndctl' to Xne
> >     management tool 'xl'.
> >   * Load QEMU ACPI via QEMU fw_cfg and BIOSLinkerLoader interface.
> >   * Other changes are documented in patches separately.
> > 
> > 
> > - Part 0. Bug fix and code cleanup
> >   [01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
> >   [02/41] x86_64/mm: avoid cleaning the unmapped frame table
> >   [03/41] hvmloader/util: do not compare characters after '\0' in strncmp
> > 
> > - Part 1. Detect host PMEM
> >   Detect host PMEM via NFIT. No frametable and M2P table for them are
> >   created in this part.
> > 
> >   [04/41] xen/common: add Kconfig item for pmem support
> >   [05/41] x86/mm: exclude PMEM regions from initial frametable
> >   [06/41] acpi: probe valid PMEM regions via NFIT
> >   [07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
> >   [08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0
> 
> I'm afraid I might ask stupid questions, since I haven't followed the
> design discussion of this series very closely.
> 
> So you basically hide the NVDIMM from Dom0, and only allow guests to
> use it?

Yes, though I have some unsent patches (for vNVDIMM label support) to
allow QEMU in dom0 to access NVDIMM via DMOP.

> 
> What happens when you boot the same system without Xen? Will the
> NVDIMM get corrupted because for example Linux will write something to
> it?

A bare-metal OS running without Xen may write to the NVDIMM, which may
or may not corrupt the data, depending on what data already exists on
the NVDIMM and how that OS uses it.

If the bare-metal OS uses the NVDIMM, for example, as volatile memory
or as a fast disk cache, then arbitrary data may be dumped to the
NVDIMM and corrupt the existing data.

If the bare-metal OS treats the NVDIMM as storage, it may probe for
certain structures (e.g., file systems) on the NVDIMM before any
further operations and stop if such structures are not found. In that
case, the existing data on the NVDIMM will not be corrupted.

> 
> >   [09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
> >   [10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr
> >   [11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
> >   [12/41] tools/xl: add xl command 'pmem-list'
> > 
> > - Part 2. Setup host PMEM for management and guest data usage
> >   Allow users or admins in Dom0 to setup host PMEM pages for
> >   management and guest data usages.
> >    * Management PMEM pages are used to store the frametable and M2P of
> >      PMEM pages (including themselves), and never mapped to guest.
> >    * Guest data PMEM pages can be mapped to guest and used as the
> >      backend storage of virtual NVDIMM devices.
> 
> So this is basically tied to a PV Dom0, but I would like to also think
> about what would happen with a PVH Dom0. In that case AFAICT Xen could
> map the full NVDIMM to the Dom0 p2m as MMIO using 1GB pages, at which
> point Dom0 could manage the NVDIMM as desired? Ie: Dom0 could map
> parts of the NVDIMM to DomU as it maps other MMIO regions.

The primary reason I don't want to map the NVDIMM to Dom0 (either PV
or PVH) is that the frame table and M2P table of the NVDIMM are
maintained on the NVDIMM itself. Because the NVDIMM is non-volatile
and Xen has no idea which portion of it can be used for the frame
table and M2P, Xen needs user input for that information (patches 18,
22, 23) after it boots up. That is, before Xen boots, it cannot
determine which portion of the NVDIMM will hold its frame table and
M2P and hence should not be mapped to Dom0.
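
For a rough sense of how much space those structures take (and hence
why it is attractive to keep them on the PMEM itself), here is a
back-of-the-envelope sketch; the 32-byte frame-table entry, 8-byte M2P
entry, and 1 TiB device size are illustrative assumptions, not figures
from this series:

    /* Back-of-the-envelope sizing sketch.  Assumptions (illustration
     * only): 4 KiB pages, 32-byte frame-table entries, 8-byte M2P
     * entries, and a 1 TiB NVDIMM. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long nvdimm_bytes = 1ULL << 40;          /* 1 TiB */
        unsigned long long pages        = nvdimm_bytes >> 12;  /* 4 KiB pages */
        unsigned long long mgmt_bytes   = pages * (32 + 8);    /* frametable + M2P */

        printf("management metadata: %llu MiB (~%.2f%% of capacity)\n",
               mgmt_bytes >> 20, 100.0 * mgmt_bytes / nvdimm_bytes);
        return 0;
    }

Under those assumptions the metadata alone is about 10 GiB (roughly 1%
of capacity), which is why a dedicated management region is needed.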

Thanks,
Haozhong

> 
> I'm not sure Xen needs to know anything else apart from how to map the
> full NVDIMM to Dom0 as MMIO, which would greatly simplify this series.
> 
> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-12  1:25   ` Haozhong Zhang
@ 2018-02-12 10:05     ` Roger Pau Monné
  2018-02-13 10:06       ` Jan Beulich
  2018-02-15  6:44       ` Haozhong Zhang
  0 siblings, 2 replies; 113+ messages in thread
From: Roger Pau Monné @ 2018-02-12 10:05 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Gang Wei, Jan Beulich,
	Shane Wang, Chao Peng, Dan Williams, Daniel De Graaf

On Mon, Feb 12, 2018 at 09:25:42AM +0800, Haozhong Zhang wrote:
> On 02/09/18 12:33 +0000, Roger Pau Monné wrote:
> > Thanks for the series, I'm however wondering whether it's appropriate
> > to post a v4 as RFC. Ie: at v4 the reviewer expects the submitter to
> > have a clear picture of what needs to be implemented.
> > 
> > On Thu, Dec 07, 2017 at 06:09:49PM +0800, Haozhong Zhang wrote:
> > > All patches can also be found at
> > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
> > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4
> > > 
> > > RFC v3 can be found at
> > >   https://lists.xen.org/archives/html/xen-devel/2017-09/msg00964.html
> > > 
> > > Changes in v4:
> > >   * Move the functionality of management util 'xen-ndctl' to Xne
> > >     management tool 'xl'.
> > >   * Load QEMU ACPI via QEMU fw_cfg and BIOSLinkerLoader interface.
> > >   * Other changes are documented in patches separately.
> > > 
> > > 
> > > - Part 0. Bug fix and code cleanup
> > >   [01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
> > >   [02/41] x86_64/mm: avoid cleaning the unmapped frame table
> > >   [03/41] hvmloader/util: do not compare characters after '\0' in strncmp
> > > 
> > > - Part 1. Detect host PMEM
> > >   Detect host PMEM via NFIT. No frametable and M2P table for them are
> > >   created in this part.
> > > 
> > >   [04/41] xen/common: add Kconfig item for pmem support
> > >   [05/41] x86/mm: exclude PMEM regions from initial frametable
> > >   [06/41] acpi: probe valid PMEM regions via NFIT
> > >   [07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
> > >   [08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0
> > 
> > I'm afraid I might ask stupid questions, since I haven't followed the
> > design discussion of this series very closely.
> > 
> > So you basically hide the NVDIMM from Dom0, and only allow guests to
> > use it?
> 
> Yes, though I have some unsent patches (for vNVDIMM label support) to
> allow QEMU in dom0 to access NVDIMM via DMOP.
> 
> > 
> > What happens when you boot the same system without Xen? Will the
> > NVDIMM get corrupted because for example Linux will write something to
> > it?
> 
> A bare-metal OS running without Xen may write to the NVDIMM, which may
> or may not corrupt the data, depending on what data already exists on
> the NVDIMM and how that OS uses it.
> 
> If the bare-metal OS uses the NVDIMM, for example, as volatile memory
> or as a fast disk cache, then arbitrary data may be dumped to the
> NVDIMM and corrupt the existing data.
> 
> If the bare-metal OS treats the NVDIMM as storage, it may probe for
> certain structures (e.g., file systems) on the NVDIMM before any
> further operations and stop if such structures are not found. In that
> case, the existing data on the NVDIMM will not be corrupted.

OK. I have to admit my knowledge of NVDIMM is very limited. Is it
expected, for example, to partition an NVDIMM into several partitions
and maybe use one as disk cache and the others as storage?

How would that be accomplished, using GPT for example? Or is there
some NVDIMM-specific way to describe the layout?

Would it be conceivable to store the Dom0 root filesystem on an NVDIMM
while also using it to provide storage to the guests?

> > 
> > >   [09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
> > >   [10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr
> > >   [11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
> > >   [12/41] tools/xl: add xl command 'pmem-list'
> > > 
> > > - Part 2. Setup host PMEM for management and guest data usage
> > >   Allow users or admins in Dom0 to setup host PMEM pages for
> > >   management and guest data usages.
> > >    * Management PMEM pages are used to store the frametable and M2P of
> > >      PMEM pages (including themselves), and never mapped to guest.
> > >    * Guest data PMEM pages can be mapped to guest and used as the
> > >      backend storage of virtual NVDIMM devices.
> > 
> > So this is basically tied to a PV Dom0, but I would like to also think
> > about what would happen with a PVH Dom0. In that case AFAICT Xen could
> > map the full NVDIMM to the Dom0 p2m as MMIO using 1GB pages, at which
> > point Dom0 could manage the NVDIMM as desired? Ie: Dom0 could map
> > parts of the NVDIMM to DomU as it maps other MMIO regions.
> 
> The primary reason I don't want to map the NVDIMM to Dom0 (either PV
> or PVH) is that the frame table and M2P table of the NVDIMM are
> maintained on the NVDIMM itself. Because the NVDIMM is non-volatile
> and Xen has no idea which portion of it can be used for the frame
> table and M2P, Xen needs user input for that information (patches 18,
> 22, 23) after it boots up. That is, before Xen boots, it cannot
> determine which portion of the NVDIMM will hold its frame table and
> M2P and hence should not be mapped to Dom0.

If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
IIRC, and if it's mapped using 1GB pages it shouldn't use that much
memory for the page tables (ie: you could just use normal RAM for the
page tables that map the NVDIMM IMO). Of course that only applies to
PVH/HVM.
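
As a rough illustration of how little RAM such 1GB mappings would
need, a sketch assuming a 1 TiB NVDIMM and x86-64 4-level paging (the
device size is an assumption, not from the series):

    /* Back-of-the-envelope sketch: RAM needed for the page tables when
     * the whole NVDIMM is mapped with 1 GiB leaf entries.  Assumptions:
     * 1 TiB NVDIMM, x86-64 4-level paging. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long nvdimm_bytes = 1ULL << 40;                 /* 1 TiB */
        unsigned long long gb_entries   = nvdimm_bytes >> 30;         /* 1024 x 1 GiB */
        unsigned long long l3_pages     = (gb_entries + 511) / 512;   /* 512 entries per L3 page */
        unsigned long long table_bytes  = (l3_pages + 1) * 4096;      /* plus one L4 page */

        printf("page-table RAM: %llu KiB\n", table_bytes >> 10);      /* 12 KiB here */
        return 0;
    }

That is a handful of 4 KiB pages of ordinary RAM, as opposed to the
gigabytes of frame-table/M2P metadata a RAM-like mapping would need.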

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-12 10:05     ` Roger Pau Monné
@ 2018-02-13 10:06       ` Jan Beulich
  2018-02-13 10:29         ` Roger Pau Monné
  2018-02-15  6:44       ` Haozhong Zhang
  1 sibling, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2018-02-13 10:06 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Shane Wang, Chao Peng, Dan Williams, Gang Wei

>>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
> If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> memory for the page tables (ie: you could just use normal RAM for the
> page tables that map the NVDIMM IMO). Of course that only applies to
> PVH/HVM.

But in order to use (part of) it in a RAM-like manner we need struct
page_info for it.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-13 10:06       ` Jan Beulich
@ 2018-02-13 10:29         ` Roger Pau Monné
  2018-02-13 11:05           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Roger Pau Monné @ 2018-02-13 10:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Shane Wang, Chao Peng, Dan Williams, Gang Wei

On Tue, Feb 13, 2018 at 03:06:24AM -0700, Jan Beulich wrote:
> >>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
> > If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> > IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> > memory for the page tables (ie: you could just use normal RAM for the
> > page tables that map the NVDIMM IMO). Of course that only applies to
> > PVH/HVM.
> 
> But in order to use (part of) it in a RAM-like manner we need struct
> page_info for it.

I guess the main use of this would be to grant NVDIMM pages? And
without a page_info that's not possible.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-13 10:29         ` Roger Pau Monné
@ 2018-02-13 11:05           ` Jan Beulich
  2018-02-13 11:13             ` Roger Pau Monné
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2018-02-13 11:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Shane Wang, Chao Peng, Dan Williams, Gang Wei

>>> On 13.02.18 at 11:29, <roger.pau@citrix.com> wrote:
> On Tue, Feb 13, 2018 at 03:06:24AM -0700, Jan Beulich wrote:
>> >>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
>> > If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
>> > IIRC, and if it's mapped using 1GB pages it shouldn't use that much
>> > memory for the page tables (ie: you could just use normal RAM for the
>> > page tables that map the NVDIMM IMO). Of course that only applies to
>> > PVH/HVM.
>> 
>> But in order to use (part of) it in a RAM-like manner we need struct
>> page_info for it.
> 
> I guess the main use of this would be to grant NVDIMM pages? And
> without a page_info that's not possible.

Why grant? Simply giving such a page as RAM to a guest would
already be a problem without struct page_info (as then we can't
track the page owner, nor can we refcount the page).
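
To illustrate the kind of per-page state meant here, a purely
illustrative sketch (this is not the actual Xen definition of struct
page_info, just the shape of the problem):

    /* Purely illustrative -- NOT the real Xen struct page_info.  The
     * point is that handing a PMEM page to a guest as RAM needs per-page
     * state like this, so a frame-table entry must exist for every such
     * page. */
    struct domain;                    /* opaque here; defined by the hypervisor */

    struct illustrative_page_info {
        unsigned long count_info;     /* reference count and state flags */
        unsigned long type_info;      /* current use: writable, page table, ... */
        struct domain *owner;         /* domain the page is assigned to */
    };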

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-13 11:05           ` Jan Beulich
@ 2018-02-13 11:13             ` Roger Pau Monné
  2018-02-13 13:40               ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Roger Pau Monné @ 2018-02-13 11:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Shane Wang, Chao Peng, Dan Williams, Gang Wei

On Tue, Feb 13, 2018 at 04:05:45AM -0700, Jan Beulich wrote:
> >>> On 13.02.18 at 11:29, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 13, 2018 at 03:06:24AM -0700, Jan Beulich wrote:
> >> >>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
> >> > If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> >> > IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> >> > memory for the page tables (ie: you could just use normal RAM for the
> >> > page tables that map the NVDIMM IMO). Of course that only applies to
> >> > PVH/HVM.
> >> 
> >> But in order to use (part of) it in a RAM-like manner we need struct
> >> page_info for it.
> > 
> > I guess the main use of this would be to grant NVDIMM pages? And
> > without a page_info that's not possible.
> 
> Why grant? Simply giving such a page as RAM to a guest would
> already be a problem without struct page_info (as then we can't
> track the page owner, nor can we refcount the page).

My point was to avoid doing that, and always assign the pages as
MMIO, which IIRC doesn't require a struct page_info.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-13 11:13             ` Roger Pau Monné
@ 2018-02-13 13:40               ` Jan Beulich
  2018-02-13 15:39                 ` Roger Pau Monné
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2018-02-13 13:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Shane Wang, Chao Peng, Dan Williams, Gang Wei

>>> On 13.02.18 at 12:13, <roger.pau@citrix.com> wrote:
> On Tue, Feb 13, 2018 at 04:05:45AM -0700, Jan Beulich wrote:
>> >>> On 13.02.18 at 11:29, <roger.pau@citrix.com> wrote:
>> > On Tue, Feb 13, 2018 at 03:06:24AM -0700, Jan Beulich wrote:
>> >> >>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
>> >> > If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
>> >> > IIRC, and if it's mapped using 1GB pages it shouldn't use that much
>> >> > memory for the page tables (ie: you could just use normal RAM for the
>> >> > page tables that map the NVDIMM IMO). Of course that only applies to
>> >> > PVH/HVM.
>> >> 
>> >> But in order to use (part of) it in a RAM-like manner we need struct
>> >> page_info for it.
>> > 
>> > I guess the main use of this would be to grant NVDIMM pages? And
>> > without a page_info that's not possible.
>> 
>> Why grant? Simply giving such a page as RAM to a guest would
>> already be a problem without struct page_info (as then we can't
>> track the page owner, nor can we refcount the page).
> 
> My point was to avoid doing that, and always assign the pages as
> MMIO, which IIRC doesn't require a struct page_info.

MMIO pages can't be used for things like page tables, because of
the refcounting that's needed. The page being like RAM, however,
implies that the guest needs to be able to use it as anything a RAM
page can be used for.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-13 13:40               ` Jan Beulich
@ 2018-02-13 15:39                 ` Roger Pau Monné
  2018-02-15  6:59                   ` Haozhong Zhang
  0 siblings, 1 reply; 113+ messages in thread
From: Roger Pau Monné @ 2018-02-13 15:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Shane Wang, Chao Peng, Dan Williams, Gang Wei

On Tue, Feb 13, 2018 at 06:40:20AM -0700, Jan Beulich wrote:
> >>> On 13.02.18 at 12:13, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 13, 2018 at 04:05:45AM -0700, Jan Beulich wrote:
> >> >>> On 13.02.18 at 11:29, <roger.pau@citrix.com> wrote:
> >> > On Tue, Feb 13, 2018 at 03:06:24AM -0700, Jan Beulich wrote:
> >> >> >>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
> >> >> > If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> >> >> > IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> >> >> > memory for the page tables (ie: you could just use normal RAM for the
> >> >> > page tables that map the NVDIMM IMO). Of course that only applies to
> >> >> > PVH/HVM.
> >> >> 
> >> >> But in order to use (part of) it in a RAM-like manner we need struct
> >> >> page_info for it.
> >> > 
> >> > I guess the main use of this would be to grant NVDIMM pages? And
> >> > without a page_info that's not possible.
> >> 
> >> Why grant? Simply giving such a page as RAM to a guest would
> >> already be a problem without struct page_info (as then we can't
> >> track the page owner, nor can we refcount the page).
> > 
> > My point was to avoid doing that, and always assign the pages as
> > MMIO, which IIRC doesn't require a struct page_info.
> 
> MMIO pages can't be used for things like page tables, because of
> the refcounting that's needed. The page being like RAM, however,
> implies that the guest needs to be able to use it as anything a RAM
> page can be used for.

OK, I'm quite unsure about what people actually use NVDIMM for. I
thought it was mostly used as some kind of storage, but if it's
actually used as plain RAM then yes, we likely need struct page_info
for those pages, which is a PITA.

My worry is that if you boot bare-metal Linux and use the NVDIMM, and
then reboot into Xen, you won't be able to access the NVDIMM data
anymore AFAICT, because Xen will have taken it over and already used
part of it to store its own page tables, which is problematic IMO.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-12 10:05     ` Roger Pau Monné
  2018-02-13 10:06       ` Jan Beulich
@ 2018-02-15  6:44       ` Haozhong Zhang
  1 sibling, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-15  6:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Daniel De Graaf, xen-devel,
	Jan Beulich, Shane Wang, Chao Peng, Dan Williams, Gang Wei

On 02/12/18 10:05 +0000, Roger Pau Monné wrote:
> On Mon, Feb 12, 2018 at 09:25:42AM +0800, Haozhong Zhang wrote:
> > On 02/09/18 12:33 +0000, Roger Pau Monné wrote:
> > > Thanks for the series, I'm however wondering whether it's appropriate
> > > to post a v4 as RFC. Ie: at v4 the reviewer expects the submitter to
> > > have a clear picture of what needs to be implemented.
> > > 
> > > On Thu, Dec 07, 2017 at 06:09:49PM +0800, Haozhong Zhang wrote:
> > > > All patches can also be found at
> > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
> > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4
> > > > 
> > > > RFC v3 can be found at
> > > >   https://lists.xen.org/archives/html/xen-devel/2017-09/msg00964.html
> > > > 
> > > > Changes in v4:
> > > >   * Move the functionality of management util 'xen-ndctl' to Xne
> > > >     management tool 'xl'.
> > > >   * Load QEMU ACPI via QEMU fw_cfg and BIOSLinkerLoader interface.
> > > >   * Other changes are documented in patches separately.
> > > > 
> > > > 
> > > > - Part 0. Bug fix and code cleanup
> > > >   [01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
> > > >   [02/41] x86_64/mm: avoid cleaning the unmapped frame table
> > > >   [03/41] hvmloader/util: do not compare characters after '\0' in strncmp
> > > > 
> > > > - Part 1. Detect host PMEM
> > > >   Detect host PMEM via NFIT. No frametable and M2P table for them are
> > > >   created in this part.
> > > > 
> > > >   [04/41] xen/common: add Kconfig item for pmem support
> > > >   [05/41] x86/mm: exclude PMEM regions from initial frametable
> > > >   [06/41] acpi: probe valid PMEM regions via NFIT
> > > >   [07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
> > > >   [08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0
> > > 
> > > I'm afraid I might ask stupid questions, since I haven't followed the
> > > design discussion of this series very closely.
> > > 
> > > So you basically hide the NVDIMM from Dom0, and only allow guests to
> > > use it?
> > 
> > Yes, though I have some unsent patches (for vNVDIMM label support) to
> > allow QEMU in dom0 to access NVDIMM via DMOP.
> > 
> > > 
> > > What happens when you boot the same system without Xen? Will the
> > > NVDIMM get corrupted because for example Linux will write something to
> > > it?
> > 
> > A bare-metal OS running without Xen may write to the NVDIMM, which may
> > or may not corrupt the data, depending on what data already exists on
> > the NVDIMM and how that OS uses it.
> > 
> > If the bare-metal OS uses the NVDIMM, for example, as volatile memory
> > or as a fast disk cache, then arbitrary data may be dumped to the
> > NVDIMM and corrupt the existing data.
> > 
> > If the bare-metal OS treats the NVDIMM as storage, it may probe for
> > certain structures (e.g., file systems) on the NVDIMM before any
> > further operations and stop if such structures are not found. In that
> > case, the existing data on the NVDIMM will not be corrupted.
> 
> OK. I have to admit my knowledge of NVDIMM is very limited. Is it
> expected, for example, to partition an NVDIMM into several partitions
> and maybe use one as disk cache and the others as storage?
> 
> How would that be accomplished, using GPT for example? Or is there
> some NVDIMM-specific way to describe the layout?

NVDIMM is mapped into the CPU address space just like regular RAM.
Basically, SW can access it via normal memory access instructions
(e.g., mov on x86), with the necessary cache flush operations (e.g.,
clwb/clflushopt/clflush) to guarantee write persistence. Beyond this
basic byte-addressable interface, SW can choose, for example, to use
it as ordinary memory, use it as persistent storage, or even implement
a block interface over it. SW can choose its own method to partition
the NVDIMM, maybe via the usual disk partitions and file systems, or
via the labels provided by the NVDIMM.
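
For illustration, a minimal sketch of such a byte-addressable
persistent store (an assumption-laden example using compiler
intrinsics, not code from this series; it needs a CPU and compiler
with clwb support):

    #include <immintrin.h>   /* _mm_clwb (build with -mclwb) and _mm_sfence */

    /* Minimal sketch of a persistent store to a byte-addressable PMEM
     * mapping: a plain store, then a cache-line write-back and a fence
     * so the data reaches the persistence domain. */
    static inline void pmem_store_u64(unsigned long *dst, unsigned long val)
    {
        *dst = val;          /* ordinary mov store into the mapped NVDIMM */
        _mm_clwb(dst);       /* write the dirty cache line back (clwb) */
        _mm_sfence();        /* order the write-back before later stores */
    }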

When such SW runs in an HVM domain, the primary work of Xen is to map
the host NVDIMM addresses into the guest address space in the EPT as
RW, just like normal memory virtualization.

> 
> Would it be conceivable to store the Dom0 root filesystem on an NVDIMM
> while also using it to provide storage to the guests?

Yes, it's possible, though it's not allowed in this patchset. We would
need to configure the Xen hypervisor before booting, so that it knows
which part of the NVDIMM needs to be mapped to Dom0 and where the
management structures of that part of the NVDIMM are maintained (e.g.,
in another part of the NVDIMM or in RAM).

Haozhong

> 
> > > 
> > > >   [09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
> > > >   [10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr
> > > >   [11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
> > > >   [12/41] tools/xl: add xl command 'pmem-list'
> > > > 
> > > > - Part 2. Setup host PMEM for management and guest data usage
> > > >   Allow users or admins in Dom0 to setup host PMEM pages for
> > > >   management and guest data usages.
> > > >    * Management PMEM pages are used to store the frametable and M2P of
> > > >      PMEM pages (including themselves), and never mapped to guest.
> > > >    * Guest data PMEM pages can be mapped to guest and used as the
> > > >      backend storage of virtual NVDIMM devices.
> > > 
> > > So this is basically tied to a PV Dom0, but I would like to also think
> > > about what would happen with a PVH Dom0. In that case AFAICT Xen could
> > > map the full NVDIMM to the Dom0 p2m as MMIO using 1GB pages, at which
> > > point Dom0 could manage the NVDIMM as desired? Ie: Dom0 could map
> > > parts of the NVDIMM to DomU as it maps other MMIO regions.
> > 
> > The primary reason I don't want to map the NVDIMM to Dom0 (either PV
> > or PVH) is that the frame table and M2P table of the NVDIMM are
> > maintained on the NVDIMM itself. Because the NVDIMM is non-volatile
> > and Xen has no idea which portion of it can be used for the frame
> > table and M2P, Xen needs user input for that information (patches 18,
> > 22, 23) after it boots up. That is, before Xen boots, it cannot
> > determine which portion of the NVDIMM will hold its frame table and
> > M2P and hence should not be mapped to Dom0.
> 
> If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> memory for the page tables (ie: you could just use normal RAM for the
> page tables that map the NVDIMM IMO). Of course that only applies to
> PVH/HVM.
> 
> Thanks, Roger.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains
  2018-02-13 15:39                 ` Roger Pau Monné
@ 2018-02-15  6:59                   ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-15  6:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Tim Deegan, xen-devel, Gang Wei, Jan Beulich, Shane Wang,
	Chao Peng, Daniel De Graaf, Ian Jackson, Dan Williams

On 02/13/18 15:39 +0000, Roger Pau Monné wrote:
> On Tue, Feb 13, 2018 at 06:40:20AM -0700, Jan Beulich wrote:
> > >>> On 13.02.18 at 12:13, <roger.pau@citrix.com> wrote:
> > > On Tue, Feb 13, 2018 at 04:05:45AM -0700, Jan Beulich wrote:
> > >> >>> On 13.02.18 at 11:29, <roger.pau@citrix.com> wrote:
> > >> > On Tue, Feb 13, 2018 at 03:06:24AM -0700, Jan Beulich wrote:
> > >> >> >>> On 12.02.18 at 11:05, <roger.pau@citrix.com> wrote:
> > >> >> > If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> > >> >> > IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> > >> >> > memory for the page tables (ie: you could just use normal RAM for the
> > >> >> > page tables that map the NVDIMM IMO). Of course that only applies to
> > >> >> > PVH/HVM.
> > >> >> 
> > >> >> But in order to use (part of) it in a RAM-like manner we need struct
> > >> >> page_info for it.
> > >> > 
> > >> > I guess the main use of this would be to grant NVDIMM pages? And
> > >> > without a page_info that's not possible.
> > >> 
> > >> Why grant? Simply giving such a page as RAM to a guest would
> > >> already be a problem without struct page_info (as then we can't
> > >> track the page owner, nor can we refcount the page).
> > > 
> > > My point was to avoid doing that, and always assign the pages as
> > > MMIO, which IIRC doesn't require a struct page_info.
> > 
> > MMIO pages can't be used for things like page tables, because of
> > the refcounting that's needed. The page being like RAM, however,
> > implies that the guest needs to be able to use it as anything a RAM
> > page can be used for.
> 
> OK, I'm quite unsure about what people actually use NVDIMM for. I
> thought it was mostly used as some kind of storage, but if it's
> actually used as plain RAM then yes, we likely need struct page_info
> for those pages, which is a PITA.
> 
> My worry is that if you boot bare-metal Linux and use the NVDIMM, and
> then reboot into Xen, you won't be able to access the NVDIMM data
> anymore AFAICT, because Xen will have taken it over and already used
> part of it to store its own page tables, which is problematic IMO.
> 

The page tables for the NVDIMM, whose size is not large, are still
kept in RAM.

This patchset does not let Xen use any NVDIMM at boot time, precisely
because of such worries. Part 2 of this patchset introduces a set of
xl subcommands that allow users to tell Xen, after boot, which parts
of the NVDIMM can safely be used by Xen without corrupting the really
useful data. That is, I expect users to pre-partition the NVDIMM
(before using it with Xen) into at least two parts: one for hypervisor
management purposes, whose data does not need to be preserved across
power cycles, and the others for user data which may need to be
non-volatile.

Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen
  2017-12-07 10:18     ` Haozhong Zhang
@ 2018-02-27 16:37       ` Anthony PERARD
  -1 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 16:37 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

On Thu, Dec 07, 2017 at 06:18:04PM +0800, Haozhong Zhang wrote:
> The guest physical address of vNVDIMM is allocated from the hotplug
> memory region, which is not created when QEMU is used as Xen device
> model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
> the code for pc machine type to create the hotplug memory region for
> Xen HVM domains.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> ---
>  hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
>  hw/i386/xen/xen-hvm.c |  2 ++
>  include/hw/i386/pc.h  |  1 +
>  3 files changed, 51 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 186545d2a4..9f46c8df79 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1315,6 +1315,53 @@ void xen_load_linux(PCMachineState *pcms)
>      pcms->fw_cfg = fw_cfg;
>  }
>  
> +void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)

It might be better to have a separate patch which moves the code into a function.

> +{
> +    MachineState *machine = MACHINE(pcms);
> +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
> +
> +    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
> +        return;
> +
> +    if (memory_region_size(&pcms->hotplug_memory.mr)) {

This new check looks like it is there to catch a programming error
rather than a user error. Would it be better to use an assert instead?

> +        error_report("hotplug memory region has been initialized");
> +        exit(EXIT_FAILURE);
> +    }
> +

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen
@ 2018-02-27 16:37       ` Anthony PERARD
  0 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 16:37 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini, Chao Peng, xen-devel, Dan Williams,
	Richard Henderson

On Thu, Dec 07, 2017 at 06:18:04PM +0800, Haozhong Zhang wrote:
> The guest physical address of vNVDIMM is allocated from the hotplug
> memory region, which is not created when QEMU is used as Xen device
> model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
> the code for pc machine type to create the hotplug memory region for
> Xen HVM domains.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> ---
>  hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
>  hw/i386/xen/xen-hvm.c |  2 ++
>  include/hw/i386/pc.h  |  1 +
>  3 files changed, 51 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 186545d2a4..9f46c8df79 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1315,6 +1315,53 @@ void xen_load_linux(PCMachineState *pcms)
>      pcms->fw_cfg = fw_cfg;
>  }
>  
> +void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)

It might be better to have a separate patch which moves the code into a function.

> +{
> +    MachineState *machine = MACHINE(pcms);
> +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
> +
> +    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
> +        return;
> +
> +    if (memory_region_size(&pcms->hotplug_memory.mr)) {

This new check looks like it is there to catch a programming error
rather than a user error. Would it be better to use an assert instead?

> +        error_report("hotplug memory region has been initialized");
> +        exit(EXIT_FAILURE);
> +    }
> +

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
  2017-12-07 10:18     ` Haozhong Zhang
@ 2018-02-27 16:41       ` Anthony PERARD
  -1 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 16:41 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin

On Thu, Dec 07, 2017 at 06:18:05PM +0800, Haozhong Zhang wrote:
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index ee2c2d5bfd..ba13a52994 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -12,6 +12,7 @@
>  #include "qemu/osdep.h"
>  #include "sysemu/hostmem.h"
>  #include "hw/boards.h"
> +#include "hw/xen/xen.h"
>  #include "qapi/error.h"
>  #include "qapi/visitor.h"
>  #include "qapi-types.h"
> @@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>              goto out;
>          }
>  
> +        /*
> +         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
> +         * so no further work in this function is needed.
> +         */
> +        if (xen_enabled() && !backend->mr.ram_block) {
> +            goto out;
> +        }
> +
>          ptr = memory_region_get_ram_ptr(&backend->mr);
>          sz = memory_region_size(&backend->mr);
>  
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 66eace5a5c..dcbfce33d5 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/kvm.h"
>  #include "trace.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/xen/xen.h"
>  
>  typedef struct pc_dimms_capacity {
>       uint64_t size;
> @@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
>      }
>  
>      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> -    vmstate_register_ram(vmstate_mr, dev);
> +    /* memory-backend-xen is not backed by RAM. */
> +    if (!xen_enabled()) {

Is it possible to use the same condition as the one used in
host_memory_backend_memory_complete? I.e., based on whether the memory
region is mapped or not (backend->mr.ram_block).

> +        vmstate_register_ram(vmstate_mr, dev);
> +    }
>      numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
>  
>  out:
> -- 
> 2.15.1
> 

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
@ 2018-02-27 16:41       ` Anthony PERARD
  0 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 16:41 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, Igor Mammedov, Chao Peng, xen-devel, Dan Williams

On Thu, Dec 07, 2017 at 06:18:05PM +0800, Haozhong Zhang wrote:
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index ee2c2d5bfd..ba13a52994 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -12,6 +12,7 @@
>  #include "qemu/osdep.h"
>  #include "sysemu/hostmem.h"
>  #include "hw/boards.h"
> +#include "hw/xen/xen.h"
>  #include "qapi/error.h"
>  #include "qapi/visitor.h"
>  #include "qapi-types.h"
> @@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>              goto out;
>          }
>  
> +        /*
> +         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
> +         * so no further work in this function is needed.
> +         */
> +        if (xen_enabled() && !backend->mr.ram_block) {
> +            goto out;
> +        }
> +
>          ptr = memory_region_get_ram_ptr(&backend->mr);
>          sz = memory_region_size(&backend->mr);
>  
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 66eace5a5c..dcbfce33d5 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/kvm.h"
>  #include "trace.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/xen/xen.h"
>  
>  typedef struct pc_dimms_capacity {
>       uint64_t size;
> @@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
>      }
>  
>      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> -    vmstate_register_ram(vmstate_mr, dev);
> +    /* memory-backend-xen is not backed by RAM. */
> +    if (!xen_enabled()) {

Is it possible to use the same condition as the one used in
host_memory_backend_memory_complete? I.e., based on whether the memory
region is mapped or not (backend->mr.ram_block).

> +        vmstate_register_ram(vmstate_mr, dev);
> +    }
>      numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
>  
>  out:
> -- 
> 2.15.1
> 

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 05/10] xen-hvm: initialize fw_cfg interface
  2017-12-07 10:18     ` Haozhong Zhang
@ 2018-02-27 16:46       ` Anthony PERARD
  -1 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 16:46 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

On Thu, Dec 07, 2017 at 06:18:07PM +0800, Haozhong Zhang wrote:
> Xen is going to reuse QEMU to build ACPI of some devices (e.g., NFIT
> and SSDT for NVDIMM) for HVM domains. The existing QEMU ACPI build
> code requires a fw_cfg interface which will also be used to pass QEMU
> built ACPI to Xen. Therefore, we need to initialize fw_cfg when any
> ACPI is going to be built by QEMU.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  hw/i386/xen/xen-hvm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index fe01b7a025..4b29f4052b 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -14,6 +14,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/apic-msidef.h"
> +#include "hw/loader.h"
>  #include "hw/xen/xen_common.h"
>  #include "hw/xen/xen_backend.h"
>  #include "qmp-commands.h"
> @@ -1234,6 +1235,14 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
>      xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
>  }
>  
> +static void xen_fw_cfg_init(PCMachineState *pcms)
> +{

The fw_cfg interface might already be initialized; it is used for
"direct kernel boot" on HVM and is initialized in xen_load_linux().

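Something along these lines might work (just a sketch; it assumes
pcms->fw_cfg is still NULL when xen_load_linux() has not run, and it
is untested):

    static void xen_fw_cfg_init(PCMachineState *pcms)
    {
        /* Sketch: reuse the instance set up by xen_load_linux() for
         * direct kernel boot instead of creating a second fw_cfg device. */
        if (!pcms->fw_cfg) {
            pcms->fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
            rom_set_fw(pcms->fw_cfg);
        }
    }
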
> +    FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> +
> +    rom_set_fw(fw_cfg);
> +    pcms->fw_cfg = fw_cfg;
> +}
> +
>  void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
>  {
>      int i, rc;
> @@ -1384,6 +1393,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
>  
>      /* Disable ACPI build because Xen handles it */
>      pcms->acpi_build_enabled = false;
> +    if (pcms->acpi_build_enabled) {
> +        xen_fw_cfg_init(pcms);
> +    }
>  
>      return;
>  
> -- 
> 2.15.1
> 

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC QEMU PATCH v4 05/10] xen-hvm: initialize fw_cfg interface
@ 2018-02-27 16:46       ` Anthony PERARD
  0 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 16:46 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini, Chao Peng, xen-devel, Dan Williams,
	Richard Henderson

On Thu, Dec 07, 2017 at 06:18:07PM +0800, Haozhong Zhang wrote:
> Xen is going to reuse QEMU to build ACPI of some devices (e.g., NFIT
> and SSDT for NVDIMM) for HVM domains. The existing QEMU ACPI build
> code requires a fw_cfg interface which will also be used to pass QEMU
> built ACPI to Xen. Therefore, we need to initialize fw_cfg when any
> ACPI is going to be built by QEMU.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  hw/i386/xen/xen-hvm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index fe01b7a025..4b29f4052b 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -14,6 +14,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/apic-msidef.h"
> +#include "hw/loader.h"
>  #include "hw/xen/xen_common.h"
>  #include "hw/xen/xen_backend.h"
>  #include "qmp-commands.h"
> @@ -1234,6 +1235,14 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
>      xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
>  }
>  
> +static void xen_fw_cfg_init(PCMachineState *pcms)
> +{

The fw_cfg interface might already be initialized; it is used for
"direct kernel boot" on HVM and is initialized in xen_load_linux().

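Something along these lines might work (just a sketch; it assumes
pcms->fw_cfg is still NULL when xen_load_linux() has not run, and it
is untested):

    static void xen_fw_cfg_init(PCMachineState *pcms)
    {
        /* Sketch: reuse the instance set up by xen_load_linux() for
         * direct kernel boot instead of creating a second fw_cfg device. */
        if (!pcms->fw_cfg) {
            pcms->fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
            rom_set_fw(pcms->fw_cfg);
        }
    }
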
> +    FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> +
> +    rom_set_fw(fw_cfg);
> +    pcms->fw_cfg = fw_cfg;
> +}
> +
>  void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
>  {
>      int i, rc;
> @@ -1384,6 +1393,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
>  
>      /* Disable ACPI build because Xen handles it */
>      pcms->acpi_build_enabled = false;
> +    if (pcms->acpi_build_enabled) {
> +        xen_fw_cfg_init(pcms);
> +    }
>  
>      return;
>  
> -- 
> 2.15.1
> 

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
  2017-12-07 10:18   ` Haozhong Zhang
@ 2018-02-27 17:22     ` Anthony PERARD
  -1 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 17:22 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson

On Thu, Dec 07, 2017 at 06:18:02PM +0800, Haozhong Zhang wrote:
> This is the QEMU part patches that works with the associated Xen
> patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> guest address space for vNVDIMM devices.

I've got another question, and maybe a possible improvement.

When QEMU builds the ACPI tables, it also initializes some
MemoryRegions, which use more guest memory. Do you know if those
regions are used with your patch series on Xen? Otherwise, we could
try to avoid their creation with this:
In xenfv_machine_options()
m->rom_file_has_mr = false;
(setting this in xen_hvm_init() would probably be better, but I
haven't tried)

If this is possible, libxl would not need to allocate more memory for
the guest (dm_acpi_size).

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
@ 2018-02-27 17:22     ` Anthony PERARD
  0 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 17:22 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, Igor Mammedov, Paolo Bonzini, Chao Peng, xen-devel,
	Dan Williams, Richard Henderson, Xiao Guangrong

On Thu, Dec 07, 2017 at 06:18:02PM +0800, Haozhong Zhang wrote:
> This is the QEMU part patches that works with the associated Xen
> patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> guest address space for vNVDIMM devices.

I've got another question, and maybe a possible improvement.

When QEMU builds the ACPI tables, it also initializes some
MemoryRegions, which use more guest memory. Do you know if those
regions are used with your patch series on Xen? Otherwise, we could
try to avoid their creation with this:
In xenfv_machine_options()
m->rom_file_has_mr = false;
(setting this in xen_hvm_init() would probably be better, but I
haven't tried)

If this is possible, libxl would not need to allocate more memory for
the guest (dm_acpi_size).

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2017-12-07 10:10 ` [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface Haozhong Zhang
@ 2018-02-27 17:37   ` Anthony PERARD
  2018-02-28  9:17     ` Haozhong Zhang
  2018-02-27 18:03   ` Anthony PERARD
  1 sibling, 1 reply; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 17:37 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On Thu, Dec 07, 2017 at 06:10:22PM +0800, Haozhong Zhang wrote:
> Add a function in libacpi to detect QEMU fw_cfg interface. Limit the
> usage of fw_cfg interface to hvmloader now, so use stub functions for
> others.

I think libacpi is not the right place for a driver. The fw_cfg driver
would be better in hvmloader.

As for copying the ACPI tables from fw_cfg to libacpi, maybe the
passthrough tables (or an improvement of them) could be used. (It is
already possible to add extra tables from libxl (HVM_XS_ACPI_PT_ADDRESS).)

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via fw_cfg interface
  2017-12-07 10:10 ` [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via " Haozhong Zhang
@ 2018-02-27 17:56   ` Anthony PERARD
  2018-02-28  9:28     ` Haozhong Zhang
  0 siblings, 1 reply; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 17:56 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On Thu, Dec 07, 2017 at 06:10:23PM +0800, Haozhong Zhang wrote:
> Probe following QEMU ACPI ROMs:
>  * etc/acpi/rsdp:       QEMU RSDP, which is used to iterate other
>                         QEMU ACPI tables in etc/acpi/tables
> 
>  * etc/acpi/tables:     other QEMU ACPI tables
> 
>  * etc/table-loader:    QEMU BIOSLinkerLoader ROM, which can be
>                         executed to load QEMU ACPI tables
> 
>  * etc/acpi/nvdimm-mem: RAM which is used as NVDIMM ACPI DSM buffer,
>                         the exact location will be allocated during
>                         the execution of /etc/table-loader
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---

> diff --git a/tools/libacpi/qemu_loader.c b/tools/libacpi/qemu_loader.c
> new file mode 100644
> index 0000000000..c0ed3b0ad0
> --- /dev/null
> +++ b/tools/libacpi/qemu_loader.c
> @@ -0,0 +1,82 @@
> +/*
> + * libacpi/qemu_loader.c
> + *
> + * Driver of QEMU BIOSLinkerLoader interface. The reference document
> + * can be found at
> + * https://github.com/qemu/qemu/blob/master/hw/acpi/bios-linker-loader.c.

That's only a mirror; the official QEMU tree is on git.qemu.org. So I
think the URL should read:
https://git.qemu.org/?p=qemu.git;a=blob;f=hw/acpi/bios-linker-loader.c;hb=HEAD

> + *
> + * Copyright (C) 2017,  Intel Corporation
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License, version 2.1, as published by the Free Software Foundation.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include LIBACPI_STDUTILS
> +#include "libacpi.h"
> +#include "qemu.h"
> +
> +struct rom {
> +    struct fw_cfg_file file;
> +    struct rom *next;
> +};
> +
> +static struct rom *roms = NULL;
> +static struct rom *bios_loader = NULL;
> +
> +static bool rom_needed(const char *file_name)
> +{
> +    return
> +        !strncmp(file_name, "etc/acpi/rsdp", FW_CFG_FILE_PATH_MAX_LENGTH) ||
> +        !strncmp(file_name, "etc/acpi/tables", FW_CFG_FILE_PATH_MAX_LENGTH) ||
> +        !strncmp(file_name, "etc/table-loader", FW_CFG_FILE_PATH_MAX_LENGTH) ||
> +        !strncmp(file_name, "etc/acpi/nvdimm-mem", FW_CFG_FILE_PATH_MAX_LENGTH);

Is it necessary to filter the "files" that are available via fw_cfg? Is
there enough memory for hvmloader to just allocate a "struct rom" for
every available file? Another solution might be to filter based on
"etc/acpi/*" + "etc/table-loader".

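For what it's worth, a sketch of that prefix-based variant of the
quoted rom_needed() (just an illustration of the suggestion, not
tested):

    /* Sketch of the suggested filter: keep everything under etc/acpi/
     * plus the table-loader blob, instead of listing each file name. */
    static bool rom_needed(const char *file_name)
    {
        return !strncmp(file_name, "etc/acpi/", strlen("etc/acpi/")) ||
               !strncmp(file_name, "etc/table-loader",
                        FW_CFG_FILE_PATH_MAX_LENGTH);
    }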

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2017-12-07 10:10 ` [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface Haozhong Zhang
  2018-02-27 17:37   ` Anthony PERARD
@ 2018-02-27 18:03   ` Anthony PERARD
  2018-02-28  8:18     ` Haozhong Zhang
  1 sibling, 1 reply; 113+ messages in thread
From: Anthony PERARD @ 2018-02-27 18:03 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On Thu, Dec 07, 2017 at 06:10:22PM +0800, Haozhong Zhang wrote:
> diff --git a/tools/libacpi/qemu_fw_cfg.c b/tools/libacpi/qemu_fw_cfg.c
> new file mode 100644
> index 0000000000..254d2f575d
> --- /dev/null
> +++ b/tools/libacpi/qemu_fw_cfg.c
> @@ -0,0 +1,66 @@
> +/*
> + * libacpi/qemu_fw_cfg.c
> + *
> + * Driver of QEMU fw_cfg interface. The reference document can be found at
> + * https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt.

On github.com, it's only a mirror. I think the URL should read:
https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/fw_cfg.txt;hb=HEAD

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen
  2018-02-27 16:37       ` Anthony PERARD
  (?)
@ 2018-02-28  7:47       ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:47 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

On 02/27/18 16:37 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:18:04PM +0800, Haozhong Zhang wrote:
> > The guest physical address of vNVDIMM is allocated from the hotplug
> > memory region, which is not created when QEMU is used as Xen device
> > model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
> > the code for pc machine type to create the hotplug memory region for
> > Xen HVM domains.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Richard Henderson <rth@twiddle.net>
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > ---
> >  hw/i386/pc.c          | 86 ++++++++++++++++++++++++++++-----------------------
> >  hw/i386/xen/xen-hvm.c |  2 ++
> >  include/hw/i386/pc.h  |  1 +
> >  3 files changed, 51 insertions(+), 38 deletions(-)
> > 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 186545d2a4..9f46c8df79 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1315,6 +1315,53 @@ void xen_load_linux(PCMachineState *pcms)
> >      pcms->fw_cfg = fw_cfg;
> >  }
> >  
> > +void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
> 
> It might be better to have a separate patch which move the code into a function.

will move it to a separate patch

> 
> > +{
> > +    MachineState *machine = MACHINE(pcms);
> > +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> > +    ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
> > +
> > +    if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
> > +        return;
> > +
> > +    if (memory_region_size(&pcms->hotplug_memory.mr)) {
> 
> This new check looks like it is meant to catch a programming error, rather
> than a user error. Would it be better to make it an assert instead?

Well, this was a debugging check and I forgot to remove it before
sending the patch. I'll drop it in the next version.

Thanks,
Haozhong

> 
> > +        error_report("hotplug memory region has been initialized");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> 
> -- 
> Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
  2018-02-27 16:41       ` Anthony PERARD
  (?)
@ 2018-02-28  7:56       ` Haozhong Zhang
  2018-03-02 11:50           ` Anthony PERARD
  -1 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:56 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin

On 02/27/18 16:41 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:18:05PM +0800, Haozhong Zhang wrote:
> > diff --git a/backends/hostmem.c b/backends/hostmem.c
> > index ee2c2d5bfd..ba13a52994 100644
> > --- a/backends/hostmem.c
> > +++ b/backends/hostmem.c
> > @@ -12,6 +12,7 @@
> >  #include "qemu/osdep.h"
> >  #include "sysemu/hostmem.h"
> >  #include "hw/boards.h"
> > +#include "hw/xen/xen.h"
> >  #include "qapi/error.h"
> >  #include "qapi/visitor.h"
> >  #include "qapi-types.h"
> > @@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
> >              goto out;
> >          }
> >  
> > +        /*
> > +         * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
> > +         * so no further work in this function is needed.
> > +         */
> > +        if (xen_enabled() && !backend->mr.ram_block) {
> > +            goto out;
> > +        }
> > +
> >          ptr = memory_region_get_ram_ptr(&backend->mr);
> >          sz = memory_region_size(&backend->mr);
> >  
> > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > index 66eace5a5c..dcbfce33d5 100644
> > --- a/hw/mem/pc-dimm.c
> > +++ b/hw/mem/pc-dimm.c
> > @@ -28,6 +28,7 @@
> >  #include "sysemu/kvm.h"
> >  #include "trace.h"
> >  #include "hw/virtio/vhost.h"
> > +#include "hw/xen/xen.h"
> >  
> >  typedef struct pc_dimms_capacity {
> >       uint64_t size;
> > @@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> >      }
> >  
> >      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> > -    vmstate_register_ram(vmstate_mr, dev);
> > +    /* memory-backend-xen is not backed by RAM. */
> > +    if (!xen_enabled()) {
> 
> Is it possible to have the same condition as the one used in
> host_memory_backend_memory_complete? i.e. base on whether the memory
> region is mapped or not (backend->mr.ram_block).

Like "if (!xen_enabled() || backend->mr.ram_block))"? No, it will mute
the abortion (vmstate_register_ram --> qemu_ram_set_idstr ) caused by
the case that !backend->mr.ram_block in the non-xen environment.

Haozhong

> 
> > +        vmstate_register_ram(vmstate_mr, dev);
> > +    }
> >      numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
> >  
> >  out:
> > -- 
> > 2.15.1
> > 
> 
> -- 
> Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 05/10] xen-hvm: initialize fw_cfg interface
  2018-02-27 16:46       ` Anthony PERARD
@ 2018-02-28  8:16         ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  8:16 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

On 02/27/18 16:46 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:18:07PM +0800, Haozhong Zhang wrote:
> > Xen is going to reuse QEMU to build ACPI of some devices (e.g., NFIT
> > and SSDT for NVDIMM) for HVM domains. The existing QEMU ACPI build
> > code requires a fw_cfg interface which will also be used to pass QEMU
> > built ACPI to Xen. Therefore, we need to initialize fw_cfg when any
> > ACPI is going to be built by QEMU.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Richard Henderson <rth@twiddle.net>
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > ---
> >  hw/i386/xen/xen-hvm.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> > index fe01b7a025..4b29f4052b 100644
> > --- a/hw/i386/xen/xen-hvm.c
> > +++ b/hw/i386/xen/xen-hvm.c
> > @@ -14,6 +14,7 @@
> >  #include "hw/pci/pci.h"
> >  #include "hw/i386/pc.h"
> >  #include "hw/i386/apic-msidef.h"
> > +#include "hw/loader.h"
> >  #include "hw/xen/xen_common.h"
> >  #include "hw/xen/xen_backend.h"
> >  #include "qmp-commands.h"
> > @@ -1234,6 +1235,14 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
> >      xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
> >  }
> >  
> > +static void xen_fw_cfg_init(PCMachineState *pcms)
> > +{
> 
> The fw_cfg interface might already be initialized; it is used for
> "direct kernel boot" on HVM, and is initialized in xen_load_linux().
>

xen_hvm_init() --> xen_fw_cfg_init() is called before
xen_load_linux(). I'll add a check in xen_load_linux() to avoid
redoing fw_cfg_init_io() and rom_set_fw().
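
Roughly something like the small helper below (just a sketch; the helper
name xen_get_fw_cfg() is made up), which both xen_fw_cfg_init() and
xen_load_linux() could go through:

    static FWCfgState *xen_get_fw_cfg(PCMachineState *pcms)
    {
        /* Reuse the fw_cfg instance if it has already been set up. */
        if (!pcms->fw_cfg) {
            pcms->fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
            rom_set_fw(pcms->fw_cfg);
        }
        return pcms->fw_cfg;
    }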

Haozhong

> > +    FWCfgState *fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> > +
> > +    rom_set_fw(fw_cfg);
> > +    pcms->fw_cfg = fw_cfg;
> > +}
> > +
> >  void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
> >  {
> >      int i, rc;
> > @@ -1384,6 +1393,9 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
> >  
> >      /* Disable ACPI build because Xen handles it */
> >      pcms->acpi_build_enabled = false;
> > +    if (pcms->acpi_build_enabled) {
> > +        xen_fw_cfg_init(pcms);
> > +    }
> >  
> >      return;
> >  
> > -- 
> > 2.15.1
> > 
> 
> -- 
> Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2018-02-27 18:03   ` Anthony PERARD
@ 2018-02-28  8:18     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  8:18 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On 02/27/18 18:03 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:10:22PM +0800, Haozhong Zhang wrote:
> > diff --git a/tools/libacpi/qemu_fw_cfg.c b/tools/libacpi/qemu_fw_cfg.c
> > new file mode 100644
> > index 0000000000..254d2f575d
> > --- /dev/null
> > +++ b/tools/libacpi/qemu_fw_cfg.c
> > @@ -0,0 +1,66 @@
> > +/*
> > + * libacpi/qemu_fw_cfg.c
> > + *
> > + * Driver of QEMU fw_cfg interface. The reference document can be found at
> > + * https://github.com/qemu/qemu/blob/master/docs/specs/fw_cfg.txt.
> 
> On github.com, it's only a mirror. I think the URL should read:
> https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/fw_cfg.txt;hb=HEAD

will fix the url in the next version

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2018-02-27 17:37   ` Anthony PERARD
@ 2018-02-28  9:17     ` Haozhong Zhang
  2018-03-02 11:26       ` Anthony PERARD
  0 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  9:17 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On 02/27/18 17:37 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:10:22PM +0800, Haozhong Zhang wrote:
> > Add a function in libacpi to detect QEMU fw_cfg interface. Limit the
> > usage of fw_cfg interface to hvmloader now, so use stub functions for
> > others.
> 
> I think libacpi is not the right place for a driver. The fw_cfg driver
> would be better in hvmloader.

Yes, I can move it to hvmloader. My original thought was it might be
reused (by replacing those stub functions) when someone wants to add
vNVDIMM support to PVH domU and still use QEMU as the device model
for vNVDIMM.

> 
> As for copying the ACPI tables from fw_cfg to libacpi, maybe the passthrough
> tables (or an improvement of them) could be used. (It is already possible to
> add extra tables from libxl (HVM_XS_ACPI_PT_ADDRESS).)
>

They are doing the same job (transferring guest ACPI from host to
guest) in two quite different ways, rather than being two pieces of work
that only partially overlap, so I think it's hard to make them
collaborate with each other. Do you have any idea in mind?

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via fw_cfg interface
  2018-02-27 17:56   ` Anthony PERARD
@ 2018-02-28  9:28     ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  9:28 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On 02/27/18 17:56 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:10:23PM +0800, Haozhong Zhang wrote:
> > Probe following QEMU ACPI ROMs:
> >  * etc/acpi/rsdp:       QEMU RSDP, which is used to iterate other
> >                         QEMU ACPI tables in etc/acpi/tables
> > 
> >  * etc/acpi/tables:     other QEMU ACPI tables
> > 
> >  * etc/table-loader:    QEMU BIOSLinkerLoader ROM, which can be
> >                         executed to load QEMU ACPI tables
> > 
> >  * etc/acpi/nvdimm-mem: RAM which is used as NVDIMM ACPI DSM buffer,
> >                         the exact location will be allocated during
> >                         the execution of /etc/table-loader
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> 
> > diff --git a/tools/libacpi/qemu_loader.c b/tools/libacpi/qemu_loader.c
> > new file mode 100644
> > index 0000000000..c0ed3b0ad0
> > --- /dev/null
> > +++ b/tools/libacpi/qemu_loader.c
> > @@ -0,0 +1,82 @@
> > +/*
> > + * libacpi/qemu_loader.c
> > + *
> > + * Driver of QEMU BIOSLinkerLoader interface. The reference document
> > + * can be found at
> > + * https://github.com/qemu/qemu/blob/master/hw/acpi/bios-linker-loader.c.
> 
> That's only a mirror, the official QEMU tree is on git.qemu.org. So I
> think the URL should read:
> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/acpi/bios-linker-loader.c;hb=HEAD

will fix the url

> 
> > + *
> > + * Copyright (C) 2017,  Intel Corporation
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License, version 2.1, as published by the Free Software Foundation.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include LIBACPI_STDUTILS
> > +#include "libacpi.h"
> > +#include "qemu.h"
> > +
> > +struct rom {
> > +    struct fw_cfg_file file;
> > +    struct rom *next;
> > +};
> > +
> > +static struct rom *roms = NULL;
> > +static struct rom *bios_loader = NULL;
> > +
> > +static bool rom_needed(const char *file_name)
> > +{
> > +    return
> > +        !strncmp(file_name, "etc/acpi/rsdp", FW_CFG_FILE_PATH_MAX_LENGTH) ||
> > +        !strncmp(file_name, "etc/acpi/tables", FW_CFG_FILE_PATH_MAX_LENGTH) ||
> > +        !strncmp(file_name, "etc/table-loader", FW_CFG_FILE_PATH_MAX_LENGTH) ||
> > +        !strncmp(file_name, "etc/acpi/nvdimm-mem", FW_CFG_FILE_PATH_MAX_LENGTH);
> 
> Is it necessary to filter the "files" that are available via fw_cfg? Is
> there enough memory for hvmloader to just allocate a "struct rom" for
> every available file? Another solution might be to filter based on
> "etc/acpi/*" + "etc/table-loader".
>

The filter here is primarily to avoid loading unused roms, in case their
presence has harmful side effects (though I can't imagine what those
side effects would be).

Haozhong

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
  2018-02-27 17:22     ` Anthony PERARD
  (?)
  (?)
@ 2018-02-28  9:36     ` Haozhong Zhang
  2018-03-02 12:03         ` Anthony PERARD
  -1 siblings, 1 reply; 113+ messages in thread
From: Haozhong Zhang @ 2018-02-28  9:36 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson

On 02/27/18 17:22 +0000, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:18:02PM +0800, Haozhong Zhang wrote:
> > This is the QEMU part patches that works with the associated Xen
> > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > guest address space for vNVDIMM devices.
> 
> I've got other question, and maybe possible improvements.
> 
> When QEMU builds the ACPI tables, it also initializes some MemoryRegions,
> which use more guest memory. Do you know if those regions are used with
> your patch series on Xen?

Yes, that's why dm_acpi_size is introduced.

> Otherwise, we could try to avoid their
> creation with this:
> In xenfv_machine_options()
> m->rom_file_has_mr = false;
> (setting this in xen_hvm_init() would probably be better, but I haven't
> tried)

If my memory is correct, simply setting rom_file_has_mr to false does
not work (though I cannot recall the exact reason). I'll have a look
at the code to refresh my memory.
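
For reference, the change being suggested would be roughly this (sketch
only, untested here):

    static void xenfv_machine_options(MachineClass *m)
    {
        /* ... existing xenfv options ... */

        /* Don't back fw_cfg ROM files (ACPI blobs, table-loader) with
         * MemoryRegions, so they don't consume extra guest memory. */
        m->rom_file_has_mr = false;
    }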

Haozhong

> 
> If this is possible, libxl would not need to allocate more memory for
> the guest (dm_acpi_size).
> 
> -- 
> Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2018-02-28  9:17     ` Haozhong Zhang
@ 2018-03-02 11:26       ` Anthony PERARD
  2018-03-05  7:55         ` Haozhong Zhang
  0 siblings, 1 reply; 113+ messages in thread
From: Anthony PERARD @ 2018-03-02 11:26 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Chao Peng,
	Dan Williams

On Wed, Feb 28, 2018 at 05:17:23PM +0800, Haozhong Zhang wrote:
> On 02/27/18 17:37 +0000, Anthony PERARD wrote:
> > On Thu, Dec 07, 2017 at 06:10:22PM +0800, Haozhong Zhang wrote:
> > > Add a function in libacpi to detect QEMU fw_cfg interface. Limit the
> > > usage of fw_cfg interface to hvmloader now, so use stub functions for
> > > others.
> > 
> > I think libacpi is not the right place for a driver. The fw_cfg driver
> > would be better in hvmloader.
> 
> Yes, I can move it to hvmloader. My original thought was it might be
> reused (by replacing those stub functions) when someone wants to add
> vNVDIMM support to PVH domU and still use QEMU as the device model
> for vNVDIMM.

:(, I don't see how the fw_cfg driver could be reused in a PVH guest
right now. It is only useful when run from inside the guest. So far,
I think libacpi is used in Xen, and maybe libxl and hvmloader.

If QEMU's fw_cfg was available within a PVH guest, I guess we could use
hvmloader, or teach OVMF to merge the tables from Xen and QEMU, or maybe
GRUB or Linux could learn about fw_cfg.

Anyway, I think for now the fw_cfg driver is better in hvmloader, and
we can move the code later if/when needed.

> > As to copy the ACPI tables from fw_cfg to libacpi, maybe the passthrough
> > tables (or an improvement of it) could be use. (It is already to to add
> > extra tables from libxl (HVM_XS_ACPI_PT_ADDRESS).)
> >
> 
> They are doing the same job (transferring guest ACPI from host to
> guest) in two quite different ways, rather than two pieces of jobs not
> completely overlap, so I think it's hard to let them collaborate with
> each other. Do you have any idea in mind?

I don't really have an idea in mind. I guess it is going to depend on
what libacpi has to do once the fw_cfg driver has done the job of
loading the ACPI tables in memory.

Thanks,

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
  2018-02-28  7:56       ` [Qemu-devel] " Haozhong Zhang
@ 2018-03-02 11:50           ` Anthony PERARD
  0 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-03-02 11:50 UTC (permalink / raw)
  To: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin

On Wed, Feb 28, 2018 at 03:56:54PM +0800, Haozhong Zhang wrote:
> On 02/27/18 16:41 +0000, Anthony PERARD wrote:
> > On Thu, Dec 07, 2017 at 06:18:05PM +0800, Haozhong Zhang wrote:
> > > @@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > >      }
> > >  
> > >      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> > > -    vmstate_register_ram(vmstate_mr, dev);
> > > +    /* memory-backend-xen is not backed by RAM. */
> > > +    if (!xen_enabled()) {
> > 
> > Is it possible to have the same condition as the one used in
> > host_memory_backend_memory_complete? i.e. base on whether the memory
> > region is mapped or not (backend->mr.ram_block).
> 
> Like "if (!xen_enabled() || backend->mr.ram_block))"? No, it will mute
> the abortion (vmstate_register_ram --> qemu_ram_set_idstr ) caused by
> the case that !backend->mr.ram_block in the non-xen environment.

In a non-Xen environment, vmstate_register_ram() will be called because
!xen_enabled() is true; it would not matter whether there is a ram_block
or not.

But if there is a memory backend that can run in a Xen environment and
has a ram_block, vmstate_register_ram() would not be called with the
original patch, whereas if we use (!xen_enabled() || vmstate_mr->ram_block)
as the condition then vmstate_register_ram() will be called.
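
Concretely, something along these lines in pc_dimm_memory_plug() (sketch
only):

    memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
    /* Skip vmstate registration only when the region has no RAMBlock
     * behind it (e.g. memory-backend-xen under Xen). */
    if (!xen_enabled() || vmstate_mr->ram_block) {
        vmstate_register_ram(vmstate_mr, dev);
    }
    numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);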

Does this make sense?

> > > +        vmstate_register_ram(vmstate_mr, dev);
> > > +    }
> > >      numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
> > >  
> > >  out:

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
  2018-02-28  9:36     ` [Qemu-devel] " Haozhong Zhang
@ 2018-03-02 12:03         ` Anthony PERARD
  0 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-03-02 12:03 UTC (permalink / raw)
  To: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson

On Wed, Feb 28, 2018 at 05:36:59PM +0800, Haozhong Zhang wrote:
> On 02/27/18 17:22 +0000, Anthony PERARD wrote:
> > On Thu, Dec 07, 2017 at 06:18:02PM +0800, Haozhong Zhang wrote:
> > > This is the QEMU part patches that works with the associated Xen
> > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > guest address space for vNVDIMM devices.
> > 
> > I've got other question, and maybe possible improvements.
> > 
> > When QEMU build the ACPI tables, it also initialize some MemoryRegion,
> > which use more guest memory. Do you know if those regions are used with
> > your patch series on Xen?
> 
> Yes, that's why dm_acpi_size is introduced.
> 
> > Otherwise, we could try to avoid their
> > creation with this:
> > In xenfv_machine_options()
> > m->rom_file_has_mr = false;
> > (setting this in xen_hvm_init() would probably be better, but I havn't
> > try)
> 
> If my memory is correct, simply setting rom_file_has_mr to false does
> not work (though I cannot remind the exact reason). I'll have a look
> as the code to refresh my memory.

I've played a bit with this idea, but without a proper NVDIMM available
for the guest, so I don't know if it's going to work properly without
the mr.

To make it work, I had to disable some code in acpi_build_update() that
makes use of the MemoryRegions, as well as an assert in acpi_setup().
After those small hacks, I could boot the guest, and I've checked that the
expected ACPI tables were there, and they looked correct to my eyes.
At least `ndctl list` worked and showed the nvdimm (that I have
configured on QEMU's cmdline).

But I may not have gone far enough with my tests, and maybe something
later relies on the MRs; in particular I don't know whether the _DSM
method was working properly.

Anyway, that's why I proposed the idea, and if we can avoid more
uncertainty about how much guest memory QEMU is going to use, that would
be good.

Thanks,

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen
  2018-03-02 11:50           ` Anthony PERARD
  (?)
@ 2018-03-05  7:53           ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-03-05  7:53 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin

On 03/02/18 11:50 +0000, Anthony PERARD wrote:
> On Wed, Feb 28, 2018 at 03:56:54PM +0800, Haozhong Zhang wrote:
> > On 02/27/18 16:41 +0000, Anthony PERARD wrote:
> > > On Thu, Dec 07, 2017 at 06:18:05PM +0800, Haozhong Zhang wrote:
> > > > @@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > >      }
> > > >  
> > > >      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> > > > -    vmstate_register_ram(vmstate_mr, dev);
> > > > +    /* memory-backend-xen is not backed by RAM. */
> > > > +    if (!xen_enabled()) {
> > > 
> > > Is it possible to have the same condition as the one used in
> > > host_memory_backend_memory_complete? i.e. base on whether the memory
> > > region is mapped or not (backend->mr.ram_block).
> > 
> > Like "if (!xen_enabled() || backend->mr.ram_block))"? No, it will mute
> > the abortion (vmstate_register_ram --> qemu_ram_set_idstr ) caused by
> > the case that !backend->mr.ram_block in the non-xen environment.
> 
> In a non-Xen environment, vmstate_register_ram() will be called because
> !xen_enabled() is true; it would not matter whether there is a ram_block
> or not.

Sorry, I really meant 'if (backend->mr.ram_block)', which may mask the
abort in a non-Xen environment. 'if (!xen_enabled())' keeps the
original semantics in a non-Xen environment, so it's unlikely to break
the non-Xen usage.
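
To spell out the two alternatives being compared (sketch only, in the
pc_dimm_memory_plug() context and using vmstate_mr; only one of the two
would actually be used):

    /* ram_block only: would also skip registration (and so hide the
     * current abort) if a non-Xen backend ever lacked a ram_block. */
    if (vmstate_mr->ram_block) {
        vmstate_register_ram(vmstate_mr, dev);
    }

    /* !xen_enabled(): the non-Xen path is exactly as before; only the
     * Xen case changes. */
    if (!xen_enabled()) {
        vmstate_register_ram(vmstate_mr, dev);
    }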

Haozhong

> 
> But if there is a memory-backend that can run in a xen environment that
> have a ram_block, vmstate_register_ram would not be called in the
> origial patch, but if we use (!xen_enabled() || vmstate_mr->ram_block)
> as condition then vmstate_register_ram will be called.
> 
> Is this make sense?
> 
> > > > +        vmstate_register_ram(vmstate_mr, dev);
> > > > +    }
> > > >      numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
> > > >  
> > > >  out:
> 
> -- 
> Anthony PERARD
> 

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface
  2018-03-02 11:26       ` Anthony PERARD
@ 2018-03-05  7:55         ` Haozhong Zhang
  0 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-03-05  7:55 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich, Chao Peng,
	Dan Williams

On 03/02/18 11:26 +0000, Anthony PERARD wrote:
> On Wed, Feb 28, 2018 at 05:17:23PM +0800, Haozhong Zhang wrote:
> > On 02/27/18 17:37 +0000, Anthony PERARD wrote:
> > > On Thu, Dec 07, 2017 at 06:10:22PM +0800, Haozhong Zhang wrote:
> > > > Add a function in libacpi to detect QEMU fw_cfg interface. Limit the
> > > > usage of fw_cfg interface to hvmloader now, so use stub functions for
> > > > others.
> > > 
> > > I think libacpi is not the right place for a driver. The fw_cfg driver
> > > would be better in hvmloader.
> > 
> > Yes, I can move it to hvmloader. My original thought was it might be
> > reused (by replacing those stub functions) when someone wants to add
> > vNVDIMM support to PVH domU and still use QEMU as the device model
> > for vNVDIMM.
> 
> :(, I don't see how the fw_cfg driver could be reused in a PVH guest
> right now. It is only useful when run from inside the guest. So far,
> I think libacpi is used in Xen, and maybe libxl and hvmloader.
> 
> If QEMU's fw_cfg was available within a PVH guest, I guess we could use
> hvmloader, or teach OVMF to merge the tables from Xen and QEMU, or maybe
> GRUB or Linux could learn about fw_cfg.
> 
> Anyway, I think for now the fw_cfg driver is better in hvmloader, and
> we can move the code later if/when needed.
>

You are right, I'll move it to hvmloader.

> > > As to copy the ACPI tables from fw_cfg to libacpi, maybe the passthrough
> > > tables (or an improvement of it) could be use. (It is already to to add
> > > extra tables from libxl (HVM_XS_ACPI_PT_ADDRESS).)
> > >
> > 
> > They are doing the same job (transferring guest ACPI from host to
> > guest) in two quite different ways, rather than two pieces of jobs not
> > completely overlap, so I think it's hard to let them collaborate with
> > each other. Do you have any idea in mind?
> 
> I don't really have an idea in mind. I guess it is going to depend on
> what libacpi has to do once the fw_cfg driver has done the job of
> loading the ACPI tables in memory.
> 
> Thanks,
> 
> -- 
> Anthony PERARD

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
  2018-03-02 12:03         ` Anthony PERARD
@ 2018-03-06  4:16           ` Haozhong Zhang
  -1 siblings, 0 replies; 113+ messages in thread
From: Haozhong Zhang @ 2018-03-06  4:16 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson

On 03/02/18 12:03 +0000, Anthony PERARD wrote:
> On Wed, Feb 28, 2018 at 05:36:59PM +0800, Haozhong Zhang wrote:
> > On 02/27/18 17:22 +0000, Anthony PERARD wrote:
> > > On Thu, Dec 07, 2017 at 06:18:02PM +0800, Haozhong Zhang wrote:
> > > > This is the QEMU part patches that works with the associated Xen
> > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > guest address space for vNVDIMM devices.
> > > 
> > > I've got other question, and maybe possible improvements.
> > > 
> > > When QEMU build the ACPI tables, it also initialize some MemoryRegion,
> > > which use more guest memory. Do you know if those regions are used with
> > > your patch series on Xen?
> > 
> > Yes, that's why dm_acpi_size is introduced.
> > 
> > > Otherwise, we could try to avoid their
> > > creation with this:
> > > In xenfv_machine_options()
> > > m->rom_file_has_mr = false;
> > > (setting this in xen_hvm_init() would probably be better, but I havn't
> > > try)
> > 
> > If my memory is correct, simply setting rom_file_has_mr to false does
> > not work (though I cannot remind the exact reason). I'll have a look
> > as the code to refresh my memory.
> 
> I've played a bit with this idea, but without a proper NVDIMM available
> for the guest, so I don't know if it's going to work properly without
> the mr.
> 
> To make it work, I had to disable some code in acpi_build_update() that
> makes use of the MemoryRegions, as well as an assert in acpi_setup().
> After those small hacks, I could boot the guest, and I've checked that the
> expected ACPI tables were there, and they looked correct to my eyes.
> At least `ndctl list` worked and showed the nvdimm (that I have
> configured on QEMU's cmdline).
> 
> But I may not have gone far enough with my tests, and maybe something
> later relies on the MRs; in particular I don't know whether the _DSM
> method was working properly.
> 
> Anyway, that's why I proposed the idea, and if we can avoid more
> uncertainty about how much guest memory QEMU is going to use, that would
> be good.
> 

Yes, I also tested some non-trivial _DSM methods, and it looks like rom
files without memory regions can work with Xen after some
modifications. I'll apply this idea in the next version if no other
issues are found.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest
  2018-03-06  4:16           ` Haozhong Zhang
@ 2018-03-06 11:38             ` Anthony PERARD
  -1 siblings, 0 replies; 113+ messages in thread
From: Anthony PERARD @ 2018-03-06 11:38 UTC (permalink / raw)
  To: qemu-devel, xen-devel, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Dan Williams, Chao Peng, Eduardo Habkost, Igor Mammedov,
	Michael S. Tsirkin, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson

On Tue, Mar 06, 2018 at 12:16:08PM +0800, Haozhong Zhang wrote:
> On 03/02/18 12:03 +0000, Anthony PERARD wrote:
> > On Wed, Feb 28, 2018 at 05:36:59PM +0800, Haozhong Zhang wrote:
> > > On 02/27/18 17:22 +0000, Anthony PERARD wrote:
> > > > On Thu, Dec 07, 2017 at 06:18:02PM +0800, Haozhong Zhang wrote:
> > > > > This is the QEMU part patches that works with the associated Xen
> > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > guest address space for vNVDIMM devices.
> > > > 
> > > > I've got other question, and maybe possible improvements.
> > > > 
> > > > When QEMU build the ACPI tables, it also initialize some MemoryRegion,
> > > > which use more guest memory. Do you know if those regions are used with
> > > > your patch series on Xen?
> > > 
> > > Yes, that's why dm_acpi_size is introduced.
> > > 
> > > > Otherwise, we could try to avoid their
> > > > creation with this:
> > > > In xenfv_machine_options()
> > > > m->rom_file_has_mr = false;
> > > > (setting this in xen_hvm_init() would probably be better, but I havn't
> > > > try)
> > > 
> > > If my memory is correct, simply setting rom_file_has_mr to false does
> > > not work (though I cannot remind the exact reason). I'll have a look
> > > as the code to refresh my memory.
> > 
> > I've played a bit with this idea, but without a proper NVDIMM available
> > for the guest, so I don't know if it's going to work properly without
> > the mr.
> > 
> > To make it work, I had to disable some code in acpi_build_update() that
> > makes use of the MemoryRegions, as well as an assert in acpi_setup().
> > After those small hacks, I could boot the guest, and I checked that the
> > expected ACPI tables were there, and they looked correct to my eyes.
> > At least `ndctl list` worked and showed the nvdimm (that I had
> > configured on QEMU's cmdline).
> > 
> > But I may not have gone far enough with my tests, and maybe something
> > later relies on the MRs, especially the _DSM method, which I don't know
> > was working properly.
> > 
> > Anyway, that's why I proposed the idea; if we can avoid more
> > uncertainty about how much guest memory QEMU is going to use, that would
> > be good.
> > 
> 
> Yes, I also tested some non-trivial _DSM methods, and it looks like ROM
> files without memory regions can work with Xen after some
> modifications. I'll apply this idea in the next version if no other
> issues are found.

Awesome, thanks.

-- 
Anthony PERARD
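
The suggestion quoted above ("In xenfv_machine_options() set
m->rom_file_has_mr = false") would look roughly like the following sketch
against QEMU's hw/i386/pc_piix.c; the surrounding option assignments are
elided, and this is only an untested illustration of the idea from this
thread, not an actual patch.

    static void xenfv_machine_options(MachineClass *m)
    {
        /* ... existing xenfv option assignments ... */

        /*
         * Register fw_cfg ROM files (the ACPI tables among them) without
         * backing MemoryRegions, so they consume no extra guest memory.
         */
        m->rom_file_has_mr = false;
    }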

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
  2017-12-07 10:09 ` [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
  2018-01-04  6:12   ` Chao Peng
@ 2018-05-07 15:59   ` Jan Beulich
  1 sibling, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2018-05-07 15:59 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, dan.j.williams, xen-devel, Chao Peng

>>> On 07.12.17 at 11:09, <haozhong.zhang@intel.com> wrote:
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1295,12 +1295,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
>          return 0;
>  
>      /* Make sure the new range is not present now */
> -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  & ~(PDX_GROUP_COUNT - 1))
> -            / PDX_GROUP_COUNT;
> +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;

I agree that rounding up here is bogus.

>      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
> -    if (sidx >= eidx)
> -        return 0;
> -
>      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);

But isn't this one wrong too, needing eidx + 1 as the argument instead? And
likewise for the following find_next_bit()?

Also please don't drop the blank line there.

Jan
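
The eidx + 1 question comes down to the bitmap helpers' contract:
find_next_zero_bit(addr, size, offset) scans bits [offset, size) and
returns size when no zero bit is found, so if bit eidx itself is meant to
be part of the search, the size argument has to be eidx + 1 -- which is
what the question above is getting at. A minimal standalone illustration
of that contract (not the Xen implementation):

    #define EXAMPLE_BITS_PER_LONG (8 * sizeof(unsigned long))

    /* Return the index of the first zero bit in [offset, size), or size
     * if every bit in that range is set. */
    static unsigned long example_find_next_zero_bit(const unsigned long *addr,
                                                    unsigned long size,
                                                    unsigned long offset)
    {
        for ( ; offset < size; offset++ )
            if ( !(addr[offset / EXAMPLE_BITS_PER_LONG] &
                   (1UL << (offset % EXAMPLE_BITS_PER_LONG))) )
                return offset;

        return size;
    }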




^ permalink raw reply	[flat|nested] 113+ messages in thread

end of thread, other threads:[~2018-05-07 15:59 UTC | newest]

Thread overview: 113+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-07 10:09 [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check() Haozhong Zhang
2018-01-04  6:12   ` Chao Peng
2018-05-07 15:59   ` Jan Beulich
2017-12-07 10:09 ` [RFC XEN PATCH v4 02/41] x86_64/mm: avoid cleaning the unmapped frame table Haozhong Zhang
2018-01-04  6:20   ` Chao Peng
2017-12-07 10:09 ` [RFC XEN PATCH v4 03/41] hvmloader/util: do not compare characters after '\0' in strncmp Haozhong Zhang
2018-01-04  6:23   ` Chao Peng
2017-12-07 10:09 ` [RFC XEN PATCH v4 04/41] xen/common: add Kconfig item for pmem support Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 05/41] x86/mm: exclude PMEM regions from initial frametable Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 06/41] acpi: probe valid PMEM regions via NFIT Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 07/41] xen/pmem: register valid PMEM regions to Xen hypervisor Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0 Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op Haozhong Zhang
2017-12-07 10:09 ` [RFC XEN PATCH v4 10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 12/41] tools/xl: add xl command 'pmem-list' Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 13/41] x86_64/mm: refactor memory_add() Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 14/41] x86_64/mm: allow customized location of extended frametable and M2P table Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 15/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 16/41] tools/xl: accept all bases in parse_ulong() Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 17/41] tools/xl: expose parse_ulong() Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 18/41] tools/xl: add xl command 'pmem-setup' Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 19/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 20/41] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 21/41] tools/xl: add option '--mgmt | -m' to xl command pmem-list Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 22/41] xen/pmem: support setup PMEM region for guest data usage Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 23/41] tools/xl: add option '--data | -d' to xl command pmem-setup Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 24/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 25/41] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 26/41] tools/xl: add option '--data | -d' to xl command pmem-list Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 27/41] xen/pmem: add function to map PMEM pages to HVM domain Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 28/41] xen/pmem: release PMEM pages on HVM domain destruction Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 29/41] xen: add hypercall XENMEM_populate_pmem_map Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 30/41] tools: reserve extra guest memory for ACPI from device model Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 31/41] tools/libacpi: add callback to translate GPA to GVA Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 32/41] tools/libacpi: build a DM ACPI signature blacklist Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 33/41] tools/libacpi, hvmloader: detect QEMU fw_cfg interface Haozhong Zhang
2018-02-27 17:37   ` Anthony PERARD
2018-02-28  9:17     ` Haozhong Zhang
2018-03-02 11:26       ` Anthony PERARD
2018-03-05  7:55         ` Haozhong Zhang
2018-02-27 18:03   ` Anthony PERARD
2018-02-28  8:18     ` Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 34/41] tools/libacpi: probe QEMU ACPI ROMs via " Haozhong Zhang
2018-02-27 17:56   ` Anthony PERARD
2018-02-28  9:28     ` Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 35/41] tools/libacpi: add a QEMU BIOSLinkLoader executor Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 36/41] tools/libacpi: add function to get the data of QEMU RSDP Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 37/41] tools/libacpi: load QEMU ACPI Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 38/41] tools/xl: add xl domain configuration for virtual NVDIMM devices Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 39/41] tools/libxl: allow aborting domain creation on fatal QMP init errors Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 40/41] tools/libxl: initiate PMEM mapping via QMP callback Haozhong Zhang
2017-12-07 10:10 ` [RFC XEN PATCH v4 41/41] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
2017-12-07 10:18 ` [Qemu-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest Haozhong Zhang
2017-12-07 10:18   ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 01/10] xen-hvm: remove a trailing space Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2018-02-27 16:37     ` [Qemu-devel] " Anthony PERARD
2018-02-27 16:37       ` Anthony PERARD
2018-02-28  7:47       ` [Qemu-devel] " Haozhong Zhang
2018-02-28  7:47       ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2018-02-27 16:41     ` [Qemu-devel] " Anthony PERARD
2018-02-27 16:41       ` Anthony PERARD
2018-02-28  7:56       ` [Qemu-devel] " Haozhong Zhang
2018-03-02 11:50         ` Anthony PERARD
2018-03-02 11:50           ` Anthony PERARD
2018-03-05  7:53           ` [Qemu-devel] " Haozhong Zhang
2018-03-05  7:53           ` Haozhong Zhang
2018-02-28  7:56       ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 04/10] nvdimm: do not intiailize nvdimm->label_data if label size is zero Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 05/10] xen-hvm: initialize fw_cfg interface Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2018-02-27 16:46     ` [Qemu-devel] " Anthony PERARD
2018-02-27 16:46       ` Anthony PERARD
2018-02-28  8:16       ` [Qemu-devel] " Haozhong Zhang
2018-02-28  8:16         ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 06/10] hw/acpi-build, xen-hvm: introduce a Xen-specific ACPI builder Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 07/10] xen-hvm: add functions to copy data from/to HVM memory Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 08/10] nvdimm acpi: add functions to access DSM memory on Xen Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 09/10] nvdimm acpi: add compatibility for 64-bit integer in ACPI 2.0 and later Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2017-12-07 10:18   ` [Qemu-devel] [RFC QEMU PATCH v4 10/10] xen-hvm: enable building NFIT and SSDT of vNVDIMM for HVM domains Haozhong Zhang
2017-12-07 10:18     ` Haozhong Zhang
2018-02-27 17:22   ` [Qemu-devel] [RFC QEMU PATCH v4 00/10] Implement vNVDIMM for Xen HVM guest Anthony PERARD
2018-02-27 17:22     ` Anthony PERARD
2018-02-28  9:36     ` Haozhong Zhang
2018-02-28  9:36     ` [Qemu-devel] " Haozhong Zhang
2018-03-02 12:03       ` Anthony PERARD
2018-03-02 12:03         ` Anthony PERARD
2018-03-06  4:16         ` [Qemu-devel] [Xen-devel] " Haozhong Zhang
2018-03-06  4:16           ` Haozhong Zhang
2018-03-06 11:38           ` [Qemu-devel] [Xen-devel] " Anthony PERARD
2018-03-06 11:38             ` Anthony PERARD
2018-02-09 12:33 ` [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains Roger Pau Monné
2018-02-12  1:25   ` Haozhong Zhang
2018-02-12 10:05     ` Roger Pau Monné
2018-02-13 10:06       ` Jan Beulich
2018-02-13 10:29         ` Roger Pau Monné
2018-02-13 11:05           ` Jan Beulich
2018-02-13 11:13             ` Roger Pau Monné
2018-02-13 13:40               ` Jan Beulich
2018-02-13 15:39                 ` Roger Pau Monné
2018-02-15  6:59                   ` Haozhong Zhang
2018-02-15  6:44       ` Haozhong Zhang
