* [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains
@ 2016-10-10  0:32 Haozhong Zhang
  2016-10-10  0:32 ` [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table Haozhong Zhang
                   ` (16 more replies)
  0 siblings, 17 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu, Daniel De Graaf

Overview
========
This RFC Xen patch series, together with the corresponding QEMU, Linux
kernel and ndctl patch series, implements the basic vNVDIMM
functionality for HVM domains.

It currently supports assigning host pmem devices, or files on host
pmem devices, to HVM domains as virtual NVDIMM devices. Other
functions, including DSM, hotplug, RAS and flush via ACPI, will be
implemented in later patches.

Design and Implementation
=========================
The design of vNVDIMM can be found at
  https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html.

All patch series can be found at
  Xen:          https://github.com/hzzhan9/xen.git nvdimm-rfc-v1
  QEMU:         https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v1
  Linux kernel: https://github.com/hzzhan9/nvdimm.git xen-nvdimm-rfc-v1
  ndctl:        https://github.com/hzzhan9/ndctl.git pfn-xen-rfc-v1

For the Xen patches,
 - Patches 01 - 05 implement the hypervisor part that maps host pmem
   pages to guests;
 - Patches 06 - 11 implement the mechanism to pass guest ACPI tables
   and namespace devices from QEMU;
 - Patch 12 parses the xl vNVDIMM configs;
 - Patches 13 - 16 add the toolstack part that maps host pmem devices,
   or files on host pmem devices, to guests.

How to test
===========
1. Check out Xen and QEMU from the above repositories and branches.
   Replace the default qemu-xen with the checked-out QEMU, then build
   and install Xen.

2. Check out the Linux kernel from the above repository and branch.
   Build and install it as the Dom0 kernel. Make sure the following
   kernel config options are enabled (y or m):
       CONFIG_ACPI_NFIT
       CONFIG_LIBNVDIMM
       CONFIG_BLK_DEV_PMEM
       CONFIG_NVDIMM_PFN
       CONFIG_FS_DAX
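The selected options can be sanity-checked with a small script. This is
a sketch: it uses an inline sample .config fragment so it is
self-contained; point CONFIG at the real .config of your build tree
instead.

```shell
# Check that the NVDIMM-related options are enabled as y or m.
# The sample fragment below stands in for a real kernel .config.
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
CONFIG_ACPI_NFIT=m
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=m
CONFIG_NVDIMM_PFN=y
CONFIG_FS_DAX=y
EOF
for opt in ACPI_NFIT LIBNVDIMM BLK_DEV_PMEM NVDIMM_PFN FS_DAX; do
    grep -Eq "^CONFIG_${opt}=[ym]" "$CONFIG" \
        && echo "CONFIG_${opt}: ok" \
        || echo "CONFIG_${opt}: MISSING"
done
rm -f "$CONFIG"
```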

3. Build and install ndctl from the above repository and branch.

4. Boot Xen and the Dom0 Linux kernel built in steps 1 and 2.

5. Suppose there is one host pmem namespace that is recognized by the
   Dom0 Linux NVDIMM driver as namespace0.0, with block device
   /dev/pmem0. Switch it to Xen mode with ndctl:
       ndctl create-namespace -f -e namespace0.0 -m memory -M xen

   If the above command succeeds, messages similar to the following
   should appear in the Xen dmesg:
       (XEN) pmem: pfns     0xa40000 - 0xb40000
       (XEN)       reserved 0xa40002 - 0xa44a00
       (XEN)       data     0xa44a00 - 0xb40000

   The first line shows the physical pages of the entire pmem
   namespace. The second line shows the reserved area within the
   namespace, which Xen uses to hold its management data structures
   (i.e. the frame table and the M2P table). The third line shows the
   pages in the namespace that can be used by Dom0 and HVM domUs.

6-a. You can map the entire namespace to an HVM domU by adding the
   following line to its xl config file:
       vnvdimms = [ '/dev/pmem0' ]

6-b. Or, you can map a file on the namespace to an HVM domU:
       mkfs.ext4 /dev/pmem0
       mount -o dax /dev/pmem0 /mnt/dax/
       dd if=/dev/zero of=/mnt/dax/foo bs=1G count=2
   and add the following line to the domain config:
       vnvdimms = [ '/mnt/dax/foo' ]

7. If the NVDIMM driver is built into the guest Linux kernel, the guest
   kernel will recognize a block device /dev/pmem0, which you can then
   use as usual.

You can also perform the above steps in a nested virtualization
environment provided by KVM, which is handy while NVDIMM hardware is
not yet widely available.
1. Load the KVM module with nesting enabled:
       modprobe kvm-intel nested=1
       
2. Create a file as the backend of the virtual NVDIMM device used in L1.
       dd if=/dev/zero of=/tmp/nvdimm bs=1G count=4
       
3. Start QEMU v2.6 or newer.
       qemu-system-x86_64 -enable-kvm -smp 4 -cpu qemu64,+vmx \
			  -hda /path/to/guest/image \
			  -machine pc,nvdimm \
			  -m 8G,slots=2,maxmem=16G \
			  -object memory-backend-file,id=mem1,share,mem-path=/tmp/nvdimm,size=4G \
			  -device nvdimm,memdev=mem1,id=nv1

4. Perform steps 1 - 7 above inside the L1 guest.
       

Haozhong Zhang (16):
  01/ x86_64/mm: explicitly specify the location to place the frame table
  02/ x86_64/mm: explicitly specify the location to place the M2P table
  03/ xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  04/ xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest
  05/ xen/x86: release pmem pages at domain destroy
  06/ tools: reserve guest memory for ACPI from device model
  07/ tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address
  08/ tools/libacpi: expose details of memory allocation callback
  09/ tools/libacpi: add callbacks to access XenStore
  10/ tools/libacpi: add a simple AML builder
  11/ tools/libacpi: load ACPI built by the device model
  12/ tools/libxl: build qemu options from xl vNVDIMM configs
  13/ tools/libxl: add support to map host pmem device to guests
  14/ tools/libxl: add support to map files on pmem devices to guests
  15/ tools/libxl: handle return code of libxl__qmp_initializations()
  16/ tools/libxl: initiate pmem mapping via qmp callback

 tools/firmware/hvmloader/Makefile       |   3 +-
 tools/firmware/hvmloader/util.c         |  70 +++++++
 tools/firmware/hvmloader/util.h         |   3 +
 tools/firmware/hvmloader/xenbus.c       |  20 ++
 tools/libacpi/acpi2_0.h                 |   2 +
 tools/libacpi/aml_build.c               | 254 +++++++++++++++++++++++++
 tools/libacpi/aml_build.h               |  83 ++++++++
 tools/libacpi/build.c                   | 216 +++++++++++++++++++++
 tools/libacpi/libacpi.h                 |  19 ++
 tools/libxc/include/xc_dom.h            |   1 +
 tools/libxc/include/xenctrl.h           |   8 +
 tools/libxc/xc_dom_x86.c                |   7 +
 tools/libxc/xc_domain.c                 |  14 ++
 tools/libxl/Makefile                    |   5 +-
 tools/libxl/libxl_create.c              |   4 +-
 tools/libxl/libxl_dm.c                  | 113 ++++++++++-
 tools/libxl/libxl_dom.c                 |  25 +++
 tools/libxl/libxl_nvdimm.c              | 281 +++++++++++++++++++++++++++
 tools/libxl/libxl_nvdimm.h              |  45 +++++
 tools/libxl/libxl_qmp.c                 |  64 +++++++
 tools/libxl/libxl_types.idl             |   8 +
 tools/libxl/libxl_x86_acpi.c            |  36 ++++
 tools/libxl/xl_cmdimpl.c                |  16 ++
 xen/arch/x86/Makefile                   |   1 +
 xen/arch/x86/domain.c                   |   5 +
 xen/arch/x86/platform_hypercall.c       |   7 +
 xen/arch/x86/pmem.c                     | 325 ++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c                |  77 +++++++-
 xen/common/domain.c                     |   3 +
 xen/common/memory.c                     |  31 +++
 xen/include/asm-x86/mm.h                |   4 +
 xen/include/public/hvm/hvm_xs_strings.h |  11 ++
 xen/include/public/memory.h             |  14 +-
 xen/include/public/platform.h           |  14 ++
 xen/include/xen/pmem.h                  |  42 +++++
 xen/include/xen/sched.h                 |   3 +
 xen/xsm/flask/hooks.c                   |   1 +
 37 files changed, 1818 insertions(+), 17 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h
 create mode 100644 tools/libxl/libxl_nvdimm.c
 create mode 100644 tools/libxl/libxl_nvdimm.h
 create mode 100644 xen/arch/x86/pmem.c
 create mode 100644 xen/include/xen/pmem.h

-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2016-12-09 21:35   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table Haozhong Zhang
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Andrew Cooper, Xiao Guangrong, Jan Beulich

A reserved area on each pmem region is used to place the frame table.
However, it's not at the beginning of the pmem region, so we need to
specify the location explicitly when extending the frame table.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index b8b6b70..33f226a 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -792,7 +792,8 @@ static int setup_frametable_chunk(void *start, void *end,
     return 0;
 }
 
-static int extend_frame_table(struct mem_hotadd_info *info)
+static int extend_frame_table(struct mem_hotadd_info *info,
+                              struct mem_hotadd_info *alloc_info)
 {
     unsigned long cidx, nidx, eidx, spfn, epfn;
 
@@ -818,9 +819,9 @@ static int extend_frame_table(struct mem_hotadd_info *info)
         nidx = find_next_bit(pdx_group_valid, eidx, cidx);
         if ( nidx >= eidx )
             nidx = eidx;
-        err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
+        err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT),
                                      pdx_to_page(nidx * PDX_GROUP_COUNT),
-                                     info);
+                                     alloc_info);
         if ( err )
             return err;
 
@@ -1413,7 +1414,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     info.epfn = epfn;
     info.cur = spfn;
 
-    ret = extend_frame_table(&info);
+    ret = extend_frame_table(&info, &info);
     if (ret)
         goto destroy_frametable;
 
-- 
2.10.1




* [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
  2016-10-10  0:32 ` [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2016-12-09 21:38   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions Haozhong Zhang
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Andrew Cooper, Xiao Guangrong, Jan Beulich

A reserved area on each pmem region is used to place the M2P table.
However, it's not at the beginning of the pmem region, so we need to
specify the location explicitly when creating the M2P table.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 33f226a..5c0f527 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -317,7 +317,8 @@ void destroy_m2p_mapping(struct mem_hotadd_info *info)
  * spfn/epfn: the pfn ranges to be setup
  * free_s/free_e: the pfn ranges that is free still
  */
-static int setup_compat_m2p_table(struct mem_hotadd_info *info)
+static int setup_compat_m2p_table(struct mem_hotadd_info *info,
+                                  struct mem_hotadd_info *alloc_info)
 {
     unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
     unsigned int n;
@@ -371,7 +372,7 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
         if ( n == CNT )
             continue;
 
-        mfn = alloc_hotadd_mfn(info);
+        mfn = alloc_hotadd_mfn(alloc_info);
         err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
                                PAGE_HYPERVISOR);
         if ( err )
@@ -391,7 +392,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
  * Allocate and map the machine-to-phys table.
  * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
  */
-static int setup_m2p_table(struct mem_hotadd_info *info)
+static int setup_m2p_table(struct mem_hotadd_info *info,
+                           struct mem_hotadd_info *alloc_info)
 {
     unsigned long i, va, smap, emap;
     unsigned int n;
@@ -440,7 +442,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
                 break;
         if ( n < CNT )
         {
-            unsigned long mfn = alloc_hotadd_mfn(info);
+            unsigned long mfn = alloc_hotadd_mfn(alloc_info);
 
             ret = map_pages_to_xen(
                         RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
@@ -485,7 +487,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 #undef CNT
 #undef MFN
 
-    ret = setup_compat_m2p_table(info);
+    ret = setup_compat_m2p_table(info, alloc_info);
 error:
     return ret;
 }
@@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     total_pages += epfn - spfn;
 
     set_pdx_range(spfn, epfn);
-    ret = setup_m2p_table(&info);
+    ret = setup_m2p_table(&info, &info);
 
     if ( ret )
         goto destroy_m2p;
-- 
2.10.1




* [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
  2016-10-10  0:32 ` [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table Haozhong Zhang
  2016-10-10  0:32 ` [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2016-10-11 19:13   ` Andrew Cooper
                     ` (2 more replies)
  2016-10-10  0:32 ` [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest Haozhong Zhang
                   ` (13 subsequent siblings)
  16 siblings, 3 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Jan Beulich,
	Daniel De Graaf

The Xen hypervisor does not include a pmem driver. Instead, it relies
on the pmem driver in Dom0 to report the PFN ranges of the entire pmem
region, its reserved area and its data area via XENPF_pmem_add. The
reserved area is used by the hypervisor to place the frame table and
M2P table, and Dom0 is denied access to it once it has been reported.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 xen/arch/x86/Makefile             |   1 +
 xen/arch/x86/platform_hypercall.c |   7 ++
 xen/arch/x86/pmem.c               | 161 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c          |  54 +++++++++++++
 xen/include/asm-x86/mm.h          |   4 +
 xen/include/public/platform.h     |  14 ++++
 xen/include/xen/pmem.h            |  31 ++++++++
 xen/xsm/flask/hooks.c             |   1 +
 8 files changed, 273 insertions(+)
 create mode 100644 xen/arch/x86/pmem.c
 create mode 100644 xen/include/xen/pmem.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 931917d..9cf2da1 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_TBOOT) += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
 obj-y += xstate.o
+obj-y += pmem.o
 
 x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
 
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 0879e19..c47eea4 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -24,6 +24,7 @@
 #include <xen/pmstat.h>
 #include <xen/irq.h>
 #include <xen/symbols.h>
+#include <xen/pmem.h>
 #include <asm/current.h>
 #include <public/platform.h>
 #include <acpi/cpufreq/processor_perf.h>
@@ -822,6 +823,12 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_pmem_add:
+        ret = pmem_add(op->u.pmem_add.spfn, op->u.pmem_add.epfn,
+                       op->u.pmem_add.rsv_spfn, op->u.pmem_add.rsv_epfn,
+                       op->u.pmem_add.data_spfn, op->u.pmem_add.data_epfn);
+        break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
new file mode 100644
index 0000000..70358ed
--- /dev/null
+++ b/xen/arch/x86/pmem.c
@@ -0,0 +1,161 @@
+/******************************************************************************
+ * arch/x86/pmem.c
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ */
+
+#include <xen/guest_access.h>
+#include <xen/list.h>
+#include <xen/spinlock.h>
+#include <xen/pmem.h>
+#include <xen/iocap.h>
+#include <asm-x86/mm.h>
+
+/*
+ * All pmem regions reported from Dom0 are linked in pmem_list, which
+ * is protected by pmem_list_lock. Its entries are of type struct pmem
+ * and sorted incrementally by field spa.
+ */
+static DEFINE_SPINLOCK(pmem_list_lock);
+static LIST_HEAD(pmem_list);
+
+struct pmem {
+    struct list_head link;   /* link to pmem_list */
+    unsigned long spfn;      /* start PFN of the whole pmem region */
+    unsigned long epfn;      /* end PFN of the whole pmem region */
+    unsigned long rsv_spfn;  /* start PFN of the reserved area */
+    unsigned long rsv_epfn;  /* end PFN of the reserved area */
+    unsigned long data_spfn; /* start PFN of the data area */
+    unsigned long data_epfn; /* end PFN of the data area */
+};
+
+static int is_included(unsigned long s1, unsigned long e1,
+                       unsigned long s2, unsigned long e2)
+{
+    return s1 <= s2 && s2 < e2 && e2 <= e1;
+}
+
+static int is_overlaped(unsigned long s1, unsigned long e1,
+                        unsigned long s2, unsigned long e2)
+{
+    return (s1 <= s2 && s2 < e1) || (s2 < s1 && s1 < e2);
+}
+
+static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)
+{
+    return rsv_mfns >=
+        ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
+        ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+}
+
+static int pmem_add_check(unsigned long spfn, unsigned long epfn,
+                          unsigned long rsv_spfn, unsigned long rsv_epfn,
+                          unsigned long data_spfn, unsigned long data_epfn)
+{
+    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
+        return 0;
+
+    if ( !is_included(spfn, epfn, rsv_spfn, rsv_epfn) ||
+         !is_included(spfn, epfn, data_spfn, data_epfn) )
+        return 0;
+
+    if ( is_overlaped(rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
+        return 0;
+
+    if ( !check_reserved_size(rsv_epfn - rsv_spfn, epfn - spfn) )
+        return 0;
+
+    return 1;
+}
+
+static int pmem_list_add(unsigned long spfn, unsigned long epfn,
+                         unsigned long rsv_spfn, unsigned long rsv_epfn,
+                         unsigned long data_spfn, unsigned long data_epfn)
+{
+    struct list_head *cur;
+    struct pmem *new_pmem;
+    int ret = 0;
+
+    spin_lock(&pmem_list_lock);
+
+    list_for_each_prev(cur, &pmem_list)
+    {
+        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
+        unsigned long cur_spfn = cur_pmem->spfn;
+        unsigned long cur_epfn = cur_pmem->epfn;
+
+        if ( (cur_spfn <= spfn && spfn < cur_epfn) ||
+             (spfn <= cur_spfn && cur_spfn < epfn) )
+        {
+            ret = -EINVAL;
+            goto out;
+        }
+
+        if ( cur_spfn < spfn )
+            break;
+    }
+
+    new_pmem = xmalloc(struct pmem);
+    if ( !new_pmem )
+    {
+        ret = -ENOMEM;
+        goto out;
+    }
+    new_pmem->spfn      = spfn;
+    new_pmem->epfn      = epfn;
+    new_pmem->rsv_spfn  = rsv_spfn;
+    new_pmem->rsv_epfn  = rsv_epfn;
+    new_pmem->data_spfn = data_spfn;
+    new_pmem->data_epfn = data_epfn;
+    list_add(&new_pmem->link, cur);
+
+ out:
+    spin_unlock(&pmem_list_lock);
+    return ret;
+}
+
+int pmem_add(unsigned long spfn, unsigned long epfn,
+             unsigned long rsv_spfn, unsigned long rsv_epfn,
+             unsigned long data_spfn, unsigned long data_epfn)
+{
+    int ret;
+
+    if ( !pmem_add_check(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
+        return -EINVAL;
+
+    ret = pmem_setup(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
+    if ( ret )
+        goto out;
+
+    ret = iomem_deny_access(current->domain, rsv_spfn, rsv_epfn);
+    if ( ret )
+        goto out;
+
+    ret = pmem_list_add(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
+    if ( ret )
+        goto out;
+
+    printk(XENLOG_INFO
+           "pmem: pfns     0x%lx - 0x%lx\n"
+           "      reserved 0x%lx - 0x%lx\n"
+           "      data     0x%lx - 0x%lx\n",
+           spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
+
+ out:
+    return ret;
+}
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 5c0f527..b1f92f6 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1474,6 +1474,60 @@ destroy_frametable:
     return ret;
 }
 
+int pmem_setup(unsigned long spfn, unsigned long epfn,
+               unsigned long rsv_spfn, unsigned long rsv_epfn,
+               unsigned long data_spfn, unsigned long data_epfn)
+{
+    unsigned old_max = max_page, old_total = total_pages;
+    struct mem_hotadd_info info =
+        { .spfn = spfn, .epfn = epfn, .cur = spfn };
+    struct mem_hotadd_info rsv_info =
+        { .spfn = rsv_spfn, .epfn = rsv_epfn, .cur = rsv_spfn };
+    int ret;
+    unsigned long i;
+    struct page_info *pg;
+
+    if ( !mem_hotadd_check(spfn, epfn) )
+        return -EINVAL;
+
+    ret = extend_frame_table(&info, &rsv_info);
+    if ( ret )
+        goto destroy_frametable;
+
+    if ( max_page < epfn )
+    {
+        max_page = epfn;
+        max_pdx = pfn_to_pdx(max_page - 1) + 1;
+    }
+    total_pages += epfn - spfn;
+
+    set_pdx_range(spfn, epfn);
+    ret = setup_m2p_table(&info, &rsv_info);
+    if ( ret )
+        goto destroy_m2p;
+
+    share_hotadd_m2p_table(&info);
+
+    for ( i = spfn; i < epfn; i++ )
+    {
+        pg = mfn_to_page(i);
+        pg->count_info = (rsv_spfn <= i && i < rsv_info.cur) ?
+                         PGC_state_inuse : PGC_state_free;
+    }
+
+    return 0;
+
+destroy_m2p:
+    destroy_m2p_mapping(&info);
+    max_page = old_max;
+    total_pages = old_total;
+    max_pdx = pfn_to_pdx(max_page - 1) + 1;
+destroy_frametable:
+    cleanup_frame_table(&info);
+
+    return ret;
+}
+
 #include "compat/mm.c"
 
 /*
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index b781495..e31f1c8 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -597,4 +597,8 @@ typedef struct mm_rwlock {
 
 extern const char zero_page[];
 
+int pmem_setup(unsigned long spfn, unsigned long epfn,
+               unsigned long rsv_spfn, unsigned long rsv_epfn,
+               unsigned long data_spfn, unsigned long data_epfn);
+
 #endif /* __ASM_X86_MM_H__ */
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 1e6a6ce..c7e7cce 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -608,6 +608,19 @@ struct xenpf_symdata {
 typedef struct xenpf_symdata xenpf_symdata_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
 
+#define XENPF_pmem_add     64
+struct xenpf_pmem_add {
+    /* IN variables */
+    uint64_t spfn;      /* start PFN of the whole pmem region */
+    uint64_t epfn;      /* end PFN of the whole pmem region */
+    uint64_t rsv_spfn;  /* start PFN of the reserved area within the region */
+    uint64_t rsv_epfn;  /* end PFN of the reserved area within the region */
+    uint64_t data_spfn; /* start PFN of the data area within the region */
+    uint64_t data_epfn; /* end PFN of the data area within the region */
+};
+typedef struct xenpf_pmem_add xenpf_pmem_add_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_pmem_add_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -638,6 +651,7 @@ struct xen_platform_op {
         struct xenpf_core_parking      core_parking;
         struct xenpf_resource_op       resource_op;
         struct xenpf_symdata           symdata;
+        struct xenpf_pmem_add          pmem_add;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
new file mode 100644
index 0000000..a670ab8
--- /dev/null
+++ b/xen/include/xen/pmem.h
@@ -0,0 +1,31 @@
+/*
+ * xen/include/xen/pmem.h
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ */
+
+#ifndef __XEN_PMEM_H__
+#define __XEN_PMEM_H__
+
+#include <xen/types.h>
+
+int pmem_add(unsigned long spfn, unsigned long epfn,
+             unsigned long rsv_spfn, unsigned long rsv_epfn,
+             unsigned long data_spfn, unsigned long data_epfn);
+
+#endif /* __XEN_PMEM_H__ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 177c11f..948a161 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1360,6 +1360,7 @@ static int flask_platform_op(uint32_t op)
     case XENPF_cpu_offline:
     case XENPF_cpu_hotadd:
     case XENPF_mem_hotadd:
+    case XENPF_pmem_add:
         return 0;
 #endif
 
-- 
2.10.1




* [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (2 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2016-12-09 22:22   ` Konrad Rzeszutek Wilk
  2016-12-22 12:19   ` Jan Beulich
  2016-10-10  0:32 ` [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy Haozhong Zhang
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Xiao Guangrong

XENMEM_populate_pmemmap is used by the toolstack to map given host pmem
pages to given guest pages. Only pages in the data area of a pmem
region are allowed to be mapped to a guest.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/include/xenctrl.h |   8 +++
 tools/libxc/xc_domain.c       |  14 +++++
 xen/arch/x86/pmem.c           | 123 ++++++++++++++++++++++++++++++++++++++++++
 xen/common/domain.c           |   3 ++
 xen/common/memory.c           |  31 +++++++++++
 xen/include/public/memory.h   |  14 ++++-
 xen/include/xen/pmem.h        |  10 ++++
 xen/include/xen/sched.h       |   3 ++
 8 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2c83544..46c71fc 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2710,6 +2710,14 @@ int xc_livepatch_revert(xc_interface *xch, char *name, uint32_t timeout);
 int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
 int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
 
+/**
+ * Map host pmem pages at PFNs @mfn ~ (@mfn + @nr_mfns - 1) to
+ * guest physical pages at guest PFNs @gpfn ~ (@gpfn + @nr_mfns - 1)
+ */
+int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
+                               xen_pfn_t mfn, xen_pfn_t gpfn,
+                               unsigned int nr_mfns);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 296b852..81a90a1 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2520,6 +2520,20 @@ int xc_domain_soft_reset(xc_interface *xch,
     domctl.domain = (domid_t)domid;
     return do_domctl(xch, &domctl);
 }
+
+int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
+                               xen_pfn_t mfn, xen_pfn_t gpfn,
+                               unsigned int nr_mfns)
+{
+    struct xen_pmemmap pmemmap = {
+        .domid   = domid,
+        .mfn     = mfn,
+        .gpfn    = gpfn,
+        .nr_mfns = nr_mfns,
+    };
+    return do_memory_op(xch, XENMEM_populate_pmemmap, &pmemmap, sizeof(pmemmap));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
index 70358ed..e4dc685 100644
--- a/xen/arch/x86/pmem.c
+++ b/xen/arch/x86/pmem.c
@@ -24,6 +24,9 @@
 #include <xen/spinlock.h>
 #include <xen/pmem.h>
 #include <xen/iocap.h>
+#include <xen/sched.h>
+#include <xen/event.h>
+#include <xen/paging.h>
 #include <asm-x86/mm.h>
 
 /*
@@ -63,6 +66,48 @@ static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)
         ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
 }
 
+static int is_data_mfn(unsigned long mfn)
+{
+    struct list_head *cur;
+    int data = 0;
+
+    ASSERT(spin_is_locked(&pmem_list_lock));
+
+    list_for_each(cur, &pmem_list)
+    {
+        struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+        if ( pmem->data_spfn <= mfn && mfn < pmem->data_epfn )
+        {
+            data = 1;
+            break;
+        }
+    }
+
+    return data;
+}
+
+static int pmem_page_valid(struct page_info *page, struct domain *d)
+{
+    /* only data area can be mapped to guest */
+    if ( !is_data_mfn(page_to_mfn(page)) )
+    {
+        dprintk(XENLOG_DEBUG, "pmem: mfn 0x%lx is not a pmem data page\n",
+                page_to_mfn(page));
+        return 0;
+    }
+
+    /* inuse/offlined/offlining pmem page cannot be mapped to guest */
+    if ( !page_state_is(page, free) )
+    {
+        dprintk(XENLOG_DEBUG, "pmem: invalid page state of mfn 0x%lx: 0x%lx\n",
+                page_to_mfn(page), page->count_info & PGC_state);
+        return 0;
+    }
+
+    return 1;
+}
+
 static int pmem_add_check(unsigned long spfn, unsigned long epfn,
                           unsigned long rsv_spfn, unsigned long rsv_epfn,
                           unsigned long data_spfn, unsigned long data_epfn)
@@ -159,3 +204,81 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
  out:
     return ret;
 }
+
+static int pmem_assign_pages(struct domain *d,
+                             struct page_info *pg, unsigned int order)
+{
+    int rc = 0;
+    unsigned long i;
+
+    spin_lock(&d->pmem_lock);
+
+    if ( unlikely(d->is_dying) )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    for ( i = 0; i < (1 << order); i++ )
+    {
+        ASSERT(page_get_owner(&pg[i]) == NULL);
+        ASSERT((pg[i].count_info & ~(PGC_allocated | 1)) == 0);
+        page_set_owner(&pg[i], d);
+        smp_wmb();
+        pg[i].count_info = PGC_allocated | 1;
+        page_list_add_tail(&pg[i], &d->pmem_page_list);
+    }
+
+ out:
+    spin_unlock(&d->pmem_lock);
+    return rc;
+}
+
+int pmem_populate(struct xen_pmemmap_args *args)
+{
+    struct domain *d = args->domain;
+    unsigned long i, mfn, gpfn;
+    struct page_info *page;
+    int rc = 0;
+
+    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
+        return -EINVAL;
+
+    for ( i = args->nr_done, mfn = args->mfn + i, gpfn = args->gpfn + i;
+          i < args->nr_mfns;
+          i++, mfn++, gpfn++ )
+    {
+        if ( i != args->nr_done && hypercall_preempt_check() )
+        {
+            args->preempted = 1;
+            goto out;
+        }
+
+        page = mfn_to_page(mfn);
+
+        spin_lock(&pmem_list_lock);
+        if ( !pmem_page_valid(page, d) )
+        {
+            dprintk(XENLOG_DEBUG, "pmem: MFN 0x%lx not a valid pmem page\n", mfn);
+            spin_unlock(&pmem_list_lock);
+            rc = -EINVAL;
+            goto out;
+        }
+        page->count_info = PGC_state_inuse;
+        spin_unlock(&pmem_list_lock);
+
+        page->u.inuse.type_info = 0;
+
+        guest_physmap_add_page(d, _gfn(gpfn), _mfn(mfn), 0);
+        if ( pmem_assign_pages(d, page, 0) )
+        {
+            guest_physmap_remove_page(d, _gfn(gpfn), _mfn(mfn), 0);
+            rc = -EFAULT;
+            goto out;
+        }
+    }
+
+ out:
+    args->nr_done = i;
+    return rc;
+}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3abaca9..8192548 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -288,6 +288,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
+    spin_lock_init_prof(d, pmem_lock);
+    INIT_PAGE_LIST_HEAD(&d->pmem_page_list);
+
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
     d->auto_node_affinity = 1;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 21797ca..09cb1c9 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -24,6 +24,7 @@
 #include <xen/numa.h>
 #include <xen/mem_access.h>
 #include <xen/trace.h>
+#include <xen/pmem.h>
 #include <asm/current.h>
 #include <asm/hardirq.h>
 #include <asm/p2m.h>
@@ -1329,6 +1330,36 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 #endif
 
+    case XENMEM_populate_pmemmap:
+    {
+        struct xen_pmemmap pmemmap;
+        struct xen_pmemmap_args args;
+
+        if ( copy_from_guest(&pmemmap, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(pmemmap.domid);
+        if ( !d )
+            return -EINVAL;
+
+        args.domain = d;
+        args.mfn = pmemmap.mfn;
+        args.gpfn = pmemmap.gpfn;
+        args.nr_mfns = pmemmap.nr_mfns;
+        args.nr_done = start_extent;
+        args.preempted = 0;
+
+        rc = pmem_populate(&args);
+        rcu_unlock_domain(d);
+
+        if ( !rc && args.preempted )
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh",
+                op | (args.nr_done << MEMOP_EXTENT_SHIFT), arg);
+
+        break;
+    }
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 5bf840f..8c048fc 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -646,7 +646,19 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 28 */
+#define XENMEM_populate_pmemmap 28
+
+struct xen_pmemmap {
+    /* IN */
+    domid_t domid;
+    xen_pfn_t mfn;
+    xen_pfn_t gpfn;
+    unsigned int nr_mfns;
+};
+typedef struct xen_pmemmap xen_pmemmap_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmemmap_t);
+
+/* Next available subop number is 29 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index a670ab8..60adf56 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -24,8 +24,18 @@
 
 #include <xen/types.h>
 
+struct xen_pmemmap_args {
+    struct domain *domain;
+    xen_pfn_t mfn;
+    xen_pfn_t gpfn;
+    unsigned int nr_mfns;
+    unsigned int nr_done;
+    int preempted;
+};
+
 int pmem_add(unsigned long spfn, unsigned long epfn,
              unsigned long rsv_spfn, unsigned long rsv_epfn,
              unsigned long data_spfn, unsigned long data_epfn);
+int pmem_populate(struct xen_pmemmap_args *args);
 
 #endif /* __XEN_PMEM_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 1fbda87..3c66225 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -329,6 +329,9 @@ struct domain
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
 
+    spinlock_t       pmem_lock;       /* protect all following pmem_ fields */
+    struct page_list_head pmem_page_list; /* linked list of pmem pages      */
+
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
     struct cpupool  *cpupool;
-- 
2.10.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (3 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2016-12-09 22:27   ` Konrad Rzeszutek Wilk
  2016-12-22 12:22   ` Jan Beulich
  2016-10-10  0:32 ` [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model Haozhong Zhang
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Andrew Cooper, Xiao Guangrong, Jan Beulich

Host pmem pages mapped to a domain are unassigned at domain destruction,
so that they can be reused by other domains in the future.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain.c  |  5 +++++
 xen/arch/x86/pmem.c    | 41 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pmem.h |  1 +
 3 files changed, 47 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1bd5eb6..05ab389 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -61,6 +61,7 @@
 #include <asm/amd.h>
 #include <xen/numa.h>
 #include <xen/iommu.h>
+#include <xen/pmem.h>
 #include <compat/vcpu.h>
 #include <asm/psr.h>
 
@@ -2512,6 +2513,10 @@ int domain_relinquish_resources(struct domain *d)
         if ( ret )
             return ret;
 
+        ret = pmem_teardown(d);
+        if ( ret )
+            return ret;
+
         /* Tear down paging-assistance stuff. */
         ret = paging_teardown(d);
         if ( ret )
diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
index e4dc685..50e496b 100644
--- a/xen/arch/x86/pmem.c
+++ b/xen/arch/x86/pmem.c
@@ -282,3 +282,44 @@ int pmem_populate(struct xen_pmemmap_args *args)
     args->nr_done = i;
     return rc;
 }
+
+static int pmem_teardown_preemptible(struct domain *d, int *preempted)
+{
+    struct page_info *pg, *next;
+    int rc = 0;
+
+    spin_lock(&d->pmem_lock);
+
+    page_list_for_each_safe (pg, next, &d->pmem_page_list )
+    {
+        BUG_ON(page_get_owner(pg) != d);
+        BUG_ON(page_state_is(pg, free));
+
+        page_list_del(pg, &d->pmem_page_list);
+        page_set_owner(pg, NULL);
+        pg->count_info = (pg->count_info & ~PGC_count_mask) | PGC_state_free;
+
+        if ( preempted && hypercall_preempt_check() )
+        {
+            *preempted = 1;
+            goto out;
+        }
+    }
+
+ out:
+    spin_unlock(&d->pmem_lock);
+    return rc;
+}
+
+int pmem_teardown(struct domain *d)
+{
+    int preempted = 0;
+
+    ASSERT(d->is_dying);
+    ASSERT(d != current->domain);
+
+    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
+        return -EINVAL;
+
+    return pmem_teardown_preemptible(d, &preempted);
+}
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 60adf56..ffbef1c 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -37,5 +37,6 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
              unsigned long rsv_spfn, unsigned long rsv_epfn,
              unsigned long data_spfn, unsigned long data_epfn);
 int pmem_populate(struct xen_pmemmap_args *args);
+int pmem_teardown(struct domain *d);
 
 #endif /* __XEN_PMEM_H__ */
-- 
2.10.1




* [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (4 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 20:44   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address Haozhong Zhang
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Xiao Guangrong, Ian Jackson

One guest page is reserved for the device model to place guest ACPI in.
The base address and size of the reserved area are passed to the device
model via the XenStore keys hvmloader/dm-acpi/{address, length}.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h            |  1 +
 tools/libxc/xc_dom_x86.c                |  7 +++++++
 tools/libxl/libxl_dom.c                 | 25 +++++++++++++++++++++++++
 xen/include/public/hvm/hvm_xs_strings.h | 11 +++++++++++
 4 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 608cbc2..19d65cd 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -98,6 +98,7 @@ struct xc_dom_image {
     xen_pfn_t xenstore_pfn;
     xen_pfn_t shared_info_pfn;
     xen_pfn_t bootstack_pfn;
+    xen_pfn_t dm_acpi_pfn;
     xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 0eab8a7..47f14a1 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -674,6 +674,13 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
                          ioreq_server_pfn(0));
         xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                          NR_IOREQ_SERVER_PAGES);
+
+        dom->dm_acpi_pfn = xc_dom_alloc_page(dom, "DM ACPI");
+        if ( dom->dm_acpi_pfn == INVALID_PFN )
+        {
+            DOMPRINTF("Could not allocate page for device model ACPI.");
+            goto error_out;
+        }
     }
 
     rc = xc_dom_alloc_segment(dom, &dom->start_info_seg,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d519c8d..f0a1d97 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -865,6 +865,31 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
             goto err;
     }
 
+    if (dom->dm_acpi_pfn) {
+        uint64_t guest_addr_out = dom->dm_acpi_pfn * XC_DOM_PAGE_SIZE(dom);
+
+        if (guest_addr_out >= 0x100000000ULL) {
+            LOG(ERROR,
+                "Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G",
+                guest_addr_out);
+            goto err;
+        }
+
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid);
+
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
+                               guest_addr_out);
+        if (ret)
+            goto err;
+
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid);
+
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
+                               (uint64_t) XC_DOM_PAGE_SIZE(dom));
+        if (ret)
+            goto err;
+    }
+
     return 0;
 
 err:
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index 146b0b0..f44f71f 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -79,4 +79,15 @@
  */
 #define HVM_XS_OEM_STRINGS             "bios-strings/oem-%d"
 
+/* The following are XenStore keys for DM ACPI (ACPI built by the device
+ * model, e.g. QEMU).
+ *
+ * A reserved area of guest physical memory is used to pass DM
+ * ACPI. The values of the following two keys specify the base address
+ * and length (in bytes) of the reserved area.
+ */
+#define HVM_XS_DM_ACPI_ROOT              "hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS           HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH            HVM_XS_DM_ACPI_ROOT"/length"
+
 #endif /* __XEN_PUBLIC_HVM_HVM_XS_STRINGS_H__ */
-- 
2.10.1




* [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (5 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 20:46   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback Haozhong Zhang
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

This callback is used when libacpi needs to access, in place, ACPI
content built by the device model, which is located by its guest
physical address.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  6 ++++++
 tools/firmware/hvmloader/util.h |  1 +
 tools/libacpi/libacpi.h         |  1 +
 tools/libxl/libxl_x86_acpi.c    | 10 ++++++++++
 4 files changed, 18 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 6e0cfe7..1fe8dcc 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -871,6 +871,11 @@ static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
     return virt_to_phys(v);
 }
 
+static void *acpi_p2v(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    return phys_to_virt(p);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
                             uint32_t size, uint32_t align)
 {
@@ -966,6 +971,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.alloc = acpi_mem_alloc;
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
+    ctxt.mem_ops.p2v = acpi_p2v;
 
     acpi_build_tables(&ctxt, config);
 
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 6062f0b..6a50dae 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -200,6 +200,7 @@ xen_pfn_t mem_hole_alloc(uint32_t nr_mfns);
 /* Allocate memory in a reserved region below 4GB. */
 void *mem_alloc(uint32_t size, uint32_t align);
 #define virt_to_phys(v) ((unsigned long)(v))
+#define phys_to_virt(p) ((void *)(p))
 
 /* Allocate memory in a scratch region */
 void *scratch_alloc(uint32_t size, uint32_t align);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 1d388f9..62e90ab 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -45,6 +45,7 @@ struct acpi_ctxt {
         void *(*alloc)(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align);
         void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
+        void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
     } mem_ops;
 };
 
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index ff0e2df..aa5b83d 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -52,6 +52,15 @@ static unsigned long virt_to_phys(struct acpi_ctxt *ctxt, void *v)
             libxl_ctxt->alloc_base_paddr);
 }
 
+static void *phys_to_virt(struct acpi_ctxt *ctxt, unsigned long p)
+{
+    struct libxl_acpi_ctxt *libxl_ctxt =
+        CONTAINER_OF(ctxt, struct libxl_acpi_ctxt, c);
+
+    return (void *)((p - libxl_ctxt->alloc_base_paddr) +
+                    libxl_ctxt->alloc_base_vaddr);
+}
+
 static void *mem_alloc(struct acpi_ctxt *ctxt,
                        uint32_t size, uint32_t align)
 {
@@ -176,6 +185,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
     libxl_ctxt.c.mem_ops.alloc = mem_alloc;
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
+    libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
     rc = init_acpi_config(gc, dom, b_info, &config);
-- 
2.10.1




* [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (6 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 20:58   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

Expose the minimal allocation unit and the minimal alignment used by the
memory allocator, so that certain ACPI code (e.g. the AML builder added
by a later patch) can obtain contiguous memory across multiple calls to
acpi_ctxt.mem_ops.alloc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c | 2 ++
 tools/libacpi/libacpi.h         | 3 +++
 tools/libxl/libxl_x86_acpi.c    | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 1fe8dcc..504ae6a 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -972,6 +972,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
     ctxt.mem_ops.p2v = acpi_p2v;
+    ctxt.min_alloc_unit = PAGE_SIZE;
+    ctxt.min_alloc_align = 16;
 
     acpi_build_tables(&ctxt, config);
 
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 62e90ab..0fb16e7 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -47,6 +47,9 @@ struct acpi_ctxt {
         unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
         void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
     } mem_ops;
+
+    uint32_t min_alloc_unit;
+    uint32_t min_alloc_align;
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index aa5b83d..baf60ac 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -187,6 +187,8 @@ int libxl__dom_load_acpi(libxl__gc *gc,
     libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
     libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
     libxl_ctxt.c.mem_ops.free = acpi_mem_free;
+    libxl_ctxt.c.min_alloc_unit = libxl_ctxt.page_size;
+    libxl_ctxt.c.min_alloc_align = 16;
 
     rc = init_acpi_config(gc, dom, b_info, &config);
     if (rc) {
-- 
2.10.1




* [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (7 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 21:10   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder Haozhong Zhang
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

libacpi needs to access information placed in XenStore in order to load
ACPI built by the device model.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c   | 50 +++++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h   |  2 ++
 tools/firmware/hvmloader/xenbus.c | 20 ++++++++++++++++
 tools/libacpi/libacpi.h           | 10 ++++++++
 tools/libxl/libxl_x86_acpi.c      | 24 +++++++++++++++++++
 5 files changed, 106 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 504ae6a..dba954a 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -888,6 +888,51 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
     /* ACPI builder currently doesn't free memory so this is just a stub */
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+    return xenstore_read(path, NULL);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+                         const char *path, const char *value)
+{
+    return xenstore_write(path, value);
+}
+
+static unsigned int count_strings(const char *strings, unsigned int len)
+{
+    const char *p;
+    unsigned int n;
+
+    for ( p = strings, n = 0; p < strings + len; p++ )
+        if ( *p == '\0' )
+            n++;
+
+    return n;
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+                                const char *path, unsigned int *num)
+{
+    const char *strings;
+    char *s, *p, **ret;
+    unsigned int len, n;
+
+    strings = xenstore_directory(path, &len, NULL);
+    if ( !strings )
+        return NULL;
+
+    n = count_strings(strings, len);
+    ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(char *) + len, 0);
+    memcpy(&ret[n], strings, len);
+
+    s = (char *)&ret[n];
+    for ( p = s, *num = 0; p < s + len; p+= strlen(p) + 1 )
+        ret[(*num)++] = p;
+
+    return ret;
+}
+
 static uint8_t acpi_lapic_id(unsigned cpu)
 {
     return LAPIC_ID(cpu);
@@ -975,6 +1020,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.min_alloc_unit = PAGE_SIZE;
     ctxt.min_alloc_align = 16;
 
+    ctxt.xs_ops.read = acpi_xs_read;
+    ctxt.xs_ops.write = acpi_xs_write;
+    ctxt.xs_ops.directory = acpi_xs_directory;
+    ctxt.xs_opaque = NULL;
+
     acpi_build_tables(&ctxt, config);
 
     hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 6a50dae..9443673 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -225,6 +225,8 @@ const char *xenstore_read(const char *path, const char *default_resp);
  */
 int xenstore_write(const char *path, const char *value);
 
+const char *xenstore_directory(const char *path, uint32_t *len,
+                               const char *default_resp);
 
 /* Get a HVM param.
  */
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index 448157d..70bdadd 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -296,6 +296,26 @@ int xenstore_write(const char *path, const char *value)
     return ret;
 }
 
+const char *xenstore_directory(const char *path, uint32_t *len,
+                               const char *default_resp)
+{
+    uint32_t type = 0;
+    const char *answer = NULL;
+
+    xenbus_send(XS_DIRECTORY,
+                path, strlen(path),
+                "", 1, /* nul separator */
+                NULL, 0);
+
+    if ( xenbus_recv(len, &answer, &type) || (type != XS_DIRECTORY) )
+        answer = NULL;
+
+    if ( (default_resp != NULL) && ((answer == NULL) || (*answer == '\0')) )
+        answer = default_resp;
+
+    return answer;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 0fb16e7..12cafd8 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -50,6 +50,16 @@ struct acpi_ctxt {
 
     uint32_t min_alloc_unit;
     uint32_t min_alloc_align;
+
+    struct acpi_xs_ops {
+        const char *(*read)(struct acpi_ctxt *ctxt, const char *path);
+        int (*write)(struct acpi_ctxt *ctxt,
+                     const char *path, const char *value);
+        char **(*directory)(struct acpi_ctxt *ctxt,
+                            const char *path, unsigned int *num);
+    } xs_ops;
+
+    void *xs_opaque;
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index baf60ac..1afd2e3 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -93,6 +93,25 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 {
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+    return libxl__xs_read((libxl__gc *)ctxt->xs_opaque, XBT_NULL, path);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+                         const char *path, const char *value)
+{
+    return libxl__xs_write_checked((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
+                                   path, value);
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+                                const char *path, unsigned int *num)
+{
+    return libxl__xs_directory((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
+                               path, num);
+}
+
 static uint8_t acpi_lapic_id(unsigned cpu)
 {
     return cpu * 2;
@@ -190,6 +209,11 @@ int libxl__dom_load_acpi(libxl__gc *gc,
     libxl_ctxt.c.min_alloc_unit = libxl_ctxt.page_size;
     libxl_ctxt.c.min_alloc_align = 16;
 
+    libxl_ctxt.c.xs_ops.read = acpi_xs_read;
+    libxl_ctxt.c.xs_ops.write = acpi_xs_write;
+    libxl_ctxt.c.xs_ops.directory = acpi_xs_directory;
+    libxl_ctxt.c.xs_opaque = gc;
+
     rc = init_acpi_config(gc, dom, b_info, &config);
     if (rc) {
         LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
-- 
2.10.1




* [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (8 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 21:19   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model Haozhong Zhang
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

It is used by libacpi to generate SSDTs from ACPI namespace devices
built by the device model.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/Makefile |   3 +-
 tools/libacpi/aml_build.c         | 254 ++++++++++++++++++++++++++++++++++++++
 tools/libacpi/aml_build.h         |  83 +++++++++++++
 tools/libxl/Makefile              |   3 +-
 4 files changed, 341 insertions(+), 2 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 77d7551..cf0dac3 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -79,11 +79,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
 ACPI_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
-ACPI_OBJS = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o
+ACPI_OBJS = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o aml_build.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
 vpath build.c $(ACPI_PATH)
 vpath static_tables.c $(ACPI_PATH)
+vpath aml_build.c $(ACPI_PATH)
 OBJS += $(ACPI_OBJS)
 
 hvmloader: $(OBJS)
diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c
new file mode 100644
index 0000000..b6f23f4
--- /dev/null
+++ b/tools/libacpi/aml_build.c
@@ -0,0 +1,254 @@
+/*
+ * tools/libacpi/aml_build.c
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+#include "aml_build.h"
+
+#define AML_OP_SCOPE     0x10
+#define AML_OP_EXT       0x5B
+#define AML_OP_DEVICE    0x82
+
+#define ACPI_NAMESEG_LEN 4
+
+struct aml_build_allocator {
+    struct acpi_ctxt *ctxt;
+    uint8_t *buf;
+    uint32_t capacity;
+    uint32_t used;
+};
+static struct aml_build_allocator alloc;
+
+enum { ALLOC_OVERFLOW, ALLOC_NOT_NEEDED, ALLOC_NEEDED };
+
+static int alloc_needed(uint32_t size)
+{
+    uint32_t len = alloc.used + size;
+
+    if ( len < alloc.used )
+        return ALLOC_OVERFLOW;
+    else if ( len <= alloc.capacity )
+        return ALLOC_NOT_NEEDED;
+    else
+        return ALLOC_NEEDED;
+}
+
+static uint8_t *aml_buf_alloc(uint32_t size)
+{
+    int needed = alloc_needed(size);
+    uint8_t *buf = NULL;
+    struct acpi_ctxt *ctxt = alloc.ctxt;
+    uint32_t alloc_size, alloc_align = ctxt->min_alloc_align;
+
+    switch ( needed )
+    {
+    case ALLOC_OVERFLOW:
+        break;
+
+    case ALLOC_NEEDED:
+        alloc_size = (size + alloc_align - 1) & ~(alloc_align - 1);
+        buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align);
+        if ( !buf )
+            break;
+        if ( alloc.buf + alloc.capacity != buf )
+        {
+            buf = NULL;
+            break;
+        }
+        alloc.capacity += alloc_size;
+        alloc.used += size;
+        break;
+
+    case ALLOC_NOT_NEEDED:
+        buf = alloc.buf + alloc.used;
+        alloc.used += size;
+        break;
+
+    default:
+        break;
+    }
+
+    return buf;
+}
+
+static uint32_t get_package_length(uint8_t *pkg)
+{
+    uint32_t len;
+
+    len = pkg - alloc.buf;
+    len = alloc.used - len;
+
+    return len;
+}
+
+static void build_prepend_byte(uint8_t *buf, uint8_t byte)
+{
+    uint32_t len;
+
+    len = buf - alloc.buf;
+    len = alloc.used - len;
+
+    aml_buf_alloc(sizeof(uint8_t));
+    if ( len )
+        memmove(buf + 1, buf, len);
+    buf[0] = byte;
+}
+
+/*
+ * XXX: names of multiple segments (e.g. X.Y.Z) are not supported
+ */
+static void build_prepend_name(uint8_t *buf, const char *name)
+{
+    uint8_t *p = buf;
+    const char *s = name;
+    uint32_t len, name_len;
+
+    while ( *s == '\\' || *s == '^' )
+    {
+        build_prepend_byte(p, (uint8_t) *s);
+        ++p;
+        ++s;
+    }
+
+    if ( !*s )
+    {
+        build_prepend_byte(p, 0x00);
+        return;
+    }
+
+    len = p - alloc.buf;
+    len = alloc.used - len;
+    name_len = strlen(s);
+    ASSERT(name_len <= ACPI_NAMESEG_LEN);
+
+    aml_buf_alloc(ACPI_NAMESEG_LEN);
+    if ( len )
+        memmove(p + ACPI_NAMESEG_LEN, p, len);
+    memcpy(p, s, name_len);
+    memcpy(p + name_len, "____", ACPI_NAMESEG_LEN - name_len);
+}
+
+enum {
+    PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
+    PACKAGE_LENGTH_2BYTE_SHIFT = 4,
+    PACKAGE_LENGTH_3BYTE_SHIFT = 12,
+    PACKAGE_LENGTH_4BYTE_SHIFT = 20,
+};
+
+static void build_prepend_package_length(uint8_t *pkg, uint32_t length)
+{
+    uint8_t byte;
+    unsigned length_bytes;
+
+    if ( length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT) )
+        length_bytes = 1;
+    else if ( length + 2 < (1 << PACKAGE_LENGTH_3BYTE_SHIFT) )
+        length_bytes = 2;
+    else if ( length + 3 < (1 << PACKAGE_LENGTH_4BYTE_SHIFT) )
+        length_bytes = 3;
+    else
+        length_bytes = 4;
+
+    length += length_bytes;
+
+    switch ( length_bytes )
+    {
+    case 1:
+        byte = length;
+        build_prepend_byte(pkg, byte);
+        return;
+    case 4:
+        byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
+        build_prepend_byte(pkg, byte);
+        length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
+        /* fall through */
+    case 3:
+        byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
+        build_prepend_byte(pkg, byte);
+        length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
+        /* fall through */
+    case 2:
+        byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
+        build_prepend_byte(pkg, byte);
+        length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
+        /* fall through */
+    }
+    /*
+     * Most significant two bits of byte zero indicate how many following bytes
+     * are in PkgLength encoding.
+     */
+    byte = ((length_bytes - 1) << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
+    build_prepend_byte(pkg, byte);
+}
+
+static void build_prepend_package(uint8_t *buf, uint8_t op)
+{
+    uint32_t length = get_package_length(buf);
+    build_prepend_package_length(buf, length);
+    build_prepend_byte(buf, op);
+}
+
+static void build_prepend_ext_package(uint8_t *buf, uint8_t op)
+{
+    build_prepend_package(buf, op);
+    build_prepend_byte(buf, AML_OP_EXT);
+}
+
+void *aml_build_begin(struct acpi_ctxt *ctxt)
+{
+    alloc.ctxt = ctxt;
+    alloc.buf = ctxt->mem_ops.alloc(ctxt,
+                                    ctxt->min_alloc_unit, ctxt->min_alloc_align);
+    alloc.capacity = ctxt->min_alloc_unit;
+    alloc.used = 0;
+    return alloc.buf;
+}
+
+uint32_t aml_build_end(void)
+{
+    return alloc.used;
+}
+
+void aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t blob_length)
+{
+    uint32_t len;
+
+    len = buf - alloc.buf;
+    len = alloc.used - len;
+
+    aml_buf_alloc(blob_length);
+    if ( len )
+        memmove(buf + blob_length, buf, len);
+
+    memcpy(buf, blob, blob_length);
+}
+
+void aml_prepend_device(uint8_t *buf, const char *name)
+{
+    build_prepend_name(buf, name);
+    build_prepend_ext_package(buf, AML_OP_DEVICE);
+}
+
+void aml_prepend_scope(uint8_t *buf, const char *name)
+{
+    build_prepend_name(buf, name);
+    build_prepend_package(buf, AML_OP_SCOPE);
+}
diff --git a/tools/libacpi/aml_build.h b/tools/libacpi/aml_build.h
new file mode 100644
index 0000000..ed68f66
--- /dev/null
+++ b/tools/libacpi/aml_build.h
@@ -0,0 +1,83 @@
+/*
+ * tools/libacpi/aml_build.h
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ */
+
+#ifndef _AML_BUILD_H_
+#define _AML_BUILD_H_
+
+#include <stdint.h>
+#include "libacpi.h"
+
+/*
+ * NB: All aml_prepend_* calls, which build AML code in one ACPI
+ *     table, should be placed between a pair of calls to
+ *     aml_build_begin() and aml_build_end().
+ */
+
+/**
+ * Reset the AML builder and begin a new round of building.
+ *
+ * Parameters:
+ *   @ctxt: ACPI context used by the AML builder
+ *
+ * Returns:
+ *   a pointer to the builder buffer where the AML code will be stored
+ */
+void *aml_build_begin(struct acpi_ctxt *ctxt);
+
+/**
+ * Mark the end of a round of AML building.
+ *
+ * Returns:
+ *  the number of bytes in the builder buffer built in this round
+ */
+uint32_t aml_build_end(void);
+
+/**
+ * Prepend a blob, which can contain arbitrary content, to the builder buffer.
+ *
+ * Parameters:
+ *   @buf:    pointer to the builder buffer
+ *   @blob:   pointer to the blob
+ *   @length: the number of bytes in the blob
+ */
+void aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t length);
+
+/**
+ * Prepend an AML device structure to the builder buffer. The existing
+ * data in the builder buffer is included in the AML device.
+ *
+ * Parameters:
+ *   @buf:  pointer to the builder buffer
+ *   @name: the name of the device
+ */
+void aml_prepend_device(uint8_t *buf, const char *name);
+
+/**
+ * Prepend an AML scope structure to the builder buffer. The existing
+ * data in the builder buffer is included in the AML scope.
+ *
+ * Parameters:
+ *   @buf:  pointer to the builder buffer
+ *   @name: the name of the scope
+ */
+void aml_prepend_scope(uint8_t *buf, const char *name);
+
+#endif /* _AML_BUILD_H_ */
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index c4e4117..a904927 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -77,11 +77,12 @@ endif
 
 ACPI_PATH  = $(XEN_ROOT)/tools/libacpi
 ACPI_FILES = dsdt_pvh.c
-ACPI_OBJS  = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o
+ACPI_OBJS  = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o aml_build.o
 $(ACPI_FILES): acpi
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libxl_x86_acpi.h\"
 vpath build.c $(ACPI_PATH)/
 vpath static_tables.c $(ACPI_PATH)/
+vpath aml_build.c $(ACPI_PATH)/
 LIBXL_OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
 
 .PHONY: acpi
-- 
2.10.1



* [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (9 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 21:40   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Andrew Cooper, Ian Jackson,
	Jan Beulich, Wei Liu

ACPI tables built by the device model are loaded after the ACPI
tables built by Xen, provided their signatures do not conflict with
any Xen-built table (SSDT excepted).

ACPI namespace devices built by the device model are assembled into
SSDTs and placed after the ACPI tables built by Xen, provided their
names do not conflict with any device built by Xen.
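
Since hvmloader recomputes checksums for tables handed over by the device
model, the ACPI checksum rule is worth stating: all bytes of a table,
including the checksum byte, must sum to zero modulo 256. A minimal sketch
(the function name is illustrative, mirroring what set_checksum() in libacpi
does):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Set the checksum byte at 'csum_off' so that all 'len' bytes of
 * 'table' sum to zero modulo 256, as ACPI requires. Zeroing the
 * checksum slot first makes the computation independent of any stale
 * value left there by the table's builder.
 */
static void acpi_set_checksum(uint8_t *table, size_t csum_off, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    table[csum_off] = 0;
    for ( i = 0; i < len; i++ )
        sum += table[i];
    table[csum_off] = -sum;
}
```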

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/firmware/hvmloader/util.c |  12 +++
 tools/libacpi/acpi2_0.h         |   2 +
 tools/libacpi/build.c           | 216 ++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h         |   5 +
 4 files changed, 235 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index dba954a..e6530cd 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -998,6 +998,18 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     if ( !strncmp(xenstore_read("platform/acpi_s4", "1"), "1", 1)  )
         config->table_flags |= ACPI_HAS_SSDT_S4;
 
+    s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL);
+    if ( s )
+    {
+        config->dm.addr = strtoll(s, NULL, 0);
+
+        s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL);
+        if ( s )
+            config->dm.length = strtoll(s, NULL, 0);
+        else
+            config->dm.addr = 0;
+    }
+
     config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC | ACPI_HAS_WAET);
 
     config->tis_hdr = (uint16_t *)ACPI_TIS_HDR_ADDRESS;
diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 775eb7a..7414470 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -430,6 +430,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
 
 /*
  * Table revision numbers.
@@ -445,6 +446,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_2_0_SSDT_REVISION 0x02
 
 #pragma pack ()
 
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 47dae01..829a365 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -20,6 +20,7 @@
 #include "ssdt_s4.h"
 #include "ssdt_tpm.h"
 #include "ssdt_pm.h"
+#include "aml_build.h"
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/params.h>
@@ -55,6 +56,34 @@ struct acpi_info {
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML definition of an ACPI namespace device */
+
+/* ACPI tables with the following signatures must not appear in DM ACPI */
+static const uint64_t dm_acpi_signature_blacklist[] = {
+    ACPI_2_0_RSDP_SIGNATURE,
+    ACPI_2_0_FACS_SIGNATURE,
+    ACPI_2_0_FADT_SIGNATURE,
+    ACPI_2_0_MADT_SIGNATURE,
+    ACPI_2_0_RSDT_SIGNATURE,
+    ACPI_2_0_XSDT_SIGNATURE,
+    ACPI_2_0_TCPA_SIGNATURE,
+    ACPI_2_0_HPET_SIGNATURE,
+    ACPI_2_0_WAET_SIGNATURE,
+    ACPI_2_0_SRAT_SIGNATURE,
+    ACPI_2_0_SLIT_SIGNATURE,
+};
+
+/* ACPI namespace devices with the following names must not appear in DM ACPI */
+static const char *dm_acpi_devname_blacklist[] = {
+    "MEM0",
+    "PCI0",
+    "AC",
+    "BAT0",
+    "BAT1",
+    "TPM",
+};
+
 static void set_checksum(
     void *table, uint32_t checksum_offset, uint32_t length)
 {
@@ -339,6 +368,190 @@ static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
     return nr_added;
 }
 
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+
+static int check_signature_collision(uint64_t sig)
+{
+    int i;
+    for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ )
+    {
+        if ( sig == dm_acpi_signature_blacklist[i] )
+            return 1;
+    }
+    return 0;
+}
+
+static int check_devname_collision(const char *name)
+{
+    int i;
+    for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ )
+    {
+        if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) )
+            return 1;
+    }
+    return 0;
+}
+
+static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt,
+                                            const char *name, const char *key)
+{
+#define DM_ACPI_BLOB_PATH_MAX_LENGTH 30
+    char path[DM_ACPI_BLOB_PATH_MAX_LENGTH];
+    snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_ACPI_ROOT"/%s/%s",
+             name, key);
+    return ctxt->xs_ops.read(ctxt, path);
+}
+
+static int construct_dm_table(struct acpi_ctxt *ctxt,
+                              unsigned long *table_ptrs, int nr_tables,
+                              const void *blob, uint32_t length)
+{
+    const struct acpi_header *header = blob;
+    uint8_t *buffer;
+
+    if ( check_signature_collision(header->signature) )
+        return 0;
+
+    if ( header->length > length || header->length == 0 )
+        return 0;
+
+    buffer = ctxt->mem_ops.alloc(ctxt, header->length, 16);
+    if ( !buffer )
+        return 0;
+    memcpy(buffer, header, header->length);
+
+    /* some device models (e.g. QEMU) do not set the checksum */
+    set_checksum(buffer, offsetof(struct acpi_header, checksum),
+                 header->length);
+
+    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
+
+    return 1;
+}
+
+static int construct_dm_nsdev(struct acpi_ctxt *ctxt,
+                              unsigned long *table_ptrs, int nr_tables,
+                              const char *dev_name,
+                              const void *blob, uint32_t blob_length)
+{
+    struct acpi_header ssdt, *header;
+    uint8_t *buffer;
+
+    if ( check_devname_collision(dev_name) )
+        return 0;
+
+    /* build the ACPI namespace device from [name, blob] */
+    buffer = aml_build_begin(ctxt);
+    aml_prepend_blob(buffer, blob, blob_length);
+    aml_prepend_device(buffer, dev_name);
+    aml_prepend_scope(buffer, "\\_SB");
+
+    /* build SSDT header */
+    ssdt.signature = ACPI_2_0_SSDT_SIGNATURE;
+    ssdt.revision = ACPI_2_0_SSDT_REVISION;
+    fixed_strcpy(ssdt.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(ssdt.oem_table_id, ACPI_OEM_TABLE_ID);
+    ssdt.oem_revision = ACPI_OEM_REVISION;
+    ssdt.creator_id = ACPI_CREATOR_ID;
+    ssdt.creator_revision = ACPI_CREATOR_REVISION;
+
+    /* prepend SSDT header to ACPI namespace device */
+    aml_prepend_blob(buffer, &ssdt, sizeof(ssdt));
+    header = (struct acpi_header *) buffer;
+    header->length = aml_build_end();
+
+    /* calculate checksum of SSDT */
+    set_checksum(header, offsetof(struct acpi_header, checksum),
+                 header->length);
+
+    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
+
+    return 1;
+}
+
+/*
+ * All ACPI content built by the device model is placed in the guest
+ * buffer whose address and size are specified by config->dm.{addr, length},
+ * or XenStore keys HVM_XS_DM_ACPI_{ADDRESS, LENGTH}.
+ *
+ * The data layout within the buffer is further specified by XenStore
+ * directories under HVM_XS_DM_ACPI_ROOT. Each directory specifies a
+ * data blob and contains the following XenStore keys:
+ *
+ * - "type":
+ *   * DM_ACPI_BLOB_TYPE_TABLE
+ *     The data blob specified by this directory is an ACPI table.
+ *   * DM_ACPI_BLOB_TYPE_NSDEV
+ *     The data blob specified by this directory is an ACPI namespace device.
+ *     Its name is specified by the directory name, while the AML code of the
+ *     body of the AML device structure is in the data blob.
+ *
+ * - "length": the number of bytes in this data blob.
+ *
+ * - "offset": the offset in bytes of this data blob from the beginning of
+ *   the buffer
+ */
+static int construct_dm_tables(struct acpi_ctxt *ctxt,
+                               unsigned long *table_ptrs,
+                               int nr_tables,
+                               struct acpi_config *config)
+{
+    const char *s;
+    char **dir;
+    uint8_t type;
+    void *blob;
+    unsigned int num, length, offset, i;
+    int nr_added = 0;
+
+    if ( !config->dm.addr )
+        return 0;
+
+    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
+    if ( !dir || !num )
+        return 0;
+
+    if ( num > ACPI_MAX_SECONDARY_TABLES - nr_tables )
+        return 0;
+
+    for ( i = 0; i < num; i++, dir++ )
+    {
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "type");
+        if ( !s )
+            continue;
+        type = (uint8_t)strtoll(s, NULL, 0);
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "length");
+        if ( !s )
+            continue;
+        length = (uint32_t)strtoll(s, NULL, 0);
+
+        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "offset");
+        if ( !s )
+            continue;
+        offset = (uint32_t)strtoll(s, NULL, 0);
+
+        blob = ctxt->mem_ops.p2v(ctxt, config->dm.addr + offset);
+
+        switch ( type )
+        {
+        case DM_ACPI_BLOB_TYPE_TABLE:
+            nr_added += construct_dm_table(ctxt,
+                                           table_ptrs, nr_tables + nr_added,
+                                           blob, length);
+            break;
+        case DM_ACPI_BLOB_TYPE_NSDEV:
+            nr_added += construct_dm_nsdev(ctxt,
+                                           table_ptrs, nr_tables + nr_added,
+                                           *dir, blob, length);
+            break;
+        default:
+            /* skip blobs of unknown types */
+            continue;
+        }
+    }
+
+    return nr_added;
+}
+
 static int construct_secondary_tables(struct acpi_ctxt *ctxt,
                                       unsigned long *table_ptrs,
                                       struct acpi_config *config,
@@ -461,6 +674,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
                                               nr_tables, config);
 
+    /* Load any additional tables passed from device model (e.g. QEMU) */
+    nr_tables += construct_dm_tables(ctxt, table_ptrs, nr_tables, config);
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 12cafd8..684502d 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -82,6 +82,11 @@ struct acpi_config {
         uint32_t length;
     } pt;
 
+    struct {
+        uint32_t addr;
+        uint32_t length;
+    } dm;
+
     struct acpi_numa numa;
     const struct hvm_info_table *hvminfo;
 
-- 
2.10.1



* [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (10 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 21:47   ` Konrad Rzeszutek Wilk
  2017-01-27 21:48   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests Haozhong Zhang
                   ` (4 subsequent siblings)
  16 siblings, 2 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Xiao Guangrong, Ian Jackson

For xl vNVDIMM configs
  vnvdimms = [ '/path/to/pmem0', '/path/to/pmem1', ... ]

the following qemu options are built
  -machine <existing options>,nvdimm
  -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
  -object memory-backend-xen,id=mem1,size=$PMEM0_SIZE,mem-path=/path/to/pmem0
  -device nvdimm,id=nvdimm1,memdev=mem1
  -object memory-backend-xen,id=mem2,size=$PMEM1_SIZE,mem-path=/path/to/pmem1
  -device nvdimm,id=nvdimm2,memdev=mem2
  ...
where
* NR_SLOTS is the number of entries in vnvdimms + 1,
* MEM_SIZE is the total size of all RAM and NVDIMM devices,
* PMEM#_SIZE is the size of the host pmem device/file '/path/to/pmem#'.
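
The maxmem value computed above is aligned with Xen's order-based ROUNDUP
macro, i.e. rounded up to a multiple of 2^order (4KiB for order 12). A
standalone equivalent, assuming that macro's order-based semantics:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of 2^order (2^12 = 4KiB pages),
 * matching the order-based ROUNDUP(x, 12) used in libxl.
 */
static uint64_t roundup_order(uint64_t x, unsigned order)
{
    uint64_t mask = ((uint64_t)1 << order) - 1;

    return (x + mask) & ~mask;
}
```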

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_dm.c      | 113 +++++++++++++++++++++++++++++++++++++++++++-
 tools/libxl/libxl_types.idl |   8 ++++
 tools/libxl/xl_cmdimpl.c    |  16 +++++++
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ad366a8..6b8c019 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -24,6 +24,10 @@
 #include <sys/types.h>
 #include <pwd.h>
 
+#if defined(__linux__)
+#include <linux/fs.h>
+#endif
+
 static const char *libxl_tapif_script(libxl__gc *gc)
 {
 #if defined(__linux__) || defined(__FreeBSD__)
@@ -905,6 +909,86 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
     return drive;
 }
 
+#if defined(__linux__)
+
+static uint64_t libxl__build_dm_vnvdimm_args(libxl__gc *gc, flexarray_t *dm_args,
+                                             struct libxl_device_vnvdimm *dev,
+                                             int dev_no)
+{
+    int fd, rc;
+    struct stat st;
+    uint64_t size = 0;
+    char *arg;
+
+    fd = open(dev->file, O_RDONLY);
+    if (fd < 0) {
+        LOG(ERROR, "failed to open file %s: %s",
+            dev->file, strerror(errno));
+        goto out;
+    }
+
+    if (stat(dev->file, &st)) {
+        LOG(ERROR, "failed to get status of file %s: %s",
+            dev->file, strerror(errno));
+        goto out_fclose;
+    }
+
+    switch (st.st_mode & S_IFMT) {
+    case S_IFBLK:
+        rc = ioctl(fd, BLKGETSIZE64, &size);
+        if (rc == -1) {
+            LOG(ERROR, "failed to get size of block device %s: %s",
+                dev->file, strerror(errno));
+            size = 0;
+        }
+        break;
+
+    case S_IFREG:
+        size = st.st_size;
+        break;
+
+    default:
+        LOG(ERROR, "%s is not a block device or regular file", dev->file);
+        break;
+    }
+
+    if (!size)
+        goto out_fclose;
+
+    flexarray_append(dm_args, "-object");
+    arg = GCSPRINTF("memory-backend-xen,id=mem%d,size=%"PRIu64",mem-path=%s",
+                    dev_no + 1, size, dev->file);
+    flexarray_append(dm_args, arg);
+
+    flexarray_append(dm_args, "-device");
+    arg = GCSPRINTF("nvdimm,id=nvdimm%d,memdev=mem%d", dev_no + 1, dev_no + 1);
+    flexarray_append(dm_args, arg);
+
+ out_fclose:
+    close(fd);
+ out:
+    return size;
+}
+
+static uint64_t libxl__build_dm_vnvdimms_args(
+    libxl__gc *gc, flexarray_t *dm_args,
+    struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
+{
+    uint64_t total_size = 0, size;
+    int i;
+
+    for (i = 0; i < num_vnvdimms; i++) {
+        size = libxl__build_dm_vnvdimm_args(gc, dm_args, &vnvdimms[i], i);
+        if (!size)
+            break;
+        total_size += size;
+    }
+
+    return total_size;
+}
+
+#endif /* __linux__ */
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -918,13 +1002,18 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     const libxl_device_nic *nics = guest_config->nics;
     const int num_disks = guest_config->num_disks;
     const int num_nics = guest_config->num_nics;
+#if defined(__linux__)
+    const int num_vnvdimms = guest_config->num_vnvdimms;
+#else
+    const int num_vnvdimms = 0;
+#endif
     const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
     const libxl_sdl_info *sdl = dm_sdl(guest_config);
     const char *keymap = dm_keymap(guest_config);
     char *machinearg;
     flexarray_t *dm_args, *dm_envs;
     int i, connection, devid, ret;
-    uint64_t ram_size;
+    uint64_t ram_size, ram_size_in_byte, vnvdimms_size = 0;
     const char *path, *chardev;
     char *user = NULL;
 
@@ -1307,6 +1396,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             }
         }
 
+        if (num_vnvdimms)
+            machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg);
+
         flexarray_append(dm_args, machinearg);
         for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
             flexarray_append(dm_args, b_info->extra_hvm[i]);
@@ -1316,8 +1408,25 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     }
 
     ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
+    ram_size_in_byte = ram_size * 1024 * 1024;
+    if (num_vnvdimms) {
+        vnvdimms_size = libxl__build_dm_vnvdimms_args(gc, dm_args,
+                                                     guest_config->vnvdimms,
+                                                     num_vnvdimms);
+        if (ram_size_in_byte + vnvdimms_size < ram_size_in_byte) {
+            LOG(ERROR,
+                "total size of RAM (%"PRIu64") and NVDIMM (%"PRIu64") overflow",
+                ram_size_in_byte, vnvdimms_size);
+            return ERROR_INVAL;
+        }
+    }
     flexarray_append(dm_args, "-m");
-    flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size));
+    flexarray_append(dm_args,
+                     vnvdimms_size ?
+                     GCSPRINTF("%"PRId64",slots=%d,maxmem=%"PRId64,
+                               ram_size, num_vnvdimms + 1,
+                               ROUNDUP(ram_size_in_byte + vnvdimms_size, 12)) :
+                     GCSPRINTF("%"PRId64, ram_size));
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (b_info->u.hvm.hdtype == LIBXL_HDTYPE_AHCI)
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a32c751..76e4643 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -703,6 +703,13 @@ libxl_device_channel = Struct("device_channel", [
            ])),
 ])
 
+libxl_device_vnvdimm = Struct("device_vnvdimm", [
+    ("backend_domid",   libxl_domid),
+    ("backend_domname", string),
+    ("devid",           libxl_devid),
+    ("file",            string),
+])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -720,6 +727,7 @@ libxl_domain_config = Struct("domain_config", [
     ("channels", Array(libxl_device_channel, "num_channels")),
     ("usbctrls", Array(libxl_device_usbctrl, "num_usbctrls")),
     ("usbdevs", Array(libxl_device_usbdev, "num_usbdevs")),
+    ("vnvdimms", Array(libxl_device_vnvdimm, "num_vnvdimms")),
 
     ("on_poweroff", libxl_action_on_shutdown),
     ("on_reboot", libxl_action_on_shutdown),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index bb5afb8..e314b42 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1330,6 +1330,7 @@ static void parse_config_data(const char *config_source,
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
                    *usbctrls, *usbdevs;
     XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs;
+    XLU_ConfigList *vnvdimms;
     int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian;
     int pci_power_mgmt = 0;
     int pci_msitranslate = 0;
@@ -2514,6 +2515,21 @@ skip_usbdev:
         }
      }
 
+    if (!xlu_cfg_get_list (config, "vnvdimms", &vnvdimms, 0, 0)) {
+#if defined(__linux__)
+        while ((buf = xlu_cfg_get_listitem(vnvdimms,
+                                           d_config->num_vnvdimms)) != NULL) {
+            libxl_device_vnvdimm *vnvdimm =
+                ARRAY_EXTEND_INIT(d_config->vnvdimms, d_config->num_vnvdimms,
+                                  libxl_device_vnvdimm_init);
+            vnvdimm->file = strdup(buf);
+        }
+#else
+        fprintf(stderr, "ERROR: vnvdimms is only supported on Linux\n");
+        exit(-ERROR_FAIL);
+#endif /* __linux__ */
+    }
+
     xlu_cfg_destroy(config);
 }
 
-- 
2.10.1



* [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (11 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 22:06   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices " Haozhong Zhang
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Xiao Guangrong, Ian Jackson

We can map host pmem devices or files on pmem devices to guests. This
patch adds support for mapping whole host pmem devices. The
implementation relies on the Linux pmem driver, so it currently works
only when libxl is compiled for Linux.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/Makefile       |   2 +-
 tools/libxl/libxl_nvdimm.c | 210 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nvdimm.h |  45 ++++++++++
 3 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_nvdimm.c
 create mode 100644 tools/libxl/libxl_nvdimm.h

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index a904927..ecc9ae1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -106,7 +106,7 @@ ifeq ($(CONFIG_NetBSD),y)
 LIBXL_OBJS-y += libxl_netbsd.o
 else
 ifeq ($(CONFIG_Linux),y)
-LIBXL_OBJS-y += libxl_linux.o
+LIBXL_OBJS-y += libxl_linux.o libxl_nvdimm.o
 else
 ifeq ($(CONFIG_FreeBSD),y)
 LIBXL_OBJS-y += libxl_freebsd.o
diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
new file mode 100644
index 0000000..7bcbaaf
--- /dev/null
+++ b/tools/libxl/libxl_nvdimm.c
@@ -0,0 +1,210 @@
+/*
+ * tools/libxl/libxl_nvdimm.c
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ */
+
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdint.h>
+
+#include "libxl_internal.h"
+#include "libxl_arch.h"
+#include "libxl_nvdimm.h"
+
+#include <xc_dom.h>
+
+#define BLK_DEVICE_ROOT "/sys/dev/block"
+
+static int nvdimm_sysfs_read(libxl__gc *gc,
+                             unsigned int major, unsigned int minor,
+                             const char *name, void **data_r)
+{
+    char *path = libxl__sprintf(gc, BLK_DEVICE_ROOT"/%u:%u/device/%s",
+                                major, minor, name);
+    return libxl__read_sysfs_file_contents(gc, path, data_r, NULL);
+}
+
+static int nvdimm_get_spa(libxl__gc *gc, unsigned int major, unsigned int minor,
+                          uint64_t *spa_r)
+{
+    void *data;
+    int ret = nvdimm_sysfs_read(gc, major, minor, "resource", &data);
+
+    if ( ret )
+        return ret;
+
+    *spa_r = strtoll(data, NULL, 0);
+    return 0;
+}
+
+static int nvdimm_get_size(libxl__gc *gc, unsigned int major, unsigned int minor,
+                           uint64_t *size_r)
+{
+    void *data;
+    int ret = nvdimm_sysfs_read(gc, major, minor, "size", &data);
+
+    if ( ret )
+        return ret;
+
+    *size_r = strtoll(data, NULL, 0);
+
+    return 0;
+}
+
+static int add_pages(libxl__gc *gc, uint32_t domid,
+                     xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
+{
+    unsigned int nr;
+    int ret = 0;
+
+    while ( nr_mfns )
+    {
+        nr = min(nr_mfns, (unsigned long) UINT_MAX);
+
+        ret = xc_domain_populate_pmemmap(CTX->xch, domid, mfn, gpfn, nr);
+        if ( ret )
+        {
+            LOG(ERROR, "failed to map pmem pages, "
+                "mfn 0x%" PRIx64", gpfn 0x%" PRIx64 ", nr_mfns %u, err %d",
+                mfn, gpfn, nr, ret);
+            break;
+        }
+
+        nr_mfns -= nr;
+        mfn += nr;
+        gpfn += nr;
+    }
+
+    return ret;
+}
+
+static int add_file(libxl__gc *gc, uint32_t domid, int fd,
+                    xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
+{
+    return -EINVAL;
+}
+
+int libxl_nvdimm_add_device(libxl__gc *gc,
+                            uint32_t domid, const char *path,
+                            uint64_t guest_spa, uint64_t guest_size)
+{
+    int fd;
+    struct stat st;
+    unsigned int major, minor;
+    uint64_t host_spa, host_size;
+    xen_pfn_t mfn, gpfn;
+    unsigned long nr_gpfns;
+    int ret;
+
+    if ( (guest_spa & ~XC_PAGE_MASK) || (guest_size & ~XC_PAGE_MASK) )
+        return -EINVAL;
+
+    fd = open(path, O_RDONLY);
+    if ( fd < 0 )
+    {
+        LOG(ERROR, "failed to open file %s (err: %d)", path, errno);
+        return -EIO;
+    }
+
+    ret = fstat(fd, &st);
+    if ( ret )
+    {
+        LOG(ERROR, "failed to get status of file %s (err: %d)",
+            path, errno);
+        goto out;
+    }
+
+    switch ( st.st_mode & S_IFMT )
+    {
+    case S_IFBLK:
+        major = major(st.st_rdev);
+        minor = minor(st.st_rdev);
+        break;
+
+    case S_IFREG:
+        major = major(st.st_dev);
+        minor = minor(st.st_dev);
+        break;
+
+    default:
+        LOG(ERROR, "%s is neither a block device nor a regular file", path);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = nvdimm_get_spa(gc, major, minor, &host_spa);
+    if ( ret )
+    {
+        LOG(ERROR, "failed to get SPA of device %u:%u", major, minor);
+        goto out;
+    }
+    else if ( host_spa & ~XC_PAGE_MASK )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = nvdimm_get_size(gc, major, minor, &host_size);
+    if ( ret )
+    {
+        LOG(ERROR, "failed to get size of device %u:%u", major, minor);
+        goto out;
+    }
+    else if ( guest_size > host_size )
+    {
+        LOG(ERROR, "vNVDIMM size %" PRIu64 " exceeds NVDIMM size %" PRIu64,
+            guest_size, host_size);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    mfn = host_spa >> XC_PAGE_SHIFT;
+    gpfn = guest_spa >> XC_PAGE_SHIFT;
+    nr_gpfns = guest_size >> XC_PAGE_SHIFT;
+
+    switch ( st.st_mode & S_IFMT )
+    {
+    case S_IFBLK:
+        ret = add_pages(gc, domid, mfn, gpfn, nr_gpfns);
+        break;
+
+    case S_IFREG:
+        ret = add_file(gc, domid, fd, mfn, gpfn, nr_gpfns);
+        break;
+
+    default:
+        LOG(ERROR, "%s is neither a block device nor a regular file", path);
+        ret = -EINVAL;
+    }
+
+ out:
+    close(fd);
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nvdimm.h b/tools/libxl/libxl_nvdimm.h
new file mode 100644
index 0000000..4de2fb2
--- /dev/null
+++ b/tools/libxl/libxl_nvdimm.h
@@ -0,0 +1,45 @@
+/*
+ * tools/libxl/libxl_nvdimm.h
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ */
+
+#ifndef LIBXL_NVDIMM_H
+#define LIBXL_NVDIMM_H
+
+#include <stdint.h>
+#include "libxl_internal.h"
+
+#if defined(__linux__)
+
+int libxl_nvdimm_add_device(libxl__gc *gc,
+                            uint32_t domid, const char *path,
+                            uint64_t spa, uint64_t length);
+
+#else
+
+static inline int libxl_nvdimm_add_device(libxl__gc *gc,
+                            uint32_t domid, const char *path,
+                            uint64_t spa, uint64_t length)
+{
+    return -EINVAL;
+}
+
+#endif /* __linux__ */
+
+#endif /* !LIBXL_NVDIMM_H */
-- 
2.10.1



* [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices to guests
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (12 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 22:10   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations() Haozhong Zhang
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Xiao Guangrong, Ian Jackson

We can map host pmem devices or files on pmem devices to guests. This
patch adds support for mapping files on pmem devices. The
implementation relies on the Linux pmem driver and kernel APIs
(FS_IOC_FIEMAP), so it currently works only when libxl is compiled for
Linux.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_nvdimm.c | 73 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
index 7bcbaaf..b3ba19a 100644
--- a/tools/libxl/libxl_nvdimm.c
+++ b/tools/libxl/libxl_nvdimm.c
@@ -25,6 +25,9 @@
 #include <unistd.h>
 #include <errno.h>
 #include <stdint.h>
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+#include <linux/fiemap.h>
 
 #include "libxl_internal.h"
 #include "libxl_arch.h"
@@ -97,10 +100,78 @@ static int add_pages(libxl__gc *gc, uint32_t domid,
     return ret;
 }
 
+static uint64_t
+get_file_extents(libxl__gc *gc, int fd, unsigned long length,
+                 struct fiemap_extent **extents_r)
+{
+    struct fiemap *fiemap;
+    uint64_t nr_extents = 0, extents_size;
+
+    fiemap = libxl__zalloc(gc, sizeof(*fiemap));
+    if ( !fiemap )
+        goto out;
+
+    fiemap->fm_length = length;
+    if ( ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0 )
+        goto out;
+
+    nr_extents = fiemap->fm_mapped_extents;
+    extents_size = sizeof(struct fiemap_extent) * nr_extents;
+    fiemap = libxl__realloc(gc, fiemap, sizeof(*fiemap) + extents_size);
+    if ( !fiemap )
+        goto out;
+
+    memset(fiemap->fm_extents, 0, extents_size);
+    fiemap->fm_extent_count = nr_extents;
+    fiemap->fm_mapped_extents = 0;
+
+    if ( ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0 )
+        goto out;
+
+    *extents_r = fiemap->fm_extents;
+
+ out:
+    return nr_extents;
+}
+
 static int add_file(libxl__gc *gc, uint32_t domid, int fd,
                     xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
 {
-    return -EINVAL;
+    struct fiemap_extent *extents;
+    uint64_t nr_extents, i;
+    int ret = 0;
+
+    nr_extents = get_file_extents(gc, fd, nr_mfns << XC_PAGE_SHIFT, &extents);
+    if ( !nr_extents )
+        return -EIO;
+
+    for ( i = 0; i < nr_extents; i++ )
+    {
+        uint64_t p_offset = extents[i].fe_physical;
+        uint64_t l_offset = extents[i].fe_logical;
+        uint64_t length = extents[i].fe_length;
+
+        if ( extents[i].fe_flags & ~FIEMAP_EXTENT_LAST )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( (p_offset | l_offset | length) & ~XC_PAGE_MASK )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        ret = add_pages(gc, domid,
+                        mfn + (p_offset >> XC_PAGE_SHIFT),
+                        gpfn + (l_offset >> XC_PAGE_SHIFT),
+                        length >> XC_PAGE_SHIFT);
+        if ( ret )
+            break;
+    }
+
+    return ret;
 }
 
 int libxl_nvdimm_add_device(libxl__gc *gc,
-- 
2.10.1



* [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (13 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices " Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 22:11   ` Konrad Rzeszutek Wilk
  2016-10-10  0:32 ` [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback Haozhong Zhang
  2016-10-24 16:37 ` [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Wei Liu
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Xiao Guangrong, Ian Jackson

If libxl__qmp_initializations() returns an error during domain
creation, stop the domain creation rather than ignoring the failure.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d986cd2..24e8368 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
     if (dcs->sdss.dm.guest_domid) {
         if (d_config->b_info.device_model_version
             == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-            libxl__qmp_initializations(gc, domid, d_config);
+            ret = libxl__qmp_initializations(gc, domid, d_config);
+            if (ret)
+                goto error_out;
         }
     }
 
-- 
2.10.1



* [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (14 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations() Haozhong Zhang
@ 2016-10-10  0:32 ` Haozhong Zhang
  2017-01-27 22:13   ` Konrad Rzeszutek Wilk
  2016-10-24 16:37 ` [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Wei Liu
  16 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-10  0:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Haozhong Zhang, Wei Liu, Xiao Guangrong, Ian Jackson

libxl uses the QMP command 'query-nvdimms' to get the backend, the
guest SPA and the size of each vNVDIMM device, and then maps each
backend into the guest accordingly.
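
For illustration, a reply from the (QEMU-side, out-of-tree)
'query-nvdimms' command would carry the fields the callback in this
patch parses: slot, mem-path, spa and length. The concrete values
below are made up, not taken from the series:

```json
{
  "return": [
    {
      "slot": 0,
      "mem-path": "/dev/pmem0",
      "spa": 17179869184,
      "length": 4294967296
    }
  ]
}
```

libxl then calls libxl_nvdimm_add_device() once per array entry with
exactly these four values.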

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_qmp.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index f8addf9..02edd09 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -26,6 +26,7 @@
 
 #include "_libxl_list.h"
 #include "libxl_internal.h"
+#include "libxl_nvdimm.h"
 
 /* #define DEBUG_RECEIVED */
 
@@ -1146,6 +1147,66 @@ out:
     return rc;
 }
 
+static int qmp_register_nvdimm_callback(libxl__qmp_handler *qmp,
+                                        const libxl__json_object *o,
+                                        void *unused)
+{
+    GC_INIT(qmp->ctx);
+    const libxl__json_object *obj = NULL;
+    const libxl__json_object *sub_obj = NULL;
+    int i = 0;
+    const char *mem_path;
+    uint64_t slot, spa, length;
+    int ret = 0;
+
+    for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
+        if (!libxl__json_object_is_map(obj))
+            continue;
+
+        sub_obj = libxl__json_map_get("slot", obj, JSON_INTEGER);
+        slot = libxl__json_object_get_integer(sub_obj);
+
+        sub_obj = libxl__json_map_get("mem-path", obj, JSON_STRING);
+        mem_path = libxl__json_object_get_string(sub_obj);
+        if (!mem_path) {
+            LOG(ERROR, "No mem-path is specified for NVDIMM #%" PRId64, slot);
+            ret = -EINVAL;
+            goto out;
+        }
+
+        sub_obj = libxl__json_map_get("spa", obj, JSON_INTEGER);
+        spa = libxl__json_object_get_integer(sub_obj);
+
+        sub_obj = libxl__json_map_get("length", obj, JSON_INTEGER);
+        length = libxl__json_object_get_integer(sub_obj);
+
+        LOG(DEBUG,
+            "vNVDIMM #%" PRId64 ": %s, spa 0x%" PRIx64 ", length 0x%" PRIx64,
+            slot, mem_path, spa, length);
+
+        ret = libxl_nvdimm_add_device(gc, qmp->domid, mem_path, spa, length);
+        if (ret) {
+            LOG(ERROR,
+                "Failed to add NVDIMM #%" PRId64
+                "(mem_path %s, spa 0x%" PRIx64 ", length 0x%" PRIx64 ") "
+                "to domain %d (err = %d)",
+                slot, mem_path, spa, length, qmp->domid, ret);
+            goto out;
+        }
+    }
+
+ out:
+    GC_FREE;
+    return ret;
+}
+
+static int libxl__qmp_query_nvdimms(libxl__qmp_handler *qmp)
+{
+    return qmp_synchronous_send(qmp, "query-nvdimms", NULL,
+                                qmp_register_nvdimm_callback,
+                                NULL, qmp->timeout);
+}
+
 int libxl__qmp_hmp(libxl__gc *gc, int domid, const char *command_line,
                    char **output)
 {
@@ -1187,6 +1248,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
     if (!ret) {
         ret = qmp_query_vnc(qmp);
     }
+    if (!ret && guest_config->num_vnvdimms) {
+        ret = libxl__qmp_query_nvdimms(qmp);
+    }
     libxl__qmp_close(qmp);
     return ret;
 }
-- 
2.10.1



* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-10-10  0:32 ` [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions Haozhong Zhang
@ 2016-10-11 19:13   ` Andrew Cooper
  2016-12-09 22:02   ` Konrad Rzeszutek Wilk
  2016-12-22 11:58   ` Jan Beulich
  2 siblings, 0 replies; 77+ messages in thread
From: Andrew Cooper @ 2016-10-11 19:13 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel; +Cc: Daniel De Graaf, Xiao Guangrong, Jan Beulich

On 10/10/16 01:32, Haozhong Zhang wrote:
> Xen hypervisor does not include a pmem driver. Instead, it relies on the
> pmem driver in Dom0 to report the PFN ranges of the entire pmem region,
> its reserved area and data area via XENPF_pmem_add. The reserved area is
> used by Xen hypervisor to place the frame table and M2P table, and is
> disallowed to be accessed from Dom0 once it's reported.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>

Please consider the following scenario.  A user installs Linux onto a
server, formats a DAX filesystem and fills it with content.  The user
then installs Xen and reboots.

With this current hypercall, there is an explicit expectation that Xen
is destructive to a nominated part of the NVDIMM simply by reporting its
existence.  Xen (obviously) cannot blindly be destructive to the
NVDIMM.  Destructiveness is therefore a permission which must be granted
by dom0 (as the logical owner of the content of the NVDIMM) to Xen, but
this permission can only be granted once dom0 has initially accessed the
NVDIMM to identify a safe area to use.

Therefore, reporting the existence and details of an NVDIMM must
necessarily be separate from dom0 allowing Xen to use regions for its
own purposes.  There should be two hypercalls; one which identifies the
presence of the NVDIMM and its MFN/SPA location, and one which grants
Xen permission to use an area for its own purposes.

This approach is also more flexible.  A dom0 policy decision could be
that Xen may use an entire NVDIMM as RAM, while other NVDIMMs are
exclusively preexisting DAX content which need keeping, or some part of
an NVDIMM can be reserved for Xen's use by allocating a file and
granting access to the MFNs/SPAs making up that file.

~Andrew


* Re: [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains
  2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
                   ` (15 preceding siblings ...)
  2016-10-10  0:32 ` [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback Haozhong Zhang
@ 2016-10-24 16:37 ` Wei Liu
  2016-10-25  6:55   ` Haozhong Zhang
  16 siblings, 1 reply; 77+ messages in thread
From: Wei Liu @ 2016-10-24 16:37 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Wei Liu, Daniel De Graaf

Hi Haozhong

All the toolstack patches seem to be tied to the hypervisor interface.
Given that the final design of how nvdimm is expected to work in Xen is
still under discussion, I think I'm going to shelve these patches for
now.

Let me know if my judgement on this matter is incorrect.

Wei.


* Re: [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains
  2016-10-24 16:37 ` [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Wei Liu
@ 2016-10-25  6:55   ` Haozhong Zhang
  2016-10-25 11:28     ` Wei Liu
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-10-25  6:55 UTC (permalink / raw)
  To: Wei Liu
  Cc: Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Daniel De Graaf

On 10/24/16 17:37 +0100, Wei Liu wrote:
>Hi Haozhong
>
>All the toolstack patches seem to be tied to the hypervisor interface.
>Given that the final design of how nvdimm is expected to work in Xen is
>still under discussion, I think I'm going to shelve these patches for
>now.
>

If you mean Patch 13 - 16, then yes. As the interface might change
per the discussion of the kernel patches and my plan, it's fine to
review them later.

Thanks,
Haozhong

>Let me know if my judgement on this matter is incorrect.
>
>Wei.


* Re: [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains
  2016-10-25  6:55   ` Haozhong Zhang
@ 2016-10-25 11:28     ` Wei Liu
  0 siblings, 0 replies; 77+ messages in thread
From: Wei Liu @ 2016-10-25 11:28 UTC (permalink / raw)
  To: Wei Liu, xen-devel, Xiao Guangrong, Konrad Rzeszutek Wilk,
	Jan Beulich, Andrew Cooper, Ian Jackson, Daniel De Graaf

On Tue, Oct 25, 2016 at 02:55:06PM +0800, Haozhong Zhang wrote:
> On 10/24/16 17:37 +0100, Wei Liu wrote:
> >Hi Haozhong
> >
> >All the toolstack patches seem to be tied to the hypervisor interface.
> >Given that the final design of how nvdimm is expected to work in Xen is
> >still under discussion, I think I'm going to shelve these patches for
> >now.
> >
> 
> If you mean Patch 13 - 16, then yes. As the interface might be changed
> per the discussion of the kernel patches and my plan, it's fine to
> review them later.
> 

Patch 12 - 16, actually.

Thanks for confirming.

Wei.

> Thanks,
> Haozhong
> 
> >Let me know if my judgement on this matter is incorrect.
> >
> >Wei.


* Re: [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table
  2016-10-10  0:32 ` [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table Haozhong Zhang
@ 2016-12-09 21:35   ` Konrad Rzeszutek Wilk
  2016-12-12  2:27     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-12-09 21:35 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Xiao Guangrong, Jan Beulich, xen-devel

On Mon, Oct 10, 2016 at 08:32:20AM +0800, Haozhong Zhang wrote:
> A reserved area on each pmem region is used to place the frame table.
> However, it's not at the beginning of the pmem region, so we need to
> specify the location explicitly when extending the frame table.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index b8b6b70..33f226a 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -792,7 +792,8 @@ static int setup_frametable_chunk(void *start, void *end,
>      return 0;
>  }
>  
> -static int extend_frame_table(struct mem_hotadd_info *info)
> +static int extend_frame_table(struct mem_hotadd_info *info,

This looks like it could be 'const struct mem_hotadd_info *info' ?

> +                              struct mem_hotadd_info *alloc_info)
>  {
>      unsigned long cidx, nidx, eidx, spfn, epfn;
>  
> @@ -818,9 +819,9 @@ static int extend_frame_table(struct mem_hotadd_info *info)
>          nidx = find_next_bit(pdx_group_valid, eidx, cidx);
>          if ( nidx >= eidx )
>              nidx = eidx;
> -        err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
> +        err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT),
>                                       pdx_to_page(nidx * PDX_GROUP_COUNT),
> -                                     info);
> +                                     alloc_info);

Granted this one modifies the 'alloc_info' in
alloc_hotadd_mfn, and 'alloc_info'
>          if ( err )
>              return err;
>  
> @@ -1413,7 +1414,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>      info.epfn = epfn;
>      info.cur = spfn;
>  
> -    ret = extend_frame_table(&info);
> +    ret = extend_frame_table(&info, &info);

is equivalent to 'info', so I am not sure I understand the purpose
behind this patch?

Thanks.


* Re: [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table
  2016-10-10  0:32 ` [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table Haozhong Zhang
@ 2016-12-09 21:38   ` Konrad Rzeszutek Wilk
  2016-12-12  2:31     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-12-09 21:38 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Xiao Guangrong, Jan Beulich, xen-devel

On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:
> A reserved area on each pmem region is used to place the M2P table.
> However, it's not at the beginning of the pmem region, so we need to
> specify the location explicitly when creating the M2P table.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/x86_64/mm.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 33f226a..5c0f527 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -317,7 +317,8 @@ void destroy_m2p_mapping(struct mem_hotadd_info *info)
>   * spfn/epfn: the pfn ranges to be setup
>   * free_s/free_e: the pfn ranges that is free still
>   */
> -static int setup_compat_m2p_table(struct mem_hotadd_info *info)
> +static int setup_compat_m2p_table(struct mem_hotadd_info *info,
> +                                  struct mem_hotadd_info *alloc_info)
>  {
>      unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
>      unsigned int n;
> @@ -371,7 +372,7 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>          if ( n == CNT )
>              continue;
>  
> -        mfn = alloc_hotadd_mfn(info);
> +        mfn = alloc_hotadd_mfn(alloc_info);
>          err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
>                                 PAGE_HYPERVISOR);
>          if ( err )
> @@ -391,7 +392,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>   * Allocate and map the machine-to-phys table.
>   * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
>   */
> -static int setup_m2p_table(struct mem_hotadd_info *info)
> +static int setup_m2p_table(struct mem_hotadd_info *info,
> +                           struct mem_hotadd_info *alloc_info)
>  {
>      unsigned long i, va, smap, emap;
>      unsigned int n;
> @@ -440,7 +442,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>                  break;
>          if ( n < CNT )
>          {
> -            unsigned long mfn = alloc_hotadd_mfn(info);
> +            unsigned long mfn = alloc_hotadd_mfn(alloc_info);
>  
>              ret = map_pages_to_xen(
>                          RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
> @@ -485,7 +487,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>  #undef CNT
>  #undef MFN
>  
> -    ret = setup_compat_m2p_table(info);
> +    ret = setup_compat_m2p_table(info, alloc_info);
>  error:
>      return ret;
>  }
> @@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>      total_pages += epfn - spfn;
>  
>      set_pdx_range(spfn, epfn);
> -    ret = setup_m2p_table(&info);
> +    ret = setup_m2p_table(&info, &info);

I am not sure I follow this logic. You are passing the same contents;
'alloc_info' and 'info' are just aliased together?



* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-10-10  0:32 ` [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions Haozhong Zhang
  2016-10-11 19:13   ` Andrew Cooper
@ 2016-12-09 22:02   ` Konrad Rzeszutek Wilk
  2016-12-12  4:16     ` Haozhong Zhang
  2016-12-22 11:58   ` Jan Beulich
  2 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-12-09 22:02 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Andrew Cooper, Daniel De Graaf, Xiao Guangrong, Jan Beulich, xen-devel

On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
> Xen hypervisor does not include a pmem driver. Instead, it relies on the
> pmem driver in Dom0 to report the PFN ranges of the entire pmem region,
> its reserved area and data area via XENPF_pmem_add. The reserved area is
> used by Xen hypervisor to place the frame table and M2P table, and is
> disallowed to be accessed from Dom0 once it's reported.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
>  xen/arch/x86/Makefile             |   1 +
>  xen/arch/x86/platform_hypercall.c |   7 ++
>  xen/arch/x86/pmem.c               | 161 ++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/x86_64/mm.c          |  54 +++++++++++++
>  xen/include/asm-x86/mm.h          |   4 +
>  xen/include/public/platform.h     |  14 ++++
>  xen/include/xen/pmem.h            |  31 ++++++++
>  xen/xsm/flask/hooks.c             |   1 +
>  8 files changed, 273 insertions(+)
>  create mode 100644 xen/arch/x86/pmem.c
>  create mode 100644 xen/include/xen/pmem.h
> 
> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
> index 931917d..9cf2da1 100644
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -67,6 +67,7 @@ obj-$(CONFIG_TBOOT) += tboot.o
>  obj-y += hpet.o
>  obj-y += vm_event.o
>  obj-y += xstate.o
> +obj-y += pmem.o

If possible please keep this alphabetical. Also I wonder if it makes
sense to have CONFIG_PMEM or such?

>  
>  x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
>  
> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> index 0879e19..c47eea4 100644
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -24,6 +24,7 @@
>  #include <xen/pmstat.h>
>  #include <xen/irq.h>
>  #include <xen/symbols.h>
> +#include <xen/pmem.h>
>  #include <asm/current.h>
>  #include <public/platform.h>
>  #include <acpi/cpufreq/processor_perf.h>
> @@ -822,6 +823,12 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>      }
>      break;
>  
> +    case XENPF_pmem_add:

Missing call to ret = xsm_resource_plug_core(XSM_HOOK);
or something similar.

> +        ret = pmem_add(op->u.pmem_add.spfn, op->u.pmem_add.epfn,
> +                       op->u.pmem_add.rsv_spfn, op->u.pmem_add.rsv_epfn,
> +                       op->u.pmem_add.data_spfn, op->u.pmem_add.data_epfn);
> +        break;
> +
>      default:
>          ret = -ENOSYS;
>          break;
> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
> new file mode 100644
> index 0000000..70358ed
> --- /dev/null
> +++ b/xen/arch/x86/pmem.c
> @@ -0,0 +1,161 @@
> +/******************************************************************************
> + * arch/x86/pmem.c
> + *
> + * Copyright (c) 2016, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.

Hm, please consult Intel lawyers about the '(at your option)' clause and
which other later versions they are comfortable with.

> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
> + */
> +
> +#include <xen/guest_access.h>
> +#include <xen/list.h>
> +#include <xen/spinlock.h>
> +#include <xen/pmem.h>

Since this is a new file could I ask you to sort these alphabetically?

> +#include <xen/iocap.h>
> +#include <asm-x86/mm.h>
> +
> +/*
> + * All pmem regions reported from Dom0 are linked in pmem_list, which
> + * is proected by pmem_list_lock. Its entries are of type struct pmem

protected
> + * and sorted incrementally by field spa.
> + */
> +static DEFINE_SPINLOCK(pmem_list_lock);
> +static LIST_HEAD(pmem_list);
> +
> +struct pmem {
> +    struct list_head link;   /* link to pmem_list */
> +    unsigned long spfn;      /* start PFN of the whole pmem region */
> +    unsigned long epfn;      /* end PFN of the whole pmem region */
> +    unsigned long rsv_spfn;  /* start PFN of the reserved area */
> +    unsigned long rsv_epfn;  /* end PFN of the reserved area */
> +    unsigned long data_spfn; /* start PFN of the data area */
> +    unsigned long data_epfn; /* end PFN of the data area */

Why not just:
struct pmem {
	struct list_head link;
	struct xenpf_pmem_add pmem;
}

or such?

> +};
> +
> +static int is_included(unsigned long s1, unsigned long e1,

bool?
> +                       unsigned long s2, unsigned long e2)
> +{
> +    return s1 <= s2 && s2 < e2 && e2 <= e1;

Is the s2 < e2 necessary?

> +}
> +
> +static int is_overlaped(unsigned long s1, unsigned long e1,

overlapped and perhaps bool?

> +                        unsigned long s2, unsigned long e2)
> +{
> +    return (s1 <= s2 && s2 < e1) || (s2 < s1 && s1 < e2);
> +}
> +
> +static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)

bool?
> +{
> +    return rsv_mfns >=
> +        ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
> +        ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
> +}
> +
> +static int pmem_add_check(unsigned long spfn, unsigned long epfn,

bool?
> +                          unsigned long rsv_spfn, unsigned long rsv_epfn,
> +                          unsigned long data_spfn, unsigned long data_epfn)
> +{
> +    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
> +        return 0;

Hm, I think it ought to be possible to have no rsv area..?

> +
> +    if ( !is_included(spfn, epfn, rsv_spfn, rsv_epfn) ||
> +         !is_included(spfn, epfn, data_spfn, data_epfn) )
> +        return 0;
> +
> +    if ( is_overlaped(rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
> +        return 0;
> +
> +    if ( !check_reserved_size(rsv_epfn - rsv_spfn, epfn - spfn) )
> +        return 0;
> +
> +    return 1;
> +}
> +
> +static int pmem_list_add(unsigned long spfn, unsigned long epfn,
> +                         unsigned long rsv_spfn, unsigned long rsv_epfn,
> +                         unsigned long data_spfn, unsigned long data_epfn)
> +{
> +    struct list_head *cur;
> +    struct pmem *new_pmem;
> +    int ret = 0;
> +
> +    spin_lock(&pmem_list_lock);
> +
> +    list_for_each_prev(cur, &pmem_list)
> +    {
> +        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
> +        unsigned long cur_spfn = cur_pmem->spfn;
> +        unsigned long cur_epfn = cur_pmem->epfn;
> +
> +        if ( (cur_spfn <= spfn && spfn < cur_epfn) ||
> +             (spfn <= cur_spfn && cur_spfn < epfn) )
> +        {
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +
> +        if ( cur_spfn < spfn )
> +            break;
> +    }
> +
> +    new_pmem = xmalloc(struct pmem);
> +    if ( !new_pmem )
> +    {
> +        ret = -ENOMEM;
> +        goto out;
> +    }
> +    new_pmem->spfn      = spfn;
> +    new_pmem->epfn      = epfn;
> +    new_pmem->rsv_spfn  = rsv_spfn;
> +    new_pmem->rsv_epfn  = rsv_epfn;
> +    new_pmem->data_spfn = data_spfn;
> +    new_pmem->data_epfn = data_epfn;
> +    list_add(&new_pmem->link, cur);
> +
> + out:
> +    spin_unlock(&pmem_list_lock);
> +    return ret;
> +}
> +
> +int pmem_add(unsigned long spfn, unsigned long epfn,
> +             unsigned long rsv_spfn, unsigned long rsv_epfn,
> +             unsigned long data_spfn, unsigned long data_epfn)
> +{
> +    int ret;
> +
> +    if ( !pmem_add_check(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
> +        return -EINVAL;
> +
> +    ret = pmem_setup(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
> +    if ( ret )
> +        goto out;
> +
> +    ret = iomem_deny_access(current->domain, rsv_spfn, rsv_epfn);
> +    if ( ret )
> +        goto out;
> +
> +    ret = pmem_list_add(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
> +    if ( ret )
> +        goto out;
> +
> +    printk(XENLOG_INFO
> +           "pmem: pfns     0x%lx - 0x%lx\n"
> +           "      reserved 0x%lx - 0x%lx\n"
> +           "      data     0x%lx - 0x%lx\n",
> +           spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
> +
> + out:
> +    return ret;
> +}
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 5c0f527..b1f92f6 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1474,6 +1474,60 @@ destroy_frametable:
>      return ret;
>  }
>  
> +int pmem_setup(unsigned long spfn, unsigned long epfn,
> +               unsigned long rsv_spfn, unsigned long rsv_epfn,
> +               unsigned long data_spfn, unsigned long data_epfn)
> +{
> +    unsigned old_max = max_page, old_total = total_pages;
> +    struct mem_hotadd_info info =
> +        { .spfn = spfn, .epfn = epfn, .cur = spfn };
> +    struct mem_hotadd_info rsv_info =
> +        { .spfn = rsv_spfn, .epfn = rsv_epfn, .cur = rsv_spfn };
> +    int ret;
> +    unsigned long i;
> +    struct page_info *pg;
> +
> +    if ( !mem_hotadd_check(spfn, epfn) )
> +        return -EINVAL;
> +
> +    ret = extend_frame_table(&info, &rsv_info);

Aah, that is why you needed this extra piece.

> +    if ( ret )
> +        goto destroy_frametable;
> +
> +    if ( max_page < epfn )
> +    {
> +        max_page = epfn;
> +        max_pdx = pfn_to_pdx(max_page - 1) + 1;
> +    }
> +    total_pages += epfn - spfn;
> +
> +    set_pdx_range(spfn, epfn);
> +    ret = setup_m2p_table(&info, &rsv_info);
> +    if ( ret )
> +        goto destroy_m2p;
> +
> +    share_hotadd_m2p_table(&info);
> +
> +    for ( i = spfn; i < epfn; i++ )
> +    {
> +        pg = mfn_to_page(i);
> +        pg->count_info = (rsv_spfn <= i && i < rsv_info.cur) ?
> +                         PGC_state_inuse : PGC_state_free;
> +    }
> +

What about iommu_map_page calls to make it possible for dom0 to
do DMA operations to this area?

> +    return 0;
> +
> +destroy_m2p:
> +    destroy_m2p_mapping(&info);
> +    max_page = old_max;
> +    total_pages = old_total;
> +    max_pdx = pfn_to_pdx(max_page - 1) + 1;
> +destroy_frametable:
> +    cleanup_frame_table(&info);
> +
> +    return ret;
> +}
> +
>  #include "compat/mm.c"
>  
>  /*
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index b781495..e31f1c8 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -597,4 +597,8 @@ typedef struct mm_rwlock {
>  
>  extern const char zero_page[];
>  
> +int pmem_setup(unsigned long spfn, unsigned long epfn,
> +               unsigned long rsv_spfn, unsigned long rsv_epfn,
> +               unsigned long data_spfn, unsigned long data_epfn);
> +
>  #endif /* __ASM_X86_MM_H__ */
> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> index 1e6a6ce..c7e7cce 100644
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -608,6 +608,19 @@ struct xenpf_symdata {
>  typedef struct xenpf_symdata xenpf_symdata_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
>  
> +#define XENPF_pmem_add     64
> +struct xenpf_pmem_add {
> +    /* IN variables */
> +    uint64_t spfn;      /* start PFN of the whole pmem region */
> +    uint64_t epfn;      /* end PFN of the whole pmem region */
> +    uint64_t rsv_spfn;  /* start PFN of the reserved area within the region */
> +    uint64_t rsv_epfn;  /* end PFN of the reserved area within the region */

Could you include (perhaps above the hypercall) an explanation of
what 'reserved' and 'data' are? And can these values be zero? Say spfn ==
data_spfn and epfn == data_epfn?

> +    uint64_t data_spfn; /* start PFN of the data area within the region */
> +    uint64_t data_epfn; /* end PFN of the data area within the region */
> +};
> +typedef struct xenpf_pmem_add xenpf_pmem_add_t;
> +DEFINE_XEN_GUEST_HANDLE(xenpf_pmem_add_t);
> +
>  /*
>   * ` enum neg_errnoval
>   * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
> @@ -638,6 +651,7 @@ struct xen_platform_op {
>          struct xenpf_core_parking      core_parking;
>          struct xenpf_resource_op       resource_op;
>          struct xenpf_symdata           symdata;
> +        struct xenpf_pmem_add          pmem_add;
>          uint8_t                        pad[128];
>      } u;
>  };
> diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
> new file mode 100644
> index 0000000..a670ab8
> --- /dev/null
> +++ b/xen/include/xen/pmem.h
> @@ -0,0 +1,31 @@
> +/*
> + * xen/include/xen/pmem.h
> + *
> + * Copyright (c) 2016, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.

This '(at your option)' is for Intel lawyers to decide on. Could you
make sure you include what version they would be comfortable with?

> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
> + */
> +
> +#ifndef __XEN_PMEM_H__
> +#define __XEN_PMEM_H__
> +
> +#include <xen/types.h>
> +
> +int pmem_add(unsigned long spfn, unsigned long epfn,
> +             unsigned long rsv_spfn, unsigned long rsv_epfn,
> +             unsigned long data_spfn, unsigned long data_epfn);
> +
> +#endif /* __XEN_PMEM_H__ */
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 177c11f..948a161 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1360,6 +1360,7 @@ static int flask_platform_op(uint32_t op)
>      case XENPF_cpu_offline:
>      case XENPF_cpu_hotadd:
>      case XENPF_mem_hotadd:
> +    case XENPF_pmem_add:

Thanks for looking at XSM, but I think you missed the comment above:

/* These operations have their own XSM hooks */

which means that platform_hypercall.c should have a call to
xsm_resource_plug_core(XSM_HOOK), or something equivalent.

>          return 0;
>  #endif
>  
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest
  2016-10-10  0:32 ` [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest Haozhong Zhang
@ 2016-12-09 22:22   ` Konrad Rzeszutek Wilk
  2016-12-12  4:38     ` Haozhong Zhang
  2016-12-22 12:19   ` Jan Beulich
  1 sibling, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-12-09 22:22 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Wei Liu

On Mon, Oct 10, 2016 at 08:32:23AM +0800, Haozhong Zhang wrote:
> XENMEM_populate_pmemmap is used by toolstack to map given host pmem pages
> to given guest pages. Only pages in the data area of a pmem region are
> allowed to be mapped to guest.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxc/include/xenctrl.h |   8 +++
>  tools/libxc/xc_domain.c       |  14 +++++
>  xen/arch/x86/pmem.c           | 123 ++++++++++++++++++++++++++++++++++++++++++
>  xen/common/domain.c           |   3 ++
>  xen/common/memory.c           |  31 +++++++++++
>  xen/include/public/memory.h   |  14 ++++-
>  xen/include/xen/pmem.h        |  10 ++++
>  xen/include/xen/sched.h       |   3 ++
>  8 files changed, 205 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 2c83544..46c71fc 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2710,6 +2710,14 @@ int xc_livepatch_revert(xc_interface *xch, char *name, uint32_t timeout);
>  int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
>  int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
>  
> +/**
> + * Map host pmem pages at PFNs @mfn ~ (@mfn + @nr_mfns - 1) to
> + * guest physical pages at guest PFNs @gpfn ~ (@gpfn + @nr_mfns - 1)
> + */
> +int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
> +                               xen_pfn_t mfn, xen_pfn_t gpfn,
> +                               unsigned int nr_mfns);
> +
>  /* Compat shims */
>  #include "xenctrl_compat.h"
>  
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 296b852..81a90a1 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -2520,6 +2520,20 @@ int xc_domain_soft_reset(xc_interface *xch,
>      domctl.domain = (domid_t)domid;
>      return do_domctl(xch, &domctl);
>  }
> +
> +int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
> +                               xen_pfn_t mfn, xen_pfn_t gpfn,
> +                               unsigned int nr_mfns)
> +{
> +    struct xen_pmemmap pmemmap = {
> +        .domid   = domid,
> +        .mfn     = mfn,
> +        .gpfn    = gpfn,
> +        .nr_mfns = nr_mfns,
> +    };
> +    return do_memory_op(xch, XENMEM_populate_pmemmap, &pmemmap, sizeof(pmemmap));
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
> index 70358ed..e4dc685 100644
> --- a/xen/arch/x86/pmem.c
> +++ b/xen/arch/x86/pmem.c
> @@ -24,6 +24,9 @@
>  #include <xen/spinlock.h>
>  #include <xen/pmem.h>
>  #include <xen/iocap.h>
> +#include <xen/sched.h>
> +#include <xen/event.h>
> +#include <xen/paging.h>
>  #include <asm-x86/mm.h>
>  
>  /*
> @@ -63,6 +66,48 @@ static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)
>          ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
>  }
>  
> +static int is_data_mfn(unsigned long mfn)

bool
> +{
> +    struct list_head *cur;
> +    int data = 0;
> +
> +    ASSERT(spin_is_locked(&pmem_list_lock));
> +
> +    list_for_each(cur, &pmem_list)
> +    {
> +        struct pmem *pmem = list_entry(cur, struct pmem, link);
> +
> +        if ( pmem->data_spfn <= mfn && mfn < pmem->data_epfn )

You may want to change the first conditional to have 'mfn' on the left
side. And perhaps change 'mfn' to 'pfn' as that is what your structure
is called?

But ... maybe the #3 patch that introduces XENPF_pmem_add should
use 'data_smfn', 'data_emfn' and so on?

> +        {
> +            data = 1;
> +            break;
> +        }
> +    }
> +
> +    return data;
> +}
> +
> +static int pmem_page_valid(struct page_info *page, struct domain *d)

bool
> +{
> +    /* only data area can be mapped to guest */
> +    if ( !is_data_mfn(page_to_mfn(page)) )
> +    {
> +        dprintk(XENLOG_DEBUG, "pmem: mfn 0x%lx is not a pmem data page\n",
> +                page_to_mfn(page));
> +        return 0;
> +    }
> +
> +    /* inuse/offlined/offlining pmem page cannot be mapped to guest */
> +    if ( !page_state_is(page, free) )
> +    {
> +        dprintk(XENLOG_DEBUG, "pmem: invalid page state of mfn 0x%lx: 0x%lx\n",
> +                page_to_mfn(page), page->count_info & PGC_state);
> +        return 0;
> +    }
> +
> +    return 1;
> +}
> +
>  static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>                            unsigned long rsv_spfn, unsigned long rsv_epfn,
>                            unsigned long data_spfn, unsigned long data_epfn)
> @@ -159,3 +204,81 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
>   out:
>      return ret;
>  }
> +
> +static int pmem_assign_pages(struct domain *d,
> +                             struct page_info *pg, unsigned int order)
> +{
> +    int rc = 0;
> +    unsigned long i;
> +
> +    spin_lock(&d->pmem_lock);
> +
> +    if ( unlikely(d->is_dying) )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    for ( i = 0; i < (1 << order); i++ )
> +    {
> +        ASSERT(page_get_owner(&pg[i]) == NULL);
> +        ASSERT((pg[i].count_info & ~(PGC_allocated | 1)) == 0);
> +        page_set_owner(&pg[i], d);
> +        smp_wmb();

Why here? Why not after the count_info is set?

> +        pg[i].count_info = PGC_allocated | 1;
> +        page_list_add_tail(&pg[i], &d->pmem_page_list);
> +    }
> +
> + out:
> +    spin_unlock(&d->pmem_lock);
> +    return rc;
> +}
> +
> +int pmem_populate(struct xen_pmemmap_args *args)
> +{
> +    struct domain *d = args->domain;
> +    unsigned long i, mfn, gpfn;
> +    struct page_info *page;
> +    int rc = 0;
> +
> +    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
> +        return -EINVAL;
> +
> +    for ( i = args->nr_done, mfn = args->mfn + i, gpfn = args->gpfn + i;
> +          i < args->nr_mfns;
> +          i++, mfn++, gpfn++ )
> +    {
> +        if ( i != args->nr_done && hypercall_preempt_check() )
> +        {
> +            args->preempted = 1;
> +            goto out;
> +        }
> +
> +        page = mfn_to_page(mfn);
> +
> +        spin_lock(&pmem_list_lock);
> +        if ( !pmem_page_valid(page, d) )
> +        {
> +            dprintk(XENLOG_DEBUG, "pmem: MFN 0x%lx not a valid pmem page\n", mfn);
> +            spin_unlock(&pmem_list_lock);
> +            rc = -EINVAL;
> +            goto out;
> +        }
> +        page->count_info = PGC_state_inuse;

No test_and_set_bit ?

> +        spin_unlock(&pmem_list_lock);
> +
> +        page->u.inuse.type_info = 0;
> +
> +        guest_physmap_add_page(d, _gfn(gpfn), _mfn(mfn), 0);
> +        if ( pmem_assign_pages(d, page, 0) )
> +        {
> +            guest_physmap_remove_page(d, _gfn(gpfn), _mfn(mfn), 0);

Don't you also need to do something about PGC_state_inuse ?
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +    }
> +
> + out:
> +    args->nr_done = i;
> +    return rc;
> +}
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 3abaca9..8192548 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -288,6 +288,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>      INIT_PAGE_LIST_HEAD(&d->page_list);
>      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
>  
> +    spin_lock_init_prof(d, pmem_lock);
> +    INIT_PAGE_LIST_HEAD(&d->pmem_page_list);
> +
>      spin_lock_init(&d->node_affinity_lock);
>      d->node_affinity = NODE_MASK_ALL;
>      d->auto_node_affinity = 1;
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 21797ca..09cb1c9 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -24,6 +24,7 @@
>  #include <xen/numa.h>
>  #include <xen/mem_access.h>
>  #include <xen/trace.h>
> +#include <xen/pmem.h>
>  #include <asm/current.h>
>  #include <asm/hardirq.h>
>  #include <asm/p2m.h>
> @@ -1329,6 +1330,36 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      }
>  #endif
>  
> +    case XENMEM_populate_pmemmap:
> +    {
> +        struct xen_pmemmap pmemmap;
> +        struct xen_pmemmap_args args;
> +
> +        if ( copy_from_guest(&pmemmap, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(pmemmap.domid);
> +        if ( !d )
> +            return -EINVAL;
> +
> +        args.domain = d;
> +        args.mfn = pmemmap.mfn;
> +        args.gpfn = pmemmap.gpfn;
> +        args.nr_mfns = pmemmap.nr_mfns;
> +        args.nr_done = start_extent;
> +        args.preempted = 0;
> +
> +        rc = pmem_populate(&args);
> +        rcu_unlock_domain(d);
> +
> +        if ( !rc && args.preempted )

Nice! Glad to see that preemption is there!

> +            return hypercall_create_continuation(
> +                __HYPERVISOR_memory_op, "lh",
> +                op | (args.nr_done << MEMOP_EXTENT_SHIFT), arg);
> +
> +        break;
> +    }
> +
>      default:
>          rc = arch_memory_op(cmd, arg);
>          break;


* Re: [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy
  2016-10-10  0:32 ` [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy Haozhong Zhang
@ 2016-12-09 22:27   ` Konrad Rzeszutek Wilk
  2016-12-12  4:47     ` Haozhong Zhang
  2016-12-22 12:22   ` Jan Beulich
  1 sibling, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-12-09 22:27 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Xiao Guangrong, Jan Beulich, xen-devel

On Mon, Oct 10, 2016 at 08:32:24AM +0800, Haozhong Zhang wrote:
> The host pmem pages mapped to a domain are unassigned at domain destroy
> so as to be used by other domains in future.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/domain.c  |  5 +++++
>  xen/arch/x86/pmem.c    | 41 +++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/pmem.h |  1 +
>  3 files changed, 47 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 1bd5eb6..05ab389 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -61,6 +61,7 @@
>  #include <asm/amd.h>
>  #include <xen/numa.h>
>  #include <xen/iommu.h>
> +#include <xen/pmem.h>
>  #include <compat/vcpu.h>
>  #include <asm/psr.h>
>  
> @@ -2512,6 +2513,10 @@ int domain_relinquish_resources(struct domain *d)
>          if ( ret )
>              return ret;
>  
> +        ret = pmem_teardown(d);
> +        if ( ret )
> +            return ret;

Good, so if ret == -ERESTART it preempts, but..
> +
>          /* Tear down paging-assistance stuff. */
>          ret = paging_teardown(d);
>          if ( ret )
> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
> index e4dc685..50e496b 100644
> --- a/xen/arch/x86/pmem.c
> +++ b/xen/arch/x86/pmem.c
> @@ -282,3 +282,44 @@ int pmem_populate(struct xen_pmemmap_args *args)
>      args->nr_done = i;
>      return rc;
>  }
> +
> +static int pmem_teardown_preemptible(struct domain *d, int *preempted)
> +{
> +    struct page_info *pg, *next;
> +    int rc = 0;
> +
> +    spin_lock(&d->pmem_lock);
> +
> +    page_list_for_each_safe (pg, next, &d->pmem_page_list )
> +    {
> +        BUG_ON(page_get_owner(pg) != d);
> +        BUG_ON(page_state_is(pg, free));
> +
> +        page_list_del(pg, &d->pmem_page_list);
> +        page_set_owner(pg, NULL);
> +        pg->count_info = (pg->count_info & ~PGC_count_mask) | PGC_state_free;
> +
> +        if ( preempted && hypercall_preempt_check() )
> +        {
> +            *preempted = 1;

.. you don't set rc = -ERESTART ?

> +            goto out;
> +        }
> +    }
> +
> + out:
> +    spin_unlock(&d->pmem_lock);
> +    return rc;
> +}
> +
> +int pmem_teardown(struct domain *d)
> +{
> +    int preempted = 0;
> +
> +    ASSERT(d->is_dying);
> +    ASSERT(d != current->domain);
> +
> +    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
> +        return -EINVAL;
> +
> +    return pmem_teardown_preemptible(d, &preempted);

Not exactly sure what the 'preempted' is for? You don't seem to be
using it here?

Perhaps you meant to do:

  rc = pmem_teardown_preemptible(d, &preempted);
  if ( preempted )
    return -ERESTART;
  return rc;
?

> +}
> diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
> index 60adf56..ffbef1c 100644
> --- a/xen/include/xen/pmem.h
> +++ b/xen/include/xen/pmem.h
> @@ -37,5 +37,6 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
>               unsigned long rsv_spfn, unsigned long rsv_epfn,
>               unsigned long data_spfn, unsigned long data_epfn);
>  int pmem_populate(struct xen_pmemmap_args *args);
> +int pmem_teardown(struct domain *d);
>  
>  #endif /* __XEN_PMEM_H__ */
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table
  2016-12-09 21:35   ` Konrad Rzeszutek Wilk
@ 2016-12-12  2:27     ` Haozhong Zhang
  2016-12-12  8:25       ` Jan Beulich
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  2:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Xiao Guangrong, Jan Beulich, xen-devel

On 12/09/16 16:35 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:20AM +0800, Haozhong Zhang wrote:
>> A reserved area on each pmem region is used to place the frame table.
>> However, it's not at the beginning of the pmem region, so we need to
>> specify the location explicitly when extending the frame table.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  xen/arch/x86/x86_64/mm.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
>> index b8b6b70..33f226a 100644
>> --- a/xen/arch/x86/x86_64/mm.c
>> +++ b/xen/arch/x86/x86_64/mm.c
>> @@ -792,7 +792,8 @@ static int setup_frametable_chunk(void *start, void *end,
>>      return 0;
>>  }
>>
>> -static int extend_frame_table(struct mem_hotadd_info *info)
>> +static int extend_frame_table(struct mem_hotadd_info *info,
>
>This looks like it could be 'const struct mem_hotadd_info *info' ?
>

yes

>> +                              struct mem_hotadd_info *alloc_info)
>>  {
>>      unsigned long cidx, nidx, eidx, spfn, epfn;
>>
>> @@ -818,9 +819,9 @@ static int extend_frame_table(struct mem_hotadd_info *info)
>>          nidx = find_next_bit(pdx_group_valid, eidx, cidx);
>>          if ( nidx >= eidx )
>>              nidx = eidx;
>> -        err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
>> +        err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT),
>>                                       pdx_to_page(nidx * PDX_GROUP_COUNT),
>> -                                     info);
>> +                                     alloc_info);
>
>Granted this one modifies the 'alloc_info' in
>alloc_hotadd_mfn, and 'alloc_info'
>>          if ( err )
>>              return err;
>>
>> @@ -1413,7 +1414,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>>      info.epfn = epfn;
>>      info.cur = spfn;
>>
>> -    ret = extend_frame_table(&info);
>> +    ret = extend_frame_table(&info, &info);
>
>is equivalant to 'info' so I am not sure I understand the purpose
>behind this patch?
>

Yes, they are identical for ordinary RAM here, because the frame table
is allocated at the beginning of the hot-added RAM. For NVDIMM, the
hypervisor does not know which part is used for data, so the second
parameter 'alloc_info' indicates which part can be used for the frame
table, and may differ from 'info'.

Thanks,
Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table
  2016-12-09 21:38   ` Konrad Rzeszutek Wilk
@ 2016-12-12  2:31     ` Haozhong Zhang
  2016-12-12  8:26       ` Jan Beulich
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  2:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Xiao Guangrong, Jan Beulich, xen-devel

On 12/09/16 16:38 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:
>> A reserved area on each pmem region is used to place the M2P table.
>> However, it's not at the beginning of the pmem region, so we need to
>> specify the location explicitly when creating the M2P table.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  xen/arch/x86/x86_64/mm.c | 14 ++++++++------
>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
>> index 33f226a..5c0f527 100644
>> --- a/xen/arch/x86/x86_64/mm.c
>> +++ b/xen/arch/x86/x86_64/mm.c
>> @@ -317,7 +317,8 @@ void destroy_m2p_mapping(struct mem_hotadd_info *info)
>>   * spfn/epfn: the pfn ranges to be setup
>>   * free_s/free_e: the pfn ranges that is free still
>>   */
>> -static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>> +static int setup_compat_m2p_table(struct mem_hotadd_info *info,
>> +                                  struct mem_hotadd_info *alloc_info)
>>  {
>>      unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
>>      unsigned int n;
>> @@ -371,7 +372,7 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>>          if ( n == CNT )
>>              continue;
>>
>> -        mfn = alloc_hotadd_mfn(info);
>> +        mfn = alloc_hotadd_mfn(alloc_info);
>>          err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
>>                                 PAGE_HYPERVISOR);
>>          if ( err )
>> @@ -391,7 +392,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>>   * Allocate and map the machine-to-phys table.
>>   * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
>>   */
>> -static int setup_m2p_table(struct mem_hotadd_info *info)
>> +static int setup_m2p_table(struct mem_hotadd_info *info,
>> +                           struct mem_hotadd_info *alloc_info)
>>  {
>>      unsigned long i, va, smap, emap;
>>      unsigned int n;
>> @@ -440,7 +442,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>>                  break;
>>          if ( n < CNT )
>>          {
>> -            unsigned long mfn = alloc_hotadd_mfn(info);
>> +            unsigned long mfn = alloc_hotadd_mfn(alloc_info);
>>
>>              ret = map_pages_to_xen(
>>                          RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
>> @@ -485,7 +487,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>>  #undef CNT
>>  #undef MFN
>>
>> -    ret = setup_compat_m2p_table(info);
>> +    ret = setup_compat_m2p_table(info, alloc_info);
>>  error:
>>      return ret;
>>  }
>> @@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>>      total_pages += epfn - spfn;
>>
>>      set_pdx_range(spfn, epfn);
>> -    ret = setup_m2p_table(&info);
>> +    ret = setup_m2p_table(&info, &info);
>
>I am not sure I follow this logic. You are passing the same contents, it
>is just that 'alloc_info' and 'info' are aliased together?
>

As in patch 1, the two parameters of setup_m2p_table() are identical
for ordinary RAM, but can differ for NVDIMM.

Haozhong


* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-12-09 22:02   ` Konrad Rzeszutek Wilk
@ 2016-12-12  4:16     ` Haozhong Zhang
  2016-12-12  8:30       ` Jan Beulich
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  4:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Daniel De Graaf, Xiao Guangrong, Jan Beulich, xen-devel

On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
>> Xen hypervisor does not include a pmem driver. Instead, it relies on the
>> pmem driver in Dom0 to report the PFN ranges of the entire pmem region,
>> its reserved area and data area via XENPF_pmem_add. The reserved area is
>> used by Xen hypervisor to place the frame table and M2P table, and is
>> disallowed to be accessed from Dom0 once it's reported.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> ---
>>  xen/arch/x86/Makefile             |   1 +
>>  xen/arch/x86/platform_hypercall.c |   7 ++
>>  xen/arch/x86/pmem.c               | 161 ++++++++++++++++++++++++++++++++++++++
>>  xen/arch/x86/x86_64/mm.c          |  54 +++++++++++++
>>  xen/include/asm-x86/mm.h          |   4 +
>>  xen/include/public/platform.h     |  14 ++++
>>  xen/include/xen/pmem.h            |  31 ++++++++
>>  xen/xsm/flask/hooks.c             |   1 +
>>  8 files changed, 273 insertions(+)
>>  create mode 100644 xen/arch/x86/pmem.c
>>  create mode 100644 xen/include/xen/pmem.h
>>
>> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
>> index 931917d..9cf2da1 100644
>> --- a/xen/arch/x86/Makefile
>> +++ b/xen/arch/x86/Makefile
>> @@ -67,6 +67,7 @@ obj-$(CONFIG_TBOOT) += tboot.o
>>  obj-y += hpet.o
>>  obj-y += vm_event.o
>>  obj-y += xstate.o
>> +obj-y += pmem.o
>
>If possible please keep this alphabetical. Also I wonder if it makes
>sense to have CONFIG_PMEM or such?
>

I'll try to add CONFIG_PMEM in the next version.

>>
>>  x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
>>
>> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
>> index 0879e19..c47eea4 100644
>> --- a/xen/arch/x86/platform_hypercall.c
>> +++ b/xen/arch/x86/platform_hypercall.c
>> @@ -24,6 +24,7 @@
>>  #include <xen/pmstat.h>
>>  #include <xen/irq.h>
>>  #include <xen/symbols.h>
>> +#include <xen/pmem.h>
>>  #include <asm/current.h>
>>  #include <public/platform.h>
>>  #include <acpi/cpufreq/processor_perf.h>
>> @@ -822,6 +823,12 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>      }
>>      break;
>>
>> +    case XENPF_pmem_add:
>
>Missing call to ret = xsm_resource_plug_core(XSM_HOOK);
>or something similar .
>

I'll look into whether xsm_resource_plug_core() applies here or
whether another XSM hook is needed.

>> +        ret = pmem_add(op->u.pmem_add.spfn, op->u.pmem_add.epfn,
>> +                       op->u.pmem_add.rsv_spfn, op->u.pmem_add.rsv_epfn,
>> +                       op->u.pmem_add.data_spfn, op->u.pmem_add.data_epfn);
>> +        break;
>> +
>>      default:
>>          ret = -ENOSYS;
>>          break;
>> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
>> new file mode 100644
>> index 0000000..70358ed
>> --- /dev/null
>> +++ b/xen/arch/x86/pmem.c
>> @@ -0,0 +1,161 @@
>> +/******************************************************************************
>> + * arch/x86/pmem.c
>> + *
>> + * Copyright (c) 2016, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>
>Hm, please consult Intel lawyers with what '(at your option)' what other
>later versions they are comfortable with.
>

I just copied the license statement from other files and didn't pay
attention to its exact wording. I'll check whether it applies here.

>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
>> + */
>> +
>> +#include <xen/guest_access.h>
>> +#include <xen/list.h>
>> +#include <xen/spinlock.h>
>> +#include <xen/pmem.h>
>
>Since this is a new file could I ask you sort these alphabetically?
>

sure

>> +#include <xen/iocap.h>
>> +#include <asm-x86/mm.h>
>> +
>> +/*
>> + * All pmem regions reported from Dom0 are linked in pmem_list, which
>> + * is proected by pmem_list_lock. Its entries are of type struct pmem
>
>protected

will fix

>> + * and sorted incrementally by field spa.
>> + */
>> +static DEFINE_SPINLOCK(pmem_list_lock);
>> +static LIST_HEAD(pmem_list);
>> +
>> +struct pmem {
>> +    struct list_head link;   /* link to pmem_list */
>> +    unsigned long spfn;      /* start PFN of the whole pmem region */
>> +    unsigned long epfn;      /* end PFN of the whole pmem region */
>> +    unsigned long rsv_spfn;  /* start PFN of the reserved area */
>> +    unsigned long rsv_epfn;  /* end PFN of the reserved area */
>> +    unsigned long data_spfn; /* start PFN of the data area */
>> +    unsigned long data_epfn; /* end PFN of the data area */
>
>Why not just:
>struct pmem {
>	struct list_head link;
>	struct xenpf_pmem_add pmem;
>}
>
>or such?
>

Yes, that looks neater.

>> +};
>> +
>> +static int is_included(unsigned long s1, unsigned long e1,
>
>bool?

Yes

>> +                       unsigned long s2, unsigned long e2)
>> +{
>> +    return s1 <= s2 && s2 < e2 && e2 <= e1;
>
>Is the s2 < e2 necessary?
>

No, I'll remove it.

>> +}
>> +
>> +static int is_overlaped(unsigned long s1, unsigned long e1,
>
>overlapped and perhaps bool?
>

Yes

>> +                        unsigned long s2, unsigned long e2)
>> +{
>> +    return (s1 <= s2 && s2 < e1) || (s2 < s1 && s1 < e2);
>> +}
>> +
>> +static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)
>
>bool?

ditto

>> +{
>> +    return rsv_mfns >=
>> +        ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
>> +        ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
>> +}
>> +
>> +static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>
>bool?

ditto

>> +                          unsigned long rsv_spfn, unsigned long rsv_epfn,
>> +                          unsigned long data_spfn, unsigned long data_epfn)
>> +{
>> +    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
>> +        return 0;
>
>Hm, I think it ought to be possible to have no rsv area..?
>

A reserved area must be provided to store the frame table and M2P
table of the NVDIMM, so it cannot be empty.

>> +
>> +    if ( !is_included(spfn, epfn, rsv_spfn, rsv_epfn) ||
>> +         !is_included(spfn, epfn, data_spfn, data_epfn) )
>> +        return 0;
>> +
>> +    if ( is_overlaped(rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
>> +        return 0;
>> +
>> +    if ( !check_reserved_size(rsv_epfn - rsv_spfn, epfn - spfn) )
>> +        return 0;
>> +
>> +    return 1;
>> +}
>> +
>> +static int pmem_list_add(unsigned long spfn, unsigned long epfn,
>> +                         unsigned long rsv_spfn, unsigned long rsv_epfn,
>> +                         unsigned long data_spfn, unsigned long data_epfn)
>> +{
>> +    struct list_head *cur;
>> +    struct pmem *new_pmem;
>> +    int ret = 0;
>> +
>> +    spin_lock(&pmem_list_lock);
>> +
>> +    list_for_each_prev(cur, &pmem_list)
>> +    {
>> +        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
>> +        unsigned long cur_spfn = cur_pmem->spfn;
>> +        unsigned long cur_epfn = cur_pmem->epfn;
>> +
>> +        if ( (cur_spfn <= spfn && spfn < cur_epfn) ||
>> +             (spfn <= cur_spfn && cur_spfn < epfn) )
>> +        {
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        if ( cur_spfn < spfn )
>> +            break;
>> +    }
>> +
>> +    new_pmem = xmalloc(struct pmem);
>> +    if ( !new_pmem )
>> +    {
>> +        ret = -ENOMEM;
>> +        goto out;
>> +    }
>> +    new_pmem->spfn      = spfn;
>> +    new_pmem->epfn      = epfn;
>> +    new_pmem->rsv_spfn  = rsv_spfn;
>> +    new_pmem->rsv_epfn  = rsv_epfn;
>> +    new_pmem->data_spfn = data_spfn;
>> +    new_pmem->data_epfn = data_epfn;
>> +    list_add(&new_pmem->link, cur);
>> +
>> + out:
>> +    spin_unlock(&pmem_list_lock);
>> +    return ret;
>> +}
>> +
>> +int pmem_add(unsigned long spfn, unsigned long epfn,
>> +             unsigned long rsv_spfn, unsigned long rsv_epfn,
>> +             unsigned long data_spfn, unsigned long data_epfn)
>> +{
>> +    int ret;
>> +
>> +    if ( !pmem_add_check(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
>> +        return -EINVAL;
>> +
>> +    ret = pmem_setup(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
>> +    if ( ret )
>> +        goto out;
>> +
>> +    ret = iomem_deny_access(current->domain, rsv_spfn, rsv_epfn);
>> +    if ( ret )
>> +        goto out;
>> +
>> +    ret = pmem_list_add(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
>> +    if ( ret )
>> +        goto out;
>> +
>> +    printk(XENLOG_INFO
>> +           "pmem: pfns     0x%lx - 0x%lx\n"
>> +           "      reserved 0x%lx - 0x%lx\n"
>> +           "      data     0x%lx - 0x%lx\n",
>> +           spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
>> +
>> + out:
>> +    return ret;
>> +}
>> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
>> index 5c0f527..b1f92f6 100644
>> --- a/xen/arch/x86/x86_64/mm.c
>> +++ b/xen/arch/x86/x86_64/mm.c
>> @@ -1474,6 +1474,60 @@ destroy_frametable:
>>      return ret;
>>  }
>>
>> +int pmem_setup(unsigned long spfn, unsigned long epfn,
>> +               unsigned long rsv_spfn, unsigned long rsv_epfn,
>> +               unsigned long data_spfn, unsigned long data_epfn)
>> +{
>> +    unsigned old_max = max_page, old_total = total_pages;
>> +    struct mem_hotadd_info info =
>> +        { .spfn = spfn, .epfn = epfn, .cur = spfn };
>> +    struct mem_hotadd_info rsv_info =
>> +        { .spfn = rsv_spfn, .epfn = rsv_epfn, .cur = rsv_spfn };
>> +    int ret;
>> +    unsigned long i;
>> +    struct page_info *pg;
>> +
>> +    if ( !mem_hotadd_check(spfn, epfn) )
>> +        return -EINVAL;
>> +
>> +    ret = extend_frame_table(&info, &rsv_info);
>
>Aah, that is why you needed this extra piece.
>
>> +    if ( ret )
>> +        goto destroy_frametable;
>> +
>> +    if ( max_page < epfn )
>> +    {
>> +        max_page = epfn;
>> +        max_pdx = pfn_to_pdx(max_page - 1) + 1;
>> +    }
>> +    total_pages += epfn - spfn;
>> +
>> +    set_pdx_range(spfn, epfn);
>> +    ret = setup_m2p_table(&info, &rsv_info);
>> +    if ( ret )
>> +        goto destroy_m2p;
>> +
>> +    share_hotadd_m2p_table(&info);
>> +
>> +    for ( i = spfn; i < epfn; i++ )
>> +    {
>> +        pg = mfn_to_page(i);
>> +        pg->count_info = (rsv_spfn <= i && i < rsv_info.cur) ?
>> +                         PGC_state_inuse : PGC_state_free;
>> +    }
>> +
>
>What about iommu_map_page calls to make it possible for dom0 to
>do DMA operations to this area?
>

I'll look at this.

>> +    return 0;
>> +
>> +destroy_m2p:
>> +    destroy_m2p_mapping(&info);
>> +    max_page = old_max;
>> +    total_pages = old_total;
>> +    max_pdx = pfn_to_pdx(max_page - 1) + 1;
>> +destroy_frametable:
>> +    cleanup_frame_table(&info);
>> +
>> +    return ret;
>> +}
>> +
>>  #include "compat/mm.c"
>>
>>  /*
>> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
>> index b781495..e31f1c8 100644
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -597,4 +597,8 @@ typedef struct mm_rwlock {
>>
>>  extern const char zero_page[];
>>
>> +int pmem_setup(unsigned long spfn, unsigned long epfn,
>> +               unsigned long rsv_spfn, unsigned long rsv_epfn,
>> +               unsigned long data_spfn, unsigned long data_epfn);
>> +
>>  #endif /* __ASM_X86_MM_H__ */
>> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
>> index 1e6a6ce..c7e7cce 100644
>> --- a/xen/include/public/platform.h
>> +++ b/xen/include/public/platform.h
>> @@ -608,6 +608,19 @@ struct xenpf_symdata {
>>  typedef struct xenpf_symdata xenpf_symdata_t;
>>  DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
>>
>> +#define XENPF_pmem_add     64
>> +struct xenpf_pmem_add {
>> +    /* IN variables */
>> +    uint64_t spfn;      /* start PFN of the whole pmem region */
>> +    uint64_t epfn;      /* end PFN of the whole pmem region */
>> +    uint64_t rsv_spfn;  /* start PFN of the reserved area within the region */
>> +    uint64_t rsv_epfn;  /* end PFN of the reserved area within the region */
>
>Could you include (perhaps above the hypercall) an explanation
>what 'reserved' and 'data' is? And can these values be zero? Say spfn ==
>data_spfn  and epfn == data_epfn?
>

The data area is the area that does not contain management structures
of either Xen or the Dom0 kernel, and can be mapped to DomU. Its size
cannot be zero.

The reserved area is the area used by Xen to store the management
structures (frame table and M2P table) of the NVDIMM. Its size cannot
be zero either, i.e. rsv_epfn > rsv_spfn.

[data_spfn, data_epfn] and [rsv_spfn, rsv_epfn] must be disjoint and
included in [spfn, epfn].

>> +    uint64_t data_spfn; /* start PFN of the data area within the region */
>> +    uint64_t data_epfn; /* end PFN of the data area within the region */
>> +};
>> +typedef struct xenpf_pmem_add xenpf_pmem_add_t;
>> +DEFINE_XEN_GUEST_HANDLE(xenpf_pmem_add_t);
>> +
>>  /*
>>   * ` enum neg_errnoval
>>   * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
>> @@ -638,6 +651,7 @@ struct xen_platform_op {
>>          struct xenpf_core_parking      core_parking;
>>          struct xenpf_resource_op       resource_op;
>>          struct xenpf_symdata           symdata;
>> +        struct xenpf_pmem_add          pmem_add;
>>          uint8_t                        pad[128];
>>      } u;
>>  };
>> diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
>> new file mode 100644
>> index 0000000..a670ab8
>> --- /dev/null
>> +++ b/xen/include/xen/pmem.h
>> @@ -0,0 +1,31 @@
>> +/*
>> + * xen/include/xen/pmem.h
>> + *
>> + * Copyright (c) 2016, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>
>This '(at your option)' is for Intel lawyers to decide on. Could you
>make sure you include what version they would be comfortable with?
>
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
>> + */
>> +
>> +#ifndef __XEN_PMEM_H__
>> +#define __XEN_PMEM_H__
>> +
>> +#include <xen/types.h>
>> +
>> +int pmem_add(unsigned long spfn, unsigned long epfn,
>> +             unsigned long rsv_spfn, unsigned long rsv_epfn,
>> +             unsigned long data_spfn, unsigned long data_epfn);
>> +
>> +#endif /* __XEN_PMEM_H__ */
>> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
>> index 177c11f..948a161 100644
>> --- a/xen/xsm/flask/hooks.c
>> +++ b/xen/xsm/flask/hooks.c
>> @@ -1360,6 +1360,7 @@ static int flask_platform_op(uint32_t op)
>>      case XENPF_cpu_offline:
>>      case XENPF_cpu_hotadd:
>>      case XENPF_mem_hotadd:
>> +    case XENPF_pmem_add:
>
>Thanks for looking at XSM, but I think you missed the comment above:
>
>/* These operations have their own XSM hooks */
>
>which means that platform_hypercall.c should have an call to
>xsm_resource_plug_core(XSM_HOOK) call. Or something equivalant.
>

I'll look at these xsm hooks.

Thanks,
Haozhong


* Re: [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest
  2016-12-09 22:22   ` Konrad Rzeszutek Wilk
@ 2016-12-12  4:38     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  4:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel,
	Jan Beulich, Wei Liu

On 12/09/16 17:22 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:23AM +0800, Haozhong Zhang wrote:
>> XENMEM_populate_pmemmap is used by toolstack to map given host pmem pages
>> to given guest pages. Only pages in the data area of a pmem region are
>> allowed to be mapped to guest.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  tools/libxc/include/xenctrl.h |   8 +++
>>  tools/libxc/xc_domain.c       |  14 +++++
>>  xen/arch/x86/pmem.c           | 123 ++++++++++++++++++++++++++++++++++++++++++
>>  xen/common/domain.c           |   3 ++
>>  xen/common/memory.c           |  31 +++++++++++
>>  xen/include/public/memory.h   |  14 ++++-
>>  xen/include/xen/pmem.h        |  10 ++++
>>  xen/include/xen/sched.h       |   3 ++
>>  8 files changed, 205 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 2c83544..46c71fc 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -2710,6 +2710,14 @@ int xc_livepatch_revert(xc_interface *xch, char *name, uint32_t timeout);
>>  int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
>>  int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
>>
>> +/**
>> + * Map host pmem pages at PFNs @mfn ~ (@mfn + @nr_mfns - 1) to
>> + * guest physical pages at guest PFNs @gpfn ~ (@gpfn + @nr_mfns - 1)
>> + */
>> +int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
>> +                               xen_pfn_t mfn, xen_pfn_t gpfn,
>> +                               unsigned int nr_mfns);
>> +
>>  /* Compat shims */
>>  #include "xenctrl_compat.h"
>>
>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
>> index 296b852..81a90a1 100644
>> --- a/tools/libxc/xc_domain.c
>> +++ b/tools/libxc/xc_domain.c
>> @@ -2520,6 +2520,20 @@ int xc_domain_soft_reset(xc_interface *xch,
>>      domctl.domain = (domid_t)domid;
>>      return do_domctl(xch, &domctl);
>>  }
>> +
>> +int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
>> +                               xen_pfn_t mfn, xen_pfn_t gpfn,
>> +                               unsigned int nr_mfns)
>> +{
>> +    struct xen_pmemmap pmemmap = {
>> +        .domid   = domid,
>> +        .mfn     = mfn,
>> +        .gpfn    = gpfn,
>> +        .nr_mfns = nr_mfns,
>> +    };
>> +    return do_memory_op(xch, XENMEM_populate_pmemmap, &pmemmap, sizeof(pmemmap));
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
>> index 70358ed..e4dc685 100644
>> --- a/xen/arch/x86/pmem.c
>> +++ b/xen/arch/x86/pmem.c
>> @@ -24,6 +24,9 @@
>>  #include <xen/spinlock.h>
>>  #include <xen/pmem.h>
>>  #include <xen/iocap.h>
>> +#include <xen/sched.h>
>> +#include <xen/event.h>
>> +#include <xen/paging.h>
>>  #include <asm-x86/mm.h>
>>
>>  /*
>> @@ -63,6 +66,48 @@ static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)
>>          ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
>>  }
>>
>> +static int is_data_mfn(unsigned long mfn)
>
>bool

will change

>> +{
>> +    struct list_head *cur;
>> +    int data = 0;
>> +
>> +    ASSERT(spin_is_locked(&pmem_list_lock));
>> +
>> +    list_for_each(cur, &pmem_list)
>> +    {
>> +        struct pmem *pmem = list_entry(cur, struct pmem, link);
>> +
>> +        if ( pmem->data_spfn <= mfn && mfn < pmem->data_epfn )
>
>You may want to change the first conditional to have 'mfn' on the left
>side. And perhaps change 'mfn' to 'pfn' as that is what your structure
>is called?
>
>But ... maybe the #3 patch that introduces XENPF_pmem_add should
>use 'data_smfn', 'data_emfn' and so on?
>

OK, I'll change it to mfn. I used pfn because XENPF_mem_hotadd and
memory_add() use the name pfn (though memory_add() actually treats the
values as MFNs).

>> +        {
>> +            data = 1;
>> +            break;
>> +        }
>> +    }
>> +
>> +    return data;
>> +}
>> +
>> +static int pmem_page_valid(struct page_info *page, struct domain *d)
>
>bool

will change

>> +{
>> +    /* only data area can be mapped to guest */
>> +    if ( !is_data_mfn(page_to_mfn(page)) )
>> +    {
>> +        dprintk(XENLOG_DEBUG, "pmem: mfn 0x%lx is not a pmem data page\n",
>> +                page_to_mfn(page));
>> +        return 0;
>> +    }
>> +
>> +    /* inuse/offlined/offlining pmem page cannot be mapped to guest */
>> +    if ( !page_state_is(page, free) )
>> +    {
>> +        dprintk(XENLOG_DEBUG, "pmem: invalid page state of mfn 0x%lx: 0x%lx\n",
>> +                page_to_mfn(page), page->count_info & PGC_state);
>> +        return 0;
>> +    }
>> +
>> +    return 1;
>> +}
>> +
>>  static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>>                            unsigned long rsv_spfn, unsigned long rsv_epfn,
>>                            unsigned long data_spfn, unsigned long data_epfn)
>> @@ -159,3 +204,81 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
>>   out:
>>      return ret;
>>  }
>> +
>> +static int pmem_assign_pages(struct domain *d,
>> +                             struct page_info *pg, unsigned int order)
>> +{
>> +    int rc = 0;
>> +    unsigned long i;
>> +
>> +    spin_lock(&d->pmem_lock);
>> +
>> +    if ( unlikely(d->is_dying) )
>> +    {
>> +        rc = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    for ( i = 0; i < (1 << order); i++ )
>> +    {
>> +        ASSERT(page_get_owner(&pg[i]) == NULL);
>> +        ASSERT((pg[i].count_info & ~(PGC_allocated | 1)) == 0);
>> +        page_set_owner(&pg[i], d);
>> +        smp_wmb();
>
>Why here? Why not after the count_info is set?
>

I guess I forgot to adapt it when I added the following line. I will
move the barrier after the count_info assignment.

>> +        pg[i].count_info = PGC_allocated | 1;
>> +        page_list_add_tail(&pg[i], &d->pmem_page_list);
>> +    }
>> +
>> + out:
>> +    spin_unlock(&d->pmem_lock);
>> +    return rc;
>> +}
>> +
>> +int pmem_populate(struct xen_pmemmap_args *args)
>> +{
>> +    struct domain *d = args->domain;
>> +    unsigned long i, mfn, gpfn;
>> +    struct page_info *page;
>> +    int rc = 0;
>> +
>> +    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
>> +        return -EINVAL;
>> +
>> +    for ( i = args->nr_done, mfn = args->mfn + i, gpfn = args->gpfn + i;
>> +          i < args->nr_mfns;
>> +          i++, mfn++, gpfn++ )
>> +    {
>> +        if ( i != args->nr_done && hypercall_preempt_check() )
>> +        {
>> +            args->preempted = 1;
>> +            goto out;
>> +        }
>> +
>> +        page = mfn_to_page(mfn);
>> +
>> +        spin_lock(&pmem_list_lock);
>> +        if ( !pmem_page_valid(page, d) )
>> +        {
>> +            dprintk(XENLOG_DEBUG, "pmem: MFN 0x%lx not a valid pmem page\n", mfn);
>> +            spin_unlock(&pmem_list_lock);
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +        page->count_info = PGC_state_inuse;
>
>No test_and_set_bit ?
>

In order to mark a pmem page in pmem_list as PGC_state_inuse,
pmem_populate() must first acquire pmem_list_lock, so
test_and_set_bit() is not needed here.

>> +        spin_unlock(&pmem_list_lock);
>> +
>> +        page->u.inuse.type_info = 0;
>> +
>> +        guest_physmap_add_page(d, _gfn(gpfn), _mfn(mfn), 0);
>> +        if ( pmem_assign_pages(d, page, 0) )
>> +        {
>> +            guest_physmap_remove_page(d, _gfn(gpfn), _mfn(mfn), 0);
>
>Don't you also need to do something about PGC_state_inuse ?

I forgot to clear the PGC_state_inuse flag on this error path and will
add that.

>> +            rc = -EFAULT;
>> +            goto out;
>> +        }
>> +    }
>> +
>> + out:
>> +    args->nr_done = i;
>> +    return rc;
>> +}
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 3abaca9..8192548 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -288,6 +288,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>>      INIT_PAGE_LIST_HEAD(&d->page_list);
>>      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
>>
>> +    spin_lock_init_prof(d, pmem_lock);
>> +    INIT_PAGE_LIST_HEAD(&d->pmem_page_list);
>> +
>>      spin_lock_init(&d->node_affinity_lock);
>>      d->node_affinity = NODE_MASK_ALL;
>>      d->auto_node_affinity = 1;
>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>> index 21797ca..09cb1c9 100644
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -24,6 +24,7 @@
>>  #include <xen/numa.h>
>>  #include <xen/mem_access.h>
>>  #include <xen/trace.h>
>> +#include <xen/pmem.h>
>>  #include <asm/current.h>
>>  #include <asm/hardirq.h>
>>  #include <asm/p2m.h>
>> @@ -1329,6 +1330,36 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      }
>>  #endif
>>
>> +    case XENMEM_populate_pmemmap:
>> +    {
>> +        struct xen_pmemmap pmemmap;
>> +        struct xen_pmemmap_args args;
>> +
>> +        if ( copy_from_guest(&pmemmap, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        d = rcu_lock_domain_by_any_id(pmemmap.domid);
>> +        if ( !d )
>> +            return -EINVAL;
>> +
>> +        args.domain = d;
>> +        args.mfn = pmemmap.mfn;
>> +        args.gpfn = pmemmap.gpfn;
>> +        args.nr_mfns = pmemmap.nr_mfns;
>> +        args.nr_done = start_extent;
>> +        args.preempted = 0;
>> +
>> +        rc = pmem_populate(&args);
>> +        rcu_unlock_domain(d);
>> +
>> +        if ( !rc && args.preempted )
>
>Nice! Glad to see that preemption is there!
>
>> +            return hypercall_create_continuation(
>> +                __HYPERVISOR_memory_op, "lh",
>> +                op | (args.nr_done << MEMOP_EXTENT_SHIFT), arg);
>> +
>> +        break;
>> +    }
>> +
>>      default:
>>          rc = arch_memory_op(cmd, arg);
>>          break;

Thanks,
Haozhong


* Re: [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy
  2016-12-09 22:27   ` Konrad Rzeszutek Wilk
@ 2016-12-12  4:47     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  4:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Xiao Guangrong, Jan Beulich, xen-devel

On 12/09/16 17:27 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:24AM +0800, Haozhong Zhang wrote:
>> The host pmem pages mapped to a domain are unassigned at domain destroy
>> so that they can be used by other domains in the future.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  xen/arch/x86/domain.c  |  5 +++++
>>  xen/arch/x86/pmem.c    | 41 +++++++++++++++++++++++++++++++++++++++++
>>  xen/include/xen/pmem.h |  1 +
>>  3 files changed, 47 insertions(+)
>>
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 1bd5eb6..05ab389 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -61,6 +61,7 @@
>>  #include <asm/amd.h>
>>  #include <xen/numa.h>
>>  #include <xen/iommu.h>
>> +#include <xen/pmem.h>
>>  #include <compat/vcpu.h>
>>  #include <asm/psr.h>
>>
>> @@ -2512,6 +2513,10 @@ int domain_relinquish_resources(struct domain *d)
>>          if ( ret )
>>              return ret;
>>
>> +        ret = pmem_teardown(d);
>> +        if ( ret )
>> +            return ret;
>
>Good, so if ret == -ERESTART it preempts, but..
>> +
>>          /* Tear down paging-assistance stuff. */
>>          ret = paging_teardown(d);
>>          if ( ret )
>> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
>> index e4dc685..50e496b 100644
>> --- a/xen/arch/x86/pmem.c
>> +++ b/xen/arch/x86/pmem.c
>> @@ -282,3 +282,44 @@ int pmem_populate(struct xen_pmemmap_args *args)
>>      args->nr_done = i;
>>      return rc;
>>  }
>> +
>> +static int pmem_teardown_preemptible(struct domain *d, int *preempted)
>> +{
>> +    struct page_info *pg, *next;
>> +    int rc = 0;
>> +
>> +    spin_lock(&d->pmem_lock);
>> +
>> +    page_list_for_each_safe (pg, next, &d->pmem_page_list )
>> +    {
>> +        BUG_ON(page_get_owner(pg) != d);
>> +        BUG_ON(page_state_is(pg, free));
>> +
>> +        page_list_del(pg, &d->pmem_page_list);
>> +        page_set_owner(pg, NULL);
>> +        pg->count_info = (pg->count_info & ~PGC_count_mask) | PGC_state_free;
>> +
>> +        if ( preempted && hypercall_preempt_check() )
>> +        {
>> +            *preempted = 1;
>
>.. you don't set rc = -ERESTART?
>
>> +            goto out;
>> +        }
>> +    }
>> +
>> + out:
>> +    spin_unlock(&d->pmem_lock);
>> +    return rc;
>> +}
>> +
>> +int pmem_teardown(struct domain *d)
>> +{
>> +    int preempted = 0;
>> +
>> +    ASSERT(d->is_dying);
>> +    ASSERT(d != current->domain);
>> +
>> +    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
>> +        return -EINVAL;
>> +
>> +    return pmem_teardown_preemptible(d, &preempted);
>
>Not exactly sure what the 'preempted' is for? You don't seem to be
>using it here?
>
>Perhaps you meant to do:
>
>  rc = pmem_teardown_preemptible(d, &preempted);
>  if ( preempted )
>    return -ERESTART;
>  return rc;
>?

Yes, I forgot this.

Thanks,
Haozhong

>
>> +}
>> diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
>> index 60adf56..ffbef1c 100644
>> --- a/xen/include/xen/pmem.h
>> +++ b/xen/include/xen/pmem.h
>> @@ -37,5 +37,6 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
>>               unsigned long rsv_spfn, unsigned long rsv_epfn,
>>               unsigned long data_spfn, unsigned long data_epfn);
>>  int pmem_populate(struct xen_pmemmap_args *args);
>> +int pmem_teardown(struct domain *d);
>>
>>  #endif /* __XEN_PMEM_H__ */
>> --
>> 2.10.1
>>
>>

* Re: [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table
  2016-12-12  2:27     ` Haozhong Zhang
@ 2016-12-12  8:25       ` Jan Beulich
  0 siblings, 0 replies; 77+ messages in thread
From: Jan Beulich @ 2016-12-12  8:25 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, xen-devel, Xiao Guangrong

>>> On 12.12.16 at 03:27, <haozhong.zhang@intel.com> wrote:
> On 12/09/16 16:35 -0500, Konrad Rzeszutek Wilk wrote:
>>On Mon, Oct 10, 2016 at 08:32:20AM +0800, Haozhong Zhang wrote:
>>> @@ -1413,7 +1414,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>>>      info.epfn = epfn;
>>>      info.cur = spfn;
>>>
>>> -    ret = extend_frame_table(&info);
>>> +    ret = extend_frame_table(&info, &info);
>>
>>is equivalent to 'info', so I am not sure I understand the purpose
>>behind this patch?
>>
> 
> Yes, they are identical for the ordinary RAM here, and the frame table
> is allocated at the beginning of the hot-added RAM. For NVDIMM, the
> hypervisor does not know which part is used for data, so the second
> parameter 'alloc_info' is used to indicate which part can be used for
> the frame table, and might be different from 'info'.

In which case you want to add a comment ahead of that function
clarifying the difference (and perhaps pointing out that the two
may be identical).

Also the commit message should be adjusted a little - this is only
preparation for pmem support, so the wording should reflect that.
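
A possible shape for such a clarifying comment (just a sketch; the parameter roles are inferred from the discussion above and may not match the final code):

```c
/*
 * extend_frame_table(info, alloc_info):
 *   @info:       the hot-added region for which frame-table entries
 *                are to be created.
 *   @alloc_info: the region from which pages for the frame table
 *                itself may be allocated.  Identical to @info for
 *                ordinary hot-added RAM; for NVDIMM it is the reserved
 *                area, since Xen does not know which parts of the
 *                device hold data.
 */
```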

Jan



* Re: [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table
  2016-12-12  2:31     ` Haozhong Zhang
@ 2016-12-12  8:26       ` Jan Beulich
  2016-12-12  8:35         ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Jan Beulich @ 2016-12-12  8:26 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, xen-devel, Xiao Guangrong

>>> On 12.12.16 at 03:31, <haozhong.zhang@intel.com> wrote:
> On 12/09/16 16:38 -0500, Konrad Rzeszutek Wilk wrote:
>>On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:
>>> @@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>>>      total_pages += epfn - spfn;
>>>
>>>      set_pdx_range(spfn, epfn);
>>> -    ret = setup_m2p_table(&info);
>>> +    ret = setup_m2p_table(&info, &info);
>>
>>I am not sure I follow this logic. You are passing the same contents, it
>>is just that 'alloc_info' and 'info' are aliased together?
>>
> 
> Similarly to patch 1, the two parameters of setup_m2p_table() are
> identical for the ordinary RAM, and can be different for NVDIMM.

And the same comments as for patch 1 apply here then.

Jan



* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-12-12  4:16     ` Haozhong Zhang
@ 2016-12-12  8:30       ` Jan Beulich
  2016-12-12  8:38         ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Jan Beulich @ 2016-12-12  8:30 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Daniel De Graaf, xen-devel, Xiao Guangrong

>>> On 12.12.16 at 05:16, <haozhong.zhang@intel.com> wrote:
> On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:
>>On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
>>> +static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>>> +                          unsigned long rsv_spfn, unsigned long rsv_epfn,
>>> +                          unsigned long data_spfn, unsigned long data_epfn)
>>> +{
>>> +    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
>>> +        return 0;
>>
>>Hm, I think it ought to be possible to have no rsv area..?
> 
> A reserved area must be provided to store the frametable and M2P table of
> NVDIMM.

Is this really "must" rather than just "should", i.e. can't you do
without even if you prefer to have it?

Jan



* Re: [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table
  2016-12-12  8:26       ` Jan Beulich
@ 2016-12-12  8:35         ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  8:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Xiao Guangrong

On 12/12/16 01:26 -0700, Jan Beulich wrote:
>>>> On 12.12.16 at 03:31, <haozhong.zhang@intel.com> wrote:
>> On 12/09/16 16:38 -0500, Konrad Rzeszutek Wilk wrote:
>>>On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:
>>>> @@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>>>>      total_pages += epfn - spfn;
>>>>
>>>>      set_pdx_range(spfn, epfn);
>>>> -    ret = setup_m2p_table(&info);
>>>> +    ret = setup_m2p_table(&info, &info);
>>>
>>>I am not sure I follow this logic. You are passing the same contents, it
>>>is just that 'alloc_info' and 'info' are aliased together?
>>>
>>
>> Similarly to patch 1, the two parameters of setup_m2p_table() are
>> identical for the ordinary RAM, and can be different for NVDIMM.
>
>And the same comments as for patch 1 apply here then.
>
>Jan
>

I'll add comments and clarify in commit messages for both patch 1 and
patch 2 in the next version.

Thanks,
Haozhong


* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-12-12  8:30       ` Jan Beulich
@ 2016-12-12  8:38         ` Haozhong Zhang
  2016-12-12 14:44           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-12  8:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Daniel De Graaf, xen-devel, Xiao Guangrong

On 12/12/16 01:30 -0700, Jan Beulich wrote:
>>>> On 12.12.16 at 05:16, <haozhong.zhang@intel.com> wrote:
>> On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:
>>>On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
>>>> +static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>>>> +                          unsigned long rsv_spfn, unsigned long rsv_epfn,
>>>> +                          unsigned long data_spfn, unsigned long data_epfn)
>>>> +{
>>>> +    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
>>>> +        return 0;
>>>
>>>Hm, I think it ought to be possible to have no rsv area..?
>>
>> A reserved area must be provided to store the frametable and M2P table of
>> NVDIMM.
>
>Is this really "must" rather than just "should", i.e. can't you do
>without even if you prefer to have it?
>
>Jan
>

It's a must in this version, but I relax this requirement in the WIP
v2 patch, which falls back to ordinary RAM if no reserved pmem area is
indicated.

Haozhong


* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-12-12  8:38         ` Haozhong Zhang
@ 2016-12-12 14:44           ` Konrad Rzeszutek Wilk
  2016-12-13  1:08             ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-12-12 14:44 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Konrad Rzeszutek Wilk,
	Xiao Guangrong, xen-devel, Daniel De Graaf

On Mon, Dec 12, 2016 at 04:38:38PM +0800, Haozhong Zhang wrote:
> On 12/12/16 01:30 -0700, Jan Beulich wrote:
> > > > > On 12.12.16 at 05:16, <haozhong.zhang@intel.com> wrote:
> > > On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
> > > > > +static int pmem_add_check(unsigned long spfn, unsigned long epfn,
> > > > > +                          unsigned long rsv_spfn, unsigned long rsv_epfn,
> > > > > +                          unsigned long data_spfn, unsigned long data_epfn)
> > > > > +{
> > > > > +    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
> > > > > +        return 0;
> > > > 
> > > > Hm, I think it ought to be possible to have no rsv area..?
> > > 
> > > A reserved area must be provided to store the frametable and M2P table of
> > > NVDIMM.
> > 
> > Is this really "must" rather than just "should", i.e. can't you do
> > without even if you prefer to have it?
> > 
> > Jan
> > 
> 
> It's a must in this version, but I relax this requirement in the WIP
> v2 patch, which falls back to ordinary RAM if no reserved pmem area is
> indicated.

Awesome! Is your v2 patch somewhere available? I was thinking to continue
looking at your patchset later today but perhaps I should eyeball v2
instead?

Thanks!


* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-12-12 14:44           ` Konrad Rzeszutek Wilk
@ 2016-12-13  1:08             ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2016-12-13  1:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Xiao Guangrong, Andrew Cooper, xen-devel, Jan Beulich, Daniel De Graaf

On 12/12/16 09:44 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Dec 12, 2016 at 04:38:38PM +0800, Haozhong Zhang wrote:
>> On 12/12/16 01:30 -0700, Jan Beulich wrote:
>> > > > > On 12.12.16 at 05:16, <haozhong.zhang@intel.com> wrote:
>> > > On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:
>> > > > On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
>> > > > > +static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>> > > > > +                          unsigned long rsv_spfn, unsigned long rsv_epfn,
>> > > > > +                          unsigned long data_spfn, unsigned long data_epfn)
>> > > > > +{
>> > > > > +    if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
>> > > > > +        return 0;
>> > > >
>> > > > Hm, I think it ought to be possible to have no rsv area..?
>> > >
>> > > A reserved area must be provided to store the frametable and M2P table of
>> > > NVDIMM.
>> >
>> > Is this really "must" rather than just "should", i.e. can't you do
>> > without even if you prefer to have it?
>> >
>> > Jan
>> >
>>
>> It's a must in this version, but I relax this requirement in the WIP
>> v2 patch, which falls back to ordinary RAM if no reserved pmem area is
>> indicated.
>
>Awesome! Is your v2 patch somewhere available? I was thinking to continue
>looking at your patchset later today but perhaps I should eyeball v2
>instead?

It's not ready yet, but most of your comments also apply there. The
toolstack part in v2, especially the ACPI part, is almost the same so
far.

Thanks,
Haozhong


* Re: [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions
  2016-10-10  0:32 ` [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions Haozhong Zhang
  2016-10-11 19:13   ` Andrew Cooper
  2016-12-09 22:02   ` Konrad Rzeszutek Wilk
@ 2016-12-22 11:58   ` Jan Beulich
  2 siblings, 0 replies; 77+ messages in thread
From: Jan Beulich @ 2016-12-22 11:58 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Daniel De Graaf, Xiao Guangrong, xen-devel

>>> On 10.10.16 at 02:32, <haozhong.zhang@intel.com> wrote:
> --- /dev/null
> +++ b/xen/arch/x86/pmem.c

I wonder whether this should really be x86-specific: It's all ACPI
based, isn't it? I notice that you already place pmem.h that way.

> +static int is_included(unsigned long s1, unsigned long e1,
> +                       unsigned long s2, unsigned long e2)
> +{
> +    return s1 <= s2 && s2 < e2 && e2 <= e1;
> +}

Here and elsewhere, please properly use bool/true/false for
boolean values and (return) types.

> +static int is_overlaped(unsigned long s1, unsigned long e1,
> +                        unsigned long s2, unsigned long e2)
> +{
> +    return (s1 <= s2 && s2 < e1) || (s2 < s1 && s1 < e2);
> +}

is_overlapped(). Also, there's an asymmetry here, and conventionally
two comparisons suffice (s1 <= e2 && s2 <= e1, perhaps with the
<= exchanged for <).

> +static int pmem_list_add(unsigned long spfn, unsigned long epfn,
> +                         unsigned long rsv_spfn, unsigned long rsv_epfn,
> +                         unsigned long data_spfn, unsigned long data_epfn)
> +{
> +    struct list_head *cur;
> +    struct pmem *new_pmem;
> +    int ret = 0;
> +
> +    spin_lock(&pmem_list_lock);
> +
> +    list_for_each_prev(cur, &pmem_list)
> +    {
> +        struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
> +        unsigned long cur_spfn = cur_pmem->spfn;
> +        unsigned long cur_epfn = cur_pmem->epfn;
> +
> +        if ( (cur_spfn <= spfn && spfn < cur_epfn) ||
> +             (spfn <= cur_spfn && cur_spfn < epfn) )

is_overlapped()?

> +        {
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +
> +        if ( cur_spfn < spfn )
> +            break;
> +    }
> +
> +    new_pmem = xmalloc(struct pmem);

Please try to avoid allocations with a lock held, unless avoiding them
would significantly harm code readability.

> +int pmem_add(unsigned long spfn, unsigned long epfn,
> +             unsigned long rsv_spfn, unsigned long rsv_epfn,
> +             unsigned long data_spfn, unsigned long data_epfn)
> +{
> +    int ret;
> +
> +    if ( !pmem_add_check(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn) )
> +        return -EINVAL;
> +
> +    ret = pmem_setup(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
> +    if ( ret )
> +        goto out;
> +
> +    ret = iomem_deny_access(current->domain, rsv_spfn, rsv_epfn);
> +    if ( ret )
> +        goto out;
> +
> +    ret = pmem_list_add(spfn, epfn, rsv_spfn, rsv_epfn, data_spfn, data_epfn);
> +    if ( ret )
> +        goto out;
> +
> +    printk(XENLOG_INFO
> +           "pmem: pfns     0x%lx - 0x%lx\n"
> +           "      reserved 0x%lx - 0x%lx\n"
> +           "      data     0x%lx - 0x%lx\n",

%#lx

Jan



* Re: [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest
  2016-10-10  0:32 ` [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest Haozhong Zhang
  2016-12-09 22:22   ` Konrad Rzeszutek Wilk
@ 2016-12-22 12:19   ` Jan Beulich
  1 sibling, 0 replies; 77+ messages in thread
From: Jan Beulich @ 2016-12-22 12:19 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel, Wei Liu

>>> On 10.10.16 at 02:32, <haozhong.zhang@intel.com> wrote:
> +static int pmem_assign_pages(struct domain *d,
> +                             struct page_info *pg, unsigned int order)

What is the order parameter good for here, when the only caller
passes zero?

> +{
> +    int rc = 0;
> +    unsigned long i;
> +
> +    spin_lock(&d->pmem_lock);
> +
> +    if ( unlikely(d->is_dying) )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    for ( i = 0; i < (1 << order); i++ )
> +    {
> +        ASSERT(page_get_owner(&pg[i]) == NULL);
> +        ASSERT((pg[i].count_info & ~(PGC_allocated | 1)) == 0);

Why is PGC_allocated | 1 allowed to be set here?

> @@ -1329,6 +1330,36 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      }
>  #endif
>  
> +    case XENMEM_populate_pmemmap:
> +    {
> +        struct xen_pmemmap pmemmap;
> +        struct xen_pmemmap_args args;
> +
> +        if ( copy_from_guest(&pmemmap, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(pmemmap.domid);
> +        if ( !d )
> +            return -EINVAL;

I don't think you mean DOMID_SELF to be used here. And you're
lacking an XSM check in any event.

> +        args.domain = d;
> +        args.mfn = pmemmap.mfn;
> +        args.gpfn = pmemmap.gpfn;
> +        args.nr_mfns = pmemmap.nr_mfns;
> +        args.nr_done = start_extent;
> +        args.preempted = 0;
> +
> +        rc = pmem_populate(&args);

Please make sure you don't break the ARM build here.

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -646,7 +646,19 @@ struct xen_vnuma_topology_info {
>  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>  
> -/* Next available subop number is 28 */
> +#define XENMEM_populate_pmemmap 28
> +
> +struct xen_pmemmap {
> +    /* IN */
> +    domid_t domid;
> +    xen_pfn_t mfn;
> +    xen_pfn_t gpfn;
> +    unsigned int nr_mfns;
> +};

You'll clearly need to add compat mode argument translation code.

Also, may I suggest xen_pmem_map (and similarly elsewhere), to
avoid mistaking this for "physical memory map" or some such (i.e.
keeping pmem and map sufficiently separated)?

Also I think patch 5 needs to be merged here, or be moved ahead
of this one, to avoid the intermediate broken state (leaking all the
pmem pages assigned to a domain).

Jan



* Re: [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy
  2016-10-10  0:32 ` [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy Haozhong Zhang
  2016-12-09 22:27   ` Konrad Rzeszutek Wilk
@ 2016-12-22 12:22   ` Jan Beulich
  1 sibling, 0 replies; 77+ messages in thread
From: Jan Beulich @ 2016-12-22 12:22 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Andrew Cooper, Xiao Guangrong, xen-devel

>>> On 10.10.16 at 02:32, <haozhong.zhang@intel.com> wrote:
> --- a/xen/arch/x86/pmem.c
> +++ b/xen/arch/x86/pmem.c
> @@ -282,3 +282,44 @@ int pmem_populate(struct xen_pmemmap_args *args)
>      args->nr_done = i;
>      return rc;
>  }
> +
> +static int pmem_teardown_preemptible(struct domain *d, int *preempted)
> +{
> +    struct page_info *pg, *next;
> +    int rc = 0;
> +
> +    spin_lock(&d->pmem_lock);
> +
> +    page_list_for_each_safe (pg, next, &d->pmem_page_list )
> +    {
> +        BUG_ON(page_get_owner(pg) != d);
> +        BUG_ON(page_state_is(pg, free));
> +
> +        page_list_del(pg, &d->pmem_page_list);
> +        page_set_owner(pg, NULL);
> +        pg->count_info = (pg->count_info & ~PGC_count_mask) | PGC_state_free;
> +
> +        if ( preempted && hypercall_preempt_check() )
> +        {
> +            *preempted = 1;
> +            goto out;
> +        }
> +    }
> +
> + out:
> +    spin_unlock(&d->pmem_lock);
> +    return rc;
> +}
> +
> +int pmem_teardown(struct domain *d)
> +{
> +    int preempted = 0;
> +
> +    ASSERT(d->is_dying);
> +    ASSERT(d != current->domain);
> +
> +    if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
> +        return -EINVAL;

Why are these needed?

> +    return pmem_teardown_preemptible(d, &preempted);

I don't see what you have "preempted" for here. You want the
function to return -ERESTART in the preemption case. And I also
don't see what the helper function is good for - the code can
easily live right in this function.

Jan



* Re: [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model
  2016-10-10  0:32 ` [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model Haozhong Zhang
@ 2017-01-27 20:44   ` Konrad Rzeszutek Wilk
  2017-02-08  1:39     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 20:44 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:25AM +0800, Haozhong Zhang wrote:
> One guest page is reserved for the device model to place guest ACPI. The

guest ACPI what? ACPI SSDT? MADT?

Also why one page? What if there is a need for more than one page?

You add HVM_XS_DM_ACPI_LENGTH, which makes me think this is accounted
for?

> base address and size of the reserved area are passed to the device
> model via XenStore keys hvmloader/dm-acpi/{address, length}.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxc/include/xc_dom.h            |  1 +
>  tools/libxc/xc_dom_x86.c                |  7 +++++++
>  tools/libxl/libxl_dom.c                 | 25 +++++++++++++++++++++++++
>  xen/include/public/hvm/hvm_xs_strings.h | 11 +++++++++++
>  4 files changed, 44 insertions(+)
> 
> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> index 608cbc2..19d65cd 100644
> --- a/tools/libxc/include/xc_dom.h
> +++ b/tools/libxc/include/xc_dom.h
> @@ -98,6 +98,7 @@ struct xc_dom_image {
>      xen_pfn_t xenstore_pfn;
>      xen_pfn_t shared_info_pfn;
>      xen_pfn_t bootstack_pfn;
> +    xen_pfn_t dm_acpi_pfn;

Perhaps a pointer to a variable-size array?

 xen_pfn_t *dm_acpi_pfns;
 unsigned int dm_acpi_nr;

?
>      xen_pfn_t pfn_alloc_end;
>      xen_vaddr_t virt_alloc_end;
>      xen_vaddr_t bsd_symtab_start;
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index 0eab8a7..47f14a1 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -674,6 +674,13 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
>                           ioreq_server_pfn(0));
>          xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
>                           NR_IOREQ_SERVER_PAGES);
> +
> +        dom->dm_acpi_pfn = xc_dom_alloc_page(dom, "DM ACPI");
> +        if ( dom->dm_acpi_pfn == INVALID_PFN )
> +        {
> +            DOMPRINTF("Could not allocate page for device model ACPI.");
> +            goto error_out;
> +        }
>      }
>  
>      rc = xc_dom_alloc_segment(dom, &dom->start_info_seg,
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index d519c8d..f0a1d97 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -865,6 +865,31 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
>              goto err;
>      }
>  
> +    if (dom->dm_acpi_pfn) {
> +        uint64_t guest_addr_out = dom->dm_acpi_pfn * XC_DOM_PAGE_SIZE(dom);
> +
> +        if (guest_addr_out >= 0x100000000ULL) {
> +            LOG(ERROR,
> +                "Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G",
> +                guest_addr_out);
> +            goto err;
> +        }
> +
> +        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid);
> +
> +        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
> +                               guest_addr_out);
> +        if (ret)
> +            goto err;
> +
> +        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid);
> +
> +        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
> +                               (uint64_t) XC_DOM_PAGE_SIZE(dom));
I don't think you need the space here:      ^
> +        if (ret)
> +            goto err;
> +    }
> +
>      return 0;
>  
>  err:
> diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
> index 146b0b0..f44f71f 100644
> --- a/xen/include/public/hvm/hvm_xs_strings.h
> +++ b/xen/include/public/hvm/hvm_xs_strings.h
> @@ -79,4 +79,15 @@
>   */
>  #define HVM_XS_OEM_STRINGS             "bios-strings/oem-%d"
>  
> +/* Follows are XenStore keys for DM ACPI (ACPI built by device model,
> + * e.g. QEMU).
> + *
> + * A reserved area of guest physical memory is used to pass DM
> + * ACPI. Values of following two keys specify the base address and
> + * length (in bytes) of the reserved area.
> + */
> +#define HVM_XS_DM_ACPI_ROOT              "hvmloader/dm-acpi"
> +#define HVM_XS_DM_ACPI_ADDRESS           HVM_XS_DM_ACPI_ROOT"/address"
> +#define HVM_XS_DM_ACPI_LENGTH            HVM_XS_DM_ACPI_ROOT"/length"
> +
>  #endif /* __XEN_PUBLIC_HVM_HVM_XS_STRINGS_H__ */
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address
  2016-10-10  0:32 ` [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address Haozhong Zhang
@ 2017-01-27 20:46   ` Konrad Rzeszutek Wilk
  2017-02-08  1:42     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 20:46 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On Mon, Oct 10, 2016 at 08:32:26AM +0800, Haozhong Zhang wrote:
> This callback is used when libacpi needs to in-place access ACPI built
> by the device model, whose address is specified in the physical address.

May I recommend you write:

This callback is used when libacpi needs to access ACPI blobs
built by the device-model. The address is provided as a physical
address on XenBus (see patch titled: XYZ).

?


* Re: [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback
  2016-10-10  0:32 ` [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback Haozhong Zhang
@ 2017-01-27 20:58   ` Konrad Rzeszutek Wilk
  2017-02-08  2:12     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 20:58 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On Mon, Oct 10, 2016 at 08:32:27AM +0800, Haozhong Zhang wrote:
> Expose the minimal allocation unit and the minimal alignment used by the
> memory allocator, so that certain ACPI code (e.g. the AML builder added
> later) can get contiguous memory allocated by multiple calls to

s/later/in patch titled: "XYZ"/

> acpi_ctxt.mem_ops.alloc().

Is this contiguous memory virtual or physical? You may want to be
specific.

And you may want to say that acpi_build_tables uses that by default,
which is why you have the value of sixteen.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/firmware/hvmloader/util.c | 2 ++
>  tools/libacpi/libacpi.h         | 3 +++
>  tools/libxl/libxl_x86_acpi.c    | 2 ++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 1fe8dcc..504ae6a 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -972,6 +972,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>      ctxt.mem_ops.free = acpi_mem_free;
>      ctxt.mem_ops.v2p = acpi_v2p;
>      ctxt.mem_ops.p2v = acpi_p2v;
> +    ctxt.min_alloc_unit = PAGE_SIZE;

Really? That seems excessive, as acpi_build_tables() calls
ctxt->mem_ops.alloc repeatedly:
$ grep "ctxt->mem_ops.alloc" * | wc -l
20

Would that imply 20 pages?

> +    ctxt.min_alloc_align = 16;

Does that mean it is sixteen-page alignment? Or 16-byte alignment?

If bytes, perhaps you want to change the name to
'min_alloc_byte_align'?

>  
>      acpi_build_tables(&ctxt, config);
>  
> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
> index 62e90ab..0fb16e7 100644
> --- a/tools/libacpi/libacpi.h
> +++ b/tools/libacpi/libacpi.h
> @@ -47,6 +47,9 @@ struct acpi_ctxt {
>          unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
>          void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
>      } mem_ops;
> +
> +    uint32_t min_alloc_unit;
> +    uint32_t min_alloc_align;
>  };
>  
>  struct acpi_config {
> diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
> index aa5b83d..baf60ac 100644
> --- a/tools/libxl/libxl_x86_acpi.c
> +++ b/tools/libxl/libxl_x86_acpi.c
> @@ -187,6 +187,8 @@ int libxl__dom_load_acpi(libxl__gc *gc,
>      libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
>      libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
>      libxl_ctxt.c.mem_ops.free = acpi_mem_free;
> +    libxl_ctxt.c.min_alloc_unit = libxl_ctxt.page_size;
> +    libxl_ctxt.c.min_alloc_align = 16;
>  
>      rc = init_acpi_config(gc, dom, b_info, &config);
>      if (rc) {
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore
  2016-10-10  0:32 ` [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
@ 2017-01-27 21:10   ` Konrad Rzeszutek Wilk
  2017-02-08  2:19     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 21:10 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On Mon, Oct 10, 2016 at 08:32:28AM +0800, Haozhong Zhang wrote:
> libacpi needs to access information placed in XenStore in order to load
> ACPI built by the device model.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/firmware/hvmloader/util.c   | 50 +++++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h   |  2 ++
>  tools/firmware/hvmloader/xenbus.c | 20 ++++++++++++++++
>  tools/libacpi/libacpi.h           | 10 ++++++++
>  tools/libxl/libxl_x86_acpi.c      | 24 +++++++++++++++++++
>  5 files changed, 106 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 504ae6a..dba954a 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -888,6 +888,51 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
>      /* ACPI builder currently doesn't free memory so this is just a stub */
>  }
>  
> +static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
> +{
> +    return xenstore_read(path, NULL);
> +}
> +
> +static int acpi_xs_write(struct acpi_ctxt *ctxt,
> +                         const char *path, const char *value)
> +{
> +    return xenstore_write(path, value);
> +}
> +
> +static unsigned int count_strings(const char *strings, unsigned int len)
> +{
> +    const char *p;
> +    unsigned int n;
> +
> +    for ( p = strings, n = 0; p < strings + len; p++ )
> +        if ( *p == '\0' )
> +            n++;
> +
> +    return n;
> +}
> +
> +static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
> +                                const char *path, unsigned int *num)
> +{
> +    const char *strings;
> +    char *s, *p, **ret;
> +    unsigned int len, n;
> +
> +    strings = xenstore_directory(path, &len, NULL);
> +    if ( !strings )
> +        return NULL;
> +
> +    n = count_strings(strings, len);
> +    ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(char *) + len, 0);

sizeof(*ret)

But you may also check ret against NULL before you memcpy data in there.
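A minimal stand-alone sketch of both points (illustrative only; plain
malloc() stands in for ctxt->mem_ops.alloc(), and the function name is
made up):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of acpi_xs_directory() with the two review points applied:
 * sizeof(*ret) instead of sizeof(char *), and a NULL check before
 * the memcpy(). */
static char **xs_directory_sketch(const char *strings, unsigned int len,
                                  unsigned int n, unsigned int *num)
{
    char *s, *p, **ret;

    ret = malloc(n * sizeof(*ret) + len);
    if ( !ret )                 /* bail out before touching the buffer */
        return NULL;

    memcpy(&ret[n], strings, len);

    s = (char *)&ret[n];
    for ( p = s, *num = 0; p < s + len; p += strlen(p) + 1 )
        ret[(*num)++] = p;

    return ret;
}
```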


> +    memcpy(&ret[n], strings, len);
> +
> +    s = (char *)&ret[n];
> +    for ( p = s, *num = 0; p < s + len; p+= strlen(p) + 1 )

Perhaps add a space before += ?
> +        ret[(*num)++] = p;
> +
> +    return ret;
> +}
> +
>  static uint8_t acpi_lapic_id(unsigned cpu)
>  {
>      return LAPIC_ID(cpu);
> @@ -975,6 +1020,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>      ctxt.min_alloc_unit = PAGE_SIZE;
>      ctxt.min_alloc_align = 16;
>  
> +    ctxt.xs_ops.read = acpi_xs_read;
> +    ctxt.xs_ops.write = acpi_xs_write;
> +    ctxt.xs_ops.directory = acpi_xs_directory;
> +    ctxt.xs_opaque = NULL;
> +
>      acpi_build_tables(&ctxt, config);
>  
>      hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 6a50dae..9443673 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -225,6 +225,8 @@ const char *xenstore_read(const char *path, const char *default_resp);
>   */
>  int xenstore_write(const char *path, const char *value);
>  
> +const char *xenstore_directory(const char *path, uint32_t *len,
> +                               const char *default_resp);
>  
>  /* Get a HVM param.
>   */
> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
> index 448157d..70bdadd 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -296,6 +296,26 @@ int xenstore_write(const char *path, const char *value)
>      return ret;
>  }
>  
> +const char *xenstore_directory(const char *path, uint32_t *len,
> +                               const char *default_resp)
> +{
> +    uint32_t type = 0;
> +    const char *answer = NULL;
> +
> +    xenbus_send(XS_DIRECTORY,
> +                path, strlen(path),
> +                "", 1, /* nul separator */
> +                NULL, 0);
> +
> +    if ( xenbus_recv(len, &answer, &type) || (type != XS_DIRECTORY) )
> +        answer = NULL;
> +
> +    if ( (default_resp != NULL) && ((answer == NULL) || (*answer == '\0')) )
> +        answer = default_resp;
> +
> +    return answer;

This function looks very similar to xenstore_read. Could xenstore_read
become __xenstore_read with an extra argument (type), and then the
new xenstore_read along with xenstore_directory would call into it?
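A sketch of that refactoring, with stub xenbus_send()/xenbus_recv() so
the shape compiles in isolation (the stubs and the XS_* values here are
illustrative, not hvmloader's real protocol code):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for hvmloader's xenbus helpers; the real
 * ones talk to the XenStore ring. */
#define XS_DIRECTORY 1
#define XS_READ      2

static uint32_t last_req;

static void xenbus_send(uint32_t req, const char *path, uint32_t path_len,
                        const char *sep, uint32_t sep_len,
                        const char *extra, uint32_t extra_len)
{
    (void)path; (void)path_len; (void)sep; (void)sep_len;
    (void)extra; (void)extra_len;
    last_req = req;             /* remember the request type */
}

static int xenbus_recv(uint32_t *len, const char **answer, uint32_t *type)
{
    static const char canned[] = "value";
    *len = sizeof(canned);
    *answer = canned;
    *type = last_req;           /* echo the request type back */
    return 0;
}

/* The suggested common helper: the request type becomes a parameter. */
static const char *__xenstore_read(uint32_t req, const char *path,
                                   uint32_t *len, const char *default_resp)
{
    uint32_t type = 0;
    const char *answer = NULL;

    xenbus_send(req, path, strlen(path), "", 1, NULL, 0);

    if ( xenbus_recv(len, &answer, &type) || (type != req) )
        answer = NULL;

    if ( (default_resp != NULL) && ((answer == NULL) || (*answer == '\0')) )
        answer = default_resp;

    return answer;
}

const char *xenstore_read(const char *path, const char *default_resp)
{
    uint32_t len;
    return __xenstore_read(XS_READ, path, &len, default_resp);
}

const char *xenstore_directory(const char *path, uint32_t *len,
                               const char *default_resp)
{
    return __xenstore_read(XS_DIRECTORY, path, len, default_resp);
}
```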
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
> index 0fb16e7..12cafd8 100644
> --- a/tools/libacpi/libacpi.h
> +++ b/tools/libacpi/libacpi.h
> @@ -50,6 +50,16 @@ struct acpi_ctxt {
>  
>      uint32_t min_alloc_unit;
>      uint32_t min_alloc_align;
> +
> +    struct acpi_xs_ops {
> +        const char *(*read)(struct acpi_ctxt *ctxt, const char *path);
> +        int (*write)(struct acpi_ctxt *ctxt,
> +                     const char *path, const char *value);
> +        char **(*directory)(struct acpi_ctxt *ctxt,
> +                            const char *path, unsigned int *num);
> +    } xs_ops;
> +
> +    void *xs_opaque;
>  };
>  
>  struct acpi_config {
> diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
> index baf60ac..1afd2e3 100644
> --- a/tools/libxl/libxl_x86_acpi.c
> +++ b/tools/libxl/libxl_x86_acpi.c
> @@ -93,6 +93,25 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
>  {
>  }
>  
> +static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
> +{
> +    return libxl__xs_read((libxl__gc *)ctxt->xs_opaque, XBT_NULL, path);
> +}
> +
> +static int acpi_xs_write(struct acpi_ctxt *ctxt,
> +                         const char *path, const char *value)
> +{
> +    return libxl__xs_write_checked((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
> +                                   path, value);
> +}
> +
> +static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
> +                                const char *path, unsigned int *num)
> +{
> +    return libxl__xs_directory((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
> +                               path, num);
> +}
> +
>  static uint8_t acpi_lapic_id(unsigned cpu)
>  {
>      return cpu * 2;
> @@ -190,6 +209,11 @@ int libxl__dom_load_acpi(libxl__gc *gc,
>      libxl_ctxt.c.min_alloc_unit = libxl_ctxt.page_size;
>      libxl_ctxt.c.min_alloc_align = 16;
>  
> +    libxl_ctxt.c.xs_ops.read = acpi_xs_read;
> +    libxl_ctxt.c.xs_ops.write = acpi_xs_write;
> +    libxl_ctxt.c.xs_ops.directory = acpi_xs_directory;
> +    libxl_ctxt.c.xs_opaque = gc;
> +
>      rc = init_acpi_config(gc, dom, b_info, &config);
>      if (rc) {
>          LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder
  2016-10-10  0:32 ` [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder Haozhong Zhang
@ 2017-01-27 21:19   ` Konrad Rzeszutek Wilk
  2017-02-08  2:33     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 21:19 UTC (permalink / raw)
  To: Haozhong Zhang, ross.philipson
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On Mon, Oct 10, 2016 at 08:32:29AM +0800, Haozhong Zhang wrote:
> It is used by libacpi to generate SSDTs from ACPI namespace devices
> built by the device model.

Would it make sense to include a link to a document outlining the
AML encoding? Or perhaps even just include a simple example
of ASL and what the resulting AML code should look like?

And maybe what subset of AML this implements?
(with some simple ASL examples?)

Also adding Ross who wrote an AML builder as well.
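For illustration, here is a hand-assembled example (mine, not from the
patch) of a trivial ASL snippet and the AML bytes it encodes to,
covering roughly the subset this builder targets: ScopeOp, DeviceOp,
single-byte PkgLength, and 4-character NameSegs. "VDEV" is a made-up
device name.

```c
/* ASL:
 *     Scope (\_SB) { Device (VDEV) { } }
 *
 * Hand-encoded AML per the ACPI spec's PkgLength/NameString rules: */
static const unsigned char aml_example[] = {
    0x10, 0x0D,                   /* ScopeOp, PkgLength = 13 (counts itself) */
    0x5C, '_', 'S', 'B', '_',     /* RootChar '\' + NameSeg "_SB_" */
    0x5B, 0x82, 0x05,             /* ExtOpPrefix, DeviceOp, PkgLength = 5 */
    'V', 'D', 'E', 'V',           /* NameSeg "VDEV", empty device body */
};
```

A single-byte PkgLength encodes lengths up to 63 and includes itself,
which is why the Scope's length byte is 13 for a 14-byte construct.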

> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/firmware/hvmloader/Makefile |   3 +-
>  tools/libacpi/aml_build.c         | 254 ++++++++++++++++++++++++++++++++++++++
>  tools/libacpi/aml_build.h         |  83 +++++++++++++
>  tools/libxl/Makefile              |   3 +-
>  4 files changed, 341 insertions(+), 2 deletions(-)
>  create mode 100644 tools/libacpi/aml_build.c
>  create mode 100644 tools/libacpi/aml_build.h
> 
> diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
> index 77d7551..cf0dac3 100644
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -79,11 +79,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>  
>  ACPI_PATH = ../../libacpi
>  ACPI_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
> -ACPI_OBJS = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o
> +ACPI_OBJS = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o aml_build.o
>  $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
>  CFLAGS += -I$(ACPI_PATH)
>  vpath build.c $(ACPI_PATH)
>  vpath static_tables.c $(ACPI_PATH)
> +vpath aml_build.c $(ACPI_PATH)
>  OBJS += $(ACPI_OBJS)
>  
>  hvmloader: $(OBJS)
> diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c
> new file mode 100644
> index 0000000..b6f23f4
> --- /dev/null
> +++ b/tools/libacpi/aml_build.c
> @@ -0,0 +1,254 @@
> +/*
> + * tools/libacpi/aml_build.c
> + *
> + * Copyright (c) 2016, Intel Corporation.

.. now 2017
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by

The libacpi is LGPL.

Could this be licensed as LGPL please?

> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
> + */
> +
> +#include LIBACPI_STDUTILS
> +#include "libacpi.h"
> +#include "aml_build.h"
> +
> +#define AML_OP_SCOPE     0x10
> +#define AML_OP_EXT       0x5B
> +#define AML_OP_DEVICE    0x82
> +
> +#define ACPI_NAMESEG_LEN 4
> +
> +struct aml_build_alloctor {
> +    struct acpi_ctxt *ctxt;
> +    uint8_t *buf;
> +    uint32_t capacity;
> +    uint32_t used;
> +};
> +static struct aml_build_alloctor alloc;
> +
> +enum { ALLOC_OVERFLOW, ALLOC_NOT_NEEDED, ALLOC_NEEDED };

Why not make this a named enum?
> +
> +static int alloc_needed(uint32_t size)
> +{
> +    uint32_t len = alloc.used + size;
> +
> +    if ( len < alloc.used )
> +        return ALLOC_OVERFLOW;
> +    else if ( len <= alloc.capacity )
> +        return ALLOC_NOT_NEEDED;
> +    else
> +        return ALLOC_NEEDED;
> +}
> +
> +static uint8_t *aml_buf_alloc(uint32_t size)
> +{
> +    int needed = alloc_needed(size);

And then this can be an enum? Or alternatively make this unsigned int.
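Something like this, perhaps (a sketch of the named-enum variant, with
the allocator state passed in as parameters so it compiles stand-alone;
the patch keeps that state in the global 'alloc'):

```c
#include <stdint.h>

/* Named-enum version of the tri-state result, as suggested. */
enum alloc_status { ALLOC_OVERFLOW, ALLOC_NOT_NEEDED, ALLOC_NEEDED };

static enum alloc_status alloc_needed(uint32_t used, uint32_t capacity,
                                      uint32_t size)
{
    uint32_t len = used + size;

    if ( len < used )           /* unsigned wrap-around */
        return ALLOC_OVERFLOW;
    else if ( len <= capacity )
        return ALLOC_NOT_NEEDED;
    else
        return ALLOC_NEEDED;
}
```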

> +    uint8_t *buf = NULL;
> +    struct acpi_ctxt *ctxt = alloc.ctxt;
> +    uint32_t alloc_size, alloc_align = ctxt->min_alloc_align;
> +
> +    switch ( needed )
> +    {
> +    case ALLOC_OVERFLOW:
> +        break;
> +
> +    case ALLOC_NEEDED:
> +        alloc_size = (size + alloc_align) & ~(alloc_align - 1);

Perhaps multiply by two so we have more wiggle room?

> +        buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align);
> +        if ( !buf )
> +            break;
> +        if ( alloc.buf + alloc.capacity != buf )
> +        {
> +            buf = NULL;
> +            break;
> +        }
> +        alloc.capacity += alloc_size;
> +        alloc.used += size;
> +        break;
> +
> +    case ALLOC_NOT_NEEDED:
> +        buf = alloc.buf + alloc.used;
> +        alloc.used += size;
> +        break;
> +
> +    default:
> +        break;
> +    }
> +
> +    return buf;
> +}
> +
> +static uint32_t get_package_length(uint8_t *pkg)
> +{
> +    uint32_t len;
> +
> +    len = pkg - alloc.buf;
> +    len = alloc.used - len;
> +
> +    return len;
> +}
> +
> +static void build_prepend_byte(uint8_t *buf, uint8_t byte)
> +{
> +    uint32_t len;
> +
> +    len = buf - alloc.buf;
> +    len = alloc.used - len;
> +
> +    aml_buf_alloc(sizeof(uint8_t));
> +    if ( len )
> +        memmove(buf + 1, buf, len);
> +    buf[0] = byte;
> +}
> +
> +/*
> + * XXX: names of multiple segments (e.g. X.Y.Z) are not supported
> + */
> +static void build_prepend_name(uint8_t *buf, const char *name)
> +{
> +    uint8_t *p = buf;
> +    const char *s = name;
> +    uint32_t len, name_len;
> +
> +    while ( *s == '\\' || *s == '^' )
> +    {
> +        build_prepend_byte(p, (uint8_t) *s);
> +        ++p;
> +        ++s;
> +    }
> +
> +    if ( !*s )
> +    {
> +        build_prepend_byte(p, 0x00);
> +        return;
> +    }
> +
> +    len = p - alloc.buf;
> +    len = alloc.used - len;
> +    name_len = strlen(s);
> +    ASSERT(strlen(s) <= ACPI_NAMESEG_LEN);
> +
> +    aml_buf_alloc(ACPI_NAMESEG_LEN);
> +    if ( len )
> +        memmove(p + ACPI_NAMESEG_LEN, p, len);
> +    memcpy(p, s, name_len);
> +    memcpy(p + name_len, "____", ACPI_NAMESEG_LEN - name_len);
> +}
> +
> +enum {
> +    PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
> +    PACKAGE_LENGTH_2BYTE_SHIFT = 4,
> +    PACKAGE_LENGTH_3BYTE_SHIFT = 12,
> +    PACKAGE_LENGTH_4BYTE_SHIFT = 20,
> +};
> +
> +static void build_prepend_package_length(uint8_t *pkg, uint32_t length)
> +{
> +    uint8_t byte;
> +    unsigned length_bytes;
> +
> +    if ( length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT) )
> +        length_bytes = 1;
> +    else if ( length + 2 < (1 << PACKAGE_LENGTH_3BYTE_SHIFT) )
> +        length_bytes = 2;
> +    else if ( length + 3 < (1 << PACKAGE_LENGTH_4BYTE_SHIFT) )
> +        length_bytes = 3;
> +    else
> +        length_bytes = 4;
> +
> +    length += length_bytes;
> +
> +    switch ( length_bytes )
> +    {
> +    case 1:
> +        byte = length;
> +        build_prepend_byte(pkg, byte);
> +        return;
> +    case 4:
> +        byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
> +        build_prepend_byte(pkg, byte);
> +        length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
> +        /* fall through */
> +    case 3:
> +        byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
> +        build_prepend_byte(pkg, byte);
> +        length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
> +        /* fall through */
> +    case 2:
> +        byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
> +        build_prepend_byte(pkg, byte);
> +        length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
> +        /* fall through */
> +    }
> +    /*
> +     * Most significant two bits of byte zero indicate how many following bytes
> +     * are in PkgLength encoding.
> +     */
> +    byte = ((length_bytes - 1) << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
> +    build_prepend_byte(pkg, byte);
> +}
> +
> +static void build_prepend_package(uint8_t *buf, uint8_t op)
> +{
> +    uint32_t length = get_package_length(buf);
> +    build_prepend_package_length(buf, length);
> +    build_prepend_byte(buf, op);
> +}
> +
> +static void build_prepend_ext_packge(uint8_t *buf, uint8_t op)
> +{
> +    build_prepend_package(buf, op);
> +    build_prepend_byte(buf, AML_OP_EXT);
> +}
> +
> +void *aml_build_begin(struct acpi_ctxt *ctxt)
> +{
> +    alloc.ctxt = ctxt;
> +    alloc.buf = ctxt->mem_ops.alloc(ctxt,
> +                                    ctxt->min_alloc_unit, ctxt->min_alloc_align);
> +    alloc.capacity = ctxt->min_alloc_unit;
> +    alloc.used = 0;
> +    return alloc.buf;
> +}
> +
> +uint32_t aml_build_end(void)
> +{
> +    return alloc.used;
> +}
> +
> +void aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t blob_length)
> +{
> +    uint32_t len;
> +
> +    len = buf - alloc.buf;
> +    len = alloc.used - len;
> +
> +    aml_buf_alloc(blob_length);
> +    if ( len )
> +        memmove(buf + blob_length, buf, len);
> +
> +    memcpy(buf, blob, blob_length);
> +}
> +
> +void aml_prepend_device(uint8_t *buf, const char *name)
> +{
> +    build_prepend_name(buf, name);
> +    build_prepend_ext_packge(buf, AML_OP_DEVICE);
> +}
> +
> +void aml_prepend_scope(uint8_t *buf, const char *name)
> +{
> +    build_prepend_name(buf, name);
> +    build_prepend_package(buf, AML_OP_SCOPE);
> +}
> diff --git a/tools/libacpi/aml_build.h b/tools/libacpi/aml_build.h
> new file mode 100644
> index 0000000..ed68f66
> --- /dev/null
> +++ b/tools/libacpi/aml_build.h
> @@ -0,0 +1,83 @@
> +/*
> + * tools/libacpi/aml_build.h
> + *
> + * Copyright (c) 2016, Intel Corporation.

Now 2017.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.

Again, this needs to be the LGPL license.

> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
> + */
> +
> +#ifndef _AML_BUILD_H_
> +#define _AML_BUILD_H_
> +
> +#include <stdint.h>
> +#include "libacpi.h"
> +
> +/*
> + * NB: All aml_prepend_* calls, which build AML code in one ACPI
> + *     table, should be placed between a pair of calls to
> + *     aml_build_begin() and aml_build_end().
> + */
> +
> +/**
> + * Reset the AML builder and begin a new round of building.
> + *
> + * Parameters:
> + *   @ctxt: ACPI context used by the AML builder
> + *
> + * Returns:
> + *   a pointer to the builder buffer where the AML code will be stored
> + */
> +void *aml_build_begin(struct acpi_ctxt *ctxt);
> +
> +/**
> + * Mark the end of a round of AML building.
> + *
> + * Returns:
> + *  the number of bytes in the builder buffer built in this round
> + */
> +uint32_t aml_build_end(void);
> +
> +/**
> + * Prepend a blob, which can contain arbitrary content, to the builder buffer.
> + *
> + * Parameters:
> + *   @buf:    pointer to the builder buffer
> + *   @blob:   pointer to the blob
> + *   @length: the number of bytes in the blob
> + */
> +void aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t length);
> +
> +/**
> + * Prepend an AML device structure to the builder buffer. The existing
> + * data in the builder buffer is included in the AML device.
> + *
> + * Parameters:
> + *   @buf:  pointer to the builder buffer
> + *   @name: the name of the device
> + */
> +void aml_prepend_device(uint8_t *buf, const char *name);
> +
> +/**
> + * Prepend an AML scope structure to the builder buffer. The existing
> + * data in the builder buffer is included in the AML scope.
> + *
> + * Parameters:
> + *   @buf:  pointer to the builder buffer
> + *   @name: the name of the scope
> + */
> +void aml_prepend_scope(uint8_t *buf, const char *name);
> +
> +#endif /* _AML_BUILD_H_ */
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index c4e4117..a904927 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -77,11 +77,12 @@ endif
>  
>  ACPI_PATH  = $(XEN_ROOT)/tools/libacpi
>  ACPI_FILES = dsdt_pvh.c
> -ACPI_OBJS  = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o
> +ACPI_OBJS  = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o aml_build.o
>  $(ACPI_FILES): acpi
>  $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libxl_x86_acpi.h\"
>  vpath build.c $(ACPI_PATH)/
>  vpath static_tables.c $(ACPI_PATH)/
> +vpath aml_build.c $(ACPI_PATH)/
>  LIBXL_OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
>  
>  .PHONY: acpi
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model
  2016-10-10  0:32 ` [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model Haozhong Zhang
@ 2017-01-27 21:40   ` Konrad Rzeszutek Wilk
  2017-02-08  5:38     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 21:40 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On Mon, Oct 10, 2016 at 08:32:30AM +0800, Haozhong Zhang wrote:
> ACPI tables built by the device model, whose signatures do not
> conflict with tables built by Xen (except SSDT), are loaded after ACPI
> tables built by Xen.
> 
> ACPI namespace devices built by the device model, whose names do not
> conflict with devices built by Xen, are assembled and placed in SSDTs
> after ACPI tables built by Xen.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/firmware/hvmloader/util.c |  12 +++
>  tools/libacpi/acpi2_0.h         |   2 +
>  tools/libacpi/build.c           | 216 ++++++++++++++++++++++++++++++++++++++++
>  tools/libacpi/libacpi.h         |   5 +
>  4 files changed, 235 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index dba954a..e6530cd 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -998,6 +998,18 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>      if ( !strncmp(xenstore_read("platform/acpi_s4", "1"), "1", 1)  )
>          config->table_flags |= ACPI_HAS_SSDT_S4;
>  
> +    s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL);
> +    if ( s )
> +    {
> +        config->dm.addr = strtoll(s, NULL, 0);
> +
> +        s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL);
> +        if ( s )
> +            config->dm.length = strtoll(s, NULL, 0);
> +        else
> +            config->dm.addr = 0;
> +    }
> +
>      config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC | ACPI_HAS_WAET);
>  
>      config->tis_hdr = (uint16_t *)ACPI_TIS_HDR_ADDRESS;
> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
> index 775eb7a..7414470 100644
> --- a/tools/libacpi/acpi2_0.h
> +++ b/tools/libacpi/acpi2_0.h
> @@ -430,6 +430,7 @@ struct acpi_20_slit {
>  #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
>  #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
>  #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
> +#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
>  
>  /*
>   * Table revision numbers.
> @@ -445,6 +446,7 @@ struct acpi_20_slit {
>  #define ACPI_1_0_FADT_REVISION 0x01
>  #define ACPI_2_0_SRAT_REVISION 0x01
>  #define ACPI_2_0_SLIT_REVISION 0x01
> +#define ACPI_2_0_SSDT_REVISION 0x02
>  
>  #pragma pack ()
>  
> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
> index 47dae01..829a365 100644
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -20,6 +20,7 @@
>  #include "ssdt_s4.h"
>  #include "ssdt_tpm.h"
>  #include "ssdt_pm.h"
> +#include "aml_build.h"
>  #include <xen/hvm/hvm_info_table.h>
>  #include <xen/hvm/hvm_xs_strings.h>
>  #include <xen/hvm/params.h>
> @@ -55,6 +56,34 @@ struct acpi_info {
>      uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
>  };
>  
> +#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
> +#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML definition of an ACPI namespace device */
> +
> +/* ACPI tables of following signatures should not appear in DM ACPI */

It would be good to have some form of build-time check against
this list..
> +static const uint64_t dm_acpi_signature_blacklist[] = {
> +    ACPI_2_0_RSDP_SIGNATURE,
> +    ACPI_2_0_FACS_SIGNATURE,
> +    ACPI_2_0_FADT_SIGNATURE,
> +    ACPI_2_0_MADT_SIGNATURE,
> +    ACPI_2_0_RSDT_SIGNATURE,
> +    ACPI_2_0_XSDT_SIGNATURE,
> +    ACPI_2_0_TCPA_SIGNATURE,
> +    ACPI_2_0_HPET_SIGNATURE,
> +    ACPI_2_0_WAET_SIGNATURE,
> +    ACPI_2_0_SRAT_SIGNATURE,
> +    ACPI_2_0_SLIT_SIGNATURE,
> +};
> +
> +/* ACPI namespace devices of following names should not appear in DM ACPI */
> +static const char *dm_acpi_devname_blacklist[] = {
> +    "MEM0",
> +    "PCI0",
> +    "AC",
> +    "BAT0",
> +    "BAT1",
> +    "TPM",

.. and this one.

But I am not even sure how one would do that?

Perhaps add a big warning:

"Make sure to add your table name if you this code (libacpi) is
constructing it. "?

Or maybe have some 'register_acpi_table' function which will expand
this blacklist?
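A rough sketch of that 'register_acpi_table' idea (the names and the
fixed-size array are mine, not from the patch): each table libacpi
builds registers its signature, and DM-provided blobs are then checked
against that list instead of a hard-coded one.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BUILT_TABLES 16     /* arbitrary cap for the sketch */

static uint64_t built_sigs[MAX_BUILT_TABLES];
static unsigned int nr_built_sigs;

/* Called wherever libacpi emits a table it owns. */
static void register_acpi_table(uint64_t sig)
{
    if ( nr_built_sigs < MAX_BUILT_TABLES )
        built_sigs[nr_built_sigs++] = sig;
}

/* A DM blob collides iff libacpi already built that signature. */
static bool check_signature_collision(uint64_t sig)
{
    unsigned int i;

    for ( i = 0; i < nr_built_sigs; i++ )
        if ( sig == built_sigs[i] )
            return true;
    return false;
}
```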

> +};
> +
>  static void set_checksum(
>      void *table, uint32_t checksum_offset, uint32_t length)
>  {
> @@ -339,6 +368,190 @@ static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
>      return nr_added;
>  }
>  
> +#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))

That may want to go in libacpi.h?
> +
> +static int check_signature_collision(uint64_t sig)

bool
> +{
> +    int i;

unsigned int

> +    for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ )
> +    {
> +        if ( sig == dm_acpi_signature_blacklist[i] )
> +            return 1;

return true

> +    }
> +    return 0;
> +}
> +
> +static int check_devname_collision(const char *name)

bool
> +{
> +    int i;

unsigned int

> +    for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ )
> +    {
> +        if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) )

That 4 could be a #define
> +            return 1;
> +    }
> +    return 0;
> +}
> +
> +static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt,
> +                                            const char *name, const char *key)
> +{
> +#define DM_ACPI_BLOB_PATH_MAX_LENGTH 30
> +    char path[DM_ACPI_BLOB_PATH_MAX_LENGTH];
> +    snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_ACPI_ROOT"/%s/%s",
> +             name, key);
> +    return ctxt->xs_ops.read(ctxt, path);

#undef DM_ACPI_BLOB... but perhaps that should go in
xen/include/public/hvm/hvm_xs_strings.h ?

> +}
> +
> +static int construct_dm_table(struct acpi_ctxt *ctxt,

bool
> +                              unsigned long *table_ptrs, int nr_tables,

unsigned int nr_tables
> +                              const void *blob, uint32_t length)
> +{
> +    const struct acpi_header *header = blob;
> +    uint8_t *buffer;
> +
> +    if ( check_signature_collision(header->signature) )
> +        return 0;
> +
> +    if ( header->length > length || header->length == 0 )
> +        return 0;
> +
> +    buffer = ctxt->mem_ops.alloc(ctxt, header->length, 16);
> +    if ( !buffer )
> +        return 0;
> +    memcpy(buffer, header, header->length);
> +
> +    /* some device models (e.g. QEMU) does not set checksum */
> +    set_checksum(buffer, offsetof(struct acpi_header, checksum),
> +                 header->length);
> +
> +    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
> +
> +    return 1;
> +}
> +
> +static int construct_dm_nsdev(struct acpi_ctxt *ctxt,

bool
> +                              unsigned long *table_ptrs, int nr_tables,

unsigned int
> +                              const char *dev_name,
> +                              const void *blob, uint32_t blob_length)
> +{
> +    struct acpi_header ssdt, *header;
> +    uint8_t *buffer;
> +
> +    if ( check_devname_collision(dev_name) )
> +        return 0;
> +
> +    /* built ACPI namespace device from [name, blob] */
> +    buffer = aml_build_begin(ctxt);
> +    aml_prepend_blob(buffer, blob, blob_length);
> +    aml_prepend_device(buffer, dev_name);
> +    aml_prepend_scope(buffer, "\\_SB");
> +
> +    /* build SSDT header */
> +    ssdt.signature = ACPI_2_0_SSDT_SIGNATURE;
> +    ssdt.revision = ACPI_2_0_SSDT_REVISION;
> +    fixed_strcpy(ssdt.oem_id, ACPI_OEM_ID);
> +    fixed_strcpy(ssdt.oem_table_id, ACPI_OEM_TABLE_ID);
> +    ssdt.oem_revision = ACPI_OEM_REVISION;
> +    ssdt.creator_id = ACPI_CREATOR_ID;
> +    ssdt.creator_revision = ACPI_CREATOR_REVISION;
> +
> +    /* prepend SSDT header to ACPI namespace device */
> +    aml_prepend_blob(buffer, &ssdt, sizeof(ssdt));
> +    header = (struct acpi_header *) buffer;
> +    header->length = aml_build_end();
> +
> +    /* calculate checksum of SSDT */
> +    set_checksum(header, offsetof(struct acpi_header, checksum),
> +                 header->length);
> +
> +    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
> +
> +    return 1;

return true.
> +}
> +
> +/*
> + * All ACPI stuff built by the device model is placed in the guest
> + * buffer whose address and size are specified by config->dm.{addr, length},
> + * or XenStore keys HVM_XS_DM_ACPI_{ADDRESS, LENGTH}.

This should also be provided in
xen/include/public/hvm/hvm_xs_strings.h,
especially as you are in effect adding new keys and attributes to it.

> + *
> + * The data layout within the buffer is further specified by XenStore
> + * directories under HVM_XS_DM_ACPI_ROOT. Each directory specifies a

Is each directory the name of the DSDT object? In that case you
want to say a bit about the directory name and its limit (only four
characters long), that the built-in names must not be reused, etc.

But it looks like it can be anything, since you extract the name from
the blob.

Either way, we should still say what the directory names ought to be.

> + * data blob and contains following XenStore keys:
> + *
> + * - "type":
> + *   * DM_ACPI_BLOB_TYPE_TABLE
> + *     The data blob specified by this directory is an ACPI table.
> + *   * DM_ACPI_BLOB_TYPE_NSDEV
> + *     The data blob specified by this directory is an ACPI namespace device.
> + *     Its name is specified by the directory name, while the AML code of the
> + *     body of the AML device structure is in the data blob.

Could those be strings on XenStore? Strings are nice. "table" or
"nsdev"?
> + *
> + * - "length": the number of bytes in this data blob.
> + *
> + * - "offset": the offset in bytes of this data blob from the beginning of buffer
> + */
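For illustration, the XenStore layout described in the comment above might look as follows. The directory names, the expansion of HVM_XS_DM_ACPI_ROOT, the blob sizes, and the numeric encoding of the "type" values are all hypothetical here:

```
<HVM_XS_DM_ACPI_ROOT>/NFIT/type   = "0"    (DM_ACPI_BLOB_TYPE_TABLE)
<HVM_XS_DM_ACPI_ROOT>/NFIT/length = "224"  (bytes in this blob)
<HVM_XS_DM_ACPI_ROOT>/NFIT/offset = "0"    (from start of the dm buffer)
<HVM_XS_DM_ACPI_ROOT>/NVD0/type   = "1"    (DM_ACPI_BLOB_TYPE_NSDEV)
<HVM_XS_DM_ACPI_ROOT>/NVD0/length = "112"
<HVM_XS_DM_ACPI_ROOT>/NVD0/offset = "224"
```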
> +static int construct_dm_tables(struct acpi_ctxt *ctxt,

static unsigned int
> +                               unsigned long *table_ptrs,
> +                               int nr_tables,

unsigned int nr_tables
> +                               struct acpi_config *config)
> +{
> +    const char *s;
> +    char **dir;
> +    uint8_t type;
> +    void *blob;
> +    unsigned int num, length, offset, i;
> +    int nr_added = 0;

unsigned int
> +
> +    if ( !config->dm.addr )
> +        return 0;
> +
> +    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
> +    if ( !dir || !num )
> +        return 0;
> +
> +    if ( num > ACPI_MAX_SECONDARY_TABLES - nr_tables )
> +        return 0;
> +
> +    for ( i = 0; i < num; i++, dir++ )
> +    {

You probably want to check that *dir is not NULL. Just in case.

> +        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "type");
> +        if ( !s )
> +            continue;
> +        type = (uint8_t)strtoll(s, NULL, 0);
> +
> +        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "length");
> +        if ( !s )
> +            continue;
> +        length = (uint32_t)strtoll(s, NULL, 0);
> +
> +        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "offset");
> +        if ( !s )
> +            continue;
> +        offset = (uint32_t)strtoll(s, NULL, 0);
> +
> +        blob = ctxt->mem_ops.p2v(ctxt, config->dm.addr + offset);
> +
> +        switch ( type )
> +        {
> +        case DM_ACPI_BLOB_TYPE_TABLE:
> +            nr_added += construct_dm_table(ctxt,
> +                                           table_ptrs, nr_tables + nr_added,
> +                                           blob, length);
> +            break;
> +        case DM_ACPI_BLOB_TYPE_NSDEV:
> +            nr_added += construct_dm_nsdev(ctxt,
> +                                           table_ptrs, nr_tables + nr_added,
> +                                           *dir, blob, length);
> +            break;
> +        default:
> +            /* skip blobs of unknown types */
> +            continue;
> +        }
> +    }
> +
> +    return nr_added;
> +}
> +
>  static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>                                        unsigned long *table_ptrs,
>                                        struct acpi_config *config,
> @@ -461,6 +674,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>      nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
>                                                nr_tables, config);
>  
> +    /* Load any additional tables passed from device model (e.g. QEMU) */

Perhaps a period at the end of the sentence?

> +    nr_tables += construct_dm_tables(ctxt, table_ptrs, nr_tables, config);
> +
>      table_ptrs[nr_tables] = 0;
>      return nr_tables;
>  }
> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
> index 12cafd8..684502d 100644
> --- a/tools/libacpi/libacpi.h
> +++ b/tools/libacpi/libacpi.h
> @@ -82,6 +82,11 @@ struct acpi_config {
>          uint32_t length;
>      } pt;
>  
> +    struct {
> +        uint32_t addr;
> +        uint32_t length;
> +    } dm;
> +
>      struct acpi_numa numa;
>      const struct hvm_info_table *hvminfo;
>  
> -- 
> 2.10.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


* Re: [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs
  2016-10-10  0:32 ` [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
@ 2017-01-27 21:47   ` Konrad Rzeszutek Wilk
  2017-02-08  5:42     ` Haozhong Zhang
  2017-01-27 21:48   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 21:47 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:31AM +0800, Haozhong Zhang wrote:
> For xl vNVDIMM configs
>   vnvdimms = [ '/path/to/pmem0', '/path/to/pmem1', ... ]
> 
> the following qemu options are built
>   -machine <existing options>,nvdimm
>   -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
>   -object memory-backend-xen,id=mem1,size=$PMEM0_SIZE,mem-path=/path/to/pmem0
>   -device nvdimm,id=nvdimm1,memdev=mem1
>   -object memory-backend-xen,id=mem2,size=$PMEM1_SIZE,mem-path=/path/to/pmem1
>   -device nvdimm,id=nvdimm2,memdev=mem2
>   ...
> where
> * NR_SLOTS is the number of entries in vnvdimms + 1,
> * MEM_SIZE is the total size of all RAM and NVDIMM devices,
> * PMEM#_SIZE is the size of the host pmem device/file '/path/to/pmem#'.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_dm.c      | 113 +++++++++++++++++++++++++++++++++++++++++++-
>  tools/libxl/libxl_types.idl |   8 ++++
>  tools/libxl/xl_cmdimpl.c    |  16 +++++++

You probably also want this new parameter in the xl manpage.

>  3 files changed, 135 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index ad366a8..6b8c019 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -24,6 +24,10 @@
>  #include <sys/types.h>
>  #include <pwd.h>
>  
> +#if defined(__linux__)
> +#include <linux/fs.h>
> +#endif
> +
>  static const char *libxl_tapif_script(libxl__gc *gc)
>  {
>  #if defined(__linux__) || defined(__FreeBSD__)
> @@ -905,6 +909,86 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
>      return drive;
>  }
>  
> +#if defined(__linux__)
> +
> +static uint64_t libxl__build_dm_vnvdimm_args(libxl__gc *gc, flexarray_t *dm_args,
> +                                             struct libxl_device_vnvdimm *dev,
> +                                             int dev_no)
> +{
> +    int fd, rc;
> +    struct stat st;
> +    uint64_t size = 0;
> +    char *arg;
> +
> +    fd = open(dev->file, O_RDONLY);
> +    if (fd < 0) {
> +        LOG(ERROR, "failed to open file %s: %s",
> +            dev->file, strerror(errno));
> +        goto out;
> +    }
> +
> +    if (stat(dev->file, &st)) {
> +        LOG(ERROR, "failed to get status of file %s: %s",
> +            dev->file, strerror(errno));
> +        goto out_fclose;
> +    }
> +
> +    switch (st.st_mode & S_IFMT) {
> +    case S_IFBLK:
> +        rc = ioctl(fd, BLKGETSIZE64, &size);
> +        if (rc == -1) {
> +            LOG(ERROR, "failed to get size of block device %s: %s",
> +                dev->file, strerror(errno));
> +            size = 0;
> +        }
> +        break;
> +
> +    case S_IFREG:
> +        size = st.st_size;
> +        break;
> +
> +    default:
> +        LOG(ERROR, "%s is not a block device or regular file", dev->file);
> +        break;
> +    }
> +
> +    if (!size)
> +        goto out_fclose;
> +
> +    flexarray_append(dm_args, "-object");
> +    arg = GCSPRINTF("memory-backend-xen,id=mem%d,size=%"PRIu64",mem-path=%s",
> +                    dev_no + 1, size, dev->file);
> +    flexarray_append(dm_args, arg);
> +
> +    flexarray_append(dm_args, "-device");
> +    arg = GCSPRINTF("nvdimm,id=nvdimm%d,memdev=mem%d", dev_no + 1, dev_no + 1);
> +    flexarray_append(dm_args, arg);
> +
> + out_fclose:
> +    close(fd);
> + out:
> +    return size;
> +}
> +
> +static uint64_t libxl__build_dm_vnvdimms_args(
> +    libxl__gc *gc, flexarray_t *dm_args,
> +    struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
> +{
> +    uint64_t total_size = 0, size;
> +    int i;

unsigned int
> +


* Re: [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs
  2016-10-10  0:32 ` [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
  2017-01-27 21:47   ` Konrad Rzeszutek Wilk
@ 2017-01-27 21:48   ` Konrad Rzeszutek Wilk
  2017-02-08  5:47     ` Haozhong Zhang
  1 sibling, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 21:48 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:31AM +0800, Haozhong Zhang wrote:
> For xl vNVDIMM configs
>   vnvdimms = [ '/path/to/pmem0', '/path/to/pmem1', ... ]
> 
> the following qemu options are built
>   -machine <existing options>,nvdimm
>   -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
>   -object memory-backend-xen,id=mem1,size=$PMEM0_SIZE,mem-path=/path/to/pmem0
>   -device nvdimm,id=nvdimm1,memdev=mem1
>   -object memory-backend-xen,id=mem2,size=$PMEM1_SIZE,mem-path=/path/to/pmem1
>   -device nvdimm,id=nvdimm2,memdev=mem2
>   ...

Also, you may want to say which patch (just the title) adds support
for these parameters.


* Re: [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests
  2016-10-10  0:32 ` [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests Haozhong Zhang
@ 2017-01-27 22:06   ` Konrad Rzeszutek Wilk
  2017-01-27 22:09     ` Konrad Rzeszutek Wilk
  2017-02-08  5:59     ` Haozhong Zhang
  0 siblings, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 22:06 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:32AM +0800, Haozhong Zhang wrote:
> We can map host pmem devices or files on pmem devices to guests. This
> patch adds support to map pmem devices. The implementation relies on the
> Linux pmem driver, so it currently functions only when libxl is compiled

Perhaps say when the pmem driver was introduced and also what CONFIG
option to use to enable it?
> for Linux.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/Makefile       |   2 +-
>  tools/libxl/libxl_nvdimm.c | 210 +++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nvdimm.h |  45 ++++++++++
>  3 files changed, 256 insertions(+), 1 deletion(-)
>  create mode 100644 tools/libxl/libxl_nvdimm.c
>  create mode 100644 tools/libxl/libxl_nvdimm.h
> 
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index a904927..ecc9ae1 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -106,7 +106,7 @@ ifeq ($(CONFIG_NetBSD),y)
>  LIBXL_OBJS-y += libxl_netbsd.o
>  else
>  ifeq ($(CONFIG_Linux),y)
> -LIBXL_OBJS-y += libxl_linux.o
> +LIBXL_OBJS-y += libxl_linux.o libxl_nvdimm.o
>  else
>  ifeq ($(CONFIG_FreeBSD),y)
>  LIBXL_OBJS-y += libxl_freebsd.o
> diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
> new file mode 100644
> index 0000000..7bcbaaf
> --- /dev/null
> +++ b/tools/libxl/libxl_nvdimm.c
> @@ -0,0 +1,210 @@
> +/*
> + * tools/libxl/libxl_nvdimm.c
> + *
> + * Copyright (c) 2016, Intel Corporation.

2017 now.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or

LGPL please. libxl uses that license.

> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
> + */
> +
> +#include <stdlib.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <stdint.h>
> +
> +#include "libxl_internal.h"
> +#include "libxl_arch.h"
> +#include "libxl_nvdimm.h"
> +
> +#include <xc_dom.h>
> +
> +#define BLK_DEVICE_ROOT "/sys/dev/block"
> +
> +static int nvdimm_sysfs_read(libxl__gc *gc,
> +                             unsigned int major, unsigned int minor,
> +                             const char *name, void **data_r)
> +{
> +    char *path = libxl__sprintf(gc, BLK_DEVICE_ROOT"/%u:%u/device/%s",
> +                                major, minor, name);
> +    return libxl__read_sysfs_file_contents(gc, path, data_r, NULL);
> +}
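For a pmem namespace exposed as a block device with a hypothetical major:minor pair of 259:0, the helper above would read sysfs files such as (paths built from BLK_DEVICE_ROOT; the example values are illustrative):

```
/sys/dev/block/259:0/device/resource    (host SPA, e.g. "0x240000000")
/sys/dev/block/259:0/device/size        (size in bytes)
```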
> +
> +static int nvdimm_get_spa(libxl__gc *gc, unsigned int major, unsigned int minor,
> +                          uint64_t *spa_r)
> +{
> +    void *data;
> +    int ret = nvdimm_sysfs_read(gc, major, minor, "resource", &data);
> +
> +    if ( ret )
> +        return ret;
> +
> +    *spa_r = strtoll(data, NULL, 0);
> +    return 0;
> +}
> +
> +static int nvdimm_get_size(libxl__gc *gc, unsigned int major, unsigned int minor,
> +                           uint64_t *size_r)
> +{
> +    void *data;
> +    int ret = nvdimm_sysfs_read(gc, major, minor, "size", &data);
> +
> +    if ( ret )
> +        return ret;
> +
> +    *size_r = strtoll(data, NULL, 0);
> +
> +    return 0;
> +}
> +
> +static int add_pages(libxl__gc *gc, uint32_t domid,
> +                     xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
> +{
> +    unsigned int nr;
> +    int ret = 0;
> +
> +    while ( nr_mfns )
> +    {
> +        nr = min(nr_mfns, (unsigned long) UINT_MAX);
No need for space                           ^- here.
> +
> +        ret = xc_domain_populate_pmemmap(CTX->xch, domid, mfn, gpfn, nr);
> +        if ( ret )
> +        {
> +            LOG(ERROR, "failed to map pmem pages, "
> +                "mfn 0x%" PRIx64", gpfn 0x%" PRIx64 ", nr_mfns %u, err %d",
> +                mfn, gpfn, nr, ret);
> +            break;
> +        }
> +
> +        nr_mfns -= nr;
> +        mfn += nr;
> +        gpfn += nr;
> +    }
> +
> +    return ret;
> +}
> +
> +static int add_file(libxl__gc *gc, uint32_t domid, int fd,
> +                    xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
> +{
> +    return -EINVAL;

Hehehehe..
> +}
> +
> +int libxl_nvdimm_add_device(libxl__gc *gc,
> +                            uint32_t domid, const char *path,
> +                            uint64_t guest_spa, uint64_t guest_size)
> +{
> +    int fd;
> +    struct stat st;
> +    unsigned int major, minor;
> +    uint64_t host_spa, host_size;
> +    xen_pfn_t mfn, gpfn;
> +    unsigned long nr_gpfns;
> +    int ret;
> +
> +    if ( (guest_spa & ~XC_PAGE_MASK) || (guest_size & ~XC_PAGE_MASK) )
> +        return -EINVAL;

That is the wrong return value. The libxl functions return values from
enum libxl_error.

Please use those throughout the code.
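A sketch of what the suggested change looks like. The helper name is made up for illustration, and ERROR_INVAL's numeric value here is only a stand-in; in real libxl the constants come from libxl.h's enum libxl_error:

```c
#include <stdint.h>

/* Hypothetical stand-ins: in real libxl these come from libxl.h
 * (enum libxl_error) and xenctrl.h. */
enum { ERROR_INVAL = -6 };
#define XC_PAGE_MASK (~(uint64_t)0xfff)

/* Return a libxl error code rather than a raw -errno value. */
static int check_alignment(uint64_t guest_spa, uint64_t guest_size)
{
    if ( (guest_spa & ~XC_PAGE_MASK) || (guest_size & ~XC_PAGE_MASK) )
        return ERROR_INVAL;
    return 0;
}
```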
> +
> +    fd = open(path, O_RDONLY);
> +    if ( fd < 0 )
> +    {
> +        LOG(ERROR, "failed to open file %s (err: %d)", path, errno);
> +        return -EIO;
> +    }
> +
> +    ret = fstat(fd, &st);
> +    if ( ret )
> +    {
> +        LOG(ERROR, "failed to get status of file %s (err: %d)",
> +            path, errno);
> +        goto out;
> +    }
> +
> +    switch ( st.st_mode & S_IFMT )
> +    {
> +    case S_IFBLK:
> +        major = major(st.st_rdev);
> +        minor = minor(st.st_rdev);
> +        break;
> +
> +    case S_IFREG:
> +        major = major(st.st_dev);
> +        minor = minor(st.st_dev);
> +        break;
> +
> +    default:
> +        LOG(ERROR, "%s is neither a block device nor a regular file", path);
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ret = nvdimm_get_spa(gc, major, minor, &host_spa);
> +    if ( ret )
> +    {
> +        LOG(ERROR, "failed to get SPA of device %u:%u", major, minor);
> +        goto out;
> +    }
> +    else if ( host_spa & ~XC_PAGE_MASK )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ret = nvdimm_get_size(gc, major, minor, &host_size);
> +    if ( ret )
> +    {
> +        LOG(ERROR, "failed to get size of device %u:%u", major, minor);
> +        goto out;
> +    }
> +    else if ( guest_size > host_size )
> +    {
> +        LOG(ERROR, "vNVDIMM size %" PRIu64 " expires NVDIMM size %" PRIu64,

"expires"? Perhaps "exceeds"?
> +            guest_size, host_size);
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    mfn = host_spa >> XC_PAGE_SHIFT;
> +    gpfn = guest_spa >> XC_PAGE_SHIFT;
> +    nr_gpfns = guest_size >> XC_PAGE_SHIFT;
> +
> +    switch ( st.st_mode & S_IFMT )
> +    {
> +    case S_IFBLK:
> +        ret = add_pages(gc, domid, mfn, gpfn, nr_gpfns);

You will need to change the return value.
> +        break;
> +
> +    case S_IFREG:
> +        ret = add_file(gc, domid, fd, mfn, gpfn, nr_gpfns);

Ditto here.
> +        break;
> +
> +    default:
> +        LOG(ERROR, "%s is neither a block device nor a regular file", path);
> +        ret = -EINVAL;
> +    }
> +
> + out:
> +    close(fd);
> +    return ret;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxl/libxl_nvdimm.h b/tools/libxl/libxl_nvdimm.h
> new file mode 100644
> index 0000000..4de2fb2
> --- /dev/null
> +++ b/tools/libxl/libxl_nvdimm.h

Why not add it in libxl.h?

(Along with the LIBXL_HAVE_NVDIMM and such?)

> @@ -0,0 +1,45 @@
> +/*
> + * tools/libxl/libxl_nvdimm.h
> + *
> + * Copyright (c) 2016, Intel Corporation.

Now 2017
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by

And please relicense it under LGPL

> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
> + */
> +
> +#ifndef LIBXL_NVDIMM_H
> +#define LIBXL_NVDIMM_H
> +
> +#include <stdint.h>
> +#include "libxl_internal.h"
> +
> +#if defined(__linux__)
> +
> +int libxl_nvdimm_add_device(libxl__gc *gc,
> +                            uint32_t domid, const char *path,
> +                            uint64_t spa, uint64_t length);
> +
> +#else
> +
> +int libxl_nvdimm_add_device(libxl__gc *gc,
> +                            uint32_t domid, const char *path,
> +                            uint64_t spa, uint64_t length)
> +{
> +    return -EINVAL;
> +}
> +
> +#endif /* __linux__ */
> +
> +#endif /* !LIBXL_NVDIMM_H */
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests
  2017-01-27 22:06   ` Konrad Rzeszutek Wilk
@ 2017-01-27 22:09     ` Konrad Rzeszutek Wilk
  2017-02-08  5:59     ` Haozhong Zhang
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 22:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian Jackson, Haozhong Zhang, Xiao Guangrong, Wei Liu, xen-devel

..snip..
> > +    mfn = host_spa >> XC_PAGE_SHIFT;
> > +    gpfn = guest_spa >> XC_PAGE_SHIFT;
> > +    nr_gpfns = guest_size >> XC_PAGE_SHIFT;
> > +
> > +    switch ( st.st_mode & S_IFMT )
> > +    {
> > +    case S_IFBLK:
> > +        ret = add_pages(gc, domid, mfn, gpfn, nr_gpfns);
> 
> You will need to change the return value.
> > +        break;
> > +
> > +    case S_IFREG:
> > +        ret = add_file(gc, domid, fd, mfn, gpfn, nr_gpfns);
> 
> Ditto here.

Also should we close the fd descriptor if there are any errors?


* Re: [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices to guests
  2016-10-10  0:32 ` [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices " Haozhong Zhang
@ 2017-01-27 22:10   ` Konrad Rzeszutek Wilk
  2017-02-08  6:03     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 22:10 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:33AM +0800, Haozhong Zhang wrote:
> We can map host pmem devices or files on pmem devices to guests. This
> patch adds support to map files on pmem devices. The implementation
> relies on the Linux pmem driver and kernel APIs, so it currently

May want to mention which CONFIG_ options are needed.
> functions only when libxl is compiled for Linux.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_nvdimm.c | 73 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
> index 7bcbaaf..b3ba19a 100644
> --- a/tools/libxl/libxl_nvdimm.c
> +++ b/tools/libxl/libxl_nvdimm.c
> @@ -25,6 +25,9 @@
>  #include <unistd.h>
>  #include <errno.h>
>  #include <stdint.h>
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +#include <linux/fiemap.h>
>  
>  #include "libxl_internal.h"
>  #include "libxl_arch.h"
> @@ -97,10 +100,78 @@ static int add_pages(libxl__gc *gc, uint32_t domid,
>      return ret;
>  }
>  
> +static uint64_t
> +get_file_extents(libxl__gc *gc, int fd, unsigned long length,
> +                 struct fiemap_extent **extents_r)
> +{
> +    struct fiemap *fiemap;
> +    uint64_t nr_extents = 0, extents_size;
> +
> +    fiemap = libxl__zalloc(gc, sizeof(*fiemap));
> +    if ( !fiemap )
> +        goto out;
> +
> +    fiemap->fm_length = length;
> +    if ( ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0 )
> +        goto out;
> +
> +    nr_extents = fiemap->fm_mapped_extents;
> +    extents_size = sizeof(struct fiemap_extent) * nr_extents;
> +    fiemap = libxl__realloc(gc, fiemap, sizeof(*fiemap) + extents_size);
> +    if ( !fiemap )
> +        goto out;
> +
> +    memset(fiemap->fm_extents, 0, extents_size);
> +    fiemap->fm_extent_count = nr_extents;
> +    fiemap->fm_mapped_extents = 0;
> +
> +    if ( ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0 )
> +        goto out;
> +
> +    *extents_r = fiemap->fm_extents;
> +
> + out:
> +    return nr_extents;
> +}
> +
>  static int add_file(libxl__gc *gc, uint32_t domid, int fd,
>                      xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
>  {
> -    return -EINVAL;
> +    struct fiemap_extent *extents;
> +    uint64_t nr_extents, i;
> +    int ret = 0;
> +
> +    nr_extents = get_file_extents(gc, fd, nr_mfns << XC_PAGE_SHIFT, &extents);
> +    if ( !nr_extents )
> +        return -EIO;
> +
> +    for ( i = 0; i < nr_extents; i++ )
> +    {
> +        uint64_t p_offset = extents[i].fe_physical;
> +        uint64_t l_offset = extents[i].fe_logical;
> +        uint64_t length = extents[i].fe_length;
> +
> +        if ( extents[i].fe_flags & ~FIEMAP_EXTENT_LAST )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( (p_offset | l_offset | length) & ~XC_PAGE_MASK )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        ret = add_pages(gc, domid,
> +                        mfn + (p_offset >> XC_PAGE_SHIFT),
> +                        gpfn + (l_offset >> XC_PAGE_SHIFT),
> +                        length >> XC_PAGE_SHIFT);
> +        if ( ret )
> +            break;
> +    }
> +
> +    return ret;
>  }
>  
>  int libxl_nvdimm_add_device(libxl__gc *gc,
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2016-10-10  0:32 ` [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations() Haozhong Zhang
@ 2017-01-27 22:11   ` Konrad Rzeszutek Wilk
  2017-02-08  6:07     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 22:11 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> If any error code is returned when creating a domain, stop the domain
> creation.

This looks like a bug-fix that could be spun off from this
patchset.

> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_create.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index d986cd2..24e8368 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
>      if (dcs->sdss.dm.guest_domid) {
>          if (d_config->b_info.device_model_version
>              == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
> -            libxl__qmp_initializations(gc, domid, d_config);
> +            ret = libxl__qmp_initializations(gc, domid, d_config);
> +            if (ret)
> +                goto error_out;
>          }
>      }
>  
> -- 
> 2.10.1
> 
> 

* Re: [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback
  2016-10-10  0:32 ` [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback Haozhong Zhang
@ 2017-01-27 22:13   ` Konrad Rzeszutek Wilk
  2017-02-08  6:08     ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 22:13 UTC (permalink / raw)
  To: Haozhong Zhang; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On Mon, Oct 10, 2016 at 08:32:35AM +0800, Haozhong Zhang wrote:
> QMP command 'query-nvdimms' is used by libxl to get the backend, the
> guest SPA and size of each vNVDIMM device, and then libxl starts mapping
> backend to guest for each vNVDIMM device.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_qmp.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
> index f8addf9..02edd09 100644
> --- a/tools/libxl/libxl_qmp.c
> +++ b/tools/libxl/libxl_qmp.c
> @@ -26,6 +26,7 @@
>  
>  #include "_libxl_list.h"
>  #include "libxl_internal.h"
> +#include "libxl_nvdimm.h"
>  
>  /* #define DEBUG_RECEIVED */
>  
> @@ -1146,6 +1147,66 @@ out:
>      return rc;
>  }
>  
> +static int qmp_register_nvdimm_callback(libxl__qmp_handler *qmp,
> +                                        const libxl__json_object *o,
> +                                        void *unused)
> +{
> +    GC_INIT(qmp->ctx);
> +    const libxl__json_object *obj = NULL;
> +    const libxl__json_object *sub_obj = NULL;
> +    int i = 0;

unsigned int.
> +    const char *mem_path;
> +    uint64_t slot, spa, length;
> +    int ret = 0;
> +
> +    for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
> +        if (!libxl__json_object_is_map(obj))
> +            continue;
> +
> +        sub_obj = libxl__json_map_get("slot", obj, JSON_INTEGER);
> +        slot = libxl__json_object_get_integer(sub_obj);
> +
> +        sub_obj = libxl__json_map_get("mem-path", obj, JSON_STRING);
> +        mem_path = libxl__json_object_get_string(sub_obj);
> +        if (!mem_path) {
> +            LOG(ERROR, "No mem-path is specified for NVDIMM #%" PRId64, slot);
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +
> +        sub_obj = libxl__json_map_get("spa", obj, JSON_INTEGER);
> +        spa = libxl__json_object_get_integer(sub_obj);
> +
> +        sub_obj = libxl__json_map_get("length", obj, JSON_INTEGER);
> +        length = libxl__json_object_get_integer(sub_obj);
> +
> +        LOG(DEBUG,
> +            "vNVDIMM #%" PRId64 ": %s, spa 0x%" PRIx64 ", length 0x%" PRIx64,
> +            slot, mem_path, spa, length);
> +
> +        ret = libxl_nvdimm_add_device(gc, qmp->domid, mem_path, spa, length);
> +        if (ret) {
> +            LOG(ERROR,
> +                "Failed to add NVDIMM #%" PRId64
> +                "(mem_path %s, spa 0x%" PRIx64 ", length 0x%" PRIx64 ") "
> +                "to domain %d (err = %d)",
> +                slot, mem_path, spa, length, qmp->domid, ret);
> +            goto out;
> +        }
> +    }
> +
> + out:
> +    GC_FREE;
> +    return ret;
> +}
> +
> +static int libxl__qmp_query_nvdimms(libxl__qmp_handler *qmp)
> +{
> +    return qmp_synchronous_send(qmp, "query-nvdimms", NULL,
> +                                qmp_register_nvdimm_callback,
> +                                NULL, qmp->timeout);
> +}
> +
>  int libxl__qmp_hmp(libxl__gc *gc, int domid, const char *command_line,
>                     char **output)
>  {
> @@ -1187,6 +1248,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
>      if (!ret) {
>          ret = qmp_query_vnc(qmp);
>      }
> +    if (!ret && guest_config->num_vnvdimms) {
> +        ret = libxl__qmp_query_nvdimms(qmp);
> +    }
>      libxl__qmp_close(qmp);
>      return ret;
>  }
> -- 
> 2.10.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model
  2017-01-27 20:44   ` Konrad Rzeszutek Wilk
@ 2017-02-08  1:39     ` Haozhong Zhang
  2017-02-08 14:31       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  1:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 15:44 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:25AM +0800, Haozhong Zhang wrote:
>> One guest page is reserved for the device model to place guest ACPI. The
>
>guest ACPI what? ACPI SSDT? MADT?

For NVDIMM, it includes the NFIT and an SSDT. However, the mechanism
implemented in this and the following libacpi patches is a generic one
that can pass arbitrary guest ACPI from the device model. A simple
conflict detection is implemented in patch 11.

>
>Also why one page? What if there is a need for more than one page?
>
>You add  HVM_XS_DM_ACPI_LENGTH which makes me think this is accounted
>for?
>

One page is enough for NVDIMM (NFIT + SSDT). I don't see a fundamental
restriction against allowing more than one page, so I use
HVM_XS_DM_ACPI_LENGTH to pass the reserved size, which allows the size
to be changed in the future.

>> base address and size of the reserved area are passed to the device
>> model via XenStore keys hvmloader/dm-acpi/{address, length}.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxc/include/xc_dom.h            |  1 +
>>  tools/libxc/xc_dom_x86.c                |  7 +++++++
>>  tools/libxl/libxl_dom.c                 | 25 +++++++++++++++++++++++++
>>  xen/include/public/hvm/hvm_xs_strings.h | 11 +++++++++++
>>  4 files changed, 44 insertions(+)
>>
>> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
>> index 608cbc2..19d65cd 100644
>> --- a/tools/libxc/include/xc_dom.h
>> +++ b/tools/libxc/include/xc_dom.h
>> @@ -98,6 +98,7 @@ struct xc_dom_image {
>>      xen_pfn_t xenstore_pfn;
>>      xen_pfn_t shared_info_pfn;
>>      xen_pfn_t bootstack_pfn;
>> +    xen_pfn_t dm_acpi_pfn;
>
>Perhaps a pointer to a variable-size array?
>
> xen_pfn_t *dm_acpi_pfns;
> unsigned int dm_acpi_nr;
>
>?

dm_acpi_pfn is passed to QEMU via xenstore. Though passing an array of
pfns via xenstore is also doable, a pair of base pfn and length is
simpler.

>>      xen_pfn_t pfn_alloc_end;
>>      xen_vaddr_t virt_alloc_end;
>>      xen_vaddr_t bsd_symtab_start;
>> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
>> index 0eab8a7..47f14a1 100644
>> --- a/tools/libxc/xc_dom_x86.c
>> +++ b/tools/libxc/xc_dom_x86.c
>> @@ -674,6 +674,13 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
>>                           ioreq_server_pfn(0));
>>          xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
>>                           NR_IOREQ_SERVER_PAGES);
>> +
>> +        dom->dm_acpi_pfn = xc_dom_alloc_page(dom, "DM ACPI");
>> +        if ( dom->dm_acpi_pfn == INVALID_PFN )
>> +        {
>> +            DOMPRINTF("Could not allocate page for device model ACPI.");
>> +            goto error_out;
>> +        }
>>      }
>>
>>      rc = xc_dom_alloc_segment(dom, &dom->start_info_seg,
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index d519c8d..f0a1d97 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -865,6 +865,31 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
>>              goto err;
>>      }
>>
>> +    if (dom->dm_acpi_pfn) {
>> +        uint64_t guest_addr_out = dom->dm_acpi_pfn * XC_DOM_PAGE_SIZE(dom);
>> +
>> +        if (guest_addr_out >= 0x100000000ULL) {
>> +            LOG(ERROR,
>> +                "Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G",
>> +                guest_addr_out);
>> +            goto err;
>> +        }
>> +
>> +        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid);
>> +
>> +        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
>> +                               guest_addr_out);
>> +        if (ret)
>> +            goto err;
>> +
>> +        path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid);
>> +
>> +        ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
>> +                               (uint64_t) XC_DOM_PAGE_SIZE(dom));
>I don't think you need the space here:      ^

will remove the space here and in other patches

Thanks,
Haozhong

>> +        if (ret)
>> +            goto err;
>> +    }
>> +
>>      return 0;
>>
>>  err:
>> diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
>> index 146b0b0..f44f71f 100644
>> --- a/xen/include/public/hvm/hvm_xs_strings.h
>> +++ b/xen/include/public/hvm/hvm_xs_strings.h
>> @@ -79,4 +79,15 @@
>>   */
>>  #define HVM_XS_OEM_STRINGS             "bios-strings/oem-%d"
>>
>> +/* Follows are XenStore keys for DM ACPI (ACPI built by device model,
>> + * e.g. QEMU).
>> + *
>> + * A reserved area of guest physical memory is used to pass DM
>> + * ACPI. Values of following two keys specify the base address and
>> + * length (in bytes) of the reserved area.
>> + */
>> +#define HVM_XS_DM_ACPI_ROOT              "hvmloader/dm-acpi"
>> +#define HVM_XS_DM_ACPI_ADDRESS           HVM_XS_DM_ACPI_ROOT"/address"
>> +#define HVM_XS_DM_ACPI_LENGTH            HVM_XS_DM_ACPI_ROOT"/length"
>> +
>>  #endif /* __XEN_PUBLIC_HVM_HVM_XS_STRINGS_H__ */
>> --
>> 2.10.1
>>
>>



* Re: [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address
  2017-01-27 20:46   ` Konrad Rzeszutek Wilk
@ 2017-02-08  1:42     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  1:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On 01/27/17 15:46 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:26AM +0800, Haozhong Zhang wrote:
>> This callback is used when libacpi needs to in-place access ACPI built
>> by the device model, whose address is specified in the physical address.
>
>May I recommend you write:
>
>This callback is used when libacpi needs to access ACPI blobs
>built by the device model. The address is provided as a physical
>address on XenBus (see patch titled: XYZ).
>
>?

Sure.

Thanks,
Haozhong



* Re: [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback
  2017-01-27 20:58   ` Konrad Rzeszutek Wilk
@ 2017-02-08  2:12     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  2:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On 01/27/17 15:58 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:27AM +0800, Haozhong Zhang wrote:
>> Expose the minimal allocation unit and the minimal alignment used by the
>> memory allocator, so that certain ACPI code (e.g. the AML builder added
>> later) can get contiguous memory allocated by multiple calls to
>
>s/later/in patch titled: "XYZ"/
>

will do

>> acpi_ctxt.mem_ops.alloc().
>
>Is this contiguous memory virtual or physical? You may want to be
>specific.
>

physical

>And you may want to say that acpi_build_tables uses that by default
>which is why you have the value of sixteen.

yes

>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/firmware/hvmloader/util.c | 2 ++
>>  tools/libacpi/libacpi.h         | 3 +++
>>  tools/libxl/libxl_x86_acpi.c    | 2 ++
>>  3 files changed, 7 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index 1fe8dcc..504ae6a 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -972,6 +972,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>>      ctxt.mem_ops.free = acpi_mem_free;
>>      ctxt.mem_ops.v2p = acpi_v2p;
>>      ctxt.mem_ops.p2v = acpi_p2v;
>> +    ctxt.min_alloc_unit = PAGE_SIZE;
>
>?? Really? That seems excessive as the acpi_build_tables calls
>ctxt->mem_ops.alloc :
>$ grep "ctxt->mem_ops.alloc" * | wc -l
>20
>
>That would imply 20 pages ?
>

I'm wrong here. I just re-looked at the mem_ops.alloc implementations
provided by hvmloader and libxl_x86_acpi and found that both allocate
by byte rather than by page. I'll remove this field.

>> +    ctxt.min_alloc_align = 16;
>
>Does that mean it is sixteen-page alignment? Or 16-byte alignment?
>
>If bytes, perhaps you want to change the name to
>'min_alloc_byte_align'?

It's for byte alignment. I'll rename this field.

Thanks,
Haozhong

>
>>
>>      acpi_build_tables(&ctxt, config);
>>
>> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
>> index 62e90ab..0fb16e7 100644
>> --- a/tools/libacpi/libacpi.h
>> +++ b/tools/libacpi/libacpi.h
>> @@ -47,6 +47,9 @@ struct acpi_ctxt {
>>          unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
>>          void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
>>      } mem_ops;
>> +
>> +    uint32_t min_alloc_unit;
>> +    uint32_t min_alloc_align;
>>  };
>>
>>  struct acpi_config {
>> diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
>> index aa5b83d..baf60ac 100644
>> --- a/tools/libxl/libxl_x86_acpi.c
>> +++ b/tools/libxl/libxl_x86_acpi.c
>> @@ -187,6 +187,8 @@ int libxl__dom_load_acpi(libxl__gc *gc,
>>      libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
>>      libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
>>      libxl_ctxt.c.mem_ops.free = acpi_mem_free;
>> +    libxl_ctxt.c.min_alloc_unit = libxl_ctxt.page_size;
>> +    libxl_ctxt.c.min_alloc_align = 16;
>>
>>      rc = init_acpi_config(gc, dom, b_info, &config);
>>      if (rc) {
>> --
>> 2.10.1
>>
>>



* Re: [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore
  2017-01-27 21:10   ` Konrad Rzeszutek Wilk
@ 2017-02-08  2:19     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  2:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On 01/27/17 16:10 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:28AM +0800, Haozhong Zhang wrote:
>> libacpi needs to access information placed in XenStore in order to load
>> ACPI built by the device model.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/firmware/hvmloader/util.c   | 50 +++++++++++++++++++++++++++++++++++++++
>>  tools/firmware/hvmloader/util.h   |  2 ++
>>  tools/firmware/hvmloader/xenbus.c | 20 ++++++++++++++++
>>  tools/libacpi/libacpi.h           | 10 ++++++++
>>  tools/libxl/libxl_x86_acpi.c      | 24 +++++++++++++++++++
>>  5 files changed, 106 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index 504ae6a..dba954a 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -888,6 +888,51 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
>>      /* ACPI builder currently doesn't free memory so this is just a stub */
>>  }
>>
>> +static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
>> +{
>> +    return xenstore_read(path, NULL);
>> +}
>> +
>> +static int acpi_xs_write(struct acpi_ctxt *ctxt,
>> +                         const char *path, const char *value)
>> +{
>> +    return xenstore_write(path, value);
>> +}
>> +
>> +static unsigned int count_strings(const char *strings, unsigned int len)
>> +{
>> +    const char *p;
>> +    unsigned int n;
>> +
>> +    for ( p = strings, n = 0; p < strings + len; p++ )
>> +        if ( *p == '\0' )
>> +            n++;
>> +
>> +    return n;
>> +}
>> +
>> +static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
>> +                                const char *path, unsigned int *num)
>> +{
>> +    const char *strings;
>> +    char *s, *p, **ret;
>> +    unsigned int len, n;
>> +
>> +    strings = xenstore_directory(path, &len, NULL);
>> +    if ( !strings )
>> +        return NULL;
>> +
>> +    n = count_strings(strings, len);
>> +    ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(char *) + len, 0);
>
>sizeof(*s)
>
>But you may also check ret against NULL before you memcpy data in there.
>

will add the check

>
>> +    memcpy(&ret[n], strings, len);
>> +
>> +    s = (char *)&ret[n];
>> +    for ( p = s, *num = 0; p < s + len; p+= strlen(p) + 1 )
>
>Perhaps add a space before += ?

will add

>> +        ret[(*num)++] = p;
>> +
>> +    return ret;
>> +}
>> +
>>  static uint8_t acpi_lapic_id(unsigned cpu)
>>  {
>>      return LAPIC_ID(cpu);
>> @@ -975,6 +1020,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>>      ctxt.min_alloc_unit = PAGE_SIZE;
>>      ctxt.min_alloc_align = 16;
>>
>> +    ctxt.xs_ops.read = acpi_xs_read;
>> +    ctxt.xs_ops.write = acpi_xs_write;
>> +    ctxt.xs_ops.directory = acpi_xs_directory;
>> +    ctxt.xs_opaque = NULL;
>> +
>>      acpi_build_tables(&ctxt, config);
>>
>>      hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
>> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
>> index 6a50dae..9443673 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -225,6 +225,8 @@ const char *xenstore_read(const char *path, const char *default_resp);
>>   */
>>  int xenstore_write(const char *path, const char *value);
>>
>> +const char *xenstore_directory(const char *path, uint32_t *len,
>> +                               const char *default_resp);
>>
>>  /* Get a HVM param.
>>   */
>> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
>> index 448157d..70bdadd 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -296,6 +296,26 @@ int xenstore_write(const char *path, const char *value)
>>      return ret;
>>  }
>>
>> +const char *xenstore_directory(const char *path, uint32_t *len,
>> +                               const char *default_resp)
>> +{
>> +    uint32_t type = 0;
>> +    const char *answer = NULL;
>> +
>> +    xenbus_send(XS_DIRECTORY,
>> +                path, strlen(path),
>> +                "", 1, /* nul separator */
>> +                NULL, 0);
>> +
>> +    if ( xenbus_recv(len, &answer, &type) || (type != XS_DIRECTORY) )
>> +        answer = NULL;
>> +
>> +    if ( (default_resp != NULL) && ((answer == NULL) || (*answer == '\0')) )
>> +        answer = default_resp;
>> +
>> +    return answer;
>
>This function looks very similar to xenstore_read. Could xenstore_read
>become __xenstore_read with an extra argument (type) and then the
>new xenstore_read along with xenstore_dir would call in it?

Yes, will do

Thanks,
Haozhong

>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
>> index 0fb16e7..12cafd8 100644
>> --- a/tools/libacpi/libacpi.h
>> +++ b/tools/libacpi/libacpi.h
>> @@ -50,6 +50,16 @@ struct acpi_ctxt {
>>
>>      uint32_t min_alloc_unit;
>>      uint32_t min_alloc_align;
>> +
>> +    struct acpi_xs_ops {
>> +        const char *(*read)(struct acpi_ctxt *ctxt, const char *path);
>> +        int (*write)(struct acpi_ctxt *ctxt,
>> +                     const char *path, const char *value);
>> +        char **(*directory)(struct acpi_ctxt *ctxt,
>> +                            const char *path, unsigned int *num);
>> +    } xs_ops;
>> +
>> +    void *xs_opaque;
>>  };
>>
>>  struct acpi_config {
>> diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
>> index baf60ac..1afd2e3 100644
>> --- a/tools/libxl/libxl_x86_acpi.c
>> +++ b/tools/libxl/libxl_x86_acpi.c
>> @@ -93,6 +93,25 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
>>  {
>>  }
>>
>> +static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
>> +{
>> +    return libxl__xs_read((libxl__gc *)ctxt->xs_opaque, XBT_NULL, path);
>> +}
>> +
>> +static int acpi_xs_write(struct acpi_ctxt *ctxt,
>> +                         const char *path, const char *value)
>> +{
>> +    return libxl__xs_write_checked((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
>> +                                   path, value);
>> +}
>> +
>> +static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
>> +                                const char *path, unsigned int *num)
>> +{
>> +    return libxl__xs_directory((libxl__gc *)ctxt->xs_opaque, XBT_NULL,
>> +                               path, num);
>> +}
>> +
>>  static uint8_t acpi_lapic_id(unsigned cpu)
>>  {
>>      return cpu * 2;
>> @@ -190,6 +209,11 @@ int libxl__dom_load_acpi(libxl__gc *gc,
>>      libxl_ctxt.c.min_alloc_unit = libxl_ctxt.page_size;
>>      libxl_ctxt.c.min_alloc_align = 16;
>>
>> +    libxl_ctxt.c.xs_ops.read = acpi_xs_read;
>> +    libxl_ctxt.c.xs_ops.write = acpi_xs_write;
>> +    libxl_ctxt.c.xs_ops.directory = acpi_xs_directory;
>> +    libxl_ctxt.c.xs_opaque = gc;
>> +
>>      rc = init_acpi_config(gc, dom, b_info, &config);
>>      if (rc) {
>>          LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
>> --
>> 2.10.1
>>
>>



* Re: [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder
  2017-01-27 21:19   ` Konrad Rzeszutek Wilk
@ 2017-02-08  2:33     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  2:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Xiao Guangrong, Andrew Cooper, Ian Jackson, xen-devel,
	ross.philipson, Jan Beulich, Wei Liu

On 01/27/17 16:19 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:29AM +0800, Haozhong Zhang wrote:
>> It is used by libacpi to generate SSDTs from ACPI namespace devices
>> built by the device model.
>
>Would it make sense to include a link to document outlining the
>the AML code? Or perhaps even just include an simple example
>of ASL and what the resulting AML code should look like?
>
>And maybe what subset of the AML code this implements?
>(with some simple ASL examples?)
>

I'll add references to the relevant chapters of the ACPI spec, plus
more comments and examples.

>Also adding Ross who wrote an AML builder as well.
>

Sure

>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/firmware/hvmloader/Makefile |   3 +-
>>  tools/libacpi/aml_build.c         | 254 ++++++++++++++++++++++++++++++++++++++
>>  tools/libacpi/aml_build.h         |  83 +++++++++++++
>>  tools/libxl/Makefile              |   3 +-
>>  4 files changed, 341 insertions(+), 2 deletions(-)
>>  create mode 100644 tools/libacpi/aml_build.c
>>  create mode 100644 tools/libacpi/aml_build.h
>>
>> diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
>> index 77d7551..cf0dac3 100644
>> --- a/tools/firmware/hvmloader/Makefile
>> +++ b/tools/firmware/hvmloader/Makefile
>> @@ -79,11 +79,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>>
>>  ACPI_PATH = ../../libacpi
>>  ACPI_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
>> -ACPI_OBJS = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o
>> +ACPI_OBJS = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o aml_build.o
>>  $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
>>  CFLAGS += -I$(ACPI_PATH)
>>  vpath build.c $(ACPI_PATH)
>>  vpath static_tables.c $(ACPI_PATH)
>> +vpath aml_build.c $(ACPI_PATH)
>>  OBJS += $(ACPI_OBJS)
>>
>>  hvmloader: $(OBJS)
>> diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c
>> new file mode 100644
>> index 0000000..b6f23f4
>> --- /dev/null
>> +++ b/tools/libacpi/aml_build.c
>> @@ -0,0 +1,254 @@
>> +/*
>> + * tools/libacpi/aml_build.c
>> + *
>> + * Copyright (c) 2016, Intel Corporation.
>
>.. now 2017
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>
>The libacpi is LGPL.
>
>Could this be licensed as LGPL please?
>

I didn't notice that libacpi is LGPL-licensed. I'll fix the license
here and below to LGPL.

>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
>> + */
>> +
>> +#include LIBACPI_STDUTILS
>> +#include "libacpi.h"
>> +#include "aml_build.h"
>> +
>> +#define AML_OP_SCOPE     0x10
>> +#define AML_OP_EXT       0x5B
>> +#define AML_OP_DEVICE    0x82
>> +
>> +#define ACPI_NAMESEG_LEN 4
>> +
>> +struct aml_build_alloctor {
>> +    struct acpi_ctxt *ctxt;
>> +    uint8_t *buf;
>> +    uint32_t capacity;
>> +    uint32_t used;
>> +};
>> +static struct aml_build_alloctor alloc;
>> +
>> +enum { ALLOC_OVERFLOW, ALLOC_NOT_NEEDED, ALLOC_NEEDED };
>
>Why not make this a named enum?

Ok, will do.

>> +
>> +static int alloc_needed(uint32_t size)
>> +{
>> +    uint32_t len = alloc.used + size;
>> +
>> +    if ( len < alloc.used )
>> +        return ALLOC_OVERFLOW;
>> +    else if ( len <= alloc.capacity )
>> +        return ALLOC_NOT_NEEDED;
>> +    else
>> +        return ALLOC_NEEDED;
>> +}
>> +
>> +static uint8_t *aml_buf_alloc(uint32_t size)
>> +{
>> +    int needed = alloc_needed(size);
>
>And then this can be an enum? Or alternatively make this unsigned int.
>

will change to enum

>> +    uint8_t *buf = NULL;
>> +    struct acpi_ctxt *ctxt = alloc.ctxt;
>> +    uint32_t alloc_size, alloc_align = ctxt->min_alloc_align;
>> +
>> +    switch ( needed )
>> +    {
>> +    case ALLOC_OVERFLOW:
>> +        break;
>> +
>> +    case ALLOC_NEEDED:
>> +        alloc_size = (size + alloc_align) & ~(alloc_align - 1);
>
>Perhaps multiply by two so we have more wiggle room?

Yes, it can allocate a little more than required.

Thanks,
Haozhong

>
>> +        buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align);
>> +        if ( !buf )
>> +            break;
>> +        if ( alloc.buf + alloc.capacity != buf )
>> +        {
>> +            buf = NULL;
>> +            break;
>> +        }
>> +        alloc.capacity += alloc_size;
>> +        alloc.used += size;
>> +        break;
>> +
>> +    case ALLOC_NOT_NEEDED:
>> +        buf = alloc.buf + alloc.used;
>> +        alloc.used += size;
>> +        break;
>> +
>> +    default:
>> +        break;
>> +    }
>> +
>> +    return buf;
>> +}
>> +
>> +static uint32_t get_package_length(uint8_t *pkg)
>> +{
>> +    uint32_t len;
>> +
>> +    len = pkg - alloc.buf;
>> +    len = alloc.used - len;
>> +
>> +    return len;
>> +}
>> +
>> +static void build_prepend_byte(uint8_t *buf, uint8_t byte)
>> +{
>> +    uint32_t len;
>> +
>> +    len = buf - alloc.buf;
>> +    len = alloc.used - len;
>> +
>> +    aml_buf_alloc(sizeof(uint8_t));
>> +    if ( len )
>> +        memmove(buf + 1, buf, len);
>> +    buf[0] = byte;
>> +}
>> +
>> +/*
>> + * XXX: names of multiple segments (e.g. X.Y.Z) are not supported
>> + */
>> +static void build_prepend_name(uint8_t *buf, const char *name)
>> +{
>> +    uint8_t *p = buf;
>> +    const char *s = name;
>> +    uint32_t len, name_len;
>> +
>> +    while ( *s == '\\' || *s == '^' )
>> +    {
>> +        build_prepend_byte(p, (uint8_t) *s);
>> +        ++p;
>> +        ++s;
>> +    }
>> +
>> +    if ( !*s )
>> +    {
>> +        build_prepend_byte(p, 0x00);
>> +        return;
>> +    }
>> +
>> +    len = p - alloc.buf;
>> +    len = alloc.used - len;
>> +    name_len = strlen(s);
>> +    ASSERT(strlen(s) <= ACPI_NAMESEG_LEN);
>> +
>> +    aml_buf_alloc(ACPI_NAMESEG_LEN);
>> +    if ( len )
>> +        memmove(p + ACPI_NAMESEG_LEN, p, len);
>> +    memcpy(p, s, name_len);
>> +    memcpy(p + name_len, "____", ACPI_NAMESEG_LEN - name_len);
>> +}
>> +
>> +enum {
>> +    PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
>> +    PACKAGE_LENGTH_2BYTE_SHIFT = 4,
>> +    PACKAGE_LENGTH_3BYTE_SHIFT = 12,
>> +    PACKAGE_LENGTH_4BYTE_SHIFT = 20,
>> +};
>> +
>> +static void build_prepend_package_length(uint8_t *pkg, uint32_t length)
>> +{
>> +    uint8_t byte;
>> +    unsigned length_bytes;
>> +
>> +    if ( length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT) )
>> +        length_bytes = 1;
>> +    else if ( length + 2 < (1 << PACKAGE_LENGTH_3BYTE_SHIFT) )
>> +        length_bytes = 2;
>> +    else if ( length + 3 < (1 << PACKAGE_LENGTH_4BYTE_SHIFT) )
>> +        length_bytes = 3;
>> +    else
>> +        length_bytes = 4;
>> +
>> +    length += length_bytes;
>> +
>> +    switch ( length_bytes )
>> +    {
>> +    case 1:
>> +        byte = length;
>> +        build_prepend_byte(pkg, byte);
>> +        return;
>> +    case 4:
>> +        byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
>> +        build_prepend_byte(pkg, byte);
>> +        length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
>> +        /* fall through */
>> +    case 3:
>> +        byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
>> +        build_prepend_byte(pkg, byte);
>> +        length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
>> +        /* fall through */
>> +    case 2:
>> +        byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
>> +        build_prepend_byte(pkg, byte);
>> +        length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
>> +        /* fall through */
>> +    }
>> +    /*
>> +     * Most significant two bits of byte zero indicate how many following bytes
>> +     * are in PkgLength encoding.
>> +     */
>> +    byte = ((length_bytes - 1) << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
>> +    build_prepend_byte(pkg, byte);
>> +}
>> +
>> +static void build_prepend_package(uint8_t *buf, uint8_t op)
>> +{
>> +    uint32_t length = get_package_length(buf);
>> +    build_prepend_package_length(buf, length);
>> +    build_prepend_byte(buf, op);
>> +}
>> +
>> +static void build_prepend_ext_packge(uint8_t *buf, uint8_t op)
>> +{
>> +    build_prepend_package(buf, op);
>> +    build_prepend_byte(buf, AML_OP_EXT);
>> +}
>> +
>> +void *aml_build_begin(struct acpi_ctxt *ctxt)
>> +{
>> +    alloc.ctxt = ctxt;
>> +    alloc.buf = ctxt->mem_ops.alloc(ctxt,
>> +                                    ctxt->min_alloc_unit, ctxt->min_alloc_align);
>> +    alloc.capacity = ctxt->min_alloc_unit;
>> +    alloc.used = 0;
>> +    return alloc.buf;
>> +}
>> +
>> +uint32_t aml_build_end(void)
>> +{
>> +    return alloc.used;
>> +}
>> +
>> +void aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t blob_length)
>> +{
>> +    uint32_t len;
>> +
>> +    len = buf - alloc.buf;
>> +    len = alloc.used - len;
>> +
>> +    aml_buf_alloc(blob_length);
>> +    if ( len )
>> +        memmove(buf + blob_length, buf, len);
>> +
>> +    memcpy(buf, blob, blob_length);
>> +}
>> +
>> +void aml_prepend_device(uint8_t *buf, const char *name)
>> +{
>> +    build_prepend_name(buf, name);
>> +    build_prepend_ext_packge(buf, AML_OP_DEVICE);
>> +}
>> +
>> +void aml_prepend_scope(uint8_t *buf, const char *name)
>> +{
>> +    build_prepend_name(buf, name);
>> +    build_prepend_package(buf, AML_OP_SCOPE);
>> +}
>> diff --git a/tools/libacpi/aml_build.h b/tools/libacpi/aml_build.h
>> new file mode 100644
>> index 0000000..ed68f66
>> --- /dev/null
>> +++ b/tools/libacpi/aml_build.h
>> @@ -0,0 +1,83 @@
>> +/*
>> + * tools/libacpi/aml_build.h
>> + *
>> + * Copyright (c) 2016, Intel Corporation.
>
>Now 2017.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>
>Again this needs to be LGPL license.
>
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
>> + */
>> +
>> +#ifndef _AML_BUILD_H_
>> +#define _AML_BUILD_H_
>> +
>> +#include <stdint.h>
>> +#include "libacpi.h"
>> +
>> +/*
>> + * NB: All aml_prepend_* calls, which build AML code in one ACPI
>> + *     table, should be placed between a pair of calls to
>> + *     aml_build_begin() and aml_build_end().
>> + */
>> +
>> +/**
>> + * Reset the AML builder and begin a new round of building.
>> + *
>> + * Parameters:
>> + *   @ctxt: ACPI context used by the AML builder
>> + *
>> + * Returns:
>> + *   a pointer to the builder buffer where the AML code will be stored
>> + */
>> +void *aml_build_begin(struct acpi_ctxt *ctxt);
>> +
>> +/**
>> + * Mark the end of a round of AML building.
>> + *
>> + * Returns:
>> + *  the number of bytes in the builder buffer built in this round
>> + */
>> +uint32_t aml_build_end(void);
>> +
>> +/**
>> + * Prepend a blob, which can contain arbitrary content, to the builder buffer.
>> + *
>> + * Parameters:
>> + *   @buf:    pointer to the builder buffer
>> + *   @blob:   pointer to the blob
>> + *   @length: the number of bytes in the blob
>> + */
>> +void aml_prepend_blob(uint8_t *buf, const void *blob, uint32_t length);
>> +
>> +/**
>> + * Prepend an AML device structure to the builder buffer. The existing
>> + * data in the builder buffer is included in the AML device.
>> + *
>> + * Parameters:
>> + *   @buf:  pointer to the builder buffer
>> + *   @name: the name of the device
>> + */
>> +void aml_prepend_device(uint8_t *buf, const char *name);
>> +
>> +/**
>> + * Prepend an AML scope structure to the builder buffer. The existing
>> + * data in the builder buffer is included in the AML scope.
>> + *
>> + * Parameters:
>> + *   @buf:  pointer to the builder buffer
>> + *   @name: the name of the scope
>> + */
>> +void aml_prepend_scope(uint8_t *buf, const char *name);
>> +
>> +#endif /* _AML_BUILD_H_ */
>> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>> index c4e4117..a904927 100644
>> --- a/tools/libxl/Makefile
>> +++ b/tools/libxl/Makefile
>> @@ -77,11 +77,12 @@ endif
>>
>>  ACPI_PATH  = $(XEN_ROOT)/tools/libacpi
>>  ACPI_FILES = dsdt_pvh.c
>> -ACPI_OBJS  = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o
>> +ACPI_OBJS  = $(patsubst %.c,%.o,$(ACPI_FILES)) build.o static_tables.o aml_build.o
>>  $(ACPI_FILES): acpi
>>  $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libxl_x86_acpi.h\"
>>  vpath build.c $(ACPI_PATH)/
>>  vpath static_tables.c $(ACPI_PATH)/
>> +vpath aml_build.c $(ACPI_PATH)/
>>  LIBXL_OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
>>
>>  .PHONY: acpi
>> --
>> 2.10.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model
  2017-01-27 21:40   ` Konrad Rzeszutek Wilk
@ 2017-02-08  5:38     ` Haozhong Zhang
  2017-02-08 14:35       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  5:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Xiao Guangrong

On 01/27/17 16:40 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:30AM +0800, Haozhong Zhang wrote:
>> ACPI tables built by the device model, whose signatures do not
>> conflict with tables built by Xen (except SSDT), are loaded after ACPI
>> tables built by Xen.
>>
>> ACPI namespace devices built by the device model, whose names do not
>> conflict with devices built by Xen, are assembled and placed in SSDTs
>> after ACPI tables built by Xen.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/firmware/hvmloader/util.c |  12 +++
>>  tools/libacpi/acpi2_0.h         |   2 +
>>  tools/libacpi/build.c           | 216 ++++++++++++++++++++++++++++++++++++++++
>>  tools/libacpi/libacpi.h         |   5 +
>>  4 files changed, 235 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index dba954a..e6530cd 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -998,6 +998,18 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>>      if ( !strncmp(xenstore_read("platform/acpi_s4", "1"), "1", 1)  )
>>          config->table_flags |= ACPI_HAS_SSDT_S4;
>>
>> +    s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL);
>> +    if ( s )
>> +    {
>> +        config->dm.addr = strtoll(s, NULL, 0);
>> +
>> +        s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL);
>> +        if ( s )
>> +            config->dm.length = strtoll(s, NULL, 0);
>> +        else
>> +            config->dm.addr = 0;
>> +    }
>> +
>>      config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC | ACPI_HAS_WAET);
>>
>>      config->tis_hdr = (uint16_t *)ACPI_TIS_HDR_ADDRESS;
>> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
>> index 775eb7a..7414470 100644
>> --- a/tools/libacpi/acpi2_0.h
>> +++ b/tools/libacpi/acpi2_0.h
>> @@ -430,6 +430,7 @@ struct acpi_20_slit {
>>  #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
>>  #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
>>  #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
>> +#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
>>
>>  /*
>>   * Table revision numbers.
>> @@ -445,6 +446,7 @@ struct acpi_20_slit {
>>  #define ACPI_1_0_FADT_REVISION 0x01
>>  #define ACPI_2_0_SRAT_REVISION 0x01
>>  #define ACPI_2_0_SLIT_REVISION 0x01
>> +#define ACPI_2_0_SSDT_REVISION 0x02
>>
>>  #pragma pack ()
>>
>> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
>> index 47dae01..829a365 100644
>> --- a/tools/libacpi/build.c
>> +++ b/tools/libacpi/build.c
>> @@ -20,6 +20,7 @@
>>  #include "ssdt_s4.h"
>>  #include "ssdt_tpm.h"
>>  #include "ssdt_pm.h"
>> +#include "aml_build.h"
>>  #include <xen/hvm/hvm_info_table.h>
>>  #include <xen/hvm/hvm_xs_strings.h>
>>  #include <xen/hvm/params.h>
>> @@ -55,6 +56,34 @@ struct acpi_info {
>>      uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
>>  };
>>
>> +#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
>> +#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML definition of an ACPI namespace device */
>> +
>> +/* ACPI tables of following signatures should not appear in DM ACPI */
>
>It would be good to have some form of build check to check against
>this list..
>> +static const uint64_t dm_acpi_signature_blacklist[] = {
>> +    ACPI_2_0_RSDP_SIGNATURE,
>> +    ACPI_2_0_FACS_SIGNATURE,
>> +    ACPI_2_0_FADT_SIGNATURE,
>> +    ACPI_2_0_MADT_SIGNATURE,
>> +    ACPI_2_0_RSDT_SIGNATURE,
>> +    ACPI_2_0_XSDT_SIGNATURE,
>> +    ACPI_2_0_TCPA_SIGNATURE,
>> +    ACPI_2_0_HPET_SIGNATURE,
>> +    ACPI_2_0_WAET_SIGNATURE,
>> +    ACPI_2_0_SRAT_SIGNATURE,
>> +    ACPI_2_0_SLIT_SIGNATURE,
>> +};
>> +
>> +/* ACPI namespace devices of following names should not appear in DM ACPI */
>> +static const char *dm_acpi_devname_blacklist[] = {
>> +    "MEM0",
>> +    "PCI0",
>> +    "AC",
>> +    "BAT0",
>> +    "BAT1",
>> +    "TPM",
>
>.. and this one.
>
>But I am not even sure how one would do that?
>
>Perhaps add a big warning:
>
>"Make sure to add your table name if this code (libacpi) is
>constructing it."?
>
>Or maybe have some 'register_acpi_table' function which will expand
>this blacklist?
>

I think the latter is better.
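
Something along these lines could work (a rough sketch only; the
function name and the fixed limit are made up here, not from the posted
series):

```c
/* Hypothetical sketch of a 'register_acpi_table'-style interface for
 * the signature blacklist; names and limits are illustrative. */
#include <stdint.h>

#define SIG_BLACKLIST_MAX 32

static uint64_t sig_blacklist[SIG_BLACKLIST_MAX];
static unsigned int nr_blacklisted;

/* Called by libacpi at every place where it constructs a table itself,
 * so DM-provided tables can never shadow a Xen-built one. */
static int register_acpi_table(uint64_t sig)
{
    if ( nr_blacklisted >= SIG_BLACKLIST_MAX )
        return -1;
    sig_blacklist[nr_blacklisted++] = sig;
    return 0;
}

static int check_signature_collision(uint64_t sig)
{
    unsigned int i;

    for ( i = 0; i < nr_blacklisted; i++ )
        if ( sig == sig_blacklist[i] )
            return 1;
    return 0;
}
```

That way the list grows automatically whenever libacpi gains a new
table, instead of relying on someone remembering to update a static
array.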

>> +};
>> +
>>  static void set_checksum(
>>      void *table, uint32_t checksum_offset, uint32_t length)
>>  {
>> @@ -339,6 +368,190 @@ static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
>>      return nr_added;
>>  }
>>
>> +#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
>
>That may want to go libacpi.h ?

will move it to libacpi.h, so others can use it in the future.

>> +
>> +static int check_signature_collision(uint64_t sig)
>
>bool

will change here and below

>> +{
>> +    int i;
>
>unsigned int

ditto

>
>> +    for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ )
>> +    {
>> +        if ( sig == dm_acpi_signature_blacklist[i] )
>> +            return 1;
>
>return true
>

ditto

>> +    }
>> +    return 0;
>> +}
>> +
>> +static int check_devname_collision(const char *name)
>
>bool
>> +{
>> +    int i;
>
>unsigned int
>
>> +    for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ )
>> +    {
>> +        if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) )
>
>That 4 could be a #define

yes

>> +            return 1;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt,
>> +                                            const char *name, const char *key)
>> +{
>> +#define DM_ACPI_BLOB_PATH_MAX_LENGTH 30
>> +    char path[DM_ACPI_BLOB_PATH_MAX_LENGTH];
>> +    snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_ACPI_ROOT"/%s/%s",
>> +             name, key);
>> +    return ctxt->xs_ops.read(ctxt, path);
>
>#undef DM_APCI_BLOB... but perhaps that should go in
>xen/include/public/hvm/hvm_xs_strings.h ?
>

will move to the header file

>> +}
>> +
>> +static int construct_dm_table(struct acpi_ctxt *ctxt,
>
>bool
>> +                              unsigned long *table_ptrs, int nr_tables,
>
>unsigned int nr_tables
>> +                              const void *blob, uint32_t length)
>> +{
>> +    const struct acpi_header *header = blob;
>> +    uint8_t *buffer;
>> +
>> +    if ( check_signature_collision(header->signature) )
>> +        return 0;
>> +
>> +    if ( header->length > length || header->length == 0 )
>> +        return 0;
>> +
>> +    buffer = ctxt->mem_ops.alloc(ctxt, header->length, 16);
>> +    if ( !buffer )
>> +        return 0;
>> +    memcpy(buffer, header, header->length);
>> +
>> +    /* some device models (e.g. QEMU) does not set checksum */
>> +    set_checksum(buffer, offsetof(struct acpi_header, checksum),
>> +                 header->length);
>> +
>> +    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
>> +
>> +    return 1;
>> +}
>> +
>> +static int construct_dm_nsdev(struct acpi_ctxt *ctxt,
>
>bool
>> +                              unsigned long *table_ptrs, int nr_tables,
>
>unsigned int
>> +                              const char *dev_name,
>> +                              const void *blob, uint32_t blob_length)
>> +{
>> +    struct acpi_header ssdt, *header;
>> +    uint8_t *buffer;
>> +
>> +    if ( check_devname_collision(dev_name) )
>> +        return 0;
>> +
>> +    /* built ACPI namespace device from [name, blob] */
>> +    buffer = aml_build_begin(ctxt);
>> +    aml_prepend_blob(buffer, blob, blob_length);
>> +    aml_prepend_device(buffer, dev_name);
>> +    aml_prepend_scope(buffer, "\\_SB");
>> +
>> +    /* build SSDT header */
>> +    ssdt.signature = ACPI_2_0_SSDT_SIGNATURE;
>> +    ssdt.revision = ACPI_2_0_SSDT_REVISION;
>> +    fixed_strcpy(ssdt.oem_id, ACPI_OEM_ID);
>> +    fixed_strcpy(ssdt.oem_table_id, ACPI_OEM_TABLE_ID);
>> +    ssdt.oem_revision = ACPI_OEM_REVISION;
>> +    ssdt.creator_id = ACPI_CREATOR_ID;
>> +    ssdt.creator_revision = ACPI_CREATOR_REVISION;
>> +
>> +    /* prepend SSDT header to ACPI namespace device */
>> +    aml_prepend_blob(buffer, &ssdt, sizeof(ssdt));
>> +    header = (struct acpi_header *) buffer;
>> +    header->length = aml_build_end();
>> +
>> +    /* calculate checksum of SSDT */
>> +    set_checksum(header, offsetof(struct acpi_header, checksum),
>> +                 header->length);
>> +
>> +    table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, buffer);
>> +
>> +    return 1;
>
>return true.
>> +}
>> +
>> +/*
>> + * All ACPI stuffs built by the device model are placed in the guest
>> + * buffer whose address and size are specified by config->dm.{addr, length},
>> + * or XenStore keys HVM_XS_DM_ACPI_{ADDRESS, LENGTH}.
>
>This should be also provided in
>xen/include/public/hvm/hvm_xs_strings.h
>
>Especially as you are in effect adding new keys and attributes to it.
>

yes, will move to the header file.

>> + *
>> + * The data layout within the buffer is further specified by XenStore
>> + * directories under HVM_XS_DM_ACPI_ROOT. Each directory specifies a
>
>Is each directory the name of the DSDT object? In which case you
>want to say a bit about the directory name and the limit (only four
>characters long), don't use the ones that are built-in, etc..
>

If the type is DM_ACPI_BLOB_TYPE_TABLE, the directory name matters
little, as long as it's different from the other names.

If the type is DM_ACPI_BLOB_TYPE_NSDEV, the directory name is the name
of the namespace device (see the comment in the patch about
DM_ACPI_BLOB_TYPE_NSDEV below), such as "NVDR" below:
    Scope (\_SB) {
        Device (NVDR) {
           ...
        }
    }
I'll add comments about the limitation of the name.

>But it looks like it can be anything. You extract the name from
>the blob.
>
>But we should still say what the directory names ought to be.
>
>> + * data blob and contains following XenStore keys:
>> + *
>> + * - "type":
>> + *   * DM_ACPI_BLOB_TYPE_TABLE
>> + *     The data blob specified by this directory is an ACPI table.
>> + *   * DM_ACPI_BLOB_TYPE_NSDEV
>> + *     The data blob specified by this directory is an ACPI namespace device.
>> + *     Its name is specified by the directory name, while the AML code of the
>> + *     body of the AML device structure is in the data blob.
>
>Could those be strings on XenStore? Strings are nice. "table" or
>"nsdev"?

I don't object to using a string, but isn't an integer easier for
programs to parse? Or are you suggesting it should also be
human-readable?
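
Either representation is trivial to handle in the consumer, for what
it's worth; a quick sketch (the string spellings follow your
suggestion, the numeric values match the patch's DM_ACPI_BLOB_TYPE_*
constants):

```c
/* Sketch only: parsing the XenStore "type" key in either form. */
#include <stdlib.h>
#include <string.h>

#define DM_ACPI_BLOB_TYPE_TABLE 0
#define DM_ACPI_BLOB_TYPE_NSDEV 1

static int parse_blob_type(const char *s)
{
    if ( !strcmp(s, "table") )
        return DM_ACPI_BLOB_TYPE_TABLE;
    if ( !strcmp(s, "nsdev") )
        return DM_ACPI_BLOB_TYPE_NSDEV;
    return (int)strtol(s, NULL, 0);   /* numeric form, as in the patch */
}
```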

>> + *
>> + * - "length": the number of bytes in this data blob.
>> + *
>> + * - "offset": the offset in bytes of this data blob from the beginning of buffer
>> + */
>> +static int construct_dm_tables(struct acpi_ctxt *ctxt,
>
>static unsigned int
>> +                               unsigned long *table_ptrs,
>> +                               int nr_tables,
>
>unsigned int nr_tables

OK. As long as ACPI_MAX_SECONDARY_TABLES (i.e. the maximum value of
nr_tables) is smaller than INT_MAX, the signed -> unsigned type change
will not cause trouble for its callers.

>> +                               struct acpi_config *config)
>> +{
>> +    const char *s;
>> +    char **dir;
>> +    uint8_t type;
>> +    void *blob;
>> +    unsigned int num, length, offset, i;
>> +    int nr_added = 0;
>
>unsigned int
>> +
>> +    if ( !config->dm.addr )
>> +        return 0;
>> +
>> +    dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, &num);
>> +    if ( !dir || !num )
>> +        return 0;
>> +
>> +    if ( num > ACPI_MAX_SECONDARY_TABLES - nr_tables )
>> +        return 0;
>> +
>> +    for ( i = 0; i < num; i++, dir++ )
>> +    {
>
>You probably want to check that *dir is not NULL. Just in case.

will do

>
>> +        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "type");
>> +        if ( !s )
>> +            continue;
>> +        type = (uint8_t)strtoll(s, NULL, 0);
>> +
>> +        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "length");
>> +        if ( !s )
>> +            continue;
>> +        length = (uint32_t)strtoll(s, NULL, 0);
>> +
>> +        s = xs_read_dm_acpi_blob_key(ctxt, *dir, "offset");
>> +        if ( !s )
>> +            continue;
>> +        offset = (uint32_t)strtoll(s, NULL, 0);
>> +
>> +        blob = ctxt->mem_ops.p2v(ctxt, config->dm.addr + offset);
>> +
>> +        switch ( type )
>> +        {
>> +        case DM_ACPI_BLOB_TYPE_TABLE:
>> +            nr_added += construct_dm_table(ctxt,
>> +                                           table_ptrs, nr_tables + nr_added,
>> +                                           blob, length);
>> +            break;
>> +        case DM_ACPI_BLOB_TYPE_NSDEV:
>> +            nr_added += construct_dm_nsdev(ctxt,
>> +                                           table_ptrs, nr_tables + nr_added,
>> +                                           *dir, blob, length);
>> +            break;
>> +        default:
>> +            /* skip blobs of unknown types */
>> +            continue;
>> +        }
>> +    }
>> +
>> +    return nr_added;
>> +}
>> +
>>  static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>>                                        unsigned long *table_ptrs,
>>                                        struct acpi_config *config,
>> @@ -461,6 +674,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>>      nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
>>                                                nr_tables, config);
>>
>> +    /* Load any additional tables passed from device model (e.g. QEMU) */
>
>Perhaps an period at the end of the sentence?

Yes.

Thanks,
Haozhong

>
>> +    nr_tables += construct_dm_tables(ctxt, table_ptrs, nr_tables, config);
>> +
>>      table_ptrs[nr_tables] = 0;
>>      return nr_tables;
>>  }
>> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
>> index 12cafd8..684502d 100644
>> --- a/tools/libacpi/libacpi.h
>> +++ b/tools/libacpi/libacpi.h
>> @@ -82,6 +82,11 @@ struct acpi_config {
>>          uint32_t length;
>>      } pt;
>>
>> +    struct {
>> +        uint32_t addr;
>> +        uint32_t length;
>> +    } dm;
>> +
>>      struct acpi_numa numa;
>>      const struct hvm_info_table *hvminfo;
>>
>> --
>> 2.10.1
>>
>>

* Re: [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs
  2017-01-27 21:47   ` Konrad Rzeszutek Wilk
@ 2017-02-08  5:42     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  5:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 16:47 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:31AM +0800, Haozhong Zhang wrote:
>> For xl vNVDIMM configs
>>   vnvdimms = [ '/path/to/pmem0', '/path/to/pmem1', ... ]
>>
>> the following qemu options are built
>>   -machine <existing options>,nvdimm
>>   -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
>>   -object memory-backend-xen,id=mem1,size=$PMEM0_SIZE,mem-path=/path/to/pmem0
>>   -device nvdimm,id=nvdimm1,memdev=mem1
>>   -object memory-backend-xen,id=mem2,size=$PMEM1_SIZE,mem-path=/path/to/pmem1
>>   -device nvdimm,id=nvdimm2,memdev=mem2
>>   ...
>> where
>> * NR_SLOTS is the number of entries in vnvdimms + 1,
>> * MEM_SIZE is the total size of all RAM and NVDIMM devices,
>> * PMEM#_SIZE is the size of the host pmem device/file '/path/to/pmem#'.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxl/libxl_dm.c      | 113 +++++++++++++++++++++++++++++++++++++++++++-
>>  tools/libxl/libxl_types.idl |   8 ++++
>>  tools/libxl/xl_cmdimpl.c    |  16 +++++++
>
>You probably also want this new parameter in the xl manpage.

will add a description to the xl manpage

>
>>  3 files changed, 135 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>> index ad366a8..6b8c019 100644
>> --- a/tools/libxl/libxl_dm.c
>> +++ b/tools/libxl/libxl_dm.c
>> @@ -24,6 +24,10 @@
>>  #include <sys/types.h>
>>  #include <pwd.h>
>>
>> +#if defined(__linux__)
>> +#include <linux/fs.h>
>> +#endif
>> +
>>  static const char *libxl_tapif_script(libxl__gc *gc)
>>  {
>>  #if defined(__linux__) || defined(__FreeBSD__)
>> @@ -905,6 +909,86 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
>>      return drive;
>>  }
>>
>> +#if defined(__linux__)
>> +
>> +static uint64_t libxl__build_dm_vnvdimm_args(libxl__gc *gc, flexarray_t *dm_args,
>> +                                             struct libxl_device_vnvdimm *dev,
>> +                                             int dev_no)
>> +{
>> +    int fd, rc;
>> +    struct stat st;
>> +    uint64_t size = 0;
>> +    char *arg;
>> +
>> +    fd = open(dev->file, O_RDONLY);
>> +    if (fd < 0) {
>> +        LOG(ERROR, "failed to open file %s: %s",
>> +            dev->file, strerror(errno));
>> +        goto out;
>> +    }
>> +
>> +    if (stat(dev->file, &st)) {
>> +        LOG(ERROR, "failed to get status of file %s: %s",
>> +            dev->file, strerror(errno));
>> +        goto out_fclose;
>> +    }
>> +
>> +    switch (st.st_mode & S_IFMT) {
>> +    case S_IFBLK:
>> +        rc = ioctl(fd, BLKGETSIZE64, &size);
>> +        if (rc == -1) {
>> +            LOG(ERROR, "failed to get size of block device %s: %s",
>> +                dev->file, strerror(errno));
>> +            size = 0;
>> +        }
>> +        break;
>> +
>> +    case S_IFREG:
>> +        size = st.st_size;
>> +        break;
>> +
>> +    default:
>> +        LOG(ERROR, "%s is not a block device or regular file", dev->file);
>> +        break;
>> +    }
>> +
>> +    if (!size)
>> +        goto out_fclose;
>> +
>> +    flexarray_append(dm_args, "-object");
>> +    arg = GCSPRINTF("memory-backend-xen,id=mem%d,size=%"PRIu64",mem-path=%s",
>> +                    dev_no + 1, size, dev->file);
>> +    flexarray_append(dm_args, arg);
>> +
>> +    flexarray_append(dm_args, "-device");
>> +    arg = GCSPRINTF("nvdimm,id=nvdimm%d,memdev=mem%d", dev_no + 1, dev_no + 1);
>> +    flexarray_append(dm_args, arg);
>> +
>> + out_fclose:
>> +    close(fd);
>> + out:
>> +    return size;
>> +}
>> +
>> +static uint64_t libxl__build_dm_vnvdimms_args(
>> +    libxl__gc *gc, flexarray_t *dm_args,
>> +    struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
>> +{
>> +    uint64_t total_size = 0, size;
>> +    int i;
>
>unsigned int

will change

Thanks,
Haozhong

* Re: [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs
  2017-01-27 21:48   ` Konrad Rzeszutek Wilk
@ 2017-02-08  5:47     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  5:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 16:48 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:31AM +0800, Haozhong Zhang wrote:
>> For xl vNVDIMM configs
>>   vnvdimms = [ '/path/to/pmem0', '/path/to/pmem1', ... ]
>>
>> the following qemu options are built
>>   -machine <existing options>,nvdimm
>>   -m <existing options>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
>>   -object memory-backend-xen,id=mem1,size=$PMEM0_SIZE,mem-path=/path/to/pmem0
>>   -device nvdimm,id=nvdimm1,memdev=mem1
>>   -object memory-backend-xen,id=mem2,size=$PMEM1_SIZE,mem-path=/path/to/pmem1
>>   -device nvdimm,id=nvdimm2,memdev=mem2
>>   ...
>
>Also you may want to say which patch (just the title) adds support
>for these parameters?

Sure, I'll add comments about the corresponding QEMU commit ids. Those
commits are already in upstream QEMU (since v2.6).

Thanks,
Haozhong

* Re: [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests
  2017-01-27 22:06   ` Konrad Rzeszutek Wilk
  2017-01-27 22:09     ` Konrad Rzeszutek Wilk
@ 2017-02-08  5:59     ` Haozhong Zhang
  2017-02-08 14:37       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  5:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 17:06 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:32AM +0800, Haozhong Zhang wrote:
>> We can map host pmem devices or files on pmem devices to guests. This
>> patch adds support to map pmem devices. The implementation relies on the
>> Linux pmem driver, so it currently functions only when libxl is compiled
>
>Perhaps say when the pmem driver was introduced and also what CONFIG
>option to use to enable it?

Yes, I'll add the kernel version and CONFIG_*.

>> for Linux.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxl/Makefile       |   2 +-
>>  tools/libxl/libxl_nvdimm.c | 210 +++++++++++++++++++++++++++++++++++++++++++++
>>  tools/libxl/libxl_nvdimm.h |  45 ++++++++++
>>  3 files changed, 256 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/libxl/libxl_nvdimm.c
>>  create mode 100644 tools/libxl/libxl_nvdimm.h
>>
>> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>> index a904927..ecc9ae1 100644
>> --- a/tools/libxl/Makefile
>> +++ b/tools/libxl/Makefile
>> @@ -106,7 +106,7 @@ ifeq ($(CONFIG_NetBSD),y)
>>  LIBXL_OBJS-y += libxl_netbsd.o
>>  else
>>  ifeq ($(CONFIG_Linux),y)
>> -LIBXL_OBJS-y += libxl_linux.o
>> +LIBXL_OBJS-y += libxl_linux.o libxl_nvdimm.o
>>  else
>>  ifeq ($(CONFIG_FreeBSD),y)
>>  LIBXL_OBJS-y += libxl_freebsd.o
>> diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
>> new file mode 100644
>> index 0000000..7bcbaaf
>> --- /dev/null
>> +++ b/tools/libxl/libxl_nvdimm.c
>> @@ -0,0 +1,210 @@
>> +/*
>> + * tools/libxl/libxl_nvdimm.c
>> + *
>> + * Copyright (c) 2016, Intel Corporation.
>
>2017 now.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>
>LGPL please. libxl uses that license.
>

I'll fix the license.

>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
>> + */
>> +
>> +#include <stdlib.h>
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include <unistd.h>
>> +#include <errno.h>
>> +#include <stdint.h>
>> +
>> +#include "libxl_internal.h"
>> +#include "libxl_arch.h"
>> +#include "libxl_nvdimm.h"
>> +
>> +#include <xc_dom.h>
>> +
>> +#define BLK_DEVICE_ROOT "/sys/dev/block"
>> +
>> +static int nvdimm_sysfs_read(libxl__gc *gc,
>> +                             unsigned int major, unsigned int minor,
>> +                             const char *name, void **data_r)
>> +{
>> +    char *path = libxl__sprintf(gc, BLK_DEVICE_ROOT"/%u:%u/device/%s",
>> +                                major, minor, name);
>> +    return libxl__read_sysfs_file_contents(gc, path, data_r, NULL);
>> +}
>> +
>> +static int nvdimm_get_spa(libxl__gc *gc, unsigned int major, unsigned int minor,
>> +                          uint64_t *spa_r)
>> +{
>> +    void *data;
>> +    int ret = nvdimm_sysfs_read(gc, major, minor, "resource", &data);
>> +
>> +    if ( ret )
>> +        return ret;
>> +
>> +    *spa_r = strtoll(data, NULL, 0);
>> +    return 0;
>> +}
>> +
>> +static int nvdimm_get_size(libxl__gc *gc, unsigned int major, unsigned int minor,
>> +                           uint64_t *size_r)
>> +{
>> +    void *data;
>> +    int ret = nvdimm_sysfs_read(gc, major, minor, "size", &data);
>> +
>> +    if ( ret )
>> +        return ret;
>> +
>> +    *size_r = strtoll(data, NULL, 0);
>> +
>> +    return 0;
>> +}
>> +
>> +static int add_pages(libxl__gc *gc, uint32_t domid,
>> +                     xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
>> +{
>> +    unsigned int nr;
>> +    int ret = 0;
>> +
>> +    while ( nr_mfns )
>> +    {
>> +        nr = min(nr_mfns, (unsigned long) UINT_MAX);
>No need for space                           ^- here.

will remove spaces here and in other patches

>> +
>> +        ret = xc_domain_populate_pmemmap(CTX->xch, domid, mfn, gpfn, nr);
>> +        if ( ret )
>> +        {
>> +            LOG(ERROR, "failed to map pmem pages, "
>> +                "mfn 0x%" PRIx64", gpfn 0x%" PRIx64 ", nr_mfns %u, err %d",
>> +                mfn, gpfn, nr, ret);
>> +            break;
>> +        }
>> +
>> +        nr_mfns -= nr;
>> +        mfn += nr;
>> +        gpfn += nr;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static int add_file(libxl__gc *gc, uint32_t domid, int fd,
>> +                    xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
>> +{
>> +    return -EINVAL;
>
>Hehehehe..

After consulting the driver developers, I'm going to remove the file
support in this version, because there is currently no stable way to
pin down the mapping between file extents and their physical locations
(some of my previous understanding of file mapping was wrong).
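
For reference, a file's extents can be queried with the FIEMAP ioctl
roughly as below (an illustrative sketch, Linux-only, not from the
series); the problem is that nothing stops the filesystem from
relocating those extents afterwards, which is why a fixed
file-offset -> physical-address mapping cannot be relied upon:

```c
/* Illustrative sketch: dump a file's physical extents via FIEMAP.
 * Even when this succeeds, the filesystem may move the extents later. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

static int print_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct fiemap *fm;
    unsigned int i;
    int rc = -1;

    if ( fd < 0 )
        return -1;

    fm = calloc(1, sizeof(*fm) + 32 * sizeof(struct fiemap_extent));
    if ( !fm )
        goto out;

    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_extent_count = 32;

    if ( ioctl(fd, FS_IOC_FIEMAP, fm) == 0 )
    {
        for ( i = 0; i < fm->fm_mapped_extents; i++ )
            printf("extent %u: logical %#llx physical %#llx len %#llx\n",
                   i,
                   (unsigned long long)fm->fm_extents[i].fe_logical,
                   (unsigned long long)fm->fm_extents[i].fe_physical,
                   (unsigned long long)fm->fm_extents[i].fe_length);
        rc = 0;
    }

    free(fm);
 out:
    close(fd);
    return rc;
}
```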

>> +}
>> +
>> +int libxl_nvdimm_add_device(libxl__gc *gc,
>> +                            uint32_t domid, const char *path,
>> +                            uint64_t guest_spa, uint64_t guest_size)
>> +{
>> +    int fd;
>> +    struct stat st;
>> +    unsigned int major, minor;
>> +    uint64_t host_spa, host_size;
>> +    xen_pfn_t mfn, gpfn;
>> +    unsigned long nr_gpfns;
>> +    int ret;
>> +
>> +    if ( (guest_spa & ~XC_PAGE_MASK) || (guest_size & ~XC_PAGE_MASK) )
>> +        return -EINVAL;
>
>That is the wrong return value. The libxl functions return enum
>libxl_error
>
>Please use those throughout the code.

Thanks for pointing this out; I hadn't noticed it at all.

>> +
>> +    fd = open(path, O_RDONLY);
>> +    if ( fd < 0 )
>> +    {
>> +        LOG(ERROR, "failed to open file %s (err: %d)", path, errno);
>> +        return -EIO;
>> +    }
>> +
>> +    ret = fstat(fd, &st);
>> +    if ( ret )
>> +    {
>> +        LOG(ERROR, "failed to get status of file %s (err: %d)",
>> +            path, errno);
>> +        goto out;
>> +    }
>> +
>> +    switch ( st.st_mode & S_IFMT )
>> +    {
>> +    case S_IFBLK:
>> +        major = major(st.st_rdev);
>> +        minor = minor(st.st_rdev);
>> +        break;
>> +
>> +    case S_IFREG:
>> +        major = major(st.st_dev);
>> +        minor = minor(st.st_dev);
>> +        break;
>> +
>> +    default:
>> +        LOG(ERROR, "%s is neither a block device nor a regular file", path);
>> +        ret = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    ret = nvdimm_get_spa(gc, major, minor, &host_spa);
>> +    if ( ret )
>> +    {
>> +        LOG(ERROR, "failed to get SPA of device %u:%u", major, minor);
>> +        goto out;
>> +    }
>> +    else if ( host_spa & ~XC_PAGE_MASK )
>> +    {
>> +        ret = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    ret = nvdimm_get_size(gc, major, minor, &host_size);
>> +    if ( ret )
>> +    {
>> +        LOG(ERROR, "failed to get size of device %u:%u", major, minor);
>> +        goto out;
>> +    }
>> +    else if ( guest_size > host_size )
>> +    {
>> +        LOG(ERROR, "vNVDIMM size %" PRIu64 " expires NVDIMM size %" PRIu64,
>
>expires? larger?

larger

>> +            guest_size, host_size);
>> +        ret = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    mfn = host_spa >> XC_PAGE_SHIFT;
>> +    gpfn = guest_spa >> XC_PAGE_SHIFT;
>> +    nr_gpfns = guest_size >> XC_PAGE_SHIFT;
>> +
>> +    switch ( st.st_mode & S_IFMT )
>> +    {
>> +    case S_IFBLK:
>> +        ret = add_pages(gc, domid, mfn, gpfn, nr_gpfns);
>
>You will need to change the return value.

yes

>> +        break;
>> +
>> +    case S_IFREG:
>> +        ret = add_file(gc, domid, fd, mfn, gpfn, nr_gpfns);
>
>Ditto here.
>> +        break;
>> +
>> +    default:
>> +        LOG(ERROR, "%s is neither a block device nor a regular file", path);
>> +        ret = -EINVAL;
>> +    }
>> +
>> + out:
>> +    close(fd);
>> +    return ret;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-basic-offset: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
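For reference, the st_rdev/st_dev distinction the switch above relies on (a block device node carries its own device number in st_rdev, while a regular file reports the device holding it in st_dev) can be sketched outside libxl. The following Python sketch is illustrative only; the names are not part of libxl:

```python
import os
import stat

def device_numbers(path):
    """Return the (major, minor) numbers of the device backing path.

    A block device node carries its device number in st_rdev; for a
    regular file we want st_dev, the device the file resides on.
    """
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        dev = st.st_rdev
    elif stat.S_ISREG(st.st_mode):
        dev = st.st_dev
    else:
        raise ValueError(f"{path} is neither a block device nor a regular file")
    return os.major(dev), os.minor(dev)
```

Anything else (a char device, a directory, a socket) is rejected, matching the default branch in the patch.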
>> diff --git a/tools/libxl/libxl_nvdimm.h b/tools/libxl/libxl_nvdimm.h
>> new file mode 100644
>> index 0000000..4de2fb2
>> --- /dev/null
>> +++ b/tools/libxl/libxl_nvdimm.h
>
>Why not add it in libxl.h?
>
>(Along with the LIBXL_HAVE_NVDIMM and such?)
>

will move them to libxl.h

Thanks,
Haozhong

>> @@ -0,0 +1,45 @@
>> +/*
>> + * tools/libxl/libxl_nvdimm.h
>> + *
>> + * Copyright (c) 2016, Intel Corporation.
>
>Now 2017
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>
>And please relicense it under LGPL
>
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Haozhong Zhang <haozhong.zhang@intel.com>
>> + */
>> +
>> +#ifndef LIBXL_NVDIMM_H
>> +#define LIBXL_NVDIMM_H
>> +
>> +#include <stdint.h>
>> +#include "libxl_internal.h"
>> +
>> +#if defined(__linux__)
>> +
>> +int libxl_nvdimm_add_device(libxl__gc *gc,
>> +                            uint32_t domid, const char *path,
>> +                            uint64_t spa, uint64_t length);
>> +
>> +#else
>> +
>> +int libxl_nvdimm_add_device(libxl__gc *gc,
>> +                            uint32_t domid, const char *path,
>> +                            uint64_t spa, uint64_t length)
>> +{
>> +    return -EINVAL;
>> +}
>> +
>> +#endif /* __linux__ */
>> +
>> +#endif /* !LIBXL_NVDIMM_H */
>> --
>> 2.10.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices to guests
  2017-01-27 22:10   ` Konrad Rzeszutek Wilk
@ 2017-02-08  6:03     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  6:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 17:10 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:33AM +0800, Haozhong Zhang wrote:
>> We can map host pmem devices or files on pmem devices to guests. This
>> patch adds support to map files on pmem devices. The implementation
>> relies on the Linux pmem driver and kernel APIs, so it currently
>
>May want to mention which CONFIG_ options are needed.

I'll drop the support for mapping files. After consulting our driver
developers, there is really no stable way to pin the mapping between
file extents and their physical locations, so the fiemap code in this
patch may in fact not work correctly.

Thanks,
Haozhong


>> functions only when libxl is compiled for Linux.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxl/libxl_nvdimm.c | 73 +++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 72 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxl/libxl_nvdimm.c b/tools/libxl/libxl_nvdimm.c
>> index 7bcbaaf..b3ba19a 100644
>> --- a/tools/libxl/libxl_nvdimm.c
>> +++ b/tools/libxl/libxl_nvdimm.c
>> @@ -25,6 +25,9 @@
>>  #include <unistd.h>
>>  #include <errno.h>
>>  #include <stdint.h>
>> +#include <sys/ioctl.h>
>> +#include <linux/fs.h>
>> +#include <linux/fiemap.h>
>>
>>  #include "libxl_internal.h"
>>  #include "libxl_arch.h"
>> @@ -97,10 +100,78 @@ static int add_pages(libxl__gc *gc, uint32_t domid,
>>      return ret;
>>  }
>>
>> +static uint64_t
>> +get_file_extents(libxl__gc *gc, int fd, unsigned long length,
>> +                 struct fiemap_extent **extents_r)
>> +{
>> +    struct fiemap *fiemap;
>> +    uint64_t nr_extents = 0, extents_size;
>> +
>> +    fiemap = libxl__zalloc(gc, sizeof(*fiemap));
>> +    if ( !fiemap )
>> +        goto out;
>> +
>> +    fiemap->fm_length = length;
>> +    if ( ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0 )
>> +        goto out;
>> +
>> +    nr_extents = fiemap->fm_mapped_extents;
>> +    extents_size = sizeof(struct fiemap_extent) * nr_extents;
>> +    fiemap = libxl__realloc(gc, fiemap, sizeof(*fiemap) + extents_size);
>> +    if ( !fiemap )
>> +        goto out;
>> +
>> +    memset(fiemap->fm_extents, 0, extents_size);
>> +    fiemap->fm_extent_count = nr_extents;
>> +    fiemap->fm_mapped_extents = 0;
>> +
>> +    if ( ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0 )
>> +        goto out;
>> +
>> +    *extents_r = fiemap->fm_extents;
>> +
>> + out:
>> +    return nr_extents;
>> +}
>> +
>>  static int add_file(libxl__gc *gc, uint32_t domid, int fd,
>>                      xen_pfn_t mfn, xen_pfn_t gpfn, unsigned long nr_mfns)
>>  {
>> -    return -EINVAL;
>> +    struct fiemap_extent *extents;
>> +    uint64_t nr_extents, i;
>> +    int ret = 0;
>> +
>> +    nr_extents = get_file_extents(gc, fd, nr_mfns << XC_PAGE_SHIFT, &extents);
>> +    if ( !nr_extents )
>> +        return -EIO;
>> +
>> +    for ( i = 0; i < nr_extents; i++ )
>> +    {
>> +        uint64_t p_offset = extents[i].fe_physical;
>> +        uint64_t l_offset = extents[i].fe_logical;
>> +        uint64_t length = extents[i].fe_length;
>> +
>> +        if ( extents[i].fe_flags & ~FIEMAP_EXTENT_LAST )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        if ( (p_offset | l_offset | length) & ~XC_PAGE_MASK )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        ret = add_pages(gc, domid,
>> +                        mfn + (p_offset >> XC_PAGE_SHIFT),
>> +                        gpfn + (l_offset >> XC_PAGE_SHIFT),
>> +                        length >> XC_PAGE_SHIFT);
>> +        if ( ret )
>> +            break;
>> +    }
>> +
>> +    return ret;
>>  }
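The per-extent loop above can be expressed compactly. Here is a standalone sketch (pure Python, with an illustrative page-shift constant) of the page-alignment test `(p_offset | l_offset | length) & ~XC_PAGE_MASK` and the extent-to-page-range computation handed to add_pages():

```python
PAGE_SHIFT = 12            # stands in for XC_PAGE_SHIFT on x86
PAGE_SIZE = 1 << PAGE_SHIFT

def extents_to_page_ranges(extents, mfn, gpfn):
    """Map (physical, logical, length) byte extents to the
    (host_pfn, guest_pfn, nr_pages) triples passed to add_pages().

    Every field must be page-aligned; an unaligned extent is rejected,
    mirroring the (p_offset | l_offset | length) & ~XC_PAGE_MASK check.
    """
    ranges = []
    for p_offset, l_offset, length in extents:
        if (p_offset | l_offset | length) & (PAGE_SIZE - 1):
            raise ValueError("extent is not page-aligned")
        ranges.append((mfn + (p_offset >> PAGE_SHIFT),
                       gpfn + (l_offset >> PAGE_SHIFT),
                       length >> PAGE_SHIFT))
    return ranges
```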
>>
>>  int libxl_nvdimm_add_device(libxl__gc *gc,
>> --
>> 2.10.1
>>
>>


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-01-27 22:11   ` Konrad Rzeszutek Wilk
@ 2017-02-08  6:07     ` Haozhong Zhang
  2017-02-08 10:31       ` Wei Liu
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  6:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> If any error code is returned when creating a domain, stop the domain
>> creation.
>
>This looks like it is a bug-fix that can be spun off from this
>patchset?
>

Yes, if everyone agrees it's really a bug and the fix does not cause
a compatibility problem (e.g. xl without this patch does not abort
domain creation if it fails to connect to the QEMU VNC port).

Thanks,
Haozhong

>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxl/libxl_create.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index d986cd2..24e8368 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
>>      if (dcs->sdss.dm.guest_domid) {
>>          if (d_config->b_info.device_model_version
>>              == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
>> -            libxl__qmp_initializations(gc, domid, d_config);
>> +            ret = libxl__qmp_initializations(gc, domid, d_config);
>> +            if (ret)
>> +                goto error_out;
>>          }
>>      }
>>
>> --
>> 2.10.1
>>
>>


* Re: [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback
  2017-01-27 22:13   ` Konrad Rzeszutek Wilk
@ 2017-02-08  6:08     ` Haozhong Zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-08  6:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xiao Guangrong, Wei Liu, Ian Jackson, xen-devel

On 01/27/17 17:13 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:35AM +0800, Haozhong Zhang wrote:
>> QMP command 'query-nvdimms' is used by libxl to get the backend, the
>> guest SPA and size of each vNVDIMM device, and then libxl starts mapping
>> backend to guest for each vNVDIMM device.
>>
>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> ---
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxl/libxl_qmp.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 64 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
>> index f8addf9..02edd09 100644
>> --- a/tools/libxl/libxl_qmp.c
>> +++ b/tools/libxl/libxl_qmp.c
>> @@ -26,6 +26,7 @@
>>
>>  #include "_libxl_list.h"
>>  #include "libxl_internal.h"
>> +#include "libxl_nvdimm.h"
>>
>>  /* #define DEBUG_RECEIVED */
>>
>> @@ -1146,6 +1147,66 @@ out:
>>      return rc;
>>  }
>>
>> +static int qmp_register_nvdimm_callback(libxl__qmp_handler *qmp,
>> +                                        const libxl__json_object *o,
>> +                                        void *unused)
>> +{
>> +    GC_INIT(qmp->ctx);
>> +    const libxl__json_object *obj = NULL;
>> +    const libxl__json_object *sub_obj = NULL;
>> +    int i = 0;
>
>unsigned int.

will fix

Thanks,
Haozhong

>> +    const char *mem_path;
>> +    uint64_t slot, spa, length;
>> +    int ret = 0;
>> +
>> +    for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
>> +        if (!libxl__json_object_is_map(obj))
>> +            continue;
>> +
>> +        sub_obj = libxl__json_map_get("slot", obj, JSON_INTEGER);
>> +        slot = libxl__json_object_get_integer(sub_obj);
>> +
>> +        sub_obj = libxl__json_map_get("mem-path", obj, JSON_STRING);
>> +        mem_path = libxl__json_object_get_string(sub_obj);
>> +        if (!mem_path) {
>> +            LOG(ERROR, "No mem-path is specified for NVDIMM #%" PRId64, slot);
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        sub_obj = libxl__json_map_get("spa", obj, JSON_INTEGER);
>> +        spa = libxl__json_object_get_integer(sub_obj);
>> +
>> +        sub_obj = libxl__json_map_get("length", obj, JSON_INTEGER);
>> +        length = libxl__json_object_get_integer(sub_obj);
>> +
>> +        LOG(DEBUG,
>> +            "vNVDIMM #%" PRId64 ": %s, spa 0x%" PRIx64 ", length 0x%" PRIx64,
>> +            slot, mem_path, spa, length);
>> +
>> +        ret = libxl_nvdimm_add_device(gc, qmp->domid, mem_path, spa, length);
>> +        if (ret) {
>> +            LOG(ERROR,
>> +                "Failed to add NVDIMM #%" PRId64
>> +                "(mem_path %s, spa 0x%" PRIx64 ", length 0x%" PRIx64 ") "
>> +                "to domain %d (err = %d)",
>> +                slot, mem_path, spa, length, qmp->domid, ret);
>> +            goto out;
>> +        }
>> +    }
>> +
>> + out:
>> +    GC_FREE;
>> +    return ret;
>> +}
>> +
>> +static int libxl__qmp_query_nvdimms(libxl__qmp_handler *qmp)
>> +{
>> +    return qmp_synchronous_send(qmp, "query-nvdimms", NULL,
>> +                                qmp_register_nvdimm_callback,
>> +                                NULL, qmp->timeout);
>> +}
>> +
>>  int libxl__qmp_hmp(libxl__gc *gc, int domid, const char *command_line,
>>                     char **output)
>>  {
>> @@ -1187,6 +1248,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
>>      if (!ret) {
>>          ret = qmp_query_vnc(qmp);
>>      }
>> +    if (!ret && guest_config->num_vnvdimms) {
>> +        ret = libxl__qmp_query_nvdimms(qmp);
>> +    }
>>      libxl__qmp_close(qmp);
>>      return ret;
>>  }
>> --
>> 2.10.1
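The JSON shape the callback above expects from 'query-nvdimms' can be sketched independently. In the following Python fragment the reply layout is an assumption reconstructed from the fields the callback reads (slot, mem-path, spa, length), not QEMU's actual wire format:

```python
import json

def parse_nvdimms(reply):
    """Extract (slot, mem_path, spa, length) tuples from a
    query-nvdimms style reply, as qmp_register_nvdimm_callback does."""
    devices = []
    for obj in reply.get("return", []):
        if not isinstance(obj, dict):
            continue  # skip non-map entries, as the callback does
        slot = obj["slot"]
        mem_path = obj.get("mem-path")
        if mem_path is None:
            raise ValueError(f"No mem-path is specified for NVDIMM #{slot}")
        devices.append((slot, mem_path, obj["spa"], obj["length"]))
    return devices

# Example reply with one vNVDIMM backed by /dev/pmem0, mapped at 4 GiB.
reply = json.loads(
    '{"return": [{"slot": 0, "mem-path": "/dev/pmem0",'
    ' "spa": 4294967296, "length": 1073741824}]}'
)
```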
>>
>>


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-08  6:07     ` Haozhong Zhang
@ 2017-02-08 10:31       ` Wei Liu
  2017-02-09  2:47         ` Haozhong Zhang
  0 siblings, 1 reply; 77+ messages in thread
From: Wei Liu @ 2017-02-08 10:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, Wei Liu, Xiao Guangrong, Ian Jackson

On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > If any error code is returned when creating a domain, stop the domain
> > > creation.
> > 
> > This looks like it is a bug-fix that can be spun off from this
> > patchset?
> > 
> 
> Yes, if everyone considers it's really a bug and the fix does not
> cause compatibility problem (e.g. xl w/o this patch does not abort the
> domain creation if it fails to connect to QEMU VNC port).
> 

I'm of two minds here. If the failure to connect is caused by some
temporary glitch in QEMU and we're sure it will eventually succeed,
there is no need to abort domain creation. If the failure to connect is
due to a permanent problem, we should abort.

OOI how did you discover this issue? That could be the key to understand
the issue here.

Wei.

> Thanks,
> Haozhong
> 
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > ---
> > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > >  tools/libxl/libxl_create.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > index d986cd2..24e8368 100644
> > > --- a/tools/libxl/libxl_create.c
> > > +++ b/tools/libxl/libxl_create.c
> > > @@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
> > >      if (dcs->sdss.dm.guest_domid) {
> > >          if (d_config->b_info.device_model_version
> > >              == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
> > > -            libxl__qmp_initializations(gc, domid, d_config);
> > > +            ret = libxl__qmp_initializations(gc, domid, d_config);
> > > +            if (ret)
> > > +                goto error_out;
> > >          }
> > >      }
> > > 
> > > --
> > > 2.10.1
> > > 
> > > 


* Re: [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model
  2017-02-08  1:39     ` Haozhong Zhang
@ 2017-02-08 14:31       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-02-08 14:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, Wei Liu, Xiao Guangrong, Ian Jackson

On Wed, Feb 08, 2017 at 09:39:09AM +0800, Haozhong Zhang wrote:
> On 01/27/17 15:44 -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Oct 10, 2016 at 08:32:25AM +0800, Haozhong Zhang wrote:
> > > One guest page is reserved for the device model to place guest ACPI. The
> > 
> > guest ACPI what? ACPI SSDT? MADT?
> 
> For NVDIMM, it includes the NFIT and an SSDT. However, the mechanism
> implemented in this and the following libacpi patches can serve as a
> generic way to pass guest ACPI from the device model. A simple
> conflict detection is implemented in patch 11.
> 
> > 
> > Also why one page? What if there is a need for more than one page?
> > 
> > You add  HVM_XS_DM_ACPI_LENGTH which makes me think this is accounted
> > for?
> > 
> 
> One page is enough for NVDIMM (NFIT + SSDT). I don't see a fundamental
> restriction against allowing more than one page, so I use
> HVM_XS_DM_ACPI_LENGTH to pass the reserved size, which allows changing
> the size in the future.

640K ought to be enough for everybody? :-)

> 
> > > base address and size of the reserved area are passed to the device
> > > model via XenStore keys hvmloader/dm-acpi/{address, length}.
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > ---
> > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > >  tools/libxc/include/xc_dom.h            |  1 +
> > >  tools/libxc/xc_dom_x86.c                |  7 +++++++
> > >  tools/libxl/libxl_dom.c                 | 25 +++++++++++++++++++++++++
> > >  xen/include/public/hvm/hvm_xs_strings.h | 11 +++++++++++
> > >  4 files changed, 44 insertions(+)
> > > 
> > > diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> > > index 608cbc2..19d65cd 100644
> > > --- a/tools/libxc/include/xc_dom.h
> > > +++ b/tools/libxc/include/xc_dom.h
> > > @@ -98,6 +98,7 @@ struct xc_dom_image {
> > >      xen_pfn_t xenstore_pfn;
> > >      xen_pfn_t shared_info_pfn;
> > >      xen_pfn_t bootstack_pfn;
> > > +    xen_pfn_t dm_acpi_pfn;
> > 
> > Perhaps an pointer to an variable size array?
> > 
> > xen_pfn_t *dm_acpi_pfns;
> > unsigned int dm_apci_nr;
> > 
> > ?
> 
> dm_acpi_pfn is passed to QEMU via xenstore. Though passing an array of
> pfns via xenstore is also doable, a pair of base pfn and length is
> simpler.

Whatever you think is correct - as long as you can do more than
one page.

Thanks!
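As a side note, the hvmloader/dm-acpi/{address,length} handshake discussed above amounts to placing device-model ACPI blobs into a fixed reserved area of guest memory. A minimal sketch of that idea (illustrative names and addresses; not libxl or hvmloader code):

```python
class DmAcpiArea:
    """Reserved guest-memory area whose base and size the device model
    reads from the hvmloader/dm-acpi/{address,length} xenstore keys."""

    def __init__(self, address, length):
        self.address = address   # guest-physical base of the area
        self.length = length     # reserved size (one page in this series)
        self.used = 0

    def place(self, blob_size):
        """Return the guest address for the next blob (e.g. NFIT, SSDT),
        or raise if the reserved area is exhausted."""
        if self.used + blob_size > self.length:
            raise MemoryError("dm-acpi area exhausted")
        addr = self.address + self.used
        self.used += blob_size
        return addr
```

Whether one page suffices is then just a question of whether the successive place() calls for all blobs stay within `length`.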


* Re: [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model
  2017-02-08  5:38     ` Haozhong Zhang
@ 2017-02-08 14:35       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-02-08 14:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, Xiao Guangrong, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

.giant snip..
> > > + * - "type":
> > > + *   * DM_ACPI_BLOB_TYPE_TABLE
> > > + *     The data blob specified by this directory is an ACPI table.
> > > + *   * DM_ACPI_BLOB_TYPE_NSDEV
> > > + *     The data blob specified by this directory is an ACPI namespace device.
> > > + *     Its name is specified by the directory name, while the AML code of the
> > > + *     body of the AML device structure is in the data blob.
> > 
> > Could those be strings on XenStore? Strings are nice. "table" or
> > "nsdev"?
> 
> I don't object to using a string, but isn't an integer easier for
> programs to parse? Or are you suggesting it should also be
> human-readable?

Integers are easier. But it needs to be defined in the public
header files.


* Re: [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests
  2017-02-08  5:59     ` Haozhong Zhang
@ 2017-02-08 14:37       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-02-08 14:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, Wei Liu, Xiao Guangrong, Ian Jackson

. snip.. 
> After consulting the driver developers, I'm going to remove the file
> support in the version, because there is current no stable way to fix
> the mapping between file extents and their physical locations (some of
> my previous understanding about file mapping was wrong).

Could you expand on that please? I have to say I didn't follow the
device-dax conversation very closely, and I am wondering what the new
direction is.

Thanks.


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-08 10:31       ` Wei Liu
@ 2017-02-09  2:47         ` Haozhong Zhang
  2017-02-09 10:13           ` Wei Liu
  0 siblings, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-09  2:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: Konrad Rzeszutek Wilk, Xiao Guangrong, Ian Jackson, xen-devel

On 02/08/17 10:31 +0000, Wei Liu wrote:
>On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
>> On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>> > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> > > If any error code is returned when creating a domain, stop the domain
>> > > creation.
>> >
>> > This looks like it is a bug-fix that can be spun off from this
>> > patchset?
>> >
>>
>> Yes, if everyone considers it's really a bug and the fix does not
>> cause compatibility problem (e.g. xl w/o this patch does not abort the
>> domain creation if it fails to connect to QEMU VNC port).
>>
>
>I'm two minded here. If the failure to connect is caused by some
>temporary glitches in QEMU and we're sure it will eventually succeed,
>there is no need to abort domain creation. If failure to connect is due
>to permanent glitches, we should abort.
>

Sorry, I should say "*query* QEMU VNC port" instead of *connect*.

libxl__qmp_initializations() currently does the following tasks.
1/ Create a QMP socket.

   I think all failures in 1/ should be considered permanent. A failure
   here not only fails the following tasks, but also breaks device
   hotplug, which needs to cooperate with QEMU.

2/ If 1/ succeeds, query QMP for the serial port parameters and fill
   them in xenstore.
3/ If 1/ and 2/ succeed, set and query via QMP the VNC parameters
   (password, address, port) and fill them in xenstore.

   If we assume Xen always sends the correct QMP commands and
   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
   socket errors (see qmp_next()), for which it is hard to tell whether
   they are permanent or temporary. However, if a missing serial port
   or VNC is considered not to affect the execution of the guest
   domain, we may ignore failures here.

>OOI how did you discover this issue? That could be the key to understand
>the issue here.

The next patch adds code in libxl__qmp_initializations() to query QMP
about vNVDIMM parameters (e.g. the base gpfn, which is calculated by
QEMU) and return an error code if it fails. While I was developing that
patch, I found xl didn't stop even when bugs in my QEMU patches caused
the code in my Xen patch to fail.

Maybe we could let libxl__qmp_initializations() report whether a
failure can be tolerated. For fatal failures (e.g. those in 1/),
xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
can continue, but it needs to warn about them.
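A minimal sketch of that fatal/tolerable split (hypothetical step names; not the actual libxl interface):

```python
# Hypothetical classification of libxl__qmp_initializations() steps.
FATAL_STEPS = {"create-qmp-socket"}              # 1/: also breaks hotplug
TOLERABLE_STEPS = {"query-serial", "query-vnc"}  # 2/ and 3/

def handle_qmp_failure(step, err):
    """Abort domain creation on a fatal step; warn and continue otherwise."""
    if err is None:
        return "ok"
    if step in FATAL_STEPS:
        raise RuntimeError(f"QMP step {step} failed: {err}")
    print(f"warning: QMP step {step} failed: {err}")
    return "warned"
```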

Thanks,
Haozhong

>
>Wei.
>
>> Thanks,
>> Haozhong
>>
>> > >
>> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> > > ---
>> > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> > > Cc: Wei Liu <wei.liu2@citrix.com>
>> > > ---
>> > >  tools/libxl/libxl_create.c | 4 +++-
>> > >  1 file changed, 3 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> > > index d986cd2..24e8368 100644
>> > > --- a/tools/libxl/libxl_create.c
>> > > +++ b/tools/libxl/libxl_create.c
>> > > @@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
>> > >      if (dcs->sdss.dm.guest_domid) {
>> > >          if (d_config->b_info.device_model_version
>> > >              == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
>> > > -            libxl__qmp_initializations(gc, domid, d_config);
>> > > +            ret = libxl__qmp_initializations(gc, domid, d_config);
>> > > +            if (ret)
>> > > +                goto error_out;
>> > >          }
>> > >      }
>> > >
>> > > --
>> > > 2.10.1
>> > >
>> > >


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-09  2:47         ` Haozhong Zhang
@ 2017-02-09 10:13           ` Wei Liu
  2017-02-09 10:16             ` Wei Liu
  2017-02-10  2:37             ` Haozhong Zhang
  0 siblings, 2 replies; 77+ messages in thread
From: Wei Liu @ 2017-02-09 10:13 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk, xen-devel, Xiao Guangrong, Ian Jackson

On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> On 02/08/17 10:31 +0000, Wei Liu wrote:
> > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > > > If any error code is returned when creating a domain, stop the domain
> > > > > creation.
> > > >
> > > > This looks like it is a bug-fix that can be spun off from this
> > > > patchset?
> > > >
> > > 
> > > Yes, if everyone considers it's really a bug and the fix does not
> > > cause compatibility problem (e.g. xl w/o this patch does not abort the
> > > domain creation if it fails to connect to QEMU VNC port).
> > > 
> > 
> > I'm two minded here. If the failure to connect is caused by some
> > temporary glitches in QEMU and we're sure it will eventually succeed,
> > there is no need to abort domain creation. If failure to connect is due
> > to permanent glitches, we should abort.
> > 
> 
> Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
> 
> libxl__qmp_initializations() currently does following tasks.
> 1/ Create a QMP socket.
> 
>   I think all failures in 1/ should be considered as permanent. It
>   does not only fail the following tasks, but also fails the device
>   hotplug which needs to cooperate with QEMU.
> 
> 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
>   them in xenstore.
> 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
>   address, port) of VNC and fill them in xenstore.
> 
>   If we assume Xen always send the correct QMP commands and
>   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
>   socket errors (see qmp_next()), which are hard to tell whether they
>   are permanent or temporal. However, if the missing of serial port
>   or VNC is considered as not affecting the execution of guest
>   domain, we may ignore failures here.
> 
> > OOI how did you discover this issue? That could be the key to understand
> > the issue here.
> 
> The next patch adds code in libxl__qmp_initialization() to query qmp
> about vNVDIMM parameters (e.g. the base gpfn which is calculated by
> QEMU) and return error code if it fails. While I was developing that
> patch, I found xl didn't stop even if bugs in my QEMU patches failed
> the code in my Xen patch.
> 

Right, this should definitely be fatal.

> Maybe we could let libxl__qmp_initializations() report whether a
> failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
> xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
> can continue, but it needs to warn those failures.
> 

Yes, we can do that. It's an internal function, we can change things as
we see fit.

I would suggest you only make vNVDIMM failure fatal as a start.

Wei.


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-09 10:13           ` Wei Liu
@ 2017-02-09 10:16             ` Wei Liu
  2017-02-10  2:37             ` Haozhong Zhang
  1 sibling, 0 replies; 77+ messages in thread
From: Wei Liu @ 2017-02-09 10:16 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk, xen-devel, Xiao Guangrong,
	Ian Jackson, haozhong.zhang

Hmm... not sure why my reply didn't have you in the To: field.

On Thu, Feb 09, 2017 at 10:13:13AM +0000, Wei Liu wrote:
> On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> > On 02/08/17 10:31 +0000, Wei Liu wrote:
> > > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> > > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > > > > If any error code is returned when creating a domain, stop the domain
> > > > > > creation.
> > > > >
> > > > > This looks like it is a bug-fix that can be spun off from this
> > > > > patchset?
> > > > >
> > > > 
> > > > Yes, if everyone considers it's really a bug and the fix does not
> > > > cause compatibility problem (e.g. xl w/o this patch does not abort the
> > > > domain creation if it fails to connect to QEMU VNC port).
> > > > 
> > > 
> > > I'm two minded here. If the failure to connect is caused by some
> > > temporary glitches in QEMU and we're sure it will eventually succeed,
> > > there is no need to abort domain creation. If failure to connect is due
> > > to permanent glitches, we should abort.
> > > 
> > 
> > Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
> > 
> > libxl__qmp_initializations() currently does following tasks.
> > 1/ Create a QMP socket.
> > 
> >   I think all failures in 1/ should be considered as permanent. It
> >   does not only fail the following tasks, but also fails the device
> >   hotplug which needs to cooperate with QEMU.
> > 
> > 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
> >   them in xenstore.
> > 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
> >   address, port) of VNC and fill them in xenstore.
> > 
> >   If we assume Xen always send the correct QMP commands and
> >   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
> >   socket errors (see qmp_next()), which are hard to tell whether they
> >   are permanent or temporal. However, if the missing of serial port
> >   or VNC is considered as not affecting the execution of guest
> >   domain, we may ignore failures here.
> > 
> > > OOI how did you discover this issue? That could be the key to understand
> > > the issue here.
> > 
> > The next patch adds code in libxl__qmp_initializations() to query QMP
> > about vNVDIMM parameters (e.g. the base gpfn, which is calculated by
> > QEMU) and return an error code if it fails. While I was developing that
> > patch, I found that xl didn't stop even when bugs in my QEMU patches
> > made the code in my Xen patch fail.
> > 
> 
> Right, this should definitely be fatal.
> 
> > Maybe we could let libxl__qmp_initializations() report whether a
> > failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
> > xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
> > can continue, but it needs to warn about those failures.
> > 
> 
> Yes, we can do that. It's an internal function, we can change things as
> we see fit.
> 
> I would suggest you only make vNVDIMM failure fatal as a start.
> 
> Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-09 10:13           ` Wei Liu
  2017-02-09 10:16             ` Wei Liu
@ 2017-02-10  2:37             ` Haozhong Zhang
  2017-02-10  8:11               ` Wei Liu
  1 sibling, 1 reply; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-10  2:37 UTC (permalink / raw)
  To: Wei Liu
  Cc: Konrad Rzeszutek Wilk, Xiao Guangrong, Ian Jackson,
	Haozhong Zhang, xen-devel

On 02/09/17 10:13 +0000, Wei Liu wrote:
>[...]
>> Maybe we could let libxl__qmp_initializations() report whether a
>> failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
>> xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
>> can continue, but it needs to warn about those failures.
>>
>
>Yes, we can do that. It's an internal function, we can change things as
>we see fit.
>
>I would suggest you only make vNVDIMM failure fatal as a start.
>

I'll send a patch separate from this series that implements the above
without the NVDIMM parts.

Thanks,
Haozhong


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-10  2:37             ` Haozhong Zhang
@ 2017-02-10  8:11               ` Wei Liu
  2017-02-10  8:23                 ` Wei Liu
  2017-02-10  8:24                 ` Haozhong Zhang
  0 siblings, 2 replies; 77+ messages in thread
From: Wei Liu @ 2017-02-10  8:11 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk, xen-devel, Xiao Guangrong, Ian Jackson

On Fri, Feb 10, 2017 at 10:37:44AM +0800, Haozhong Zhang wrote:
> [...]
> I'll send a patch out of this series to implement above w/o NVDIMM
> stuffs.
> 

Sorry, I'm not sure I follow; correct me if I'm wrong: I think we're
fine with this function as-is because we don't want to make VNC / serial
errors fatal, right?

(I'm not working today, so please allow me some time to read your
reply.)

Wei.

* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-10  8:11               ` Wei Liu
@ 2017-02-10  8:23                 ` Wei Liu
  2017-02-10  8:24                 ` Haozhong Zhang
  1 sibling, 0 replies; 77+ messages in thread
From: Wei Liu @ 2017-02-10  8:23 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk, xen-devel, Xiao Guangrong,
	Ian Jackson, haozhong.zhang

On Fri, Feb 10, 2017 at 08:11:20AM +0000, Wei Liu wrote:
> [...]


* Re: [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations()
  2017-02-10  8:11               ` Wei Liu
  2017-02-10  8:23                 ` Wei Liu
@ 2017-02-10  8:24                 ` Haozhong Zhang
  1 sibling, 0 replies; 77+ messages in thread
From: Haozhong Zhang @ 2017-02-10  8:24 UTC (permalink / raw)
  To: Wei Liu; +Cc: Konrad Rzeszutek Wilk, Xiao Guangrong, Ian Jackson, xen-devel

On 02/10/17 08:11 +0000, Wei Liu wrote:
>[...]
>Sorry, I'm not sure I follow; correct me if I'm wrong: I think we're
>fine with this function as-is because we don't want to make VNC / serial
>errors fatal, right?
>

I misunderstood and thought xl should fail when encountering errors in
1/, but you now indicate it's fine to leave the function as-is, so no
patch will be needed until NVDIMM support is added.

Haozhong


end of thread, other threads:[~2017-02-10  8:24 UTC | newest]

Thread overview: 77+ messages
2016-10-10  0:32 [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table Haozhong Zhang
2016-12-09 21:35   ` Konrad Rzeszutek Wilk
2016-12-12  2:27     ` Haozhong Zhang
2016-12-12  8:25       ` Jan Beulich
2016-10-10  0:32 ` [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table Haozhong Zhang
2016-12-09 21:38   ` Konrad Rzeszutek Wilk
2016-12-12  2:31     ` Haozhong Zhang
2016-12-12  8:26       ` Jan Beulich
2016-12-12  8:35         ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions Haozhong Zhang
2016-10-11 19:13   ` Andrew Cooper
2016-12-09 22:02   ` Konrad Rzeszutek Wilk
2016-12-12  4:16     ` Haozhong Zhang
2016-12-12  8:30       ` Jan Beulich
2016-12-12  8:38         ` Haozhong Zhang
2016-12-12 14:44           ` Konrad Rzeszutek Wilk
2016-12-13  1:08             ` Haozhong Zhang
2016-12-22 11:58   ` Jan Beulich
2016-10-10  0:32 ` [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest Haozhong Zhang
2016-12-09 22:22   ` Konrad Rzeszutek Wilk
2016-12-12  4:38     ` Haozhong Zhang
2016-12-22 12:19   ` Jan Beulich
2016-10-10  0:32 ` [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy Haozhong Zhang
2016-12-09 22:27   ` Konrad Rzeszutek Wilk
2016-12-12  4:47     ` Haozhong Zhang
2016-12-22 12:22   ` Jan Beulich
2016-10-10  0:32 ` [RFC XEN PATCH 06/16] tools: reserve guest memory for ACPI from device model Haozhong Zhang
2017-01-27 20:44   ` Konrad Rzeszutek Wilk
2017-02-08  1:39     ` Haozhong Zhang
2017-02-08 14:31       ` Konrad Rzeszutek Wilk
2016-10-10  0:32 ` [RFC XEN PATCH 07/16] tools/libacpi: add callback acpi_ctxt.p2v to get a pointer from physical address Haozhong Zhang
2017-01-27 20:46   ` Konrad Rzeszutek Wilk
2017-02-08  1:42     ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 08/16] tools/libacpi: expose details of memory allocation callback Haozhong Zhang
2017-01-27 20:58   ` Konrad Rzeszutek Wilk
2017-02-08  2:12     ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 09/16] tools/libacpi: add callbacks to access XenStore Haozhong Zhang
2017-01-27 21:10   ` Konrad Rzeszutek Wilk
2017-02-08  2:19     ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 10/16] tools/libacpi: add a simple AML builder Haozhong Zhang
2017-01-27 21:19   ` Konrad Rzeszutek Wilk
2017-02-08  2:33     ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 11/16] tools/libacpi: load ACPI built by the device model Haozhong Zhang
2017-01-27 21:40   ` Konrad Rzeszutek Wilk
2017-02-08  5:38     ` Haozhong Zhang
2017-02-08 14:35       ` Konrad Rzeszutek Wilk
2016-10-10  0:32 ` [RFC XEN PATCH 12/16] tools/libxl: build qemu options from xl vNVDIMM configs Haozhong Zhang
2017-01-27 21:47   ` Konrad Rzeszutek Wilk
2017-02-08  5:42     ` Haozhong Zhang
2017-01-27 21:48   ` Konrad Rzeszutek Wilk
2017-02-08  5:47     ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 13/16] tools/libxl: add support to map host pmem device to guests Haozhong Zhang
2017-01-27 22:06   ` Konrad Rzeszutek Wilk
2017-01-27 22:09     ` Konrad Rzeszutek Wilk
2017-02-08  5:59     ` Haozhong Zhang
2017-02-08 14:37       ` Konrad Rzeszutek Wilk
2016-10-10  0:32 ` [RFC XEN PATCH 14/16] tools/libxl: add support to map files on pmem devices " Haozhong Zhang
2017-01-27 22:10   ` Konrad Rzeszutek Wilk
2017-02-08  6:03     ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 15/16] tools/libxl: handle return code of libxl__qmp_initializations() Haozhong Zhang
2017-01-27 22:11   ` Konrad Rzeszutek Wilk
2017-02-08  6:07     ` Haozhong Zhang
2017-02-08 10:31       ` Wei Liu
2017-02-09  2:47         ` Haozhong Zhang
2017-02-09 10:13           ` Wei Liu
2017-02-09 10:16             ` Wei Liu
2017-02-10  2:37             ` Haozhong Zhang
2017-02-10  8:11               ` Wei Liu
2017-02-10  8:23                 ` Wei Liu
2017-02-10  8:24                 ` Haozhong Zhang
2016-10-10  0:32 ` [RFC XEN PATCH 16/16] tools/libxl: initiate pmem mapping via qmp callback Haozhong Zhang
2017-01-27 22:13   ` Konrad Rzeszutek Wilk
2017-02-08  6:08     ` Haozhong Zhang
2016-10-24 16:37 ` [RFC XEN PATCH 00/16] Add vNVDIMM support to HVM domains Wei Liu
2016-10-25  6:55   ` Haozhong Zhang
2016-10-25 11:28     ` Wei Liu
