* [PATCH v11 0/3] Refactor ioreq server for better performance.
From: Yu Zhang @ 2016-01-22  3:20 UTC
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, Paul.Durrant, zhiyuan.lv, jbeulich,
	wei.liu2

XenGT leverages the ioreq server to track and forward accesses to
GPU I/O resources, e.g. the PPGTTs (per-process graphic translation
tables). Currently, the ioreq server uses a rangeset to track the BDF/
PIO/MMIO ranges to be emulated. To select an ioreq server, the
rangeset is searched to see if the I/O range is recorded. However,
traversing the linked list inside the rangeset can be time consuming
when the number of ranges is high. On the HSW platform, the number of
PPGTTs per vGPU can be several hundred; on BDW, several thousand.
This patch series refactors rangeset to base it on a red-black tree,
so that a lookup takes O(log(n)) instead of O(n) time; with 8192
ranges, that is roughly 13 comparisons instead of an average of
about 4096 list traversal steps.

Besides, this patchset also splits the tracking of MMIO and guest
ram ranges into different rangesets. And to accommodate more ranges,
a new parameter, max_wp_ram_ranges, is introduced in the hvm
configuration file.

Changes in v11: 
1> Rename the new parameter to "max_wp_ram_ranges", and use it
specifically for write-protected ram ranges.
2> Clean up the documentation.
3> Define LIBXL_HAVE_BUILDINFO_HVM_MAX_WP_RAM_RANGES in libxl.h.

Changes in v10: 
1> Add a new patch to configure the range limit inside ioreq server.
2> Commit message changes. 
3> The previous patch "[1/3] Remove identical relationship between
   ioreq type and rangeset type." has already been merged, and is not
   included in this series now.

Changes in v9: 
1> Change the order of patch 2 and patch 3.
2> Introduce a static const array before hvm_ioreq_server_alloc_rangesets().
3> Coding style changes.

Changes in v8: 
Use a clearer API name to map/unmap the write-protected memory in
ioreq server.

Changes in v7: 
1> Coding style changes;
2> Fix a typo in hvm_select_ioreq_server().

Changes in v6: 
Break the identical relationship between ioreq type and rangeset
index inside ioreq server.

Changes in v5:
1> Use gpfn, instead of gpa, to track guest write-protected pages;
2> Remove redundant conditional statement in routine find_range().

Changes in v4:
Keep the name HVMOP_IO_RANGE_MEMORY for MMIO resources, and add
a new one, HVMOP_IO_RANGE_WP_MEM, for write-protected memory.

Changes in v3:
1> Use a separate rangeset for guest ram pages in the ioreq server;
2> Refactor rangeset, instead of introducing a new data structure.

Changes in v2:
1> Split the original patch into 2;
2> Address Paul Durrant's comments:
  a> Add a name member in the struct rb_rangeset, and use the 'q'
debug key to dump the ranges in ioreq server;
  b> Keep original routine names for hvm ioreq server;
  c> Commit message changes: mention that a future patch will change
the maximum number of ranges inside the ioreq server.

Yu Zhang (3):
  Refactor rangeset structure for better performance.
  Differentiate IO/mem resources tracked by ioreq server
  tools: introduce parameter max_wp_ram_ranges.

 docs/man/xl.cfg.pod.5            | 18 +++++++++
 tools/libxc/include/xenctrl.h    | 31 +++++++++++++++
 tools/libxc/xc_domain.c          | 61 ++++++++++++++++++++++++++++++
 tools/libxl/libxl.h              |  5 +++
 tools/libxl/libxl_dom.c          |  3 ++
 tools/libxl/libxl_types.idl      |  1 +
 tools/libxl/xl_cmdimpl.c         |  4 ++
 xen/arch/x86/hvm/hvm.c           | 37 +++++++++++++++---
 xen/common/rangeset.c            | 82 +++++++++++++++++++++++++++++-----------
 xen/include/asm-x86/hvm/domain.h |  2 +-
 xen/include/public/hvm/hvm_op.h  |  1 +
 xen/include/public/hvm/params.h  |  5 ++-
 12 files changed, 221 insertions(+), 29 deletions(-)

-- 
1.9.1


* [PATCH v11 1/3] Refactor rangeset structure for better performance.
From: Yu Zhang @ 2016-01-22  3:20 UTC
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, Paul.Durrant, zhiyuan.lv, jbeulich,
	wei.liu2

This patch refactors struct rangeset to base it on a red-black
tree structure, instead of on the current doubly linked list.
Currently, the ioreq server leverages rangeset to keep track of the
IO/memory resources to be emulated, and when the number of ranges
inside one ioreq server is very high, traversing a doubly linked
list can be time consuming. With this patch, the time complexity
for searching a rangeset improves from O(n) to O(log(n)). The
interfaces of rangeset remain the same, and no new APIs are
introduced.

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 xen/common/rangeset.c | 82 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 6c6293c..d15d8d5 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -10,11 +10,12 @@
 #include <xen/sched.h>
 #include <xen/errno.h>
 #include <xen/rangeset.h>
+#include <xen/rbtree.h>
 #include <xsm/xsm.h>
 
 /* An inclusive range [s,e] and pointer to next range in ascending order. */
 struct range {
-    struct list_head list;
+    struct rb_node node;
     unsigned long s, e;
 };
 
@@ -24,7 +25,7 @@ struct rangeset {
     struct domain   *domain;
 
     /* Ordered list of ranges contained in this set, and protecting lock. */
-    struct list_head range_list;
+    struct rb_root   range_tree;
 
     /* Number of ranges that can be allocated */
     long             nr_ranges;
@@ -45,41 +46,78 @@ struct rangeset {
 static struct range *find_range(
     struct rangeset *r, unsigned long s)
 {
-    struct range *x = NULL, *y;
+    struct rb_node *node;
+    struct range   *x;
+    struct range   *prev = NULL;
 
-    list_for_each_entry ( y, &r->range_list, list )
+    node = r->range_tree.rb_node;
+    while ( node != NULL )
     {
-        if ( y->s > s )
-            break;
-        x = y;
+        x = container_of(node, struct range, node);
+        if ( (s >= x->s) && (s <= x->e) )
+            return x;
+        if ( s < x->s )
+            node = node->rb_left;
+        else
+        {
+            prev = x;
+            node = node->rb_right;
+        }
     }
 
-    return x;
+    return prev;
 }
 
 /* Return the lowest range in the set r, or NULL if r is empty. */
 static struct range *first_range(
     struct rangeset *r)
 {
-    if ( list_empty(&r->range_list) )
-        return NULL;
-    return list_entry(r->range_list.next, struct range, list);
+    struct rb_node *node;
+
+    node = rb_first(&r->range_tree);
+    if ( node != NULL )
+        return container_of(node, struct range, node);
+
+    return NULL;
 }
 
 /* Return range following x in ascending order, or NULL if x is the highest. */
 static struct range *next_range(
     struct rangeset *r, struct range *x)
 {
-    if ( x->list.next == &r->range_list )
-        return NULL;
-    return list_entry(x->list.next, struct range, list);
+    struct rb_node *node;
+
+    node = rb_next(&x->node);
+    if ( node != NULL )
+        return container_of(node, struct range, node);
+
+    return NULL;
 }
 
 /* Insert range y after range x in r. Insert as first range if x is NULL. */
 static void insert_range(
     struct rangeset *r, struct range *x, struct range *y)
 {
-    list_add(&y->list, (x != NULL) ? &x->list : &r->range_list);
+    struct rb_node **node;
+    struct rb_node *parent = NULL;
+
+    if ( x == NULL )
+        node = &r->range_tree.rb_node;
+    else
+    {
+        node = &x->node.rb_right;
+        parent = &x->node;
+    }
+
+    while ( *node != NULL )
+    {
+        parent = *node;
+        node = &parent->rb_left;
+    }
+
+    /* Add new node and rebalance the red-black tree. */
+    rb_link_node(&y->node, parent, node);
+    rb_insert_color(&y->node, &r->range_tree);
 }
 
 /* Remove a range from its list and free it. */
@@ -88,7 +126,7 @@ static void destroy_range(
 {
     r->nr_ranges++;
 
-    list_del(&x->list);
+    rb_erase(&x->node, &r->range_tree);
     xfree(x);
 }
 
@@ -319,7 +357,7 @@ bool_t rangeset_contains_singleton(
 bool_t rangeset_is_empty(
     const struct rangeset *r)
 {
-    return ((r == NULL) || list_empty(&r->range_list));
+    return ((r == NULL) || RB_EMPTY_ROOT(&r->range_tree));
 }
 
 struct rangeset *rangeset_new(
@@ -332,7 +370,7 @@ struct rangeset *rangeset_new(
         return NULL;
 
     rwlock_init(&r->lock);
-    INIT_LIST_HEAD(&r->range_list);
+    r->range_tree = RB_ROOT;
     r->nr_ranges = -1;
 
     BUG_ON(flags & ~RANGESETF_prettyprint_hex);
@@ -410,7 +448,7 @@ void rangeset_domain_destroy(
 
 void rangeset_swap(struct rangeset *a, struct rangeset *b)
 {
-    LIST_HEAD(tmp);
+    struct rb_node *tmp;
 
     if ( a < b )
     {
@@ -423,9 +461,9 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
         write_lock(&a->lock);
     }
 
-    list_splice_init(&a->range_list, &tmp);
-    list_splice_init(&b->range_list, &a->range_list);
-    list_splice(&tmp, &b->range_list);
+    tmp = a->range_tree.rb_node;
+    a->range_tree.rb_node = b->range_tree.rb_node;
+    b->range_tree.rb_node = tmp;
 
     write_unlock(&a->lock);
     write_unlock(&b->lock);
-- 
1.9.1
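
The rangeset interface is deliberately left unchanged by this patch,
so existing callers need no modification. As a minimal sketch of how
that interface is typically used (illustrative only, not part of the
patch; it assumes the usual Xen headers, and uses only calls visible
elsewhere in this series, i.e. rangeset_new(), rangeset_add_range(),
rangeset_contains_singleton() and rangeset_destroy()):

    /* Sketch: typical rangeset usage by a caller. After this patch
     * the same calls are backed by a red-black tree, so the
     * contains-check below costs O(log n) rather than O(n). */
    #include <xen/sched.h>
    #include <xen/errno.h>
    #include <xen/rangeset.h>

    static int example_track_and_query(struct domain *d)
    {
        struct rangeset *r = rangeset_new(d, "example", 0);
        int rc;

        if ( r == NULL )
            return -ENOMEM;

        /* Track the inclusive range [0x100, 0x1ff]. */
        rc = rangeset_add_range(r, 0x100, 0x1ff);
        if ( rc )
        {
            rangeset_destroy(r);
            return rc;
        }

        /* Point lookup, as done when selecting an ioreq server. */
        if ( rangeset_contains_singleton(r, 0x150) )
        {
            /* ... dispatch to the handler owning this range ... */
        }

        rangeset_destroy(r);
        return 0;
    }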


* [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server
From: Yu Zhang @ 2016-01-22  3:20 UTC
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, Paul.Durrant, zhiyuan.lv, jbeulich,
	wei.liu2

Currently in the ioreq server, guest write-protected ram pages are
tracked in the same rangeset as device mmio resources. Yet unlike
device mmio, which can come in large chunks, the guest write-
protected pages may be discrete ranges of 4K bytes each. This
patch uses a separate rangeset for the guest ram pages.

To differentiate the ioreq type between write-protected memory
ranges and mmio ranges when selecting an ioreq server, the p2m
type is retrieved by calling get_page_from_gfn(). And we do not
need to worry about the p2m type changing during the ioreq
selection process.

Note: previously, a new hypercall or subop was suggested to map
write-protected pages into the ioreq server. However, it turned
out the handler of this new hypercall would be almost the same
as the existing pair HVMOP_[un]map_io_range_to_ioreq_server, and
there is already a type parameter in this hypercall. So no new
hypercall is defined; only a new type is introduced.

Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 tools/libxc/include/xenctrl.h    | 31 ++++++++++++++++++++
 tools/libxc/xc_domain.c          | 61 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c           | 27 +++++++++++++++---
 xen/include/asm-x86/hvm/domain.h |  2 +-
 xen/include/public/hvm/hvm_op.h  |  1 +
 5 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 079cad0..036c72d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2023,6 +2023,37 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
                                             int is_mmio,
                                             uint64_t start,
                                             uint64_t end);
+/**
+ * This function registers a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            xen_pfn_t start,
+                                            xen_pfn_t end);
+
+/**
+ * This function deregisters a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_wp_mem_range_from_ioreq_server(xc_interface *xch,
+                                                domid_t domid,
+                                                ioservid_t id,
+                                                xen_pfn_t start,
+                                                xen_pfn_t end);
 
 /**
  * This function registers a PCI device for config space emulation.
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 99e0d48..4f43695 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1544,6 +1544,67 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
     return rc;
 }
 
+int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            xen_pfn_t start,
+                                            xen_pfn_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_WP_MEM;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_wp_mem_range_from_ioreq_server(xc_interface *xch,
+                                                domid_t domid,
+                                                ioservid_t id,
+                                                xen_pfn_t start,
+                                                xen_pfn_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_WP_MEM;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+
+}
+
 int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
                                       ioservid_t id, uint16_t segment,
                                       uint8_t bus, uint8_t device,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 674feea..53d38e7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -932,6 +932,9 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
         rangeset_destroy(s->range[i]);
 }
 
+const char *io_range_name[NR_IO_RANGE_TYPES] =
+                                {"port", "mmio", "pci", "wp-mem"};
+
 static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
                                             bool_t is_default)
 {
@@ -946,10 +949,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         char *name;
 
         rc = asprintf(&name, "ioreq_server %d %s", s->id,
-                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
-                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
-                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
-                      "");
+                      (i < NR_IO_RANGE_TYPES) ? io_range_name[i] : "");
         if ( rc )
             goto fail;
 
@@ -1267,6 +1267,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
             case HVMOP_IO_RANGE_PORT:
             case HVMOP_IO_RANGE_MEMORY:
             case HVMOP_IO_RANGE_PCI:
+            case HVMOP_IO_RANGE_WP_MEM:
                 r = s->range[type];
                 break;
 
@@ -1318,6 +1319,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
             case HVMOP_IO_RANGE_PORT:
             case HVMOP_IO_RANGE_MEMORY:
             case HVMOP_IO_RANGE_PCI:
+            case HVMOP_IO_RANGE_WP_MEM:
                 r = s->range[type];
                 break;
 
@@ -2558,6 +2560,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     uint32_t cf8;
     uint8_t type;
     uint64_t addr;
+    p2m_type_t p2mt;
+    struct page_info *ram_page;
 
     if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
         return NULL;
@@ -2601,6 +2605,16 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         type = (p->type == IOREQ_TYPE_PIO) ?
                 HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
         addr = p->addr;
+        if ( type == HVMOP_IO_RANGE_MEMORY )
+        {
+             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
+                                          &p2mt, P2M_UNSHARE);
+             if ( p2mt == p2m_mmio_write_dm )
+                 type = HVMOP_IO_RANGE_WP_MEM;
+
+             if ( ram_page )
+                 put_page(ram_page);
+        }
     }
 
     list_for_each_entry ( s,
@@ -2642,6 +2656,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
             }
 
             break;
+        case HVMOP_IO_RANGE_WP_MEM:
+            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
+                return s;
+
+            break;
         }
     }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index a8cc2ad..1e13973 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -48,7 +48,7 @@ struct hvm_ioreq_vcpu {
     bool_t           pending;
 };
 
-#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
+#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_WP_MEM + 1)
 #define MAX_NR_IO_RANGES  256
 
 struct hvm_ioreq_server {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 1606185..c0b1e30 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -333,6 +333,7 @@ struct xen_hvm_io_range {
 # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
 # define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
 # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+# define HVMOP_IO_RANGE_WP_MEM 3 /* Write-protected ram range */
     uint64_aligned_t start, end; /* IN - inclusive start and end of range */
 };
 typedef struct xen_hvm_io_range xen_hvm_io_range_t;
-- 
1.9.1
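
As a usage sketch (not part of the patch), a device model could
register and later release a single write-protected guest page with
the new libxc wrappers roughly as below. The xc_interface handle,
domain id and ioreq server id are assumed to have been obtained
beforehand, and the helper names here are made up for illustration:

    #include <xenctrl.h>

    /* Hypothetical helper: hand emulation of one write-protected gfn
     * to the given ioreq server. start == end registers a single-page
     * inclusive range. */
    static int track_wp_gfn(xc_interface *xch, domid_t domid,
                            ioservid_t id, xen_pfn_t gfn)
    {
        return xc_hvm_map_wp_mem_range_to_ioreq_server(xch, domid, id,
                                                       gfn, gfn);
    }

    /* Hypothetical helper: stop tracking that gfn again. */
    static int untrack_wp_gfn(xc_interface *xch, domid_t domid,
                              ioservid_t id, xen_pfn_t gfn)
    {
        return xc_hvm_unmap_wp_mem_range_from_ioreq_server(xch, domid,
                                                           id, gfn,
                                                           gfn);
    }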


* [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: Yu Zhang @ 2016-01-22  3:20 UTC
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, Paul.Durrant, zhiyuan.lv, jbeulich,
	wei.liu2

A new parameter, max_wp_ram_ranges, is added to set the upper limit
on the number of write-protected ram ranges to be tracked inside one
ioreq server rangeset.

The ioreq server uses a group of rangesets to track the I/O or memory
resources to be emulated. The default limit on the number of ranges
that one rangeset can allocate is set to a small value, because these
ranges are allocated from the xen heap. Yet for write-protected ram
ranges, there are circumstances under which the upper limit inside
one rangeset should exceed the default one. E.g. in XenGT, when
tracking the per-process graphic translation tables on Intel Broadwell
platforms, the number of page tables concerned will be several
thousand (normally in this case, 8192 could be a big enough value).
Users who set this item explicitly are supposed to know the specific
scenarios that necessitate this configuration.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 docs/man/xl.cfg.pod.5           | 18 ++++++++++++++++++
 tools/libxl/libxl.h             |  5 +++++
 tools/libxl/libxl_dom.c         |  3 +++
 tools/libxl/libxl_types.idl     |  1 +
 tools/libxl/xl_cmdimpl.c        |  4 ++++
 xen/arch/x86/hvm/hvm.c          | 10 +++++++++-
 xen/include/public/hvm/params.h |  5 ++++-
 7 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8899f75..7634c42 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -962,6 +962,24 @@ FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<max_wp_ram_ranges=N>
+
+Limit the maximum write-protected ram ranges that can be tracked
+inside one ioreq server rangeset.
+
+Ioreq server uses a group of rangesets to track the I/O or memory
+resources to be emulated. Default limit of ranges that one rangeset
+can allocate is set to a small value, due to the fact that these ranges
+are allocated in xen heap. Yet for the write-protected ram ranges,
+there are circumstances under which the upper limit inside one rangeset
+should exceed the default one. E.g. in XenGT, when tracking the per-
+process graphic translation tables on Intel Broadwell platforms, the
+number of page tables concerned will be several thousand (normally
+in this case, 8192 could be a big enough value). Not configuring this
+item, or setting its value to 0, will result in the upper limit set
+to its default one. Users who set this item explicitly are supposed
+to know the specific scenarios that necessitate this configuration.
+
 =back
 
 =head2 Paravirtualised (PV) Guest Specific Options
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 156c0d5..6698d72 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -136,6 +136,11 @@
 #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
 
 /*
+ * libxl_domain_build_info has the u.hvm.max_wp_ram_ranges field.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_MAX_WP_RAM_RANGES 1
+
+/*
  * libxl_domain_build_info has the u.hvm.ms_vm_genid field.
  */
 #define LIBXL_HAVE_BUILDINFO_HVM_MS_VM_GENID 1
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 2269998..54173cb 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -288,6 +288,9 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
                     libxl_defbool_val(info->u.hvm.nested_hvm));
     xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2M,
                     libxl_defbool_val(info->u.hvm.altp2m));
+    if (info->u.hvm.max_wp_ram_ranges > 0)
+        xc_hvm_param_set(handle, domid, HVM_PARAM_MAX_WP_RAM_RANGES,
+                        info->u.hvm.max_wp_ram_ranges);
 }
 
 int libxl__build_pre(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9ad7eba..c7d7b5f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -518,6 +518,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
+                                       ("max_wp_ram_ranges", uint32),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25507c7..8bb7cc7 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1626,6 +1626,10 @@ static void parse_config_data(const char *config_source,
 
         if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
             b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
+
+        if (!xlu_cfg_get_long (config, "max_wp_ram_ranges", &l, 0))
+            b_info->u.hvm.max_wp_ram_ranges = l;
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 53d38e7..fd2b697 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 {
     unsigned int i;
     int rc;
+    unsigned int max_wp_ram_ranges =
+        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
+        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
+        MAX_NR_IO_RANGES;
 
     if ( is_default )
         goto done;
@@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         if ( !s->range[i] )
             goto fail;
 
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+        if ( i == HVMOP_IO_RANGE_WP_MEM )
+            rangeset_limit(s->range[i], max_wp_ram_ranges);
+        else
+            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
     }
 
  done:
@@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_IOREQ_SERVER_PFN:
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
+    case HVM_PARAM_MAX_WP_RAM_RANGES:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 81f9451..ab3b11d 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -210,6 +210,9 @@
 /* Boolean: Enable altp2m */
 #define HVM_PARAM_ALTP2M       35
 
-#define HVM_NR_PARAMS          36
+/* Max write-protected ram ranges to be tracked in one ioreq server rangeset */
+#define HVM_PARAM_MAX_WP_RAM_RANGES  36
+
+#define HVM_NR_PARAMS          37
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.9.1
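
For reference, a guest configuration using the new option could
contain a fragment like the one below; this is a sketch of the
intended usage (8192 being the Broadwell-sized value suggested in
the commit message), not an excerpt from a tested configuration:

    builder = "hvm"
    # Allow up to 8192 write-protected ram ranges per ioreq server
    # rangeset; omitting the option, or setting it to 0, keeps the
    # default limit.
    max_wp_ram_ranges = 8192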


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: Jan Beulich @ 2016-01-22  8:01 UTC
  To: Yu Zhang
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> hvm_ioreq_server *s,
>  {
>      unsigned int i;
>      int rc;
> +    unsigned int max_wp_ram_ranges =
> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
> +        MAX_NR_IO_RANGES;

Besides this having stray blanks inside the parentheses it truncates
the value from 64 to 32 bits and would benefit from using the gcc
extension of omitting the middle operand of ?:. But even better
would imo be if you avoided the local variable and ...

> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>          if ( !s->range[i] )
>              goto fail;
>  
> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
> +        else
> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);

... did the entire computation here, using ?: for the second argument
of the function invocation.

> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>      case HVM_PARAM_IOREQ_SERVER_PFN:
>      case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>      case HVM_PARAM_ALTP2M:
> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>          if ( value != 0 && a->value != value )
>              rc = -EEXIST;
>          break;

Is there a particular reason you want this limit to be unchangeable
after having got set once?

Jan


* Re: [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server
From: Jan Beulich @ 2016-01-22 11:43 UTC
  To: Yu Zhang
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
> @@ -2601,6 +2605,16 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>          type = (p->type == IOREQ_TYPE_PIO) ?
>                  HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
>          addr = p->addr;
> +        if ( type == HVMOP_IO_RANGE_MEMORY )
> +        {
> +             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
> +                                          &p2mt, P2M_UNSHARE);

It seems to me like I had asked before: Why P2M_UNSHARE instead
of just P2M_QUERY? (This could surely be fixed up while committing,
the more that I've already done some cleanup here, but I'd like to
understand this before it goes in.)

> +             if ( p2mt == p2m_mmio_write_dm )
> +                 type = HVMOP_IO_RANGE_WP_MEM;
> +
> +             if ( ram_page )
> +                 put_page(ram_page);
> +        }
>      }
>  
>      list_for_each_entry ( s,
> @@ -2642,6 +2656,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>              }
>  
>              break;
> +        case HVMOP_IO_RANGE_WP_MEM:
> +            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
> +                return s;

Considering you've got p2m_mmio_write_dm above - can this
validly return false here?

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: Yu, Zhang @ 2016-01-26  7:32 UTC
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

Thank you, Jan.

On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>> hvm_ioreq_server *s,
>>   {
>>       unsigned int i;
>>       int rc;
>> +    unsigned int max_wp_ram_ranges =
>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>> +        MAX_NR_IO_RANGES;
>
> Besides this having stray blanks inside the parentheses it truncates
> the value from 64 to 32 bits and would benefit from using the gcc
> extension of omitting the middle operand of ?:. But even better
> would imo be if you avoided the local variable and ...
>
On second thought, how about we define a default value for this
parameter in libxl.h, and initialize the parameter with that default
when creating the domain, if it is not configured?
About this local variable, we keep it, and ...

>> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>>           if ( !s->range[i] )
>>               goto fail;
>>
>> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
>> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
>> +        else
>> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>
> ... did the entire computation here, using ?: for the second argument
> of the function invocation.
>
... replace the if/else pair with sth. like:
         rangeset_limit(s->range[i],
                        ((i == HVMOP_IO_RANGE_WP_MEM)?
                         max_wp_ram_ranges:
                         MAX_NR_IO_RANGES));
This 'max_wp_ram_ranges' has no other particular use, but the string
"s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES]"
is too lengthy, and can easily break the 80 column limit. :)
Does this approach sound OK? :)

>> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>>       case HVM_PARAM_IOREQ_SERVER_PFN:
>>       case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>>       case HVM_PARAM_ALTP2M:
>> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>>           if ( value != 0 && a->value != value )
>>               rc = -EEXIST;
>>           break;
>
> Is there a particular reason you want this limit to be unchangeable
> after having got set once?
>
Well, not exactly. :)
I added this limit because so far we have no way to change the
maximum number of ranges inside the ioreq server at run time.
I can add another patch to introduce an xl command, which can change
it dynamically. But I doubt the necessity of this new command and
also wonder if this new command would cause more confusion for
the user...
> Jan
>
>
B.R.
Yu


* Re: [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server
From: Yu, Zhang @ 2016-01-26  7:59 UTC
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/22/2016 7:43 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>> @@ -2601,6 +2605,16 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>           type = (p->type == IOREQ_TYPE_PIO) ?
>>                   HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
>>           addr = p->addr;
>> +        if ( type == HVMOP_IO_RANGE_MEMORY )
>> +        {
>> +             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
>> +                                          &p2mt, P2M_UNSHARE);
>
> It seems to me like I had asked before: Why P2M_UNSHARE instead
> of just P2M_QUERY? (This could surely be fixed up while committing,
> the more that I've already done some cleanup here, but I'd like to
> understand this before it goes in.)
>
Hah, sorry for my bad memory. :)
I did not find P2M_QUERY; only P2M_UNSHARE and P2M_ALLOC are
defined. But after reading the code in ept_get_entry(), I guess
P2M_UNSHARE is not accurate; maybe I should use 0 here for the
p2m_query_t parameter in get_page_from_gfn()?

>> +             if ( p2mt == p2m_mmio_write_dm )
>> +                 type = HVMOP_IO_RANGE_WP_MEM;
>> +
>> +             if ( ram_page )
>> +                 put_page(ram_page);
>> +        }
>>       }
>>
>>       list_for_each_entry ( s,
>> @@ -2642,6 +2656,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>               }
>>
>>               break;
>> +        case HVMOP_IO_RANGE_WP_MEM:
>> +            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
>> +                return s;
>
> Considering you've got p2m_mmio_write_dm above - can this
> validly return false here?

Well, if we have multiple ioreq servers defined, it will...
Currently, this p2m type is only used in XenGT, which has only one
ioreq server for the vGPU other than qemu. But if more devices use
this type and more ioreq servers are introduced for them, it can
return false.
>
> Jan
>
>
B.R.
Yu


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: Jan Beulich @ 2016-01-26 11:00 UTC
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>> hvm_ioreq_server *s,
>>>   {
>>>       unsigned int i;
>>>       int rc;
>>> +    unsigned int max_wp_ram_ranges =
>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>> +        MAX_NR_IO_RANGES;
>>
>> Besides this having stray blanks inside the parentheses it truncates
>> the value from 64 to 32 bits and would benefit from using the gcc
>> extension of omitting the middle operand of ?:. But even better
>> would imo be if you avoided the local variable and ...
>>
> On second thought, how about we define a default value for this
> parameter in libxl.h, and initialize the parameter with that default
> when creating the domain, if it is not configured?

No, I don't think the tool stack should be determining the default
here (unless you want the default to be zero, and have zero
indeed mean zero).

> About this local variable, we keep it, and ...
> 
>>> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> hvm_ioreq_server *s,
>>>           if ( !s->range[i] )
>>>               goto fail;
>>>
>>> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
>>> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
>>> +        else
>>> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>
>> ... did the entire computation here, using ?: for the second argument
>> of the function invocation.
>>
> ... replace the if/else pair with sth. like:
>          rangeset_limit(s->range[i],
>                         ((i == HVMOP_IO_RANGE_WP_MEM)?
>                          max_wp_ram_ranges:
>                          MAX_NR_IO_RANGES));
> This 'max_wp_ram_ranges' has no other particular use, but the string
> "s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES]"
> is too lengthy, and can easily break the 80 column limit. :)
> Does this approach sound OK? :)

Seems better than the original, so okay.

>>> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>>>       case HVM_PARAM_IOREQ_SERVER_PFN:
>>>       case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>>>       case HVM_PARAM_ALTP2M:
>>> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>>>           if ( value != 0 && a->value != value )
>>>               rc = -EEXIST;
>>>           break;
>>
>> Is there a particular reason you want this limit to be unchangeable
>> after having got set once?
>>
> Well, not exactly. :)
> I added this limit because so far we have no way to change the
> maximum number of ranges inside the ioreq server at run time.
> I can add another patch to introduce an xl command, which can change
> it dynamically. But I doubt the necessity of this new command and
> also wonder if this new command would cause more confusion for
> the user...

And I didn't say you need to expose this to the user. All I asked
was whether you really mean the value to be a set-once one. If
yes, the code above is fine. If no, the code above should be
changed, but there's then still no need to expose a way to
"manually" adjust the value until a need for such arises.

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: David Vrabel @ 2016-01-26 11:16 UTC
  To: Yu Zhang, xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, Paul.Durrant, zhiyuan.lv, jbeulich,
	wei.liu2

On 22/01/16 03:20, Yu Zhang wrote:
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -962,6 +962,24 @@ FIFO-based event channel ABI support up to 131,071 event channels.
>  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>  x86).
>  
> +=item B<max_wp_ram_ranges=N>
> +
> +Limit the maximum write-protected ram ranges that can be tracked
> +inside one ioreq server rangeset.
> +
> +Ioreq server uses a group of rangesets to track the I/O or memory
> +resources to be emulated. Default limit of ranges that one rangeset
> +can allocate is set to a small value, due to the fact that these ranges
> +are allocated in xen heap. Yet for the write-protected ram ranges,
> +there are circumstances under which the upper limit inside one rangeset
> +should exceed the default one. E.g. in XenGT, when tracking the per-
> +process graphic translation tables on Intel Broadwell platforms, the
> +number of page tables concerned will be several thousand (normally
> +in this case, 8192 could be a big enough value). Not configuring this
> +item, or setting its value to 0, will result in the upper limit set
> +to its default one. Users who set this item explicitly are supposed
> +to know the specific scenarios that necessitate this configuration.

This help text isn't very helpful.  How is a user supposed to "know the
specific scenarios" that need this option?

Why doesn't the toolstack (or qemu) automatically set this value based
on whether GVT-g/GVT-d is being used? Then there is no need to even
present this option to the user.

David


* Re: [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server
From: Jan Beulich @ 2016-01-26 11:24 UTC
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 26.01.16 at 08:59, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 1/22/2016 7:43 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>> @@ -2601,6 +2605,16 @@ struct hvm_ioreq_server 
> *hvm_select_ioreq_server(struct domain *d,
>>>           type = (p->type == IOREQ_TYPE_PIO) ?
>>>                   HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
>>>           addr = p->addr;
>>> +        if ( type == HVMOP_IO_RANGE_MEMORY )
>>> +        {
>>> +             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
>>> +                                          &p2mt, P2M_UNSHARE);
>>
>> It seems to me like I had asked before: Why P2M_UNSHARE instead
>> of just P2M_QUERY? (This could surely be fixed up while committing,
>> the more that I've already done some cleanup here, but I'd like to
>> understand this before it goes in.)
>>
> Hah, sorry for my bad memory. :)
> I did not find P2M_QUERY; only P2M_UNSHARE and P2M_ALLOC are
> defined. But after reading the code in ept_get_entry(), I guess
> P2M_UNSHARE is not accurate; maybe I should use 0 here for the
> p2m_query_t parameter in get_page_from_gfn()?

Ah, sorry for the misnamed suggestion. I'm not sure whether using
zero here actually matches your needs; P2M_UNSHARE though
seems odd in any case, so at least switching to P2M_ALLOC (to
populate PoD pages) would seem to be necessary.

>>> @@ -2642,6 +2656,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>>               }
>>>
>>>               break;
>>> +        case HVMOP_IO_RANGE_WP_MEM:
>>> +            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
>>> +                return s;
>>
>> Considering you've got p2m_mmio_write_dm above - can this
>> validly return false here?
> 
> Well, if we have multiple ioreq servers defined, it will...

Ah, right. That's fine then.

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: Yu, Zhang @ 2016-01-27  7:01 UTC
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/26/2016 7:00 PM, Jan Beulich wrote:
>>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
>> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>>> hvm_ioreq_server *s,
>>>>    {
>>>>        unsigned int i;
>>>>        int rc;
>>>> +    unsigned int max_wp_ram_ranges =
>>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>>> +        MAX_NR_IO_RANGES;
>>>
>>> Besides this having stray blanks inside the parentheses it truncates
>>> the value from 64 to 32 bits and would benefit from using the gcc
>>> extension of omitting the middle operand of ?:. But even better
>>> would imo be if you avoided the local variable and ...
>>>
>> On second thought, how about we define a default value for this
>> parameter in libxl.h, and initialize the parameter with that default
>> when creating the domain, if it is not configured?
>
> No, I don't think the tool stack should be determining the default
> here (unless you want the default to be zero, and have zero
> indeed mean zero).
>
Thank you, Jan.
If we do not provide a default value in the tool stack, the code above
should be kept, to initialize the local variable with either the one
set in the configuration file, or with MAX_NR_IO_RANGES. Is this OK?

>> About this local variable, we keep it, and ...
>>
>>>> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>> hvm_ioreq_server *s,
>>>>            if ( !s->range[i] )
>>>>                goto fail;
>>>>
>>>> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>>> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
>>>> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
>>>> +        else
>>>> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>>
>>> ... did the entire computation here, using ?: for the second argument
>>> of the function invocation.
>>>
>> ... replace the if/else pair with sth. like:
>>           rangeset_limit(s->range[i],
>>                          ((i == HVMOP_IO_RANGE_WP_MEM)?
>>                           max_wp_ram_ranges:
>>                           MAX_NR_IO_RANGES));
>> This 'max_wp_ram_ranges' has no other particular use, but the string
>> "s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES]"
>> is too lengthy, and can easily break the 80 column limit. :)
>> Does this approach sound OK? :)
>
> Seems better than the original, so okay.
>
>>>> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>>>>        case HVM_PARAM_IOREQ_SERVER_PFN:
>>>>        case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>>>>        case HVM_PARAM_ALTP2M:
>>>> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>>>>            if ( value != 0 && a->value != value )
>>>>                rc = -EEXIST;
>>>>            break;
>>>
>>> Is there a particular reason you want this limit to be unchangeable
>>> after having got set once?
>>>
>> Well, not exactly. :)
>> I added this limit because so far we have no way to change the
>> maximum number of ranges inside the ioreq server at run time.
>> I can add another patch to introduce an xl command, which can change
>> it dynamically. But I doubt the necessity of this new command and
>> also wonder if this new command would cause more confusion for
>> the user...
>
> And I didn't say you need to expose this to the user. All I asked
> was whether you really mean the value to be a set-once one. If
> yes, the code above is fine. If no, the code above should be
> changed, but there's then still no need to expose a way to
> "manually" adjust the value until a need for such arises.
>

I see. The constraint is not necessary. And I'll remove this code. :)

> Jan
>
>

B.R.
Yu


* Re: [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server
From: Yu, Zhang @ 2016-01-27  7:02 UTC
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/26/2016 7:24 PM, Jan Beulich wrote:
>>>> On 26.01.16 at 08:59, <yu.c.zhang@linux.intel.com> wrote:
>
>>
>> On 1/22/2016 7:43 PM, Jan Beulich wrote:
>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>> @@ -2601,6 +2605,16 @@ struct hvm_ioreq_server
>> *hvm_select_ioreq_server(struct domain *d,
>>>>            type = (p->type == IOREQ_TYPE_PIO) ?
>>>>                    HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
>>>>            addr = p->addr;
>>>> +        if ( type == HVMOP_IO_RANGE_MEMORY )
>>>> +        {
>>>> +             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
>>>> +                                          &p2mt, P2M_UNSHARE);
>>>
>>> It seems to me like I had asked before: Why P2M_UNSHARE instead
>>> of just P2M_QUERY? (This could surely be fixed up while committing,
>>> the more that I've already done some cleanup here, but I'd like to
>>> understand this before it goes in.)
>>>
>> Hah, sorry for my bad memory. :)
>> I did not find P2M_QUERY; only P2M_UNSHARE and P2M_ALLOC are
>> defined. But after reading the code in ept_get_entry(), I guess
>> P2M_UNSHARE is not accurate; maybe I should use 0 here for the
>> p2m_query_t parameter in get_page_from_gfn()?
>
> Ah, sorry for the misnamed suggestion. I'm not sure whether using
> zero here actually matches your needs; P2M_UNSHARE though
> seems odd in any case, so at least switching to P2M_ALLOC (to
> populate PoD pages) would seem to be necessary.
>

Thanks, Jan.  :)
And now I believe we should use zero here. For now, XenGT does not
support PoD, and all we care about here is whether the p2m type of
this gfn is p2m_mmio_write_dm.

>>>> @@ -2642,6 +2656,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>>>                }
>>>>
>>>>                break;
>>>> +        case HVMOP_IO_RANGE_WP_MEM:
>>>> +            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
>>>> +                return s;
>>>
>>> Considering you've got p2m_mmio_write_dm above - can this
>>> validly return false here?
>>
>> Well, if we have multiple ioreq servers defined, it will...
>
> Ah, right. That's fine then.
>
> Jan
>
>

B.R.
Yu


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
From: Yu, Zhang @ 2016-01-27  7:03 UTC
  To: David Vrabel, xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, Paul.Durrant, zhiyuan.lv, jbeulich,
	wei.liu2



On 1/26/2016 7:16 PM, David Vrabel wrote:
> On 22/01/16 03:20, Yu Zhang wrote:
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -962,6 +962,24 @@ FIFO-based event channel ABI support up to 131,071 event channels.
>>   Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>>   x86).
>>
>> +=item B<max_wp_ram_ranges=N>
>> +
>> +Limit the maximum write-protected ram ranges that can be tracked
>> +inside one ioreq server rangeset.
>> +
>> +Ioreq server uses a group of rangesets to track the I/O or memory
>> +resources to be emulated. Default limit of ranges that one rangeset
>> +can allocate is set to a small value, due to the fact that these ranges
>> +are allocated in xen heap. Yet for the write-protected ram ranges,
>> +there are circumstances under which the upper limit inside one rangeset
>> +should exceed the default one. E.g. in XenGT, when tracking the per-
>> +process graphic translation tables on Intel Broadwell platforms, the
>> +number of page tables concerned will be several thousand (normally
>> +in this case, 8192 could be a big enough value). Not configuring this
>> +item, or setting its value to 0, will result in the upper limit set
>> +to its default one. Users who set this item explicitly are supposed
>> +to know the specific scenarios that necessitate this configuration.
>
> This help text isn't very helpful.  How is a user supposed to "know the
> specific scenarios" that need this option?
>

Thank you for your comment, David. :)

Well, "know the specific scenarios" may seem too ambiguous. Here the
"specific scenarios" means when this parameter is used:
1> for virtual devices other than vGPU in GVT-g;
2> for GVT-g, there also might be some extreme cases, e.g. too many
graphic related applications in one VM, which create a great deal of
per-process graphic translation tables.
3> for GVT-g, future cpu platforms which provide even more PPGGTs.
Other than these cases, 8192 is a suggested value for this option.

So how about we add a section to point out these scenarios in this
text?

> Why doesn't the toolstack (or qemu) automatically set this value based
> on whether GVT-g/GVT-d is being used? Then there is no need to even
> present this option to the user.
>
> David
>

For now, this parameter is used by GVT-g, but we expect more uses
for other devices which adopt this mediated pass-through idea.
Indeed, XenGT has an xl configuration flag, and several other
XenGT-specific parameters. We have plans to upstream these options
later this year. Once these XenGT options are accepted, we can set
"max_wp_ram_ranges" to a default value when GVT-g is detected and
"max_wp_ram_ranges" is not explicitly configured.


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27  7:01         ` Yu, Zhang
@ 2016-01-27 10:27           ` Jan Beulich
  2016-01-27 14:13             ` Yu, Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-01-27 10:27 UTC (permalink / raw)
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 27.01.16 at 08:01, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 1/26/2016 7:00 PM, Jan Beulich wrote:
>>>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
>>> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>>>> hvm_ioreq_server *s,
>>>>>    {
>>>>>        unsigned int i;
>>>>>        int rc;
>>>>> +    unsigned int max_wp_ram_ranges =
>>>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>>>> +        MAX_NR_IO_RANGES;
>>>>
>>>> Besides this having stray blanks inside the parentheses it truncates
>>>> the value from 64 to 32 bits and would benefit from using the gcc
>>>> extension of omitting the middle operand of ?:. But even better
>>>> would imo be if you avoided the local variable and ...
>>>>
>>> After second thought, how about we define a default value for this
>>> parameter in libxl.h, and initialize the parameter with the default
>>> value when creating the domain if it's not configured.
>>
>> No, I don't think the tool stack should be determining the default
>> here (unless you want the default to be zero, and have zero
>> indeed mean zero).
>>
> Thank you, Jan.
> If we do not provide a default value in tool stack, the code above
> should be kept, to initialize the local variable with either the one
> set in the configuration file, or with MAX_NR_IO_RANGES. Is this OK?

Well, not exactly: For one, the original comment (still present
above) regarding truncation holds. And then another question is:
Do you expect this resource type to be useful with its number of
ranges limited to MAX_NR_IO_RANGES? I ask because if the
answer is "no", having it default to zero might be as reasonable.
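
For the truncation point, something along the lines of the following
(field names as in the quoted patch, and setting aside the earlier
suggestion to drop the local variable) would avoid it, with the GCC
extension of omitting the middle operand of ?: letting zero still fall
back to the default:

    uint64_t max_wp_ram_ranges =
        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] ?:
        MAX_NR_IO_RANGES;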

Jan


* Re: [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server
  2016-01-27  7:02         ` Yu, Zhang
@ 2016-01-27 10:28           ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2016-01-27 10:28 UTC (permalink / raw)
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 27.01.16 at 08:02, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 1/26/2016 7:24 PM, Jan Beulich wrote:
>>>>> On 26.01.16 at 08:59, <yu.c.zhang@linux.intel.com> wrote:
>>
>>>
>>> On 1/22/2016 7:43 PM, Jan Beulich wrote:
>>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>>> @@ -2601,6 +2605,16 @@ struct hvm_ioreq_server
>>> *hvm_select_ioreq_server(struct domain *d,
>>>>>            type = (p->type == IOREQ_TYPE_PIO) ?
>>>>>                    HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
>>>>>            addr = p->addr;
>>>>> +        if ( type == HVMOP_IO_RANGE_MEMORY )
>>>>> +        {
>>>>> +             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
>>>>> +                                          &p2mt, P2M_UNSHARE);
>>>>
>>>> It seems to me like I had asked before: Why P2M_UNSHARE instead
>>>> of just P2M_QUERY? (This could surely be fixed up while committing,
>>>> the more that I've already done some cleanup here, but I'd like to
>>>> understand this before it goes in.)
>>>>
>>> Hah, sorry for my bad memory. :)
>>> I did not find P2M_QUERY; only P2M_UNSHARE and P2M_ALLOC are
>>> defined. But after reading the code in ept_get_entry(), I guess
>>> P2M_UNSHARE is not accurate; maybe I should use 0 here for the
>>> p2m_query_t parameter in get_page_from_gfn()?
>>
>> Ah, sorry for the misnamed suggestion. I'm not sure whether using
>> zero here actually matches your needs; P2M_UNSHARE though
>> seems odd in any case, so at least switching to P2M_ALLOC (to
>> populate PoD pages) would seem to be necessary.
>>
> 
> Thanks, Jan.  :)
> And now I believe we should use zero here. For now XenGT does not
> support PoD, and here all we care about is whether the p2m type of
> this gfn is p2m_mmio_write_dm.

Okay, fine then (if, of course, it works).
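
I.e. presumably something like this (a sketch only, based on the quoted
hunk: 0 as the p2m_query_t argument makes this a pure query, and the
reference taken by get_page_from_gfn() still needs dropping):

    p2m_type_t p2mt;
    struct page_info *ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
                                                   &p2mt, 0);

    if ( p2mt == p2m_mmio_write_dm )
        type = HVMOP_IO_RANGE_WP_MEM;

    if ( ram_page )
        put_page(ram_page);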

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 10:27           ` Jan Beulich
@ 2016-01-27 14:13             ` Yu, Zhang
  2016-01-27 14:32               ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Yu, Zhang @ 2016-01-27 14:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/27/2016 6:27 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 08:01, <yu.c.zhang@linux.intel.com> wrote:
>
>>
>> On 1/26/2016 7:00 PM, Jan Beulich wrote:
>>>>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>>>>> hvm_ioreq_server *s,
>>>>>>     {
>>>>>>         unsigned int i;
>>>>>>         int rc;
>>>>>> +    unsigned int max_wp_ram_ranges =
>>>>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>>>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>>>>> +        MAX_NR_IO_RANGES;
>>>>>
>>>>> Besides this having stray blanks inside the parentheses it truncates
>>>>> the value from 64 to 32 bits and would benefit from using the gcc
>>>>> extension of omitting the middle operand of ?:. But even better
>>>>> would imo be if you avoided the local variable and ...
>>>>>
>>>> After second thought, how about we define a default value for this
>>>> parameter in libxl.h, and initialize the parameter with the default
>>>> value when creating the domain if it's not configured.
>>>
>>> No, I don't think the tool stack should be determining the default
>>> here (unless you want the default to be zero, and have zero
>>> indeed mean zero).
>>>
>> Thank you, Jan.
>> If we do not provide a default value in tool stack, the code above
>> should be kept, to initialize the local variable with either the one
>> set in the configuration file, or with MAX_NR_IO_RANGES. Is this OK?
>
> Well, not exactly: For one, the original comment (still present
> above) regarding truncation holds. And then another question is:
> Do you expect this resource type to be useful with its number of
> ranges limited to MAX_NR_IO_RANGES? I ask because if the
> answer is "no", having it default to zero might be as reasonable.
>

Thanks, Jan.

About the default value:
   You are right. :) For XenGT, MAX_NR_IO_RANGES may only work under
limited conditions. Having it default to zero means XenGT users must
manually configure this option. Since we have plans to push other XenGT
tool stack parameters (including a GVT-g flag), how about we set
max_wp_ram_ranges to a default value when the GVT-g flag is detected,
and until then max_wp_ram_ranges is supposed to be configured
explicitly for XenGT?

About the truncation issue:
   I do not quite follow. Will this hurt if the value configured does
not exceed 4G? What about a type cast?

B.R.
Yu



> Jan
>
>


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 14:13             ` Yu, Zhang
@ 2016-01-27 14:32               ` Jan Beulich
  2016-01-27 14:56                 ` Yu, Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-01-27 14:32 UTC (permalink / raw)
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
> About the default value:
>    You are right. :) For XenGT, MAX_NR_IO_RANGES may only work under
> limited conditions. Having it default to zero means XenGT users must
> manually configure this option. Since we have plans to push other XenGT
> tool stack parameters(including a GVT-g flag), how about we set this
> max_wp_ram_ranges to a default one when GVT-g flag is detected, and
> till then, max_wp_ram_ranges is supposed to be configured explicitly for
> XenGT?

Sounds reasonable, and in line with what iirc was discussed on
the tool stack side.

> About the truncation issue:
>    I do not quite follow. Will this hurt if the value configured does
> not exceed 4G? What about a type cast?

A typecast would not alter behavior in any way. And of course
a problem only arises if the value was above 4 billion. You either
need to refuse such values while the attempt is made to set it,
or you need to deal with the full range of possible values. Likely
the former is the better (and I wonder whether the upper
bound shouldn't be forced even lower than 4 billion).
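
Refusing them could be as simple as the sketch below (the bound is a
hypothetical placeholder — as said, it likely wants to be much lower
than 4 billion — and the check would sit in the param-set path):

    /* Hypothetical cap on HVM_PARAM_MAX_WP_RAM_RANGES. */
    #define MAX_WP_RAM_RANGES_BOUND (1UL << 32)

    case HVM_PARAM_MAX_WP_RAM_RANGES:
        if ( a.value > MAX_WP_RAM_RANGES_BOUND )
            rc = -EINVAL;
        break;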

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 14:32               ` Jan Beulich
@ 2016-01-27 14:56                 ` Yu, Zhang
  2016-01-27 15:12                   ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Yu, Zhang @ 2016-01-27 14:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>> About the default value:
>>     You are right. :) For XenGT, MAX_NR_IO_RANGES may only work under
>> limited conditions. Having it default to zero means XenGT users must
>> manually configure this option. Since we have plans to push other XenGT
>> tool stack parameters(including a GVT-g flag), how about we set this
>> max_wp_ram_ranges to a default one when GVT-g flag is detected, and
>> till then, max_wp_ram_ranges is supposed to be configured explicitly for
>> XenGT?
>
> Sounds reasonable, and in line with what iirc was discussed on
> the tool stack side.
>

Great, and thanks.

>> About the truncation issue:
>>     I do not quite follow. Will this hurt if the value configured does
>> not exceed 4G? What about a type cast?
>
> A typecast would not alter behavior in any way. And of course
> a problem only arises if the value was above 4 billion. You either
> need to refuse such values while the attempt is made to set it,
> or you need to deal with the full range of possible values. Likely
> the former is the better (and I wonder whether the upper
> bound shouldn't be forced even lower than 4 billion).
>

Oh, I see. A check with the upper bound sounds better. Using 4G as the
upper bound is a little conservative, but I do not have any better
criteria right now. :)

> Jan
>
>

Thanks
Yu


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 14:56                 ` Yu, Zhang
@ 2016-01-27 15:12                   ` Jan Beulich
  2016-01-27 15:23                     ` Yu, Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-01-27 15:12 UTC (permalink / raw)
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>> About the truncation issue:
>>>     I do not quite follow. Will this hurt if the value configured does
>>> not exceed 4G? What about a type cast?
>>
>> A typecast would not alter behavior in any way. And of course
>> a problem only arises if the value was above 4 billion. You either
>> need to refuse such values while the attempt is made to set it,
>> or you need to deal with the full range of possible values. Likely
>> the former is the better (and I wonder whether the upper
>> bound shouldn't be forced even lower than 4 billion).
> 
> Oh, I see. A check with the upper bound sounds better. Using 4G as the
> upper bound is a little conservative, but I do not have any better
> criteria right now. :)

But when making that decision keep security in mind: How much
memory would it take to populate 4G rangeset nodes?

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 15:12                   ` Jan Beulich
@ 2016-01-27 15:23                     ` Yu, Zhang
  2016-01-27 15:58                       ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Yu, Zhang @ 2016-01-27 15:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/27/2016 11:12 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
>> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>>> About the truncation issue:
>>>>      I do not quite follow. Will this hurt if the value configured does
>>>> not exceed 4G? What about a type cast?
>>>
>>> A typecast would not alter behavior in any way. And of course
>>> a problem only arises if the value was above 4 billion. You either
>>> need to refuse such values while the attempt is made to set it,
>>> or you need to deal with the full range of possible values. Likely
>>> the former is the better (and I wonder whether the upper
>>> bound shouldn't be forced even lower than 4 billion).
>>
>> Oh, I see. A check with the upper bound sounds better. Using 4G as the
>> upper bound is a little conservative, but I do not have any better
>> criteria right now. :)
>
> But when making that decision keep security in mind: How much
> memory would it take to populate 4G rangeset nodes?
>
Well, for XenGT, one extreme case I can imagine would be that half of
the guest RAM is used for GPU page tables, and the page frames
containing these page tables are discontiguous (a rangeset can merge
contiguous ranges). For other virtual devices that leverage the
write-protected gfn rangeset, I believe the same reasoning applies. :)
Is this logic OK?
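
As a rough worked example of that extreme case (numbers purely
illustrative): a 16 GiB guest with half of its RAM holding GPU page
tables in discontiguous frames would need

    8 GiB / 4 KiB per page = 2M singleton ranges,

which is still far below a 4G upper bound.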

Thanks
Yu


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 15:23                     ` Yu, Zhang
@ 2016-01-27 15:58                       ` Jan Beulich
  2016-01-27 16:12                         ` Yu, Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-01-27 15:58 UTC (permalink / raw)
  To: Zhang Yu
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir

>>> On 27.01.16 at 16:23, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 1/27/2016 11:12 PM, Jan Beulich wrote:
>>>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
>>> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>>>> About the truncation issue:
>>>>>      I do not quite follow. Will this hurt if the value configured does
>>>>> not exceed 4G? What about a type cast?
>>>>
>>>> A typecast would not alter behavior in any way. And of course
>>>> a problem only arises if the value was above 4 billion. You either
>>>> need to refuse such values while the attempt is made to set it,
>>>> or you need to deal with the full range of possible values. Likely
>>>> the former is the better (and I wonder whether the upper
>>>> bound shouldn't be forced even lower than 4 billion).
>>>
>>> Oh, I see. A check with the upper bound sounds better. Using 4G as the
>>> upper bound is a little conservative, but I do not have any better
>>> criteria right now. :)
>>
>> But when making that decision keep security in mind: How much
>> memory would it take to populate 4G rangeset nodes?
>>
> Well, for XenGT, one extreme case I can imagine would be half of all
> the guest ram is used as the GPU page table, and page frames containing
> these page tables are discontinuous (rangeset can combine continuous
> ranges). For other virtual devices to leverage the write-protected gfn
> rangeset, I believe the same idea applies. :)
> Is this logic OK?

I can follow it, yes, but 4G ranges mean 16TB of memory put
in page tables, which to be honest doesn't seem reasonable to
me.
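
To spell out the arithmetic (the per-node size is a rough assumption,
not a measured figure): 4G singleton ranges of 4 KiB pages cover

    2^32 ranges * 4 KiB = 16 TB of tracked guest memory,

while at, say, ~48 bytes per range node (two 64-bit bounds plus
list/tree linkage) the nodes alone would consume

    2^32 nodes * ~48 bytes = ~192 GB of xenheap.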

Jan


* Re: [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges.
  2016-01-27 15:58                       ` Jan Beulich
@ 2016-01-27 16:12                         ` Yu, Zhang
  0 siblings, 0 replies; 23+ messages in thread
From: Yu, Zhang @ 2016-01-27 16:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, Paul.Durrant, zhiyuan.lv,
	keir



On 1/27/2016 11:58 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 16:23, <yu.c.zhang@linux.intel.com> wrote:
>
>>
>> On 1/27/2016 11:12 PM, Jan Beulich wrote:
>>>>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> About the truncation issue:
>>>>>>       I do not quite follow. Will this hurt if the value configured does
>>>>>> not exceed 4G? What about a type cast?
>>>>>
>>>>> A typecast would not alter behavior in any way. And of course
>>>>> a problem only arises if the value was above 4 billion. You either
>>>>> need to refuse such values while the attempt is made to set it,
>>>>> or you need to deal with the full range of possible values. Likely
>>>>> the former is the better (and I wonder whether the upper
>>>>> bound shouldn't be forced even lower than 4 billion).
>>>>
>>>> Oh, I see. A check with the upper bound sounds better. Using 4G as the
>>>> upper bound is a little conservative, but I do not have any better
>>>> criteria right now. :)
>>>
>>> But when making that decision keep security in mind: How much
>>> memory would it take to populate 4G rangeset nodes?
>>>
>> Well, for XenGT, one extreme case I can imagine would be half of all
>> the guest ram is used as the GPU page table, and page frames containing
>> these page tables are discontinuous (rangeset can combine continuous
>> ranges). For other virtual devices to leverage the write-protected gfn
>> rangeset, I believe the same idea applies. :)
>> Is this logic OK?
>
> I can follow it, yes, but 4G ranges mean 16TB of memory put
> in page tables, which to be honest doesn't seem reasonable to
> me.
>

Thanks for your reply, Jan.
In most cases max_memkb in the configuration file will not be set to
such a big value. So I'd suggest we compare the limit calculated from
max_memkb with 4G, and choose the smaller one as the upper bound. If at
some point in the future it becomes common for VMs to use huge amounts
of RAM, we should switch the rangeset limit to a uint64 rather than a
uint32.
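
A sketch of that comparison (names illustrative only; max_memkb is the
guest RAM size in KiB, so shifting right by 2 converts it to a count of
4 KiB pages):

    /* Upper bound: the guest's page count, itself capped below 4G. */
    uint64_t pages = (uint64_t)max_memkb >> 2;
    uint64_t bound = min(pages, (uint64_t)UINT32_MAX);

    if ( value > bound )
        return -EINVAL;    /* refuse at set time, as discussed above */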

Yu


Thread overview: 23+ messages
2016-01-22  3:20 [PATCH v11 0/3] Refactor ioreq server for better performance Yu Zhang
2016-01-22  3:20 ` [PATCH v11 1/3] Refactor rangeset structure " Yu Zhang
2016-01-22  3:20 ` [PATCH v11 2/3] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
2016-01-22 11:43   ` Jan Beulich
2016-01-26  7:59     ` Yu, Zhang
2016-01-26 11:24       ` Jan Beulich
2016-01-27  7:02         ` Yu, Zhang
2016-01-27 10:28           ` Jan Beulich
2016-01-22  3:20 ` [PATCH v2 3/3] tools: introduce parameter max_wp_ram_ranges Yu Zhang
2016-01-22  8:01   ` Jan Beulich
2016-01-26  7:32     ` Yu, Zhang
2016-01-26 11:00       ` Jan Beulich
2016-01-27  7:01         ` Yu, Zhang
2016-01-27 10:27           ` Jan Beulich
2016-01-27 14:13             ` Yu, Zhang
2016-01-27 14:32               ` Jan Beulich
2016-01-27 14:56                 ` Yu, Zhang
2016-01-27 15:12                   ` Jan Beulich
2016-01-27 15:23                     ` Yu, Zhang
2016-01-27 15:58                       ` Jan Beulich
2016-01-27 16:12                         ` Yu, Zhang
2016-01-26 11:16   ` David Vrabel
2016-01-27  7:03     ` Yu, Zhang
