* [PATCH v10 0/3] Refactor ioreq server for better performance.
@ 2016-01-19  9:27 Yu Zhang
  2016-01-19  9:27 ` [PATCH v10 1/3] Refactor rangeset structure " Yu Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Yu Zhang @ 2016-01-19  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, stefano.stabellini, andrew.cooper3,
	Paul.Durrant, zhiyuan.lv, jbeulich, wei.liu2

XenGT leverages ioreq server to track and forward accesses to GPU
I/O resources, e.g. the PPGTT (per-process graphics translation
tables). Currently, ioreq server uses rangeset to track the BDF/
PIO/MMIO ranges to be emulated. To select an ioreq server, the
rangeset is searched to see if the I/O range is recorded. However,
traversing the linked list inside rangeset can be time consuming
when the number of ranges is high. On the HSW platform, the number
of PPGTTs for each vGPU can be several hundred; on BDW, several
thousand. This patch series refactors rangeset to base it on a
red-black tree, so that searching is more efficient.

Besides, this patchset also splits the tracking of MMIO and guest
ram ranges into different rangesets. And to accommodate more ranges,
a new parameter, max_ranges, is introduced in the hvm configuration
file.

Changes in v10: 
1> Add a new patch to configure the range limit inside ioreq server.
2> Commit message changes. 
3> The previous patch "[1/3] Remove identical relationship between
   ioreq type and rangeset type." has already been merged, and is not
   included in this series now.

Changes in v9: 
1> Change the order of patch 2 and patch 3.
2> Introduce a const static array before hvm_ioreq_server_alloc_rangesets().
3> Coding style changes.

Changes in v8: 
Use a clearer API name to map/unmap the write-protected memory in
ioreq server.

Changes in v7: 
1> Coding style changes;
2> Fix a typo in hvm_select_ioreq_server().

Changes in v6: 
Break the identical relationship between ioreq type and rangeset
index inside ioreq server.

Changes in v5:
1> Use gpfn, instead of gpa to track guest write-protected pages;
2> Remove redundant conditional statement in routine find_range().

Changes in v4:
Keep the name HVMOP_IO_RANGE_MEMORY for MMIO resources, and add
a new one, HVMOP_IO_RANGE_WP_MEM, for write-protected memory.

Changes in v3:
1> Use a separate rangeset for guest ram pages in ioreq server;
2> Refactor rangeset, instead of introduce a new data structure.

Changes in v2:
1> Split the original patch into 2;
2> Take Paul Durrant's comments:
  a> Add a name member in the struct rb_rangeset, and use the 'q'
debug key to dump the ranges in ioreq server;
  b> Keep original routine names for hvm ioreq server;
  c> Commit message changes - mention that a future patch will change
the maximum ranges inside ioreq server.

Yu Zhang (3):
  Refactor rangeset structure for better performance.
  Differentiate IO/mem resources tracked by ioreq server
  tools: introduce parameter max_ranges.

 docs/man/xl.cfg.pod.5            | 17 +++++++++
 tools/libxc/include/xenctrl.h    | 31 +++++++++++++++
 tools/libxc/xc_domain.c          | 61 ++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c          |  3 ++
 tools/libxl/libxl_types.idl      |  1 +
 tools/libxl/xl_cmdimpl.c         |  4 ++
 xen/arch/x86/hvm/hvm.c           | 34 ++++++++++++++---
 xen/common/rangeset.c            | 82 +++++++++++++++++++++++++++++-----------
 xen/include/asm-x86/hvm/domain.h |  2 +-
 xen/include/public/hvm/hvm_op.h  |  1 +
 xen/include/public/hvm/params.h  |  5 ++-
 11 files changed, 212 insertions(+), 29 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v10 1/3] Refactor rangeset structure for better performance.
  2016-01-19  9:27 [PATCH v10 0/3] Refactor ioreq server for better performance Yu Zhang
@ 2016-01-19  9:27 ` Yu Zhang
  2016-01-19  9:27 ` [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
  2016-01-19  9:27 ` [PATCH 3/3] tools: introduce parameter max_ranges Yu Zhang
  2 siblings, 0 replies; 19+ messages in thread
From: Yu Zhang @ 2016-01-19  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, stefano.stabellini, andrew.cooper3,
	Paul.Durrant, zhiyuan.lv, jbeulich, wei.liu2

This patch refactors struct rangeset to base it on the red-black
tree structure, instead of on the current doubly linked list. Today,
ioreq server leverages rangeset to keep track of the I/O/memory
resources to be emulated. Yet when the number of ranges inside one
ioreq server is very high, traversing a doubly linked list can be
time consuming. With this patch, the time complexity for searching
a rangeset is improved from O(n) to O(log(n)). The interfaces of
rangeset remain the same, and no new APIs are introduced.

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 xen/common/rangeset.c | 82 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 6c6293c..d15d8d5 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -10,11 +10,12 @@
 #include <xen/sched.h>
 #include <xen/errno.h>
 #include <xen/rangeset.h>
+#include <xen/rbtree.h>
 #include <xsm/xsm.h>
 
 /* An inclusive range [s,e] and pointer to next range in ascending order. */
 struct range {
-    struct list_head list;
+    struct rb_node node;
     unsigned long s, e;
 };
 
@@ -24,7 +25,7 @@ struct rangeset {
     struct domain   *domain;
 
     /* Ordered list of ranges contained in this set, and protecting lock. */
-    struct list_head range_list;
+    struct rb_root   range_tree;
 
     /* Number of ranges that can be allocated */
     long             nr_ranges;
@@ -45,41 +46,78 @@ struct rangeset {
 static struct range *find_range(
     struct rangeset *r, unsigned long s)
 {
-    struct range *x = NULL, *y;
+    struct rb_node *node;
+    struct range   *x;
+    struct range   *prev = NULL;
 
-    list_for_each_entry ( y, &r->range_list, list )
+    node = r->range_tree.rb_node;
+    while ( node != NULL )
     {
-        if ( y->s > s )
-            break;
-        x = y;
+        x = container_of(node, struct range, node);
+        if ( (s >= x->s) && (s <= x->e) )
+            return x;
+        if ( s < x->s )
+            node = node->rb_left;
+        else
+        {
+            prev = x;
+            node = node->rb_right;
+        }
     }
 
-    return x;
+    return prev;
 }
 
 /* Return the lowest range in the set r, or NULL if r is empty. */
 static struct range *first_range(
     struct rangeset *r)
 {
-    if ( list_empty(&r->range_list) )
-        return NULL;
-    return list_entry(r->range_list.next, struct range, list);
+    struct rb_node *node;
+
+    node = rb_first(&r->range_tree);
+    if ( node != NULL )
+        return container_of(node, struct range, node);
+
+    return NULL;
 }
 
 /* Return range following x in ascending order, or NULL if x is the highest. */
 static struct range *next_range(
     struct rangeset *r, struct range *x)
 {
-    if ( x->list.next == &r->range_list )
-        return NULL;
-    return list_entry(x->list.next, struct range, list);
+    struct rb_node *node;
+
+    node = rb_next(&x->node);
+    if ( node != NULL )
+        return container_of(node, struct range, node);
+
+    return NULL;
 }
 
 /* Insert range y after range x in r. Insert as first range if x is NULL. */
 static void insert_range(
     struct rangeset *r, struct range *x, struct range *y)
 {
-    list_add(&y->list, (x != NULL) ? &x->list : &r->range_list);
+    struct rb_node **node;
+    struct rb_node *parent = NULL;
+
+    if ( x == NULL )
+        node = &r->range_tree.rb_node;
+    else
+    {
+        node = &x->node.rb_right;
+        parent = &x->node;
+    }
+
+    while ( *node != NULL )
+    {
+        parent = *node;
+        node = &parent->rb_left;
+    }
+
+    /* Add new node and rebalance the red-black tree. */
+    rb_link_node(&y->node, parent, node);
+    rb_insert_color(&y->node, &r->range_tree);
 }
 
 /* Remove a range from its list and free it. */
@@ -88,7 +126,7 @@ static void destroy_range(
 {
     r->nr_ranges++;
 
-    list_del(&x->list);
+    rb_erase(&x->node, &r->range_tree);
     xfree(x);
 }
 
@@ -319,7 +357,7 @@ bool_t rangeset_contains_singleton(
 bool_t rangeset_is_empty(
     const struct rangeset *r)
 {
-    return ((r == NULL) || list_empty(&r->range_list));
+    return ((r == NULL) || RB_EMPTY_ROOT(&r->range_tree));
 }
 
 struct rangeset *rangeset_new(
@@ -332,7 +370,7 @@ struct rangeset *rangeset_new(
         return NULL;
 
     rwlock_init(&r->lock);
-    INIT_LIST_HEAD(&r->range_list);
+    r->range_tree = RB_ROOT;
     r->nr_ranges = -1;
 
     BUG_ON(flags & ~RANGESETF_prettyprint_hex);
@@ -410,7 +448,7 @@ void rangeset_domain_destroy(
 
 void rangeset_swap(struct rangeset *a, struct rangeset *b)
 {
-    LIST_HEAD(tmp);
+    struct rb_node *tmp;
 
     if ( a < b )
     {
@@ -423,9 +461,9 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
         write_lock(&a->lock);
     }
 
-    list_splice_init(&a->range_list, &tmp);
-    list_splice_init(&b->range_list, &a->range_list);
-    list_splice(&tmp, &b->range_list);
+    tmp = a->range_tree.rb_node;
+    a->range_tree.rb_node = b->range_tree.rb_node;
+    b->range_tree.rb_node = tmp;
 
     write_unlock(&a->lock);
     write_unlock(&b->lock);
-- 
1.9.1


* [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq server
  2016-01-19  9:27 [PATCH v10 0/3] Refactor ioreq server for better performance Yu Zhang
  2016-01-19  9:27 ` [PATCH v10 1/3] Refactor rangeset structure " Yu Zhang
@ 2016-01-19  9:27 ` Yu Zhang
  2016-01-19  9:47   ` Paul Durrant
  2016-01-19  9:27 ` [PATCH 3/3] tools: introduce parameter max_ranges Yu Zhang
  2 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2016-01-19  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, stefano.stabellini, andrew.cooper3,
	Paul.Durrant, zhiyuan.lv, jbeulich, wei.liu2

Currently in ioreq server, guest write-protected ram pages are
tracked in the same rangeset as device mmio resources. Yet unlike
device mmio, which can come in big chunks, the guest write-protected
pages may be discrete 4K ranges. This patch uses a separate rangeset
for the guest ram pages.

To differentiate the ioreq type between the write-protected memory
ranges and the mmio ranges when selecting an ioreq server, the p2m
type is retrieved by calling get_page_from_gfn(). And we do not
need to worry about the p2m type change during the ioreq selection
process.

Note: Previously, a new hypercall or subop was suggested to map
write-protected pages into ioreq server. However, it turned out the
handler of this new hypercall would be almost the same as that of the
existing pair - HVMOP_[un]map_io_range_to_ioreq_server - and there's
already a type parameter in this hypercall. So no new hypercall is
defined; only a new type is introduced.

Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 tools/libxc/include/xenctrl.h    | 31 ++++++++++++++++++++
 tools/libxc/xc_domain.c          | 61 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c           | 27 +++++++++++++++---
 xen/include/asm-x86/hvm/domain.h |  2 +-
 xen/include/public/hvm/hvm_op.h  |  1 +
 5 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 079cad0..036c72d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2023,6 +2023,37 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
                                             int is_mmio,
                                             uint64_t start,
                                             uint64_t end);
+/**
+ * This function registers a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            xen_pfn_t start,
+                                            xen_pfn_t end);
+
+/**
+ * This function deregisters a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_wp_mem_range_from_ioreq_server(xc_interface *xch,
+                                                domid_t domid,
+                                                ioservid_t id,
+                                                xen_pfn_t start,
+                                                xen_pfn_t end);
 
 /**
  * This function registers a PCI device for config space emulation.
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 99e0d48..4f43695 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1544,6 +1544,67 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
     return rc;
 }
 
+int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            xen_pfn_t start,
+                                            xen_pfn_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_WP_MEM;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_wp_mem_range_from_ioreq_server(xc_interface *xch,
+                                                domid_t domid,
+                                                ioservid_t id,
+                                                xen_pfn_t start,
+                                                xen_pfn_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_WP_MEM;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+
+}
+
 int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
                                       ioservid_t id, uint16_t segment,
                                       uint8_t bus, uint8_t device,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f646c1e..d59e7bc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -935,6 +935,9 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
         rangeset_destroy(s->range[i]);
 }
 
+const char *io_range_name[NR_IO_RANGE_TYPES] =
+                                {"port", "mmio", "pci", "wp-mem"};
+
 static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
                                             bool_t is_default)
 {
@@ -949,10 +952,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         char *name;
 
         rc = asprintf(&name, "ioreq_server %d %s", s->id,
-                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
-                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
-                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
-                      "");
+                      (i < NR_IO_RANGE_TYPES) ? io_range_name[i] : "");
         if ( rc )
             goto fail;
 
@@ -1270,6 +1270,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
             case HVMOP_IO_RANGE_PORT:
             case HVMOP_IO_RANGE_MEMORY:
             case HVMOP_IO_RANGE_PCI:
+            case HVMOP_IO_RANGE_WP_MEM:
                 r = s->range[type];
                 break;
 
@@ -1321,6 +1322,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
             case HVMOP_IO_RANGE_PORT:
             case HVMOP_IO_RANGE_MEMORY:
             case HVMOP_IO_RANGE_PCI:
+            case HVMOP_IO_RANGE_WP_MEM:
                 r = s->range[type];
                 break;
 
@@ -2561,6 +2563,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     uint32_t cf8;
     uint8_t type;
     uint64_t addr;
+    p2m_type_t p2mt;
+    struct page_info *ram_page;
 
     if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
         return NULL;
@@ -2604,6 +2608,16 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         type = (p->type == IOREQ_TYPE_PIO) ?
                 HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
         addr = p->addr;
+        if ( type == HVMOP_IO_RANGE_MEMORY )
+        {
+             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
+                                          &p2mt, P2M_UNSHARE);
+             if ( p2mt == p2m_mmio_write_dm )
+                 type = HVMOP_IO_RANGE_WP_MEM;
+
+             if ( ram_page )
+                 put_page(ram_page);
+        }
     }
 
     list_for_each_entry ( s,
@@ -2645,6 +2659,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
             }
 
             break;
+        case HVMOP_IO_RANGE_WP_MEM:
+            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
+                return s;
+
+            break;
         }
     }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index a8cc2ad..1e13973 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -48,7 +48,7 @@ struct hvm_ioreq_vcpu {
     bool_t           pending;
 };
 
-#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
+#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_WP_MEM + 1)
 #define MAX_NR_IO_RANGES  256
 
 struct hvm_ioreq_server {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 1606185..c0b1e30 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -333,6 +333,7 @@ struct xen_hvm_io_range {
 # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
 # define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
 # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+# define HVMOP_IO_RANGE_WP_MEM 3 /* Write-protected ram range */
     uint64_aligned_t start, end; /* IN - inclusive start and end of range */
 };
 typedef struct xen_hvm_io_range xen_hvm_io_range_t;
-- 
1.9.1


* [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19  9:27 [PATCH v10 0/3] Refactor ioreq server for better performance Yu Zhang
  2016-01-19  9:27 ` [PATCH v10 1/3] Refactor rangeset structure " Yu Zhang
  2016-01-19  9:27 ` [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
@ 2016-01-19  9:27 ` Yu Zhang
  2016-01-19 11:53   ` Wei Liu
  2 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2016-01-19  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, stefano.stabellini, andrew.cooper3,
	Paul.Durrant, zhiyuan.lv, jbeulich, wei.liu2

A new parameter - max_ranges - is added to set the upper limit of
ranges to be tracked inside one ioreq server rangeset.

Ioreq server uses a group of rangesets to track the I/O or memory
resources to be emulated. The default value of this limit is 256.
Yet there are circumstances under which the limit should exceed the
default. E.g. in XenGT, when tracking the per-process graphics
translation tables on Intel Broadwell platforms, the number of page
tables concerned will be several thousand (normally in this case,
8192 could be a big enough value). Users who set this item explicitly
are supposed to know the specific scenarios that necessitate this
configuration.
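For illustration, a guest that needs XenGT on Broadwell might carry a
fragment like this in its xl configuration file (8192 is the example
value from the commit message above, not a general recommendation):

```
builder = "hvm"
# Allow up to 8192 ranges per ioreq server rangeset;
# when absent or 0, the default limit of 256 applies.
max_ranges = 8192
```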

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 docs/man/xl.cfg.pod.5           | 17 +++++++++++++++++
 tools/libxl/libxl_dom.c         |  3 +++
 tools/libxl/libxl_types.idl     |  1 +
 tools/libxl/xl_cmdimpl.c        |  4 ++++
 xen/arch/x86/hvm/hvm.c          |  7 ++++++-
 xen/include/public/hvm/params.h |  5 ++++-
 6 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8899f75..562563d 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -962,6 +962,23 @@ FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<max_ranges=N>
+
+Limit the maximum ranges that can be tracked inside one ioreq server
+rangeset.
+
+Ioreq server uses a group of rangesets to track the I/O or memory
+resources to be emulated. By default, this item is not set. Not
+configuring this item, or setting its value to 0, will result in the
+upper limit being set to its default value - 256. Yet there are
+circumstances under which the upper limit inside one rangeset should
+exceed the default. E.g. in XenGT, when tracking the per-process
+graphics translation tables on Intel Broadwell platforms, the number
+of page tables concerned will be several thousand (normally in this
+case, 8192 could be a big enough value). Users who set this item
+explicitly are supposed to know the specific scenarios that
+necessitate this configuration.
+
 =back
 
 =head2 Paravirtualised (PV) Guest Specific Options
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 47971a9..607b0c4 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -288,6 +288,9 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
                     libxl_defbool_val(info->u.hvm.nested_hvm));
     xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2M,
                     libxl_defbool_val(info->u.hvm.altp2m));
+    if (info->u.hvm.max_ranges > 0)
+        xc_hvm_param_set(handle, domid, HVM_PARAM_MAX_RANGES,
+                        info->u.hvm.max_ranges);
 }
 
 int libxl__build_pre(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9ad7eba..c936265 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -518,6 +518,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
+                                       ("max_ranges", uint32),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25507c7..9359de7 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1626,6 +1626,10 @@ static void parse_config_data(const char *config_source,
 
         if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
             b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
+
+        if (!xlu_cfg_get_long (config, "max_ranges", &l, 0))
+            b_info->u.hvm.max_ranges = l;
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d59e7bc..2f85089 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -943,6 +943,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 {
     unsigned int i;
     int rc;
+    unsigned int max_ranges =
+        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_RANGES] > 0 ) ?
+        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_RANGES] :
+        MAX_NR_IO_RANGES;
 
     if ( is_default )
         goto done;
@@ -965,7 +969,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         if ( !s->range[i] )
             goto fail;
 
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+        rangeset_limit(s->range[i], max_ranges);
     }
 
  done:
@@ -6012,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_IOREQ_SERVER_PFN:
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
+    case HVM_PARAM_MAX_RANGES:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 81f9451..7732087 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -210,6 +210,9 @@
 /* Boolean: Enable altp2m */
 #define HVM_PARAM_ALTP2M       35
 
-#define HVM_NR_PARAMS          36
+/* Maximum ranges to be tracked in one rangeset by ioreq server */
+#define HVM_PARAM_MAX_RANGES  36
+
+#define HVM_NR_PARAMS          37
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.9.1


* Re: [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq server
  2016-01-19  9:27 ` [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
@ 2016-01-19  9:47   ` Paul Durrant
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-01-19  9:47 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Kevin Tian, Keir (Xen.org),
	Andrew Cooper, Stefano Stabellini, zhiyuan.lv, jbeulich, Wei Liu

> -----Original Message-----
> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 19 January 2016 09:28
> To: xen-devel@lists.xen.org
> Cc: Paul Durrant; Stefano Stabellini; Keir (Xen.org); jbeulich@suse.com;
> Andrew Cooper; Wei Liu; Kevin Tian; zhiyuan.lv@intel.com
> Subject: [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq
> server
> 
> Currently in ioreq server, guest write-protected ram pages are
> tracked in the same rangeset with device mmio resources. Yet
> unlike device mmio, which can be in big chunks, the guest write-
> protected pages may be discrete ranges with 4K bytes each. This
> patch uses a separate rangeset for the guest ram pages.
> 
> To differentiate the ioreq type between the write-protected memory
> ranges and the mmio ranges when selecting an ioreq server, the p2m
> type is retrieved by calling get_page_from_gfn(). And we do not
> need to worry about the p2m type change during the ioreq selection
> process.
> 
> Note: Previously, a new hypercall or subop was suggested to map
> write-protected pages into ioreq server. However, it turned out
> handler of this new hypercall would be almost the same as that of the
> existing pair - HVMOP_[un]map_io_range_to_ioreq_server - and there's
> already a type parameter in this hypercall. So no new hypercall is
> defined; only a new type is introduced.
> 
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>  tools/libxc/include/xenctrl.h    | 31 ++++++++++++++++++++
>  tools/libxc/xc_domain.c          | 61
> ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/hvm.c           | 27 +++++++++++++++---
>  xen/include/asm-x86/hvm/domain.h |  2 +-
>  xen/include/public/hvm/hvm_op.h  |  1 +
>  5 files changed, 117 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 079cad0..036c72d 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2023,6 +2023,37 @@ int
> xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
>                                              int is_mmio,
>                                              uint64_t start,
>                                              uint64_t end);
> +/**
> + * This function registers a range of write-protected memory for emulation.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm id the IOREQ Server id.
> + * @parm start start of range
> + * @parm end end of range (inclusive).
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
> +                                            domid_t domid,
> +                                            ioservid_t id,
> +                                            xen_pfn_t start,
> +                                            xen_pfn_t end);
> +
> +/**
> + * This function deregisters a range of write-protected memory for
> emulation.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm id the IOREQ Server id.
> + * @parm start start of range
> + * @parm end end of range (inclusive).
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_hvm_unmap_wp_mem_range_from_ioreq_server(xc_interface
> *xch,
> +                                                domid_t domid,
> +                                                ioservid_t id,
> +                                                xen_pfn_t start,
> +                                                xen_pfn_t end);
> 
>  /**
>   * This function registers a PCI device for config space emulation.
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 99e0d48..4f43695 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1544,6 +1544,67 @@ int
> xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t
> domid,
>      return rc;
>  }
> 
> +int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
> +                                            domid_t domid,
> +                                            ioservid_t id,
> +                                            xen_pfn_t start,
> +                                            xen_pfn_t end)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->type = HVMOP_IO_RANGE_WP_MEM;
> +    arg->start = start;
> +    arg->end = end;
> +
> +    rc = do_xen_hypercall(xch, &hypercall);
> +
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_wp_mem_range_from_ioreq_server(xc_interface *xch,
> +                                                domid_t domid,
> +                                                ioservid_t id,
> +                                                xen_pfn_t start,
> +                                                xen_pfn_t end)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->type = HVMOP_IO_RANGE_WP_MEM;
> +    arg->start = start;
> +    arg->end = end;
> +
> +    rc = do_xen_hypercall(xch, &hypercall);
> +
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
>                                        ioservid_t id, uint16_t segment,
>                                        uint8_t bus, uint8_t device,
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index f646c1e..d59e7bc 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -935,6 +935,9 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
>          rangeset_destroy(s->range[i]);
>  }
> 
> +const char *io_range_name[NR_IO_RANGE_TYPES] =
> +                                {"port", "mmio", "pci", "wp-mem"};
> +
>  static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>                                              bool_t is_default)
>  {
> @@ -949,10 +952,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>          char *name;
> 
>          rc = asprintf(&name, "ioreq_server %d %s", s->id,
> -                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
> -                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
> -                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
> -                      "");
> +                      (i < NR_IO_RANGE_TYPES) ? io_range_name[i] : "");
>          if ( rc )
>              goto fail;
> 
> @@ -1270,6 +1270,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
>              case HVMOP_IO_RANGE_PORT:
>              case HVMOP_IO_RANGE_MEMORY:
>              case HVMOP_IO_RANGE_PCI:
> +            case HVMOP_IO_RANGE_WP_MEM:
>                  r = s->range[type];
>                  break;
> 
> @@ -1321,6 +1322,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>              case HVMOP_IO_RANGE_PORT:
>              case HVMOP_IO_RANGE_MEMORY:
>              case HVMOP_IO_RANGE_PCI:
> +            case HVMOP_IO_RANGE_WP_MEM:
>                  r = s->range[type];
>                  break;
> 
> @@ -2561,6 +2563,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>      uint32_t cf8;
>      uint8_t type;
>      uint64_t addr;
> +    p2m_type_t p2mt;
> +    struct page_info *ram_page;
> 
>      if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
>          return NULL;
> @@ -2604,6 +2608,16 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>          type = (p->type == IOREQ_TYPE_PIO) ?
>                  HVMOP_IO_RANGE_PORT : HVMOP_IO_RANGE_MEMORY;
>          addr = p->addr;
> +        if ( type == HVMOP_IO_RANGE_MEMORY )
> +        {
> +             ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
> +                                          &p2mt, P2M_UNSHARE);
> +             if ( p2mt == p2m_mmio_write_dm )
> +                 type = HVMOP_IO_RANGE_WP_MEM;
> +
> +             if ( ram_page )
> +                 put_page(ram_page);
> +        }
>      }
> 
>      list_for_each_entry ( s,
> @@ -2645,6 +2659,11 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>              }
> 
>              break;
> +        case HVMOP_IO_RANGE_WP_MEM:
> +            if ( rangeset_contains_singleton(r, PFN_DOWN(addr)) )
> +                return s;
> +
> +            break;
>          }
>      }
> 
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index a8cc2ad..1e13973 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -48,7 +48,7 @@ struct hvm_ioreq_vcpu {
>      bool_t           pending;
>  };
> 
> -#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
> +#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_WP_MEM + 1)
>  #define MAX_NR_IO_RANGES  256
> 
>  struct hvm_ioreq_server {
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index 1606185..c0b1e30 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -333,6 +333,7 @@ struct xen_hvm_io_range {
>  # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
>  # define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
>  # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
> +# define HVMOP_IO_RANGE_WP_MEM 3 /* Write-protected ram range */
>      uint64_aligned_t start, end; /* IN - inclusive start and end of range */
>  };
>  typedef struct xen_hvm_io_range xen_hvm_io_range_t;
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19  9:27 ` [PATCH 3/3] tools: introduce parameter max_ranges Yu Zhang
@ 2016-01-19 11:53   ` Wei Liu
  2016-01-19 13:54     ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Wei Liu @ 2016-01-19 11:53 UTC (permalink / raw)
  To: Yu Zhang
  Cc: kevin.tian, keir, stefano.stabellini, andrew.cooper3, xen-devel,
	Paul.Durrant, zhiyuan.lv, jbeulich, wei.liu2

On Tue, Jan 19, 2016 at 05:27:58PM +0800, Yu Zhang wrote:
> A new parameter - max_ranges is added to set the upper limit of ranges
> to be tracked inside one ioreq server rangeset.
> 
> Ioreq server uses a group of rangesets to track the I/O or memory
> resources to be emulated. The default value of this limit is set to
> 256. Yet there are circumstances under which the limit should exceed
> the default one. E.g. in XenGT, when tracking the per-process graphic
> translation tables on intel broadwell platforms, the number of page
> tables concerned will be several thousand (normally in this case, 8192
> could be a big enough value). Users who set this item explicitly are
> supposed to know the specific scenarios that necessitate this
> configuration.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
>  docs/man/xl.cfg.pod.5           | 17 +++++++++++++++++
>  tools/libxl/libxl_dom.c         |  3 +++
>  tools/libxl/libxl_types.idl     |  1 +
>  tools/libxl/xl_cmdimpl.c        |  4 ++++
>  xen/arch/x86/hvm/hvm.c          |  7 ++++++-
>  xen/include/public/hvm/params.h |  5 ++++-
>  6 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 8899f75..562563d 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -962,6 +962,23 @@ FIFO-based event channel ABI support up to 131,071 event channels.
>  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>  x86).
>  
> +=item B<max_ranges=N>
> +

This name is too generic. I don't have better suggestion though.

> +Limit the maximum ranges that can be tracked inside one ioreq server
> +rangeset.
> +
> +Ioreq server uses a group of rangesets to track the I/O or memory
> +resources to be emulated. By default, this item is not set. Not
> +configuring this item, or setting its value to 0 will result in the
> +upper limit set to its default value - 256. Yet there are circumstances

No need to say 256 because this might change in the future in
hypervisor.

> +under which the upper limit inside one rangeset should exceed the
> +default one. E.g. in XenGT, when tracking the per-process graphic
> +translation tables on intel broadwell platforms, the number of page
> +tables concerned will be several thousand (normally in this case, 8192
> +could be a big enough value). Users who set this item explicitly are
> +supposed to know the specific scenarios that necessitate this
> +configuration.
> +
>  =back
>  
>  =head2 Paravirtualised (PV) Guest Specific Options
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 47971a9..607b0c4 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -288,6 +288,9 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
>                      libxl_defbool_val(info->u.hvm.nested_hvm));
>      xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2M,
>                      libxl_defbool_val(info->u.hvm.altp2m));
> +    if (info->u.hvm.max_ranges > 0)
> +        xc_hvm_param_set(handle, domid, HVM_PARAM_MAX_RANGES,
> +                        info->u.hvm.max_ranges);
>  }
>  
>  int libxl__build_pre(libxl__gc *gc, uint32_t domid,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 9ad7eba..c936265 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -518,6 +518,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                         ("serial_list",      libxl_string_list),
>                                         ("rdm", libxl_rdm_reserve),
>                                         ("rdm_mem_boundary_memkb", MemKB),
> +                                       ("max_ranges", uint32),

This also needs a better name.

Another thing is that you need to define LIBXL_HAVE_XXX in libxl.h to
indicate we introduce a new field.

Wei.
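For context, the option under review would appear in the guest config
roughly as follows. This is a hypothetical fragment: the name max_ranges
is still being discussed in this thread, and 8192 is simply the value the
commit message suggests for a Broadwell XenGT guest.

```
builder = "hvm"
# Raise the per-rangeset tracking limit for this guest's ioreq servers.
# Omitting the option (or setting it to 0) keeps the hypervisor default.
max_ranges = 8192
```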


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19 11:53   ` Wei Liu
@ 2016-01-19 13:54     ` Paul Durrant
  2016-01-19 14:37       ` Wei Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-19 13:54 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, zhiyuan.lv,
	jbeulich, Wei Liu

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 19 January 2016 11:54
> To: Yu Zhang
> Cc: xen-devel@lists.xen.org; Kevin Tian; Keir (Xen.org); Stefano Stabellini;
> Andrew Cooper; Paul Durrant; zhiyuan.lv@intel.com; jbeulich@suse.com;
> Wei Liu
> Subject: Re: [Xen-devel] [PATCH 3/3] tools: introduce parameter
> max_ranges.
> 
> On Tue, Jan 19, 2016 at 05:27:58PM +0800, Yu Zhang wrote:
> > A new parameter - max_ranges is added to set the upper limit of ranges
> > to be tracked inside one ioreq server rangeset.
> >
> > Ioreq server uses a group of rangesets to track the I/O or memory
> > resources to be emulated. The default value of this limit is set to
> > 256. Yet there are circumstances under which the limit should exceed
> > the default one. E.g. in XenGT, when tracking the per-process graphic
> > translation tables on intel broadwell platforms, the number of page
> > tables concerned will be several thousand (normally in this case, 8192
> > could be a big enough value). Users who set this item explicitly are
> > supposed to know the specific scenarios that necessitate this
> > configuration.
> >
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > ---
> >  docs/man/xl.cfg.pod.5           | 17 +++++++++++++++++
> >  tools/libxl/libxl_dom.c         |  3 +++
> >  tools/libxl/libxl_types.idl     |  1 +
> >  tools/libxl/xl_cmdimpl.c        |  4 ++++
> >  xen/arch/x86/hvm/hvm.c          |  7 ++++++-
> >  xen/include/public/hvm/params.h |  5 ++++-
> >  6 files changed, 35 insertions(+), 2 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > index 8899f75..562563d 100644
> > --- a/docs/man/xl.cfg.pod.5
> > +++ b/docs/man/xl.cfg.pod.5
> > @@ -962,6 +962,23 @@ FIFO-based event channel ABI support up to 131,071 event channels.
> >  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
> >  x86).
> >
> > +=item B<max_ranges=N>
> > +
> 
> This name is too generic. I don't have better suggestion though.
> 

I think the increased limit for XenGT need only be applied to wp mem ranges so perhaps the parameter name could be 'max_wp_memory_ranges'?

  Paul

> > +Limit the maximum ranges that can be tracked inside one ioreq server
> > +rangeset.
> > +
> > +Ioreq server uses a group of rangesets to track the I/O or memory
> > +resources to be emulated. By default, this item is not set. Not
> > +configuring this item, or setting its value to 0 will result in the
> > +upper limit set to its default value - 256. Yet there are circumstances
> 
> No need to say 256 because this might change in the future in
> hypervisor.
> 
> > +under which the upper limit inside one rangeset should exceed the
> > +default one. E.g. in XenGT, when tracking the per-process graphic
> > +translation tables on intel broadwell platforms, the number of page
> > +tables concerned will be several thousand (normally in this case, 8192
> > +could be a big enough value). Users who set this item explicitly are
> > +supposed to know the specific scenarios that necessitate this
> > +configuration.
> > +
> >  =back
> >
> >  =head2 Paravirtualised (PV) Guest Specific Options
> > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > index 47971a9..607b0c4 100644
> > --- a/tools/libxl/libxl_dom.c
> > +++ b/tools/libxl/libxl_dom.c
> > @@ -288,6 +288,9 @@ static void hvm_set_conf_params(xc_interface
> *handle, uint32_t domid,
> >                      libxl_defbool_val(info->u.hvm.nested_hvm));
> >      xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2M,
> >                      libxl_defbool_val(info->u.hvm.altp2m));
> > +    if (info->u.hvm.max_ranges > 0)
> > +        xc_hvm_param_set(handle, domid, HVM_PARAM_MAX_RANGES,
> > +                        info->u.hvm.max_ranges);
> >  }
> >
> >  int libxl__build_pre(libxl__gc *gc, uint32_t domid,
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 9ad7eba..c936265 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -518,6 +518,7 @@ libxl_domain_build_info =
> Struct("domain_build_info",[
> >                                         ("serial_list",      libxl_string_list),
> >                                         ("rdm", libxl_rdm_reserve),
> >                                         ("rdm_mem_boundary_memkb", MemKB),
> > +                                       ("max_ranges", uint32),
> 
> This also needs a better name.
> 
> Another thing is that you need to define LIBXL_HAVE_XXX in libxl.h to
> indicate we introduce a new field.
> 
> Wei.


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19 13:54     ` Paul Durrant
@ 2016-01-19 14:37       ` Wei Liu
  2016-01-19 14:47         ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Wei Liu @ 2016-01-19 14:37 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org),
	jbeulich, Andrew Cooper, xen-devel, Yu Zhang, zhiyuan.lv,
	Stefano Stabellini, Wei Liu

On Tue, Jan 19, 2016 at 01:54:42PM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > Sent: 19 January 2016 11:54
> > To: Yu Zhang
> > Cc: xen-devel@lists.xen.org; Kevin Tian; Keir (Xen.org); Stefano Stabellini;
> > Andrew Cooper; Paul Durrant; zhiyuan.lv@intel.com; jbeulich@suse.com;
> > Wei Liu
> > Subject: Re: [Xen-devel] [PATCH 3/3] tools: introduce parameter
> > max_ranges.
> > 
> > On Tue, Jan 19, 2016 at 05:27:58PM +0800, Yu Zhang wrote:
> > > A new parameter - max_ranges is added to set the upper limit of ranges
> > > to be tracked inside one ioreq server rangeset.
> > >
> > > Ioreq server uses a group of rangesets to track the I/O or memory
> > > resources to be emulated. The default value of this limit is set to
> > > 256. Yet there are circumstances under which the limit should exceed
> > > the default one. E.g. in XenGT, when tracking the per-process graphic
> > > translation tables on intel broadwell platforms, the number of page
> > > tables concerned will be several thousand (normally in this case, 8192
> > > could be a big enough value). Users who set this item explicitly are
> > > supposed to know the specific scenarios that necessitate this
> > > configuration.
> > >
> > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > ---
> > >  docs/man/xl.cfg.pod.5           | 17 +++++++++++++++++
> > >  tools/libxl/libxl_dom.c         |  3 +++
> > >  tools/libxl/libxl_types.idl     |  1 +
> > >  tools/libxl/xl_cmdimpl.c        |  4 ++++
> > >  xen/arch/x86/hvm/hvm.c          |  7 ++++++-
> > >  xen/include/public/hvm/params.h |  5 ++++-
> > >  6 files changed, 35 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > > index 8899f75..562563d 100644
> > > --- a/docs/man/xl.cfg.pod.5
> > > +++ b/docs/man/xl.cfg.pod.5
> > > @@ -962,6 +962,23 @@ FIFO-based event channel ABI support up to 131,071 event channels.
> > >  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
> > >  x86).
> > >
> > > +=item B<max_ranges=N>
> > > +
> > 
> > This name is too generic. I don't have better suggestion though.
> > 
> 
> I think the increased limit for XenGT need only be applied to wp mem ranges so perhaps the parameter name could be 'max_wp_memory_ranges'?
> 

What does "WP" mean? "Write Protected"?

Is this parameter closely related to IOREQ server? Should it contain
"ioreq" somehow?

Wei.


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19 14:37       ` Wei Liu
@ 2016-01-19 14:47         ` Paul Durrant
  2016-01-19 15:04           ` Wei Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-19 14:47 UTC (permalink / raw)
  Cc: Kevin Tian, Keir (Xen.org),
	jbeulich, Andrew Cooper, xen-devel, Yu Zhang, zhiyuan.lv,
	Stefano Stabellini, Wei Liu

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 19 January 2016 14:37
> To: Paul Durrant
> Cc: Wei Liu; Yu Zhang; xen-devel@lists.xen.org; Kevin Tian; Keir (Xen.org);
> Stefano Stabellini; Andrew Cooper; zhiyuan.lv@intel.com;
> jbeulich@suse.com
> Subject: Re: [Xen-devel] [PATCH 3/3] tools: introduce parameter
> max_ranges.
> 
> On Tue, Jan 19, 2016 at 01:54:42PM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > > Sent: 19 January 2016 11:54
> > > To: Yu Zhang
> > > Cc: xen-devel@lists.xen.org; Kevin Tian; Keir (Xen.org); Stefano Stabellini;
> > > Andrew Cooper; Paul Durrant; zhiyuan.lv@intel.com; jbeulich@suse.com;
> > > Wei Liu
> > > Subject: Re: [Xen-devel] [PATCH 3/3] tools: introduce parameter
> > > max_ranges.
> > >
> > > On Tue, Jan 19, 2016 at 05:27:58PM +0800, Yu Zhang wrote:
> > > > A new parameter - max_ranges is added to set the upper limit of ranges
> > > > to be tracked inside one ioreq server rangeset.
> > > >
> > > > Ioreq server uses a group of rangesets to track the I/O or memory
> > > > resources to be emulated. The default value of this limit is set to
> > > > 256. Yet there are circumstances under which the limit should exceed
> > > > the default one. E.g. in XenGT, when tracking the per-process graphic
> > > > translation tables on intel broadwell platforms, the number of page
> > > > tables concerned will be several thousand (normally in this case, 8192
> > > > could be a big enough value). Users who set this item explicitly are
> > > > supposed to know the specific scenarios that necessitate this
> > > > configuration.
> > > >
> > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > > ---
> > > >  docs/man/xl.cfg.pod.5           | 17 +++++++++++++++++
> > > >  tools/libxl/libxl_dom.c         |  3 +++
> > > >  tools/libxl/libxl_types.idl     |  1 +
> > > >  tools/libxl/xl_cmdimpl.c        |  4 ++++
> > > >  xen/arch/x86/hvm/hvm.c          |  7 ++++++-
> > > >  xen/include/public/hvm/params.h |  5 ++++-
> > > >  6 files changed, 35 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > > > index 8899f75..562563d 100644
> > > > --- a/docs/man/xl.cfg.pod.5
> > > > +++ b/docs/man/xl.cfg.pod.5
> > > > @@ -962,6 +962,23 @@ FIFO-based event channel ABI support up to 131,071 event channels.
> > > >  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
> > > >  x86).
> > > >
> > > > +=item B<max_ranges=N>
> > > > +
> > >
> > > This name is too generic. I don't have better suggestion though.
> > >
> >
> > I think the increased limit for XenGT need only be applied to wp mem ranges so perhaps the parameter name could be 'max_wp_memory_ranges'?
> >
> 
> What does "WP" mean? "Write Protected"?
> 

Yes.

> Is this parameter closely related to IOREQ server? Should it contain
> "ioreq" somehow?
> 

It is closely related, but ioreq server is an implementation detail, so do
we want to expose it as a tunable? The concept we need to capture is that
the toolstack can tune the limit on the maximum number of pages in the VM
that can be set such that writes are emulated (but reads are as for normal
ram). Or I guess we could get very specific and call it something like
'max_gtt_shadows'?

> Wei.


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19 14:47         ` Paul Durrant
@ 2016-01-19 15:04           ` Wei Liu
  2016-01-19 15:18             ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Wei Liu @ 2016-01-19 15:04 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org),
	jbeulich, Andrew Cooper, xen-devel, Yu Zhang, zhiyuan.lv,
	Stefano Stabellini, Wei Liu

On Tue, Jan 19, 2016 at 02:47:40PM +0000, Paul Durrant wrote:
[...]
> > ranges so perhaps the parameter name could be
> > 'max_wp_memory_ranges'?
> > >
> > 
> > What does "WP" mean? "Write Protected"?
> > 
> 
> Yes.
> 
> > Is this parameter closely related to IOREQ server? Should it contain
> > "ioreq" somehow?
> > 
> 
> It is closely related but ioreq server is an implementation detail so
> do we want to expose it as a tunable? The concept we need to capture
> is that the toolstack can tune the limit of the maximum number of
> pages in the VM that can be set such that writes are emulated (but
> reads are as for normal ram). Or I guess we could get very specific
> and call it something like 'max_gtt_shadows'?

I would prefer a generic concept in this case ("wp"). Let's wait a bit for
other people to voice their opinion.

Whichever one we pick, the meaning of the acronym needs to be clearly
documented...

Wei.

> 
> > Wei.


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19 15:04           ` Wei Liu
@ 2016-01-19 15:18             ` Ian Campbell
  2016-01-20  3:14               ` Tian, Kevin
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-19 15:18 UTC (permalink / raw)
  To: Wei Liu, Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org),
	Andrew Cooper, xen-devel, Yu Zhang, zhiyuan.lv, jbeulich,
	Stefano Stabellini

On Tue, 2016-01-19 at 15:04 +0000, Wei Liu wrote:

This patch doesn't seem to have been CC'd to the tools maintainers; adding
Ian too, I think everyone else was picked up along the way.

Please use ./scripts/get_maintainers.pl in the future.

> On Tue, Jan 19, 2016 at 02:47:40PM +0000, Paul Durrant wrote:
> [...]
> > > ranges so perhaps the parameter name could be
> > > 'max_wp_memory_ranges'?
> > > > 
> > > 
> > > What does "WP" mean? "Write Protected"?
> > > 
> > 
> > Yes.
> > 
> > > Is this parameter closely related to IOREQ server? Should it contain
> > > "ioreq" somehow?
> > > 
> > 
> > It is closely related but ioreq server is an implementation detail so
> > do we want to expose it as a tunable? The concept we need to capture
> > is that the toolstack can tune the limit of the maximum number of
> > pages in the VM that can be set such that writes are emulated (but
> > reads are as for normal ram). Or I guess we could get very specific
> > and call it something like 'max_gtt_shadows'?
> 
> I would prefer generic concept in this case ("wp"). Let's wait a bit for
> other people to voice their opinion.
> 
> Whichever one we pick, the meaning of the acronym needs to be clearly
> documented...

I've got no ideas for a better name, "max_ranges" is clearly too generic
though.

One thought -- does XenGT require some other configuration option to enable
it or maybe a privilege which the target domain must necessarily have?
Could we use something like one of those to cause the t/stack to just DTRT
without the user having to micromanage the number of pages which are
allowed to have this property?

> 
> Wei.
> 
> > 
> > > Wei.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-19 15:18             ` Ian Campbell
@ 2016-01-20  3:14               ` Tian, Kevin
  2016-01-20  3:33                 ` Yu, Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Tian, Kevin @ 2016-01-20  3:14 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu, Paul Durrant
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Yu Zhang, Lv, Zhiyuan, jbeulich,
	Stefano Stabellini

> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: Tuesday, January 19, 2016 11:19 PM
> 
> On Tue, 2016-01-19 at 15:04 +0000, Wei Liu wrote:
> 
> This patch doesn't seem to have been CCd to the tools maintainers, adding
> Ian too, I think everyone else was picked up along the way.
> 
> Please use ./scripts/get_maintainers.pl in the future.
> 
> > On Tue, Jan 19, 2016 at 02:47:40PM +0000, Paul Durrant wrote:
> > [...]
> > > > ranges so perhaps the parameter name could be
> > > > 'max_wp_memory_ranges'?
> > > > >
> > > >
> > > > What does "WP" mean? "Write Protected"?
> > > >
> > >
> > > Yes.
> > >
> > > > Is this parameter closely related to IOREQ server? Should it contain
> > > > "ioreq" somehow?
> > > >
> > >
> > > It is closely related but ioreq server is an implementation detail so
> > > do we want to expose it as a tunable? The concept we need to capture
> > > is that the toolstack can tune the limit of the maximum number of
> > > pages in the VM that can be set such that writes are emulated (but
> > > reads are as for normal ram). Or I guess we could get very specific
> > > and call it something like 'max_gtt_shadows'?
> >
> > I would prefer generic concept in this case ("wp"). Let's wait a bit for
> > other people to voice their opinion.
> >
> > Whichever one we pick, the meaning of the acronym needs to be clearly
> > documented...
> 
> I've got no ideas for a better name, "max_ranges" is clearly too generic
> though.
> 
> One thought -- does XenGT require some other configuration option to enable
> it or maybe a privilege which the target domain must necessarily have?
> Could we use something like one of those to cause the t/stack to just DTRT
> without the user having to micromanage the amount of pages which are
> allowed to have this property?
> 

Using "wp" is clear to me.

As a feature, this write-protection is not GPU virtualization specific.
In the future the same mediated pass-through idea used in XenGT may be
used for other I/O devices which need to shadow some structure, with a
requirement to write-protect guest memory. So it's not good to tie this
to either XenGT or GTT.

Thanks
Kevin


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20  3:14               ` Tian, Kevin
@ 2016-01-20  3:33                 ` Yu, Zhang
  2016-01-20  3:58                   ` Tian, Kevin
  0 siblings, 1 reply; 19+ messages in thread
From: Yu, Zhang @ 2016-01-20  3:33 UTC (permalink / raw)
  To: Tian, Kevin, Ian Campbell, Wei Liu, Paul Durrant
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, Lv, Zhiyuan,
	jbeulich



On 1/20/2016 11:14 AM, Tian, Kevin wrote:
>> From: Ian Campbell [mailto:ian.campbell@citrix.com]
>> Sent: Tuesday, January 19, 2016 11:19 PM
>>
>> On Tue, 2016-01-19 at 15:04 +0000, Wei Liu wrote:
>>
>> This patch doesn't seem to have been CCd to the tools maintainers, adding
>> Ian too, I think everyone else was picked up along the way.
>>
>> Please use ./scripts/get_maintainers.pl in the future.
>>
>>> On Tue, Jan 19, 2016 at 02:47:40PM +0000, Paul Durrant wrote:
>>> [...]
>>>>> ranges so perhaps the parameter name could be
>>>>> 'max_wp_memory_ranges'?
>>>>>>
>>>>>
>>>>> What does "WP" mean? "Write Protected"?
>>>>>
>>>>
>>>> Yes.
>>>>
>>>>> Is this parameter closely related to IOREQ server? Should it contain
>>>>> "ioreq" somehow?
>>>>>
>>>>
>>>> It is closely related but ioreq server is an implementation detail so
>>>> do we want to expose it as a tunable? The concept we need to capture
>>>> is that the toolstack can tune the limit of the maximum number of
>>>> pages in the VM that can be set such that writes are emulated (but
>>>> reads are as for normal ram). Or I guess we could get very specific
>>>> and call it something like 'max_gtt_shadows'?
>>>
>>> I would prefer generic concept in this case ("wp"). Let's wait a bit for
>>> other people to voice their opinion.
>>>
>>> Whichever one we pick, the meaning of the acronym needs to be clearly
>>> documented...
>>
>> I've got no ideas for a better name, "max_ranges" is clearly too generic
>> though.
>>
>> One thought -- does XenGT require some other configuration option to enable
>> it or maybe a privilege which the target domain must necessarily have?
>> Could we use something like one of those to cause the t/stack to just DTRT
>> without the user having to micromanage the amount of pages which are
>> allowed to have this property?
>>
>
> Using "wp" is clear to me.
>
Thank you all. :)
So how about "max_wp_ram_ranges"? The "wp" shall be clearly explained
in the documentation.

> As a feature this write-protection has nothing to be GPU virtualization specific.
> In the future the same mediated pass-through idea used in XenGT may be
> used on other I/O devices which need to shadow some structure w/ requirement
> to write-protect guest memory. So it's not good to tie this to either XenGT
> or GTT.
>
Thank you, Kevin.
Well, if this parameter is not supposed to be xengt specific, we do not
need to connect it with any xengt flag such as "vgt=1" or "GVT-g=1".
Hence the user will have to configure max_wp_ram_ranges himself,
right?

B.R.
Yu

> Thanks
> Kevin
>


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20  3:33                 ` Yu, Zhang
@ 2016-01-20  3:58                   ` Tian, Kevin
  2016-01-20  5:02                     ` Yu, Zhang
  2016-01-20 10:16                     ` Ian Campbell
  0 siblings, 2 replies; 19+ messages in thread
From: Tian, Kevin @ 2016-01-20  3:58 UTC (permalink / raw)
  To: Yu, Zhang, Ian Campbell, Wei Liu, Paul Durrant
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, Lv, Zhiyuan,
	jbeulich

> From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Wednesday, January 20, 2016 11:33 AM
> > As a feature this write-protection has nothing to be GPU virtualization specific.
> > In the future the same mediated pass-through idea used in XenGT may be
> > used on other I/O devices which need to shadow some structure w/ requirement
> > to write-protect guest memory. So it's not good to tie this to either XenGT
> > or GTT.
> >
> Thank you, Kevin.
> Well, if this parameter is not supposed to be xengt specific, we do not
> need to connect it with any xengt flag such as ."vgt=1" or "GVT-g=1".
> Hence the user will have to configure the max_wp_ram_ranges himself,
> right?
> 

Not always. The option can be configured manually by the user, or 
automatically set in the code when "vgt=1" is recognized.

Thanks
Kevin


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20  3:58                   ` Tian, Kevin
@ 2016-01-20  5:02                     ` Yu, Zhang
  2016-01-20 10:17                       ` Wei Liu
  2016-01-20 10:16                     ` Ian Campbell
  1 sibling, 1 reply; 19+ messages in thread
From: Yu, Zhang @ 2016-01-20  5:02 UTC (permalink / raw)
  To: Tian, Kevin, Ian Campbell, Wei Liu, Paul Durrant
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, Lv, Zhiyuan,
	jbeulich



On 1/20/2016 11:58 AM, Tian, Kevin wrote:
>> From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: Wednesday, January 20, 2016 11:33 AM
>>> As a feature this write-protection has nothing to be GPU virtualization specific.
>>> In the future the same mediated pass-through idea used in XenGT may be
>>> used on other I/O devices which need to shadow some structure w/ requirement
>>> to write-protect guest memory. So it's not good to tie this to either XenGT
>>> or GTT.
>>>
>> Thank you, Kevin.
>> Well, if this parameter is not supposed to be xengt specific, we do not
>> need to connect it with any xengt flag such as ."vgt=1" or "GVT-g=1".
>> Hence the user will have to configure the max_wp_ram_ranges himself,
>> right?
>>
>
> Not always. The option can be configured manually by the user, or
> automatically set in the code when "vgt=1" is recognized.

OK. That sounds more reasonable. :)
To give a summary, I'll do the following changes in next version:

1> rename this new parameter to "max_wp_ram_ranges" and use it as the
wp-ram rangeset limit; for the I/O rangesets, keep MAX_NR_IO_RANGES
as their limit;
2> clean up the documentation;
3> define a LIBXL_HAVE_XXX in libxl.h to indicate the new field in the
build info;
4> do not introduce the xengt flag for now; code to set
"max_wp_ram_ranges" automatically can be added once that flag is
accepted in the future.

Does anyone have more suggestions? :)

B.R.
Yu
>
> Thanks
> Kevin
>


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20  3:58                   ` Tian, Kevin
  2016-01-20  5:02                     ` Yu, Zhang
@ 2016-01-20 10:16                     ` Ian Campbell
  2016-01-20 10:18                       ` Paul Durrant
  1 sibling, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-20 10:16 UTC (permalink / raw)
  To: Tian, Kevin, Yu, Zhang, Wei Liu, Paul Durrant
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, Lv, Zhiyuan,
	jbeulich

On Wed, 2016-01-20 at 03:58 +0000, Tian, Kevin wrote:
> > From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> > Sent: Wednesday, January 20, 2016 11:33 AM
> > > As a feature this write-protection has nothing to be GPU
> > > virtualization specific.
> > > In the future the same mediated pass-through idea used in XenGT may
> > > be
> > > used on other I/O devices which need to shadow some structure w/
> > > requirement
> > > to write-protect guest memory. So it's not good to tie this to either
> > > XenGT
> > > or GTT.
> > > 
> > Thank you, Kevin.
> > Well, if this parameter is not supposed to be xengt specific, we do not
> > need to connect it with any xengt flag such as ."vgt=1" or "GVT-g=1".
> > Hence the user will have to configure the max_wp_ram_ranges himself,
> > right?
> > 
> 
> Not always. The option can be configured manually by the user, or 
> automatically set in the code when "vgt=1" is recognized.

Is the latter approach not always sufficient? IOW, if it can be done
automatically, why would the user need to tweak it?

Ian.


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20  5:02                     ` Yu, Zhang
@ 2016-01-20 10:17                       ` Wei Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2016-01-20 10:17 UTC (permalink / raw)
  To: Yu, Zhang
  Cc: Tian, Kevin, Keir (Xen.org),
	Ian Campbell, Andrew Cooper, xen-devel, Paul Durrant,
	Stefano Stabellini, Lv, Zhiyuan, jbeulich, Wei Liu

On Wed, Jan 20, 2016 at 01:02:39PM +0800, Yu, Zhang wrote:
> 
> 
> On 1/20/2016 11:58 AM, Tian, Kevin wrote:
> >>From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> >>Sent: Wednesday, January 20, 2016 11:33 AM
> >>>As a feature this write-protection has nothing to be GPU virtualization specific.
> >>>In the future the same mediated pass-through idea used in XenGT may be
> >>>used on other I/O devices which need to shadow some structure w/ requirement
> >>>to write-protect guest memory. So it's not good to tie this to either XenGT
> >>>or GTT.
> >>>
> >>Thank you, Kevin.
> >>Well, if this parameter is not supposed to be xengt specific, we do not
> >>need to connect it with any xengt flag such as ."vgt=1" or "GVT-g=1".
> >>Hence the user will have to configure the max_wp_ram_ranges himself,
> >>right?
> >>
> >
> >Not always. The option can be configured manually by the user, or
> >automatically set in the code when "vgt=1" is recognized.
> 
> OK. That sounds more reasonable. :)
> To give a summary, I'll do the following changes in next version:
> 
> 1> rename this new parameter to "max_wp_ram_ranges", then use this
> parameter as the wp-ram rangeset limit, for the I/O rangeset, keep
> MAX_NR_IO_RANGES as its limit;
> 2> clear the documentation part;
> 3> define a LIBXL_HAVE_XXX in libxl.h to indicate a new field in the
> build info;
> 4> We do not introduce the xengt flag by now, and will add code to
> automatically set the "max_wp_ram_ranges" after this flag is accepted
> in the future.
> 
> Does anyone have more suggestions? :)
> 

Ian posted an enquiry earlier:

"Could we use something like one of those to cause the t/stack to just
DTRT without the user having to micromanage the amount of pages which
are allowed to have this property?"

Is that possible?

Wei.


> B.R.
> Yu
> >
> >Thanks
> >Kevin
> >


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20 10:16                     ` Ian Campbell
@ 2016-01-20 10:18                       ` Paul Durrant
  2016-01-20 11:13                         ` Yu, Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-20 10:18 UTC (permalink / raw)
  To: Ian Campbell, Kevin Tian, Yu, Zhang, Wei Liu
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, Lv, Zhiyuan,
	jbeulich

> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: 20 January 2016 10:16
> To: Kevin Tian; Yu, Zhang; Wei Liu; Paul Durrant
> Cc: Keir (Xen.org); jbeulich@suse.com; Andrew Cooper; xen-
> devel@lists.xen.org; Lv, Zhiyuan; Stefano Stabellini
> Subject: Re: [Xen-devel] [PATCH 3/3] tools: introduce parameter
> max_ranges.
> 
> On Wed, 2016-01-20 at 03:58 +0000, Tian, Kevin wrote:
> > > From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> > > Sent: Wednesday, January 20, 2016 11:33 AM
> > > > As a feature this write-protection has nothing to be GPU
> > > > virtualization specific.
> > > > In the future the same mediated pass-through idea used in XenGT may
> > > > be
> > > > used on other I/O devices which need to shadow some structure w/
> > > > requirement
> > > > to write-protect guest memory. So it's not good to tie this to either
> > > > XenGT
> > > > or GTT.
> > > >
> > > Thank you, Kevin.
> > > Well, if this parameter is not supposed to be xengt specific, we do not
> > > need to connect it with any xengt flag such as ."vgt=1" or "GVT-g=1".
> > > Hence the user will have to configure the max_wp_ram_ranges himself,
> > > right?
> > >
> >
> > Not always. The option can be configured manually by the user, or
> > automatically set in the code when "vgt=1" is recognized.
> 
> Is the latter approach not always sufficient? IOW, if it can be done
> automatically, why would the user need to tweak it?
>

I think the latter is sufficient for now. We always have the option of adding a specific wp_ram_ranges parameter in the future if there is a need.

  Paul
 
> Ian.


* Re: [PATCH 3/3] tools: introduce parameter max_ranges.
  2016-01-20 10:18                       ` Paul Durrant
@ 2016-01-20 11:13                         ` Yu, Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Yu, Zhang @ 2016-01-20 11:13 UTC (permalink / raw)
  To: Paul Durrant, Ian Campbell, Kevin Tian, Wei Liu
  Cc: Keir (Xen.org),
	Andrew Cooper, xen-devel, Stefano Stabellini, Lv, Zhiyuan,
	jbeulich



On 1/20/2016 6:18 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Ian Campbell [mailto:ian.campbell@citrix.com]
>> Sent: 20 January 2016 10:16
>> To: Kevin Tian; Yu, Zhang; Wei Liu; Paul Durrant
>> Cc: Keir (Xen.org); jbeulich@suse.com; Andrew Cooper; xen-
>> devel@lists.xen.org; Lv, Zhiyuan; Stefano Stabellini
>> Subject: Re: [Xen-devel] [PATCH 3/3] tools: introduce parameter
>> max_ranges.
>>
>> On Wed, 2016-01-20 at 03:58 +0000, Tian, Kevin wrote:
>>>> From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
>>>> Sent: Wednesday, January 20, 2016 11:33 AM
>>>>> As a feature this write-protection has nothing to be GPU
>>>>> virtualization specific.
>>>>> In the future the same mediated pass-through idea used in XenGT may
>>>>> be
>>>>> used on other I/O devices which need to shadow some structure w/
>>>>> requirement
>>>>> to write-protect guest memory. So it's not good to tie this to either
>>>>> XenGT
>>>>> or GTT.
>>>>>
>>>> Thank you, Kevin.
>>>> Well, if this parameter is not supposed to be xengt specific, we do not
>>>> need to connect it with any xengt flag such as ."vgt=1" or "GVT-g=1".
>>>> Hence the user will have to configure the max_wp_ram_ranges himself,
>>>> right?
>>>>
>>>
>>> Not always. The option can be configured manually by the user, or
>>> automatically set in the code when "vgt=1" is recognized.
>>
>> Is the latter approach not always sufficient? IOW, if it can be done
>> automatically, why would the user need to tweak it?
>>
>
> I think latter is sufficient for now. We always have the option of adding a specific wp_ram_ranges parameter in future if there is a need

Thank you all for your replies.
Well, I believe the latter option is sufficient for most usage
models on BDW, thanks to the rangeset's ability to merge contiguous
pages into one range. But there may be extreme cases, e.g. too many
graphics-related applications in one VM, each creating a great deal
of per-process graphic translation tables. Also, future CPU platforms
may provide even more PPGTTs. So I suggest we keep max_wp_ram_ranges
and give the control to the system administrator. Besides, as Kevin
said, XenGT's mediated pass-through idea can also be adopted for
other devices, and this parameter may help there as well.
We also plan to upstream the tool-stack changes later this year. If
max_wp_ram_ranges proves inconvenient, we can introduce a method to
set its default value automatically.

B.R.
Yu
>
>    Paul
>
>> Ian.


end of thread, other threads:[~2016-01-20 11:13 UTC | newest]

Thread overview: 19+ messages
2016-01-19  9:27 [PATCH v10 0/3] Refactor ioreq server for better performance Yu Zhang
2016-01-19  9:27 ` [PATCH v10 1/3] Refactor rangeset structure " Yu Zhang
2016-01-19  9:27 ` [PATCH v10 2/3] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
2016-01-19  9:47   ` Paul Durrant
2016-01-19  9:27 ` [PATCH 3/3] tools: introduce parameter max_ranges Yu Zhang
2016-01-19 11:53   ` Wei Liu
2016-01-19 13:54     ` Paul Durrant
2016-01-19 14:37       ` Wei Liu
2016-01-19 14:47         ` Paul Durrant
2016-01-19 15:04           ` Wei Liu
2016-01-19 15:18             ` Ian Campbell
2016-01-20  3:14               ` Tian, Kevin
2016-01-20  3:33                 ` Yu, Zhang
2016-01-20  3:58                   ` Tian, Kevin
2016-01-20  5:02                     ` Yu, Zhang
2016-01-20 10:17                       ` Wei Liu
2016-01-20 10:16                     ` Ian Campbell
2016-01-20 10:18                       ` Paul Durrant
2016-01-20 11:13                         ` Yu, Zhang
