xen-devel.lists.xenproject.org archive mirror
* [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
@ 2016-05-19  9:05 Yu Zhang
  2016-05-19  9:05 ` [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 68+ messages in thread
From: Yu Zhang @ 2016-05-19  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul.Durrant, zhiyuan.lv

XenGT leverages the ioreq server to track and forward accesses to GPU
I/O resources, e.g. the PPGTT (per-process graphics translation tables).
Currently, an ioreq server uses a rangeset to track the BDF/PIO/MMIO
ranges to be emulated. To select an ioreq server, the rangeset is
searched to see whether the I/O range is recorded there. However, the
number of RAM pages to be tracked may exceed the upper limit of the
rangeset.

Previously, a solution was proposed to refactor the rangeset and
extend its upper limit. However, after 12 rounds of discussion, we
decided to drop that approach due to security concerns. This new
patch series instead introduces a new memory type, HVMMEM_ioreq_server,
and adds HVM operations that let an ioreq server claim ownership of
RAM pages with this type. Accesses to a page of this type will be
handled directly by the specified ioreq server.

Yu Zhang (3):
  x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server.
  x86/ioreq server: Add new functions to get/set memory types.
  x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to
    an ioreq server.

 xen/arch/x86/hvm/emulate.c       |  32 +++-
 xen/arch/x86/hvm/hvm.c           | 346 +++++++++++++++++++++++++--------------
 xen/arch/x86/hvm/ioreq.c         |  41 +++++
 xen/arch/x86/mm/hap/nested_hap.c |   2 +-
 xen/arch/x86/mm/p2m-ept.c        |   7 +-
 xen/arch/x86/mm/p2m-pt.c         |  23 ++-
 xen/arch/x86/mm/p2m.c            |  70 ++++++++
 xen/arch/x86/mm/shadow/multi.c   |   3 +-
 xen/include/asm-x86/hvm/ioreq.h  |   2 +
 xen/include/asm-x86/p2m.h        |  32 +++-
 xen/include/public/hvm/hvm_op.h  |  38 ++++-
 11 files changed, 452 insertions(+), 144 deletions(-)

-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server.
  2016-05-19  9:05 [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
@ 2016-05-19  9:05 ` Yu Zhang
  2016-06-14 10:04   ` Jan Beulich
  2016-05-19  9:05 ` [PATCH v4 2/3] x86/ioreq server: Add new functions to get/set memory types Yu Zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-05-19  9:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jan Beulich, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul Durrant, zhiyuan.lv, Jun Nakajima

The p2m type p2m_mmio_write_dm was previously introduced for write-
protected memory pages whose write operations are supposed to be
forwarded to and emulated by an ioreq server. Yet limitations of the
rangeset restrict the number of guest pages that can be write-protected.

This patch renames the p2m type p2m_mmio_write_dm to p2m_ioreq_server,
reflecting that this p2m type can be claimed by one ioreq server
instead of being tracked inside the ioreq server's rangeset. A new
memory type, HVMMEM_ioreq_server, is now used in the
HVMOP_set/get_mem_type interface to set/get this p2m type.

Follow-up patches will add the related HVMOP handling code to
map/unmap type p2m_ioreq_server to/from an ioreq server.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Tim Deegan <tim@xen.org>

changes in v4: 
  - According to George's comments, move the HVMMEM_unused part
    into a separate patch (which has already been accepted);
  - Removed George's Reviewed-by because of changes after v3.
  - According to Wei Liu's comments, change the format of the commit
    message.

changes in v3: 
  - According to Jan & George's comments, keep HVMMEM_mmio_write_dm
    for old xen interface versions, and replace it with HVMMEM_unused
    for xen interfaces newer than 4.7.0; For p2m_ioreq_server, a new 
    enum - HVMMEM_ioreq_server is introduced for the get/set mem type
    interfaces;
  - Add George's Reviewed-by and Acked-by from Tim & Andrew.

changes in v2: 
  - According to George Dunlap's comments, only rename the p2m type,
    with no behavior changes.
---
 xen/arch/x86/hvm/hvm.c          | 9 ++++++---
 xen/arch/x86/mm/p2m-ept.c       | 2 +-
 xen/arch/x86/mm/p2m-pt.c        | 2 +-
 xen/arch/x86/mm/shadow/multi.c  | 2 +-
 xen/include/asm-x86/p2m.h       | 4 ++--
 xen/include/public/hvm/hvm_op.h | 5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5040a5c..21bc45c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1857,7 +1857,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
      */
     if ( (p2mt == p2m_mmio_dm) || 
          (npfec.write_access &&
-          (p2m_is_discard_write(p2mt) || (p2mt == p2m_mmio_write_dm))) )
+          (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
     {
         __put_gfn(p2m, gfn);
         if ( ap2m_active )
@@ -5507,6 +5507,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             get_gfn_query_unlocked(d, a.pfn, &t);
             if ( p2m_is_mmio(t) )
                 a.mem_type =  HVMMEM_mmio_dm;
+            else if ( t == p2m_ioreq_server )
+                a.mem_type = HVMMEM_ioreq_server;
             else if ( p2m_is_readonly(t) )
                 a.mem_type =  HVMMEM_ram_ro;
             else if ( p2m_is_ram(t) )
@@ -5537,7 +5539,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             [HVMMEM_ram_rw]  = p2m_ram_rw,
             [HVMMEM_ram_ro]  = p2m_ram_ro,
             [HVMMEM_mmio_dm] = p2m_mmio_dm,
-            [HVMMEM_unused] = p2m_invalid
+            [HVMMEM_unused] = p2m_invalid,
+            [HVMMEM_ioreq_server] = p2m_ioreq_server
         };
 
         if ( copy_from_guest(&a, arg, 1) )
@@ -5586,7 +5589,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             }
             if ( !p2m_is_ram(t) &&
                  (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
-                 (t != p2m_mmio_write_dm || a.hvmmem_type != HVMMEM_ram_rw) )
+                 (t != p2m_ioreq_server || a.hvmmem_type != HVMMEM_ram_rw) )
             {
                 put_gfn(d, pfn);
                 goto setmemtype_fail;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 1ed5b47..a45a573 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -171,7 +171,7 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
         case p2m_grant_map_ro:
-        case p2m_mmio_write_dm:
+        case p2m_ioreq_server:
             entry->r = 1;
             entry->w = entry->x = 0;
             entry->a = !!cpu_has_vmx_ept_ad;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 3d80612..eabd2e3 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -94,7 +94,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
     default:
         return flags | _PAGE_NX_BIT;
     case p2m_grant_map_ro:
-    case p2m_mmio_write_dm:
+    case p2m_ioreq_server:
         return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
     case p2m_ram_ro:
     case p2m_ram_logdirty:
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 428be37..b322293 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3226,7 +3226,7 @@ static int sh_page_fault(struct vcpu *v,
 
     /* Need to hand off device-model MMIO to the device model */
     if ( p2mt == p2m_mmio_dm
-         || (p2mt == p2m_mmio_write_dm && ft == ft_demand_write) )
+         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
     {
         gpa = guest_walk_to_gpa(&gw);
         goto mmio;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 65675a2..f3e87d6 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -71,7 +71,7 @@ typedef enum {
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
     p2m_map_foreign  = 14,        /* ram pages from foreign domain */
-    p2m_mmio_write_dm = 15,       /* Read-only; writes go to the device model */
+    p2m_ioreq_server = 15,
 } p2m_type_t;
 
 /* Modifiers to the query */
@@ -112,7 +112,7 @@ typedef unsigned int p2m_query_t;
                       | p2m_to_mask(p2m_ram_ro)         \
                       | p2m_to_mask(p2m_grant_map_ro)   \
                       | p2m_to_mask(p2m_ram_shared)     \
-                      | p2m_to_mask(p2m_mmio_write_dm))
+                      | p2m_to_mask(p2m_ioreq_server))
 
 /* Write-discard types, which should discard the write operations */
 #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index ebb907a..b3e45cf 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -84,11 +84,12 @@ typedef enum {
     HVMMEM_ram_ro,             /* Read-only; writes are discarded */
     HVMMEM_mmio_dm,            /* Reads and write go to the device model */
 #if __XEN_INTERFACE_VERSION__ < 0x00040700
-    HVMMEM_mmio_write_dm       /* Read-only; writes go to the device model */
+    HVMMEM_mmio_write_dm,      /* Read-only; writes go to the device model */
 #else
-    HVMMEM_unused              /* Placeholder; setting memory to this type
+    HVMMEM_unused,             /* Placeholder; setting memory to this type
                                   will fail for code after 4.7.0 */
 #endif
+    HVMMEM_ioreq_server
 } hvmmem_type_t;
 
 /* Following tools-only interfaces may change in future. */
-- 
1.9.1




* [PATCH v4 2/3] x86/ioreq server: Add new functions to get/set memory types.
  2016-05-19  9:05 [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2016-05-19  9:05 ` [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
@ 2016-05-19  9:05 ` Yu Zhang
  2016-05-19  9:05 ` [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
  2016-05-27  7:52 ` [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Zhang, Yu C
  3 siblings, 0 replies; 68+ messages in thread
From: Yu Zhang @ 2016-05-19  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

For clarity this patch breaks the code to set/get memory types out
of do_hvm_op() into dedicated functions: hvmop_set/get_mem_type().
Also, for clarity, checks for whether a memory type change is allowed
are broken out into a separate function called by hvmop_set_mem_type().

There is no intentional functional change in this patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

changes in v4: 
  - According to Wei Liu's comments, change the format of the commit
    message.
  
changes in v3: 
  - Add Andrew's Acked-by and George's Reviewed-by.

changes in v2: 
  - According to George Dunlap's comments, follow the "set rc /
    do something / goto out" pattern in hvmop_get_mem_type().
---
 xen/arch/x86/hvm/hvm.c | 288 +++++++++++++++++++++++++++----------------------
 1 file changed, 161 insertions(+), 127 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21bc45c..346da97 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5287,6 +5287,61 @@ static int do_altp2m_op(
     return rc;
 }
 
+static int hvmop_get_mem_type(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
+{
+    struct xen_hvm_get_mem_type a;
+    struct domain *d;
+    p2m_type_t t;
+    int rc;
+
+    if ( copy_from_guest(&a, arg, 1) )
+        return -EFAULT;
+
+    d = rcu_lock_domain_by_any_id(a.domid);
+    if ( d == NULL )
+        return -ESRCH;
+
+    rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_get_mem_type);
+    if ( rc )
+        goto out;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    /*
+     * Use get_gfn query as we are interested in the current
+     * type, not in allocating or unsharing. That'll happen
+     * on access.
+     */
+    get_gfn_query_unlocked(d, a.pfn, &t);
+    if ( p2m_is_mmio(t) )
+        a.mem_type =  HVMMEM_mmio_dm;
+    else if ( t == p2m_ioreq_server )
+        a.mem_type = HVMMEM_ioreq_server;
+    else if ( p2m_is_readonly(t) )
+        a.mem_type =  HVMMEM_ram_ro;
+    else if ( p2m_is_ram(t) )
+        a.mem_type =  HVMMEM_ram_rw;
+    else if ( p2m_is_pod(t) )
+        a.mem_type =  HVMMEM_ram_rw;
+    else if ( p2m_is_grant(t) )
+        a.mem_type =  HVMMEM_ram_rw;
+    else
+        a.mem_type =  HVMMEM_mmio_dm;
+
+    rc = -EFAULT;
+    if ( __copy_to_guest(arg, &a, 1) )
+        goto out;
+    rc = 0;
+
+ out:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
 /*
  * Note that this value is effectively part of the ABI, even if we don't need
  * to make it a formal part of it: A guest suspended for migration in the
@@ -5295,6 +5350,107 @@ static int do_altp2m_op(
  */
 #define HVMOP_op_mask 0xff
 
+static bool_t hvm_allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
+{
+    if ( p2m_is_ram(old) ||
+         (p2m_is_hole(old) && new == p2m_mmio_dm) ||
+         (old == p2m_ioreq_server && new == p2m_ram_rw) )
+        return 1;
+
+    return 0;
+}
+
+static int hvmop_set_mem_type(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_set_mem_type_t) arg,
+    unsigned long *iter)
+{
+    unsigned long start_iter = *iter;
+    struct xen_hvm_set_mem_type a;
+    struct domain *d;
+    int rc;
+
+    /* Interface types to internal p2m types */
+    static const p2m_type_t memtype[] = {
+        [HVMMEM_ram_rw]  = p2m_ram_rw,
+        [HVMMEM_ram_ro]  = p2m_ram_ro,
+        [HVMMEM_mmio_dm] = p2m_mmio_dm,
+        [HVMMEM_unused] = p2m_invalid,
+        [HVMMEM_ioreq_server] = p2m_ioreq_server
+    };
+
+    if ( copy_from_guest(&a, arg, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(a.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_control(XSM_DM_PRIV, d, HVMOP_set_mem_type);
+    if ( rc )
+        goto out;
+
+    rc = -EINVAL;
+    if ( a.nr < start_iter ||
+         ((a.first_pfn + a.nr - 1) < a.first_pfn) ||
+         ((a.first_pfn + a.nr - 1) > domain_get_maximum_gpfn(d)) )
+        goto out;
+
+    if ( a.hvmmem_type >= ARRAY_SIZE(memtype) ||
+         unlikely(a.hvmmem_type == HVMMEM_unused) )
+        goto out;
+
+    while ( a.nr > start_iter )
+    {
+        unsigned long pfn = a.first_pfn + start_iter;
+        p2m_type_t t;
+
+        get_gfn_unshare(d, pfn, &t);
+        if ( p2m_is_paging(t) )
+        {
+            put_gfn(d, pfn);
+            p2m_mem_paging_populate(d, pfn);
+            rc = -EAGAIN;
+            goto out;
+        }
+        if ( p2m_is_shared(t) )
+        {
+            put_gfn(d, pfn);
+            rc = -EAGAIN;
+            goto out;
+        }
+        if ( !hvm_allow_p2m_type_change(t, memtype[a.hvmmem_type]) )
+        {
+            put_gfn(d, pfn);
+            goto out;
+        }
+
+        rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]);
+        put_gfn(d, pfn);
+
+        if ( rc )
+            goto out;
+
+        /* Check for continuation if it's not the last iteration */
+        if ( a.nr > ++start_iter && !(start_iter & HVMOP_op_mask) &&
+             hypercall_preempt_check() )
+        {
+            rc = -ERESTART;
+            goto out;
+        }
+    }
+    rc = 0;
+
+ out:
+    rcu_unlock_domain(d);
+    *iter = start_iter;
+
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     unsigned long start_iter, mask;
@@ -5484,137 +5640,15 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 
     case HVMOP_get_mem_type:
-    {
-        struct xen_hvm_get_mem_type a;
-        struct domain *d;
-        p2m_type_t t;
-
-        if ( copy_from_guest(&a, arg, 1) )
-            return -EFAULT;
-
-        d = rcu_lock_domain_by_any_id(a.domid);
-        if ( d == NULL )
-            return -ESRCH;
-
-        rc = xsm_hvm_param(XSM_TARGET, d, op);
-        if ( unlikely(rc) )
-            /* nothing */;
-        else if ( likely(is_hvm_domain(d)) )
-        {
-            /* Use get_gfn query as we are interested in the current 
-             * type, not in allocating or unsharing. That'll happen 
-             * on access. */
-            get_gfn_query_unlocked(d, a.pfn, &t);
-            if ( p2m_is_mmio(t) )
-                a.mem_type =  HVMMEM_mmio_dm;
-            else if ( t == p2m_ioreq_server )
-                a.mem_type = HVMMEM_ioreq_server;
-            else if ( p2m_is_readonly(t) )
-                a.mem_type =  HVMMEM_ram_ro;
-            else if ( p2m_is_ram(t) )
-                a.mem_type =  HVMMEM_ram_rw;
-            else if ( p2m_is_pod(t) )
-                a.mem_type =  HVMMEM_ram_rw;
-            else if ( p2m_is_grant(t) )
-                a.mem_type =  HVMMEM_ram_rw;
-            else
-                a.mem_type =  HVMMEM_mmio_dm;
-            if ( __copy_to_guest(arg, &a, 1) )
-                rc = -EFAULT;
-        }
-        else
-            rc = -EINVAL;
-
-        rcu_unlock_domain(d);
+        rc = hvmop_get_mem_type(
+            guest_handle_cast(arg, xen_hvm_get_mem_type_t));
         break;
-    }
 
     case HVMOP_set_mem_type:
-    {
-        struct xen_hvm_set_mem_type a;
-        struct domain *d;
-        
-        /* Interface types to internal p2m types */
-        static const p2m_type_t memtype[] = {
-            [HVMMEM_ram_rw]  = p2m_ram_rw,
-            [HVMMEM_ram_ro]  = p2m_ram_ro,
-            [HVMMEM_mmio_dm] = p2m_mmio_dm,
-            [HVMMEM_unused] = p2m_invalid,
-            [HVMMEM_ioreq_server] = p2m_ioreq_server
-        };
-
-        if ( copy_from_guest(&a, arg, 1) )
-            return -EFAULT;
-
-        rc = rcu_lock_remote_domain_by_id(a.domid, &d);
-        if ( rc != 0 )
-            return rc;
-
-        rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            goto setmemtype_fail;
-
-        rc = xsm_hvm_control(XSM_DM_PRIV, d, op);
-        if ( rc )
-            goto setmemtype_fail;
-
-        rc = -EINVAL;
-        if ( a.nr < start_iter ||
-             ((a.first_pfn + a.nr - 1) < a.first_pfn) ||
-             ((a.first_pfn + a.nr - 1) > domain_get_maximum_gpfn(d)) )
-            goto setmemtype_fail;
-            
-        if ( a.hvmmem_type >= ARRAY_SIZE(memtype) ||
-             unlikely(a.hvmmem_type == HVMMEM_unused) )
-            goto setmemtype_fail;
-
-        while ( a.nr > start_iter )
-        {
-            unsigned long pfn = a.first_pfn + start_iter;
-            p2m_type_t t;
-
-            get_gfn_unshare(d, pfn, &t);
-            if ( p2m_is_paging(t) )
-            {
-                put_gfn(d, pfn);
-                p2m_mem_paging_populate(d, pfn);
-                rc = -EAGAIN;
-                goto setmemtype_fail;
-            }
-            if ( p2m_is_shared(t) )
-            {
-                put_gfn(d, pfn);
-                rc = -EAGAIN;
-                goto setmemtype_fail;
-            }
-            if ( !p2m_is_ram(t) &&
-                 (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
-                 (t != p2m_ioreq_server || a.hvmmem_type != HVMMEM_ram_rw) )
-            {
-                put_gfn(d, pfn);
-                goto setmemtype_fail;
-            }
-
-            rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]);
-            put_gfn(d, pfn);
-            if ( rc )
-                goto setmemtype_fail;
-
-            /* Check for continuation if it's not the last interation */
-            if ( a.nr > ++start_iter && !(start_iter & HVMOP_op_mask) &&
-                 hypercall_preempt_check() )
-            {
-                rc = -ERESTART;
-                goto setmemtype_fail;
-            }
-        }
-
-        rc = 0;
-
-    setmemtype_fail:
-        rcu_unlock_domain(d);
+        rc = hvmop_set_mem_type(
+            guest_handle_cast(arg, xen_hvm_set_mem_type_t),
+            &start_iter);
         break;
-    }
 
     case HVMOP_pagetable_dying:
     {
-- 
1.9.1




* [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-05-19  9:05 [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2016-05-19  9:05 ` [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
  2016-05-19  9:05 ` [PATCH v4 2/3] x86/ioreq server: Add new functions to get/set memory types Yu Zhang
@ 2016-05-19  9:05 ` Yu Zhang
  2016-06-14 10:45   ` Jan Beulich
  2016-06-14 13:14   ` George Dunlap
  2016-05-27  7:52 ` [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Zhang, Yu C
  3 siblings, 2 replies; 68+ messages in thread
From: Yu Zhang @ 2016-05-19  9:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul Durrant, zhiyuan.lv, Jan Beulich

A new HVMOP, HVMOP_map_mem_type_to_ioreq_server, is added to let an
ioreq server claim/disclaim its responsibility for the handling of
guest pages with p2m type p2m_ioreq_server. Users of this HVMOP can
specify which kind of operation is to be emulated in a parameter
named flags. Currently, this HVMOP only supports the emulation of
write operations, but it can easily be extended to support the
emulation of reads, should an ioreq server have such a requirement
in the future.

For now, only one ioreq server is supported for this p2m type, so
once an ioreq server has claimed its ownership, subsequent calls to
HVMOP_map_mem_type_to_ioreq_server will fail. Users can also disclaim
ownership of guest RAM pages with p2m_ioreq_server by triggering this
new HVMOP with the ioreq server id set to the current owner's and the
flags parameter set to 0.

Note that both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
are only supported for HVM guests with HAP enabled.

Also note that p2m type changes to p2m_ioreq_server are only allowed
after an ioreq server has claimed its ownership of p2m_ioreq_server.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Tim Deegan <tim@xen.org>

changes in v4:
  - According to Paul's advice, add comments around the definition
    of HVMMEM_ioreq_server in hvm_op.h.
  - According to Wei Liu's comments, change the format of the commit
    message.

changes in v3:
  - Only support write emulation in this patch;
  - Remove the code to handle race condition in hvmemul_do_io(),
  - No need to reset the p2m type after an ioreq server has disclaimed
    its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server after an ioreq
    server has claimed its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server from pages with type
    p2m_ram_rw, and vice versa;
  - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
    instead of enum to specify the memory type;
  - Function prototype change to p2m_get_ioreq_server();
  - Coding style changes;
  - Commit message changes;
  - Add Tim's Acked-by.

changes in v2: 
  - Only support HAP enabled HVMs;
  - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
    to reset the p2m type, when an ioreq server tries to claim/disclaim
    its ownership of p2m_ioreq_server;
  - Comments changes.
---
 xen/arch/x86/hvm/emulate.c       | 32 ++++++++++++++++--
 xen/arch/x86/hvm/hvm.c           | 63 ++++++++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/ioreq.c         | 41 +++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c |  2 +-
 xen/arch/x86/mm/p2m-ept.c        |  7 +++-
 xen/arch/x86/mm/p2m-pt.c         | 23 +++++++++----
 xen/arch/x86/mm/p2m.c            | 70 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c   |  3 +-
 xen/include/asm-x86/hvm/ioreq.h  |  2 ++
 xen/include/asm-x86/p2m.h        | 30 +++++++++++++++--
 xen/include/public/hvm/hvm_op.h  | 35 +++++++++++++++++++-
 11 files changed, 289 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index b9cac8e..4571294 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -100,6 +100,7 @@ static int hvmemul_do_io(
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
@@ -141,7 +142,7 @@ static int hvmemul_do_io(
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
-            domain_crash(curr->domain);
+            domain_crash(currd);
 
         if ( data_is_addr )
             return X86EMUL_UNHANDLEABLE;
@@ -178,8 +179,33 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
+        struct hvm_ioreq_server *s;
+        p2m_type_t p2mt;
+
+        if ( is_mmio )
+        {
+            unsigned long gmfn = paddr_to_pfn(addr);
+
+            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
+
+            if ( p2mt == p2m_ioreq_server )
+            {
+                unsigned long flags;
+
+                s = p2m_get_ioreq_server(currd, &flags);
+
+                if ( dir == IOREQ_WRITE &&
+                     !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS) )
+                    s = NULL;
+            }
+            else
+                s = hvm_select_ioreq_server(currd, &p);
+        }
+        else
+        {
+            p2mt = p2m_invalid;
+            s = hvm_select_ioreq_server(currd, &p);
+        }
 
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 346da97..23abeb2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4719,6 +4719,40 @@ static int hvmop_unmap_io_range_from_ioreq_server(
     return rc;
 }
 
+static int hvmop_map_mem_type_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_mem_type_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    /* Only support for HAP enabled hvm */
+    if ( !hap_enabled(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d,
+                              HVMOP_map_mem_type_to_ioreq_server);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 static int hvmop_set_ioreq_server_state(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t) uop)
 {
@@ -5352,9 +5386,14 @@ static int hvmop_get_mem_type(
 
 static bool_t hvm_allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
 {
+    if ( new == p2m_ioreq_server )
+        return old == p2m_ram_rw;
+
+    if ( old == p2m_ioreq_server )
+        return new == p2m_ram_rw;
+
     if ( p2m_is_ram(old) ||
-         (p2m_is_hole(old) && new == p2m_mmio_dm) ||
-         (old == p2m_ioreq_server && new == p2m_ram_rw) )
+         (p2m_is_hole(old) && new == p2m_mmio_dm) )
         return 1;
 
     return 0;
@@ -5389,6 +5428,21 @@ static int hvmop_set_mem_type(
     if ( !is_hvm_domain(d) )
         goto out;
 
+    if ( a.hvmmem_type == HVMMEM_ioreq_server )
+    {
+        unsigned long flags;
+        struct hvm_ioreq_server *s;
+
+        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
+        if ( !hap_enabled(d) )
+            goto out;
+
+        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */
+        s = p2m_get_ioreq_server(d, &flags);
+        if ( s == NULL )
+            goto out;
+    }
+
     rc = xsm_hvm_control(XSM_DM_PRIV, d, HVMOP_set_mem_type);
     if ( rc )
         goto out;
@@ -5490,6 +5544,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_io_range_t));
         break;
 
+    case HVMOP_map_mem_type_to_ioreq_server:
+        rc = hvmop_map_mem_type_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_mem_type_to_ioreq_server_t));
+        break;
+
     case HVMOP_set_ioreq_server_state:
         rc = hvmop_set_ioreq_server_state(
             guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 333ce14..d24e108 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
         domain_pause(d);
 
+        p2m_destroy_ioreq_server(d, s);
+
         hvm_ioreq_server_disable(s, 0);
 
         list_del(&s->list_entry);
@@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint16_t type, uint32_t flags)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    /* For now, only HVMMEM_ioreq_server is supported. */
+    if ( type != HVMMEM_ioreq_server )
+        return -EINVAL;
+
+    /* For now, only write emulation is supported. */
+    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            rc = p2m_set_ioreq_server(d, flags, s);
+            if ( rc == 0 )
+                dprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
+                         s->id, (flags != 0) ? "mapped to" : "unmapped from");
+
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+    return rc;
+}
+
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled)
 {
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index d41bb09..aa90a62 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -174,7 +174,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
     if ( *p2mt == p2m_mmio_direct )
         goto direct_mmio_out;
     rc = NESTEDHVM_PAGEFAULT_MMIO;
-    if ( *p2mt == p2m_mmio_dm )
+    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
         goto out;
 
     rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index a45a573..c5d1305 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->r = entry->w = entry->x = 1;
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
+        case p2m_ioreq_server:
+            entry->r = entry->x = 1;
+            entry->w = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS);
+            entry->a = !!cpu_has_vmx_ept_ad;
+            entry->d = entry->w && cpu_has_vmx_ept_ad;
+            break;
         case p2m_mmio_direct:
             entry->r = entry->x = 1;
             entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
@@ -171,7 +177,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
         case p2m_grant_map_ro:
-        case p2m_ioreq_server:
             entry->r = 1;
             entry->w = entry->x = 0;
             entry->a = !!cpu_has_vmx_ept_ad;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index eabd2e3..bf75afa 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -72,7 +72,9 @@ static const unsigned long pgt[] = {
     PGT_l3_page_table
 };
 
-static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
+static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
+                                       p2m_type_t t,
+                                       mfn_t mfn,
                                        unsigned int level)
 {
     unsigned long flags;
@@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
     default:
         return flags | _PAGE_NX_BIT;
     case p2m_grant_map_ro:
-    case p2m_ioreq_server:
         return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
+    case p2m_ioreq_server:
+    {
+        flags |= P2M_BASE_FLAGS | _PAGE_RW;
+
+        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
+            return flags & ~_PAGE_RW;
+        else
+            return flags;
+    }
     case p2m_ram_ro:
     case p2m_ram_logdirty:
     case p2m_ram_shared:
@@ -442,7 +452,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
                               ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
+            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+                                                    _mfn(mfn), level);
 
             if ( level )
             {
@@ -579,7 +590,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
             ? l3e_from_pfn(mfn_x(mfn),
-                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
+                           p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
 
@@ -615,7 +626,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
-                                             p2m_type_to_flags(p2mt, mfn, 0));
+                                         p2m_type_to_flags(p2m, p2mt, mfn, 0));
         else
             entry_content = l1e_empty();
 
@@ -651,7 +662,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt, mfn, 1) |
+                                       p2m_type_to_flags(p2m, p2mt, mfn, 1) |
                                        _PAGE_PSE);
         else
             l2e_content = l2e_empty();
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 9b19769..59afa2c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -83,6 +83,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     else
         p2m_pt_init(p2m);
 
+    spin_lock_init(&p2m->ioreq.lock);
+
     return ret;
 }
 
@@ -289,6 +291,74 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }
 
+int p2m_set_ioreq_server(struct domain *d,
+                         unsigned long flags,
+                         struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+
+    spin_lock(&p2m->ioreq.lock);
+
+    if ( flags == 0 )
+    {
+        rc = -EINVAL;
+        if ( p2m->ioreq.server != s )
+            goto out;
+
+        /* Unmap ioreq server from p2m type by passing flags with 0. */
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+    else
+    {
+        rc = -EBUSY;
+        if ( p2m->ioreq.server != NULL )
+            goto out;
+
+        p2m->ioreq.server = s;
+        p2m->ioreq.flags = flags;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock(&p2m->ioreq.lock);
+
+    return rc;
+}
+
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned long *flags)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&p2m->ioreq.lock);
+
+    s = p2m->ioreq.server;
+    *flags = p2m->ioreq.flags;
+
+    spin_unlock(&p2m->ioreq.lock);
+    return s;
+}
+
+void p2m_destroy_ioreq_server(struct domain *d,
+                              struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    spin_lock(&p2m->ioreq.lock);
+
+    if ( p2m->ioreq.server == s )
+    {
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+
+    spin_unlock(&p2m->ioreq.lock);
+}
+
 void p2m_enable_hardware_log_dirty(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index b322293..ae845d2 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3225,8 +3225,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     /* Need to hand off device-model MMIO to the device model */
-    if ( p2mt == p2m_mmio_dm
-         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
+    if ( p2mt == p2m_mmio_dm )
     {
         gpa = guest_walk_to_gpa(&gw);
         goto mmio;
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index fbf2c74..340ae3e 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
                                          uint32_t type, uint64_t start,
                                          uint64_t end);
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint16_t type, uint32_t flags);
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index f3e87d6..3aa0dd7 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
                        | p2m_to_mask(p2m_ram_paging_out)      \
                        | p2m_to_mask(p2m_ram_paged)           \
                        | p2m_to_mask(p2m_ram_paging_in)       \
-                       | p2m_to_mask(p2m_ram_shared))
+                       | p2m_to_mask(p2m_ram_shared)          \
+                       | p2m_to_mask(p2m_ioreq_server))
 
 /* Types that represent a physmap hole that is ok to replace with a shared
  * entry */
@@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
 #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
                       | p2m_to_mask(p2m_ram_ro)         \
                       | p2m_to_mask(p2m_grant_map_ro)   \
-                      | p2m_to_mask(p2m_ram_shared)     \
-                      | p2m_to_mask(p2m_ioreq_server))
+                      | p2m_to_mask(p2m_ram_shared))
 
 /* Write-discard types, which should discard the write operations */
 #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
@@ -336,6 +336,24 @@ struct p2m_domain {
         struct ept_data ept;
         /* NPT-equivalent structure could be added here. */
     };
+
+    struct {
+        spinlock_t lock;
+        /*
+         * The ioreq server responsible for the emulation of
+         * gfns with a specific p2m type (for now, p2m_ioreq_server).
+         */
+        struct hvm_ioreq_server *server;
+        /*
+         * flags specifies whether read, write or both operations
+         * are to be emulated by an ioreq server.
+         */
+        unsigned int flags;
+
+#define P2M_IOREQ_HANDLE_WRITE_ACCESS HVMOP_IOREQ_MEM_ACCESS_WRITE
+#define P2M_IOREQ_HANDLE_READ_ACCESS  HVMOP_IOREQ_MEM_ACCESS_READ
+
+    } ioreq;
 };
 
 /* get host p2m table */
@@ -843,6 +861,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
     return flags;
 }
 
+int p2m_set_ioreq_server(struct domain *d, unsigned long flags,
+                         struct hvm_ioreq_server *s);
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned long *flags);
+void p2m_destroy_ioreq_server(struct domain *d, struct hvm_ioreq_server *s);
+
 #endif /* _XEN_P2M_H */
 
 /*
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index b3e45cf..22c15a7 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -89,7 +89,9 @@ typedef enum {
     HVMMEM_unused,             /* Placeholder; setting memory to this type
                                   will fail for code after 4.7.0 */
 #endif
-    HVMMEM_ioreq_server
+    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
+                                  changes to this value are only allowed after
+                                  an ioreq server has claimed its ownership */
 } hvmmem_type_t;
 
 /* Following tools-only interfaces may change in future. */
@@ -383,6 +385,37 @@ struct xen_hvm_set_ioreq_server_state {
 typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
 
+/*
+ * HVMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
+ *                                      to specific memory type <type>
+ *                                      for specific accesses <flags>
+ *
+ * For now, flags only accepts the value of HVMOP_IOREQ_MEM_ACCESS_WRITE,
+ * which means only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such a requirement in the future.
+ */
+#define HVMOP_map_mem_type_to_ioreq_server 26
+struct xen_hvm_map_mem_type_to_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - ioreq server id */
+    uint16_t type;      /* IN - memory type */
+    uint16_t pad;
+    uint32_t flags;     /* IN - types of accesses to be forwarded to the
+                           ioreq server. flags with 0 means to unmap the
+                           ioreq server */
+#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
+#define HVMOP_IOREQ_MEM_ACCESS_READ \
+    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
+
+#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
+#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
+    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
+};
+typedef struct xen_hvm_map_mem_type_to_ioreq_server
+    xen_hvm_map_mem_type_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_mem_type_to_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #if defined(__i386__) || defined(__x86_64__)
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
  2016-05-19  9:05 [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (2 preceding siblings ...)
  2016-05-19  9:05 ` [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2016-05-27  7:52 ` Zhang, Yu C
  2016-05-27 10:00   ` Jan Beulich
  3 siblings, 1 reply; 68+ messages in thread
From: Zhang, Yu C @ 2016-05-27  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul.Durrant, zhiyuan.lv, Jan Beulich, George Dunlap

Hi Maintainers,

On 5/19/2016 5:05 PM, Yu Zhang wrote:
> XenGT leverages ioreq server to track and forward the accesses to GPU
> I/O resources, e.g. the PPGTT(per-process graphic translation tables).
> Currently, ioreq server uses rangeset to track the BDF/ PIO/MMIO ranges
> to be emulated. To select an ioreq server, the rangeset is searched to
> see if the I/O range is recorded. However, the number of ram pages to
> be tracked may exceed the upper limit of the rangeset.
>
> Previously, one solution was proposed to refactor the rangeset, and
> extend its upper limit. However, after 12 rounds discussion, we have
> decided to drop this approach due to security concerns. Now this new
> patch series introduces a new mem type, HVMMEM_ioreq_server, and adds
> hvm operations to let one ioreq server claim ownership of ram
> pages with this type. Accesses to a page of this type will be handled
> by the specified ioreq server directly.
>
> Yu Zhang (3):
>    x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server.
>    x86/ioreq server: Add new functions to get/set memory types.
>    x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to
>      an ioreq server.
>
>   xen/arch/x86/hvm/emulate.c       |  32 +++-
>   xen/arch/x86/hvm/hvm.c           | 346 +++++++++++++++++++++++++--------------
>   xen/arch/x86/hvm/ioreq.c         |  41 +++++
>   xen/arch/x86/mm/hap/nested_hap.c |   2 +-
>   xen/arch/x86/mm/p2m-ept.c        |   7 +-
>   xen/arch/x86/mm/p2m-pt.c         |  23 ++-
>   xen/arch/x86/mm/p2m.c            |  70 ++++++++
>   xen/arch/x86/mm/shadow/multi.c   |   3 +-
>   xen/include/asm-x86/hvm/ioreq.h  |   2 +
>   xen/include/asm-x86/p2m.h        |  32 +++-
>   xen/include/public/hvm/hvm_op.h  |  38 ++++-
>   11 files changed, 452 insertions(+), 144 deletions(-)
>
Any comment on this version? Sorry for the disturbance.

B.R.
Yu


* Re: [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
  2016-05-27 10:00   ` Jan Beulich
@ 2016-05-27  9:51     ` Zhang, Yu C
  2016-05-27 10:02     ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: Zhang, Yu C @ 2016-05-27  9:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Paul.Durrant, xen-devel, George Dunlap, zhiyuan.lv



On 5/27/2016 6:00 PM, Jan Beulich wrote:
>>>> On 27.05.16 at 09:52, <yu.c.zhang@linux.intel.com> wrote:
>> Any comment on this version? Sorry for the disturbance.
> It's on my list of things to look at, but since it can't go in right now anyway,
> it's not a top priority.

Got it. Thanks for your feedback. :)

Yu



* Re: [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
  2016-05-27  7:52 ` [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Zhang, Yu C
@ 2016-05-27 10:00   ` Jan Beulich
  2016-05-27  9:51     ` Zhang, Yu C
  2016-05-27 10:02     ` George Dunlap
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-05-27 10:00 UTC (permalink / raw)
  To: Yu C Zhang
  Cc: Andrew Cooper, Paul.Durrant, xen-devel, George Dunlap, zhiyuan.lv

>>> On 27.05.16 at 09:52, <yu.c.zhang@linux.intel.com> wrote:
> Any comment on this version? Sorry for the disturbance.

It's on my list of things to look at, but since it can't go in right now anyway,
it's not a top priority.

Jan



* Re: [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
  2016-05-27 10:00   ` Jan Beulich
  2016-05-27  9:51     ` Zhang, Yu C
@ 2016-05-27 10:02     ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-05-27 10:02 UTC (permalink / raw)
  To: Jan Beulich, Yu C Zhang
  Cc: Andrew Cooper, Paul.Durrant, zhiyuan.lv, xen-devel

On 27/05/16 11:00, Jan Beulich wrote:
>>>> On 27.05.16 at 09:52, <yu.c.zhang@linux.intel.com> wrote:
>> Any comment on this version? Sorry for the disturbance.
> 
> It's on my things to look at, but since it can't go in right now anyway,
> it's not a top priority.

Same -- I have a queue of "Things To Do Before the Release", and another
queue of "Things To Do After Those Things Are Done".  This series is
near the top of the second list.

 -George


* Re: [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server.
  2016-05-19  9:05 ` [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
@ 2016-06-14 10:04   ` Jan Beulich
  2016-06-14 13:14     ` George Dunlap
  2016-06-15 10:51     ` Yu Zhang
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-14 10:04 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
> @@ -5507,6 +5507,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              get_gfn_query_unlocked(d, a.pfn, &t);
>              if ( p2m_is_mmio(t) )
>                  a.mem_type =  HVMMEM_mmio_dm;
> +            else if ( t == p2m_ioreq_server )
> +                a.mem_type = HVMMEM_ioreq_server;
>              else if ( p2m_is_readonly(t) )
>                  a.mem_type =  HVMMEM_ram_ro;
>              else if ( p2m_is_ram(t) )

I can see this being suitable to be done here, but ...

> @@ -5537,7 +5539,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              [HVMMEM_ram_rw]  = p2m_ram_rw,
>              [HVMMEM_ram_ro]  = p2m_ram_ro,
>              [HVMMEM_mmio_dm] = p2m_mmio_dm,
> -            [HVMMEM_unused] = p2m_invalid
> +            [HVMMEM_unused] = p2m_invalid,
> +            [HVMMEM_ioreq_server] = p2m_ioreq_server
>          };
>  
>          if ( copy_from_guest(&a, arg, 1) )

... how can this be correct without actual handling having got added?
IOW doesn't at least this change belong into a later patch?

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-05-19  9:05 ` [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2016-06-14 10:45   ` Jan Beulich
  2016-06-14 13:13     ` George Dunlap
  2016-06-15 10:52     ` Yu Zhang
  2016-06-14 13:14   ` George Dunlap
  1 sibling, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-14 10:45 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
> let one ioreq server claim/disclaim its responsibility for the
> handling of guest pages with p2m type p2m_ioreq_server. Users
> of this HVMOP can specify which kind of operation is supposed
> to be emulated in a parameter named flags. Currently, this HVMOP
> only support the emulation of write operations. And it can be
> easily extended to support the emulation of read ones if an
> ioreq server has such requirement in the future.

Didn't we determine that this isn't as easy as everyone first thought?

> @@ -178,8 +179,33 @@ static int hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> -        struct hvm_ioreq_server *s =
> -            hvm_select_ioreq_server(curr->domain, &p);
> +        struct hvm_ioreq_server *s;
> +        p2m_type_t p2mt;
> +
> +        if ( is_mmio )
> +        {
> +            unsigned long gmfn = paddr_to_pfn(addr);
> +
> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
> +
> +            if ( p2mt == p2m_ioreq_server )
> +            {
> +                unsigned long flags;
> +
> +                s = p2m_get_ioreq_server(currd, &flags);
> +
> +                if ( dir == IOREQ_WRITE &&
> +                     !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS) )

Shouldn't this be 

                if ( dir != IOREQ_WRITE ||
                     !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS) )
                    s = NULL;

in which case the question is whether you wouldn't better avoid
calling p2m_get_ioreq_server() in the first place when
dir != IOREQ_WRITE.

> +                    s = NULL;
> +            }
> +            else
> +                s = hvm_select_ioreq_server(currd, &p);
> +        }
> +        else
> +        {
> +            p2mt = p2m_invalid;

What is this needed for? In fact it looks like the variable declaration
could move into the next inner scope (alongside gmfn, which is
questionable to be a local variable anyway, considering that it gets
used just once).

> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint16_t type, uint32_t flags)

I see no reason why both can't be unsigned int.

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>              entry->r = entry->w = entry->x = 1;
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
> +        case p2m_ioreq_server:
> +            entry->r = entry->x = 1;

Why x?

> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>      default:
>          return flags | _PAGE_NX_BIT;
>      case p2m_grant_map_ro:
> -    case p2m_ioreq_server:
>          return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> +    case p2m_ioreq_server:
> +    {
> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
> +
> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
> +            return flags & ~_PAGE_RW;
> +        else
> +            return flags;
> +    }

Same here (for the missing _PAGE_NX) plus no need for braces.

> @@ -289,6 +291,74 @@ void p2m_memory_type_changed(struct domain *d)
>      }
>  }
>  
> +int p2m_set_ioreq_server(struct domain *d,
> +                         unsigned long flags,

Why "long" and not just "int"?

> +                         struct hvm_ioreq_server *s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int rc;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    if ( flags == 0 )
> +    {
> +        rc = -EINVAL;
> +        if ( p2m->ioreq.server != s )
> +            goto out;
> +
> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }

What does "passing" refer to in the comment?

> +    else
> +    {
> +        rc = -EBUSY;
> +        if ( p2m->ioreq.server != NULL )
> +            goto out;
> +
> +        p2m->ioreq.server = s;
> +        p2m->ioreq.flags = flags;
> +    }
> +
> +    rc = 0;
> +
> + out:
> +    spin_unlock(&p2m->ioreq.lock);
> +
> +    return rc;
> +}
> +
> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> +                                              unsigned long *flags)

Again  why "long" and not just "int"?

> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    struct hvm_ioreq_server *s;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    s = p2m->ioreq.server;
> +    *flags = p2m->ioreq.flags;
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +    return s;
> +}

Locking is somewhat strange here: You protect against the "set"
counterpart altering state while you retrieve it, but you don't
protect against the returned data becoming stale by the time
the caller can consume it. Is that not a problem? (The most
concerning case would seem to be a race of hvmop_set_mem_type()
with de-registration of the type.)

> +void p2m_destroy_ioreq_server(struct domain *d,
> +                              struct hvm_ioreq_server *s)

const

> @@ -336,6 +336,24 @@ struct p2m_domain {
>          struct ept_data ept;
>          /* NPT-equivalent structure could be added here. */
>      };
> +
> +    struct {
> +        spinlock_t lock;
> +        /*
> +         * ioreq server who's responsible for the emulation of
> +         * gfns with specific p2m type(for now, p2m_ioreq_server).
> +         */
> +        struct hvm_ioreq_server *server;
> +        /*
> +         * flags specifies whether read, write or both operations
> +         * are to be emulated by an ioreq server.
> +         */
> +        unsigned int flags;
> +
> +#define P2M_IOREQ_HANDLE_WRITE_ACCESS HVMOP_IOREQ_MEM_ACCESS_WRITE
> +#define P2M_IOREQ_HANDLE_READ_ACCESS  HVMOP_IOREQ_MEM_ACCESS_READ

Is there anything wrong with using the HVMOP_* values directly?

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -89,7 +89,9 @@ typedef enum {
>      HVMMEM_unused,             /* Placeholder; setting memory to this type
>                                    will fail for code after 4.7.0 */
>  #endif
> -    HVMMEM_ioreq_server
> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
> +                                  changes to this value are only allowed after
> +                                  an ioreq server has claimed its ownership */

Missing trailing full stop.

> @@ -383,6 +385,37 @@ struct xen_hvm_set_ioreq_server_state {
>  typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>  
> +/*
> + * HVMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
> + *                                      to specific memory type <type>
> + *                                      for specific accesses <flags>
> + *
> + * For now, flags only accepts the value of HVMOP_IOREQ_MEM_ACCESS_WRITE,
> + * which means only write operations are to be forwarded to an ioreq server.
> + * Support for the emulation of read operations can be added when an ioreq
> + * server has such a requirement in the future.
> + */
> +#define HVMOP_map_mem_type_to_ioreq_server 26
> +struct xen_hvm_map_mem_type_to_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - ioreq server id */
> +    uint16_t type;      /* IN - memory type */
> +    uint16_t pad;

This field does not appear to get checked in the handler.

> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
> +                           ioreq server. flags with 0 means to unmap the
> +                           ioreq server */
> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
> +
> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)

Is there any use for these _HVMOP_* values? The more that they
violate standard C name space rules?

Jan


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-14 10:45   ` Jan Beulich
@ 2016-06-14 13:13     ` George Dunlap
  2016-06-14 13:31       ` Jan Beulich
  2016-06-15 10:52     ` Yu Zhang
  1 sibling, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-14 13:13 UTC (permalink / raw)
  To: Jan Beulich, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

On 14/06/16 11:45, Jan Beulich wrote:
>> +                         struct hvm_ioreq_server *s)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int rc;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    if ( flags == 0 )
>> +    {
>> +        rc = -EINVAL;
>> +        if ( p2m->ioreq.server != s )
>> +            goto out;
>> +
>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>> +        p2m->ioreq.server = NULL;
>> +        p2m->ioreq.flags = 0;
>> +    }
> 
> What does "passing" refer to in the comment?

You make the map_memtype_... hypercall with "flags" set to 0.  I'm not
sure what's unclear about the sentence; how would you put it differently?

> Locking is somewhat strange here: You protect against the "set"
> counterpart altering state while you retrieve it, but you don't
> protect against the returned data becoming stale by the time
> the caller can consume it. Is that not a problem? (The most
> concerning case would seem to be a race of hvmop_set_mem_type()
> with de-registration of the type.)

How is that different than calling set_mem_type() first, and then
de-registering without first unmapping all the types?

>> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
>> +                           ioreq server. flags with 0 means to unmap the
>> +                           ioreq server */
>> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
>> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
>> +
>> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
>> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
> 
> Is there any use for these _HVMOP_* values? The more that they
> violate standard C name space rules?

I assume he's just going along with what he sees in params.h.
"Violating standard C name space rules" by having #defines which start
with a single _ seems to be a well-established policy for Xen. :-)

 -George


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-05-19  9:05 ` [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
  2016-06-14 10:45   ` Jan Beulich
@ 2016-06-14 13:14   ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-06-14 13:14 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul.Durrant, zhiyuan.lv, Jan Beulich

On 19/05/16 10:05, Yu Zhang wrote:
> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
> let one ioreq server claim/disclaim its responsibility for the
> handling of guest pages with p2m type p2m_ioreq_server. Users
> of this HVMOP can specify which kind of operation is supposed
> to be emulated in a parameter named flags. Currently, this HVMOP
> only supports the emulation of write operations, and it can be
> easily extended to support the emulation of read ones if an
> ioreq server has such a requirement in the future.
> 
> For now, we only support one ioreq server for this p2m type, so
> once an ioreq server has claimed its ownership, subsequent calls
> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
> triggering this new HVMOP, with ioreq server id set to the current
> owner's and flags parameter set to 0.
> 
> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
> are only supported for HVMs with HAP enabled.
> 
> Also note that only after one ioreq server claims its ownership
> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
> be allowed.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Acked-by: Tim Deegan <tim@xen.org>

Looks OK to me:

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Tim Deegan <tim@xen.org>
> 
> changes in v4:
>   - According to Paul's advice, add comments around the definition
>     of HVMMEM_ioreq_server in hvm_op.h.
>   - According to Wei Liu's comments, change the format of the commit
>     message.
> 
> changes in v3:
>   - Only support write emulation in this patch;
>   - Remove the code to handle race condition in hvmemul_do_io(),
>   - No need to reset the p2m type after an ioreq server has disclaimed
>     its ownership of p2m_ioreq_server;
>   - Only allow p2m type change to p2m_ioreq_server after an ioreq
>     server has claimed its ownership of p2m_ioreq_server;
>   - Only allow p2m type change to p2m_ioreq_server from pages with type
>     p2m_ram_rw, and vice versa;
>   - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
>     instead of enum to specify the memory type;
>   - Function prototype change to p2m_get_ioreq_server();
>   - Coding style changes;
>   - Commit message changes;
>   - Add Tim's Acked-by.
> 
> changes in v2: 
>   - Only support HAP enabled HVMs;
>   - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
>     to reset the p2m type, when an ioreq server tries to claim/disclaim
>     its ownership of p2m_ioreq_server;
>   - Comments changes.
> ---
>  xen/arch/x86/hvm/emulate.c       | 32 ++++++++++++++++--
>  xen/arch/x86/hvm/hvm.c           | 63 ++++++++++++++++++++++++++++++++++--
>  xen/arch/x86/hvm/ioreq.c         | 41 +++++++++++++++++++++++
>  xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>  xen/arch/x86/mm/p2m-ept.c        |  7 +++-
>  xen/arch/x86/mm/p2m-pt.c         | 23 +++++++++----
>  xen/arch/x86/mm/p2m.c            | 70 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/shadow/multi.c   |  3 +-
>  xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>  xen/include/asm-x86/p2m.h        | 30 +++++++++++++++--
>  xen/include/public/hvm/hvm_op.h  | 35 +++++++++++++++++++-
>  11 files changed, 289 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index b9cac8e..4571294 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -100,6 +100,7 @@ static int hvmemul_do_io(
>      uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
>  {
>      struct vcpu *curr = current;
> +    struct domain *currd = curr->domain;
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      ioreq_t p = {
>          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
> @@ -141,7 +142,7 @@ static int hvmemul_do_io(
>               (p.dir != dir) ||
>               (p.df != df) ||
>               (p.data_is_ptr != data_is_addr) )
> -            domain_crash(curr->domain);
> +            domain_crash(currd);
>  
>          if ( data_is_addr )
>              return X86EMUL_UNHANDLEABLE;
> @@ -178,8 +179,33 @@ static int hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> -        struct hvm_ioreq_server *s =
> -            hvm_select_ioreq_server(curr->domain, &p);
> +        struct hvm_ioreq_server *s;
> +        p2m_type_t p2mt;
> +
> +        if ( is_mmio )
> +        {
> +            unsigned long gmfn = paddr_to_pfn(addr);
> +
> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
> +
> +            if ( p2mt == p2m_ioreq_server )
> +            {
> +                unsigned long flags;
> +
> +                s = p2m_get_ioreq_server(currd, &flags);
> +
> +                if ( dir == IOREQ_WRITE &&
> +                     !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS) )
> +                    s = NULL;
> +            }
> +            else
> +                s = hvm_select_ioreq_server(currd, &p);
> +        }
> +        else
> +        {
> +            p2mt = p2m_invalid;
> +            s = hvm_select_ioreq_server(currd, &p);
> +        }
>  
>          /* If there is no suitable backing DM, just ignore accesses */
>          if ( !s )
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 346da97..23abeb2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4719,6 +4719,40 @@ static int hvmop_unmap_io_range_from_ioreq_server(
>      return rc;
>  }
>  
> +static int hvmop_map_mem_type_to_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop)
> +{
> +    xen_hvm_map_mem_type_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    /* Only support for HAP enabled hvm */
> +    if ( !hap_enabled(d) )
> +        goto out;
> +
> +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d,
> +                              HVMOP_map_mem_type_to_ioreq_server);
> +    if ( rc != 0 )
> +        goto out;
> +
> +    rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);
> +
> + out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
>  static int hvmop_set_ioreq_server_state(
>      XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t) uop)
>  {
> @@ -5352,9 +5386,14 @@ static int hvmop_get_mem_type(
>  
>  static bool_t hvm_allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
>  {
> +    if ( new == p2m_ioreq_server )
> +        return old == p2m_ram_rw;
> +
> +    if ( old == p2m_ioreq_server )
> +        return new == p2m_ram_rw;
> +
>      if ( p2m_is_ram(old) ||
> -         (p2m_is_hole(old) && new == p2m_mmio_dm) ||
> -         (old == p2m_ioreq_server && new == p2m_ram_rw) )
> +         (p2m_is_hole(old) && new == p2m_mmio_dm) )
>          return 1;
>  
>      return 0;
> @@ -5389,6 +5428,21 @@ static int hvmop_set_mem_type(
>      if ( !is_hvm_domain(d) )
>          goto out;
>  
> +    if ( a.hvmmem_type == HVMMEM_ioreq_server )
> +    {
> +        unsigned long flags;
> +        struct hvm_ioreq_server *s;
> +
> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
> +        if ( !hap_enabled(d) )
> +            goto out;
> +
> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */
> +        s = p2m_get_ioreq_server(d, &flags);
> +        if ( s == NULL )
> +            goto out;
> +    }
> +
>      rc = xsm_hvm_control(XSM_DM_PRIV, d, HVMOP_set_mem_type);
>      if ( rc )
>          goto out;
> @@ -5490,6 +5544,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              guest_handle_cast(arg, xen_hvm_io_range_t));
>          break;
>  
> +    case HVMOP_map_mem_type_to_ioreq_server:
> +        rc = hvmop_map_mem_type_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_mem_type_to_ioreq_server_t));
> +        break;
> +
>      case HVMOP_set_ioreq_server_state:
>          rc = hvmop_set_ioreq_server_state(
>              guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 333ce14..d24e108 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>  
>          domain_pause(d);
>  
> +        p2m_destroy_ioreq_server(d, s);
> +
>          hvm_ioreq_server_disable(s, 0);
>  
>          list_del(&s->list_entry);
> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint16_t type, uint32_t flags)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    /* For now, only HVMMEM_ioreq_server is supported. */
> +    if ( type != HVMMEM_ioreq_server )
> +        return -EINVAL;
> +
> +    /* For now, only write emulation is supported. */
> +    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_WRITE) )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        if ( s->id == id )
> +        {
> +            rc = p2m_set_ioreq_server(d, flags, s);
> +            if ( rc == 0 )
> +                dprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
> +                         s->id, (flags != 0) ? "mapped to" : "unmapped from");
> +
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
> +    return rc;
> +}
> +
>  int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>                                 bool_t enabled)
>  {
> diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
> index d41bb09..aa90a62 100644
> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -174,7 +174,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>      if ( *p2mt == p2m_mmio_direct )
>          goto direct_mmio_out;
>      rc = NESTEDHVM_PAGEFAULT_MMIO;
> -    if ( *p2mt == p2m_mmio_dm )
> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>          goto out;
>  
>      rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index a45a573..c5d1305 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>              entry->r = entry->w = entry->x = 1;
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
> +        case p2m_ioreq_server:
> +            entry->r = entry->x = 1;
> +            entry->w = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS);
> +            entry->a = !!cpu_has_vmx_ept_ad;
> +            entry->d = entry->w && cpu_has_vmx_ept_ad;
> +            break;
>          case p2m_mmio_direct:
>              entry->r = entry->x = 1;
>              entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
> @@ -171,7 +177,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
>          case p2m_grant_map_ro:
> -        case p2m_ioreq_server:
>              entry->r = 1;
>              entry->w = entry->x = 0;
>              entry->a = !!cpu_has_vmx_ept_ad;
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index eabd2e3..bf75afa 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -72,7 +72,9 @@ static const unsigned long pgt[] = {
>      PGT_l3_page_table
>  };
>  
> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
> +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
> +                                       p2m_type_t t,
> +                                       mfn_t mfn,
>                                         unsigned int level)
>  {
>      unsigned long flags;
> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>      default:
>          return flags | _PAGE_NX_BIT;
>      case p2m_grant_map_ro:
> -    case p2m_ioreq_server:
>          return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> +    case p2m_ioreq_server:
> +    {
> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
> +
> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
> +            return flags & ~_PAGE_RW;
> +        else
> +            return flags;
> +    }
>      case p2m_ram_ro:
>      case p2m_ram_logdirty:
>      case p2m_ram_shared:
> @@ -442,7 +452,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>              p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
>                                ? p2m_ram_logdirty : p2m_ram_rw;
>              unsigned long mfn = l1e_get_pfn(e);
> -            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
> +            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
> +                                                    _mfn(mfn), level);
>  
>              if ( level )
>              {
> @@ -579,7 +590,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>          l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
>              ? l3e_from_pfn(mfn_x(mfn),
> -                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
> +                           p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE)
>              : l3e_empty();
>          entry_content.l1 = l3e_content.l3;
>  
> @@ -615,7 +626,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>  
>          if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
>              entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
> -                                             p2m_type_to_flags(p2mt, mfn, 0));
> +                                         p2m_type_to_flags(p2m, p2mt, mfn, 0));
>          else
>              entry_content = l1e_empty();
>  
> @@ -651,7 +662,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>          if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
>              l2e_content = l2e_from_pfn(mfn_x(mfn),
> -                                       p2m_type_to_flags(p2mt, mfn, 1) |
> +                                       p2m_type_to_flags(p2m, p2mt, mfn, 1) |
>                                         _PAGE_PSE);
>          else
>              l2e_content = l2e_empty();
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 9b19769..59afa2c 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -83,6 +83,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
>      else
>          p2m_pt_init(p2m);
>  
> +    spin_lock_init(&p2m->ioreq.lock);
> +
>      return ret;
>  }
>  
> @@ -289,6 +291,74 @@ void p2m_memory_type_changed(struct domain *d)
>      }
>  }
>  
> +int p2m_set_ioreq_server(struct domain *d,
> +                         unsigned long flags,
> +                         struct hvm_ioreq_server *s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int rc;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    if ( flags == 0 )
> +    {
> +        rc = -EINVAL;
> +        if ( p2m->ioreq.server != s )
> +            goto out;
> +
> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +    else
> +    {
> +        rc = -EBUSY;
> +        if ( p2m->ioreq.server != NULL )
> +            goto out;
> +
> +        p2m->ioreq.server = s;
> +        p2m->ioreq.flags = flags;
> +    }
> +
> +    rc = 0;
> +
> + out:
> +    spin_unlock(&p2m->ioreq.lock);
> +
> +    return rc;
> +}
> +
> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> +                                              unsigned long *flags)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    struct hvm_ioreq_server *s;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    s = p2m->ioreq.server;
> +    *flags = p2m->ioreq.flags;
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +    return s;
> +}
> +
> +void p2m_destroy_ioreq_server(struct domain *d,
> +                              struct hvm_ioreq_server *s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    if ( p2m->ioreq.server == s )
> +    {
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +}
> +
>  void p2m_enable_hardware_log_dirty(struct domain *d)
>  {
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index b322293..ae845d2 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3225,8 +3225,7 @@ static int sh_page_fault(struct vcpu *v,
>      }
>  
>      /* Need to hand off device-model MMIO to the device model */
> -    if ( p2mt == p2m_mmio_dm
> -         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
> +    if ( p2mt == p2m_mmio_dm )
>      {
>          gpa = guest_walk_to_gpa(&gw);
>          goto mmio;
> diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
> index fbf2c74..340ae3e 100644
> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
>  int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>                                           uint32_t type, uint64_t start,
>                                           uint64_t end);
> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint16_t type, uint32_t flags);
>  int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>                                 bool_t enabled);
>  
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index f3e87d6..3aa0dd7 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
>                         | p2m_to_mask(p2m_ram_paging_out)      \
>                         | p2m_to_mask(p2m_ram_paged)           \
>                         | p2m_to_mask(p2m_ram_paging_in)       \
> -                       | p2m_to_mask(p2m_ram_shared))
> +                       | p2m_to_mask(p2m_ram_shared)          \
> +                       | p2m_to_mask(p2m_ioreq_server))
>  
>  /* Types that represent a physmap hole that is ok to replace with a shared
>   * entry */
> @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
>  #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
>                        | p2m_to_mask(p2m_ram_ro)         \
>                        | p2m_to_mask(p2m_grant_map_ro)   \
> -                      | p2m_to_mask(p2m_ram_shared)     \
> -                      | p2m_to_mask(p2m_ioreq_server))
> +                      | p2m_to_mask(p2m_ram_shared))
>  
>  /* Write-discard types, which should discard the write operations */
>  #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
> @@ -336,6 +336,24 @@ struct p2m_domain {
>          struct ept_data ept;
>          /* NPT-equivalent structure could be added here. */
>      };
> +
> +    struct {
> +        spinlock_t lock;
> +        /*
> +         * The ioreq server that is responsible for the emulation of
> +         * gfns with a specific p2m type (for now, p2m_ioreq_server).
> +         */
> +        struct hvm_ioreq_server *server;
> +        /*
> +         * flags specifies whether read, write or both operations
> +         * are to be emulated by an ioreq server.
> +         */
> +        unsigned int flags;
> +
> +#define P2M_IOREQ_HANDLE_WRITE_ACCESS HVMOP_IOREQ_MEM_ACCESS_WRITE
> +#define P2M_IOREQ_HANDLE_READ_ACCESS  HVMOP_IOREQ_MEM_ACCESS_READ
> +
> +    } ioreq;
>  };
>  
>  /* get host p2m table */
> @@ -843,6 +861,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
>      return flags;
>  }
>  
> +int p2m_set_ioreq_server(struct domain *d, unsigned long flags,
> +                         struct hvm_ioreq_server *s);
> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> +                                              unsigned long *flags);
> +void p2m_destroy_ioreq_server(struct domain *d, struct hvm_ioreq_server *s);
> +
>  #endif /* _XEN_P2M_H */
>  
>  /*
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index b3e45cf..22c15a7 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -89,7 +89,9 @@ typedef enum {
>      HVMMEM_unused,             /* Placeholder; setting memory to this type
>                                    will fail for code after 4.7.0 */
>  #endif
> -    HVMMEM_ioreq_server
> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
> +                                  changes to this value are only allowed after
> +                                  an ioreq server has claimed its ownership */
>  } hvmmem_type_t;
>  
>  /* Following tools-only interfaces may change in future. */
> @@ -383,6 +385,37 @@ struct xen_hvm_set_ioreq_server_state {
>  typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>  
> +/*
> + * HVMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
> + *                                      to specific memory type <type>
> + *                                      for specific accesses <flags>
> + *
> + * For now, flags only accept the value of HVMOP_IOREQ_MEM_ACCESS_WRITE,
> + * which means only write operations are to be forwarded to an ioreq server.
> + * Support for the emulation of read operations can be added when an ioreq
> + * server has such a requirement in the future.
> + */
> +#define HVMOP_map_mem_type_to_ioreq_server 26
> +struct xen_hvm_map_mem_type_to_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - ioreq server id */
> +    uint16_t type;      /* IN - memory type */
> +    uint16_t pad;
> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
> +                           ioreq server. flags with 0 means to unmap the
> +                           ioreq server */
> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
> +
> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
> +};
> +typedef struct xen_hvm_map_mem_type_to_ioreq_server
> +    xen_hvm_map_mem_type_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_mem_type_to_ioreq_server_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  
>  #if defined(__i386__) || defined(__x86_64__)
> 



* Re: [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server.
  2016-06-14 10:04   ` Jan Beulich
@ 2016-06-14 13:14     ` George Dunlap
  2016-06-15 10:51     ` Yu Zhang
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-06-14 13:14 UTC (permalink / raw)
  To: Jan Beulich, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

On 14/06/16 11:04, Jan Beulich wrote:
>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>> @@ -5507,6 +5507,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>              get_gfn_query_unlocked(d, a.pfn, &t);
>>              if ( p2m_is_mmio(t) )
>>                  a.mem_type =  HVMMEM_mmio_dm;
>> +            else if ( t == p2m_ioreq_server )
>> +                a.mem_type = HVMMEM_ioreq_server;
>>              else if ( p2m_is_readonly(t) )
>>                  a.mem_type =  HVMMEM_ram_ro;
>>              else if ( p2m_is_ram(t) )
> 
> I can see this being suitable to be done here, but ...
> 
>> @@ -5537,7 +5539,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>              [HVMMEM_ram_rw]  = p2m_ram_rw,
>>              [HVMMEM_ram_ro]  = p2m_ram_ro,
>>              [HVMMEM_mmio_dm] = p2m_mmio_dm,
>> -            [HVMMEM_unused] = p2m_invalid
>> +            [HVMMEM_unused] = p2m_invalid,
>> +            [HVMMEM_ioreq_server] = p2m_ioreq_server
>>          };
>>  
>>          if ( copy_from_guest(&a, arg, 1) )
> 
> ... how can this be correct without actual handling having got added?
> IOW doesn't at least this change belong into a later patch?

+1

With that change:

Acked-by: George Dunlap <george.dunlap@citrix.com>



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-14 13:13     ` George Dunlap
@ 2016-06-14 13:31       ` Jan Beulich
  2016-06-15  9:50         ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-14 13:31 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, Jun Nakajima

>>> On 14.06.16 at 15:13, <george.dunlap@citrix.com> wrote:
> On 14/06/16 11:45, Jan Beulich wrote:
>>> +                         struct hvm_ioreq_server *s)
>>> +{
>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +    int rc;
>>> +
>>> +    spin_lock(&p2m->ioreq.lock);
>>> +
>>> +    if ( flags == 0 )
>>> +    {
>>> +        rc = -EINVAL;
>>> +        if ( p2m->ioreq.server != s )
>>> +            goto out;
>>> +
>>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>>> +        p2m->ioreq.server = NULL;
>>> +        p2m->ioreq.flags = 0;
>>> +    }
>> 
>> What does "passing" refer to in the comment?
> 
> You make the map_memtype_... hypercall with "flags" set to 0.  I'm not
> sure what's unclear about the sentence; how would you put it differently?

I'd use "flushing", or indeed anything that doesn't resemble wording
used to describe how arguments get handed to functions.

>> Locking is somewhat strange here: You protect against the "set"
>> counterpart altering state while you retrieve it, but you don't
>> protect against the returned data becoming stale by the time
>> the caller can consume it. Is that not a problem? (The most
>> concerning case would seem to be a race of hvmop_set_mem_type()
>> with de-registration of the type.)
> 
> How is that different than calling set_mem_type() first, and then
> de-registering without first unmapping all the types?

Didn't we all agree this is something that should be disallowed
anyway (not that I've seen this implemented, i.e. just being
reminded of it by your reply)?

>>> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>> +                           ioreq server. flags with 0 means to unmap the
>>> +                           ioreq server */
>>> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
>>> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
>>> +
>>> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
>>> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
>> 
>> Is there any use for these _HVMOP_* values? The more that they
>> violate standard C name space rules?
> 
> I assume he's just going along with what he sees in params.h.
> "Violating standard C name space rules" by having #defines which start
> with a single _ seems to be a well-established policy for Xen. :-)

Sadly, and I'm trying to prevent matters becoming worse.
Speaking of which - there are XEN_ prefixes missing here too.

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-14 13:31       ` Jan Beulich
@ 2016-06-15  9:50         ` George Dunlap
  2016-06-15 10:21           ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-15  9:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, Jun Nakajima

On 14/06/16 14:31, Jan Beulich wrote:
>>>> On 14.06.16 at 15:13, <george.dunlap@citrix.com> wrote:
>> On 14/06/16 11:45, Jan Beulich wrote:
>>>> +                         struct hvm_ioreq_server *s)
>>>> +{
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    int rc;
>>>> +
>>>> +    spin_lock(&p2m->ioreq.lock);
>>>> +
>>>> +    if ( flags == 0 )
>>>> +    {
>>>> +        rc = -EINVAL;
>>>> +        if ( p2m->ioreq.server != s )
>>>> +            goto out;
>>>> +
>>>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>>>> +        p2m->ioreq.server = NULL;
>>>> +        p2m->ioreq.flags = 0;
>>>> +    }
>>>
>>> What does "passing" refer to in the comment?
>>
>> You make the map_memtype_... hypercall with "flags" set to 0.  I'm not
>> sure what's unclear about the sentence; how would you put it differently?
> 
> I'd use "flushing", or indeed anything that doesn't resemble wording
> used to describe how arguments get handed to functions.
> 
>>> Locking is somewhat strange here: You protect against the "set"
>>> counterpart altering state while you retrieve it, but you don't
>>> protect against the returned data becoming stale by the time
>>> the caller can consume it. Is that not a problem? (The most
>>> concerning case would seem to be a race of hvmop_set_mem_type()
>>> with de-registration of the type.)
>>
>> How is that different than calling set_mem_type() first, and then
>> de-registering without first unmapping all the types?
> 
> Didn't we all agree this is something that should be disallowed
> anyway (not that I've seen this implemented, i.e. just being
> reminded of it by your reply)?

I think I suggested it as a good idea, but Paul and Yang both thought it
wasn't necessary.  Do you think it should be a requirement?

We could have the de-registering operation fail in those circumstances;
but probably a more robust thing to do would be to have Xen go change
all the ioreq_server entries back to ram_rw (since if the caller just
ignores the failure, things are in an even worse state).

> 
>>>> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>> +                           ioreq server. flags with 0 means to unmap the
>>>> +                           ioreq server */
>>>> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
>>>> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
>>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
>>>> +
>>>> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
>>>> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
>>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
>>>
>>> Is there any use for these _HVMOP_* values? The more that they
>>> violate standard C name space rules?
>>
>> I assume he's just going along with what he sees in params.h.
>> "Violating standard C name space rules" by having #defines which start
>> with a single _ seems to be a well-established policy for Xen. :-)
> 
> Sadly, and I'm trying to prevent matters becoming worse.
> Speaking of which - there are XEN_ prefixes missing here too.

Right, so in that case I think I would have said, "I realize that lots
of other places in the Xen interface use this sort of template for
flags, but I think it's a bad idea and I'm trying to stop it expanding.
 Is there any actual need to have the bit numbers defined separately?
If not, please just define each flag as (1u << 0), &c."

I think you've tripped over "changing coding styles" in unfamiliar code
before too, so you know how frustrating it is to try to follow the
existing coding style only to be told that you did it wrong. :-)

 -George




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-15  9:50         ` George Dunlap
@ 2016-06-15 10:21           ` Jan Beulich
  2016-06-15 11:28             ` George Dunlap
  2016-06-16  9:30             ` Yu Zhang
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-15 10:21 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, Jun Nakajima

>>> On 15.06.16 at 11:50, <george.dunlap@citrix.com> wrote:
> On 14/06/16 14:31, Jan Beulich wrote:
>>>>> On 14.06.16 at 15:13, <george.dunlap@citrix.com> wrote:
>>> On 14/06/16 11:45, Jan Beulich wrote:
>>>> Locking is somewhat strange here: You protect against the "set"
>>>> counterpart altering state while you retrieve it, but you don't
>>>> protect against the returned data becoming stale by the time
>>>> the caller can consume it. Is that not a problem? (The most
>>>> concerning case would seem to be a race of hvmop_set_mem_type()
>>>> with de-registration of the type.)
>>>
>>> How is that different than calling set_mem_type() first, and then
>>> de-registering without first unmapping all the types?
>> 
>> Didn't we all agree this is something that should be disallowed
>> anyway (not that I've seen this implemented, i.e. just being
>> reminded of it by your reply)?
> 
> I think I suggested it as a good idea, but Paul and Yang both thought it
> wasn't necessary.  Do you think it should be a requirement?

I think things shouldn't be left in a half-adjusted state.

> We could have the de-registering operation fail in those circumstances;
> but probably a more robust thing to do would be to have Xen go change
> all the ioreq_server entires back to ram_rw (since if the caller just
> ignores the failure, things are in an even worse state).

If that's reasonable to do without undue delay (e.g. by using
the usual "recalculate everything" forced to trickle down through
the page table levels), then that's as good.

>>>>> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>>> +                           ioreq server. flags with 0 means to unmap the
>>>>> +                           ioreq server */
>>>>> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
>>>>> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
>>>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
>>>>> +
>>>>> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
>>>>> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
>>>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
>>>>
>>>> Is there any use for these _HVMOP_* values? The more that they
>>>> violate standard C name space rules?
>>>
>>> I assume he's just going along with what he sees in params.h.
>>> "Violating standard C name space rules" by having #defines which start
>>> with a single _ seems to be a well-established policy for Xen. :-)
>> 
>> Sadly, and I'm trying to prevent matters becoming worse.
>> Speaking of which - there are XEN_ prefixes missing here too.
> 
> Right, so in that case I think I would have said, "I realize that lots
> of other places in the Xen interface use this sort of template for
> flags, but I think it's a bad idea and I'm trying to stop it expanding.
>  Is there any actual need to have the bit numbers defined separately?
> If not, please just define each flag as (1u << 0), &c."

Actually my coding style related comment wasn't for these two
stage definitions - for those I simply questioned whether they're
needed. My style complaint was for the <underscore><uppercase>
name pattern (which would simply be avoided by not having the
individual bit number #define-s).

> I think you've tripped over "changing coding styles" in unfamiliar code
> before too, so you know how frustrating it is to try to follow the
> existing coding style only to be told that you did it wrong. :-)

Agreed, you caught me on this one. Albeit with the slight
difference that in the public interface we can't as easily correct
old mistakes to aid people who simply clone surrounding code
when adding new bits (the possibility of adding #ifdef-ery doesn't
seem very attractive to me there, unless we got reports of actual
name space collisions).

Jan



* Re: [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server.
  2016-06-14 10:04   ` Jan Beulich
  2016-06-14 13:14     ` George Dunlap
@ 2016-06-15 10:51     ` Yu Zhang
  1 sibling, 0 replies; 68+ messages in thread
From: Yu Zhang @ 2016-06-15 10:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima


On 6/14/2016 6:04 PM, Jan Beulich wrote:
>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>> @@ -5507,6 +5507,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>               get_gfn_query_unlocked(d, a.pfn, &t);
>>               if ( p2m_is_mmio(t) )
>>                   a.mem_type =  HVMMEM_mmio_dm;
>> +            else if ( t == p2m_ioreq_server )
>> +                a.mem_type = HVMMEM_ioreq_server;
>>               else if ( p2m_is_readonly(t) )
>>                   a.mem_type =  HVMMEM_ram_ro;
>>               else if ( p2m_is_ram(t) )
> I can see this being suitable to be done here, but ...
>
>> @@ -5537,7 +5539,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>               [HVMMEM_ram_rw]  = p2m_ram_rw,
>>               [HVMMEM_ram_ro]  = p2m_ram_ro,
>>               [HVMMEM_mmio_dm] = p2m_mmio_dm,
>> -            [HVMMEM_unused] = p2m_invalid
>> +            [HVMMEM_unused] = p2m_invalid,
>> +            [HVMMEM_ioreq_server] = p2m_ioreq_server
>>           };
>>   
>>           if ( copy_from_guest(&a, arg, 1) )
> ... how can this be correct without actual handling having got added?
> IOW doesn't at least this change belong into a later patch?

Thanks for your comments. :)
Well, although the new handling logic is in the 3rd patch, we still
have the old handling code. Without the other patches, a developer can
still use HVMMEM_ioreq_server to write-protect some guest ram pages,
and try to handle the write operations on these pages with the old
approach - tracking these gfns inside the ioreq server's rangeset.

Thanks
Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-14 10:45   ` Jan Beulich
  2016-06-14 13:13     ` George Dunlap
@ 2016-06-15 10:52     ` Yu Zhang
  2016-06-15 12:26       ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-15 10:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 6/14/2016 6:45 PM, Jan Beulich wrote:
>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>> let one ioreq server claim/disclaim its responsibility for the
>> handling of guest pages with p2m type p2m_ioreq_server. Users
>> of this HVMOP can specify which kind of operation is supposed
>> to be emulated in a parameter named flags. Currently, this HVMOP
>> only support the emulation of write operations. And it can be
>> easily extended to support the emulation of read ones if an
>> ioreq server has such requirement in the future.
> Didn't we determine that this isn't as easy as everyone first thought?

My understanding is that to emulate read, we need to change the
definition of is_epte_present(), and I do not think this change will
cause much trouble. But since no one is using the read emulation, I am
convinced the more cautious way is to only support emulation of write
operations for now.

>> @@ -178,8 +179,33 @@ static int hvmemul_do_io(
>>           break;
>>       case X86EMUL_UNHANDLEABLE:
>>       {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
>> +        struct hvm_ioreq_server *s;
>> +        p2m_type_t p2mt;
>> +
>> +        if ( is_mmio )
>> +        {
>> +            unsigned long gmfn = paddr_to_pfn(addr);
>> +
>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
>> +
>> +            if ( p2mt == p2m_ioreq_server )
>> +            {
>> +                unsigned long flags;
>> +
>> +                s = p2m_get_ioreq_server(currd, &flags);
>> +
>> +                if ( dir == IOREQ_WRITE &&
>> +                     !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS) )
> Shouldn't this be
>
>                  if ( dir != IOREQ_WRITE ||
>                       !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS) )
>                      s = NULL;
>
> in which case the question is whether you wouldn't better avoid
> calling p2m_get_ioreq_server() in the first place when
> dir != IOREQ_WRITE.

You are right. Since dir with IOREQ_READ is not supposed to enter this
code path, I'd better change the above code to check the value of dir
first before calling p2m_get_ioreq_server().

>> +                    s = NULL;
>> +            }
>> +            else
>> +                s = hvm_select_ioreq_server(currd, &p);
>> +        }
>> +        else
>> +        {
>> +            p2mt = p2m_invalid;
> What is this needed for? In fact it looks like the variable declaration
> could move into the next inner scope (alongside gmfn, which is
> questionable to be a local variable anyway, considering that is gets
> used just once).

Got it. Thanks.

>> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>       return rc;
>>   }
>>   
>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>> +                                     uint16_t type, uint32_t flags)
> I see no reason why both can't be unsigned int.

Parameter type is passed in from the type field inside struct
xen_hvm_map_mem_type_to_ioreq_server, which is a uint16_t, followed by
a uint16_t pad. Now I am wondering, maybe we can just remove the pad
field in this structure and define type as uint32_t.

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>               entry->r = entry->w = entry->x = 1;
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>> +        case p2m_ioreq_server:
>> +            entry->r = entry->x = 1;
> Why x?

Setting entry->x to 1 is not a must. I can remove it. :)

>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>       default:
>>           return flags | _PAGE_NX_BIT;
>>       case p2m_grant_map_ro:
>> -    case p2m_ioreq_server:
>>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>> +    case p2m_ioreq_server:
>> +    {
>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>> +
>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>> +            return flags & ~_PAGE_RW;
>> +        else
>> +            return flags;
>> +    }
> Same here (for the missing _PAGE_NX) plus no need for braces.

I'll remove the braces. And we do not need to set the _PAGE_NX_BIT,
like the p2m_ram_ro case, I guess.

>
>> @@ -289,6 +291,74 @@ void p2m_memory_type_changed(struct domain *d)
>>       }
>>   }
>>   
>> +int p2m_set_ioreq_server(struct domain *d,
>> +                         unsigned long flags,
> Why "long" and not just "int"?

You are right, will change to unsigned int in next version.

>> +                         struct hvm_ioreq_server *s)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int rc;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    if ( flags == 0 )
>> +    {
>> +        rc = -EINVAL;
>> +        if ( p2m->ioreq.server != s )
>> +            goto out;
>> +
>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>> +        p2m->ioreq.server = NULL;
>> +        p2m->ioreq.flags = 0;
>> +    }
> What does "passing" refer to in the comment?

It means that if this routine is called with flags=0, it will unmap the
ioreq server.

>> +    else
>> +    {
>> +        rc = -EBUSY;
>> +        if ( p2m->ioreq.server != NULL )
>> +            goto out;
>> +
>> +        p2m->ioreq.server = s;
>> +        p2m->ioreq.flags = flags;
>> +    }
>> +
>> +    rc = 0;
>> +
>> + out:
>> +    spin_unlock(&p2m->ioreq.lock);
>> +
>> +    return rc;
>> +}
>> +
>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>> +                                              unsigned long *flags)
> Again  why "long" and not just "int"?

Got it.  Thanks.

>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    struct hvm_ioreq_server *s;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    s = p2m->ioreq.server;
>> +    *flags = p2m->ioreq.flags;
>> +
>> +    spin_unlock(&p2m->ioreq.lock);
>> +    return s;
>> +}
> Locking is somewhat strange here: You protect against the "set"
> counterpart altering state while you retrieve it, but you don't
> protect against the returned data becoming stale by the time
> the caller can consume it. Is that not a problem? (The most
> concerning case would seem to be a race of hvmop_set_mem_type()
> with de-registration of the type.)

Yes. The case you mentioned might happen. But it's not a big deal, I
guess. If such a case happens, the backend driver will receive an io
request and can then discard it if it has just de-registered the mem
type.

>> +void p2m_destroy_ioreq_server(struct domain *d,
>> +                              struct hvm_ioreq_server *s)
> const
>
>> @@ -336,6 +336,24 @@ struct p2m_domain {
>>           struct ept_data ept;
>>           /* NPT-equivalent structure could be added here. */
>>       };
>> +
>> +    struct {
>> +        spinlock_t lock;
>> +        /*
>> +         * ioreq server who's responsible for the emulation of
>> +         * gfns with specific p2m type(for now, p2m_ioreq_server).
>> +         */
>> +        struct hvm_ioreq_server *server;
>> +        /*
>> +         * flags specifies whether read, write or both operations
>> +         * are to be emulated by an ioreq server.
>> +         */
>> +        unsigned int flags;
>> +
>> +#define P2M_IOREQ_HANDLE_WRITE_ACCESS HVMOP_IOREQ_MEM_ACCESS_WRITE
>> +#define P2M_IOREQ_HANDLE_READ_ACCESS  HVMOP_IOREQ_MEM_ACCESS_READ
> Is there anything wrong with using the HVMOP_* values directly?
>
>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -89,7 +89,9 @@ typedef enum {
>>       HVMMEM_unused,             /* Placeholder; setting memory to this type
>>                                     will fail for code after 4.7.0 */
>>   #endif
>> -    HVMMEM_ioreq_server
>> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
>> +                                  changes to this value are only allowed after
>> +                                  an ioreq server has claimed its ownership */
> Missing trailing full stop.
>
>> @@ -383,6 +385,37 @@ struct xen_hvm_set_ioreq_server_state {
>>   typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>>   
>> +/*
>> + * HVMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
>> + *                                      to specific memory type <type>
>> + *                                      for specific accesses <flags>
>> + *
>> + * For now, flags only accept the value of HVMOP_IOREQ_MEM_ACCESS_WRITE,
>> + * which means only write operations are to be forwarded to an ioreq server.
>> + * Support for the emulation of read operations can be added when an ioreq
>> + * server has such requirement in future.
>> + */
>> +#define HVMOP_map_mem_type_to_ioreq_server 26
>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>> +    domid_t domid;      /* IN - domain to be serviced */
>> +    ioservid_t id;      /* IN - ioreq server id */
>> +    uint16_t type;      /* IN - memory type */
>> +    uint16_t pad;
> This field does not appear to get checked in the handler.

I am now wondering, how about we remove this pad field and define type 
as uint32_t?

Thanks
Yu


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-15 10:21           ` Jan Beulich
@ 2016-06-15 11:28             ` George Dunlap
  2016-06-16  9:30             ` Yu Zhang
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-06-15 11:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, Jun Nakajima

On 15/06/16 11:21, Jan Beulich wrote:
>> I think you've tripped over "changing coding styles" in unfamiliar code
>> before too, so you know how frustrating it is to try to follow the
>> existing coding style only to be told that you did it wrong. :-)
> 
> Agreed, you caught me on this one. Albeit with the slight
> difference that in the public interface we can't as easily correct
> old mistakes to aid people who simply clone surrounding code
> when adding new bits (the possibility of adding #ifdef-ery doesn't
> seem very attractive to me there, unless we got reports of actual
> name space collisions).

Indeed; and just to be clear, I wasn't trying to criticize the
deprecated coding style in the headers -- but given that there *is*
deprecated coding style, it's worth a slight apology when asking people
to use the new style instead of falling in line with what's already
there. :-)

 -George


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-15 10:52     ` Yu Zhang
@ 2016-06-15 12:26       ` Jan Beulich
  2016-06-16  9:32         ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-15 12:26 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 15.06.16 at 12:52, <yu.c.zhang@linux.intel.com> wrote:
> On 6/14/2016 6:45 PM, Jan Beulich wrote:
>>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>> let one ioreq server claim/disclaim its responsibility for the
>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>> of this HVMOP can specify which kind of operation is supposed
>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>> only support the emulation of write operations. And it can be
>>> easily extended to support the emulation of read ones if an
>>> ioreq server has such requirement in the future.
>> Didn't we determine that this isn't as easy as everyone first thought?
> 
> My understanding is that to emulate read, we need to change the definition
> of is_epte_present(), and I do not think this change will cause much 
> trouble.
> But since no one is using the read emulation, I am convinced the more 
> cautious
> way is to only support emulations for write operations for now.

Well, okay. I'd personally drop the "easily", but you know what to tell
people if they later come and ask how this "easily" was meant.

>>> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain 
> *d, ioservid_t id,
>>>       return rc;
>>>   }
>>>   
>>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>> +                                     uint16_t type, uint32_t flags)
>> I see no reason why both can't be unsigned int.
> 
> Parameter type is passed in from the type field inside struct 
> xen_hvm_map_mem_type_to_ioreq_server,
> which is a uint16_t, followed with a uint16_t pad. Now I am wondering, 
> may be we can just remove the pad
> field in this structure and just define type as uint32_t.

I think keeping the interface structure unchanged is the desirable
route here. What I dislike is the passing around of non-natural
width types, which is more expensive in terms of processing. I.e.
as long as a fixed width type (which is necessary to be used in
the public interface) fits in "unsigned int", that should be the
respective internal type. Otherwise "unsigned long" etc.

There are cases where even internally we indeed want to use
fixed width types, and admittedly there are likely far more cases
where internally fixed width types get used without good reason,
but just like everywhere else - let's please not make this worse.
IOW please use fixed width types only when you really need them.

>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>               entry->r = entry->w = entry->x = 1;
>>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>               break;
>>> +        case p2m_ioreq_server:
>>> +            entry->r = entry->x = 1;
>> Why x?
> 
> Setting entry->x to 1 is not a must. I can remove it. :)

Please do. We shouldn't grant permissions without reason.

>>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>>       default:
>>>           return flags | _PAGE_NX_BIT;
>>>       case p2m_grant_map_ro:
>>> -    case p2m_ioreq_server:
>>>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>>> +    case p2m_ioreq_server:
>>> +    {
>>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>>> +
>>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>>> +            return flags & ~_PAGE_RW;
>>> +        else
>>> +            return flags;
>>> +    }
>> Same here (for the missing _PAGE_NX) plus no need for braces.
> 
> I'll remove the brace. And we do not need to set the _PAGE_NX_BIT, like 
> the p2m_ram_ro case I guess.

I hope you mean the inverse: You should set _PAGE_NX_BIT here.

>>> +                         struct hvm_ioreq_server *s)
>>> +{
>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +    int rc;
>>> +
>>> +    spin_lock(&p2m->ioreq.lock);
>>> +
>>> +    if ( flags == 0 )
>>> +    {
>>> +        rc = -EINVAL;
>>> +        if ( p2m->ioreq.server != s )
>>> +            goto out;
>>> +
>>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>>> +        p2m->ioreq.server = NULL;
>>> +        p2m->ioreq.flags = 0;
>>> +    }
>> What does "passing" refer to in the comment?
> 
> It means if this routine is called with flags=0, it will unmap the ioreq 
> server.

Oh, that's a completely different reading of the comment than I had
implied: With what you say, "flags" here really refers to the function
parameter, whereas I implied it to refer to the structure field. I think
if that's what you want to say, then the comment should be put next
to the surrounding if() to clarify what "flags" refers to.

>>> +{
>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +    struct hvm_ioreq_server *s;
>>> +
>>> +    spin_lock(&p2m->ioreq.lock);
>>> +
>>> +    s = p2m->ioreq.server;
>>> +    *flags = p2m->ioreq.flags;
>>> +
>>> +    spin_unlock(&p2m->ioreq.lock);
>>> +    return s;
>>> +}
>> Locking is somewhat strange here: You protect against the "set"
>> counterpart altering state while you retrieve it, but you don't
>> protect against the returned data becoming stale by the time
>> the caller can consume it. Is that not a problem? (The most
>> concerning case would seem to be a race of hvmop_set_mem_type()
>> with de-registration of the type.)
> 
> Yes. The case you mentioned might happen. But it's not a big deal I 
> guess. If such
> case happens, the  backend driver will receive an io request and can 
> then discard it
> if it has just de-registered the mem type.

Could you clarify in a comment then what the lock is (and is not)
meant to guard against?

>>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>>> +    domid_t domid;      /* IN - domain to be serviced */
>>> +    ioservid_t id;      /* IN - ioreq server id */
>>> +    uint16_t type;      /* IN - memory type */
>>> +    uint16_t pad;
>> This field does not appear to get checked in the handler.
> 
> I am now wondering, how about we remove this pad field and define type 
> as uint32_t?

As above - I think the current layout is fine. But I'm also not heavily
opposed to using uint32_t here. It's not a stable interface anyway
(and I already have a series mostly ready to split off all control
operations from the HVMOP_* ones, into a new HVMCTL_* set,
which will make all of them interface-versioned).

Jan


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-15 10:21           ` Jan Beulich
  2016-06-15 11:28             ` George Dunlap
@ 2016-06-16  9:30             ` Yu Zhang
  2016-06-16  9:55               ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-16  9:30 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 6/15/2016 6:21 PM, Jan Beulich wrote:
>>>> On 15.06.16 at 11:50, <george.dunlap@citrix.com> wrote:
>> On 14/06/16 14:31, Jan Beulich wrote:
>>>>>> On 14.06.16 at 15:13, <george.dunlap@citrix.com> wrote:
>>>> On 14/06/16 11:45, Jan Beulich wrote:
>>>>> Locking is somewhat strange here: You protect against the "set"
>>>>> counterpart altering state while you retrieve it, but you don't
>>>>> protect against the returned data becoming stale by the time
>>>>> the caller can consume it. Is that not a problem? (The most
>>>>> concerning case would seem to be a race of hvmop_set_mem_type()
>>>>> with de-registration of the type.)
>>>> How is that different than calling set_mem_type() first, and then
>>>> de-registering without first unmapping all the types?
>>> Didn't we all agree this is something that should be disallowed
>>> anyway (not that I've seen this implemented, i.e. just being
>>> reminded of it by your reply)?
>> I think I suggested it as a good idea, but Paul and Yang both thought it
>> wasn't necessary.  Do you think it should be a requirement?
> I think things shouldn't be left in a half-adjusted state.
>
>> We could have the de-registering operation fail in those circumstances;
>> but probably a more robust thing to do would be to have Xen go change
>> all the ioreq_server entries back to ram_rw (since if the caller just
>> ignores the failure, things are in an even worse state).
> If that's reasonable to do without undue delay (e.g. by using
> the usual "recalculate everything" forced to trickle down through
> the page table levels, then that's as good.

Thanks for your advice, Jan & George.

Previously, in the 2nd version, I used p2m_change_entry_type_global()
to reset the outstanding p2m_ioreq_server entries back to p2m_ram_rw
asynchronously after the de-registration. But we realized later that
this approach means we cannot support live migration. And to
recalculate the whole p2m table forcefully when de-registration
happens would cost too much.

And the conclusion of further discussion with Paul was that we can
leave the responsibility for resetting the p2m type to the device
model side; even if a device model fails to do so, the only affected
party will be the current VM - neither other VMs nor the hypervisor
will get hurt.

I thought we had reached agreement on this in the review process of
version 2, so I removed this part from version 3.

>
>>>>>> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>>>> +                           ioreq server. flags with 0 means to unmap the
>>>>>> +                           ioreq server */
>>>>>> +#define _HVMOP_IOREQ_MEM_ACCESS_READ 0
>>>>>> +#define HVMOP_IOREQ_MEM_ACCESS_READ \
>>>>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_READ)
>>>>>> +
>>>>>> +#define _HVMOP_IOREQ_MEM_ACCESS_WRITE 1
>>>>>> +#define HVMOP_IOREQ_MEM_ACCESS_WRITE \
>>>>>> +    (1u << _HVMOP_IOREQ_MEM_ACCESS_WRITE)
>>>>> Is there any use for these _HVMOP_* values? The more that they
>>>>> violate standard C name space rules?
>>>> I assume he's just going along with what he sees in params.h.
>>>> "Violating standard C name space rules" by having #defines which start
>>>> with a single _ seems to be a well-established policy for Xen. :-)
>>> Sadly, and I'm trying to prevent matters becoming worse.
>>> Speaking of which - there are XEN_ prefixes missing here too.
>> Right, so in that case I think I would have said, "I realize that lots
>> of other places in the Xen interface use this sort of template for
>> flags, but I think it's a bad idea and I'm trying to stop it expanding.
>>   Is there any actual need to have the bit numbers defined separately?
>> If not, please just define each flag as (1u << 0), &c."
> Actually my coding style related comment wasn't for these two
> stage definitions - for those I simply questioned whether they're
> needed. My style complaint was for the <underscore><uppercase>
> name pattern (which would simply be avoided by not having the
> individual bit number #define-s).
>
>> I think you've tripped over "changing coding styles" in unfamiliar code
>> before too, so you know how frustrating it is to try to follow the
>> existing coding style only to be told that you did it wrong. :-)
> Agreed, you caught me on this one. Albeit with the slight
> difference that in the public interface we can't as easily correct
> old mistakes to aid people who simply clone surrounding code
> when adding new bits (the possibility of adding #ifdef-ery doesn't
> seem very attractive to me there, unless we got reports of actual
> name space collisions).
>

Hah, I guess these 2 #defines were just cloned from similar ones, and I
did not expect they would receive so many comments. Anyway, I admire your
preciseness, and thanks for pointing this out. :)

Since the bit number #defines have no special meaning, I'd like to just
define the flags directly:

#define HVMOP_IOREQ_MEM_ACCESS_READ (1u << 0)
#define HVMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)


B.R.
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-15 12:26       ` Jan Beulich
@ 2016-06-16  9:32         ` Yu Zhang
  2016-06-16 10:02           ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-16  9:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima


> On 6/14/2016 6:45 PM, Jan Beulich wrote:
>>>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>> let one ioreq server claim/disclaim its responsibility for the
>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>> of this HVMOP can specify which kind of operation is supposed
>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>> only support the emulation of write operations. And it can be
>>>> easily extended to support the emulation of read ones if an
>>>> ioreq server has such requirement in the future.
>>> Didn't we determine that this isn't as easy as everyone first thought?
>> My understanding is that to emulate read, we need to change the definition
>> of is_epte_present(), and I do not think this change will cause much
>> trouble.
>> But since no one is using the read emulation, I am convinced the more
>> cautious
>> way is to only support emulations for write operations for now.
> Well, okay. I'd personally drop the "easily", but you know what
> to tell people if later they come ask how this "easily" was meant.

OK. Let's drop the word "easily". :)

>>>> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain
>> *d, ioservid_t id,
>>>>        return rc;
>>>>    }
>>>>    
>>>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>> +                                     uint16_t type, uint32_t flags)
>>> I see no reason why both can't be unsigned int.
>> Parameter type is passed in from the type field inside struct
>> xen_hvm_map_mem_type_to_ioreq_server,
>> which is a uint16_t, followed with a uint16_t pad. Now I am wondering,
>> may be we can just remove the pad
>> field in this structure and just define type as uint32_t.
> I think keeping the interface structure unchanged is the desirable
> route here. What I dislike is the passing around of non-natural
> width types, which is more expensive in terms of processing. I.e.
> as long as a fixed width type (which is necessary to be used in
> the public interface) fits in "unsigned int", that should be the
> respective internal type. Otherwise "unsigned long" etc.
>
> There are cases where even internally we indeed want to use
> fixed width types, and admittedly there are likely far more cases
> where internally fixed width types get used without good reason,
> but just like everywhere else - let's please not make this worse.
> IOW please use fixed width types only when you really need them.
OK. I can keep the interface, and using a natural-width type in the internal
routine would mean an implicit type conversion from uint16_t, which I do not
think is a problem.
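To illustrate the point being settled here, a minimal compile-checkable sketch (function names are purely illustrative, not the patch's code): the fixed-width uint16_t field from the public interface structure widens implicitly and losslessly into the natural-width unsigned int preferred for internal routines.

```c
#include <stdint.h>

/* Illustrative only: the uint16_t "type" field from the public
 * interface structure converts implicitly and losslessly to the
 * natural-width "unsigned int" used internally, so keeping the
 * interface layout costs nothing inside the hypervisor. */
static unsigned int map_type_internal(unsigned int type, unsigned int flags)
{
    /* A real handler would validate and act on these; here we just
     * pack them to show the promoted values arrive intact. */
    return (type << 16) | (flags & 0xffffu);
}

static unsigned int call_with_interface_widths(void)
{
    uint16_t type = 0x0007;  /* as carried in the interface structure */
    uint32_t flags = 0x0002; /* e.g. a write-access flag bit */

    /* Implicit widening happens at the call; no cast is needed. */
    return map_type_internal(type, flags);
}
```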

>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>                entry->r = entry->w = entry->x = 1;
>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>                break;
>>>> +        case p2m_ioreq_server:
>>>> +            entry->r = entry->x = 1;
>>> Why x?
>> Setting entry->x to 1 is not a must. I can remove it. :)
> Please do. We shouldn't grant permissions without reason.

Got it. Thanks.
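A toy model of the permission outcome agreed above (the reduction to a bare bitfield struct is an assumption for self-containment; Xen's ept_entry_t and ept_p2m_type_to_flags() are more involved): execute permission is simply not granted, and write permission is withheld whenever an ioreq server has claimed write accesses.

```c
/* Toy model of the revised p2m_ioreq_server EPT case: never grant
 * execute, and trap writes while an ioreq server handles them.
 * All names here are illustrative, mirroring the quoted patch hunks. */
#define TOY_P2M_IOREQ_HANDLE_WRITE_ACCESS (1u << 1)

struct toy_ept_entry { unsigned int r:1, w:1, x:1; };

static void ioreq_server_entry_flags(struct toy_ept_entry *e,
                                     unsigned int ioreq_flags)
{
    e->r = 1;  /* always readable */
    e->x = 0;  /* no reason to grant execute permission */
    e->w = !(ioreq_flags & TOY_P2M_IOREQ_HANDLE_WRITE_ACCESS);
}

/* Pack r/w/x into one value for easy checking. */
static unsigned int flags_with_write_handling(void)
{
    struct toy_ept_entry e = { 0 };
    ioreq_server_entry_flags(&e, TOY_P2M_IOREQ_HANDLE_WRITE_ACCESS);
    return (e.r << 2) | (e.w << 1) | e.x;
}

static unsigned int flags_without_write_handling(void)
{
    struct toy_ept_entry e = { 0 };
    ioreq_server_entry_flags(&e, 0);
    return (e.r << 2) | (e.w << 1) | e.x;
}
```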

>>>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>>>        default:
>>>>            return flags | _PAGE_NX_BIT;
>>>>        case p2m_grant_map_ro:
>>>> -    case p2m_ioreq_server:
>>>>            return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>>>> +    case p2m_ioreq_server:
>>>> +    {
>>>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>>>> +
>>>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>>>> +            return flags & ~_PAGE_RW;
>>>> +        else
>>>> +            return flags;
>>>> +    }
>>> Same here (for the missing _PAGE_NX) plus no need for braces.
>> I'll remove the brace. And we do not need to set the _PAGE_NX_BIT, like
>> the p2m_ram_ro case I guess.
> I hope you mean the inverse: You should set _PAGE_NX_BIT here.
Oh, right. I meant the reverse. Thanks for the reminder. :)
And I have a question: here in p2m_type_to_flags(), I saw the current code
uses _PAGE_NX_BIT to disable execute permission, and I wonder why we don't
choose _PAGE_NX, which is defined as:

#define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)

How do we know for sure that bit 63 of the PTE is not a reserved one without
checking the CPU capability (cpu_has_nx)? Are there other reasons, e.g. that
the page tables might be shared with the IOMMU?

>>>> +                         struct hvm_ioreq_server *s)
>>>> +{
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    int rc;
>>>> +
>>>> +    spin_lock(&p2m->ioreq.lock);
>>>> +
>>>> +    if ( flags == 0 )
>>>> +    {
>>>> +        rc = -EINVAL;
>>>> +        if ( p2m->ioreq.server != s )
>>>> +            goto out;
>>>> +
>>>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>>>> +        p2m->ioreq.server = NULL;
>>>> +        p2m->ioreq.flags = 0;
>>>> +    }
>>> What does "passing" refer to in the comment?
>> It means if this routine is called with flags=0, it will unmap the ioreq
>> server.
> Oh, that's a completely different reading of the comment than I had
> implied: With what you say, "flags" here really refers to the function
> parameter, whereas I implied it to refer to the structure field. I think
> if that's what you want to say, then the comment should be put next
> to the surrounding if() to clarify what "flags" refers to.
Agreed. I'll move this comment above the surrounding if().

>>>> +{
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    struct hvm_ioreq_server *s;
>>>> +
>>>> +    spin_lock(&p2m->ioreq.lock);
>>>> +
>>>> +    s = p2m->ioreq.server;
>>>> +    *flags = p2m->ioreq.flags;
>>>> +
>>>> +    spin_unlock(&p2m->ioreq.lock);
>>>> +    return s;
>>>> +}
>>> Locking is somewhat strange here: You protect against the "set"
>>> counterpart altering state while you retrieve it, but you don't
>>> protect against the returned data becoming stale by the time
>>> the caller can consume it. Is that not a problem? (The most
>>> concerning case would seem to be a race of hvmop_set_mem_type()
>>> with de-registration of the type.)
>> Yes. The case you mentioned might happen. But it's not a big deal I
>> guess. If such
>> case happens, the  backend driver will receive an io request and can
>> then discard it
>> if it has just de-registered the mem type.
> Could you clarify in a comment then what the lock is (and is not)
> meant to guard against?

For now, only one ioreq server is allowed to be bound to
HVMMEM_ioreq_server. One usage of this lock is that in
p2m_set_ioreq_server() it prevents concurrent binding requests from
multiple ioreq servers. And although it cannot protect the return value
of p2m_get_ioreq_server() from going stale, it still provides some
protection inside the routine.
I'll add comments to illustrate this. :)
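As a toy, single-threaded model of the binding rules just described (names, structure, and the -EBUSY error choice are illustrative, not the patch's actual code): only one server may be bound at a time, flags == 0 unbinds, and only the current owner may unbind; in the real code p2m->ioreq.lock serializes these operations.

```c
#include <stddef.h>
#include <errno.h>

/* Toy model of p2m ioreq-server binding: one owner at a time,
 * flags == 0 unbinds. The real code takes p2m->ioreq.lock around
 * this section; the lock is elided in this single-threaded sketch. */
struct toy_ioreq_server { int id; };

static struct {
    struct toy_ioreq_server *server;
    unsigned int flags;
} toy_p2m;

static struct toy_ioreq_server toy_a = { 1 }, toy_b = { 2 };

static int toy_set_ioreq_server(struct toy_ioreq_server *s,
                                unsigned int flags)
{
    int rc = 0;

    /* spin_lock(&p2m->ioreq.lock) would serialize the section below. */
    if ( flags == 0 )
    {
        /* Passing flags == 0 means to unmap the ioreq server;
         * only the current owner may unbind. */
        if ( toy_p2m.server != s )
            rc = -EINVAL;
        else
        {
            toy_p2m.server = NULL;
            toy_p2m.flags = 0;
        }
    }
    else if ( toy_p2m.server != NULL && toy_p2m.server != s )
        rc = -EBUSY;  /* another server is already bound */
    else
    {
        toy_p2m.server = s;
        toy_p2m.flags = flags;
    }
    /* spin_unlock(&p2m->ioreq.lock) */
    return rc;
}
```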

>>>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>>>> +    domid_t domid;      /* IN - domain to be serviced */
>>>> +    ioservid_t id;      /* IN - ioreq server id */
>>>> +    uint16_t type;      /* IN - memory type */
>>>> +    uint16_t pad;
>>> This field does not appear to get checked in the handler.
>> I am now wondering, how about we remove this pad field and define type
>> as uint32_t?
> As above - I think the current layout is fine. But I'm also not heavily
> opposed to using uint32_t here. It's not a stable interface anyway
> (and I already have a series mostly ready to split off all control
> operations from the HVMOP_* ones, into a new HVMCTL_* set,
> which will make all of them interface-versioned).

I'd like to keep this interface. BTW, you mentioned "this field does not
appear to get checked in the handler"; do you mean we need to check the
pad in the handler? And why?

Thanks
Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-16  9:30             ` Yu Zhang
@ 2016-06-16  9:55               ` Jan Beulich
  2016-06-17 10:17                 ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-16  9:55 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 16.06.16 at 11:30, <yu.c.zhang@linux.intel.com> wrote:
> On 6/15/2016 6:21 PM, Jan Beulich wrote:
>>>>> On 15.06.16 at 11:50, <george.dunlap@citrix.com> wrote:
>>> On 14/06/16 14:31, Jan Beulich wrote:
>>>>>>> On 14.06.16 at 15:13, <george.dunlap@citrix.com> wrote:
>>>>> On 14/06/16 11:45, Jan Beulich wrote:
>>>>>> Locking is somewhat strange here: You protect against the "set"
>>>>>> counterpart altering state while you retrieve it, but you don't
>>>>>> protect against the returned data becoming stale by the time
>>>>>> the caller can consume it. Is that not a problem? (The most
>>>>>> concerning case would seem to be a race of hvmop_set_mem_type()
>>>>>> with de-registration of the type.)
>>>>> How is that different than calling set_mem_type() first, and then
>>>>> de-registering without first unmapping all the types?
>>>> Didn't we all agree this is something that should be disallowed
>>>> anyway (not that I've seen this implemented, i.e. just being
>>>> reminded of it by your reply)?
>>> I think I suggested it as a good idea, but Paul and Yang both thought it
>>> wasn't necessary.  Do you think it should be a requirement?
>> I think things shouldn't be left in a half-adjusted state.
>>
>>> We could have the de-registering operation fail in those circumstances;
>>> but probably a more robust thing to do would be to have Xen go change
>>> all the ioreq_server entires back to ram_rw (since if the caller just
>>> ignores the failure, things are in an even worse state).
>> If that's reasonable to do without undue delay (e.g. by using
>> the usual "recalculate everything" forced to trickle down through
>> the page table levels, then that's as good.
> 
> Thanks for your advices, Jan & George.
> 
> Previously in the 2nd version, I used p2m_change_entry_type_global() to 
> reset the
> outstanding p2m_ioreq_server entries back to p2m_ram_rw asynchronously after
> the de-registration. But we realized later that this approach means we 
> can not support
> live migration. And to recalculate the whole p2m table forcefully when 
> de-registration
> happens means too much cost.
> 
> And further discussion with Paul was that we can leave the 
> responsibility to reset p2m type
> to the device model side, and even a device model fails to do so, the 
> affected one will only
> be the current VM, neither other VM nor hypervisor will get hurt.
> 
> I thought we have reached agreement in the review process of version 2, 
> so I removed
> this part from version 3.

In which case I would appreciate the commit message to explain
this (in particular I admit I don't recall why live migration would
be affected by the p2m_change_entry_type_global() approach,
but the request is also so that later readers have at least some
source of information other than searching the mailing list).

> Hah, I guess these 2 #defines are just cloned from similar ones, and I 
> did not expected
> they would receive so much comments. Anyway, I admire your preciseness 
> and thanks
> for pointing this out. :)
> 
> Since the bit number #defines have no special meaning, I'd like to just 
> define the flags
> directly:
> 
> #define HVMOP_IOREQ_MEM_ACCESS_READ (1u << 0)
> #define HVMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)

XEN_HVMOP_* then please.

Jan




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-16  9:32         ` Yu Zhang
@ 2016-06-16 10:02           ` Jan Beulich
  2016-06-16 11:18             ` Yu Zhang
  2016-06-20  9:05             ` Yu Zhang
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-16 10:02 UTC (permalink / raw)
  To: Andrew Cooper, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Tim Deegan, xen-devel, Paul Durrant,
	zhiyuan.lv, Jun Nakajima

>>> On 16.06.16 at 11:32, <yu.c.zhang@linux.intel.com> wrote:
>> On 6/14/2016 6:45 PM, Jan Beulich wrote:
>>>>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>>>>> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain
>>> *d, ioservid_t id,
>>>>>        return rc;
>>>>>    }
>>>>>    
>>>>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>>> +                                     uint16_t type, uint32_t flags)
>>>> I see no reason why both can't be unsigned int.
>>> Parameter type is passed in from the type field inside struct
>>> xen_hvm_map_mem_type_to_ioreq_server,
>>> which is a uint16_t, followed with a uint16_t pad. Now I am wondering,
>>> may be we can just remove the pad
>>> field in this structure and just define type as uint32_t.
>> I think keeping the interface structure unchanged is the desirable
>> route here. What I dislike is the passing around of non-natural
>> width types, which is more expensive in terms of processing. I.e.
>> as long as a fixed width type (which is necessary to be used in
>> the public interface) fits in "unsigned int", that should be the
>> respective internal type. Otherwise "unsigned long" etc.
>>
>> There are cases where even internally we indeed want to use
>> fixed width types, and admittedly there are likely far more cases
>> where internally fixed width types get used without good reason,
>> but just like everywhere else - let's please not make this worse.
>> IOW please use fixed width types only when you really need them.
> OK. I can keep the interface, and using uint32_t type in the internal 
> routine
> would means a implicit type conversion from uint16_6, which I do not think
> is a problem.

Just to reiterate: Unless there is a specific need, please avoid
fixed width integer types for any internal use.

>>>>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>>>>        default:
>>>>>            return flags | _PAGE_NX_BIT;
>>>>>        case p2m_grant_map_ro:
>>>>> -    case p2m_ioreq_server:
>>>>>            return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>>>>> +    case p2m_ioreq_server:
>>>>> +    {
>>>>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>>>>> +
>>>>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>>>>> +            return flags & ~_PAGE_RW;
>>>>> +        else
>>>>> +            return flags;
>>>>> +    }
>>>> Same here (for the missing _PAGE_NX) plus no need for braces.
>>> I'll remove the brace. And we do not need to set the _PAGE_NX_BIT, like
>>> the p2m_ram_ro case I guess.
>> I hope you mean the inverse: You should set _PAGE_NX_BIT here.
> Oh, right. I meant the reverse. Thanks for the remind. :)
> And I have a question,  here in p2m_type_to_flags(), I saw current code 
> uses _PAGE_NX_BIT
> to disable the executable permission,  and I wonder, why don't we choose 
> the _PAGE_NX,
> which is defined as:
> 
> #define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)
> 
> How do we know for sure that bit 63 from pte is not a reserved one 
> without checking
> the cpu capability(the cpu_has_nx)? Is there any other reasons, i.e. the 
> page tables might
> be shared with IOMMU?

Please wait for Andrew to confirm this (or correct me) - there are
some differences between AMD and Intel, and iirc the bit gets
ignored by AMD when NX is off.

>>>>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>>>>> +    domid_t domid;      /* IN - domain to be serviced */
>>>>> +    ioservid_t id;      /* IN - ioreq server id */
>>>>> +    uint16_t type;      /* IN - memory type */
>>>>> +    uint16_t pad;
>>>> This field does not appear to get checked in the handler.
>>> I am now wondering, how about we remove this pad field and define type
>>> as uint32_t?
>> As above - I think the current layout is fine. But I'm also not heavily
>> opposed to using uint32_t here. It's not a stable interface anyway
>> (and I already have a series mostly ready to split off all control
>> operations from the HVMOP_* ones, into a new HVMCTL_* set,
>> which will make all of them interface-versioned).
> 
> I'd like to keep this interface. BTW, you mentioned "this field does not 
> appear to
> get checked in the handler", do you mean we need to check the pad in the 
> handler?

Yes.

> And why?

In order to be able to later assign meaning to it without breaking
existing users.

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-16 10:02           ` Jan Beulich
@ 2016-06-16 11:18             ` Yu Zhang
  2016-06-16 12:43               ` Jan Beulich
  2016-06-20  9:05             ` Yu Zhang
  1 sibling, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-16 11:18 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Kevin Tian, George Dunlap, Tim Deegan, xen-devel, Paul Durrant,
	zhiyuan.lv, Jun Nakajima



On 6/16/2016 6:02 PM, Jan Beulich wrote:
>>>>>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>>>>>         default:
>>>>>>             return flags | _PAGE_NX_BIT;
>>>>>>         case p2m_grant_map_ro:
>>>>>> -    case p2m_ioreq_server:
>>>>>>             return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>>>>>> +    case p2m_ioreq_server:
>>>>>> +    {
>>>>>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>>>>>> +
>>>>>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>>>>>> +            return flags & ~_PAGE_RW;
>>>>>> +        else
>>>>>> +            return flags;
>>>>>> +    }
>>>>> Same here (for the missing _PAGE_NX) plus no need for braces.
>>>> I'll remove the brace. And we do not need to set the _PAGE_NX_BIT, like
>>>> the p2m_ram_ro case I guess.
>>> I hope you mean the inverse: You should set _PAGE_NX_BIT here.
>> Oh, right. I meant the reverse. Thanks for the remind. :)
>> And I have a question,  here in p2m_type_to_flags(), I saw current code
>> uses _PAGE_NX_BIT
>> to disable the executable permission,  and I wonder, why don't we choose
>> the _PAGE_NX,
>> which is defined as:
>>
>> #define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)
>>
>> How do we know for sure that bit 63 from pte is not a reserved one
>> without checking
>> the cpu capability(the cpu_has_nx)? Is there any other reasons, i.e. the
>> page tables might
>> be shared with IOMMU?
> Please wait for Andrew to confirm this (or correct me) - there are
> some differences between AMD and Intel, and iirc the bit gets
> ignored by AMD when NX is off.
>
>>>>>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>>>>>> +    domid_t domid;      /* IN - domain to be serviced */
>>>>>> +    ioservid_t id;      /* IN - ioreq server id */
>>>>>> +    uint16_t type;      /* IN - memory type */
>>>>>> +    uint16_t pad;
>>>>> This field does not appear to get checked in the handler.
>>>> I am now wondering, how about we remove this pad field and define type
>>>> as uint32_t?
>>> As above - I think the current layout is fine. But I'm also not heavily
>>> opposed to using uint32_t here. It's not a stable interface anyway
>>> (and I already have a series mostly ready to split off all control
>>> operations from the HVMOP_* ones, into a new HVMCTL_* set,
>>> which will make all of them interface-versioned).
>> I'd like to keep this interface. BTW, you mentioned "this field does not
>> appear to
>> get checked in the handler", do you mean we need to check the pad in the
>> handler?
> Yes.
>
>> And why?
> In order to be able to later assign meaning to it without breaking
> existing users.

So the handler needs to ensure the pad is 0, right?

Thanks
Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-16 11:18             ` Yu Zhang
@ 2016-06-16 12:43               ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-16 12:43 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 16.06.16 at 13:18, <yu.c.zhang@linux.intel.com> wrote:
> On 6/16/2016 6:02 PM, Jan Beulich wrote:
>>>>>>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>>>>>>> +    domid_t domid;      /* IN - domain to be serviced */
>>>>>>> +    ioservid_t id;      /* IN - ioreq server id */
>>>>>>> +    uint16_t type;      /* IN - memory type */
>>>>>>> +    uint16_t pad;
>>>>>> This field does not appear to get checked in the handler.
>>>>> I am now wondering, how about we remove this pad field and define type
>>>>> as uint32_t?
>>>> As above - I think the current layout is fine. But I'm also not heavily
>>>> opposed to using uint32_t here. It's not a stable interface anyway
>>>> (and I already have a series mostly ready to split off all control
>>>> operations from the HVMOP_* ones, into a new HVMCTL_* set,
>>>> which will make all of them interface-versioned).
>>> I'd like to keep this interface. BTW, you mentioned "this field does not
>>> appear to
>>> get checked in the handler", do you mean we need to check the pad in the
>>> handler?
>> Yes.
>>
>>> And why?
>> In order to be able to later assign meaning to it without breaking
>> existing users.
> 
> So the handler need to assure the pad is 0, right?

Correct.
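A minimal sketch of the check being agreed here (the struct is reduced to plain fixed-width types for self-containment; domid_t and ioservid_t are uint16_t in Xen, but the handler context and names below are illustrative): rejecting a non-zero pad keeps the field reserved so meaning can be assigned to it later without breaking existing callers.

```c
#include <stdint.h>
#include <errno.h>

/* Illustrative handler fragment: the pad field must be zero so it
 * can later be given meaning without breaking existing users. */
struct toy_map_mem_type_to_ioreq_server {
    uint16_t domid;  /* IN - domain to be serviced */
    uint16_t id;     /* IN - ioreq server id */
    uint16_t type;   /* IN - memory type */
    uint16_t pad;    /* IN - must be zero */
    uint32_t flags;  /* IN - access types to forward */
};

static int toy_check_pad(const struct toy_map_mem_type_to_ioreq_server *op)
{
    if ( op->pad != 0 )
        return -EINVAL;  /* reserved field set: reject */
    return 0;
}
```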

Jan




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-16  9:55               ` Jan Beulich
@ 2016-06-17 10:17                 ` George Dunlap
  2016-06-20  9:03                   ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-17 10:17 UTC (permalink / raw)
  To: Jan Beulich, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 16/06/16 10:55, Jan Beulich wrote:
>> Previously in the 2nd version, I used p2m_change_entry_type_global() to 
>> reset the
>> outstanding p2m_ioreq_server entries back to p2m_ram_rw asynchronously after
>> the de-registration. But we realized later that this approach means we 
>> can not support
>> live migration. And to recalculate the whole p2m table forcefully when 
>> de-registration
>> happens means too much cost.
>>
>> And further discussion with Paul was that we can leave the 
>> responsibility to reset p2m type
>> to the device model side, and even a device model fails to do so, the 
>> affected one will only
>> be the current VM, neither other VM nor hypervisor will get hurt.
>>
>> I thought we have reached agreement in the review process of version 2, 
>> so I removed
>> this part from version 3.
> 
> In which case I would appreciate the commit message to explain
> this (in particular I admit I don't recall why live migration would
> be affected by the p2m_change_entry_type_global() approach,
> but the request is also so that later readers have at least some
> source of information other than searching the mailing list).

Yes, I don't see why either.  You wouldn't de-register the ioreq server
until after the final sweep after the VM has been paused, right?  At
which point the lazy p2m re-calculation shouldn't really matter much I
don't think.

 -George




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-17 10:17                 ` George Dunlap
@ 2016-06-20  9:03                   ` Yu Zhang
  2016-06-20 10:10                     ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-20  9:03 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/17/2016 6:17 PM, George Dunlap wrote:
> On 16/06/16 10:55, Jan Beulich wrote:
>>> Previously in the 2nd version, I used p2m_change_entry_type_global() to
>>> reset the
>>> outstanding p2m_ioreq_server entries back to p2m_ram_rw asynchronously after
>>> the de-registration. But we realized later that this approach means we
>>> can not support
>>> live migration. And to recalculate the whole p2m table forcefully when
>>> de-registration
>>> happens means too much cost.
>>>
>>> And further discussion with Paul was that we can leave the
>>> responsibility to reset p2m type
>>> to the device model side, and even a device model fails to do so, the
>>> affected one will only
>>> be the current VM, neither other VM nor hypervisor will get hurt.
>>>
>>> I thought we have reached agreement in the review process of version 2,
>>> so I removed
>>> this part from version 3.
>> In which case I would appreciate the commit message to explain
>> this (in particular I admit I don't recall why live migration would
>> be affected by the p2m_change_entry_type_global() approach,
>> but the request is also so that later readers have at least some
>> source of information other than searching the mailing list).
> Yes, I don't see why either.  You wouldn't de-register the ioreq server
> until after the final sweep after the VM has been paused, right?  At
> which point the lazy p2m re-calculation shouldn't really matter much I
> don't think.

Oh, it seems I need to give some explanation, and sorry for the late reply.

IIUC, p2m_change_entry_type_global() only sets the e.emt field to an invalid
value and turns on the e.recalc flag; the real p2m reset is done in
resolve_misconfig() when an EPT misconfiguration happens or when
ept_set_entry() is called.

In the 2nd version patch, we leveraged this approach by adding
p2m_ioreq_server into P2M_CHANGEABLE_TYPES and triggering
p2m_change_entry_type_global() when an ioreq server is unbound, hoping that
later accesses to these gfns would reset the p2m type back to p2m_ram_rw.
And for the recalculation itself, it works.

However, there are conflicts if we take live migration into account: if live
migration is triggered by the user (unintentionally, maybe) during the GPU
emulation process, resolve_misconfig() will set all the outstanding
p2m_ioreq_server entries to p2m_log_dirty, which is not what we expected,
because our intention is to only reset the outdated p2m_ioreq_server entries
back to p2m_ram_rw. Adding special treatment for p2m_ioreq_server in
resolve_misconfig() is not enough, because we cannot judge whether the GPU
emulation is in process by checking if p2m->ioreq.server is NULL - the p2m
might be detached from ioreq server A (with some p2m_ioreq_server entries
left to be recalculated) and then attached to ioreq server B.

So one solution is to disallow the log-dirty feature in XenGT, i.e. just
return failure when enable_logdirty() is called from the toolstack. But I'm
afraid this would restrict XenGT's future live migration feature.

Another proposal is to reset the p2m type by traversing the EPT table
synchronously when the ioreq server is detached, but this approach is time
consuming.

So after further discussion with Paul, our conclusion is that resetting the
p2m type after the ioreq server detachment is not a must. The worst case is
that the wrong ioreq server gets notified, but this will not affect other
VMs or the hypervisor, and it should be the device model's responsibility to
take care of its correctness. And if XenGT live migration is to be supported
in the future, we can still leverage the log-dirty code to keep track of the
normal guest ram pages; for the emulated guest ram (i.e. GPU page tables),
the device model's cooperation would be necessary.
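The conflict being described can be sketched as a toy decision function (the enum and function below are illustrative, not Xen's resolve_misconfig()): a deferred global recalculation applies the *current* policy, so it cannot distinguish "outdated p2m_ioreq_server entry that should return to ram_rw" from "page that should become log_dirty during migration".

```c
/* Toy model of the lazy-recalculation conflict: with log-dirty mode
 * enabled, the recalculation clobbers outstanding ioreq_server
 * entries to log_dirty instead of restoring them to ram_rw. */
enum toy_p2m_type { toy_ram_rw, toy_log_dirty, toy_ioreq_server };

static enum toy_p2m_type toy_recalc(enum toy_p2m_type t,
                                    int logdirty_enabled)
{
    /* The lazy recalculation applies whatever the global policy is
     * at the time the entry is finally revisited... */
    if ( logdirty_enabled )
        return toy_log_dirty;  /* ...losing ioreq_server entries */

    /* Without log-dirty, outdated ioreq_server entries go back to
     * ram_rw as intended. */
    return t == toy_ioreq_server ? toy_ram_rw : t;
}
```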

Thanks
Yu





* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-16 10:02           ` Jan Beulich
  2016-06-16 11:18             ` Yu Zhang
@ 2016-06-20  9:05             ` Yu Zhang
  1 sibling, 0 replies; 68+ messages in thread
From: Yu Zhang @ 2016-06-20  9:05 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Kevin Tian, George Dunlap, Tim Deegan, xen-devel, Paul Durrant,
	zhiyuan.lv, Jun Nakajima



On 6/16/2016 6:02 PM, Jan Beulich wrote:
>>>> On 16.06.16 at 11:32, <yu.c.zhang@linux.intel.com> wrote:
>>> On 6/14/2016 6:45 PM, Jan Beulich wrote:
>>>>>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain
>>>> *d, ioservid_t id,
>>>>>>         return rc;
>>>>>>     }
>>>>>>     
>>>>>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>>>> +                                     uint16_t type, uint32_t flags)
>>>>> I see no reason why both can't be unsigned int.
>>>> Parameter type is passed in from the type field inside struct
>>>> xen_hvm_map_mem_type_to_ioreq_server,
>>>> which is a uint16_t, followed with a uint16_t pad. Now I am wondering,
>>>> may be we can just remove the pad
>>>> field in this structure and just define type as uint32_t.
>>> I think keeping the interface structure unchanged is the desirable
>>> route here. What I dislike is the passing around of non-natural
>>> width types, which is more expensive in terms of processing. I.e.
>>> as long as a fixed width type (which is necessary to be used in
>>> the public interface) fits in "unsigned int", that should be the
>>> respective internal type. Otherwise "unsigned long" etc.
>>>
>>> There are cases where even internally we indeed want to use
>>> fixed width types, and admittedly there are likely far more cases
>>> where internally fixed width types get used without good reason,
>>> but just like everywhere else - let's please not make this worse.
>>> IOW please use fixed width types only when you really need them.
>> OK. I can keep the interface, and using uint32_t type in the internal
>> routine
>> would mean an implicit type conversion from uint16_t, which I do not think
>> is a problem.
> Just to reiterate: Unless there is a specific need, please avoid
> fixed width integer types for any internal use.
>
>>>>>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>>>>>         default:
>>>>>>             return flags | _PAGE_NX_BIT;
>>>>>>         case p2m_grant_map_ro:
>>>>>> -    case p2m_ioreq_server:
>>>>>>             return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>>>>>> +    case p2m_ioreq_server:
>>>>>> +    {
>>>>>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>>>>>> +
>>>>>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>>>>>> +            return flags & ~_PAGE_RW;
>>>>>> +        else
>>>>>> +            return flags;
>>>>>> +    }
>>>>> Same here (for the missing _PAGE_NX) plus no need for braces.
>>>> I'll remove the brace. And we do not need to set the _PAGE_NX_BIT, like
>>>> the p2m_ram_ro case I guess.
>>> I hope you mean the inverse: You should set _PAGE_NX_BIT here.
>> Oh, right. I meant the reverse. Thanks for the reminder. :)
>> And I have a question,  here in p2m_type_to_flags(), I saw current code
>> uses _PAGE_NX_BIT
>> to disable the executable permission,  and I wonder, why don't we choose
>> the _PAGE_NX,
>> which is defined as:
>>
>> #define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)
>>
>> How do we know for sure that bit 63 from pte is not a reserved one
>> without checking
>> the cpu capability(the cpu_has_nx)? Is there any other reasons, i.e. the
>> page tables might
>> be shared with IOMMU?
> Please wait for Andrew to confirm this (or correct me) - there are
> some differences between AMD and Intel, and iirc the bit gets
> ignored by AMD when NX is off.
>

Hi Andrew, sorry to bother you. Any comments on this? Thanks! :)
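
For illustration, the p2m_ioreq_server case of p2m_type_to_flags() with
the fix from the review applied might look like the sketch below. This
is a standalone toy, not Xen code: the bit values, P2M_BASE_FLAGS
stand-in, and the helper name are assumptions made for the example.

```c
#include <stdint.h>

/* Illustrative sketch only -- these bit values are assumptions for the
 * example, not Xen's actual definitions. */
#define _PAGE_RW       (1ULL << 1)
#define _PAGE_NX_BIT   (1ULL << 63)
#define P2M_BASE_FLAGS (1ULL << 0)   /* stand-in for the usual base bits */

#define P2M_IOREQ_HANDLE_WRITE_ACCESS (1U << 0)

/*
 * Sketch of the p2m_ioreq_server case as amended per the review: keep
 * _PAGE_NX_BIT set (like p2m_ram_ro), grant write access by default,
 * and strip it when an ioreq server has claimed write handling.
 */
static uint64_t ioreq_server_flags(uint64_t flags, unsigned int ioreq_flags)
{
    flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;

    if ( ioreq_flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
        flags &= ~_PAGE_RW;

    return flags;
}
```

With the write-access flag set, guest writes fault and can be forwarded
to the ioreq server while reads still go straight to ram.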

Yu


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20  9:03                   ` Yu Zhang
@ 2016-06-20 10:10                     ` George Dunlap
  2016-06-20 10:25                       ` Jan Beulich
  2016-06-20 10:30                       ` Yu Zhang
  0 siblings, 2 replies; 68+ messages in thread
From: George Dunlap @ 2016-06-20 10:10 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 20/06/16 10:03, Yu Zhang wrote:
> 
> 
> On 6/17/2016 6:17 PM, George Dunlap wrote:
>> On 16/06/16 10:55, Jan Beulich wrote:
>>>> Previously in the 2nd version, I used p2m_change_entry_type_global() to
>>>> reset the
>>>> outstanding p2m_ioreq_server entries back to p2m_ram_rw
>>>> asynchronously after
>>>> the de-registration. But we realized later that this approach means we
>>>> can not support
>>>> live migration. And to recalculate the whole p2m table forcefully when
>>>> de-registration
>>>> happens means too much cost.
>>>>
>>>> And further discussion with Paul was that we can leave the
>>>> responsibility to reset p2m type
>>>> to the device model side, and even a device model fails to do so, the
>>>> affected one will only
>>>> be the current VM, neither other VM nor hypervisor will get hurt.
>>>>
>>>> I thought we have reached agreement in the review process of version 2,
>>>> so I removed
>>>> this part from version 3.
>>> In which case I would appreciate the commit message to explain
>>> this (in particular I admit I don't recall why live migration would
>>> be affected by the p2m_change_entry_type_global() approach,
>>> but the request is also so that later readers have at least some
>>> source of information other than searching the mailing list).
>> Yes, I don't see why either.  You wouldn't de-register the ioreq server
>> until after the final sweep after the VM has been paused, right?  At
>> which point the lazy p2m re-calculation shouldn't really matter much I
>> don't think.
> 
> Oh, seems I need to give some explanation, and sorry for the late reply.
> 
> IIUC, p2m_change_entry_type_global() only sets the e.emt field to an
> invalid value and turn on
> the e.recal flag; the real p2m reset is done in resolve_misconfig() when
> ept misconfiguration
> happens or when ept_set_entry() is called.
> 
> In the 2nd version patch, we leveraged this approach, by adding
> p2m_ioreq_server into the
> P2M_CHANGEABLE_TYPES, and triggering the p2m_change_entry_type_global()
> when an ioreq
> server is unbounded, hoping that later accesses to these gfns will reset
> the p2m type back to
> p2m_ram_rw. And for the recalculation itself, it works.
> 
> However, there are conflicts if we take live migration  into account,
> i.e. if the live migration is
> triggered by the user(unintentionally maybe) during the gpu emulation
> process, resolve_misconfig()
> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
> which is not what we expected,
> because our intention is to only reset the outdated p2m_ioreq_server
> entries back to p2m_ram_rw.

Well the real problem in the situation you describe is that a second
"lazy" p2m_change_entry_type_global() operation is starting before the
first one is finished.  All that's needed to resolve the situation is
that if you get a second p2m_change_entry_type_global() operation while
there are outstanding entries from the first type change, you have to
finish the first operation (i.e., go "eagerly" find all the
misconfigured entries and change them to the new type) before starting
the second one.


> So one solution is to disallow the log dirty feature in XenGT, i.e. just
> return failure when enable_logdirty()
> is called in toolstack. But I'm afraid this will restrict XenGT's future
> live migration feature.

I don't understand this -- you can return -EBUSY if live migration is
attempted while there are outstanding ioreq_server entries for the time
being, and at some point in the future when this actually works, you can
return success.

 -George
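
The scheme George describes above can be sketched as a toy model (not
Xen code; all names here are invented for this standalone example): a
lazy global type change only marks entries as outstanding, and a second
global change must first eagerly resolve anything still pending from
the first before marking its own entries.

```c
#include <stddef.h>

enum p2m_type { ram_rw, ioreq_server, logdirty };

#define NR_ENTRIES 8

struct toy_p2m {
    enum p2m_type type[NR_ENTRIES];
    int pending[NR_ENTRIES];        /* marked for lazy recalculation */
    enum p2m_type pending_target;   /* type a pending entry becomes */
    size_t outstanding;             /* entries not yet recalculated */
};

/* Lazily resolve one entry, as the misconfig fault handler would. */
static void resolve(struct toy_p2m *p2m, size_t i)
{
    if ( p2m->pending[i] )
    {
        p2m->type[i] = p2m->pending_target;
        p2m->pending[i] = 0;
        p2m->outstanding--;
    }
}

/* Global type change: first eagerly finish any change still in flight,
 * then lazily mark all matching entries for the new change. */
static void change_type_global(struct toy_p2m *p2m,
                               enum p2m_type from, enum p2m_type to)
{
    size_t i;

    for ( i = 0; p2m->outstanding && i < NR_ENTRIES; i++ )
        resolve(p2m, i);

    for ( i = 0; i < NR_ENTRIES; i++ )
        if ( p2m->type[i] == from )
        {
            p2m->pending[i] = 1;
            p2m->outstanding++;
        }
    p2m->pending_target = to;
}
```

In this toy, an ioreq_server -> ram_rw change followed by a
ram_rw -> logdirty change leaves every touched entry as logdirty, which
matches what two synchronous changes would have produced.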


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:10                     ` George Dunlap
@ 2016-06-20 10:25                       ` Jan Beulich
  2016-06-20 10:32                         ` George Dunlap
  2016-06-20 10:30                       ` Yu Zhang
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-20 10:25 UTC (permalink / raw)
  To: George Dunlap, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 20.06.16 at 12:10, <george.dunlap@citrix.com> wrote:
> On 20/06/16 10:03, Yu Zhang wrote:
>> However, there are conflicts if we take live migration  into account,
>> i.e. if the live migration is
>> triggered by the user(unintentionally maybe) during the gpu emulation
>> process, resolve_misconfig()
>> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
>> which is not what we expected,
>> because our intention is to only reset the outdated p2m_ioreq_server
>> entries back to p2m_ram_rw.
> 
> Well the real problem in the situation you describe is that a second
> "lazy" p2m_change_entry_type_global() operation is starting before the
> first one is finished.  All that's needed to resolve the situation is
> that if you get a second p2m_change_entry_type_global() operation while
> there are outstanding entries from the first type change, you have to
> finish the first operation (i.e., go "eagerly" find all the
> misconfigured entries and change them to the new type) before starting
> the second one.

Eager resolution of outstanding entries can't be the solution here, I
think, as that would - afaict - be as time consuming as doing the type
change synchronously right away. p2m_change_entry_type_global(),
at least right now, can be invoked freely without prior type changes
having fully propagated. The logic resolving mis-configured entries
simply needs to be able to know the correct new type. I can't see
why this logic shouldn't therefore be extensible to this new type
which can be in flight - after all, we ought to have a way to know what
type a particular GFN is supposed to be?

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:10                     ` George Dunlap
  2016-06-20 10:25                       ` Jan Beulich
@ 2016-06-20 10:30                       ` Yu Zhang
  2016-06-20 10:43                         ` George Dunlap
  2016-06-20 10:45                         ` Jan Beulich
  1 sibling, 2 replies; 68+ messages in thread
From: Yu Zhang @ 2016-06-20 10:30 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/20/2016 6:10 PM, George Dunlap wrote:
> On 20/06/16 10:03, Yu Zhang wrote:
>>
>> On 6/17/2016 6:17 PM, George Dunlap wrote:
>>> On 16/06/16 10:55, Jan Beulich wrote:
>>>>> Previously in the 2nd version, I used p2m_change_entry_type_global() to
>>>>> reset the
>>>>> outstanding p2m_ioreq_server entries back to p2m_ram_rw
>>>>> asynchronously after
>>>>> the de-registration. But we realized later that this approach means we
>>>>> can not support
>>>>> live migration. And to recalculate the whole p2m table forcefully when
>>>>> de-registration
>>>>> happens means too much cost.
>>>>>
>>>>> And further discussion with Paul was that we can leave the
>>>>> responsibility to reset p2m type
>>>>> to the device model side, and even a device model fails to do so, the
>>>>> affected one will only
>>>>> be the current VM, neither other VM nor hypervisor will get hurt.
>>>>>
>>>>> I thought we have reached agreement in the review process of version 2,
>>>>> so I removed
>>>>> this part from version 3.
>>>> In which case I would appreciate the commit message to explain
>>>> this (in particular I admit I don't recall why live migration would
>>>> be affected by the p2m_change_entry_type_global() approach,
>>>> but the request is also so that later readers have at least some
>>>> source of information other than searching the mailing list).
>>> Yes, I don't see why either.  You wouldn't de-register the ioreq server
>>> until after the final sweep after the VM has been paused, right?  At
>>> which point the lazy p2m re-calculation shouldn't really matter much I
>>> don't think.
>> Oh, seems I need to give some explanation, and sorry for the late reply.
>>
>> IIUC, p2m_change_entry_type_global() only sets the e.emt field to an
>> invalid value and turn on
>> the e.recal flag; the real p2m reset is done in resolve_misconfig() when
>> ept misconfiguration
>> happens or when ept_set_entry() is called.
>>
>> In the 2nd version patch, we leveraged this approach, by adding
>> p2m_ioreq_server into the
>> P2M_CHANGEABLE_TYPES, and triggering the p2m_change_entry_type_global()
>> when an ioreq
>> server is unbounded, hoping that later accesses to these gfns will reset
>> the p2m type back to
>> p2m_ram_rw. And for the recalculation itself, it works.
>>
>> However, there are conflicts if we take live migration  into account,
>> i.e. if the live migration is
>> triggered by the user(unintentionally maybe) during the gpu emulation
>> process, resolve_misconfig()
>> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
>> which is not what we expected,
>> because our intention is to only reset the outdated p2m_ioreq_server
>> entries back to p2m_ram_rw.
> Well the real problem in the situation you describe is that a second
> "lazy" p2m_change_entry_type_global() operation is starting before the
> first one is finished.  All that's needed to resolve the situation is
> that if you get a second p2m_change_entry_type_global() operation while
> there are outstanding entries from the first type change, you have to
> finish the first operation (i.e., go "eagerly" find all the
> misconfigured entries and change them to the new type) before starting
> the second one.

Thanks for your reply, George.  :)
I think this could also happen even when there's no first round of
p2m_change_entry_type_global(): resolve_misconfig() will change normal
p2m_ioreq_server entries to p2m_log_dirty as well.

By "go 'eagerly'", do you mean traversing the ept table? Wouldn't that
be time consuming too?

>
>> So one solution is to disallow the log dirty feature in XenGT, i.e. just
>> return failure when enable_logdirty()
>> is called in toolstack. But I'm afraid this will restrict XenGT's future
>> live migration feature.
> I don't understand this -- you can return -EBUSY if live migration is
> attempted while there are outstanding ioreq_server entries for the time
> being, and at some point in the future when this actually works, you can
> return success.
>

Well, the problem is we cannot easily tell if there are any outstanding
p2m_ioreq_server entries.
Besides, do you agree it is the responsibility of the device model to do
the cleaning?

Thanks
Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:25                       ` Jan Beulich
@ 2016-06-20 10:32                         ` George Dunlap
  2016-06-20 10:55                           ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-20 10:32 UTC (permalink / raw)
  To: Jan Beulich, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 20/06/16 11:25, Jan Beulich wrote:
>>>> On 20.06.16 at 12:10, <george.dunlap@citrix.com> wrote:
>> On 20/06/16 10:03, Yu Zhang wrote:
>>> However, there are conflicts if we take live migration  into account,
>>> i.e. if the live migration is
>>> triggered by the user(unintentionally maybe) during the gpu emulation
>>> process, resolve_misconfig()
>>> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
>>> which is not what we expected,
>>> because our intention is to only reset the outdated p2m_ioreq_server
>>> entries back to p2m_ram_rw.
>>
>> Well the real problem in the situation you describe is that a second
>> "lazy" p2m_change_entry_type_global() operation is starting before the
>> first one is finished.  All that's needed to resolve the situation is
>> that if you get a second p2m_change_entry_type_global() operation while
>> there are outstanding entries from the first type change, you have to
>> finish the first operation (i.e., go "eagerly" find all the
>> misconfigured entries and change them to the new type) before starting
>> the second one.
> 
> Eager resolution of outstanding entries can't be the solution here, I
> think, as that would - afaict - be as time consuming as doing the type
> change synchronously right away.

But isn't it the case that p2m_change_entry_type_global() is only
implemented for EPT?  So we've been doing the slow method for both
shadow and AMD HAP (whatever it's called these days) since the
beginning.  And in any case we'd only have to go for the "slow" case in
circumstances where the 2nd type change happened before the first one
had completed.

>  p2m_change_entry_type_global(),
> at least right now, can be invoked freely without prior type changes
> having fully propagated. The logic resolving mis-configured entries
> simply needs to be able to know the correct new type. I can't see
> why this logic shouldn't therefore be extensible to this new type
> which can be in flight - after we ought to have a way to know what
> type a particular GFN is supposed to be?

Actually, come to think of it -- since the first type change is meant to
convert all ioreq_server -> ram_rw, and the second is meant to change
all ram_rw -> logdirty,  is there any case in which we *wouldn't* want
the resulting type to be logdirty?  Isn't that exactly what we'd get if
we'd done both operations synchronously?

 -George


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:30                       ` Yu Zhang
@ 2016-06-20 10:43                         ` George Dunlap
  2016-06-20 10:45                         ` Jan Beulich
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-06-20 10:43 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 20/06/16 11:30, Yu Zhang wrote:
> 
> 
> On 6/20/2016 6:10 PM, George Dunlap wrote:
>> On 20/06/16 10:03, Yu Zhang wrote:
>>>
>>> On 6/17/2016 6:17 PM, George Dunlap wrote:
>>>> On 16/06/16 10:55, Jan Beulich wrote:
>>>>>> Previously in the 2nd version, I used
>>>>>> p2m_change_entry_type_global() to
>>>>>> reset the
>>>>>> outstanding p2m_ioreq_server entries back to p2m_ram_rw
>>>>>> asynchronously after
>>>>>> the de-registration. But we realized later that this approach
>>>>>> means we
>>>>>> can not support
>>>>>> live migration. And to recalculate the whole p2m table forcefully
>>>>>> when
>>>>>> de-registration
>>>>>> happens means too much cost.
>>>>>>
>>>>>> And further discussion with Paul was that we can leave the
>>>>>> responsibility to reset p2m type
>>>>>> to the device model side, and even a device model fails to do so, the
>>>>>> affected one will only
>>>>>> be the current VM, neither other VM nor hypervisor will get hurt.
>>>>>>
>>>>>> I thought we have reached agreement in the review process of
>>>>>> version 2,
>>>>>> so I removed
>>>>>> this part from version 3.
>>>>> In which case I would appreciate the commit message to explain
>>>>> this (in particular I admit I don't recall why live migration would
>>>>> be affected by the p2m_change_entry_type_global() approach,
>>>>> but the request is also so that later readers have at least some
>>>>> source of information other than searching the mailing list).
>>>> Yes, I don't see why either.  You wouldn't de-register the ioreq server
>>>> until after the final sweep after the VM has been paused, right?  At
>>>> which point the lazy p2m re-calculation shouldn't really matter much I
>>>> don't think.
>>> Oh, seems I need to give some explanation, and sorry for the late reply.
>>>
>>> IIUC, p2m_change_entry_type_global() only sets the e.emt field to an
>>> invalid value and turn on
>>> the e.recal flag; the real p2m reset is done in resolve_misconfig() when
>>> ept misconfiguration
>>> happens or when ept_set_entry() is called.
>>>
>>> In the 2nd version patch, we leveraged this approach, by adding
>>> p2m_ioreq_server into the
>>> P2M_CHANGEABLE_TYPES, and triggering the p2m_change_entry_type_global()
>>> when an ioreq
>>> server is unbounded, hoping that later accesses to these gfns will reset
>>> the p2m type back to
>>> p2m_ram_rw. And for the recalculation itself, it works.
>>>
>>> However, there are conflicts if we take live migration  into account,
>>> i.e. if the live migration is
>>> triggered by the user(unintentionally maybe) during the gpu emulation
>>> process, resolve_misconfig()
>>> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
>>> which is not what we expected,
>>> because our intention is to only reset the outdated p2m_ioreq_server
>>> entries back to p2m_ram_rw.
>> Well the real problem in the situation you describe is that a second
>> "lazy" p2m_change_entry_type_global() operation is starting before the
>> first one is finished.  All that's needed to resolve the situation is
>> that if you get a second p2m_change_entry_type_global() operation while
>> there are outstanding entries from the first type change, you have to
>> finish the first operation (i.e., go "eagerly" find all the
>> misconfigured entries and change them to the new type) before starting
>> the second one.
> 
> Thanks for your reply, George.  :)
> I think this could also happen even when there's no first round
> p2m_change_entry_type_global(),
> the resolve_misconfig() will also change normal p2m_ioreq_server entries
> back to p2m_log_dirty.
> 
> By "go 'eagerly'", do you mean traverse the ept table? Wouldn't that be
> time consuming
> also?

Yes, but it would only need to be done in the cases where there happened
to be a collision.  And isn't it the case that we have to do things the
long way for all non-EPT guests (either shadow or AMD HAP) anyway?

>>> So one solution is to disallow the log dirty feature in XenGT, i.e. just
>>> return failure when enable_logdirty()
>>> is called in toolstack. But I'm afraid this will restrict XenGT's future
>>> live migration feature.
>> I don't understand this -- you can return -EBUSY if live migration is
>> attempted while there are outstanding ioreq_server entries for the time
>> being, and at some point in the future when this actually works, you can
>> return success.
>>
> 
> Well, the problem is we cannot easily tell if there's any outstanding
> p2m_ioreq_server entries.

Well at very least we could count if we needed to. :-)

> Besides, do you agree it is the responsibility of device model to do the
> cleaning?

I don't necessarily think so.  When qemu exits, for instance, dom0 will
automatically unmap all the references dom0 had to the guests' RAM --
that's part of the job of what operating systems do.  It just seems like
a more robust interface to have Xen clean up regardless of what the
guest does.

 -George




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:30                       ` Yu Zhang
  2016-06-20 10:43                         ` George Dunlap
@ 2016-06-20 10:45                         ` Jan Beulich
  2016-06-20 11:06                           ` Yu Zhang
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-20 10:45 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 20.06.16 at 12:30, <yu.c.zhang@linux.intel.com> wrote:
> On 6/20/2016 6:10 PM, George Dunlap wrote:
>> On 20/06/16 10:03, Yu Zhang wrote:
>>> So one solution is to disallow the log dirty feature in XenGT, i.e. just
>>> return failure when enable_logdirty()
>>> is called in toolstack. But I'm afraid this will restrict XenGT's future
>>> live migration feature.
>> I don't understand this -- you can return -EBUSY if live migration is
>> attempted while there are outstanding ioreq_server entries for the time
>> being, and at some point in the future when this actually works, you can
>> return success.
>>
> 
> Well, the problem is we cannot easily tell if there's any outstanding 
> p2m_ioreq_server entries.

That's easy to address: Keep a running count.

Jan
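
Jan's running-count suggestion might be sketched like this (a standalone
toy with invented names; real code would hook the p2m entry update path
and have the log-dirty enable hypercall return -EBUSY):

```c
/* Standalone toy of the running-count idea -- names invented here. */

enum toy_type { toy_ram_rw, toy_ioreq_server, toy_logdirty };

struct toy_domain {
    unsigned long ioreq_entry_count; /* live p2m_ioreq_server entries */
};

/* Call on every p2m entry type transition; in real code this would sit
 * in the entry-update path so the count can never go stale. */
static void account_type_change(struct toy_domain *d,
                                enum toy_type old_t, enum toy_type new_t)
{
    if ( old_t == toy_ioreq_server )
        d->ioreq_entry_count--;
    if ( new_t == toy_ioreq_server )
        d->ioreq_entry_count++;
}

/* Toolstack-facing check: refuse log-dirty while any p2m_ioreq_server
 * entries remain outstanding. */
static int can_enable_logdirty(const struct toy_domain *d)
{
    return d->ioreq_entry_count == 0;
}
```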



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:32                         ` George Dunlap
@ 2016-06-20 10:55                           ` Jan Beulich
  2016-06-20 11:28                             ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-20 10:55 UTC (permalink / raw)
  To: George Dunlap, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 20.06.16 at 12:32, <george.dunlap@citrix.com> wrote:
> On 20/06/16 11:25, Jan Beulich wrote:
>>>>> On 20.06.16 at 12:10, <george.dunlap@citrix.com> wrote:
>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>> However, there are conflicts if we take live migration  into account,
>>>> i.e. if the live migration is
>>>> triggered by the user(unintentionally maybe) during the gpu emulation
>>>> process, resolve_misconfig()
>>>> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
>>>> which is not what we expected,
>>>> because our intention is to only reset the outdated p2m_ioreq_server
>>>> entries back to p2m_ram_rw.
>>>
>>> Well the real problem in the situation you describe is that a second
>>> "lazy" p2m_change_entry_type_global() operation is starting before the
>>> first one is finished.  All that's needed to resolve the situation is
>>> that if you get a second p2m_change_entry_type_global() operation while
>>> there are outstanding entries from the first type change, you have to
>>> finish the first operation (i.e., go "eagerly" find all the
>>> misconfigured entries and change them to the new type) before starting
>>> the second one.
>> 
>> Eager resolution of outstanding entries can't be the solution here, I
>> think, as that would - afaict - be as time consuming as doing the type
>> change synchronously right away.
> 
> But isn't it the case that p2m_change_entry_type_global() is only
> implemented for EPT?

Also for NPT, we're using a similar model in p2m-pt.c (see e.g. the
uses of RECALC_FLAGS - we're utilizing that _PAGE_USER being set
unconditionally leads to NPF). And since shadow sits on top of
p2m-pt, that should be covered too.

>  So we've been doing the slow method for both
> shadow and AMD HAP (whatever it's called these days) since the
> beginning.  And in any case we'd only have to go for the "slow" case in
> circumstances where the 2nd type change happened before the first one
> had completed.

We can't even tell when one has fully finished.

>>  p2m_change_entry_type_global(),
>> at least right now, can be invoked freely without prior type changes
>> having fully propagated. The logic resolving mis-configured entries
>> simply needs to be able to know the correct new type. I can't see
>> why this logic shouldn't therefore be extensible to this new type
>> which can be in flight - after we ought to have a way to know what
>> type a particular GFN is supposed to be?
> 
> Actually, come to think of it -- since the first type change is meant to
> convert all ioreq_server -> ram_rw, and the second is meant to change
> all ram_rw -> logdirty,  is there any case in which we *wouldn't* want
> the resulting type to be logdirty?  Isn't that exactly what we'd get if
> we'd done both operations synchronously?

I think Yu's concern is for pages which did not get converted back?
Or on the restore side? Otherwise - "yes" to both of your questions.

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:45                         ` Jan Beulich
@ 2016-06-20 11:06                           ` Yu Zhang
  2016-06-20 11:20                             ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-20 11:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima



On 6/20/2016 6:45 PM, Jan Beulich wrote:
>>>> On 20.06.16 at 12:30, <yu.c.zhang@linux.intel.com> wrote:
>> On 6/20/2016 6:10 PM, George Dunlap wrote:
>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>> So one solution is to disallow the log dirty feature in XenGT, i.e. just
>>>> return failure when enable_logdirty()
>>>> is called in toolstack. But I'm afraid this will restrict XenGT's future
>>>> live migration feature.
>>> I don't understand this -- you can return -EBUSY if live migration is
>>> attempted while there are outstanding ioreq_server entries for the time
>>> being, and at some point in the future when this actually works, you can
>>> return success.
>>>
>> Well, the problem is we cannot easily tell if there's any outstanding
>> p2m_ioreq_server entries.
> That's easy to address: Keep a running count.
>

Oh, sorry, let me try to clarify: by "outstanding p2m_ioreq_server
entries", I mean the entries with the p2m_ioreq_server type which have
not been set back to p2m_ram_rw by the device model when the ioreq
server detaches. But with asynchronous resetting, we cannot
differentiate these entries from the normal write-protected ones, which
also have the p2m_ioreq_server type set.

Thanks
Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 11:06                           ` Yu Zhang
@ 2016-06-20 11:20                             ` Jan Beulich
  2016-06-20 12:06                               ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-20 11:20 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 20.06.16 at 13:06, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 6/20/2016 6:45 PM, Jan Beulich wrote:
>>>>> On 20.06.16 at 12:30, <yu.c.zhang@linux.intel.com> wrote:
>>> On 6/20/2016 6:10 PM, George Dunlap wrote:
>>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>>> So one solution is to disallow the log dirty feature in XenGT, i.e. just
>>>>> return failure when enable_logdirty()
>>>>> is called in toolstack. But I'm afraid this will restrict XenGT's future
>>>>> live migration feature.
>>>> I don't understand this -- you can return -EBUSY if live migration is
>>>> attempted while there are outstanding ioreq_server entries for the time
>>>> being, and at some point in the future when this actually works, you can
>>>> return success.
>>>>
>>> Well, the problem is we cannot easily tell if there's any outstanding
>>> p2m_ioreq_server entries.
>> That's easy to address: Keep a running count.
> 
> Oh, sorry, let me try to clarify: here by "outstanding p2m_ioreq_server 
> entries", I mean the
> entries with p2m_ioreq_server type which have not been set back to 
> p2m_ram_rw by device
> model when the ioreq server detaches. But with asynchronous resetting, 
> we can not differentiate
> these entries with the normal write protected ones which also have the 
> p2m_ioreq_server set.

I guess I'm missing something here, because I can't see why we
can't distinguish them (and also can't arrange for that).

Jan



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 10:55                           ` Jan Beulich
@ 2016-06-20 11:28                             ` Yu Zhang
  2016-06-20 13:13                               ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-20 11:28 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/20/2016 6:55 PM, Jan Beulich wrote:
>>>> On 20.06.16 at 12:32, <george.dunlap@citrix.com> wrote:
>> On 20/06/16 11:25, Jan Beulich wrote:
>>>>>> On 20.06.16 at 12:10, <george.dunlap@citrix.com> wrote:
>>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>>> However, there are conflicts if we take live migration  into account,
>>>>> i.e. if the live migration is
>>>>> triggered by the user(unintentionally maybe) during the gpu emulation
>>>>> process, resolve_misconfig()
>>>>> will set all the outstanding p2m_ioreq_server entries to p2m_log_dirty,
>>>>> which is not what we expected,
>>>>> because our intention is to only reset the outdated p2m_ioreq_server
>>>>> entries back to p2m_ram_rw.
>>>> Well the real problem in the situation you describe is that a second
>>>> "lazy" p2m_change_entry_type_global() operation is starting before the
>>>> first one is finished.  All that's needed to resolve the situation is
>>>> that if you get a second p2m_change_entry_type_global() operation while
>>>> there are outstanding entries from the first type change, you have to
>>>> finish the first operation (i.e., go "eagerly" find all the
>>>> misconfigured entries and change them to the new type) before starting
>>>> the second one.
>>> Eager resolution of outstanding entries can't be the solution here, I
>>> think, as that would - afaict - be as time consuming as doing the type
>>> change synchronously right away.
>> But isn't it the case that p2m_change_entry_type_global() is only
>> implemented for EPT?
> Also for NPT, we're using a similar model in p2m-pt.c (see e.g. the
> uses of RECALC_FLAGS - we're utilizing the _PAGE_USER set
> unconditionally leads to NPF). And since shadow sits on top of
> p2m-pt, that should be covered too.
>
>>   So we've been doing the slow method for both
>> shadow and AMD HAP (whatever it's called these days) since the
>> beginning.  And in any case we'd only have to go for the "slow" case in
>> circumstances where the 2nd type change happened before the first one
>> had completed.
> We can't even tell when one have fully finished.

I agree, we have no idea whether the previous type change has fully
finished. Besides, IIUC, p2m_change_entry_type_global() is not that
slow a method, because it does not invalidate all the paging structure
entries at once; it just rewrites the upper-level ones, and the others
are updated in resolve_misconfig().

>
>>>   p2m_change_entry_type_global(),
>>> at least right now, can be invoked freely without prior type changes
>>> having fully propagated. The logic resolving mis-configured entries
>>> simply needs to be able to know the correct new type. I can't see
>>> why this logic shouldn't therefore be extensible to this new type
>>> which can be in flight - after we ought to have a way to know what
>>> type a particular GFN is supposed to be?
>> Actually, come to think of it -- since the first type change is meant to
>> convert all ioreq_server -> ram_rw, and the second is meant to change
>> all ram_rw -> logdirty,  is there any case in which we *wouldn't* want
>> the resulting type to be logdirty?  Isn't that exactly what we'd get if
>> we'd done both operations synchronously?
> I think Yu's concern is for pages which did not get converted back?
> Or on the restore side? Otherwise - "yes" to both of your questions.
>

Yes. My concern is that resolve_misconfig() cannot easily be extended
to differentiate the p2m_ioreq_server entries which need to be reset
from the normal p2m_ioreq_server entries. So my implementation in the
2nd version would run into the dilemmas I described if we take live
migration into account.

Thanks
Yu





^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 11:20                             ` Jan Beulich
@ 2016-06-20 12:06                               ` Yu Zhang
  2016-06-20 13:38                                 ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-20 12:06 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/20/2016 7:20 PM, Jan Beulich wrote:
>>>> On 20.06.16 at 13:06, <yu.c.zhang@linux.intel.com> wrote:
>> On 6/20/2016 6:45 PM, Jan Beulich wrote:
>>>>>> On 20.06.16 at 12:30, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 6/20/2016 6:10 PM, George Dunlap wrote:
>>>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>>>> So one solution is to disallow the log dirty feature in XenGT, i.e. just
>>>>>> return failure when enable_logdirty()
>>>>>> is called in toolstack. But I'm afraid this will restrict XenGT's future
>>>>>> live migration feature.
>>>>> I don't understand this -- you can return -EBUSY if live migration is
>>>>> attempted while there are outstanding ioreq_server entries for the time
>>>>> being, and at some point in the future when this actually works, you can
>>>>> return success.
>>>>>
>>>> Well, the problem is we cannot easily tell if there's any outstanding
>>>> p2m_ioreq_server entries.
>>> That's easy to address: Keep a running count.
>> Oh, sorry, let me try to clarify: here by "outstanding p2m_ioreq_server
>> entries", I mean the
>> entries with p2m_ioreq_server type which have not been set back to
>> p2m_ram_rw by device
>> model when the ioreq server detaches. But with asynchronous resetting,
>> we can not differentiate
>> these entries with the normal write protected ones which also have the
>> p2m_ioreq_server set.
> I guess I'm missing something here, because I can't see why we
> can't distinguish them (and also can't arrange for that).
>

Because both have the same p2m type and access rights.

Sorry, it's my duty to explain this more clearly, but I just realized
it's hard to describe. :)
Let me try to elaborate with an example:

Suppose resolve_misconfig() is modified to change all p2m_ioreq_server
entries (which also have the e.recalc flag turned on) back to
p2m_ram_rw. And suppose we have ioreq server 1, which emulates gfn A,
and ioreq server 2, which emulates gfn B:

1> At the beginning, ioreq server 1 is attached to p2m_ioreq_server,
and gfn A is write-protected by setting it to p2m_ioreq_server;

2> ioreq server 1 is detached from p2m_ioreq_server, leaving gfn A's
p2m type unchanged;

3> After the detachment of ioreq server 1,
p2m_change_entry_type_global() is called, and all ept entries are
invalidated;

4> Later, ioreq server 2 is attached to p2m_ioreq_server;

5> Gfn B is set to p2m_ioreq_server; although its corresponding ept
entry was invalidated, ept_set_entry() will trigger
resolve_misconfig(), which will set the p2m type of gfn B back to
p2m_ram_rw;

6> ept_set_entry() will then set gfn B's p2m type to p2m_ioreq_server.
And now we have two ept entries with the p2m_ioreq_server type - gfn
A's and gfn B's.

With no live migration, things could work fine - later accesses to
gfn A will ultimately change its type back to p2m_ram_rw.

However, if live migration is started (all pte entries invalidated
again), resolve_misconfig() would change both gfn A's and gfn B's p2m
types back to p2m_ram_rw, which means the emulation of gfn B would
fail.
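The sequence above can be played out as a toy model (plain C, not Xen
code; `gfn_type[]`, `recalc_pending[]`, `global_recalc()` and
`resolve()` are made-up names standing in for the p2m type field, the
e.recalc flag, p2m_change_entry_type_global() and the extended
resolve_misconfig()):

```c
#include <assert.h>
#include <stdbool.h>

enum p2m_type { p2m_ram_rw, p2m_ram_logdirty, p2m_ioreq_server };

#define GFN_A 0
#define GFN_B 1

static enum p2m_type gfn_type[2];
static bool recalc_pending[2];      /* models the e.recalc flag */

/* Models p2m_change_entry_type_global(): mark everything for recalc. */
static void global_recalc(void)
{
    recalc_pending[GFN_A] = recalc_pending[GFN_B] = true;
}

/* Models the extended resolve_misconfig(): any pending
 * p2m_ioreq_server entry is flipped back to p2m_ram_rw - it has no
 * way to tell gfn A (outdated) from gfn B (in active emulation). */
static void resolve(int gfn)
{
    if ( recalc_pending[gfn] && gfn_type[gfn] == p2m_ioreq_server )
        gfn_type[gfn] = p2m_ram_rw;
    recalc_pending[gfn] = false;
}

static void scenario(void)
{
    gfn_type[GFN_A] = p2m_ioreq_server;  /* 1> server 1 protects gfn A */
                                         /* 2> server 1 detaches       */
    global_recalc();                     /* 3> entries invalidated     */
                                         /* 4> server 2 attaches       */
    resolve(GFN_B);                      /* 5> recalc hits gfn B       */
    gfn_type[GFN_B] = p2m_ioreq_server;  /* 6> gfn B write-protected   */

    global_recalc();                     /* live migration starts      */
    resolve(GFN_A);                      /* gfn A reset, as intended   */
    resolve(GFN_B);                      /* gfn B loses protection too */
}
```

After scenario() runs, both gfns end up as p2m_ram_rw, which is the
failure described above: gfn B's write protection is lost while ioreq
server 2 still expects to emulate it.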

I tried to further extend the log-dirty logic to solve this conflict,
but failed, because we do not know for sure when the resetting of gfn
A will be performed, and the code cannot easily tell the different
expectations for gfn A and gfn B, because they both have the same p2m
type and access rights.

Hope you can understand this problem, and I would very much appreciate
any suggestion you have.
Anyway, thanks for your patience! :)

Thanks
yu



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 11:28                             ` Yu Zhang
@ 2016-06-20 13:13                               ` George Dunlap
  2016-06-21  7:42                                 ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-20 13:13 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 20/06/16 12:28, Yu Zhang wrote:
> 
> 
> On 6/20/2016 6:55 PM, Jan Beulich wrote:
>>>>> On 20.06.16 at 12:32, <george.dunlap@citrix.com> wrote:
>>> On 20/06/16 11:25, Jan Beulich wrote:
>>>>>>> On 20.06.16 at 12:10, <george.dunlap@citrix.com> wrote:
>>>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>>>> However, there are conflicts if we take live migration  into account,
>>>>>> i.e. if the live migration is
>>>>>> triggered by the user(unintentionally maybe) during the gpu emulation
>>>>>> process, resolve_misconfig()
>>>>>> will set all the outstanding p2m_ioreq_server entries to
>>>>>> p2m_log_dirty,
>>>>>> which is not what we expected,
>>>>>> because our intention is to only reset the outdated p2m_ioreq_server
>>>>>> entries back to p2m_ram_rw.
>>>>> Well the real problem in the situation you describe is that a second
>>>>> "lazy" p2m_change_entry_type_global() operation is starting before the
>>>>> first one is finished.  All that's needed to resolve the situation is
>>>>> that if you get a second p2m_change_entry_type_global() operation
>>>>> while
>>>>> there are outstanding entries from the first type change, you have to
>>>>> finish the first operation (i.e., go "eagerly" find all the
>>>>> misconfigured entries and change them to the new type) before starting
>>>>> the second one.
>>>> Eager resolution of outstanding entries can't be the solution here, I
>>>> think, as that would - afaict - be as time consuming as doing the type
>>>> change synchronously right away.
>>> But isn't it the case that p2m_change_entry_type_global() is only
>>> implemented for EPT?
>> Also for NPT, we're using a similar model in p2m-pt.c (see e.g. the
>> uses of RECALC_FLAGS - we're utilizing the _PAGE_USER set
>> unconditionally leads to NPF). And since shadow sits on top of
>> p2m-pt, that should be covered too.
>>
>>>   So we've been doing the slow method for both
>>> shadow and AMD HAP (whatever it's called these days) since the
>>> beginning.  And in any case we'd only have to go for the "slow" case in
>>> circumstances where the 2nd type change happened before the first one
>>> had completed.
>> We can't even tell when one have fully finished.
> 
> I agree, we have no idea if the previous type change is completely done.
> Besides, IIUC, the p2m_change_entry_type_gobal() is not a quite slow
> method, because it does
> not invalidate all the paging structure entries at once, it just writes
> the upper level ones, others
> are updated in resolve_misconfig().
> 
>>
>>>>   p2m_change_entry_type_global(),
>>>> at least right now, can be invoked freely without prior type changes
>>>> having fully propagated. The logic resolving mis-configured entries
>>>> simply needs to be able to know the correct new type. I can't see
>>>> why this logic shouldn't therefore be extensible to this new type
>>>> which can be in flight - after we ought to have a way to know what
>>>> type a particular GFN is supposed to be?
>>> Actually, come to think of it -- since the first type change is meant to
>>> convert all ioreq_server -> ram_rw, and the second is meant to change
>>> all ram_rw -> logdirty,  is there any case in which we *wouldn't* want
>>> the resulting type to be logdirty?  Isn't that exactly what we'd get if
>>> we'd done both operations synchronously?
>> I think Yu's concern is for pages which did not get converted back?
>> Or on the restore side? Otherwise - "yes" to both of your questions.
>>
> 
> Yes. My concern is that resolve_misconfig() can not easily be extended
> to differentiate the
> p2m_ioreq_server entries which need to be reset and the normal
> p2m_ioreq_server entries.

Under what circumstance should resolve_misconfig() change a
misconfigured entry into a p2m_ioreq_server entry?

 -George


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 12:06                               ` Yu Zhang
@ 2016-06-20 13:38                                 ` Jan Beulich
  2016-06-21  7:45                                   ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-20 13:38 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 20.06.16 at 14:06, <yu.c.zhang@linux.intel.com> wrote:
> Suppose resolve_misconfig() is modified to change all p2m_ioreq_server 
> entries(which also
> have e.recalc flag turned on) back to p2m_ram_rw. And suppose we have 
> ioreq server 1, which
> emulates gfn A, and ioreq server 2 which emulates gfn B:
> 
> 1> At the beginning, ioreq server 1 is attached to p2m_ioreq_server, and 
> gfn A is write protected
> by setting it to p2m_ioreq_server;
> 
> 2> ioreq server 1 is detached from p2m_ioreq_server, left gfn A's p2m 
> type unchanged;
> 
> 3> After the detachment of ioreq server 1, 
> p2m_change_entry_type_global() is called, all ept
> entries are invalidated;
> 
> 4> Later, ioreq server 2 is attached to p2m_ioreq_server;
> 
> 5> Gfn B is set to p2m_ioreq_server, although its corresponding ept 
> entry was invalidated,
> ept_set_entry() will trigger resolve_misconfig(), which will set the p2m 
> type of gfn B back to
> p2m_ram_rw;
> 
> 6> ept_set_entry() will set gfn B's p2m type to p2m_ioreq_server next; 
> And now, we have two
> ept entries with p2m_ioreq_server type - gfn A's and gfn B's.
> 
> With no live migration, things could work fine - later accesses to gfn A 
> will ultimately change
> its type back to p2m_ram_rw.
> 
> However, if live migration is started(all pte entries invalidated 
> again), resolve_misconfig() would
> change both gfn A's and gfn B's p2m type back to p2m_ram_rw, which means 
> the emulation of
> gfn B would fail.

Why would it? Changes to p2m_ram_logdirty won't alter
p2m_ioreq_server entries, and hence changes from it back to
p2m_ram_rw won't either.

And then - didn't we mean to disable that part of XenGT during
migration, i.e. temporarily accept the higher performance
overhead without the p2m_ioreq_server entries? In which case
flipping everything back to p2m_ram_rw after (completed or
canceled) migration would be exactly what we want. The (new
or previous) ioreq server should attach only afterwards, and
can then freely re-establish any p2m_ioreq_server entries it
deems necessary.

Jan



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 13:13                               ` George Dunlap
@ 2016-06-21  7:42                                 ` Yu Zhang
  0 siblings, 0 replies; 68+ messages in thread
From: Yu Zhang @ 2016-06-21  7:42 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/20/2016 9:13 PM, George Dunlap wrote:
> On 20/06/16 12:28, Yu Zhang wrote:
>>
>> On 6/20/2016 6:55 PM, Jan Beulich wrote:
>>>>>> On 20.06.16 at 12:32, <george.dunlap@citrix.com> wrote:
>>>> On 20/06/16 11:25, Jan Beulich wrote:
>>>>>>>> On 20.06.16 at 12:10, <george.dunlap@citrix.com> wrote:
>>>>>> On 20/06/16 10:03, Yu Zhang wrote:
>>>>>>> However, there are conflicts if we take live migration  into account,
>>>>>>> i.e. if the live migration is
>>>>>>> triggered by the user(unintentionally maybe) during the gpu emulation
>>>>>>> process, resolve_misconfig()
>>>>>>> will set all the outstanding p2m_ioreq_server entries to
>>>>>>> p2m_log_dirty,
>>>>>>> which is not what we expected,
>>>>>>> because our intention is to only reset the outdated p2m_ioreq_server
>>>>>>> entries back to p2m_ram_rw.
>>>>>> Well the real problem in the situation you describe is that a second
>>>>>> "lazy" p2m_change_entry_type_global() operation is starting before the
>>>>>> first one is finished.  All that's needed to resolve the situation is
>>>>>> that if you get a second p2m_change_entry_type_global() operation
>>>>>> while
>>>>>> there are outstanding entries from the first type change, you have to
>>>>>> finish the first operation (i.e., go "eagerly" find all the
>>>>>> misconfigured entries and change them to the new type) before starting
>>>>>> the second one.
>>>>> Eager resolution of outstanding entries can't be the solution here, I
>>>>> think, as that would - afaict - be as time consuming as doing the type
>>>>> change synchronously right away.
>>>> But isn't it the case that p2m_change_entry_type_global() is only
>>>> implemented for EPT?
>>> Also for NPT, we're using a similar model in p2m-pt.c (see e.g. the
>>> uses of RECALC_FLAGS - we're utilizing the _PAGE_USER set
>>> unconditionally leads to NPF). And since shadow sits on top of
>>> p2m-pt, that should be covered too.
>>>
>>>>    So we've been doing the slow method for both
>>>> shadow and AMD HAP (whatever it's called these days) since the
>>>> beginning.  And in any case we'd only have to go for the "slow" case in
>>>> circumstances where the 2nd type change happened before the first one
>>>> had completed.
>>> We can't even tell when one have fully finished.
>> I agree, we have no idea if the previous type change is completely done.
>> Besides, IIUC, the p2m_change_entry_type_gobal() is not a quite slow
>> method, because it does
>> not invalidate all the paging structure entries at once, it just writes
>> the upper level ones, others
>> are updated in resolve_misconfig().
>>
>>>>>    p2m_change_entry_type_global(),
>>>>> at least right now, can be invoked freely without prior type changes
>>>>> having fully propagated. The logic resolving mis-configured entries
>>>>> simply needs to be able to know the correct new type. I can't see
>>>>> why this logic shouldn't therefore be extensible to this new type
>>>>> which can be in flight - after we ought to have a way to know what
>>>>> type a particular GFN is supposed to be?
>>>> Actually, come to think of it -- since the first type change is meant to
>>>> convert all ioreq_server -> ram_rw, and the second is meant to change
>>>> all ram_rw -> logdirty,  is there any case in which we *wouldn't* want
>>>> the resulting type to be logdirty?  Isn't that exactly what we'd get if
>>>> we'd done both operations synchronously?
>>> I think Yu's concern is for pages which did not get converted back?
>>> Or on the restore side? Otherwise - "yes" to both of your questions.
>>>
>> Yes. My concern is that resolve_misconfig() can not easily be extended
>> to differentiate the
>> p2m_ioreq_server entries which need to be reset and the normal
>> p2m_ioreq_server entries.
> Under what circumstance should resolve_misconfig() change a
> misconfigured entry into a p2m_ioreq_server entry?

Oh, I did not mean that. Routine resolve_misconfig() shall not change
any entry to the p2m_ioreq_server type. I hoped this routine could be
changed to reset outdated p2m_ioreq_server entries (by "outdated" I
refer to the entries which are no longer tracked by an ioreq server
but remain as p2m_ioreq_server) back to the p2m_ram_rw type.

Later I realized that we may also change the normal p2m_ioreq_server
entries (by "normal" I mean the gfns which are in the emulation
process) if live migration is triggered during the emulation. And it's
hard to distinguish the outdated p2m_ioreq_server entries from the
normal ones.

Thanks
Yu


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-20 13:38                                 ` Jan Beulich
@ 2016-06-21  7:45                                   ` Yu Zhang
  2016-06-21  8:22                                     ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-21  7:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima



On 6/20/2016 9:38 PM, Jan Beulich wrote:
>>>> On 20.06.16 at 14:06, <yu.c.zhang@linux.intel.com> wrote:
>> Suppose resolve_misconfig() is modified to change all p2m_ioreq_server
>> entries(which also
>> have e.recalc flag turned on) back to p2m_ram_rw. And suppose we have
>> ioreq server 1, which
>> emulates gfn A, and ioreq server 2 which emulates gfn B:
>>
>> 1> At the beginning, ioreq server 1 is attached to p2m_ioreq_server, and
>> gfn A is write protected
>> by setting it to p2m_ioreq_server;
>>
>> 2> ioreq server 1 is detached from p2m_ioreq_server, left gfn A's p2m
>> type unchanged;
>>
>> 3> After the detachment of ioreq server 1,
>> p2m_change_entry_type_global() is called, all ept
>> entries are invalidated;
>>
>> 4> Later, ioreq server 2 is attached to p2m_ioreq_server;
>>
>> 5> Gfn B is set to p2m_ioreq_server, although its corresponding ept
>> entry was invalidated,
>> ept_set_entry() will trigger resolve_misconfig(), which will set the p2m
>> type of gfn B back to
>> p2m_ram_rw;
>>
>> 6> ept_set_entry() will set gfn B's p2m type to p2m_ioreq_server next;
>> And now, we have two
>> ept entries with p2m_ioreq_server type - gfn A's and gfn B's.
>>
>> With no live migration, things could work fine - later accesses to gfn A
>> will ultimately change
>> its type back to p2m_ram_rw.
>>
>> However, if live migration is started(all pte entries invalidated
>> again), resolve_misconfig() would
>> change both gfn A's and gfn B's p2m type back to p2m_ram_rw, which means
>> the emulation of
>> gfn B would fail.
> Why would it? Changes to p2m_ram_logdirty won't alter
> p2m_ioreq_server entries, and hence changes from it back to
> p2m_ram_rw won't either.

Oh, the above example is based on the assumption that
resolve_misconfig() is extended to handle the p2m_ioreq_server case
(see my "Suppose resolve_misconfig() is modified...").
The code change could be something like below:

@@ -542,10 +542,14 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)

-                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
+                    if ( e.recalc )
                     {
-                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
-                                     ? p2m_ram_logdirty : p2m_ram_rw;
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                             e.sa_p2mt = p2m_ram_rw;
+                         else if ( p2m_is_changeable(e.sa_p2mt) )
+                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
+                                         ? p2m_ram_logdirty : p2m_ram_rw;
+
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                     }
                     e.recalc = 0;

With changes like this, the p2m types of both gfn A and gfn B from the
above example would be set to p2m_ram_rw if log dirty is enabled.
So that's what I am worrying about - if a user unintentionally typed
"xl save" during the emulation process, the emulation would fail. We
can let enable_logdirty() return false if XenGT is detected, but we
still wish to keep the log-dirty feature.

>
> And then - didn't we mean to disable that part of XenGT during
> migration, i.e. temporarily accept the higher performance
> overhead without the p2m_ioreq_server entries? In which case
> flipping everything back to p2m_ram_rw after (completed or
> canceled) migration would be exactly what we want. The (new
> or previous) ioreq server should attach only afterwards, and
> can then freely re-establish any p2m_ioreq_server entries it
> deems necessary.
>

Well, I agree this part of XenGT should be disabled during migration.
But in that case I think it's the device model's job to trigger the
p2m type flipping (i.e. by calling HVMOP_set_mem_type). And the device
model should be notified first when the migration begins - we may need
new patches to do so if XenGT is going to support vGPU migration in
the future.

Thanks
Yu


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-21  7:45                                   ` Yu Zhang
@ 2016-06-21  8:22                                     ` Jan Beulich
  2016-06-21  9:16                                       ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-21  8:22 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 21.06.16 at 09:45, <yu.c.zhang@linux.intel.com> wrote:
> On 6/20/2016 9:38 PM, Jan Beulich wrote:
>>>>> On 20.06.16 at 14:06, <yu.c.zhang@linux.intel.com> wrote:
>>> However, if live migration is started(all pte entries invalidated
>>> again), resolve_misconfig() would
>>> change both gfn A's and gfn B's p2m type back to p2m_ram_rw, which means
>>> the emulation of
>>> gfn B would fail.
>> Why would it? Changes to p2m_ram_logdirty won't alter
>> p2m_ioreq_server entries, and hence changes from it back to
>> p2m_ram_rw won't either.
> 
> Oh, above example is based on the assumption that resolve_misconfig() is 
> extended
> to handle the p2m_ioreq_server case(see my "Suppose resolve_misconfig() 
> is modified...").
> The code change could be something like below:
> 
> @@ -542,10 +542,14 @@ static int resolve_misconfig(struct p2m_domain 
> *p2m, unsigned long gfn)
> 
> -                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
> +                   if ( e.recalc )
>                       {
> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + 
> i, gfn + i)
> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                             e.sa_p2mt = p2m_ram_rw;
> +                         else if ( p2m_is_changeable(e.sa_p2mt) )
> +                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn 
> + i, gfn + i)
> +                                         ? p2m_ram_logdirty : p2m_ram_rw;
> +
>                            ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, 
> e.access);
>                       }
>                       e.recalc = 0;
> 
> With changes like this, both p2m types of gfn A and gfn B from above example
> would be set to p2m_ram_rw if log dirty is enabled.

Above modification would convert _all_ p2m_ioreq_server into
p2m_ram_rw, irrespective of log-dirty mode being active. Which
I don't think is what you want.

> So that's what I am worrying - if a user unintentionally typed "xl save" 
> during
> the emulation process , the emulation would fail. We can let the 
> enable_logdirty()
> return false if XenGT is detected, but we still wish to keep the log 
> dirty feature.

Well, enabling log-dirty mode would succeed as soon as all
p2m_ioreq_server pages got converted back to normal ones (by
the device model). So an unintentional "xl save" would simply fail.
Is there any problem with that?

>> And then - didn't we mean to disable that part of XenGT during
>> migration, i.e. temporarily accept the higher performance
>> overhead without the p2m_ioreq_server entries? In which case
>> flipping everything back to p2m_ram_rw after (completed or
>> canceled) migration would be exactly what we want. The (new
>> or previous) ioreq server should attach only afterwards, and
>> can then freely re-establish any p2m_ioreq_server entries it
>> deems necessary.
>>
> 
> Well, I agree this part of XenGT should be disabled during migration. 
> But in such
> case I think it's device model's job to trigger the p2m type 
> flipping(i.e. by calling
> HVMOP_set_mem_type).

I agree - this would seem to be the simpler model here, even though
(as George validly says) the more consistent model would be for the
hypervisor to do the cleanup. Such cleanup would imo be reasonable
only if there was an easy way for the hypervisor to enumerate all
p2m_ioreq_server pages.
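One way such an enumeration could look, as a rough toy sketch (plain
C; `claimed_list`, `claim_gfn()` and `detach_cleanup()` are invented
names, not existing Xen interfaces): record each claimed gfn when the
type is set, and walk the record on detach.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: remember every gfn the ioreq server claims, so a detach
 * can walk the list and reset the types eagerly, instead of relying
 * on the lazy recalc to stumble over them later. */
struct claimed_gfn {
    unsigned long gfn;
    struct claimed_gfn *next;
};

static struct claimed_gfn *claimed_list;

/* Called when a gfn is set to p2m_ioreq_server for this server. */
static void claim_gfn(unsigned long gfn)
{
    struct claimed_gfn *c = malloc(sizeof(*c));

    assert(c != NULL);
    c->gfn = gfn;
    c->next = claimed_list;
    claimed_list = c;
}

/* Called on detach; returns how many entries were cleaned up. */
static unsigned int detach_cleanup(void)
{
    unsigned int n = 0;

    while ( claimed_list )
    {
        struct claimed_gfn *c = claimed_list;

        claimed_list = c->next;
        /* real code would reset the p2m type of c->gfn here */
        free(c);
        n++;
    }
    return n;
}
```

The cost is extra bookkeeping on every type change - which is close to
the rangeset approach that was dropped earlier, so whether this is
acceptable depends on the same scalability concerns.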

> And the device model should be notified first when the
> migration begins - we may need new patches to do so if XenGT is going to 
> support
> vGPU migration in the future.

Quite possible.

Jan



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-21  8:22                                     ` Jan Beulich
@ 2016-06-21  9:16                                       ` Yu Zhang
  2016-06-21  9:47                                         ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-21  9:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima



On 6/21/2016 4:22 PM, Jan Beulich wrote:
>>>> On 21.06.16 at 09:45, <yu.c.zhang@linux.intel.com> wrote:
>> On 6/20/2016 9:38 PM, Jan Beulich wrote:
>>>>>> On 20.06.16 at 14:06, <yu.c.zhang@linux.intel.com> wrote:
>>>> However, if live migration is started(all pte entries invalidated
>>>> again), resolve_misconfig() would
>>>> change both gfn A's and gfn B's p2m type back to p2m_ram_rw, which means
>>>> the emulation of
>>>> gfn B would fail.
>>> Why would it? Changes to p2m_ram_logdirty won't alter
>>> p2m_ioreq_server entries, and hence changes from it back to
>>> p2m_ram_rw won't either.
>> Oh, above example is based on the assumption that resolve_misconfig() is
>> extended
>> to handle the p2m_ioreq_server case(see my "Suppose resolve_misconfig()
>> is modified...").
>> The code change could be something like below:
>>
>> @@ -542,10 +542,14 @@ static int resolve_misconfig(struct p2m_domain
>> *p2m, unsigned long gfn)
>>
>> -                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>> +                   if ( e.recalc )
>>                        {
>> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn +
>> i, gfn + i)
>> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>> +                             e.sa_p2mt = p2m_ram_rw;
>> +                         else if ( p2m_is_changeable(e.sa_p2mt) )
>> +                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>> + i, gfn + i)
>> +                                         ? p2m_ram_logdirty : p2m_ram_rw;
>> +
>>                             ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
>> e.access);
>>                        }
>>                        e.recalc = 0;
>>
>> With changes like this, both p2m types of gfn A and gfn B from above example
>> would be set to p2m_ram_rw if log dirty is enabled.
> Above modification would convert _all_ p2m_ioreq_server into
> p2m_ram_rw, irrespective of log-dirty mode being active. Which
> I don't think is what you want.

Well, this is another situation I found very interesting: without log-dirty,
this approach actually works. :) And the reasons are:

- resolve_misconfig() will only recalculate entries which have e.recalc
flag set;

- For the outdated p2m_ioreq_server entries, this routine will reset
them back to p2m_ram_rw;

- For the new p2m_ioreq_server entries, their e.recalc flag will first
have been cleared to 0 by ept_set_entry() (it is ept_set_entry(), not
resolve_misconfig(), that sets the p2m type when hvmop_set_mem_type is
invoked);

- Yet live migration will turn on the recalc flag for all entries again...

You can see this in steps 3> to 6> in my previous example. :)

>
>> So that's what I am worrying - if a user unintentionally typed "xl save"
>> during
>> the emulation process , the emulation would fail. We can let the
>> enable_logdirty()
>> return false if XenGT is detected, but we still wish to keep the log
>> dirty feature.
> Well, enabling log-dirty mode would succeed as soon as all
> p2m_ioreq_server pages got converted back to normal ones (by
> the device model). So an unintentional "xl save" would simply fail.
> Is there any problem with that?

Well, I agree.
Since this patchset is only about ioreq server changes, my plan is to keep
the log dirty logic as it is for now; in the future, if we are going to
support vGPU migration, we can consider letting "xl save" simply fail if
the device model side is not ready (i.e. has not finished its memory type
cleanup tasks).

>>> And then - didn't we mean to disable that part of XenGT during
>>> migration, i.e. temporarily accept the higher performance
>>> overhead without the p2m_ioreq_server entries? In which case
>>> flipping everything back to p2m_ram_rw after (completed or
>>> canceled) migration would be exactly what we want. The (new
>>> or previous) ioreq server should attach only afterwards, and
>>> can then freely re-establish any p2m_ioreq_server entries it
>>> deems necessary.
>>>
>> Well, I agree this part of XenGT should be disabled during migration.
>> But in such
>> case I think it's device model's job to trigger the p2m type
>> flipping(i.e. by calling
>> HVMOP_set_mem_type).
> I agree - this would seem to be the simpler model here, despite (as
> George validly says) the more consistent model would be for the
> hypervisor to do the cleanup. Such cleanup would imo be reasonable
> only if there was an easy way for the hypervisor to enumerate all
> p2m_ioreq_server pages.

Well, for me, the "easy way" means we should avoid traversing the whole ept
paging structure all at once, right? I have not figured out any clean
solution on the hypervisor side; that's one reason I'd like to leave this
job to the device model side (another reason is that I do think the device
model should take this responsibility).

Thanks
Yu


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-21  9:16                                       ` Yu Zhang
@ 2016-06-21  9:47                                         ` Jan Beulich
  2016-06-21 10:00                                           ` Yu Zhang
  2016-06-21 14:38                                           ` George Dunlap
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-21  9:47 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 21.06.16 at 11:16, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 6/21/2016 4:22 PM, Jan Beulich wrote:
>>>>> On 21.06.16 at 09:45, <yu.c.zhang@linux.intel.com> wrote:
>>> On 6/20/2016 9:38 PM, Jan Beulich wrote:
>>>>>>> On 20.06.16 at 14:06, <yu.c.zhang@linux.intel.com> wrote:
>>>>> However, if live migration is started(all pte entries invalidated
>>>>> again), resolve_misconfig() would
>>>>> change both gfn A's and gfn B's p2m type back to p2m_ram_rw, which means
>>>>> the emulation of
>>>>> gfn B would fail.
>>>> Why would it? Changes to p2m_ram_logdirty won't alter
>>>> p2m_ioreq_server entries, and hence changes from it back to
>>>> p2m_ram_rw won't either.
>>> Oh, above example is based on the assumption that resolve_misconfig() is
>>> extended
>>> to handle the p2m_ioreq_server case(see my "Suppose resolve_misconfig()
>>> is modified...").
>>> The code change could be something like below:
>>>
>>> @@ -542,10 +542,14 @@ static int resolve_misconfig(struct p2m_domain
>>> *p2m, unsigned long gfn)
>>>
>>> -                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>> +                   if ( e.recalc )
>>>                        {
>>> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn +
>>> i, gfn + i)
>>> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>> +                             e.sa_p2mt = p2m_ram_rw;
>>> +                         else if ( p2m_is_changeable(e.sa_p2mt) )
>>> +                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>> + i, gfn + i)
>>> +                                         ? p2m_ram_logdirty : p2m_ram_rw;
>>> +
>>>                             ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
>>> e.access);
>>>                        }
>>>                        e.recalc = 0;
>>>
>>> With changes like this, both p2m types of gfn A and gfn B from above example
>>> would be set to p2m_ram_rw if log dirty is enabled.
>> Above modification would convert _all_ p2m_ioreq_server into
>> p2m_ram_rw, irrespective of log-dirty mode being active. Which
>> I don't think is what you want.
> 
> Well, this is another situation I found very interesting: without log-dirty,
> this approach actually works. :)

And what if the recalc flag gets set for some other reason?

>>>> And then - didn't we mean to disable that part of XenGT during
>>>> migration, i.e. temporarily accept the higher performance
>>>> overhead without the p2m_ioreq_server entries? In which case
>>>> flipping everything back to p2m_ram_rw after (completed or
>>>> canceled) migration would be exactly what we want. The (new
>>>> or previous) ioreq server should attach only afterwards, and
>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>> deems necessary.
>>>>
>>> Well, I agree this part of XenGT should be disabled during migration.
>>> But in such
>>> case I think it's device model's job to trigger the p2m type
>>> flipping(i.e. by calling
>>> HVMOP_set_mem_type).
>> I agree - this would seem to be the simpler model here, despite (as
>> George validly says) the more consistent model would be for the
>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>> only if there was an easy way for the hypervisor to enumerate all
>> p2m_ioreq_server pages.
> 
> Well, for me, the "easy way" means we should avoid traversing the whole ept
> paging structure all at once, right?

Yes.

> I have not figured out any clean 
> solution
> in hypervisor side, that's one reason I'd like to left this job to 
> device model
> side(another reason is that I do think device model should take this 
> responsibility).

Let's see if we can get George to agree.

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-21  9:47                                         ` Jan Beulich
@ 2016-06-21 10:00                                           ` Yu Zhang
  2016-06-21 14:38                                           ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: Yu Zhang @ 2016-06-21 10:00 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, Andrew Cooper, Tim Deegan, George Dunlap, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/21/2016 5:47 PM, Jan Beulich wrote:
>> On 6/21/2016 4:22 PM, Jan Beulich wrote:
>>>
>>> Above modification would convert _all_ p2m_ioreq_server into
>>> p2m_ram_rw, irrespective of log-dirty mode being active. Which
>>> I don't think is what you want.
>> Well, this is another situation I found very interesting: without log-dirty,
>> this approach actually works. :)
> And what if the recalc flag gets set for some other reason?

Then the previous assumption will not hold. :)
But for now, the log dirty code is the only place I have found in the
hypervisor that turns on the recalc flag.

>>> I agree - this would seem to be the simpler model here, despite (as
>>> George validly says) the more consistent model would be for the
>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>> only if there was an easy way for the hypervisor to enumerate all
>>> p2m_ioreq_server pages.
>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>> paging structure all at once, right?
> Yes.
>
>> I have not figured out any clean
>> solution
>> in hypervisor side, that's one reason I'd like to left this job to
>> device model
>> side(another reason is that I do think device model should take this
>> responsibility).
> Let's see if we can get George to agree.

OK. Thanks!

Yu


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-21  9:47                                         ` Jan Beulich
  2016-06-21 10:00                                           ` Yu Zhang
@ 2016-06-21 14:38                                           ` George Dunlap
  2016-06-22  6:39                                             ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-21 14:38 UTC (permalink / raw)
  To: Jan Beulich, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 21/06/16 10:47, Jan Beulich wrote:
>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>> migration, i.e. temporarily accept the higher performance
>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>> canceled) migration would be exactly what we want. The (new
>>>>> or previous) ioreq server should attach only afterwards, and
>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>> deems necessary.
>>>>>
>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>> But in such
>>>> case I think it's device model's job to trigger the p2m type
>>>> flipping(i.e. by calling
>>>> HVMOP_set_mem_type).
>>> I agree - this would seem to be the simpler model here, despite (as
>>> George validly says) the more consistent model would be for the
>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>> only if there was an easy way for the hypervisor to enumerate all
>>> p2m_ioreq_server pages.
>>
>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>> paging structure all at once, right?
> 
> Yes.

Does calling p2m_change_entry_type_global() not satisfy this requirement?

>> I have not figured out any clean 
>> solution
>> in hypervisor side, that's one reason I'd like to left this job to 
>> device model
>> side(another reason is that I do think device model should take this 
>> responsibility).
> 
> Let's see if we can get George to agree.

Well I had in principle already agreed to letting this be the interface
on the previous round of patches; we're having this discussion because
you (Jan) asked about what happens if an ioreq server is de-registered
while there are still outstanding p2m types. :-)

I do think having Xen change the type makes the most sense, but if
you're happy to leave that up to the ioreq server, I'm OK with things
being done that way as well.  I think we can probably change it later if
we want.

 -George


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-21 14:38                                           ` George Dunlap
@ 2016-06-22  6:39                                             ` Jan Beulich
  2016-06-22  8:38                                               ` Yu Zhang
  2016-06-22  9:16                                               ` George Dunlap
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-22  6:39 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>> deems necessary.
>>>>>>
>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>> But in such
>>>>> case I think it's device model's job to trigger the p2m type
>>>>> flipping(i.e. by calling
>>>>> HVMOP_set_mem_type).
>>>> I agree - this would seem to be the simpler model here, despite (as
>>>> George validly says) the more consistent model would be for the
>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>> only if there was an easy way for the hypervisor to enumerate all
>>>> p2m_ioreq_server pages.
>>>
>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>> paging structure all at once, right?
>> 
>> Yes.
> 
> Does calling p2m_change_entry_type_global() not satisfy this requirement?

Not really - that addresses the "low overhead" aspect, but not the
"enumerate all such entries" one.

>>> I have not figured out any clean 
>>> solution
>>> in hypervisor side, that's one reason I'd like to left this job to 
>>> device model
>>> side(another reason is that I do think device model should take this 
>>> responsibility).
>> 
>> Let's see if we can get George to agree.
> 
> Well I had in principle already agreed to letting this be the interface
> on the previous round of patches; we're having this discussion because
> you (Jan) asked about what happens if an ioreq server is de-registered
> while there are still outstanding p2m types. :-)

Indeed. Yet so far I understood you didn't like de-registration to
both not do the cleanup itself and fail if there are outstanding
entries.

> I do think having Xen change the type makes the most sense, but if
> you're happy to leave that up to the ioreq server, I'm OK with things
> being done that way as well.  I think we can probably change it later if
> we want.

Yes, since ioreq server interfaces will all be unstable ones, that
shouldn't be a problem. Albeit that's only the theory. With the call
coming from the device model, we'd need to make sure to put all
the logic (if any) to deal with the hypervisor implementation details
into libxc, so the caller of the libxc interface won't need to change.
I've learned during putting together the hvmctl series that this
wasn't done cleanly enough for one of the existing interfaces (see
patch 10 of that series).

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  6:39                                             ` Jan Beulich
@ 2016-06-22  8:38                                               ` Yu Zhang
  2016-06-22  9:11                                                 ` Jan Beulich
  2016-06-22  9:16                                               ` George Dunlap
  1 sibling, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-22  8:38 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/22/2016 2:39 PM, Jan Beulich wrote:
>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>> deems necessary.
>>>>>>>
>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>> But in such
>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>> flipping(i.e. by calling
>>>>>> HVMOP_set_mem_type).
>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>> George validly says) the more consistent model would be for the
>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>> p2m_ioreq_server pages.
>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>> paging structure all at once, right?
>>> Yes.
>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
> Not really - that addresses the "low overhead" aspect, but not the
> "enumerate all such entries" one.
>
>>>> I have not figured out any clean
>>>> solution
>>>> in hypervisor side, that's one reason I'd like to left this job to
>>>> device model
>>>> side(another reason is that I do think device model should take this
>>>> responsibility).
>>> Let's see if we can get George to agree.
>> Well I had in principle already agreed to letting this be the interface
>> on the previous round of patches; we're having this discussion because
>> you (Jan) asked about what happens if an ioreq server is de-registered
>> while there are still outstanding p2m types. :-)
> Indeed. Yet so far I understood you didn't like de-registration to
> both not do the cleanup itself and fail if there are outstanding
> entries.
>
>> I do think having Xen change the type makes the most sense, but if
>> you're happy to leave that up to the ioreq server, I'm OK with things
>> being done that way as well.  I think we can probably change it later if
>> we want.
> Yes, since ioreq server interfaces will all be unstable ones, that
> shouldn't be a problem. Albeit that's only the theory. With the call
> coming from the device model, we'd need to make sure to put all
> the logic (if any) to deal with the hypervisor implementation details
> into libxc, so the caller of the libxc interface won't need to change.
> I've learned during putting together the hvmctl series that this
> wasn't done cleanly enough for one of the existing interfaces (see
> patch 10 of that series).

Thanks Jan & George. So I guess you both accept that we can leave the
cleanup to the device model side, right?

B.R.
Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  8:38                                               ` Yu Zhang
@ 2016-06-22  9:11                                                 ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-22  9:11 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 22.06.16 at 10:38, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 6/22/2016 2:39 PM, Jan Beulich wrote:
>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>> deems necessary.
>>>>>>>>
>>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>>> But in such
>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>> flipping(i.e. by calling
>>>>>>> HVMOP_set_mem_type).
>>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>>> George validly says) the more consistent model would be for the
>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>> p2m_ioreq_server pages.
>>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>>> paging structure all at once, right?
>>>> Yes.
>>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
>> Not really - that addresses the "low overhead" aspect, but not the
>> "enumerate all such entries" one.
>>
>>>>> I have not figured out any clean
>>>>> solution
>>>>> in hypervisor side, that's one reason I'd like to left this job to
>>>>> device model
>>>>> side(another reason is that I do think device model should take this
>>>>> responsibility).
>>>> Let's see if we can get George to agree.
>>> Well I had in principle already agreed to letting this be the interface
>>> on the previous round of patches; we're having this discussion because
>>> you (Jan) asked about what happens if an ioreq server is de-registered
>>> while there are still outstanding p2m types. :-)
>> Indeed. Yet so far I understood you didn't like de-registration to
>> both not do the cleanup itself and fail if there are outstanding
>> entries.
>>
>>> I do think having Xen change the type makes the most sense, but if
>>> you're happy to leave that up to the ioreq server, I'm OK with things
>>> being done that way as well.  I think we can probably change it later if
>>> we want.
>> Yes, since ioreq server interfaces will all be unstable ones, that
>> shouldn't be a problem. Albeit that's only the theory. With the call
>> coming from the device model, we'd need to make sure to put all
>> the logic (if any) to deal with the hypervisor implementation details
>> into libxc, so the caller of the libxc interface won't need to change.
>> I've learned during putting together the hvmctl series that this
>> wasn't done cleanly enough for one of the existing interfaces (see
>> patch 10 of that series).
> 
> Thanks Jan & George. So I guess you both accepted that we can left the 
> clean up to
> the device model side, right?

Yes, except that I'd prefer it being worded "require the device model
to do the cleanup" over "leave it to ...", to make explicit that failure
will result upon de-registration when there are outstanding pages.

Jan




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  6:39                                             ` Jan Beulich
  2016-06-22  8:38                                               ` Yu Zhang
@ 2016-06-22  9:16                                               ` George Dunlap
  2016-06-22  9:29                                                 ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-22  9:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

On 22/06/16 07:39, Jan Beulich wrote:
>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>> deems necessary.
>>>>>>>
>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>> But in such
>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>> flipping(i.e. by calling
>>>>>> HVMOP_set_mem_type).
>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>> George validly says) the more consistent model would be for the
>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>> p2m_ioreq_server pages.
>>>>
>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>> paging structure all at once, right?
>>>
>>> Yes.
>>
>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
> 
> Not really - that addresses the "low overhead" aspect, but not the
> "enumerate all such entries" one.

I'm sorry, I think I'm missing something here.  What do we need the
enumeration for?

>> Well I had in principle already agreed to letting this be the interface
>> on the previous round of patches; we're having this discussion because
>> you (Jan) asked about what happens if an ioreq server is de-registered
>> while there are still outstanding p2m types. :-)
> 
> Indeed. Yet so far I understood you didn't like de-registration to
> both not do the cleanup itself and fail if there are outstanding
> entries.

No, I think regarding deregistering while there were outstanding
entries, I said the opposite -- that there's no point in failing the
de-registration, because a poorly-behaved ioreq server may just ignore
the error code and exit anyway.  Although, thinking on it again, I
suppose that an error code would allow a buggy ioreq server to know that
it had screwed up somewhere.  But either way, from the "robustness"
perspective, the result would almost certainly be a dangling ioreq
server registration *in addition* to the dangling p2m entries; so the
difference is just an interface tweak to aid in debugging, not worth
insisting on given the required work.

 -George



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  9:16                                               ` George Dunlap
@ 2016-06-22  9:29                                                 ` Jan Beulich
  2016-06-22  9:47                                                   ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-22  9:29 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
> On 22/06/16 07:39, Jan Beulich wrote:
>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>> deems necessary.
>>>>>>>>
>>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>>> But in such
>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>> flipping(i.e. by calling
>>>>>>> HVMOP_set_mem_type).
>>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>>> George validly says) the more consistent model would be for the
>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>> p2m_ioreq_server pages.
>>>>>
>>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>>> paging structure all at once, right?
>>>>
>>>> Yes.
>>>
>>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
>> 
>> Not really - that addresses the "low overhead" aspect, but not the
>> "enumerate all such entries" one.
> 
> I'm sorry, I think I'm missing something here.  What do we need the
> enumeration for?

We'd need that if we were to do the cleanup in the hypervisor (as
we can't rely on all p2m entry re-calculation to have happened by
the time a new ioreq server registers for the type).

>>> Well I had in principle already agreed to letting this be the interface
>>> on the previous round of patches; we're having this discussion because
>>> you (Jan) asked about what happens if an ioreq server is de-registered
>>> while there are still outstanding p2m types. :-)
>> 
>> Indeed. Yet so far I understood you didn't like de-registration to
>> both not do the cleanup itself and fail if there are outstanding
>> entries.
> 
> No, I think regarding deregistering while there were outstanding
> entries, I said the opposite -- that there's no point in failing the
> de-registration, because a poorly-behaved ioreq server may just ignore
> the error code and exit anyway.  Although, thinking on it again, I
> suppose that an error code would allow a buggy ioreq server to know that
> it had screwed up somewhere.

Not exactly, I think: The failed de-registration ought to lead to failure
of an attempt to register another ioreq server (or the same one again),
which should make the issue quickly noticeable.

> But either way, from the "robustness"
> perspective, the result would almost certainly be a dangling ioreq
> server registration *in addition* to the dangling p2m entries; so the
> difference is just an interface tweak to aid in debugging, not worth
> insisting on given the required work.

So yes, observable behavior wise there shouldn't be any difference.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  9:29                                                 ` Jan Beulich
@ 2016-06-22  9:47                                                   ` George Dunlap
  2016-06-22 10:07                                                     ` Yu Zhang
  2016-06-22 10:10                                                     ` Jan Beulich
  0 siblings, 2 replies; 68+ messages in thread
From: George Dunlap @ 2016-06-22  9:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

On 22/06/16 10:29, Jan Beulich wrote:
>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>> deems necessary.
>>>>>>>>>
>>>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>>>> But in such
>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>> flipping(i.e. by calling
>>>>>>>> HVMOP_set_mem_type).
>>>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>>>> George validly says) the more consistent model would be for the
>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>>> p2m_ioreq_server pages.
>>>>>>
>>>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>>>> paging structure all at once, right?
>>>>>
>>>>> Yes.
>>>>
>>>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
>>>
>>> Not really - that addresses the "low overhead" aspect, but not the
>>> "enumerate all such entries" one.
>>
>> I'm sorry, I think I'm missing something here.  What do we need the
>> enumeration for?
> 
> We'd need that if we were to do the cleanup in the hypervisor (as
> we can't rely on all p2m entry re-calculation to have happened by
> the time a new ioreq server registers for the type).

So you're afraid of this sequence of events?
1) Server A de-registered, triggering a ioreq_server -> ram_rw type change
2) gfn N is marked as misconfigured
3) Server B registers and marks gfn N as ioreq_server
4) When N is accessed, the misconfiguration is resolved incorrectly to
ram_rw

But that can't happen, because misconfigured entries are resolved before
setting a p2m entry; so at step 3, gfn N will be first set to
(non-misconfigured) ram_rw, then changed to (non-misconfigured)
ioreq_server.

Or is there another sequence of events that I'm missing?
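The guarantee relied on here — that a pending recalculation is always resolved before a new entry is written — can be pictured with a toy model (plain C, hypothetical names; this is not Xen code, though the real per-entry lazy resolution is done by resolve_misconfig()):

```c
#include <assert.h>

/* Toy model of lazy p2m type changes; all names are hypothetical,
 * this is not Xen code. */
enum p2m_type { p2m_ram_rw, p2m_ioreq_server, p2m_logdirty };

struct p2m_entry {
    enum p2m_type type;
    int recalc;                     /* pending global type change? */
};

static enum p2m_type recalc_target; /* what flagged entries become */

/* change_entry_type_global() analogue: cheap - only flags entries */
static void change_type_global(struct p2m_entry *tbl, int n,
                               enum p2m_type to)
{
    recalc_target = to;
    for (int i = 0; i < n; i++)
        tbl[i].recalc = 1;
}

/* resolve_misconfig() analogue: applies a pending recalc lazily */
static void resolve_misconfig(struct p2m_entry *e)
{
    if (e->recalc) {
        e->type = recalc_target;
        e->recalc = 0;
    }
}

/* set_entry(): pending recalcs are always resolved first */
static void set_entry(struct p2m_entry *e, enum p2m_type t)
{
    resolve_misconfig(e);
    e->type = t;
}
```

Under this model, steps 1-4 indeed leave gfn N as ioreq_server: server B's set_entry() first applies the pending ram_rw recalculation and then writes the new type, so a later lazy resolution is a no-op.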

>>>> Well I had in principle already agreed to letting this be the interface
>>>> on the previous round of patches; we're having this discussion because
>>>> you (Jan) asked about what happens if an ioreq server is de-registered
>>>> while there are still outstanding p2m types. :-)
>>>
>>> Indeed. Yet so far I understood you didn't like de-registration to
>>> both not do the cleanup itself and fail if there are outstanding
>>> entries.
>>
>> No, I think regarding deregistering while there were outstanding
>> entries, I said the opposite -- that there's no point in failing the
>> de-registration, because a poorly-behaved ioreq server may just ignore
>> the error code and exit anyway.  Although, thinking on it again, I
>> suppose that an error code would allow a buggy ioreq server to know that
>> it had screwed up somewhere.
> 
> Not exactly, I think: The failed de-registration ought to lead to failure
> of an attempt to register another ioreq server (or the same one again),
> which should make the issue quickly noticeable.

Hmm... yes, the more I think about it the more it seems like allowing
p2m entries from a previous ioreq server to be already set when there's
a new ioreq server registration is digging a hole for future people to
fall into.  Paul and Yu Zhang are the most likely people to fall into
that hole, so I haven't been arguing strenuously so far against it, but
given that I'm not yet convinced that fixing it is that difficult, at
the very least I would strongly recommend they reconsider.

 -George


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  9:47                                                   ` George Dunlap
@ 2016-06-22 10:07                                                     ` Yu Zhang
  2016-06-22 11:33                                                       ` George Dunlap
  2016-06-22 10:10                                                     ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-22 10:07 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/22/2016 5:47 PM, George Dunlap wrote:
> On 22/06/16 10:29, Jan Beulich wrote:
>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>> deems necessary.
>>>>>>>>>>
>>>>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>>>>> But in such
>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>> flipping(i.e. by calling
>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>>>> p2m_ioreq_server pages.
>>>>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>>>>> paging structure all at once, right?
>>>>>> Yes.
>>>>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>> "enumerate all such entries" one.
>>> I'm sorry, I think I'm missing something here.  What do we need the
>>> enumeration for?
>> We'd need that if we were to do the cleanup in the hypervisor (as
>> we can't rely on all p2m entry re-calculation to have happened by
>> the time a new ioreq server registers for the type).
> So you're afraid of this sequence of events?
> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type change
> 2) gfn N is marked as misconfigured
> 3) Server B registers and marks gfn N as ioreq_server
> 4) When N is accessed, the misconfiguration is resolved incorrectly to
> ram_rw
>
> But that can't happen, because misconfigured entries are resolved before
> setting a p2m entry; so at step 3, gfn N will be first set to
> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
> ioreq_server.
>
> Or is there another sequence of events that I'm missing?

Thanks for your reply, George. :)
If no log dirty is triggered during this process, your sequence is correct.
However, if log dirty is triggered, we'll meet problems. I have
described this in previous mails:

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
on Jun 20

and

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
on Jun 21

>>>>> Well I had in principle already agreed to letting this be the interface
>>>>> on the previous round of patches; we're having this discussion because
>>>>> you (Jan) asked about what happens if an ioreq server is de-registered
>>>>> while there are still outstanding p2m types. :-)
>>>> Indeed. Yet so far I understood you didn't like de-registration to
>>>> both not do the cleanup itself and fail if there are outstanding
>>>> entries.
>>> No, I think regarding deregistering while there were outstanding
>>> entries, I said the opposite -- that there's no point in failing the
>>> de-registration, because a poorly-behaved ioreq server may just ignore
>>> the error code and exit anyway.  Although, thinking on it again, I
>>> suppose that an error code would allow a buggy ioreq server to know that
>>> it had screwed up somewhere.
>> Not exactly, I think: The failed de-registration ought to lead to failure
>> of an attempt to register another ioreq server (or the same one again),
>> which should make the issue quickly noticeable.
> Hmm... yes, the more I think about it the more it seems like allowing
> p2m entries from a previous ioreq server to be already set when there's
> a new ioreq server registration is digging a hole for future people to
> fall into.  Paul and Yu Zhang are the most likely people to fall into
> that hole, so I haven't been arguing strenuously so far against it, but
> given that I'm not yet convinced that fixing it is that difficult, at

Taking live migration into consideration, I admit I still have not
found any lightweight solution to fix this. :)

Thanks
Yu


* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22  9:47                                                   ` George Dunlap
  2016-06-22 10:07                                                     ` Yu Zhang
@ 2016-06-22 10:10                                                     ` Jan Beulich
  2016-06-22 10:15                                                       ` George Dunlap
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-22 10:10 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

>>> On 22.06.16 at 11:47, <george.dunlap@citrix.com> wrote:
> On 22/06/16 10:29, Jan Beulich wrote:
>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>> deems necessary.
>>>>>>>>>>
>>>>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>>>>> But in such
>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>> flipping(i.e. by calling
>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>>>> p2m_ioreq_server pages.
>>>>>>>
>>>>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>>>>> paging structure all at once, right?
>>>>>>
>>>>>> Yes.
>>>>>
>>>>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
>>>>
>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>> "enumerate all such entries" one.
>>>
>>> I'm sorry, I think I'm missing something here.  What do we need the
>>> enumeration for?
>> 
>> We'd need that if we were to do the cleanup in the hypervisor (as
>> we can't rely on all p2m entry re-calculation to have happened by
>> the time a new ioreq server registers for the type).
> 
> So you're afraid of this sequence of events?
> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type change
> 2) gfn N is marked as misconfigured
> 3) Server B registers and marks gfn N as ioreq_server
> 4) When N is accessed, the misconfiguration is resolved incorrectly to
> ram_rw
> 
> But that can't happen, because misconfigured entries are resolved before
> setting a p2m entry; so at step 3, gfn N will be first set to
> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
> ioreq_server.
> 
> Or is there another sequence of events that I'm missing?

1) Server A marks GFN Y as ioreq_server
2) Server A de-registered, triggering a ioreq_server -> ram_rw type
   change
3) Server B registers and gfn Y still didn't become ram_rw again (as
   the misconfiguration didn't trickle down the tree far enough)

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22 10:10                                                     ` Jan Beulich
@ 2016-06-22 10:15                                                       ` George Dunlap
  2016-06-22 11:50                                                         ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-22 10:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

On 22/06/16 11:10, Jan Beulich wrote:
>>>> On 22.06.16 at 11:47, <george.dunlap@citrix.com> wrote:
>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>> deems necessary.
>>>>>>>>>>>
>>>>>>>>>> Well, I agree this part of XenGT should be disabled during migration.
>>>>>>>>>> But in such
>>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>>> flipping(i.e. by calling
>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>> I agree - this would seem to be the simpler model here, despite (as
>>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>>>>> p2m_ioreq_server pages.
>>>>>>>>
>>>>>>>> Well, for me, the "easy way" means we should avoid traversing the whole ept
>>>>>>>> paging structure all at once, right?
>>>>>>>
>>>>>>> Yes.
>>>>>>
>>>>>> Does calling p2m_change_entry_type_global() not satisfy this requirement?
>>>>>
>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>> "enumerate all such entries" one.
>>>>
>>>> I'm sorry, I think I'm missing something here.  What do we need the
>>>> enumeration for?
>>>
>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>> we can't rely on all p2m entry re-calculation to have happened by
>>> the time a new ioreq server registers for the type).
>>
>> So you're afraid of this sequence of events?
>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type change
>> 2) gfn N is marked as misconfigured
>> 3) Server B registers and marks gfn N as ioreq_server
>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>> ram_rw
>>
>> But that can't happen, because misconfigured entries are resolved before
>> setting a p2m entry; so at step 3, gfn N will be first set to
>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>> ioreq_server.
>>
>> Or is there another sequence of events that I'm missing?
> 
> 1) Server A marks GFN Y as ioreq_server
> 2) Server A de-registered, triggering a ioreq_server -> ram_rw type
>    change
> 3) Server B registers and gfn Y still didn't become ram_rw again (as
>    the misconfiguration didn't trickle down the tree far enough)

There are some missing steps here.  Gfn Y is still misconfigured, right?
 What will happen when the misconfiguration is resolved?  Will it not
become ram_rw?  If not, what would it change to and why?

 -George



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22 10:07                                                     ` Yu Zhang
@ 2016-06-22 11:33                                                       ` George Dunlap
  2016-06-23  7:37                                                         ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-22 11:33 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 22/06/16 11:07, Yu Zhang wrote:
> 
> 
> On 6/22/2016 5:47 PM, George Dunlap wrote:
>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>> deems necessary.
>>>>>>>>>>>
>>>>>>>>>> Well, I agree this part of XenGT should be disabled during
>>>>>>>>>> migration.
>>>>>>>>>> But in such
>>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>>> flipping(i.e. by calling
>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>> I agree - this would seem to be the simpler model here, despite
>>>>>>>>> (as
>>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>>>>> p2m_ioreq_server pages.
>>>>>>>> Well, for me, the "easy way" means we should avoid traversing
>>>>>>>> the whole ept
>>>>>>>> paging structure all at once, right?
>>>>>>> Yes.
>>>>>> Does calling p2m_change_entry_type_global() not satisfy this
>>>>>> requirement?
>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>> "enumerate all such entries" one.
>>>> I'm sorry, I think I'm missing something here.  What do we need the
>>>> enumeration for?
>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>> we can't rely on all p2m entry re-calculation to have happened by
>>> the time a new ioreq server registers for the type).
>> So you're afraid of this sequence of events?
>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type
>> change
>> 2) gfn N is marked as misconfigured
>> 3) Server B registers and marks gfn N as ioreq_server
>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>> ram_rw
>>
>> But that can't happen, because misconfigured entries are resolved before
>> setting a p2m entry; so at step 3, gfn N will be first set to
>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>> ioreq_server.
>>
>> Or is there another sequence of events that I'm missing?
> 
> Thanks for your reply, George. :)
> If no log dirty is triggered during this process, your sequence is correct.
> However, if log dirty is triggered, we'll meet problems. I have
> described this in previous mails:
> 
> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
> on Jun 20
> 
> and
> 
> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
> on Jun 21

Right -- sorry, now I see the issue:

1. Server A marks gfn X as ioreq_server
2. Server A deregisters, gfn X misconfigured
3. Server B registers, marks gfn Y as ioreq_server
4. Logdirty mode enabled; gfn Y misconfigured
5. When X or Y are accessed, resolve_misconfig() has no way of
telling whether the entry is from server A (which should be set to
logdirty) or from server B (which should be left as ioreq_server).

In a sense this is a deficiency in the change_entry_type_global()
interface.  A common OS principle is "make the common case fast, and the
uncommon case correct".  The scenario described above seems to me to be
an uncommon case which is handled quickly but incorrectly; ideally we
should handle it correctly, even if it's not very quick.

Synchronously resolving a previous misconfig is probably the most
straightforward thing to do.  It could be done at point #3, when an M->N
type change is not complete and a new p2m entry of type M is written; it
could be at point #4, when an N->O type change is initiated while an
M->N type change hasn't completed.  Or it could be when an N->O type
change happens while there are unfinished M->N transitions *and*
post-type-change M entries.
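The synchronous option can be sketched with a toy model (plain C, hypothetical names; not Xen code): if registering a new ioreq server first walks the table and finishes any outstanding type change, then a resolve_misconfig()-style special case that preserves ioreq_server entries while a server is registered becomes safe, because no stale ioreq_server entries survive the walk.

```c
#include <assert.h>

/* Toy model, hypothetical names; not Xen code. */
enum p2m_type { p2m_ram_rw, p2m_ioreq_server, p2m_logdirty };

struct p2m_entry {
    enum p2m_type type;
    int recalc;                     /* pending global type change? */
};

static enum p2m_type recalc_target;
static int ioreq_server_registered;

/* Lazy change_entry_type_global() analogue: only flags entries */
static void change_type_global(struct p2m_entry *tbl, int n,
                               enum p2m_type to)
{
    recalc_target = to;
    for (int i = 0; i < n; i++)
        tbl[i].recalc = 1;
}

static void resolve_misconfig(struct p2m_entry *e)
{
    if (e->recalc) {
        /* Preserving ioreq_server entries is safe only because
         * registration below synchronously clears stale ones. */
        if (!(e->type == p2m_ioreq_server && ioreq_server_registered))
            e->type = recalc_target;
        e->recalc = 0;
    }
}

/* Registration synchronously completes the previous type change
 * before the new server is visible, so only one recalculation
 * target ever needs to be remembered. */
static void register_ioreq_server(struct p2m_entry *tbl, int n)
{
    for (int i = 0; i < n; i++)
        resolve_misconfig(&tbl[i]);
    ioreq_server_registered = 1;
}
```

With this, the problematic sequence resolves correctly: server A's stale gfn ends up logdirty, while server B's live gfn stays ioreq_server.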

But, that's a lot of somewhat complicated work for a scenario that is
unlikely to happen in practice, and I can see why Yu Zhang would feel
reluctant to do it.

For the time being though, this will fail at #4, right?  That is,
logdirty mode cannot be enabled while server B is registered?

That does mean we'd be forced to sort out the situation before we allow
logdirty and ioreq_server to be used at the same time, but that doesn't
really seem like such a bad idea to me.

I'm still open to being convinced, but at the moment it really seems to
me like improving the situation is the better long-term option.

 -George



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22 10:15                                                       ` George Dunlap
@ 2016-06-22 11:50                                                         ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-22 11:50 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, JunNakajima

>>> On 22.06.16 at 12:15, <george.dunlap@citrix.com> wrote:
> On 22/06/16 11:10, Jan Beulich wrote:
>>>>> On 22.06.16 at 11:47, <george.dunlap@citrix.com> wrote:
>>> So you're afraid of this sequence of events?
>>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type change
>>> 2) gfn N is marked as misconfigured
>>> 3) Server B registers and marks gfn N as ioreq_server
>>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>>> ram_rw
>>>
>>> But that can't happen, because misconfigured entries are resolved before
>>> setting a p2m entry; so at step 3, gfn N will be first set to
>>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>>> ioreq_server.
>>>
>>> Or is there another sequence of events that I'm missing?
>> 
>> 1) Server A marks GFN Y as ioreq_server
>> 2) Server A de-registered, triggering a ioreq_server -> ram_rw type
>>    change
>> 3) Server B registers and gfn Y still didn't become ram_rw again (as
>>    the misconfiguration didn't trickle down the tree far enough)
> 
> There are some missing steps here.  Gfn Y is still misconfigured, right?
>  What will happen when the misconfiguration is resolved?  Will it not
> become ram_rw?  If not, what would it change to and why?

Oh, right, of course.

Jan



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-22 11:33                                                       ` George Dunlap
@ 2016-06-23  7:37                                                         ` Yu Zhang
  2016-06-23 10:33                                                           ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-23  7:37 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/22/2016 7:33 PM, George Dunlap wrote:
> On 22/06/16 11:07, Yu Zhang wrote:
>>
>> On 6/22/2016 5:47 PM, George Dunlap wrote:
>>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>>> deems necessary.
>>>>>>>>>>>>
>>>>>>>>>>> Well, I agree this part of XenGT should be disabled during
>>>>>>>>>>> migration.
>>>>>>>>>>> But in such
>>>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>>>> flipping(i.e. by calling
>>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>>> I agree - this would seem to be the simpler model here, despite
>>>>>>>>>> (as
>>>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be reasonable
>>>>>>>>>> only if there was an easy way for the hypervisor to enumerate all
>>>>>>>>>> p2m_ioreq_server pages.
>>>>>>>>> Well, for me, the "easy way" means we should avoid traversing
>>>>>>>>> the whole ept
>>>>>>>>> paging structure all at once, right?
>>>>>>>> Yes.
>>>>>>> Does calling p2m_change_entry_type_global() not satisfy this
>>>>>>> requirement?
>>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>>> "enumerate all such entries" one.
>>>>> I'm sorry, I think I'm missing something here.  What do we need the
>>>>> enumeration for?
>>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>>> we can't rely on all p2m entry re-calculation to have happened by
>>>> the time a new ioreq server registers for the type).
>>> So you're afraid of this sequence of events?
>>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type
>>> change
>>> 2) gfn N is marked as misconfigured
>>> 3) Server B registers and marks gfn N as ioreq_server
>>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>>> ram_rw
>>>
>>> But that can't happen, because misconfigured entries are resolved before
>>> setting a p2m entry; so at step 3, gfn N will be first set to
>>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>>> ioreq_server.
>>>
>>> Or is there another sequence of events that I'm missing?
>> Thanks for your reply, George. :)
>> If no log dirty is triggered during this process, your sequence is correct.
>> However, if log dirty is triggered, we'll meet problems. I have
>> described this in previous mails:
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
>> on Jun 20
>>
>> and
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
>> on Jun 21
> Right -- sorry, now I see the issue:
>
> 1. Server A marks gfn X as ioreq_server
> 2. Server A deregisters, gfn X misconfigured
> 3. Server B registers, marks gfn Y as ioreq_server
> 4. Logdirty mode enabled; gfn Y misconfigured
> 5. When X or Y are accessed, resolve_misconfig() has no way of
> telling whether the entry is from server A (which should be set to
> logdirty) or from server B (which should be left as ioreq_server).

Exactly.  :)
Another simpler scenario would be:
1. Server A marks gfn X as p2m_ioreq_server;
2. Logdirty mode enabled; gfn X misconfigured;
3. When X is written, it will not cause an EPT violation but an EPT
misconfig, and resolve_misconfig() would set gfn X back to p2m_ram_rw;
thereafter we can not track accesses to X.

Note: Not resetting the p2m type for p2m_ioreq_server entries when
p2m->ioreq.server is not NULL is suitable for this simpler scenario,
but is not correct if we take your scenario into account.

The core reason is that I could not find a simple solution in
resolve_misconfig() to handle both the outdated p2m_ioreq_server
entries and the in-use ones, and to support the logdirty feature at
the same time.

> In a sense this is a deficiency in the change_entry_type_global()
> interface.  A common OS principle is "make the common case fast, and the
> uncommon case correct".  The scenario described above seems to me to be
> an uncommon case which is handled quickly but incorrectly; ideally we
> should handle it correctly, even if it's not very quick.
>
> Synchronously resolving a previous misconfig is probably the most
> straightforward thing to do.  It could be done at point #3, when an M->N
> type change is not complete and a new p2m entry of type M is written; it
> could be at point #4, when an N->O type change is initiated while an
> M->N type change hasn't completed.  Or it could be when an N->O type
> change happens while there are unfinished M->N transitions *and*
> post-type-change M entries.

Sorry, I did not quite get it.  Could you please elaborate more? Thanks! :)

>
> But, that's a lot of somewhat complicated work for a scenario that is
> unlikely to happen in practice, and I can see why Yu Zhang would feel
> reluctant to do it.

Yes, that's part of my reasons. :)

>
> For the time being though, this will fail at #4, right?  That is,
> logdirty mode cannot be enabled while server B is registered?
>
> That does mean we'd be forced to sort out the situation before we allow
> logdirty and ioreq_server to be used at the same time, but that doesn't
> really seem like such a bad idea to me.

One solution I thought of is to just return failure in
hap_enable_log_dirty() if p2m->ioreq.server is not NULL. But I did not
choose such an approach, because:

1> I still want to keep the logdirty feature so that XenGT can use it
to keep track of dirty RAM pages when we support live migration in the
future;

2> I also agree with Paul's argument: it is the device model's duty to
do the p2m type resetting work.

> I'm still open to being convinced, but at the moment it really seems to
> me like improving the situation is the better long-term option.
>

Thanks for all your advice, George. I'm also willing to take other
advice; if we have a resync approach in the hypervisor that is more
acceptable (for you, Jan and other maintainers), I'd like to add it.
If the code is too complicated, I can submit it in a separate new
patchset. :)


B.R.
Yu



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-23  7:37                                                         ` Yu Zhang
@ 2016-06-23 10:33                                                           ` George Dunlap
  2016-06-24  4:16                                                             ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2016-06-23 10:33 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima

On 23/06/16 08:37, Yu Zhang wrote:
> On 6/22/2016 7:33 PM, George Dunlap wrote:
>> On 22/06/16 11:07, Yu Zhang wrote:
>>>
>>> On 6/22/2016 5:47 PM, George Dunlap wrote:
>>>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>>>> deems necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>> Well, I agree this part of XenGT should be disabled during
>>>>>>>>>>>> migration.
>>>>>>>>>>>> But in such
>>>>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>>>>> flipping(i.e. by calling
>>>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>>>> I agree - this would seem to be the simpler model here, despite
>>>>>>>>>>> (as
>>>>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be
>>>>>>>>>>> reasonable
>>>>>>>>>>> only if there was an easy way for the hypervisor to enumerate
>>>>>>>>>>> all
>>>>>>>>>>> p2m_ioreq_server pages.
>>>>>>>>>> Well, for me, the "easy way" means we should avoid traversing
>>>>>>>>>> the whole ept
>>>>>>>>>> paging structure all at once, right?
>>>>>>>>> Yes.
>>>>>>>> Does calling p2m_change_entry_type_global() not satisfy this
>>>>>>>> requirement?
>>>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>>>> "enumerate all such entries" one.
>>>>>> I'm sorry, I think I'm missing something here.  What do we need the
>>>>>> enumeration for?
>>>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>>>> we can't rely on all p2m entry re-calculation to have happened by
>>>>> the time a new ioreq server registers for the type).
>>>> So you're afraid of this sequence of events?
>>>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type
>>>> change
>>>> 2) gfn N is marked as misconfigured
>>>> 3) Server B registers and marks gfn N as ioreq_server
>>>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>>>> ram_rw
>>>>
>>>> But that can't happen, because misconfigured entries are resolved
>>>> before
>>>> setting a p2m entry; so at step 3, gfn N will be first set to
>>>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>>>> ioreq_server.
>>>>
>>>> Or is there another sequence of events that I'm missing?
>>> Thanks for your reply, George. :)
>>> If no log dirty is triggered during this process, your sequence is
>>> correct.
>>> However, if log dirty is triggered, we'll meet problems. I have
>>> described this in previous mails:
>>>
>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
>>>
>>> on Jun 20
>>>
>>> and
>>>
>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
>>>
>>> on Jun 21
>> Right -- sorry, now I see the issue:
>>
>> 1. Server A marks gfn X as ioreq_server
>> 2. Server A deregisters, gfn X misconfigured
>> 3. Server B registers, marks gfn Y as ioreq_server
>> 4. Logdirty mode enabled; gfn Y misconfigured
>> 5. When X or Y are accessed, resolve_misconfig() has no way of
>> telling whether the entry is from server A (which should be set to
>> logdirty) or from server B (which should be left as ioreq_server).
> 
> Exactly.  :)
> Another simpler scenario would be:
> 1. Server A marks gfn X as p2m_ioreq_server;
> 2. Logdirty mode enabled; gfn X misconfigured;
> 3. When X is written, it will not cause an EPT violation but an EPT
> misconfig, and resolve_misconfig() would set gfn X back to
> p2m_ram_rw; thereafter we can not track accesses to X.

Right, so this is a reason that simply making misconfigurations always
resolve ioreq_server into ram_rw isn't compatible with logdirty.

> Note: Not resetting the p2m type for p2m_ioreq_server entries when
> p2m->ioreq.server is not NULL is suitable for this simpler scenario,
> but is not correct if we take your scenario into account.
> 
> The core reason is that I could not find a simple solution in
> resolve_misconfig() to handle both the outdated p2m_ioreq_server
> entries and the in-use ones, and to support the logdirty feature at
> the same time.

Indeed; and as I said, the real problem is that
p2m_change_entry_type_global() isn't really properly abstracted; in
order to use it you need to know how it works and be careful not to use
it at the wrong time.

Short-term, thinking through a handful of the scenarios we want to
support should be good enough.  Long-term, making it more robust so that
we don't have to think so hard about it is probably better.

>> In a sense this is a deficiency in the change_entry_type_global()
>> interface.  A common OS principle is "make the common case fast, and the
>> uncommon case correct".  The scenario described above seems to me to be
>> an uncommon case which is handled quickly but incorrectly; ideally we
>> should handle it correctly, even if it's not very quick.
>>
>> Synchronously resolving a previous misconfig is probably the most
>> straightforward thing to do.  It could be done at point #3, when an M->N
>> type change is not complete and a new p2m entry of type M is written; it
>> could be at point #4, when an N->O type change is initiated while an
>> M->N type change hasn't completed.  Or it could be when an N->O type
>> change happens while there are unfinished M->N transitions *and*
>> post-type-change M entries.
> 
> Sorry, I did not quite get it.  Could you please elaborate more? Thanks! :)

Well the basic idea is to make change_entry_type_global() *appear* to
all external callers as though the change happened immediately.  And the
basic problem is that at the moment, you can start a second
change_entry_type_global() before the first one has actually finished
changing all the types it was meant to change.  So the improvement is to
make sure that all the types which need to be changed actually get
changed before the second invocation starts.

The absolute simplest thing to do would be to make it actually search
through the p2m table and make the change immediately.  But we'd like to
avoid this if we can because it's so slow.

So the next simplest thing to do would be that when someone calls
change_entry_type_global() a second time, you go through every
misconfigured entry and resolve it, so that the new
change_entry_type_global() starts with a clean slate.  This should be
faster than just sweeping the whole p2m table, since we only need to
check the p2m entries that haven't been touched since we did the type
change, but it may still be a lot of work.

So there are other optimizations we might be able to make to try to
avoid going through and re-syncing things; and those are the examples
that I gave.

>> For the time being though, this will fail at #4, right?  That is,
>> logdirty mode cannot be enabled while server B is registered?
>>
>> That does mean we'd be forced to sort out the situation before we allow
>> logdirty and ioreq_server to be used at the same time, but that doesn't
>> really seem like such a bad idea to me.
> 
> One solution I thought of is to just return failure in
> hap_enable_log_dirty()
> if p2m->ioreq.server is not NULL. But I did not choose such an
> approach, because:
> 
> 1> I still want to keep the logdirty feature so that XenGT can use it
> to keep track of dirty RAM pages when we support live migration in
> the future;

But that's not something you can set in stone -- you can have it return
-EBUSY now, and then at such time as you add dirty vram support for
ioreq_server p2m types (which I don't think should be hard at all,
actually), you can remove that restriction.

The fact is that at the moment, setting logdirty *will not work*; and the
best interface is one in which broken things cannot happen even by
accident.  Regardless of what we end up deciding wrt Xen changing the
entries, I think that Xen should refuse to enable logdirty mode when
there is an ioreq server registered for ioreq_server p2m entries.

> 2> I also agree with Paul's argument: it is the device model's duty
> to do the p2m type resetting work.

But that's not really the point.  We all agree that people should look
where they're going and it's generally an individual's responsibility
not to fall into pits or run into things in the road or on the sidewalk.
But that doesn't mean that it's therefore OK to dig a pit with spikes
at the bottom in the middle of where people normally walk or cycle.
Because even though it is an individual person's responsibility not to
walk into holes, occasionally people are distracted or don't see well or
make mistakes; and the consequences for being temporarily distracted
should never be "falling onto a bed of sharp spikes". :-)  If you do
have to dig a hole in the sidewalk, then at very least you need to put a
physical barrier around it, so that the consequences for being
distracted are "runs into a barrier" rather than "falls into a pit".
But the best of all, if you can manage it, is not to dig the hole at all.

Similarly, even if it could be in theory the device model's duty to
reset the p2m entries it changed, it's still the case that programmers
make mistakes.  When those mistakes happen, we at very least want it to
be as easy to figure out what the problem is as possible; and if we can,
we want to make those mistakes completely harmless.

Making it possible for ioreq server A's entries to remain outstanding
after ioreq server B connects is the programming equivalent of leaving a
big open hole in the middle of a sidewalk: it means that when there's a
mistake made, there aren't any obvious immediate failures that tell you,
"Server A forgot to release some entries".  Instead, you will get random
failures, as bits of memory behave strangely or run very slowly for no
apparent reason.

Having it impossible to connect ioreq server B if there are still
outstanding entries from ioreq server A is the equivalent of digging a
hole and then putting up a barrier.  Now you get a failure when you try
it, and you're told exactly what the problem is -- the last guy didn't
release all his entries.  If you don't have the code you're still a bit
stuck, but at least you know it just doesn't work, rather than failing
in mysterious and difficult to detect ways later.

Having Xen reset the entries just makes this entire problem go away --
it's like accessing whatever you needed to access by digging sideways
from the storm drains, rather than digging a hole in the sidewalk.

It's all well and good to say, "It's the device model's responsibility",
but we need to plan on programmers making mistakes.  (And we also need
to plan for administrators making mistakes, which is why I think
returning -EBUSY when you try to enable logdirty with an ioreq server
still bound is the right interface.)

>> I'm still open to being convinced, but at the moment it really seems to
>> me like improving the situation is the better long-term option.
>>
> 
> Thanks for all your advices, George. I'm also willing to taking other
> advices, if we have
> a more acceptable(for you, Jan and other maintainers) resync approach in
> hypervisor,
> I'd like  to add this. If the code is too complicated, I can submit it
> in a separate new
> patchset. :)

Well I think sometime in early July I should be able to make some time
to take a look at it properly.  Maybe I can start with a "draft" patch,
and you can take it and fix it up and make it work.  Or maybe I'll find
it's actually too complicated, and then agree with you that relying on
the server to clean up after itself is the only option. :-)

 -George



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-23 10:33                                                           ` George Dunlap
@ 2016-06-24  4:16                                                             ` Yu Zhang
  2016-06-24  6:12                                                               ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-24  4:16 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, JunNakajima



On 6/23/2016 6:33 PM, George Dunlap wrote:
> On 23/06/16 08:37, Yu Zhang wrote:
>> On 6/22/2016 7:33 PM, George Dunlap wrote:
>>> On 22/06/16 11:07, Yu Zhang wrote:
>>>> On 6/22/2016 5:47 PM, George Dunlap wrote:
>>>>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>>>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>>>>> deems necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, I agree this part of XenGT should be disabled during
>>>>>>>>>>>>> migration.
>>>>>>>>>>>>> But in such
>>>>>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>>>>>> flipping(i.e. by calling
>>>>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>>>>> I agree - this would seem to be the simpler model here, despite
>>>>>>>>>>>> (as
>>>>>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be
>>>>>>>>>>>> reasonable
>>>>>>>>>>>> only if there was an easy way for the hypervisor to enumerate
>>>>>>>>>>>> all
>>>>>>>>>>>> p2m_ioreq_server pages.
>>>>>>>>>>> Well, for me, the "easy way" means we should avoid traversing
>>>>>>>>>>> the whole ept
>>>>>>>>>>> paging structure all at once, right?
>>>>>>>>>> Yes.
>>>>>>>>> Does calling p2m_change_entry_type_global() not satisfy this
>>>>>>>>> requirement?
>>>>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>>>>> "enumerate all such entries" one.
>>>>>>> I'm sorry, I think I'm missing something here.  What do we need the
>>>>>>> enumeration for?
>>>>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>>>>> we can't rely on all p2m entry re-calculation to have happened by
>>>>>> the time a new ioreq server registers for the type).
>>>>> So you're afraid of this sequence of events?
>>>>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type
>>>>> change
>>>>> 2) gfn N is marked as misconfigured
>>>>> 3) Server B registers and marks gfn N as ioreq_server
>>>>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>>>>> ram_rw
>>>>>
>>>>> But that can't happen, because misconfigured entries are resolved
>>>>> before
>>>>> setting a p2m entry; so at step 3, gfn N will be first set to
>>>>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>>>>> ioreq_server.
>>>>>
>>>>> Or is there another sequence of events that I'm missing?
>>>> Thanks for your reply, George. :)
>>>> If no log dirty is triggered during this process, your sequence is
>>>> correct.
>>>> However, if log dirty is triggered, we'll meet problems. I have described
>>>> this
>>>> in previous mails :
>>>>
>>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
>>>>
>>>> on Jun 20
>>>>
>>>> and
>>>>
>>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
>>>>
>>>> on Jun 21
>>> Right -- sorry, now I see the issue:
>>>
>>> 1. Server A marks gfn X as ioreq_server
>>> 2. Server A deregisters, gfn X misconfigured
>>> 3. Server B registers, marks gfn Y as ioreq_server
>>> 4. Logdirty mode enabled; gfn Y misconfigured
>>> 5. When X or Y are accessed, resolve_misconfig() has no way of
>>> telling whether the entry is from server A (which should be set to
>>> logdirty) or from server B (which should be left as ioreq_server).
>> Exactly.  :)
>> Another simpler scenario would be:
>> 1. Server A marks gfn X as p2m_ioreq_server;
>> 2. Logdirty mode enabled; gfn X misconfigured;
>> 3. When X is written, it will not cause an EPT violation but an EPT
>> misconfig, and resolve_misconfig() would set gfn X back to
>> p2m_ram_rw; thereafter we can not track accesses to X.
> Right, so this is a reason that simply making misconfigurations always
> resolve ioreq_server into ram_rw isn't compatible with logdirty.
>
>> Note: Not resetting the p2m type for p2m_ioreq_server entries when
>> p2m->ioreq.server is not NULL is suitable for this simpler scenario,
>> but is not correct if we take your scenario into account.
>>
>> The core reason is that I could not find a simple solution in
>> resolve_misconfig() to handle both the outdated p2m_ioreq_server
>> entries and the in-use ones, and to support the logdirty feature at
>> the same time.
> Indeed; and as I said, the real problem is that
> p2m_change_entry_type_global() isn't really properly abstracted; in
> order to use it you need to know how it works and be careful not to use
> it at the wrong time.
>
> Short-term, thinking through a handful of the scenarios we want to
> support should be good enough.  Long-term, making it more robust so that
> we don't have to think so hard about it is probably better.
>
>>> In a sense this is a deficiency in the change_entry_type_global()
>>> interface.  A common OS principle is "make the common case fast, and the
>>> uncommon case correct".  The scenario described above seems to me to be
>>> an uncommon case which is handled quickly but incorrectly; ideally we
>>> should handle it correctly, even if it's not very quick.
>>>
>>> Synchronously resolving a previous misconfig is probably the most
>>> straightforward thing to do.  It could be done at point #3, when an M->N
>>> type change is not complete and a new p2m entry of type M is written; it
>>> could be at point #4, when an N->O type change is initiated while an
>>> M->N type change hasn't completed.  Or it could be when an N->O type
>>> change happens while there are unfinished M->N transitions *and*
>>> post-type-change M entries.
>> Sorry, I did not quite get it.  Could you please elaborate more? Thanks! :)
> Well the basic idea is to make change_entry_type_global() *appear* to
> all external callers as though the change happened immediately.  And the
> basic problem is that at the moment, you can start a second
> change_entry_type_global() before the first one has actually finished
> changing all the types it was meant to change.  So the improvement is to
> make sure that all the types which need to be changed actually get
> changed before the second invocation starts.
>
> The absolute simplest thing to do would be to make it actually search
> through the p2m table and make the change immediately.  But we'd like to
> avoid this if we can because it's so slow.
>
> So the next simplest thing to do would be that when someone calls
> change_entry_type_global() a second time, you go through every
> misconfigured entry and resolve it, so that the new
> change_entry_type_global() starts with a clean slate.  This should be
> faster than just sweeping the whole p2m table, since we only need to
> check the p2m entries that haven't been touched since we did the type
> change, but it may still be a lot of work.
>
> So there are other optimizations we might be able to make to try to
> avoid going through and re-syncing things; and those are the examples
> that I gave.
>
>>> For the time being though, this will fail at #4, right?  That is,
>>> logdirty mode cannot be enabled while server B is registered?
>>>
>>> That does mean we'd be forced to sort out the situation before we allow
>>> logdirty and ioreq_server to be used at the same time, but that doesn't
>>> really seem like such a bad idea to me.
>> One solution I thought of is to just return failure in
>> hap_enable_log_dirty()
>> if p2m->ioreq.server is not NULL. But I did not choose such an
>> approach, because:
>>
>> 1> I still want to keep the logdirty feature so that XenGT can use
>> it to keep track of dirty RAM pages when we support live migration
>> in the future;
> But that's not something you can set in stone -- you can have it return
> -EBUSY now, and then at such time as you add dirty vram support for
> ioreq_server p2m types (which I don't think should be hard at all,
> actually), you can remove that restriction.

OK. Returning -EBUSY is fine with me.
In fact, I do not really worry about the tracking of ioreq_server
gfns; I was worrying about the normal dirty RAM pages. But if
p2m_change_entry_type_global() can be enhanced in the future, I can
remove that restriction then. :)

> The fact is that at the moment, setting logdirty *will not work*; and the
> best interface is one in which broken things cannot happen even by
> accident.  Regardless of what we end up deciding wrt Xen changing the
> entries, I think that Xen should refuse to enable logdirty mode when
> there is an ioreq server registered for ioreq_server p2m entries.
>
>> 2> I also agree with Paul's argument: it is the device model's duty
>> to do the p2m type resetting work.
> But that's not really the point.  We all agree that people should look
> where they're going and it's generally an individual's responsibility
> not to fall into pits or run into things in the road or on the sidewalk.
>   But that doesn't mean that it's therefore OK to dig a pit with spikes
> at the bottom in the middle of where people normally walk or cycle.
> Because even though it is an individual person's responsibility not to
> walk into holes, occasionally people are distracted or don't see well or
> make mistakes; and the consequences for being temporarily distracted
> should never be "falling onto a bed of sharp spikes". :-)  If you do
> have to dig a hole in the sidewalk, then at very least you need to put a
> physical barrier around it, so that the consequences for being
> distracted are "runs into a barrier" rather than "falls into a pit".
> But the best of all, if you can manage it, is not to dig the hole at all.
>
> Similarly, even if it could be in theory the device model's duty to
> reset the p2m entries it changed, it's still the case that programmers
> make mistakes.  When those mistakes happen, we at very least want it to
> be as easy to figure out what the problem is as possible; and if we can,
> we want to make those mistakes completely harmless.
>
> Making it possible for ioreq server A's entries to remain outstanding
> after ioreq server B connects is the programming equivalent of leaving a
> big open hole in the middle of a sidewalk: it means that when there's a
> mistake made, there aren't any obvious immediate failures that tell you,
> "Server A forgot to release some entries".  Instead, you will get random
> failures, as bits of memory behave strangely or run very slowly for no
> apparent reason.
>
> Having it impossible to connect ioreq server B if there are still
> outstanding entries from ioreq server A is the equivalent of digging a
> hole and then putting up a barrier.  Now you get a failure when you try
> it, and you're told exactly what the problem is -- the last guy didn't
> release all his entries.  If you don't have the code you're still a bit
> stuck, but at least you know it just doesn't work, rather than failing
> in mysterious and difficult to detect ways later.
>
> Having Xen reset the entries just makes this entire problem go away --
> it's like accessing whatever you needed to access by digging sideways
> from the storm drains, rather than digging a hole in the sidewalk.
>
> It's all well and good to say, "It's the device model's responsibility",
> but we need to plan on programmers making mistakes.  (And we also need
> to plan for administrators making mistakes, which is why I think
> returning -EBUSY when you try to enable logdirty with an ioreq server
> still bound is the right interface.)

Hah. This is a very good metaphor. I am convinced. :)
Though I have doubts about how to refactor the
p2m_change_entry_type_global() interface, I'm now willing to take your
suggestions:
a> we still need the p2m resetting when an ioreq server is unbound;
b> disable the log dirty feature while an ioreq server is bound.

Does anyone else have different opinions? Thanks!

>>> I'm still open to being convinced, but at the moment it really seems to
>>> me like improving the situation is the better long-term option.
>>>
>> Thanks for all your advice, George. I'm also willing to take other
>> advice; if we have a resync approach in the hypervisor that is more
>> acceptable (for you, Jan and other maintainers), I'd like to add it.
>> If the code is too complicated, I can submit it in a separate new
>> patchset. :)
> Well I think sometime in early July I should be able to make some time
> to take a look at it properly.  Maybe I can start with a "draft" patch,
> and you can take it and fix it up and make it work.  Or maybe I'll find
> it's actually too complicated, and then agree with you that relying on
> the server to clean up after itself is the only option. :-)

Thank you, George. I definitely would like to take on this work.
And before that, I think disabling log dirty would be OK for me
(after all, making vGPU live migratable requires more features to be
added).

Yu



* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-24  4:16                                                             ` Yu Zhang
@ 2016-06-24  6:12                                                               ` Jan Beulich
  2016-06-24  7:12                                                                 ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-24  6:12 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, JunNakajima

>>> On 24.06.16 at 06:16, <yu.c.zhang@linux.intel.com> wrote:
> I'm now willing to take your suggestions:
> a> we still need the p2m resetting when an ioreq server is unbound;
> b> disable the log dirty feature while an ioreq server is bound.
> 
> Does anyone else have different opinions? Thanks!

Hmm, in particular for a) I don't think I read that out of George's
descriptions. But of course much depends on what you really
mean by that: Do you want to say we need to guarantee all
entries get reverted back, or do you instead mean to just kick
off the conversion (via the misconfig mechanism)?

In any event, I think log-dirty shouldn't be refused merely while an
ioreq server has the type bound, but only as long as there are
outstanding entries of that type. That way, the "cannot be migrated"
state of a VM has a chance to clear.

And then, thinking about it again especially in the context of
the hvmctl series - the unbinding of the type is happening in a
hypercall with built-in preemption capability. Hence there's not
really an issue with how long that conversion may take, as
long as there's no need to pause the guest for that time period.
Which means you could first initiate conversion via the
misconfig mechanism, but then immediately go ahead and walk
the entire guest address space (or the relevant part of it, if
the bounds got tracked) with continuations used as necessary.

Jan




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-24  6:12                                                               ` Jan Beulich
@ 2016-06-24  7:12                                                                 ` Yu Zhang
  2016-06-24  8:01                                                                   ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-24  7:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima



On 6/24/2016 2:12 PM, Jan Beulich wrote:
>>>> On 24.06.16 at 06:16, <yu.c.zhang@linux.intel.com> wrote:
>> I'm now willing to take your suggestions:
>> a> still need the p2m resetting when the ioreq server is unbound;
>> b> disable the log-dirty feature if an ioreq server is bound.
>>
>> Does anyone else have different opinions? Thanks!
> Hmm, in particular for a) I don't think I read that out of George's
> descriptions. But of course much depends on what you really
> mean by that: Do you want to say we need to guarantee all
> entries get reverted back, or do you instead mean to just kick
> off the conversion (via the misconfig mechanism)?

Thanks for your reply, Jan. I mean the misconfig mechanism.

>
> In any event, I think log-dirty shouldn't be disabled when an
> ioreq server binds the type, but as long as there are outstanding
> entries of that type. That way, the "cannot be migrated" state
> of a VM has a chance to clear.

Do you mean to disable log-dirty by checking whether there are outstanding
p2m_ioreq_server entries, instead of checking p2m->ioreq.server? How about
we check both? Because only checking the count of outstanding
p2m_ioreq_server entries may prevent live migration when an ioreq server
is unbound but with the p2m types not yet entirely synced.

> And then, thinking about it again especially in the context of
> the hvmctl series - the unbinding of the type is happening in a
> hypercall with built-in preemption capability. Hence there's not
> really an issue with how long that conversion may take, as
> long as there's no need to pause the guest for that time period.
> Which means you could first initiate conversion via the
> misconfig mechanism, but then immediately go ahead and walk
> the entire guest address space (or the relevant part of it, if
> the bounds got tracked) with continuations used as necessary.

Sorry, what is the misconfig mechanism good for if I still need to sweep
the entire p2m table immediately?

Thanks
Yu




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-24  7:12                                                                 ` Yu Zhang
@ 2016-06-24  8:01                                                                   ` Jan Beulich
  2016-06-24  9:57                                                                     ` Yu Zhang
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2016-06-24  8:01 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 24.06.16 at 09:12, <yu.c.zhang@linux.intel.com> wrote:
> On 6/24/2016 2:12 PM, Jan Beulich wrote:
>> In any event, I think log-dirty shouldn't be disabled when an
>> ioreq server binds the type, but as long as there are outstanding
>> entries of that type. That way, the "cannot be migrated" state
>> of a VM has a chance to clear.
> 
> Do you mean to disable log-dirty by checking whether there are outstanding
> p2m_ioreq_server entries, instead of checking p2m->ioreq.server? How about
> we check both? Because only checking the count of outstanding
> p2m_ioreq_server entries may prevent live migration when an ioreq server
> is unbound but with the p2m types not yet entirely synced.

Checking both would limit things further, which seems to
contradict your intention of relaxing things. What may be a
reasonable combination is to check for a registered ioreq server
when enabling log-dirty, and for outstanding pages when
registering a new ioreq server. Yet then again it feels like we're
moving in circles: Didn't we mean de-registration to fail when
there are outstanding pages? In which case checking for a
registered server would be slightly tighter than checking for
outstanding pages: The server could have removed all pages,
but not de-registered, which would - afaict - still allow log-dirty
to function (of course new registration of pages would need to
fail until log-dirty got disabled again).

>> And then, thinking about it again especially in the context of
>> the hvmctl series - the unbinding of the type is happening in a
>> hypercall with built-in preemption capability. Hence there's not
>> really an issue with how long that conversion may take, as
>> long as there's no need to pause the guest for that time period.
>> Which means you could first initiate conversion via the
>> misconfig mechanism, but then immediately go ahead and walk
>> the entire guest address space (or the relevant part of it, if
>> the bounds got tracked) with continuations used as necessary.
> 
> Sorry, what is the misconfig mechanism good for if I still need to sweep
> the entire p2m table immediately?

To avoid the need to pause the guest for an extended time: At
least between scheduling a continuation and executing it, the
guest could happily run (and perhaps cause some of the
otherwise synchronous - to the hypercall - work to get carried
out already, with the involved execution time accounted to the
guest instead of the host).

Jan




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-24  8:01                                                                   ` Jan Beulich
@ 2016-06-24  9:57                                                                     ` Yu Zhang
  2016-06-24 10:27                                                                       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Yu Zhang @ 2016-06-24  9:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima



On 6/24/2016 4:01 PM, Jan Beulich wrote:
>>>> On 24.06.16 at 09:12, <yu.c.zhang@linux.intel.com> wrote:
>> On 6/24/2016 2:12 PM, Jan Beulich wrote:
>>> In any event, I think log-dirty shouldn't be disabled when an
>>> ioreq server binds the type, but as long as there are outstanding
>>> entries of that type. That way, the "cannot be migrated" state
>>> of a VM has a chance to clear.
>> Do you mean to disable log-dirty by checking whether there are outstanding
>> p2m_ioreq_server entries, instead of checking p2m->ioreq.server? How about
>> we check both? Because only checking the count of outstanding
>> p2m_ioreq_server entries may prevent live migration when an ioreq server
>> is unbound but with the p2m types not yet entirely synced.
> Checking both would limit things further, which seems to
> contradict your intention of relaxing things. What may be a
> reasonable combination is to check for a registered ioreq server
> when enabling log-dirty, and for outstanding pages when
> registering a new ioreq server. Yet then again it feels like we're
> moving in circles: Didn't we mean de-registration to fail when
> there are outstanding pages? In which case checking for a
> registered server would be slightly tighter than checking for
> outstanding pages: The server could have removed all pages,
> but not de-registered, which would - afaict - still allow log-dirty
> to function (of course new registration of pages would need to
> fail until log-dirty got disabled again).

OK. To disable log-dirty, the judgement could be based on the existence of
outstanding p2m_ioreq_server entries.

>>> And then, thinking about it again especially in the context of
>>> the hvmctl series - the unbinding of the type is happening in a
>>> hypercall with built-in preemption capability. Hence there's not
>>> really an issue with how long that conversion may take, as
>>> long as there's no need to pause the guest for that time period.
>>> Which means you could first initiate conversion via the
>>> misconfig mechanism, but then immediately go ahead and walk
>>> the entire guest address space (or the relevant part of it, if
>>> the bounds got tracked) with continuations used as necessary.
>> Sorry, what is the misconfig mechanism good for if I still need to sweep
>> the entire p2m table immediately?
> To avoid the need to pause the guest for an extended time: At
> least between scheduling a continuation and executing it, the
> guest could happily run (and perhaps cause some of the
> otherwise synchronous - to the hypercall - work to get carried
> out already, with the involved execution time accounted to the
> guest instead of the host).
>

By "scheduling a continuation and executing it", do you mean code like
hypercall_create_continuation()? I'll need to study this part. But one
difference I can imagine is that hypercall_create_continuation() is for
uncompleted hypercalls, I guess, whereas the p2m table sweeping may only
be a side effect of the unbind hypercall here.
I may have some misunderstandings here, so please correct me if so,
and I'll try some investigation at the same time. Thanks!

Yu




* Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2016-06-24  9:57                                                                     ` Yu Zhang
@ 2016-06-24 10:27                                                                       ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2016-06-24 10:27 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 24.06.16 at 11:57, <yu.c.zhang@linux.intel.com> wrote:
> On 6/24/2016 4:01 PM, Jan Beulich wrote:
>>>>> On 24.06.16 at 09:12, <yu.c.zhang@linux.intel.com> wrote:
>>> On 6/24/2016 2:12 PM, Jan Beulich wrote:
>>>> And then, thinking about it again especially in the context of
>>>> the hvmctl series - the unbinding of the type is happening in a
>>>> hypercall with built-in preemption capability. Hence there's not
>>>> really an issue with how long that conversion may take, as
>>>> long as there's no need to pause the guest for that time period.
>>>> Which means you could first initiate conversion via the
>>>> misconfig mechanism, but then immediately go ahead and walk
>>>> the entire guest address space (or the relevant part of it, if
>>>> the bounds got tracked) with continuations used as necessary.
>>> Sorry, what is the misconfig mechanism good for if I still need to sweep
>>> the entire p2m table immediately?
>> To avoid the need to pause the guest for an extended time: At
>> least between scheduling a continuation and executing it, the
>> guest could happily run (and perhaps cause some of the
>> otherwise synchronous - to the hypercall - work to get carried
>> out already, with the involved execution time accounted to the
>> guest instead of the host).
> 
> By "scheduling a continuation and executing it", do you mean code like
> hypercall_create_continuation()? I'll need to study this part. But one
> difference I can imagine is that hypercall_create_continuation() is for
> uncompleted hypercalls, I guess, whereas the p2m table sweeping may only
> be a side effect of the unbind hypercall here.

Side effect or not, it'll take long and hence you can't do it in one go.
Hence the need for a continuation. We have at least one other such
example - see paging_domctl(). I don't think you'll need an auxiliary
new hypercall here, though.

Jan




end of thread, other threads:[~2016-06-24 10:27 UTC | newest]

Thread overview: 68+ messages
2016-05-19  9:05 [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
2016-05-19  9:05 ` [PATCH v4 1/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
2016-06-14 10:04   ` Jan Beulich
2016-06-14 13:14     ` George Dunlap
2016-06-15 10:51     ` Yu Zhang
2016-05-19  9:05 ` [PATCH v4 2/3] x86/ioreq server: Add new functions to get/set memory types Yu Zhang
2016-05-19  9:05 ` [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2016-06-14 10:45   ` Jan Beulich
2016-06-14 13:13     ` George Dunlap
2016-06-14 13:31       ` Jan Beulich
2016-06-15  9:50         ` George Dunlap
2016-06-15 10:21           ` Jan Beulich
2016-06-15 11:28             ` George Dunlap
2016-06-16  9:30             ` Yu Zhang
2016-06-16  9:55               ` Jan Beulich
2016-06-17 10:17                 ` George Dunlap
2016-06-20  9:03                   ` Yu Zhang
2016-06-20 10:10                     ` George Dunlap
2016-06-20 10:25                       ` Jan Beulich
2016-06-20 10:32                         ` George Dunlap
2016-06-20 10:55                           ` Jan Beulich
2016-06-20 11:28                             ` Yu Zhang
2016-06-20 13:13                               ` George Dunlap
2016-06-21  7:42                                 ` Yu Zhang
2016-06-20 10:30                       ` Yu Zhang
2016-06-20 10:43                         ` George Dunlap
2016-06-20 10:45                         ` Jan Beulich
2016-06-20 11:06                           ` Yu Zhang
2016-06-20 11:20                             ` Jan Beulich
2016-06-20 12:06                               ` Yu Zhang
2016-06-20 13:38                                 ` Jan Beulich
2016-06-21  7:45                                   ` Yu Zhang
2016-06-21  8:22                                     ` Jan Beulich
2016-06-21  9:16                                       ` Yu Zhang
2016-06-21  9:47                                         ` Jan Beulich
2016-06-21 10:00                                           ` Yu Zhang
2016-06-21 14:38                                           ` George Dunlap
2016-06-22  6:39                                             ` Jan Beulich
2016-06-22  8:38                                               ` Yu Zhang
2016-06-22  9:11                                                 ` Jan Beulich
2016-06-22  9:16                                               ` George Dunlap
2016-06-22  9:29                                                 ` Jan Beulich
2016-06-22  9:47                                                   ` George Dunlap
2016-06-22 10:07                                                     ` Yu Zhang
2016-06-22 11:33                                                       ` George Dunlap
2016-06-23  7:37                                                         ` Yu Zhang
2016-06-23 10:33                                                           ` George Dunlap
2016-06-24  4:16                                                             ` Yu Zhang
2016-06-24  6:12                                                               ` Jan Beulich
2016-06-24  7:12                                                                 ` Yu Zhang
2016-06-24  8:01                                                                   ` Jan Beulich
2016-06-24  9:57                                                                     ` Yu Zhang
2016-06-24 10:27                                                                       ` Jan Beulich
2016-06-22 10:10                                                     ` Jan Beulich
2016-06-22 10:15                                                       ` George Dunlap
2016-06-22 11:50                                                         ` Jan Beulich
2016-06-15 10:52     ` Yu Zhang
2016-06-15 12:26       ` Jan Beulich
2016-06-16  9:32         ` Yu Zhang
2016-06-16 10:02           ` Jan Beulich
2016-06-16 11:18             ` Yu Zhang
2016-06-16 12:43               ` Jan Beulich
2016-06-20  9:05             ` Yu Zhang
2016-06-14 13:14   ` George Dunlap
2016-05-27  7:52 ` [PATCH v4 0/3] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Zhang, Yu C
2016-05-27 10:00   ` Jan Beulich
2016-05-27  9:51     ` Zhang, Yu C
2016-05-27 10:02     ` George Dunlap
