* [PATCH RFC 0/6] Slotted channels for sync vm_events
@ 2018-12-19 18:52 Petre Pircalabu
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

This patchset is a rework of the "multi-page ring buffer" for vm_events
patch, based on Andrew Cooper's comments.
For synchronous vm_events the ring's waitqueue logic is unnecessary, as the
vcpu sending the request is blocked until a response is received.
To simplify the request/response mechanism, an array of slotted channels
was created, one per vcpu: each vcpu puts its request in the
corresponding slot and blocks until the response is received.
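
As a rough illustration of the idea (the names and layout below are made up
for this cover letter and do not match the actual patches; the real interface
is introduced in the "Use slotted channels for sync requests" patch), a
per-vcpu slot behaves roughly like this:

    #include <stdint.h>

    #define MAX_VCPUS 8                       /* illustrative only */

    typedef struct { uint32_t reason; } vm_event_request_t;   /* stand-in */
    typedef struct { uint32_t action; } vm_event_response_t;  /* stand-in */

    enum slot_state { SLOT_IDLE, SLOT_REQUEST_PENDING, SLOT_RESPONSE_READY };

    struct vm_event_slot {
        volatile enum slot_state state;
        vm_event_request_t  req;   /* written by the vcpu raising the event */
        vm_event_response_t rsp;   /* written by the monitoring application */
    };

    /* One slot per vcpu, so a sync request never competes for ring space. */
    static struct vm_event_slot slots[MAX_VCPUS];

    static vm_event_response_t send_sync_request(unsigned int vcpu_id,
                                                 vm_event_request_t req)
    {
        struct vm_event_slot *slot = &slots[vcpu_id];

        slot->req = req;
        slot->state = SLOT_REQUEST_PENDING;  /* then kick the event channel */

        /* The vcpu blocks here; a busy-wait stands in for the real
         * event-channel based blocking used by the series. */
        while ( slot->state != SLOT_RESPONSE_READY )
            ;

        slot->state = SLOT_IDLE;
        return slot->rsp;
    }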

I'm sending this patch as an RFC because, while I'm still working on a way to
measure the overall performance improvement, your feedback would be of great
assistance.

Petre Pircalabu (6):
  tools/libxc: Consistent usage of xc_vm_event_* interface
  tools/libxc: Define VM_EVENT type
  vm_event: Refactor vm_event_domain implementation
  vm_event: Use slotted channels for sync requests.
  xen-access: add support for slotted channel vm_events
  xc_version: add vm_event interface version

 tools/libxc/include/xenctrl.h       |  60 +--
 tools/libxc/xc_mem_paging.c         |  23 +-
 tools/libxc/xc_memshr.c             |  34 --
 tools/libxc/xc_monitor.c            |  67 ++-
 tools/libxc/xc_private.c            |   3 +
 tools/libxc/xc_private.h            |  22 +-
 tools/libxc/xc_vm_event.c           | 192 ++++---
 tools/tests/xen-access/xen-access.c | 545 ++++++++++++++++----
 tools/xenpaging/xenpaging.c         |  42 +-
 xen/arch/arm/mem_access.c           |   2 +-
 xen/arch/x86/mm.c                   |   7 +
 xen/arch/x86/mm/mem_access.c        |   4 +-
 xen/arch/x86/mm/mem_paging.c        |   2 +-
 xen/arch/x86/mm/mem_sharing.c       |   5 +-
 xen/arch/x86/mm/p2m.c               |  10 +-
 xen/common/kernel.c                 |   3 +
 xen/common/mem_access.c             |   2 +-
 xen/common/monitor.c                |   4 +-
 xen/common/vm_event.c               | 972 +++++++++++++++++++++++++++---------
 xen/drivers/passthrough/pci.c       |   2 +-
 xen/include/public/domctl.h         |  64 ++-
 xen/include/public/memory.h         |   2 +
 xen/include/public/version.h        |   3 +
 xen/include/public/vm_event.h       |  15 +
 xen/include/xen/sched.h             |  25 +-
 xen/include/xen/vm_event.h          |  30 +-
 26 files changed, 1511 insertions(+), 629 deletions(-)

-- 
2.7.4



* [RFC PATCH 1/6] tools/libxc: Consistent usage of xc_vm_event_* interface
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Petre Pircalabu, Wei Liu, Ian Jackson

Modified xc_mem_paging_enable to use xc_vm_event_enable directly and
moved the ring_page handling from the client (xenpaging) to libxc.

Restricted vm_event_control usage to only the simplest domctls, which do
not expect any return values, and changed xc_vm_event_enable to call
do_domctl directly.

Removed xc_memshr_ring_enable/disable and xc_memshr_domain_resume.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
---
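A sketch of the resulting calling convention, based on the xenpaging hunk
below (error handling trimmed; "domain_id" stands in for the caller's target
domain):

    uint32_t port;
    void *ring_page;

    ring_page = xc_mem_paging_enable(xch, domain_id, &port);
    if ( ring_page == NULL )
    {
        /* errno carries the failure reason, as before */
        PERROR("Enabling paging failed");
        goto err;
    }

    /* The ring page comes back already mapped and (on success) removed from
     * the guest physmap, so the client no longer maps
     * HVM_PARAM_PAGING_RING_PFN itself. */
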
 tools/libxc/include/xenctrl.h | 49 +--------------------------------
 tools/libxc/xc_mem_paging.c   | 23 +++++-----------
 tools/libxc/xc_memshr.c       | 34 -----------------------
 tools/libxc/xc_monitor.c      | 31 +++++++++++++++++----
 tools/libxc/xc_private.h      |  2 +-
 tools/libxc/xc_vm_event.c     | 64 ++++++++++++++++---------------------------
 tools/xenpaging/xenpaging.c   | 42 +++-------------------------
 7 files changed, 62 insertions(+), 183 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 97ae965..de0b990 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1962,7 +1962,7 @@ int xc_altp2m_change_gfn(xc_interface *handle, uint32_t domid,
  * Hardware-Assisted Paging (i.e. Intel EPT, AMD NPT). Moreover, AMD NPT
  * support is considered experimental.
  */
-int xc_mem_paging_enable(xc_interface *xch, uint32_t domain_id, uint32_t *port);
+void *xc_mem_paging_enable(xc_interface *xch, uint32_t domain_id, uint32_t *port);
 int xc_mem_paging_disable(xc_interface *xch, uint32_t domain_id);
 int xc_mem_paging_resume(xc_interface *xch, uint32_t domain_id);
 int xc_mem_paging_nominate(xc_interface *xch, uint32_t domain_id,
@@ -2090,53 +2090,6 @@ int xc_memshr_control(xc_interface *xch,
                       uint32_t domid,
                       int enable);
 
-/* Create a communication ring in which the hypervisor will place ENOMEM
- * notifications.
- *
- * ENOMEM happens when unsharing pages: a Copy-on-Write duplicate needs to be
- * allocated, and thus the out-of-memory error occurr.
- *
- * For complete examples on how to plumb a notification ring, look into
- * xenpaging or xen-access.
- *
- * On receipt of a notification, the helper should ensure there is memory
- * available to the domain before retrying.
- *
- * If a domain encounters an ENOMEM condition when sharing and this ring
- * has not been set up, the hypervisor will crash the domain.
- *
- * Fails with:
- *  EINVAL if port is NULL
- *  EINVAL if the sharing ring has already been enabled
- *  ENOSYS if no guest gfn has been specified to host the ring via an hvm param
- *  EINVAL if the gfn for the ring has not been populated
- *  ENOENT if the gfn for the ring is paged out, or cannot be unshared
- *  EINVAL if the gfn for the ring cannot be written to
- *  EINVAL if the domain is dying
- *  ENOSPC if an event channel cannot be allocated for the ring
- *  ENOMEM if memory cannot be allocated for internal data structures
- *  EINVAL or EACCESS if the request is denied by the security policy
- */
-
-int xc_memshr_ring_enable(xc_interface *xch, 
-                          uint32_t domid,
-                          uint32_t *port);
-/* Disable the ring for ENOMEM communication.
- * May fail with EINVAL if the ring was not enabled in the first place.
- */
-int xc_memshr_ring_disable(xc_interface *xch, 
-                           uint32_t domid);
-
-/*
- * Calls below return EINVAL if sharing has not been enabled for the domain
- * Calls below return EINVAL if the domain is dying
- */
-/* Once a reponse to an ENOMEM notification is prepared, the tool can
- * notify the hypervisor to re-schedule the faulting vcpu of the domain with an
- * event channel kick and/or this call. */
-int xc_memshr_domain_resume(xc_interface *xch,
-                            uint32_t domid);
-
 /* Select a page for sharing. 
  *
  * A 64 bit opaque handle will be stored in handle.  The hypervisor ensures
diff --git a/tools/libxc/xc_mem_paging.c b/tools/libxc/xc_mem_paging.c
index a067706..08468fb 100644
--- a/tools/libxc/xc_mem_paging.c
+++ b/tools/libxc/xc_mem_paging.c
@@ -37,35 +37,26 @@ static int xc_mem_paging_memop(xc_interface *xch, uint32_t domain_id,
     return do_memory_op(xch, XENMEM_paging_op, &mpo, sizeof(mpo));
 }
 
-int xc_mem_paging_enable(xc_interface *xch, uint32_t domain_id,
-                         uint32_t *port)
+void *xc_mem_paging_enable(xc_interface *xch, uint32_t domain_id,
+                           uint32_t *port)
 {
-    if ( !port )
-    {
-        errno = EINVAL;
-        return -1;
-    }
-
-    return xc_vm_event_control(xch, domain_id,
-                               XEN_VM_EVENT_ENABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_PAGING,
-                               port);
+    return xc_vm_event_enable(xch, domain_id,
+                              XEN_DOMCTL_VM_EVENT_OP_PAGING,
+                              port);
 }
 
 int xc_mem_paging_disable(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_DISABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_PAGING,
-                               NULL);
+                               XEN_DOMCTL_VM_EVENT_OP_PAGING);
 }
 
 int xc_mem_paging_resume(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_RESUME,
-                               XEN_DOMCTL_VM_EVENT_OP_PAGING,
-                               NULL);
+                               XEN_DOMCTL_VM_EVENT_OP_PAGING);
 }
 
 int xc_mem_paging_nominate(xc_interface *xch, uint32_t domain_id, uint64_t gfn)
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index d5e135e..06f613a 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -41,31 +41,6 @@ int xc_memshr_control(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
-int xc_memshr_ring_enable(xc_interface *xch, 
-                          uint32_t domid,
-                          uint32_t *port)
-{
-    if ( !port )
-    {
-        errno = EINVAL;
-        return -1;
-    }
-
-    return xc_vm_event_control(xch, domid,
-                               XEN_VM_EVENT_ENABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_SHARING,
-                               port);
-}
-
-int xc_memshr_ring_disable(xc_interface *xch, 
-                           uint32_t domid)
-{
-    return xc_vm_event_control(xch, domid,
-                               XEN_VM_EVENT_DISABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_SHARING,
-                               NULL);
-}
-
 static int xc_memshr_memop(xc_interface *xch, uint32_t domid,
                             xen_mem_sharing_op_t *mso)
 {
@@ -200,15 +175,6 @@ int xc_memshr_range_share(xc_interface *xch,
     return xc_memshr_memop(xch, source_domain, &mso);
 }
 
-int xc_memshr_domain_resume(xc_interface *xch,
-                            uint32_t domid)
-{
-    return xc_vm_event_control(xch, domid,
-                               XEN_VM_EVENT_RESUME,
-                               XEN_DOMCTL_VM_EVENT_OP_SHARING,
-                               NULL);
-}
-
 int xc_memshr_debug_gfn(xc_interface *xch,
                         uint32_t domid,
                         unsigned long gfn)
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index 4ac823e..d190c29 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -24,24 +24,43 @@
 
 void *xc_monitor_enable(xc_interface *xch, uint32_t domain_id, uint32_t *port)
 {
-    return xc_vm_event_enable(xch, domain_id, HVM_PARAM_MONITOR_RING_PFN,
-                              port);
+    void *buffer;
+    int saved_errno;
+
+    /* Pause the domain for ring page setup */
+    if ( xc_domain_pause(xch, domain_id) )
+    {
+        PERROR("Unable to pause domain\n");
+        return NULL;
+    }
+
+    buffer = xc_vm_event_enable(xch, domain_id,
+                                HVM_PARAM_MONITOR_RING_PFN,
+                                port);
+    saved_errno = errno;
+    if ( xc_domain_unpause(xch, domain_id) )
+    {
+        if ( buffer )
+            saved_errno = errno;
+        PERROR("Unable to unpause domain");
+    }
+
+    errno = saved_errno;
+    return buffer;
 }
 
 int xc_monitor_disable(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_DISABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_MONITOR,
-                               NULL);
+                               XEN_DOMCTL_VM_EVENT_OP_MONITOR);
 }
 
 int xc_monitor_resume(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_RESUME,
-                               XEN_DOMCTL_VM_EVENT_OP_MONITOR,
-                               NULL);
+                               XEN_DOMCTL_VM_EVENT_OP_MONITOR);
 }
 
 int xc_monitor_get_capabilities(xc_interface *xch, uint32_t domain_id,
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index adc3b6a..663e78b 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -412,7 +412,7 @@ int xc_ffs64(uint64_t x);
  * vm_event operations. Internal use only.
  */
 int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned int op,
-                        unsigned int mode, uint32_t *port);
+                        unsigned int mode);
 /*
  * Enables vm_event and returns the mapped ring page indicated by param.
  * param can be HVM_PARAM_PAGING/ACCESS/SHARING_RING_PFN
diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c
index 8674607..d9e3a49 100644
--- a/tools/libxc/xc_vm_event.c
+++ b/tools/libxc/xc_vm_event.c
@@ -23,20 +23,16 @@
 #include "xc_private.h"
 
 int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned int op,
-                        unsigned int mode, uint32_t *port)
+                        unsigned int mode)
 {
     DECLARE_DOMCTL;
-    int rc;
 
     domctl.cmd = XEN_DOMCTL_vm_event_op;
     domctl.domain = domain_id;
     domctl.u.vm_event_op.op = op;
     domctl.u.vm_event_op.mode = mode;
 
-    rc = do_domctl(xch, &domctl);
-    if ( !rc && port )
-        *port = domctl.u.vm_event_op.port;
-    return rc;
+    return do_domctl(xch, &domctl);
 }
 
 void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
@@ -46,7 +42,8 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
     uint64_t pfn;
     xen_pfn_t ring_pfn, mmap_pfn;
     unsigned int op, mode;
-    int rc1, rc2, saved_errno;
+    int rc;
+    DECLARE_DOMCTL;
 
     if ( !port )
     {
@@ -54,17 +51,9 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
         return NULL;
     }
 
-    /* Pause the domain for ring page setup */
-    rc1 = xc_domain_pause(xch, domain_id);
-    if ( rc1 != 0 )
-    {
-        PERROR("Unable to pause domain\n");
-        return NULL;
-    }
-
     /* Get the pfn of the ring page */
-    rc1 = xc_hvm_param_get(xch, domain_id, param, &pfn);
-    if ( rc1 != 0 )
+    rc = xc_hvm_param_get(xch, domain_id, param, &pfn);
+    if ( rc != 0 )
     {
         PERROR("Failed to get pfn of ring page\n");
         goto out;
@@ -72,13 +61,13 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
 
     ring_pfn = pfn;
     mmap_pfn = pfn;
-    rc1 = xc_get_pfn_type_batch(xch, domain_id, 1, &mmap_pfn);
-    if ( rc1 || mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
+    rc = xc_get_pfn_type_batch(xch, domain_id, 1, &mmap_pfn);
+    if ( rc || mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
     {
         /* Page not in the physmap, try to populate it */
-        rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
+        rc = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
                                               &ring_pfn);
-        if ( rc1 != 0 )
+        if ( rc != 0 )
         {
             PERROR("Failed to populate ring pfn\n");
             goto out;
@@ -87,7 +76,7 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
 
     mmap_pfn = ring_pfn;
     ring_page = xc_map_foreign_pages(xch, domain_id, PROT_READ | PROT_WRITE,
-                                         &mmap_pfn, 1);
+                                     &mmap_pfn, 1);
     if ( !ring_page )
     {
         PERROR("Could not map the ring page\n");
@@ -117,40 +106,35 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
      */
     default:
         errno = EINVAL;
-        rc1 = -1;
+        rc = -1;
         goto out;
     }
 
-    rc1 = xc_vm_event_control(xch, domain_id, op, mode, port);
-    if ( rc1 != 0 )
+    domctl.cmd = XEN_DOMCTL_vm_event_op;
+    domctl.domain = domain_id;
+    domctl.u.vm_event_op.op = op;
+    domctl.u.vm_event_op.mode = mode;
+
+    rc = do_domctl(xch, &domctl);
+    if ( rc != 0 )
     {
         PERROR("Failed to enable vm_event\n");
         goto out;
     }
 
+    *port = domctl.u.vm_event_op.port;
+
     /* Remove the ring_pfn from the guest's physmap */
-    rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, &ring_pfn);
-    if ( rc1 != 0 )
+    rc = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, &ring_pfn);
+    if ( rc != 0 )
         PERROR("Failed to remove ring page from guest physmap");
 
  out:
-    saved_errno = errno;
-
-    rc2 = xc_domain_unpause(xch, domain_id);
-    if ( rc1 != 0 || rc2 != 0 )
+    if ( rc != 0 )
     {
-        if ( rc2 != 0 )
-        {
-            if ( rc1 == 0 )
-                saved_errno = errno;
-            PERROR("Unable to unpause domain");
-        }
-
         if ( ring_page )
             xenforeignmemory_unmap(xch->fmem, ring_page, 1);
         ring_page = NULL;
-
-        errno = saved_errno;
     }
 
     return ring_page;
diff --git a/tools/xenpaging/xenpaging.c b/tools/xenpaging/xenpaging.c
index d0571ca..b4a3a5c 100644
--- a/tools/xenpaging/xenpaging.c
+++ b/tools/xenpaging/xenpaging.c
@@ -337,40 +337,11 @@ static struct xenpaging *xenpaging_init(int argc, char *argv[])
         goto err;
     }
 
-    /* Map the ring page */
-    xc_get_hvm_param(xch, paging->vm_event.domain_id, 
-                        HVM_PARAM_PAGING_RING_PFN, &ring_pfn);
-    mmap_pfn = ring_pfn;
-    paging->vm_event.ring_page = 
-        xc_map_foreign_pages(xch, paging->vm_event.domain_id,
-                             PROT_READ | PROT_WRITE, &mmap_pfn, 1);
-    if ( !paging->vm_event.ring_page )
-    {
-        /* Map failed, populate ring page */
-        rc = xc_domain_populate_physmap_exact(paging->xc_handle, 
-                                              paging->vm_event.domain_id,
-                                              1, 0, 0, &ring_pfn);
-        if ( rc != 0 )
-        {
-            PERROR("Failed to populate ring gfn\n");
-            goto err;
-        }
-
-        paging->vm_event.ring_page = 
-            xc_map_foreign_pages(xch, paging->vm_event.domain_id,
-                                 PROT_READ | PROT_WRITE,
-                                 &mmap_pfn, 1);
-        if ( !paging->vm_event.ring_page )
-        {
-            PERROR("Could not map the ring page\n");
-            goto err;
-        }
-    }
-    
     /* Initialise Xen */
-    rc = xc_mem_paging_enable(xch, paging->vm_event.domain_id,
-                             &paging->vm_event.evtchn_port);
-    if ( rc != 0 )
+    paging->vm_event.ring_page =
+            xc_mem_paging_enable(xch, paging->vm_event.domain_id,
+                                 &paging->vm_event.evtchn_port);
+    if ( paging->vm_event.ring_page == NULL )
     {
         switch ( errno ) {
             case EBUSY:
@@ -418,11 +389,6 @@ static struct xenpaging *xenpaging_init(int argc, char *argv[])
                    (vm_event_sring_t *)paging->vm_event.ring_page,
                    PAGE_SIZE);
 
-    /* Now that the ring is set, remove it from the guest's physmap */
-    if ( xc_domain_decrease_reservation_exact(xch, 
-                    paging->vm_event.domain_id, 1, 0, &ring_pfn) )
-        PERROR("Failed to remove ring from guest physmap");
-
     /* Get max_pages from guest if not provided via cmdline */
     if ( !paging->max_pages )
     {
-- 
2.7.4



* [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich

Define the type for each of the supported vm_event rings (paging,
monitor and sharing) and replace the ring param field with this type.

Replace XEN_DOMCTL_VM_EVENT_OP_ occurrences with their corresponding
XEN_VM_EVENT_TYPE_ counterparts.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
---
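Condensed caller-side before/after, pieced together from the hunks below:

    /* before: callers identified the ring by its HVM param */
    ring_page = xc_vm_event_enable(xch, domain_id,
                                   HVM_PARAM_MONITOR_RING_PFN, &port);

    /* after: callers pass the vm_event type; libxc resolves the ring pfn
     * param internally (xc_vm_event_ring_pfn_param) */
    ring_page = xc_vm_event_enable(xch, domain_id,
                                   XEN_VM_EVENT_TYPE_MONITOR, &port);
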
 tools/libxc/xc_mem_paging.c |  6 ++--
 tools/libxc/xc_monitor.c    |  6 ++--
 tools/libxc/xc_private.h    |  8 +++---
 tools/libxc/xc_vm_event.c   | 68 ++++++++++++++++++++++-----------------------
 xen/common/vm_event.c       | 12 ++++----
 xen/include/public/domctl.h | 45 ++++++++++++++++--------------
 6 files changed, 73 insertions(+), 72 deletions(-)

diff --git a/tools/libxc/xc_mem_paging.c b/tools/libxc/xc_mem_paging.c
index 08468fb..37a8224 100644
--- a/tools/libxc/xc_mem_paging.c
+++ b/tools/libxc/xc_mem_paging.c
@@ -41,7 +41,7 @@ void *xc_mem_paging_enable(xc_interface *xch, uint32_t domain_id,
                            uint32_t *port)
 {
     return xc_vm_event_enable(xch, domain_id,
-                              XEN_DOMCTL_VM_EVENT_OP_PAGING,
+                              XEN_VM_EVENT_TYPE_PAGING,
                               port);
 }
 
@@ -49,14 +49,14 @@ int xc_mem_paging_disable(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_DISABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_PAGING);
+                               XEN_VM_EVENT_TYPE_PAGING);
 }
 
 int xc_mem_paging_resume(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_RESUME,
-                               XEN_DOMCTL_VM_EVENT_OP_PAGING);
+                               XEN_VM_EVENT_TYPE_PAGING);
 }
 
 int xc_mem_paging_nominate(xc_interface *xch, uint32_t domain_id, uint64_t gfn)
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index d190c29..718fe8b 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -35,7 +35,7 @@ void *xc_monitor_enable(xc_interface *xch, uint32_t domain_id, uint32_t *port)
     }
 
     buffer = xc_vm_event_enable(xch, domain_id,
-                                HVM_PARAM_MONITOR_RING_PFN,
+                                XEN_VM_EVENT_TYPE_MONITOR,
                                 port);
     saved_errno = errno;
     if ( xc_domain_unpause(xch, domain_id) )
@@ -53,14 +53,14 @@ int xc_monitor_disable(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_DISABLE,
-                               XEN_DOMCTL_VM_EVENT_OP_MONITOR);
+                               XEN_VM_EVENT_TYPE_MONITOR);
 }
 
 int xc_monitor_resume(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
                                XEN_VM_EVENT_RESUME,
-                               XEN_DOMCTL_VM_EVENT_OP_MONITOR);
+                               XEN_VM_EVENT_TYPE_MONITOR);
 }
 
 int xc_monitor_get_capabilities(xc_interface *xch, uint32_t domain_id,
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 663e78b..482451c 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -412,12 +412,12 @@ int xc_ffs64(uint64_t x);
  * vm_event operations. Internal use only.
  */
 int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned int op,
-                        unsigned int mode);
+                        unsigned int type);
 /*
- * Enables vm_event and returns the mapped ring page indicated by param.
- * param can be HVM_PARAM_PAGING/ACCESS/SHARING_RING_PFN
+ * Enables vm_event and returns the mapped ring page indicated by type.
+ * type can be XEN_VM_EVENT_TYPE_(PAGING/MONITOR/SHARING)
  */
-void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
+void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int type,
                          uint32_t *port);
 
 int do_dm_op(xc_interface *xch, uint32_t domid, unsigned int nr_bufs, ...);
diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c
index d9e3a49..4fc2548 100644
--- a/tools/libxc/xc_vm_event.c
+++ b/tools/libxc/xc_vm_event.c
@@ -23,29 +23,54 @@
 #include "xc_private.h"
 
 int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned int op,
-                        unsigned int mode)
+                        unsigned int type)
 {
     DECLARE_DOMCTL;
 
     domctl.cmd = XEN_DOMCTL_vm_event_op;
     domctl.domain = domain_id;
     domctl.u.vm_event_op.op = op;
-    domctl.u.vm_event_op.mode = mode;
+    domctl.u.vm_event_op.type = type;
 
     return do_domctl(xch, &domctl);
 }
 
-void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
+static int xc_vm_event_ring_pfn_param(int type, int *param)
+{
+    if ( !param )
+        return -EINVAL;
+
+    switch ( type )
+    {
+    case XEN_VM_EVENT_TYPE_PAGING:
+        *param = HVM_PARAM_PAGING_RING_PFN;
+        break;
+
+    case XEN_VM_EVENT_TYPE_MONITOR:
+        *param = HVM_PARAM_MONITOR_RING_PFN;
+        break;
+
+    case XEN_VM_EVENT_TYPE_SHARING:
+        *param = HVM_PARAM_SHARING_RING_PFN;
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int type,
                          uint32_t *port)
 {
     void *ring_page = NULL;
     uint64_t pfn;
     xen_pfn_t ring_pfn, mmap_pfn;
-    unsigned int op, mode;
-    int rc;
+    int param, rc;
     DECLARE_DOMCTL;
 
-    if ( !port )
+    if ( !port || xc_vm_event_ring_pfn_param(type, &param) != 0 )
     {
         errno = EINVAL;
         return NULL;
@@ -83,37 +108,10 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int param,
         goto out;
     }
 
-    switch ( param )
-    {
-    case HVM_PARAM_PAGING_RING_PFN:
-        op = XEN_VM_EVENT_ENABLE;
-        mode = XEN_DOMCTL_VM_EVENT_OP_PAGING;
-        break;
-
-    case HVM_PARAM_MONITOR_RING_PFN:
-        op = XEN_VM_EVENT_ENABLE;
-        mode = XEN_DOMCTL_VM_EVENT_OP_MONITOR;
-        break;
-
-    case HVM_PARAM_SHARING_RING_PFN:
-        op = XEN_VM_EVENT_ENABLE;
-        mode = XEN_DOMCTL_VM_EVENT_OP_SHARING;
-        break;
-
-    /*
-     * This is for the outside chance that the HVM_PARAM is valid but is invalid
-     * as far as vm_event goes.
-     */
-    default:
-        errno = EINVAL;
-        rc = -1;
-        goto out;
-    }
-
     domctl.cmd = XEN_DOMCTL_vm_event_op;
     domctl.domain = domain_id;
-    domctl.u.vm_event_op.op = op;
-    domctl.u.vm_event_op.mode = mode;
+    domctl.u.vm_event_op.op = XEN_VM_EVENT_ENABLE;
+    domctl.u.vm_event_op.type = type;
 
     rc = do_domctl(xch, &domctl);
     if ( rc != 0 )
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 26cfa2c..dddc2d4 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -371,7 +371,7 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
     vm_event_response_t rsp;
 
     /*
-     * vm_event_resume() runs in either XEN_DOMCTL_VM_EVENT_OP_*, or
+     * vm_event_resume() runs in either XEN_VM_EVENT_* domctls, or
      * EVTCHN_send context from the introspection consumer. Both contexts
      * are guaranteed not to be the subject of vm_event responses.
      * While we could ASSERT(v != current) for each VCPU in d in the loop
@@ -592,7 +592,7 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
 {
     int rc;
 
-    rc = xsm_vm_event_control(XSM_PRIV, d, vec->mode, vec->op);
+    rc = xsm_vm_event_control(XSM_PRIV, d, vec->type, vec->op);
     if ( rc )
         return rc;
 
@@ -619,10 +619,10 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
 
     rc = -ENOSYS;
 
-    switch ( vec->mode )
+    switch ( vec->type )
     {
 #ifdef CONFIG_HAS_MEM_PAGING
-    case XEN_DOMCTL_VM_EVENT_OP_PAGING:
+    case XEN_VM_EVENT_TYPE_PAGING:
     {
         rc = -EINVAL;
 
@@ -681,7 +681,7 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
     break;
 #endif
 
-    case XEN_DOMCTL_VM_EVENT_OP_MONITOR:
+    case XEN_VM_EVENT_TYPE_MONITOR:
     {
         rc = -EINVAL;
 
@@ -722,7 +722,7 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
     break;
 
 #ifdef CONFIG_HAS_MEM_SHARING
-    case XEN_DOMCTL_VM_EVENT_OP_SHARING:
+    case XEN_VM_EVENT_TYPE_SHARING:
     {
         rc = -EINVAL;
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7e1cf21..26b1a55 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1,8 +1,8 @@
 /******************************************************************************
  * domctl.h
- *
+ * 
  * Domain management operations. For use by node control stack.
- *
+ * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to
  * deal in the Software without restriction, including without limitation the
@@ -769,18 +769,9 @@ struct xen_domctl_gdbsx_domstatus {
  * VM event operations
  */
 
-/* XEN_DOMCTL_vm_event_op */
-
 /*
- * There are currently three rings available for VM events:
- * sharing, monitor and paging. This hypercall allows one to
- * control these rings (enable/disable), as well as to signal
- * to the hypervisor to pull responses (resume) from the given
- * ring.
+ * There are currently three types of rings available for VM events.
  */
-#define XEN_VM_EVENT_ENABLE               0
-#define XEN_VM_EVENT_DISABLE              1
-#define XEN_VM_EVENT_RESUME               2
 
 /*
  * Domain memory paging
@@ -796,7 +787,7 @@ struct xen_domctl_gdbsx_domstatus {
  * EXDEV  - guest has PoD enabled
  * EBUSY  - guest has or had paging enabled, ring buffer still active
  */
-#define XEN_DOMCTL_VM_EVENT_OP_PAGING            1
+#define XEN_VM_EVENT_TYPE_PAGING         1
 
 /*
  * Monitor helper.
@@ -820,7 +811,7 @@ struct xen_domctl_gdbsx_domstatus {
  * EBUSY  - guest has or had access enabled, ring buffer still active
  *
  */
-#define XEN_DOMCTL_VM_EVENT_OP_MONITOR           2
+#define XEN_VM_EVENT_TYPE_MONITOR        2
 
 /*
  * Sharing ENOMEM helper.
@@ -835,15 +826,27 @@ struct xen_domctl_gdbsx_domstatus {
  * Note that shring can be turned on (as per the domctl below)
  * *without* this ring being setup.
  */
-#define XEN_DOMCTL_VM_EVENT_OP_SHARING           3
+#define XEN_VM_EVENT_TYPE_SHARING        3
+
+/*
+ * This hypercall allows one to control the vm_event rings (enable/disable),
+ * as well as to signal to the hypervisor to pull responses (resume) and
+ * retrieve the event channel from the given ring.
+ */
+#define XEN_VM_EVENT_ENABLE               0
+#define XEN_VM_EVENT_DISABLE              1
+#define XEN_VM_EVENT_RESUME               2
 
-/* Use for teardown/setup of helper<->hypervisor interface for paging,
- * access and sharing.*/
+/*
+ * Use for teardown/setup of helper<->hypervisor interface for paging,
+ * access and sharing.
+ */
+/* XEN_DOMCTL_vm_event_op */
 struct xen_domctl_vm_event_op {
-    uint32_t       op;           /* XEN_VM_EVENT_* */
-    uint32_t       mode;         /* XEN_DOMCTL_VM_EVENT_OP_* */
+    uint32_t        op;           /* XEN_VM_EVENT_* */
+    uint32_t        type;         /* XEN_VM_EVENT_TYPE_* */
 
-    uint32_t port;              /* OUT: event channel for ring */
+    uint32_t        port;         /* OUT: event channel for ring */
 };
 
 /*
@@ -997,7 +1000,7 @@ struct xen_domctl_psr_cmt_op {
  * Enable/disable monitoring various VM events.
  * This domctl configures what events will be reported to helper apps
  * via the ring buffer "MONITOR". The ring has to be first enabled
- * with the domctl XEN_DOMCTL_VM_EVENT_OP_MONITOR.
+ * with XEN_VM_EVENT_ENABLE.
  *
  * GET_CAPABILITIES can be used to determine which of these features is
  * available on a given platform.
-- 
2.7.4



* [RFC PATCH 3/6] vm_event: Refactor vm_event_domain implementation
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Stefano Stabellini, Jan Beulich,
	Roger Pau Monné

Decouple the VM Event interface from the ring implementation.
---
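The decoupling follows an ops-table pattern; condensed from the hunks below
(most fields and hooks elided):

    struct vm_event_domain {
        /* hooks supplied by the backing implementation */
        bool (*check)(struct vm_event_domain *ved);
        void (*put_request)(struct vm_event_domain *ved,
                            vm_event_request_t *req);
        /* ... claim_slot / release_slot / get_response / disable ... */

        struct domain *d;   /* the domain associated with the VM event */
        spinlock_t lock;
    };

    /* Generic entry points only dispatch through the hooks... */
    bool vm_event_check(struct vm_event_domain *ved)
    {
        return ved && ved->check(ved);
    }

    /* ...while the ring code embeds the generic struct and fills them in. */
    struct vm_event_domain_ring {
        struct vm_event_domain ved;
        void *ring_page;
        vm_event_front_ring_t front_ring;
        /* ... */
    };
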
 xen/arch/arm/mem_access.c     |   2 +-
 xen/arch/x86/mm/mem_access.c  |   4 +-
 xen/arch/x86/mm/mem_paging.c  |   2 +-
 xen/arch/x86/mm/mem_sharing.c |   5 +-
 xen/arch/x86/mm/p2m.c         |  10 +-
 xen/common/mem_access.c       |   2 +-
 xen/common/monitor.c          |   4 +-
 xen/common/vm_event.c         | 503 ++++++++++++++++++++++++------------------
 xen/drivers/passthrough/pci.c |   2 +-
 xen/include/xen/sched.h       |  25 +--
 xen/include/xen/vm_event.h    |  26 +--
 11 files changed, 312 insertions(+), 273 deletions(-)

diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
index db49372..ba0114a 100644
--- a/xen/arch/arm/mem_access.c
+++ b/xen/arch/arm/mem_access.c
@@ -290,7 +290,7 @@ bool p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec)
     }
 
     /* Otherwise, check if there is a vm_event monitor subscriber */
-    if ( !vm_event_check_ring(v->domain->vm_event_monitor) )
+    if ( !vm_event_check(v->domain->vm_event_monitor) )
     {
         /* No listener */
         if ( p2m->access_required )
diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 56c06a4..57aeda7 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -182,7 +182,7 @@ bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     gfn_unlock(p2m, gfn, 0);
 
     /* Otherwise, check if there is a memory event listener, and send the message along */
-    if ( !vm_event_check_ring(d->vm_event_monitor) || !req_ptr )
+    if ( !vm_event_check(d->vm_event_monitor) || !req_ptr )
     {
         /* No listener */
         if ( p2m->access_required )
@@ -210,7 +210,7 @@ bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
             return true;
         }
     }
-    if ( vm_event_check_ring(d->vm_event_monitor) &&
+    if ( vm_event_check(d->vm_event_monitor) &&
          d->arch.monitor.inguest_pagefault_disabled &&
          npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
     {
diff --git a/xen/arch/x86/mm/mem_paging.c b/xen/arch/x86/mm/mem_paging.c
index 54a94fa..dc2a59a 100644
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -44,7 +44,7 @@ int mem_paging_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_paging_op_t) arg)
         goto out;
 
     rc = -ENODEV;
-    if ( unlikely(!vm_event_check_ring(d->vm_event_paging)) )
+    if ( unlikely(!vm_event_check(d->vm_event_paging)) )
         goto out;
 
     switch( mpo.op )
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5ac9d8f..91e92a7 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -557,8 +557,7 @@ int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
         .u.mem_sharing.p2mt = p2m_ram_shared
     };
 
-    if ( (rc = __vm_event_claim_slot(d, 
-                        d->vm_event_share, allow_sleep)) < 0 )
+    if ( (rc = __vm_event_claim_slot(d->vm_event_share, allow_sleep)) < 0 )
         return rc;
 
     if ( v->domain == d )
@@ -567,7 +566,7 @@ int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
         vm_event_vcpu_pause(v);
     }
 
-    vm_event_put_request(d, d->vm_event_share, &req);
+    vm_event_put_request(d->vm_event_share, &req);
 
     return 0;
 }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index fea4497..3876dda 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1462,7 +1462,7 @@ void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn,
      * correctness of the guest execution at this point.  If this is the only
      * page that happens to be paged-out, we'll be okay..  but it's likely the
      * guest will crash shortly anyways. */
-    int rc = vm_event_claim_slot(d, d->vm_event_paging);
+    int rc = vm_event_claim_slot(d->vm_event_paging);
     if ( rc < 0 )
         return;
 
@@ -1476,7 +1476,7 @@ void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn,
         /* Evict will fail now, tag this request for pager */
         req.u.mem_paging.flags |= MEM_PAGING_EVICT_FAIL;
 
-    vm_event_put_request(d, d->vm_event_paging, &req);
+    vm_event_put_request(d->vm_event_paging, &req);
 }
 
 /**
@@ -1514,7 +1514,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn_l)
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     /* We're paging. There should be a ring */
-    int rc = vm_event_claim_slot(d, d->vm_event_paging);
+    int rc = vm_event_claim_slot(d->vm_event_paging);
     if ( rc == -ENOSYS )
     {
         gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring "
@@ -1555,7 +1555,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn_l)
     {
         /* gfn is already on its way back and vcpu is not paused */
     out_cancel:
-        vm_event_cancel_slot(d, d->vm_event_paging);
+        vm_event_cancel_slot(d->vm_event_paging);
         return;
     }
 
@@ -1563,7 +1563,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn_l)
     req.u.mem_paging.p2mt = p2mt;
     req.vcpu_id = v->vcpu_id;
 
-    vm_event_put_request(d, d->vm_event_paging, &req);
+    vm_event_put_request(d->vm_event_paging, &req);
 }
 
 /**
diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index 010e6f8..51e4e2b 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -52,7 +52,7 @@ int mem_access_memop(unsigned long cmd,
         goto out;
 
     rc = -ENODEV;
-    if ( unlikely(!vm_event_check_ring(d->vm_event_monitor)) )
+    if ( unlikely(!vm_event_check(d->vm_event_monitor)) )
         goto out;
 
     switch ( mao.op )
diff --git a/xen/common/monitor.c b/xen/common/monitor.c
index c606683..fdf7b23 100644
--- a/xen/common/monitor.c
+++ b/xen/common/monitor.c
@@ -93,7 +93,7 @@ int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
     int rc;
     struct domain *d = v->domain;
 
-    rc = vm_event_claim_slot(d, d->vm_event_monitor);
+    rc = vm_event_claim_slot(d->vm_event_monitor);
     switch ( rc )
     {
     case 0:
@@ -124,7 +124,7 @@ int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
     }
 
     vm_event_fill_regs(req);
-    vm_event_put_request(d, d->vm_event_monitor, req);
+    vm_event_put_request(d->vm_event_monitor, req);
 
     return rc;
 }
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index dddc2d4..77da41b 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -35,86 +35,66 @@
 #define xen_rmb()  smp_rmb()
 #define xen_wmb()  smp_wmb()
 
-#define vm_event_ring_lock_init(_ved)  spin_lock_init(&(_ved)->ring_lock)
-#define vm_event_ring_lock(_ved)       spin_lock(&(_ved)->ring_lock)
-#define vm_event_ring_unlock(_ved)     spin_unlock(&(_ved)->ring_lock)
+#define vm_event_lock_init(_ved)  spin_lock_init(&(_ved)->lock)
+#define vm_event_lock(_ved)       spin_lock(&(_ved)->lock)
+#define vm_event_unlock(_ved)     spin_unlock(&(_ved)->lock)
 
-static int vm_event_enable(
-    struct domain *d,
-    struct xen_domctl_vm_event_op *vec,
-    struct vm_event_domain **ved,
-    int pause_flag,
-    int param,
-    xen_event_channel_notification_t notification_fn)
-{
-    int rc;
-    unsigned long ring_gfn = d->arch.hvm.params[param];
-
-    if ( !*ved )
-        *ved = xzalloc(struct vm_event_domain);
-    if ( !*ved )
-        return -ENOMEM;
-
-    /* Only one helper at a time. If the helper crashed,
-     * the ring is in an undefined state and so is the guest.
-     */
-    if ( (*ved)->ring_page )
-        return -EBUSY;;
-
-    /* The parameter defaults to zero, and it should be
-     * set to something */
-    if ( ring_gfn == 0 )
-        return -ENOSYS;
-
-    vm_event_ring_lock_init(*ved);
-    vm_event_ring_lock(*ved);
-
-    rc = vm_event_init_domain(d);
-
-    if ( rc < 0 )
-        goto err;
-
-    rc = prepare_ring_for_helper(d, ring_gfn, &(*ved)->ring_pg_struct,
-                                 &(*ved)->ring_page);
-    if ( rc < 0 )
-        goto err;
-
-    /* Set the number of currently blocked vCPUs to 0. */
-    (*ved)->blocked = 0;
-
-    /* Allocate event channel */
-    rc = alloc_unbound_xen_event_channel(d, 0, current->domain->domain_id,
-                                         notification_fn);
-    if ( rc < 0 )
-        goto err;
-
-    (*ved)->xen_port = vec->port = rc;
-
-    /* Prepare ring buffer */
-    FRONT_RING_INIT(&(*ved)->front_ring,
-                    (vm_event_sring_t *)(*ved)->ring_page,
-                    PAGE_SIZE);
-
-    /* Save the pause flag for this particular ring. */
-    (*ved)->pause_flag = pause_flag;
-
-    /* Initialize the last-chance wait queue. */
-    init_waitqueue_head(&(*ved)->wq);
-
-    vm_event_ring_unlock(*ved);
-    return 0;
+#define to_vm_event_domain_ring(_ved) container_of(_ved, struct vm_event_domain_ring, ved)
 
- err:
-    destroy_ring_for_helper(&(*ved)->ring_page,
-                            (*ved)->ring_pg_struct);
-    vm_event_ring_unlock(*ved);
-    xfree(*ved);
-    *ved = NULL;
+struct vm_event_domain
+{
+    /* VM event ops */
+    bool (*check)(struct vm_event_domain *ved);
+    int (*claim_slot)(struct vm_event_domain *ved, bool allow_sleep);
+    void (*release_slot)(struct vm_event_domain *ved);
+    void (*put_request)(struct vm_event_domain *ved, vm_event_request_t *req);
+    int (*get_response)(struct vm_event_domain *ved, vm_event_response_t *rsp);
+    int (*disable)(struct vm_event_domain **_ved);
+
+    /* The domain associated with the VM event */
+    struct domain *d;
+
+    /* ring lock */
+    spinlock_t lock;
+};
+
+bool vm_event_check(struct vm_event_domain *ved)
+{
+    return (ved && ved->check(ved));
+}
 
-    return rc;
+/* VM event domain ring implementation */
+struct vm_event_domain_ring
+{
+    /* VM event domain */
+    struct vm_event_domain ved;
+    /* The ring has 64 entries */
+    unsigned char foreign_producers;
+    unsigned char target_producers;
+    /* shared ring page */
+    void *ring_page;
+    struct page_info *ring_pg_struct;
+    /* front-end ring */
+    vm_event_front_ring_t front_ring;
+    /* event channel port (vcpu0 only) */
+    int xen_port;
+    /* vm_event bit for vcpu->pause_flags */
+    int pause_flag;
+    /* list of vcpus waiting for room in the ring */
+    struct waitqueue_head wq;
+    /* the number of vCPUs blocked */
+    unsigned int blocked;
+    /* The last vcpu woken up */
+    unsigned int last_vcpu_wake_up;
+};
+
+static bool vm_event_ring_check(struct vm_event_domain *ved)
+{
+    struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
+    return impl->ring_page != NULL;
 }
 
-static unsigned int vm_event_ring_available(struct vm_event_domain *ved)
+static unsigned int vm_event_ring_available(struct vm_event_domain_ring *ved)
 {
     int avail_req = RING_FREE_REQUESTS(&ved->front_ring);
     avail_req -= ved->target_producers;
@@ -126,15 +106,16 @@ static unsigned int vm_event_ring_available(struct vm_event_domain *ved)
 }
 
 /*
- * vm_event_wake_blocked() will wakeup vcpus waiting for room in the
+ * vm_event_ring_wake_blocked() will wakeup vcpus waiting for room in the
  * ring. These vCPUs were paused on their way out after placing an event,
  * but need to be resumed where the ring is capable of processing at least
  * one event from them.
  */
-static void vm_event_wake_blocked(struct domain *d, struct vm_event_domain *ved)
+static void vm_event_ring_wake_blocked(struct vm_event_domain_ring *ved)
 {
     struct vcpu *v;
     unsigned int avail_req = vm_event_ring_available(ved);
+    struct domain *d = ved->ved.d;
 
     if ( avail_req == 0 || ved->blocked == 0 )
         return;
@@ -171,7 +152,7 @@ static void vm_event_wake_blocked(struct domain *d, struct vm_event_domain *ved)
  * was unable to do so, it is queued on a wait queue.  These are woken as
  * needed, and take precedence over the blocked vCPUs.
  */
-static void vm_event_wake_queued(struct domain *d, struct vm_event_domain *ved)
+static void vm_event_ring_wake_queued(struct vm_event_domain_ring *ved)
 {
     unsigned int avail_req = vm_event_ring_available(ved);
 
@@ -180,79 +161,84 @@ static void vm_event_wake_queued(struct domain *d, struct vm_event_domain *ved)
 }
 
 /*
- * vm_event_wake() will wakeup all vcpus waiting for the ring to
+ * vm_event_ring_wake() will wakeup all vcpus waiting for the ring to
  * become available.  If we have queued vCPUs, they get top priority. We
  * are guaranteed that they will go through code paths that will eventually
- * call vm_event_wake() again, ensuring that any blocked vCPUs will get
+ * call vm_event_ring_wake() again, ensuring that any blocked vCPUs will get
  * unpaused once all the queued vCPUs have made it through.
  */
-void vm_event_wake(struct domain *d, struct vm_event_domain *ved)
+static void vm_event_ring_wake(struct vm_event_domain_ring *ved)
 {
     if (!list_empty(&ved->wq.list))
-        vm_event_wake_queued(d, ved);
+        vm_event_ring_wake_queued(ved);
     else
-        vm_event_wake_blocked(d, ved);
+        vm_event_ring_wake_blocked(ved);
 }
 
-static int vm_event_disable(struct domain *d, struct vm_event_domain **ved)
+static int vm_event_disable(struct vm_event_domain **_ved)
 {
-    if ( vm_event_check_ring(*ved) )
-    {
-        struct vcpu *v;
+    return ( vm_event_check(*_ved) ) ? (*_ved)->disable(_ved) : 0;
+}
 
-        vm_event_ring_lock(*ved);
+static int vm_event_ring_disable(struct vm_event_domain **_ved)
+{
+    struct vcpu *v;
+    struct vm_event_domain_ring *ved = to_vm_event_domain_ring(*_ved);
+    struct domain *d = ved->ved.d;
 
-        if ( !list_empty(&(*ved)->wq.list) )
-        {
-            vm_event_ring_unlock(*ved);
-            return -EBUSY;
-        }
+    vm_event_lock(&ved->ved);
+
+    if ( !list_empty(&ved->wq.list) )
+    {
+        vm_event_unlock(&ved->ved);
+        return -EBUSY;
+    }
 
-        /* Free domU's event channel and leave the other one unbound */
-        free_xen_event_channel(d, (*ved)->xen_port);
+    /* Free domU's event channel and leave the other one unbound */
+    free_xen_event_channel(d, ved->xen_port);
 
-        /* Unblock all vCPUs */
-        for_each_vcpu ( d, v )
+    /* Unblock all vCPUs */
+    for_each_vcpu ( d, v )
+    {
+        if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
         {
-            if ( test_and_clear_bit((*ved)->pause_flag, &v->pause_flags) )
-            {
-                vcpu_unpause(v);
-                (*ved)->blocked--;
-            }
+            vcpu_unpause(v);
+            ved->blocked--;
         }
+    }
 
-        destroy_ring_for_helper(&(*ved)->ring_page,
-                                (*ved)->ring_pg_struct);
+    destroy_ring_for_helper(&ved->ring_page,
+                            ved->ring_pg_struct);
 
-        vm_event_cleanup_domain(d);
+    vm_event_cleanup_domain(d);
 
-        vm_event_ring_unlock(*ved);
-    }
+    vm_event_unlock(&ved->ved);
 
-    xfree(*ved);
-    *ved = NULL;
+    XFREE(*_ved);
 
     return 0;
 }
 
-static inline void vm_event_release_slot(struct domain *d,
-                                         struct vm_event_domain *ved)
+static inline void vm_event_ring_release_slot(struct vm_event_domain *ved)
 {
+    struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
+
     /* Update the accounting */
-    if ( current->domain == d )
-        ved->target_producers--;
+    if ( current->domain == ved->d )
+        impl->target_producers--;
     else
-        ved->foreign_producers--;
+        impl->foreign_producers--;
 
     /* Kick any waiters */
-    vm_event_wake(d, ved);
+    vm_event_ring_wake(impl);
 }
 
 /*
- * vm_event_mark_and_pause() tags vcpu and put it to sleep.
- * The vcpu will resume execution in vm_event_wake_blocked().
+ * vm_event_ring_mark_and_pause() tags vcpu and put it to sleep.
+ * The vcpu will resume execution in vm_event_ring_wake_blocked().
  */
-void vm_event_mark_and_pause(struct vcpu *v, struct vm_event_domain *ved)
+static void vm_event_ring_mark_and_pause(struct vcpu *v,
+                                         struct vm_event_domain_ring *ved)
 {
     if ( !test_and_set_bit(ved->pause_flag, &v->pause_flags) )
     {
@@ -261,24 +247,31 @@ void vm_event_mark_and_pause(struct vcpu *v, struct vm_event_domain *ved)
     }
 }
 
+void vm_event_put_request(struct vm_event_domain *ved,
+                          vm_event_request_t *req)
+{
+    if( !vm_event_check(ved))
+        return;
+
+    ved->put_request(ved, req);
+}
+
 /*
  * This must be preceded by a call to claim_slot(), and is guaranteed to
  * succeed.  As a side-effect however, the vCPU may be paused if the ring is
  * overly full and its continued execution would cause stalling and excessive
  * waiting.  The vCPU will be automatically unpaused when the ring clears.
  */
-void vm_event_put_request(struct domain *d,
-                          struct vm_event_domain *ved,
-                          vm_event_request_t *req)
+static void vm_event_ring_put_request(struct vm_event_domain *ved,
+                                      vm_event_request_t *req)
 {
     vm_event_front_ring_t *front_ring;
     int free_req;
     unsigned int avail_req;
     RING_IDX req_prod;
     struct vcpu *curr = current;
-
-    if( !vm_event_check_ring(ved))
-        return;
+    struct domain *d = ved->d;
+    struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
 
     if ( curr->domain != d )
     {
@@ -286,16 +279,16 @@ void vm_event_put_request(struct domain *d,
 #ifndef NDEBUG
         if ( !(req->flags & VM_EVENT_FLAG_VCPU_PAUSED) )
             gdprintk(XENLOG_G_WARNING, "d%dv%d was not paused.\n",
-                     d->domain_id, req->vcpu_id);
+                     ved->d->domain_id, req->vcpu_id);
 #endif
     }
 
     req->version = VM_EVENT_INTERFACE_VERSION;
 
-    vm_event_ring_lock(ved);
+    vm_event_lock(ved);
 
     /* Due to the reservations, this step must succeed. */
-    front_ring = &ved->front_ring;
+    front_ring = &impl->front_ring;
     free_req = RING_FREE_REQUESTS(front_ring);
     ASSERT(free_req > 0);
 
@@ -309,35 +302,36 @@ void vm_event_put_request(struct domain *d,
     RING_PUSH_REQUESTS(front_ring);
 
     /* We've actually *used* our reservation, so release the slot. */
-    vm_event_release_slot(d, ved);
+    vm_event_ring_release_slot(ved);
 
     /* Give this vCPU a black eye if necessary, on the way out.
      * See the comments above wake_blocked() for more information
      * on how this mechanism works to avoid waiting. */
-    avail_req = vm_event_ring_available(ved);
+    avail_req = vm_event_ring_available(impl);
     if( curr->domain == d && avail_req < d->max_vcpus &&
         !atomic_read(&curr->vm_event_pause_count) )
-        vm_event_mark_and_pause(curr, ved);
+        vm_event_ring_mark_and_pause(curr, impl);
 
-    vm_event_ring_unlock(ved);
+    vm_event_unlock(ved);
 
-    notify_via_xen_event_channel(d, ved->xen_port);
+    notify_via_xen_event_channel(d, impl->xen_port);
 }
 
-int vm_event_get_response(struct domain *d, struct vm_event_domain *ved,
-                          vm_event_response_t *rsp)
+static int vm_event_ring_get_response(struct vm_event_domain *ved,
+                                      vm_event_response_t *rsp)
 {
     vm_event_front_ring_t *front_ring;
     RING_IDX rsp_cons;
+    struct vm_event_domain_ring *impl = (struct vm_event_domain_ring *)ved;
 
-    vm_event_ring_lock(ved);
+    vm_event_lock(ved);
 
-    front_ring = &ved->front_ring;
+    front_ring = &impl->front_ring;
     rsp_cons = front_ring->rsp_cons;
 
     if ( !RING_HAS_UNCONSUMED_RESPONSES(front_ring) )
     {
-        vm_event_ring_unlock(ved);
+        vm_event_unlock(ved);
         return 0;
     }
 
@@ -351,9 +345,9 @@ int vm_event_get_response(struct domain *d, struct vm_event_domain *ved,
 
     /* Kick any waiters -- since we've just consumed an event,
      * there may be additional space available in the ring. */
-    vm_event_wake(d, ved);
+    vm_event_ring_wake(impl);
 
-    vm_event_ring_unlock(ved);
+    vm_event_unlock(ved);
 
     return 1;
 }
@@ -366,9 +360,15 @@ int vm_event_get_response(struct domain *d, struct vm_event_domain *ved,
  * Note: responses are handled the same way regardless of which ring they
  * arrive on.
  */
-void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
+static int vm_event_resume(struct vm_event_domain *ved)
 {
     vm_event_response_t rsp;
+    struct domain *d;
+
+    if (! vm_event_check(ved))
+        return -ENODEV;
+
+    d = ved->d;
 
     /*
      * vm_event_resume() runs in either XEN_VM_EVENT_* domctls, or
@@ -381,7 +381,7 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
     ASSERT(d != current->domain);
 
     /* Pull all responses off the ring. */
-    while ( vm_event_get_response(d, ved, &rsp) )
+    while ( ved->get_response(ved, &rsp) )
     {
         struct vcpu *v;
 
@@ -443,31 +443,36 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
                 vm_event_vcpu_unpause(v);
         }
     }
+
+    return 0;
 }
 
-void vm_event_cancel_slot(struct domain *d, struct vm_event_domain *ved)
+void vm_event_cancel_slot(struct vm_event_domain *ved)
 {
-    if( !vm_event_check_ring(ved) )
+    if( !vm_event_check(ved) )
         return;
 
-    vm_event_ring_lock(ved);
-    vm_event_release_slot(d, ved);
-    vm_event_ring_unlock(ved);
+    if (ved->release_slot)
+    {
+        vm_event_lock(ved);
+        ved->release_slot(ved);
+        vm_event_unlock(ved);
+    }
 }
 
-static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign)
+static int vm_event_ring_grab_slot(struct vm_event_domain_ring *ved, int foreign)
 {
     unsigned int avail_req;
 
     if ( !ved->ring_page )
         return -ENOSYS;
 
-    vm_event_ring_lock(ved);
+    vm_event_lock(&ved->ved);
 
     avail_req = vm_event_ring_available(ved);
     if ( avail_req == 0 )
     {
-        vm_event_ring_unlock(ved);
+        vm_event_unlock(&ved->ved);
         return -EBUSY;
     }
 
@@ -476,31 +481,26 @@ static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign)
     else
         ved->foreign_producers++;
 
-    vm_event_ring_unlock(ved);
+    vm_event_unlock(&ved->ved);
 
     return 0;
 }
 
 /* Simple try_grab wrapper for use in the wait_event() macro. */
-static int vm_event_wait_try_grab(struct vm_event_domain *ved, int *rc)
+static int vm_event_ring_wait_try_grab(struct vm_event_domain_ring *ved, int *rc)
 {
-    *rc = vm_event_grab_slot(ved, 0);
+    *rc = vm_event_ring_grab_slot(ved, 0);
     return *rc;
 }
 
-/* Call vm_event_grab_slot() until the ring doesn't exist, or is available. */
-static int vm_event_wait_slot(struct vm_event_domain *ved)
+/* Call vm_event_ring_grab_slot() until the ring doesn't exist, or is available. */
+static int vm_event_ring_wait_slot(struct vm_event_domain_ring *ved)
 {
     int rc = -EBUSY;
-    wait_event(ved->wq, vm_event_wait_try_grab(ved, &rc) != -EBUSY);
+    wait_event(ved->wq, vm_event_ring_wait_try_grab(ved, &rc) != -EBUSY);
     return rc;
 }
 
-bool vm_event_check_ring(struct vm_event_domain *ved)
-{
-    return (ved && ved->ring_page);
-}
-
 /*
  * Determines whether or not the current vCPU belongs to the target domain,
  * and calls the appropriate wait function.  If it is a guest vCPU, then we
@@ -513,46 +513,42 @@ bool vm_event_check_ring(struct vm_event_domain *ved)
  *               0: a spot has been reserved
  *
  */
-int __vm_event_claim_slot(struct domain *d, struct vm_event_domain *ved,
-                          bool allow_sleep)
+static int vm_event_ring_claim_slot(struct vm_event_domain *ved, bool allow_sleep)
+{
+    if ( (current->domain == ved->d) && allow_sleep )
+        return vm_event_ring_wait_slot(to_vm_event_domain_ring(ved));
+    else
+        return vm_event_ring_grab_slot(to_vm_event_domain_ring(ved),
+                                       current->domain != ved->d);
+}
+
+int __vm_event_claim_slot(struct vm_event_domain *ved, bool allow_sleep)
 {
-    if ( !vm_event_check_ring(ved) )
+    if ( !vm_event_check(ved) )
         return -EOPNOTSUPP;
 
-    if ( (current->domain == d) && allow_sleep )
-        return vm_event_wait_slot(ved);
-    else
-        return vm_event_grab_slot(ved, (current->domain != d));
+    return ved->claim_slot(ved, allow_sleep);
 }
 
 #ifdef CONFIG_HAS_MEM_PAGING
 /* Registered with Xen-bound event channel for incoming notifications. */
 static void mem_paging_notification(struct vcpu *v, unsigned int port)
 {
-    struct domain *domain = v->domain;
-
-    if ( likely(vm_event_check_ring(domain->vm_event_paging)) )
-        vm_event_resume(domain, domain->vm_event_paging);
+    vm_event_resume(v->domain->vm_event_paging);
 }
 #endif
 
 /* Registered with Xen-bound event channel for incoming notifications. */
 static void monitor_notification(struct vcpu *v, unsigned int port)
 {
-    struct domain *domain = v->domain;
-
-    if ( likely(vm_event_check_ring(domain->vm_event_monitor)) )
-        vm_event_resume(domain, domain->vm_event_monitor);
+    vm_event_resume(v->domain->vm_event_monitor);
 }
 
 #ifdef CONFIG_HAS_MEM_SHARING
 /* Registered with Xen-bound event channel for incoming notifications. */
 static void mem_sharing_notification(struct vcpu *v, unsigned int port)
 {
-    struct domain *domain = v->domain;
-
-    if ( likely(vm_event_check_ring(domain->vm_event_share)) )
-        vm_event_resume(domain, domain->vm_event_share);
+    vm_event_resume(v->domain->vm_event_share);
 }
 #endif
 
@@ -560,7 +556,7 @@ static void mem_sharing_notification(struct vcpu *v, unsigned int port)
 void vm_event_cleanup(struct domain *d)
 {
 #ifdef CONFIG_HAS_MEM_PAGING
-    if ( vm_event_check_ring(d->vm_event_paging) )
+    if ( vm_event_check(d->vm_event_paging) )
     {
         /* Destroying the wait queue head means waking up all
          * queued vcpus. This will drain the list, allowing
@@ -569,24 +565,109 @@ void vm_event_cleanup(struct domain *d)
          * Finally, because this code path involves previously
          * pausing the domain (domain_kill), unpausing the
          * vcpus causes no harm. */
-        destroy_waitqueue_head(&d->vm_event_paging->wq);
-        (void)vm_event_disable(d, &d->vm_event_paging);
+        destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_paging)->wq);
+        (void)vm_event_disable(&d->vm_event_paging);
     }
 #endif
-    if ( vm_event_check_ring(d->vm_event_monitor) )
+    if ( vm_event_check(d->vm_event_monitor) )
     {
-        destroy_waitqueue_head(&d->vm_event_monitor->wq);
-        (void)vm_event_disable(d, &d->vm_event_monitor);
+        destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_monitor)->wq);
+        (void)vm_event_disable(&d->vm_event_monitor);
     }
 #ifdef CONFIG_HAS_MEM_SHARING
-    if ( vm_event_check_ring(d->vm_event_share) )
+    if ( vm_event_check(d->vm_event_share) )
     {
-        destroy_waitqueue_head(&d->vm_event_share->wq);
-        (void)vm_event_disable(d, &d->vm_event_share);
+        destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_share)->wq);
+        (void)vm_event_disable(&d->vm_event_share);
     }
 #endif
 }
 
+static int vm_event_ring_enable(
+    struct domain *d,
+    struct xen_domctl_vm_event_op *vec,
+    struct vm_event_domain **ved,
+    int pause_flag,
+    int param,
+    xen_event_channel_notification_t notification_fn)
+{
+    int rc;
+    unsigned long ring_gfn = d->arch.hvm.params[param];
+    struct vm_event_domain_ring *impl;
+
+    impl = (*ved) ? (struct vm_event_domain_ring* )(*ved) :
+            xzalloc(struct vm_event_domain_ring);
+
+    if ( !impl )
+        return -ENOMEM;
+
+    impl->ved.d = d;
+    impl->ved.check = vm_event_ring_check;
+    impl->ved.claim_slot = vm_event_ring_claim_slot;
+    impl->ved.release_slot = vm_event_ring_release_slot;
+    impl->ved.put_request = vm_event_ring_put_request;
+    impl->ved.get_response = vm_event_ring_get_response;
+    impl->ved.disable = vm_event_ring_disable;
+
+    /* Only one helper at a time. If the helper crashed,
+     * the ring is in an undefined state and so is the guest.
+     */
+    if ( impl->ring_page )
+        return -EBUSY;
+
+    /* The parameter defaults to zero, and it should be
+     * set to something */
+    if ( ring_gfn == 0 )
+        return -ENOSYS;
+
+    vm_event_lock_init(&impl->ved);
+    vm_event_lock(&impl->ved);
+
+    rc = vm_event_init_domain(d);
+    if ( rc < 0 )
+        goto err;
+
+    rc = prepare_ring_for_helper(d, ring_gfn, &impl->ring_pg_struct,
+                                 &impl->ring_page);
+    if ( rc < 0 )
+        goto err;
+
+    /* Set the number of currently blocked vCPUs to 0. */
+    impl->blocked = 0;
+
+    /* Allocate event channel */
+    rc = alloc_unbound_xen_event_channel(d, 0, current->domain->domain_id,
+                                         notification_fn);
+    if ( rc < 0 )
+        goto err;
+
+    impl->xen_port = vec->port = rc;
+
+    /* Prepare ring buffer */
+    FRONT_RING_INIT(&impl->front_ring,
+                    (vm_event_sring_t *)impl->ring_page,
+                    PAGE_SIZE);
+
+    /* Save the pause flag for this particular ring. */
+    impl->pause_flag = pause_flag;
+
+    /* Initialize the last-chance wait queue. */
+    init_waitqueue_head(&impl->wq);
+
+    vm_event_unlock(&impl->ved);
+
+    *ved = &impl->ved;
+    return 0;
+
+ err:
+    destroy_ring_for_helper(&impl->ring_page,
+                            impl->ring_pg_struct);
+    vm_event_unlock(&impl->ved);
+    XFREE(impl);
+
+    return rc;
+}
+
 int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
                     XEN_GUEST_HANDLE_PARAM(void) u_domctl)
 {
@@ -651,26 +732,23 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
                 break;
 
             /* domain_pause() not required here, see XSA-99 */
-            rc = vm_event_enable(d, vec, &d->vm_event_paging, _VPF_mem_paging,
-                                 HVM_PARAM_PAGING_RING_PFN,
-                                 mem_paging_notification);
+            rc = vm_event_ring_enable(d, vec, &d->vm_event_paging, _VPF_mem_paging,
+                                      HVM_PARAM_PAGING_RING_PFN,
+                                      mem_paging_notification);
         }
         break;
 
         case XEN_VM_EVENT_DISABLE:
-            if ( vm_event_check_ring(d->vm_event_paging) )
+            if ( vm_event_check(d->vm_event_paging) )
             {
                 domain_pause(d);
-                rc = vm_event_disable(d, &d->vm_event_paging);
+                rc = vm_event_disable(&d->vm_event_paging);
                 domain_unpause(d);
             }
             break;
 
         case XEN_VM_EVENT_RESUME:
-            if ( vm_event_check_ring(d->vm_event_paging) )
-                vm_event_resume(d, d->vm_event_paging);
-            else
-                rc = -ENODEV;
+            rc = vm_event_resume(d->vm_event_paging);
             break;
 
         default:
@@ -692,26 +770,23 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
             rc = arch_monitor_init_domain(d);
             if ( rc )
                 break;
-            rc = vm_event_enable(d, vec, &d->vm_event_monitor, _VPF_mem_access,
-                                 HVM_PARAM_MONITOR_RING_PFN,
-                                 monitor_notification);
+            rc = vm_event_ring_enable(d, vec, &d->vm_event_monitor, _VPF_mem_access,
+                                      HVM_PARAM_MONITOR_RING_PFN,
+                                      monitor_notification);
             break;
 
         case XEN_VM_EVENT_DISABLE:
-            if ( vm_event_check_ring(d->vm_event_monitor) )
+            if ( vm_event_check(d->vm_event_monitor) )
             {
                 domain_pause(d);
-                rc = vm_event_disable(d, &d->vm_event_monitor);
+                rc = vm_event_disable(&d->vm_event_monitor);
                 arch_monitor_cleanup_domain(d);
                 domain_unpause(d);
             }
             break;
 
         case XEN_VM_EVENT_RESUME:
-            if ( vm_event_check_ring(d->vm_event_monitor) )
-                vm_event_resume(d, d->vm_event_monitor);
-            else
-                rc = -ENODEV;
+            rc = vm_event_resume(d->vm_event_monitor);
             break;
 
         default:
@@ -740,26 +815,22 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
                 break;
 
             /* domain_pause() not required here, see XSA-99 */
-            rc = vm_event_enable(d, vec, &d->vm_event_share, _VPF_mem_sharing,
-                                 HVM_PARAM_SHARING_RING_PFN,
-                                 mem_sharing_notification);
+            rc = vm_event_ring_enable(d, vec, &d->vm_event_share, _VPF_mem_sharing,
+                                      HVM_PARAM_SHARING_RING_PFN,
+                                      mem_sharing_notification);
             break;
 
         case XEN_VM_EVENT_DISABLE:
-            if ( vm_event_check_ring(d->vm_event_share) )
+            if ( vm_event_check(d->vm_event_share) )
             {
                 domain_pause(d);
-                rc = vm_event_disable(d, &d->vm_event_share);
+                rc = vm_event_disable(&d->vm_event_share);
                 domain_unpause(d);
             }
             break;
 
         case XEN_VM_EVENT_RESUME:
-            if ( vm_event_check_ring(d->vm_event_share) )
-                vm_event_resume(d, d->vm_event_share);
-            else
-                rc = -ENODEV;
-            break;
+            rc = vm_event_resume(d->vm_event_share);
 
         default:
             rc = -ENOSYS;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 1277ce2..a9593e7 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1465,7 +1465,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     /* Prevent device assign if mem paging or mem sharing have been 
      * enabled for this domain */
     if ( unlikely(d->arch.hvm.mem_sharing_enabled ||
-                  vm_event_check_ring(d->vm_event_paging) ||
+                  vm_event_check(d->vm_event_paging) ||
                   p2m_get_hostp2m(d)->global_logdirty) )
         return -EXDEV;
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 0309c1f..d840e03 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -278,30 +278,7 @@ struct vcpu
 #define domain_lock(d) spin_lock_recursive(&(d)->domain_lock)
 #define domain_unlock(d) spin_unlock_recursive(&(d)->domain_lock)
 
-/* VM event */
-struct vm_event_domain
-{
-    /* ring lock */
-    spinlock_t ring_lock;
-    /* The ring has 64 entries */
-    unsigned char foreign_producers;
-    unsigned char target_producers;
-    /* shared ring page */
-    void *ring_page;
-    struct page_info *ring_pg_struct;
-    /* front-end ring */
-    vm_event_front_ring_t front_ring;
-    /* event channel port (vcpu0 only) */
-    int xen_port;
-    /* vm_event bit for vcpu->pause_flags */
-    int pause_flag;
-    /* list of vcpus waiting for room in the ring */
-    struct waitqueue_head wq;
-    /* the number of vCPUs blocked */
-    unsigned int blocked;
-    /* The last vcpu woken up */
-    unsigned int last_vcpu_wake_up;
-};
+struct vm_event_domain;
 
 struct evtchn_port_ops;
 
diff --git a/xen/include/xen/vm_event.h b/xen/include/xen/vm_event.h
index 5302ee5..a5c82d6 100644
--- a/xen/include/xen/vm_event.h
+++ b/xen/include/xen/vm_event.h
@@ -29,8 +29,8 @@
 /* Clean up on domain destruction */
 void vm_event_cleanup(struct domain *d);
 
-/* Returns whether a ring has been set up */
-bool vm_event_check_ring(struct vm_event_domain *ved);
+/* Returns whether the VM event domain has been set up */
+bool vm_event_check(struct vm_event_domain *ved);
 
 /* Returns 0 on success, -ENOSYS if there is no ring, -EBUSY if there is no
  * available space and the caller is a foreign domain. If the guest itself
@@ -45,30 +45,22 @@ bool vm_event_check_ring(struct vm_event_domain *ved);
  * cancel_slot(), both of which are guaranteed to
  * succeed.
  */
-int __vm_event_claim_slot(struct domain *d, struct vm_event_domain *ved,
-                          bool allow_sleep);
-static inline int vm_event_claim_slot(struct domain *d,
-                                      struct vm_event_domain *ved)
+int __vm_event_claim_slot(struct vm_event_domain *ved, bool allow_sleep);
+static inline int vm_event_claim_slot(struct vm_event_domain *ved)
 {
-    return __vm_event_claim_slot(d, ved, true);
+    return __vm_event_claim_slot(ved, true);
 }
 
-static inline int vm_event_claim_slot_nosleep(struct domain *d,
-                                              struct vm_event_domain *ved)
+static inline int vm_event_claim_slot_nosleep(struct vm_event_domain *ved)
 {
-    return __vm_event_claim_slot(d, ved, false);
+    return __vm_event_claim_slot(ved, false);
 }
 
-void vm_event_cancel_slot(struct domain *d, struct vm_event_domain *ved);
+void vm_event_cancel_slot(struct vm_event_domain *ved);
 
-void vm_event_put_request(struct domain *d, struct vm_event_domain *ved,
+void vm_event_put_request(struct vm_event_domain *ved,
                           vm_event_request_t *req);
 
-int vm_event_get_response(struct domain *d, struct vm_event_domain *ved,
-                          vm_event_response_t *rsp);
-
-void vm_event_resume(struct domain *d, struct vm_event_domain *ved);
-
 int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
                     XEN_GUEST_HANDLE_PARAM(void) u_domctl);
 
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-19 18:52 [PATCH RFC 0/6] Slotted channels for sync vm_events Petre Pircalabu
                   ` (2 preceding siblings ...)
  2018-12-19 18:52 ` [RFC PATCH 3/6] vm_event: Refactor vm_event_domain implementation Petre Pircalabu
@ 2018-12-19 18:52 ` Petre Pircalabu
  2018-12-20 12:05   ` Paul Durrant
  2018-12-19 18:52 ` [RFC PATCH 5/6] xen-access: add support for slotted channel vm_events Petre Pircalabu
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

In high-throughput introspection scenarios, where large numbers of monitor
vm_events are generated, the ring buffer can fill up before the monitor
application gets a chance to handle all the requests, thus blocking
other vcpus, which then have to wait for a slot to become available.

This patch adds support for a different mechanism to handle synchronous
vm_event requests / responses. Because each synchronous request pauses the
vcpu until the corresponding response is handled, the request can be stored
in a slotted memory buffer (one slot per vcpu) shared between the hypervisor
and the controlling domain. Asynchronous vm_event requests are still sent
to the controlling domain using a ring buffer, but without blocking the
vcpu, as no response is required.
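
To illustrate the handshake, a minimal sketch of how the controlling
domain's monitor application could consume a synchronous request from its
per-vcpu slot is shown below. It assumes the vm_event_slot layout and
states added to xen/include/public/vm_event.h by this series; decide() is
a hypothetical policy callback and the required write barrier is only
noted in a comment, so this is an illustration rather than part of the
patch.

#include <string.h>
#include <xenctrl.h>
#include <xenevtchn.h>
#include <xen/vm_event.h>

/*
 * Consume one synchronous request from a per-vcpu slot and publish the
 * response.  'slot' points into the mapped sync buffer, 'xce'/'port' are
 * the event channel handle/port bound for that vcpu.  decide() stands in
 * for whatever policy the monitor applies to the request (hypothetical).
 */
static int handle_sync_slot(struct vm_event_slot *slot,
                            xenevtchn_handle *xce, evtchn_port_t port,
                            void (*decide)(const vm_event_request_t *req,
                                           vm_event_response_t *rsp))
{
    vm_event_request_t req;
    vm_event_response_t rsp;

    if ( slot->state != VM_EVENT_SLOT_STATE_SUBMIT )
        return 0;                       /* spurious notification */

    memcpy(&req, &slot->u.req, sizeof(req));

    memset(&rsp, 0, sizeof(rsp));
    rsp.version = VM_EVENT_INTERFACE_VERSION;
    rsp.vcpu_id = req.vcpu_id;
    rsp.reason  = req.reason;
    rsp.flags   = req.flags & VM_EVENT_FLAG_VCPU_PAUSED;

    decide(&req, &rsp);                 /* fill in the policy decision */

    /* Publish the response, then flip the slot so the hypervisor sees it. */
    memcpy(&slot->u.rsp, &rsp, sizeof(rsp));
    /* A write barrier would be required here in production code. */
    slot->state = VM_EVENT_SLOT_STATE_FINISH;

    /* Kick the per-vcpu channel so vm_event_resume() picks up the reply. */
    return xenevtchn_notify(xce, port);
}

On the hypervisor side, vm_event_channel_get_response() then reads the
response and moves the slot back to VM_EVENT_SLOT_STATE_IDLE.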

The memory for the asynchronous ring and the synchronous channels will
be allocated from domheap and mapped to the controlling domain using the
foreignmemory_map_resource interface. Unlike the current implementation,
the allocated pages are not part of the target DomU, so they will not be
reclaimed when the vm_event domain is disabled.
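
For completeness, a rough sketch of how a monitor application might drive
the new setup path is given below. Only xc_monitor_enable_ex() is
introduced by this series; the helper name, the ring size and the error
handling are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <xenforeignmemory.h>

/* Assumed async ring size, in frames; not mandated by the interface. */
#define ASYNC_RING_FRAMES 1

static xenforeignmemory_resource_handle *
setup_slotted_monitor(xc_interface *xch, uint32_t domain_id,
                      unsigned int num_vcpus,
                      void **ring_buffer, uint32_t *ring_port,
                      void **sync_buffer, uint32_t **sync_ports)
{
    xenforeignmemory_resource_handle *fres;

    /* One event channel port per vcpu for the synchronous slots. */
    *sync_ports = calloc(num_vcpus, sizeof(uint32_t));
    if ( !*sync_ports )
        return NULL;

    /*
     * Maps ASYNC_RING_FRAMES pages for the asynchronous ring plus enough
     * pages to hold one vm_event_slot per vcpu, and retrieves the async
     * ring port and the per-vcpu sync ports.
     */
    fres = xc_monitor_enable_ex(xch, domain_id,
                                ring_buffer, ASYNC_RING_FRAMES, ring_port,
                                sync_buffer, *sync_ports, num_vcpus);
    if ( !fres )
    {
        fprintf(stderr, "Failed to enable monitor vm_events\n");
        free(*sync_ports);
        *sync_ports = NULL;
    }

    return fres;
}

The returned handle is kept only so the mapping can later be torn down
with xenforeignmemory_unmap_resource() when the monitor detaches.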

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
---
 tools/libxc/include/xenctrl.h |  11 +
 tools/libxc/xc_monitor.c      |  36 +++
 tools/libxc/xc_private.h      |  14 ++
 tools/libxc/xc_vm_event.c     |  74 +++++-
 xen/arch/x86/mm.c             |   7 +
 xen/common/vm_event.c         | 515 ++++++++++++++++++++++++++++++++++++++----
 xen/include/public/domctl.h   |  25 +-
 xen/include/public/memory.h   |   2 +
 xen/include/public/vm_event.h |  15 ++
 xen/include/xen/vm_event.h    |   4 +
 10 files changed, 660 insertions(+), 43 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index de0b990..fad8bc4 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2012,6 +2012,17 @@ int xc_get_mem_access(xc_interface *xch, uint32_t domain_id,
  * Caller has to unmap this page when done.
  */
 void *xc_monitor_enable(xc_interface *xch, uint32_t domain_id, uint32_t *port);
+
+struct xenforeignmemory_resource_handle *xc_monitor_enable_ex(
+    xc_interface *xch,
+    uint32_t domain_id,
+    void **_ring_buffer,
+    uint32_t ring_frames,
+    uint32_t *ring_port,
+    void **_sync_buffer,
+    uint32_t *sync_ports,
+    uint32_t nr_sync_channels);
+
 int xc_monitor_disable(xc_interface *xch, uint32_t domain_id);
 int xc_monitor_resume(xc_interface *xch, uint32_t domain_id);
 /*
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index 718fe8b..4ceb528 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -49,6 +49,42 @@ void *xc_monitor_enable(xc_interface *xch, uint32_t domain_id, uint32_t *port)
     return buffer;
 }
 
+struct xenforeignmemory_resource_handle *xc_monitor_enable_ex(
+    xc_interface *xch,
+    uint32_t domain_id,
+    void **_ring_buffer,
+    uint32_t ring_frames,
+    uint32_t *ring_port,
+    void **_sync_buffer,
+    uint32_t *sync_ports,
+    uint32_t nr_sync_channels)
+{
+    xenforeignmemory_resource_handle *fres;
+    int saved_errno;
+
+    /* Pause the domain for ring page setup */
+    if ( xc_domain_pause(xch, domain_id) )
+    {
+        PERROR("Unable to pause domain\n");
+        return NULL;
+    }
+
+    fres = xc_vm_event_enable_ex(xch, domain_id, XEN_VM_EVENT_TYPE_MONITOR,
+                                _ring_buffer, ring_frames, ring_port,
+                                _sync_buffer, sync_ports, nr_sync_channels);
+
+    saved_errno = errno;
+    if ( xc_domain_unpause(xch, domain_id) )
+    {
+        if ( fres )
+            saved_errno = errno;
+        PERROR("Unable to unpause domain");
+    }
+
+    errno = saved_errno;
+    return fres;
+}
+
 int xc_monitor_disable(xc_interface *xch, uint32_t domain_id)
 {
     return xc_vm_event_control(xch, domain_id,
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 482451c..1f70223 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -420,6 +420,20 @@ int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned int op,
 void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int type,
                          uint32_t *port);
 
+/*
+ * Enables vm_event for use with the xenforeignmemory_map_resource interface.
+ * The vm_event type can be XEN_VM_EVENT_TYPE_(PAGING/MONITOR/SHARING).
+ *
+ * The function returns:
+ *  - A ring for asynchronous vm_events.
+ *  - A slotted buffer for synchronous vm_events (one slot per vcpu)
+ *  - xenforeignmemory_resource_handle used exclusively for resource cleanup
+ */
+xenforeignmemory_resource_handle *xc_vm_event_enable_ex(xc_interface *xch,
+    uint32_t domain_id, int type,
+    void **_ring_buffer, uint32_t ring_frames, uint32_t *ring_port,
+    void **_sync_buffer, uint32_t *sync_ports, uint32_t nr_sync_channels);
+
 int do_dm_op(xc_interface *xch, uint32_t domid, unsigned int nr_bufs, ...);
 
 #endif /* __XC_PRIVATE_H__ */
diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c
index 4fc2548..0a976b4 100644
--- a/tools/libxc/xc_vm_event.c
+++ b/tools/libxc/xc_vm_event.c
@@ -22,6 +22,12 @@
 
 #include "xc_private.h"
 
+#include <xen/vm_event.h>
+
+#ifndef PFN_UP
+#define PFN_UP(x)     (((x) + PAGE_SIZE-1) >> PAGE_SHIFT)
+#endif /* PFN_UP */
+
 int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned int op,
                         unsigned int type)
 {
@@ -120,7 +126,7 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int type,
         goto out;
     }
 
-    *port = domctl.u.vm_event_op.port;
+    *port = domctl.u.vm_event_op.u.enable.port;
 
     /* Remove the ring_pfn from the guest's physmap */
     rc = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, &ring_pfn);
@@ -138,6 +144,72 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int type,
     return ring_page;
 }
 
+xenforeignmemory_resource_handle *xc_vm_event_enable_ex(xc_interface *xch,
+    uint32_t domain_id, int type,
+    void **_ring_buffer, uint32_t ring_frames, uint32_t *ring_port,
+    void **_sync_buffer, uint32_t *sync_ports, uint32_t nr_sync_channels)
+{
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(sync_ports, nr_sync_channels * sizeof(uint32_t),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    xenforeignmemory_resource_handle *fres;
+    unsigned long nr_frames;
+    void *buffer;
+
+    if ( !_ring_buffer || !ring_port || !_sync_buffer || !sync_ports )
+    {
+        errno = EINVAL;
+        return NULL;
+    }
+
+    nr_frames = ring_frames + PFN_UP(nr_sync_channels * sizeof(struct vm_event_slot));
+
+    fres = xenforeignmemory_map_resource(xch->fmem, domain_id,
+                                         XENMEM_resource_vm_event, type, 0,
+                                         nr_frames, &buffer,
+                                         PROT_READ | PROT_WRITE, 0);
+    if ( !fres )
+    {
+        PERROR("Could not map the vm_event pages\n");
+        return NULL;
+    }
+
+    domctl.cmd = XEN_DOMCTL_vm_event_op;
+    domctl.domain = domain_id;
+    domctl.u.vm_event_op.op = XEN_VM_EVENT_GET_PORTS;
+    domctl.u.vm_event_op.type = type;
+
+    if ( xc_hypercall_bounce_pre(xch, sync_ports) )
+    {
+        PERROR("Could not bounce memory for XEN_DOMCTL_vm_event_op");
+        errno = ENOMEM;
+        return NULL;
+    }
+
+    set_xen_guest_handle(domctl.u.vm_event_op.u.get_ports.sync, sync_ports);
+
+    if ( do_domctl(xch, &domctl) )
+    {
+        PERROR("Failed to get vm_event ports\n");
+        goto out;
+    }
+
+    xc_hypercall_bounce_post(xch, sync_ports);
+    *ring_port = domctl.u.vm_event_op.u.get_ports.async;
+
+    *_sync_buffer = buffer + ring_frames * PAGE_SIZE;
+    *_ring_buffer = buffer;
+
+    return fres;
+
+out:
+    xc_hypercall_bounce_post(xch, sync_ports);
+    if ( fres )
+        xenforeignmemory_unmap_resource(xch->fmem, fres);
+    return NULL;
+}
+
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 1431f34..256c63b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -103,6 +103,7 @@
 #include <xen/efi.h>
 #include <xen/grant_table.h>
 #include <xen/hypercall.h>
+#include <xen/vm_event.h>
 #include <asm/paging.h>
 #include <asm/shadow.h>
 #include <asm/page.h>
@@ -4469,6 +4470,12 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
     }
 #endif
 
+    case XENMEM_resource_vm_event:
+    {
+        rc = vm_event_get_frames(d, id, frame, nr_frames, mfn_list);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 77da41b..a2712a0 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -28,6 +28,8 @@
 #include <asm/p2m.h>
 #include <asm/monitor.h>
 #include <asm/vm_event.h>
+#include <xen/guest_access.h>
+#include <xen/vmap.h>
 #include <xsm/xsm.h>
 
 /* for public/io/ring.h macros */
@@ -40,6 +42,7 @@
 #define vm_event_unlock(_ved)     spin_unlock(&(_ved)->lock)
 
 #define to_vm_event_domain_ring(_ved) container_of(_ved, struct vm_event_domain_ring, ved)
+#define to_vm_event_domain_channel(_ved) container_of(_ved, struct vm_event_domain_channel, ved)
 
 struct vm_event_domain
 {
@@ -48,7 +51,8 @@ struct vm_event_domain
     int (*claim_slot)(struct vm_event_domain *ved, bool allow_sleep);
     void (*release_slot)(struct vm_event_domain *ved);
     void (*put_request)(struct vm_event_domain *ved, vm_event_request_t *req);
-    int (*get_response)(struct vm_event_domain *ved, vm_event_response_t *rsp);
+    int (*get_response)(struct vm_event_domain *ved, struct vcpu *v,
+                        unsigned int port, vm_event_response_t *rsp);
     int (*disable)(struct vm_event_domain **_ved);
 
     /* The domain associated with the VM event */
@@ -58,11 +62,6 @@ struct vm_event_domain
     spinlock_t lock;
 };
 
-bool vm_event_check(struct vm_event_domain *ved)
-{
-    return (ved && ved->check(ved));
-}
-
 /* VM event domain ring implementation */
 struct vm_event_domain_ring
 {
@@ -78,22 +77,57 @@ struct vm_event_domain_ring
     vm_event_front_ring_t front_ring;
     /* event channel port (vcpu0 only) */
     int xen_port;
-    /* vm_event bit for vcpu->pause_flags */
-    int pause_flag;
     /* list of vcpus waiting for room in the ring */
     struct waitqueue_head wq;
     /* the number of vCPUs blocked */
     unsigned int blocked;
+    /* vm_event bit for vcpu->pause_flags */
+    int pause_flag;
     /* The last vcpu woken up */
     unsigned int last_vcpu_wake_up;
 };
 
+struct vm_event_buffer
+{
+    void *va;
+    unsigned int nr_frames;
+    mfn_t mfn[0];
+};
+
+struct vm_event_domain_channel
+{
+    /* VM event domain */
+    struct vm_event_domain ved;
+    /* ring for asynchronous vm events */
+    struct vm_event_buffer *ring;
+    /* front-end ring */
+    vm_event_front_ring_t front_ring;
+    /* per vcpu channels for synchronous vm events */
+    struct vm_event_buffer *channels;
+    /*
+     * event channel ports
+     * - one per vcpu for the synchronous channels.
+     * - one for the asynchronous ring.
+     */
+    uint32_t xen_ports[0];
+};
+
+bool vm_event_check(struct vm_event_domain *ved)
+{
+    return (ved && ved->check(ved));
+}
+
 static bool vm_event_ring_check(struct vm_event_domain *ved)
 {
     struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
     return impl->ring_page != NULL;
 }
 
+static bool is_vm_event_domain_ring(struct vm_event_domain *ved)
+{
+    return ved->check == vm_event_ring_check;
+}
+
 static unsigned int vm_event_ring_available(struct vm_event_domain_ring *ved)
 {
     int avail_req = RING_FREE_REQUESTS(&ved->front_ring);
@@ -317,12 +351,15 @@ static void vm_event_ring_put_request(struct vm_event_domain *ved,
     notify_via_xen_event_channel(d, impl->xen_port);
 }
 
-static int vm_event_ring_get_response(struct vm_event_domain *ved,
-                                      vm_event_response_t *rsp)
+static int vm_event_ring_get_response(
+    struct vm_event_domain *ved,
+    struct vcpu *v,
+    unsigned int port,
+    vm_event_response_t *rsp)
 {
     vm_event_front_ring_t *front_ring;
     RING_IDX rsp_cons;
-    struct vm_event_domain_ring *impl = (struct vm_event_domain_ring *)ved;
+    struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
 
     vm_event_lock(ved);
 
@@ -332,7 +369,7 @@ static int vm_event_ring_get_response(struct vm_event_domain *ved,
     if ( !RING_HAS_UNCONSUMED_RESPONSES(front_ring) )
     {
         vm_event_unlock(ved);
-        return 0;
+        return -1;
     }
 
     /* Copy response */
@@ -353,6 +390,35 @@ static int vm_event_ring_get_response(struct vm_event_domain *ved,
 }
 
 /*
+ * The response is received only from the sync channels
+ */
+static int vm_event_channel_get_response(
+    struct vm_event_domain *ved,
+    struct vcpu *v,
+    unsigned int port,
+    vm_event_response_t *rsp)
+{
+    struct vm_event_domain_channel *impl = to_vm_event_domain_channel(ved);
+    struct vm_event_slot *slot = impl->channels->va + v->vcpu_id * sizeof(struct vm_event_slot);
+
+    vm_event_lock(ved);
+
+    if ( slot->state != VM_EVENT_SLOT_STATE_FINISH )
+    {
+        gdprintk(XENLOG_G_WARNING, "The VM event slot state for d%dv%d is invalid.\n",
+                 ved->d->domain_id, v->vcpu_id);
+        vm_event_unlock(ved);
+        return -1;
+    }
+
+    memcpy(rsp, &slot->u.rsp, sizeof(*rsp));
+    slot->state = VM_EVENT_SLOT_STATE_IDLE;
+
+    vm_event_unlock(ved);
+    return 0;
+}
+
+/*
  * Pull all responses from the given ring and unpause the corresponding vCPU
  * if required. Based on the response type, here we can also call custom
  * handlers.
@@ -360,10 +426,11 @@ static int vm_event_ring_get_response(struct vm_event_domain *ved,
  * Note: responses are handled the same way regardless of which ring they
  * arrive on.
  */
-static int vm_event_resume(struct vm_event_domain *ved)
+static int vm_event_resume(struct vm_event_domain *ved, struct vcpu *v, unsigned int port)
 {
     vm_event_response_t rsp;
     struct domain *d;
+    int rc;
 
     if (! vm_event_check(ved))
         return -ENODEV;
@@ -380,22 +447,25 @@ static int vm_event_resume(struct vm_event_domain *ved)
      */
     ASSERT(d != current->domain);
 
-    /* Pull all responses off the ring. */
-    while ( ved->get_response(ved, &rsp) )
+    /* Loop until all available responses are read. */
+    do
     {
-        struct vcpu *v;
+        struct vcpu *rsp_v;
+        rc = ved->get_response(ved, v, port, &rsp);
+        if ( rc < 0 )
+            break;
 
         if ( rsp.version != VM_EVENT_INTERFACE_VERSION )
         {
             printk(XENLOG_G_WARNING "vm_event interface version mismatch\n");
-            continue;
+            goto end_loop;
         }
 
         /* Validate the vcpu_id in the response. */
         if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
-            continue;
+            goto end_loop;
 
-        v = d->vcpu[rsp.vcpu_id];
+        rsp_v = d->vcpu[rsp.vcpu_id];
 
         /*
          * In some cases the response type needs extra handling, so here
@@ -403,7 +473,7 @@ static int vm_event_resume(struct vm_event_domain *ved)
          */
 
         /* Check flags which apply only when the vCPU is paused */
-        if ( atomic_read(&v->vm_event_pause_count) )
+        if ( atomic_read(&rsp_v->vm_event_pause_count) )
         {
 #ifdef CONFIG_HAS_MEM_PAGING
             if ( rsp.reason == VM_EVENT_REASON_MEM_PAGING )
@@ -415,34 +485,36 @@ static int vm_event_resume(struct vm_event_domain *ved)
              * has to set arch-specific flags when supported, and to avoid
              * bitmask overhead when it isn't supported.
              */
-            vm_event_emulate_check(v, &rsp);
+            vm_event_emulate_check(rsp_v, &rsp);
 
             /*
              * Check in arch-specific handler to avoid bitmask overhead when
              * not supported.
              */
-            vm_event_register_write_resume(v, &rsp);
+            vm_event_register_write_resume(rsp_v, &rsp);
 
             /*
              * Check in arch-specific handler to avoid bitmask overhead when
              * not supported.
              */
-            vm_event_toggle_singlestep(d, v, &rsp);
+            vm_event_toggle_singlestep(d, rsp_v, &rsp);
 
             /* Check for altp2m switch */
             if ( rsp.flags & VM_EVENT_FLAG_ALTERNATE_P2M )
-                p2m_altp2m_check(v, rsp.altp2m_idx);
+                p2m_altp2m_check(rsp_v, rsp.altp2m_idx);
 
             if ( rsp.flags & VM_EVENT_FLAG_SET_REGISTERS )
-                vm_event_set_registers(v, &rsp);
+                vm_event_set_registers(rsp_v, &rsp);
 
             if ( rsp.flags & VM_EVENT_FLAG_GET_NEXT_INTERRUPT )
-                vm_event_monitor_next_interrupt(v);
+                vm_event_monitor_next_interrupt(rsp_v);
 
             if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
-                vm_event_vcpu_unpause(v);
+                vm_event_vcpu_unpause(rsp_v);
         }
+end_loop: ;
     }
+    while ( rc > 0 );
 
     return 0;
 }
@@ -527,28 +599,28 @@ int __vm_event_claim_slot(struct vm_event_domain *ved, bool allow_sleep)
     if ( !vm_event_check(ved) )
         return -EOPNOTSUPP;
 
-    return ved->claim_slot(ved, allow_sleep);
+    return (ved->claim_slot) ? ved->claim_slot(ved, allow_sleep) : 0;
 }
 
 #ifdef CONFIG_HAS_MEM_PAGING
 /* Registered with Xen-bound event channel for incoming notifications. */
 static void mem_paging_notification(struct vcpu *v, unsigned int port)
 {
-    vm_event_resume(v->domain->vm_event_paging);
+    vm_event_resume(v->domain->vm_event_paging, v, port);
 }
 #endif
 
 /* Registered with Xen-bound event channel for incoming notifications. */
 static void monitor_notification(struct vcpu *v, unsigned int port)
 {
-    vm_event_resume(v->domain->vm_event_monitor);
+    vm_event_resume(v->domain->vm_event_monitor, v, port);
 }
 
 #ifdef CONFIG_HAS_MEM_SHARING
 /* Registered with Xen-bound event channel for incoming notifications. */
 static void mem_sharing_notification(struct vcpu *v, unsigned int port)
 {
-    vm_event_resume(v->domain->vm_event_share);
+    vm_event_resume(v->domain->vm_event_share, v, port);
 }
 #endif
 
@@ -565,19 +637,24 @@ void vm_event_cleanup(struct domain *d)
          * Finally, because this code path involves previously
          * pausing the domain (domain_kill), unpausing the
          * vcpus causes no harm. */
-        destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_paging)->wq);
+        if ( is_vm_event_domain_ring(d->vm_event_paging) )
+            destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_paging)->wq);
         (void)vm_event_disable(&d->vm_event_paging);
     }
 #endif
+
     if ( vm_event_check(d->vm_event_monitor) )
     {
-        destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_monitor)->wq);
+        if ( is_vm_event_domain_ring(d->vm_event_monitor) )
+            destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_monitor)->wq);
         (void)vm_event_disable(&d->vm_event_monitor);
     }
+
 #ifdef CONFIG_HAS_MEM_SHARING
     if ( vm_event_check(d->vm_event_share) )
     {
-        destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_share)->wq);
+        if ( is_vm_event_domain_ring(d->vm_event_share) )
+            destroy_waitqueue_head(&to_vm_event_domain_ring(d->vm_event_share)->wq);
         (void)vm_event_disable(&d->vm_event_share);
     }
 #endif
@@ -641,7 +718,7 @@ static int vm_event_ring_enable(
     if ( rc < 0 )
         goto err;
 
-    impl->xen_port = vec->port = rc;
+    impl->xen_port = vec->u.enable.port = rc;
 
     /* Prepare ring buffer */
     FRONT_RING_INIT(&impl->front_ring,
@@ -668,6 +745,294 @@ static int vm_event_ring_enable(
     return rc;
 }
 
+/*
+ * Helper functions for allocating / freeing vm_event buffers
+ */
+static int vm_event_alloc_buffer(struct domain *d, unsigned int nr_frames,
+                                 struct vm_event_buffer **_veb)
+{
+    struct vm_event_buffer *veb;
+    int i = 0, rc;
+
+    veb = _xzalloc(sizeof(struct vm_event_buffer) + nr_frames * sizeof(mfn_t),
+                   __alignof__(struct vm_event_buffer));
+    if ( unlikely(!veb) )
+    {
+        rc = -ENOMEM;
+        goto err;
+    }
+
+    veb->nr_frames = nr_frames;
+
+    for ( i = 0; i < nr_frames; i++ )
+    {
+        struct page_info *page = alloc_domheap_page(d, 0);
+
+        if ( !page )
+        {
+            rc = -ENOMEM;
+            goto err;
+        }
+
+        if ( !get_page_and_type(page, d, PGT_writable_page) )
+        {
+            domain_crash(d);
+            rc = -ENODATA;
+            goto err;
+        }
+
+        veb->mfn[i] = page_to_mfn(page);
+    }
+
+    veb->va = vmap(veb->mfn, nr_frames);
+    if ( !veb->va )
+    {
+        rc = -ENOMEM;
+        goto err;
+    }
+
+    for( i = 0; i < nr_frames; i++ )
+        clear_page(veb->va + i * PAGE_SIZE);
+
+    *_veb = veb;
+    return 0;
+
+err:
+    while ( --i >= 0 )
+    {
+        struct page_info *page = mfn_to_page(veb->mfn[i]);
+
+        if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
+            put_page(page);
+        put_page_and_type(page);
+    }
+
+    xfree(veb);
+    return rc;
+}
+
+static void vm_event_free_buffer(struct vm_event_buffer **_veb)
+{
+    struct vm_event_buffer *veb = *_veb;
+
+    if ( !veb )
+        return;
+
+    if ( veb->va )
+    {
+        int i;
+
+        vunmap(veb->va);
+        for ( i = 0; i < veb->nr_frames; i++ )
+        {
+            struct page_info *page = mfn_to_page(veb->mfn[i]);
+
+            if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
+                put_page(page);
+            put_page_and_type(page);
+        }
+    }
+    XFREE(*_veb);
+}
+
+static bool vm_event_channel_check(struct vm_event_domain *ved)
+{
+    struct vm_event_domain_channel *impl = to_vm_event_domain_channel(ved);
+    return impl->ring->va != NULL && impl->channels->va != NULL;
+}
+
+static void vm_event_channel_put_request(struct vm_event_domain *ved,
+                                         vm_event_request_t *req)
+{
+    struct vcpu *curr = current;
+    struct vm_event_domain_channel *impl = to_vm_event_domain_channel(ved);
+    struct domain *d;
+    struct vm_event_slot *slot;
+    bool sync;
+
+    if ( !vm_event_check(ved) )
+        return;
+
+    d = ved->d;
+    slot = impl->channels->va + req->vcpu_id * sizeof(struct vm_event_slot);
+
+    if ( curr->domain != d )
+    {
+        req->flags |= VM_EVENT_FLAG_FOREIGN;
+#ifndef NDEBUG
+        if ( !(req->flags & VM_EVENT_FLAG_VCPU_PAUSED) )
+            gdprintk(XENLOG_G_WARNING, "d%dv%d was not paused.\n",
+                     d->domain_id, req->vcpu_id);
+#endif
+    }
+
+    req->version = VM_EVENT_INTERFACE_VERSION;
+
+    sync = req->flags & VM_EVENT_FLAG_VCPU_PAUSED;
+
+    vm_event_lock(ved);
+
+    if ( sync )
+    {
+        if ( slot->state != VM_EVENT_SLOT_STATE_IDLE )
+        {
+            gdprintk(XENLOG_G_WARNING, "The VM event slot for d%dv%d is not IDLE.\n",
+                     d->domain_id, req->vcpu_id);
+            vm_event_unlock(ved);
+            return;
+        }
+        memcpy( &slot->u.req, req, sizeof(*req) );
+        slot->state = VM_EVENT_SLOT_STATE_SUBMIT;
+    }
+    else
+    {
+        vm_event_front_ring_t *front_ring;
+        RING_IDX req_prod;
+
+        /* Due to the reservations, this step must succeed. */
+        front_ring = &impl->front_ring;
+
+        /* Copy request */
+        req_prod = front_ring->req_prod_pvt;
+        memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req));
+        req_prod++;
+
+        /* Update ring */
+        front_ring->req_prod_pvt = req_prod;
+        RING_PUSH_REQUESTS(front_ring);
+    }
+
+    vm_event_unlock(ved);
+
+    notify_via_xen_event_channel(d, impl->xen_ports[(sync) ? req->vcpu_id : d->max_vcpus]);
+}
+
+static int vm_event_channel_disable(struct vm_event_domain **_ved)
+{
+    struct vm_event_domain_channel *ved = to_vm_event_domain_channel(*_ved);
+    struct domain *d = ved->ved.d;
+    struct vcpu *v;
+    int i;
+
+    vm_event_lock(&ved->ved);
+
+    for_each_vcpu ( d, v )
+    {
+        if ( atomic_read(&v->vm_event_pause_count) )
+            vm_event_vcpu_unpause(v);
+        /*
+        if ( test_and_clear_bit(ved->ved.pause_flag, &v->pause_flags) )
+        {
+            vcpu_unpause(v);
+        }
+        */
+    }
+
+    /* Free domU's event channels and leave the other one unbound */
+    for ( i = 0; i < d->max_vcpus; i++ )
+        evtchn_close(d, ved->xen_ports[i], 0);
+    evtchn_close(d, ved->xen_ports[d->max_vcpus], 0);
+
+    vm_event_free_buffer(&ved->ring);
+    vm_event_free_buffer(&ved->channels);
+
+    vm_event_cleanup_domain(d);
+
+    vm_event_unlock(&ved->ved);
+
+    XFREE(*_ved);
+
+    return 0;
+}
+
+static int vm_event_channel_enable(
+    struct domain *d,
+    struct vm_event_domain **_ved,
+    unsigned int nr_frames,
+    xen_event_channel_notification_t notification_fn)
+{
+    int i = 0, rc;
+    struct vm_event_domain_channel *impl;
+    unsigned int nr_ring_frames, nr_channel_frames;
+
+    if ( *_ved )
+        return -EBUSY;
+
+    if ( nr_frames <= PFN_UP(d->max_vcpus * sizeof(struct vm_event_slot)) )
+        return -EINVAL;
+
+    impl = _xzalloc(sizeof(struct vm_event_domain_channel) +
+                        ( d->max_vcpus + 1 ) * sizeof(uint32_t),
+                    __alignof__(struct vm_event_domain_channel));
+    if ( !impl )
+        return -ENOMEM;
+
+    impl->ved.d = d;
+    impl->ved.check = vm_event_channel_check;
+    impl->ved.claim_slot = NULL;
+    impl->ved.release_slot = NULL;
+    impl->ved.put_request = vm_event_channel_put_request;
+    impl->ved.get_response = vm_event_channel_get_response;
+    impl->ved.disable = vm_event_channel_disable;
+
+    nr_channel_frames = PFN_UP(d->max_vcpus * sizeof(vm_event_request_t));
+    nr_ring_frames = nr_frames - nr_channel_frames;
+
+    vm_event_lock_init(&impl->ved);
+    vm_event_lock(&impl->ved);
+
+    rc = vm_event_init_domain(d);
+    if ( rc < 0 )
+        goto err;
+
+    rc = vm_event_alloc_buffer(d, nr_ring_frames, &impl->ring);
+    if ( rc )
+        goto err;
+
+    /* Allocate event channel for the async ring */
+    rc = alloc_unbound_xen_event_channel(d, 0, current->domain->domain_id,
+                                         notification_fn);
+    if ( rc < 0 )
+        goto err;
+
+    impl->xen_ports[d->max_vcpus] = rc;
+
+    /* Prepare ring buffer */
+    FRONT_RING_INIT(&impl->front_ring,
+                    (vm_event_sring_t *)impl->ring->va,
+                    impl->ring->nr_frames * PAGE_SIZE);
+
+    rc = vm_event_alloc_buffer(d, nr_channel_frames, &impl->channels);
+    if ( rc != 0)
+        goto err;
+
+    for ( i = 0; i < d->max_vcpus; i++)
+    {
+        rc = alloc_unbound_xen_event_channel(d, i, current->domain->domain_id,
+                                             notification_fn);
+        if ( rc < 0 )
+            goto err;
+
+        impl->xen_ports[i] = rc;
+    }
+
+    *_ved = &impl->ved;
+
+    vm_event_unlock(&impl->ved);
+    return 0;
+
+err:
+    while (i--)
+        evtchn_close(d, impl->xen_ports[i], 0);
+    evtchn_close(d, impl->xen_ports[d->max_vcpus], 0);
+    vm_event_free_buffer(&impl->ring);
+    vm_event_free_buffer(&impl->channels);
+    vm_event_cleanup_domain(d);
+    vm_event_unlock(&impl->ved);
+    xfree(impl);
+    return rc;
+}
+
 int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
                     XEN_GUEST_HANDLE_PARAM(void) u_domctl)
 {
@@ -748,7 +1113,9 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
             break;
 
         case XEN_VM_EVENT_RESUME:
-            rc = vm_event_resume(d->vm_event_paging);
+            if ( vm_event_check(d->vm_event_paging) &&
+                 is_vm_event_domain_ring(d->vm_event_paging) )
+                rc = vm_event_resume(d->vm_event_paging, NULL, 0);
             break;
 
         default:
@@ -786,7 +1153,30 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
             break;
 
         case XEN_VM_EVENT_RESUME:
-            rc = vm_event_resume(d->vm_event_monitor);
+            if ( vm_event_check(d->vm_event_monitor) &&
+                 is_vm_event_domain_ring(d->vm_event_monitor) )
+                rc = vm_event_resume(d->vm_event_monitor, NULL, 0);
+            break;
+
+        case XEN_VM_EVENT_GET_PORTS:
+            if ( !vm_event_check(d->vm_event_monitor) )
+                break;
+
+            if ( !is_vm_event_domain_ring(d->vm_event_monitor) )
+            {
+                struct vm_event_domain_channel *impl = to_vm_event_domain_channel(d->vm_event_monitor);
+
+                if ( copy_to_guest(vec->u.get_ports.sync,
+                                   impl->xen_ports,
+                                   d->max_vcpus) != 0 )
+                {
+                    rc = -EFAULT;
+                    break;
+                }
+
+                vec->u.get_ports.async = impl->xen_ports[d->max_vcpus];
+                rc = 0;
+            }
             break;
 
         default:
@@ -830,7 +1220,10 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
             break;
 
         case XEN_VM_EVENT_RESUME:
-            rc = vm_event_resume(d->vm_event_share);
+            if ( vm_event_check(d->vm_event_share) &&
+                 is_vm_event_domain_ring(d->vm_event_share) )
+                rc = vm_event_resume(d->vm_event_share, NULL, 0);
+            break;
 
         default:
             rc = -ENOSYS;
@@ -847,6 +1240,52 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
     return rc;
 }
 
+int vm_event_get_frames(struct domain *d, unsigned int id,
+                        unsigned long frame, unsigned int nr_frames,
+                        xen_pfn_t mfn_list[])
+{
+    int rc = 0, i, j;
+    struct vm_event_domain **_ved;
+    struct vm_event_domain_channel *impl;
+    xen_event_channel_notification_t fn;
+
+    switch ( id )
+    {
+    case XEN_VM_EVENT_TYPE_MONITOR:
+        /* domain_pause() not required here, see XSA-99 */
+        rc = arch_monitor_init_domain(d);
+        if ( rc )
+            return rc;
+        _ved = &d->vm_event_monitor;
+        fn = monitor_notification;
+        break;
+
+    default:
+        return -ENOSYS;
+    }
+
+    rc = vm_event_channel_enable(d, _ved, nr_frames, fn);
+    if ( rc )
+    {
+        switch ( id )
+        {
+            case XEN_VM_EVENT_TYPE_MONITOR:
+                arch_monitor_cleanup_domain(d);
+                break;
+        }
+        return rc;
+    }
+
+    impl = to_vm_event_domain_channel(*_ved);
+    j = 0;
+    for ( i = 0; i < impl->ring->nr_frames; i++ )
+        mfn_list[j++] = mfn_x(impl->ring->mfn[i]);
+    for ( i = 0; i < impl->channels->nr_frames; i++ )
+        mfn_list[j++] = mfn_x(impl->channels->mfn[i]);
+
+    return rc;
+}
+
 void vm_event_vcpu_pause(struct vcpu *v)
 {
     ASSERT(v == current);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 26b1a55..78262a1 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -38,7 +38,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000011
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -836,6 +836,7 @@ struct xen_domctl_gdbsx_domstatus {
 #define XEN_VM_EVENT_ENABLE               0
 #define XEN_VM_EVENT_DISABLE              1
 #define XEN_VM_EVENT_RESUME               2
+#define XEN_VM_EVENT_GET_PORTS            3
 
 /*
  * Use for teardown/setup of helper<->hypervisor interface for paging,
@@ -843,10 +844,26 @@ struct xen_domctl_gdbsx_domstatus {
  */
 /* XEN_DOMCTL_vm_event_op */
 struct xen_domctl_vm_event_op {
-    uint32_t        op;           /* XEN_VM_EVENT_* */
-    uint32_t        type;         /* XEN_VM_EVENT_TYPE_* */
+    /* IN: Xen vm_event opcode (XEN_VM_EVENT_*) */
+    uint32_t            op;
+    /* IN: Xen vm event ring type (XEN_VM_EVENT_TYPE_*) */
+    uint32_t            type;
 
-    uint32_t        port;         /* OUT: event channel for ring */
+    union {
+        struct {
+            /* OUT: remote port for event channel ring */
+            uint32_t    port;
+        } enable;
+        struct {
+            /* OUT: remote port for the async event channel ring */
+            uint32_t    async;
+            /*
+             * OUT: remote ports for the sync event vm_event channels
+             * The number of ports will be equal to the vcpu count.
+             */
+            XEN_GUEST_HANDLE_64(uint32) sync;
+        } get_ports;
+    } u;
 };
 
 /*
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 8638023..cfd280d 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -612,6 +612,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vm_event 2
 
     /*
      * IN - a type-specific resource identifier, which must be zero
@@ -619,6 +620,7 @@ struct xen_mem_acquire_resource {
      *
      * type == XENMEM_resource_ioreq_server -> id == ioreq server id
      * type == XENMEM_resource_grant_table -> id defined below
+     * type == XENMEM_resource_vm_event -> id == vm_event type
      */
     uint32_t id;
 
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index b2bafc0..499fbbc 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -388,6 +388,21 @@ typedef struct vm_event_st {
 
 DEFINE_RING_TYPES(vm_event, vm_event_request_t, vm_event_response_t);
 
+struct vm_event_slot
+{
+    uint32_t state;
+    union {
+        vm_event_request_t req;
+        vm_event_response_t rsp;
+    } u;
+};
+
+enum vm_event_slot_state {
+    VM_EVENT_SLOT_STATE_IDLE,   /* no contents */
+    VM_EVENT_SLOT_STATE_SUBMIT, /* request ready */
+    VM_EVENT_SLOT_STATE_FINISH, /* response ready */
+};
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 #endif /* _XEN_PUBLIC_VM_EVENT_H */
 
diff --git a/xen/include/xen/vm_event.h b/xen/include/xen/vm_event.h
index a5c82d6..d4bd184 100644
--- a/xen/include/xen/vm_event.h
+++ b/xen/include/xen/vm_event.h
@@ -64,6 +64,10 @@ void vm_event_put_request(struct vm_event_domain *ved,
 int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
                     XEN_GUEST_HANDLE_PARAM(void) u_domctl);
 
+int vm_event_get_frames(struct domain *d, unsigned int id,
+                        unsigned long frame, unsigned int nr_frames,
+                        xen_pfn_t mfn_list[]);
+
 void vm_event_vcpu_pause(struct vcpu *v);
 void vm_event_vcpu_unpause(struct vcpu *v);
 
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH 5/6] xen-access: add support for slotted channel vm_events
  2018-12-19 18:52 [PATCH RFC 0/6] Slotted channels for sync vm_events Petre Pircalabu
                   ` (3 preceding siblings ...)
  2018-12-19 18:52 ` [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests Petre Pircalabu
@ 2018-12-19 18:52 ` Petre Pircalabu
  2018-12-19 18:52 ` [RFC PATCH 6/6] xc_version: add vm_event interface version Petre Pircalabu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Wei Liu, Tamas K Lengyel, Ian Jackson, Razvan Cojocaru

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
---
 tools/tests/xen-access/xen-access.c | 545 +++++++++++++++++++++++++++++-------
 1 file changed, 441 insertions(+), 104 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c
index 6aaee16..b09be6c 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -62,13 +62,33 @@
 /* From xen/include/asm-x86/x86-defns.h */
 #define X86_CR4_PGE        0x00000080 /* enable global pages */
 
-typedef struct vm_event {
-    domid_t domain_id;
+#ifndef round_pgup
+#define round_pgup(p)    (((p) + (XC_PAGE_SIZE - 1)) & XC_PAGE_MASK)
+#endif /* round_pgup */
+
+struct vm_event_ring
+{
     xenevtchn_handle *xce_handle;
     int port;
     vm_event_back_ring_t back_ring;
     uint32_t evtchn_port;
-    void *ring_page;
+    void *buffer;
+    unsigned int page_count;
+};
+
+struct vm_event_channel
+{
+    xenevtchn_handle **xce_handles;
+    int *ports;
+    uint32_t *evtchn_ports;
+    void *buffer;
+};
+
+typedef struct vm_event {
+    domid_t domain_id;
+    unsigned int num_vcpus;
+    struct vm_event_ring *ring;
+    struct vm_event_channel *channel;
 } vm_event_t;
 
 typedef struct xenaccess {
@@ -79,6 +99,7 @@ typedef struct xenaccess {
     vm_event_t vm_event;
 } xenaccess_t;
 
+
 static int interrupted;
 bool evtchn_bind = 0, evtchn_open = 0, mem_access_enable = 0;
 
@@ -87,45 +108,224 @@ static void close_handler(int sig)
     interrupted = sig;
 }
 
-int xc_wait_for_event_or_timeout(xc_interface *xch, xenevtchn_handle *xce, unsigned long ms)
+static int vcpu_id_by_port(vm_event_t *vm_event, int port)
 {
-    struct pollfd fd = { .fd = xenevtchn_fd(xce), .events = POLLIN | POLLERR };
-    int port;
-    int rc;
+    int i;
+
+    if ( port == vm_event->ring->port )
+        return 0;
+
+    if ( vm_event->channel )
+        for ( i = 0; i < vm_event->num_vcpus; i++ )
+            if ( vm_event->channel->ports[i] == port )
+                return i;
+
+    return -1;
+}
+
+static int xenaccess_wait_for_events(xenaccess_t *xenaccess,
+                                     int **_ports,
+                                     unsigned long ms)
+{
+    struct pollfd *fds;
+    vm_event_t *vm_event;
+    int rc, fd_count = 0, i = 0, found = 0;
+    int *ports = NULL;
+
+    vm_event = &xenaccess->vm_event;
 
-    rc = poll(&fd, 1, ms);
-    if ( rc == -1 )
+    fd_count = ((vm_event->channel) ? vm_event->num_vcpus : 0) + 1;
+
+    fds = calloc(fd_count, sizeof(struct pollfd));
+
+    if ( vm_event->channel )
     {
-        if (errno == EINTR)
-            return 0;
+        for (i = 0; i < vm_event->num_vcpus; i++ )
+        {
+            fds[i].fd = xenevtchn_fd(vm_event->channel->xce_handles[i]);
+            fds[i].events = POLLIN | POLLERR;
+            fds[i].revents = 0;
+        }
+    }
 
-        ERROR("Poll exited with an error");
-        goto err;
+    fds[i].fd = xenevtchn_fd(vm_event->ring->xce_handle);
+    fds[i].events = POLLIN | POLLERR;
+    fds[i].revents = 0;
+
+    rc = poll(fds, fd_count, ms);
+    if ( rc == -1 || rc == 0 )
+    {
+        if ( errno == EINTR )
+            rc = 0;
+        goto cleanup;
     }
 
-    if ( rc == 1 )
+    ports = malloc(rc * sizeof(int));
+
+    for ( i = 0; i < fd_count ; i++ )
     {
-        port = xenevtchn_pending(xce);
-        if ( port == -1 )
+        if ( fds[i].revents & POLLIN )
         {
-            ERROR("Failed to read port from event channel");
-            goto err;
+            bool ring_event = i == (fd_count-1);
+            xenevtchn_handle *xce = ( ring_event ) ? vm_event->ring->xce_handle :
+                                                     vm_event->channel->xce_handles[i];
+            int port = xenevtchn_pending(xce);
+
+            if ( port == -1 )
+            {
+                ERROR("Failed to read port from event channel");
+                rc = -1;
+                goto cleanup;
+            }
+
+            if ( ring_event )
+            {
+                if ( RING_HAS_UNCONSUMED_REQUESTS(&vm_event->ring->back_ring) )
+                    ports[found++] = port;
+
+                if ( xenevtchn_unmask(xce, port) )
+                {
+                    ERROR("Failed to unmask event channel port");
+                    rc = -1;
+                    goto cleanup;
+                }
+            }
+            else
+            {
+                int vcpu_id = vcpu_id_by_port(vm_event, port);
+                struct vm_event_slot *slot;
+
+                if ( vcpu_id < 0 )
+                {
+                    ERROR("Failed to get the vm_event_slot for port %d\n", port);
+                    rc = -1;
+                    goto cleanup;
+                }
+                slot = &((struct vm_event_slot *)vm_event->channel->buffer)[vcpu_id];
+
+                if ( slot->state == VM_EVENT_SLOT_STATE_SUBMIT )
+                    ports[found++] = port;
+                /* Unmask the port's event channel in case of a spurious interrupt */
+                else if ( xenevtchn_unmask(xce, port) )
+                {
+                    ERROR("Failed to unmask event channel port");
+                    rc = -1;
+                    goto cleanup;
+                }
+            }
         }
+    }
+    rc = found;
+    *_ports = ports;
 
-        rc = xenevtchn_unmask(xce, port);
-        if ( rc != 0 )
+cleanup:
+    free(fds);
+    return rc;
+}
+
+static int xenaccess_evtchn_bind_port(uint32_t evtchn_port,
+                                      domid_t domain_id,
+                                      xenevtchn_handle **_handle,
+                                      int *_port)
+{
+    xenevtchn_handle *handle;
+    int rc;
+
+    if ( !_handle || !_port )
+        return -EINVAL;
+
+    /* Open event channel */
+    handle = xenevtchn_open(NULL, 0);
+    if ( handle == NULL )
+    {
+        ERROR("Failed to open event channel\n");
+        return -ENODEV;
+    }
+
+    /* Bind event notification */
+    rc = xenevtchn_bind_interdomain(handle, domain_id, evtchn_port);
+    if ( rc < 0 )
+    {
+        ERROR("Failed to bind event channel\n");
+        xenevtchn_close(handle);
+        return rc;
+    }
+
+    *_handle = handle;
+    *_port = rc;
+    return 0;
+}
+
+static void xenaccess_evtchn_unbind_port(uint32_t evtchn_port,
+                                         xenevtchn_handle **_handle,
+                                         int *_port)
+{
+    if ( !_handle || !*_handle || !_port )
+        return;
+
+    xenevtchn_unbind(*_handle, *_port);
+    xenevtchn_close(*_handle);
+    *_handle = NULL;
+    *_port = 0;
+}
+
+static int xenaccess_evtchn_bind(xenaccess_t *xenaccess)
+{
+    int rc, i = 0;
+
+    rc = xenaccess_evtchn_bind_port(xenaccess->vm_event.ring->evtchn_port,
+                                    xenaccess->vm_event.domain_id,
+                                    &xenaccess->vm_event.ring->xce_handle,
+                                    &xenaccess->vm_event.ring->port);
+    if ( rc < 0 )
+    {
+        ERROR("Failed to bind ring events\n");
+        return rc;
+    }
+
+    if ( xenaccess->vm_event.channel == NULL)
+        return 0;
+
+    for ( i = 0; i < xenaccess->vm_event.num_vcpus; i++ )
+    {
+        rc = xenaccess_evtchn_bind_port(xenaccess->vm_event.channel->evtchn_ports[i],
+                                        xenaccess->vm_event.domain_id,
+                                        &xenaccess->vm_event.channel->xce_handles[i],
+                                        &xenaccess->vm_event.channel->ports[i]);
+        if ( rc < 0 )
         {
-            ERROR("Failed to unmask event channel port");
+            ERROR("Failed to bind channel events\n");
             goto err;
         }
     }
-    else
-        port = -1;
 
-    return port;
+    evtchn_bind = true;
+    return 0;
 
- err:
-    return -errno;
+err:
+    xenaccess_evtchn_unbind_port(xenaccess->vm_event.ring->evtchn_port,
+                                 &xenaccess->vm_event.ring->xce_handle,
+                                 &xenaccess->vm_event.ring->port);
+
+    for ( i--; i >= 0; i-- )
+        xenaccess_evtchn_unbind_port(xenaccess->vm_event.channel->evtchn_ports[i],
+                                     &xenaccess->vm_event.channel->xce_handles[i],
+                                     &xenaccess->vm_event.channel->ports[i]);
+    return rc;
+}
+
+static void xenaccess_evtchn_unbind(xenaccess_t *xenaccess)
+{
+    int i;
+
+    xenaccess_evtchn_unbind_port(xenaccess->vm_event.ring->evtchn_port,
+                                 &xenaccess->vm_event.ring->xce_handle,
+                                 &xenaccess->vm_event.ring->port);
+
+    if ( xenaccess->vm_event.channel )
+        for ( i = 0; i < xenaccess->vm_event.num_vcpus; i++ )
+            xenaccess_evtchn_unbind_port(xenaccess->vm_event.channel->evtchn_ports[i],
+                                         &xenaccess->vm_event.channel->xce_handles[i],
+                                         &xenaccess->vm_event.channel->ports[i]);
 }
 
 int xenaccess_teardown(xc_interface *xch, xenaccess_t *xenaccess)
@@ -136,8 +336,13 @@ int xenaccess_teardown(xc_interface *xch, xenaccess_t *xenaccess)
         return 0;
 
     /* Tear down domain xenaccess in Xen */
-    if ( xenaccess->vm_event.ring_page )
-        munmap(xenaccess->vm_event.ring_page, XC_PAGE_SIZE);
+    if ( xenaccess->vm_event.ring && xenaccess->vm_event.ring->buffer )
+        munmap(xenaccess->vm_event.ring->buffer,
+               xenaccess->vm_event.ring->page_count * XC_PAGE_SIZE);
+
+    if ( xenaccess->vm_event.channel && xenaccess->vm_event.channel->buffer )
+        munmap(xenaccess->vm_event.channel->buffer,
+               round_pgup(xenaccess->vm_event.num_vcpus * sizeof(struct vm_event_slot)));
 
     if ( mem_access_enable )
     {
@@ -153,24 +358,8 @@ int xenaccess_teardown(xc_interface *xch, xenaccess_t *xenaccess)
     /* Unbind VIRQ */
     if ( evtchn_bind )
     {
-        rc = xenevtchn_unbind(xenaccess->vm_event.xce_handle,
-                              xenaccess->vm_event.port);
-        if ( rc != 0 )
-        {
-            ERROR("Error unbinding event port");
-            return rc;
-        }
-    }
-
-    /* Close event channel */
-    if ( evtchn_open )
-    {
-        rc = xenevtchn_close(xenaccess->vm_event.xce_handle);
-        if ( rc != 0 )
-        {
-            ERROR("Error closing event channel");
-            return rc;
-        }
+        xenaccess_evtchn_unbind(xenaccess);
+        evtchn_bind = false;
     }
 
     /* Close connection to Xen */
@@ -182,6 +371,16 @@ int xenaccess_teardown(xc_interface *xch, xenaccess_t *xenaccess)
     }
     xenaccess->xc_handle = NULL;
 
+    if (xenaccess->vm_event.channel)
+    {
+        free(xenaccess->vm_event.channel->evtchn_ports);
+        free(xenaccess->vm_event.channel->ports);
+        free(xenaccess->vm_event.channel->xce_handles);
+        free(xenaccess->vm_event.channel);
+    }
+
+    free(xenaccess->vm_event.ring);
+
     free(xenaccess);
 
     return 0;
@@ -191,6 +390,8 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t domain_id)
 {
     xenaccess_t *xenaccess = 0;
     xc_interface *xch;
+    xc_dominfo_t info;
+    void *handle;
     int rc;
 
     xch = xc_interface_open(NULL, NULL, 0);
@@ -201,8 +402,7 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t domain_id)
     *xch_r = xch;
 
     /* Allocate memory */
-    xenaccess = malloc(sizeof(xenaccess_t));
-    memset(xenaccess, 0, sizeof(xenaccess_t));
+    xenaccess = calloc(1, sizeof(xenaccess_t));
 
     /* Open connection to xen */
     xenaccess->xc_handle = xch;
@@ -210,12 +410,50 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t domain_id)
     /* Set domain id */
     xenaccess->vm_event.domain_id = domain_id;
 
-    /* Enable mem_access */
-    xenaccess->vm_event.ring_page =
-            xc_monitor_enable(xenaccess->xc_handle,
-                              xenaccess->vm_event.domain_id,
-                              &xenaccess->vm_event.evtchn_port);
-    if ( xenaccess->vm_event.ring_page == NULL )
+    rc = xc_domain_getinfo(xch, domain_id, 1, &info);
+    if ( rc != 1 )
+    {
+        ERROR("xc_domain_getinfo failed. rc = %d\n", rc);
+        goto err;
+    }
+
+    xenaccess->vm_event.num_vcpus = info.max_vcpu_id + 1;
+
+    xenaccess->vm_event.ring = calloc(1, sizeof(struct vm_event_ring));
+    xenaccess->vm_event.ring->page_count = 1;
+
+    xenaccess->vm_event.channel = calloc(1, sizeof(struct vm_event_channel));
+    xenaccess->vm_event.channel->xce_handles = calloc(xenaccess->vm_event.num_vcpus,
+                                                      sizeof(xenevtchn_handle*));
+    xenaccess->vm_event.channel->ports = calloc(xenaccess->vm_event.num_vcpus,
+                                                sizeof(int));
+    xenaccess->vm_event.channel->evtchn_ports = calloc(xenaccess->vm_event.num_vcpus,
+                                                       sizeof(uint32_t));
+
+    handle = xc_monitor_enable_ex(xenaccess->xc_handle,
+                                  xenaccess->vm_event.domain_id,
+                                  &xenaccess->vm_event.ring->buffer,
+                                  xenaccess->vm_event.ring->page_count,
+                                  &xenaccess->vm_event.ring->evtchn_port,
+                                  &xenaccess->vm_event.channel->buffer,
+                                  xenaccess->vm_event.channel->evtchn_ports,
+                                  xenaccess->vm_event.num_vcpus);
+
+    if ( handle == NULL && errno == EOPNOTSUPP )
+    {
+        free(xenaccess->vm_event.channel->xce_handles);
+        free(xenaccess->vm_event.channel->ports);
+        free(xenaccess->vm_event.channel->evtchn_ports);
+        free(xenaccess->vm_event.channel);
+        xenaccess->vm_event.channel = NULL;
+
+        handle = xc_monitor_enable(xenaccess->xc_handle,
+                                   xenaccess->vm_event.domain_id,
+                                   &xenaccess->vm_event.ring->evtchn_port);
+        xenaccess->vm_event.ring->buffer = handle;
+    }
+
+    if ( handle == NULL )
     {
         switch ( errno ) {
             case EBUSY:
@@ -230,40 +468,25 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t domain_id)
         }
         goto err;
     }
-    mem_access_enable = 1;
 
-    /* Open event channel */
-    xenaccess->vm_event.xce_handle = xenevtchn_open(NULL, 0);
-    if ( xenaccess->vm_event.xce_handle == NULL )
-    {
-        ERROR("Failed to open event channel");
-        goto err;
-    }
-    evtchn_open = 1;
+    /* Enable mem_access */
+    mem_access_enable = 1;
 
-    /* Bind event notification */
-    rc = xenevtchn_bind_interdomain(xenaccess->vm_event.xce_handle,
-                                    xenaccess->vm_event.domain_id,
-                                    xenaccess->vm_event.evtchn_port);
+    rc = xenaccess_evtchn_bind(xenaccess);
     if ( rc < 0 )
-    {
-        ERROR("Failed to bind event channel");
         goto err;
-    }
-    evtchn_bind = 1;
-    xenaccess->vm_event.port = rc;
 
-    /* Initialise ring */
-    SHARED_RING_INIT((vm_event_sring_t *)xenaccess->vm_event.ring_page);
-    BACK_RING_INIT(&xenaccess->vm_event.back_ring,
-                   (vm_event_sring_t *)xenaccess->vm_event.ring_page,
-                   XC_PAGE_SIZE);
+    evtchn_bind = true;
 
+    /* Initialise ring */
+    SHARED_RING_INIT((vm_event_sring_t *)xenaccess->vm_event.ring->buffer);
+    BACK_RING_INIT(&xenaccess->vm_event.ring->back_ring,
+                   (vm_event_sring_t *)xenaccess->vm_event.ring->buffer,
+                   XC_PAGE_SIZE * xenaccess->vm_event.ring->page_count);
     /* Get max_gpfn */
     rc = xc_domain_maximum_gpfn(xenaccess->xc_handle,
                                 xenaccess->vm_event.domain_id,
                                 &xenaccess->max_gpfn);
-
     if ( rc )
     {
         ERROR("Failed to get max gpfn");
@@ -275,11 +498,8 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t domain_id)
     return xenaccess;
 
  err:
-    rc = xenaccess_teardown(xch, xenaccess);
-    if ( rc )
-    {
+    if ( xenaccess_teardown(xch, xenaccess) )
         ERROR("Failed to teardown xenaccess structure!\n");
-    }
 
  err_iface:
     return NULL;
@@ -301,12 +521,10 @@ int control_singlestep(
 /*
  * Note that this function is not thread safe.
  */
-static void get_request(vm_event_t *vm_event, vm_event_request_t *req)
+static void get_ring_request(vm_event_back_ring_t *back_ring, vm_event_request_t *req)
 {
-    vm_event_back_ring_t *back_ring;
     RING_IDX req_cons;
 
-    back_ring = &vm_event->back_ring;
     req_cons = back_ring->req_cons;
 
     /* Copy request */
@@ -318,6 +536,62 @@ static void get_request(vm_event_t *vm_event, vm_event_request_t *req)
     back_ring->sring->req_event = req_cons + 1;
 }
 
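+/*
+ * Fetch the next pending vm_event request.
+ * 'ports' is the array filled in by xenaccess_wait_for_events(), '*next' is
+ * the index of the next entry to service (decremented as entries are
+ * consumed) and '*index' returns the entry the current request was taken
+ * from.  Spurious or failed entries are marked with -1 so the caller can
+ * skip them.  Returns false once all entries have been handled.
+ */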
+static bool get_request(vm_event_t *vm_event, vm_event_request_t *req,
+                        int *ports, int *index, int *next)
+{
+    int port;
+
+    *index = *next;
+
+    if ( *index < 0 )
+        return false;
+
+    port = ports[*index];
+
+    if ( port == vm_event->ring->port )
+    {
+        /* SKIP spurious event */
+        if ( !RING_HAS_UNCONSUMED_REQUESTS(&vm_event->ring->back_ring) )
+        {
+            ports[*index] = -1;
+            (*next)--;
+            return true;
+        }
+
+        get_ring_request(&vm_event->ring->back_ring, req);
+
+        /*
+         * The vm_event ring still contains unconsumed requests.
+         * The "pending" port index is not decremented, so the subsequent
+         * call will return the next pending vm_event request from the ring.
+         */
+        if ( RING_HAS_UNCONSUMED_REQUESTS(&vm_event->ring->back_ring) )
+            return true;
+    }
+    else if (vm_event->channel)
+    {
+        int vcpu_id = vcpu_id_by_port(vm_event, port);
+
+        if (vcpu_id < 0)
+            ports[*index] = -1;
+        else
+        {
+            struct vm_event_slot *slot = &((struct vm_event_slot *)vm_event->channel->buffer)[vcpu_id];
+
+            if ( slot->state != VM_EVENT_SLOT_STATE_SUBMIT )
+                ports[*index] = -1;
+            else
+            {
+                memcpy(req, &slot->u.req, sizeof(*req));
+                if (xenevtchn_unmask(vm_event->channel->xce_handles[vcpu_id], port) )
+                    ports[*index] = -1;
+            }
+        }
+    }
+    (*next)--;
+    return true;
+}
+
 /*
  * X86 control register names
  */
@@ -339,12 +613,10 @@ static const char* get_x86_ctrl_reg_name(uint32_t index)
 /*
  * Note that this function is not thread safe.
  */
-static void put_response(vm_event_t *vm_event, vm_event_response_t *rsp)
+static void put_ring_response(vm_event_back_ring_t *back_ring, vm_event_response_t *rsp)
 {
-    vm_event_back_ring_t *back_ring;
     RING_IDX rsp_prod;
 
-    back_ring = &vm_event->back_ring;
     rsp_prod = back_ring->rsp_prod_pvt;
 
     /* Copy response */
@@ -356,6 +628,59 @@ static void put_response(vm_event_t *vm_event, vm_event_response_t *rsp)
     RING_PUSH_RESPONSES(back_ring);
 }
 
+static void put_response(vm_event_t *vm_event, vm_event_response_t *rsp, int port)
+{
+    if ( port == vm_event->ring->port )
+    {
+        /* Drop ring responses if the synchronous slotted channel is enabled */
+        if ( vm_event->channel )
+        {
+            ERROR("Cannot put response on async ring\n");
+            return;
+        }
+
+        put_ring_response(&vm_event->ring->back_ring, rsp);
+    }
+    else
+    {
+        int vcpu_id = vcpu_id_by_port(vm_event, port);
+        struct vm_event_slot *slot;
+
+        if ( vcpu_id < 0 )
+            return;
+
+        slot = &((struct vm_event_slot *)vm_event->channel->buffer)[vcpu_id];
+        memcpy(&slot->u.rsp, rsp, sizeof(*rsp));
+        slot->state = VM_EVENT_SLOT_STATE_FINISH;
+    }
+}
+
+static int xenaccess_notify(vm_event_t *vm_event, int port)
+{
+    xenevtchn_handle *xce;
+
+    if ( port == vm_event->ring->port )
+        xce = vm_event->ring->xce_handle;
+    else
+    {
+        int vcpu_id = vcpu_id_by_port(vm_event, port);
+
+        if ( vcpu_id < 0 )
+            return -1;
+
+        xce = vm_event->channel->xce_handles[vcpu_id];
+    }
+
+    /* Tell Xen page is ready */
+    if ( xenevtchn_notify(xce, port) != 0 )
+    {
+        ERROR("Error resuming page");
+        return -1;
+    }
+
+    return 0;
+}
+
 void usage(char* progname)
 {
     fprintf(stderr, "Usage: %s [-m] <domain_id> write|exec", progname);
@@ -663,6 +988,8 @@ int main(int argc, char *argv[])
     /* Wait for access */
     for (;;)
     {
+        int *ports = NULL, current_index, next_index, req_count;
+
         if ( interrupted )
         {
             /* Unregister for every event */
@@ -697,25 +1024,35 @@ int main(int argc, char *argv[])
             shutting_down = 1;
         }
 
-        rc = xc_wait_for_event_or_timeout(xch, xenaccess->vm_event.xce_handle, 100);
-        if ( rc < -1 )
+        rc = xenaccess_wait_for_events(xenaccess, &ports, 100);
+        if ( rc < 0 )
         {
-            ERROR("Error getting event");
             interrupted = -1;
             continue;
         }
-        else if ( rc != -1 )
+        else if ( rc == 0 )
+        {
+            if ( ! shutting_down )
+                continue;
+        }
+        else
         {
             DPRINTF("Got event from Xen\n");
         }
 
-        while ( RING_HAS_UNCONSUMED_REQUESTS(&xenaccess->vm_event.back_ring) )
+        req_count = rc;
+        next_index = req_count - 1;
+        current_index = next_index;
+
+        while ( get_request(&xenaccess->vm_event, &req, ports, &current_index, &next_index) )
         {
-            get_request(&xenaccess->vm_event, &req);
+            if ( ports[current_index] < 0)
+                continue;
 
             if ( req.version != VM_EVENT_INTERFACE_VERSION )
             {
                 ERROR("Error: vm_event interface version mismatch!\n");
+                ports[current_index] = -1;
                 interrupted = -1;
                 continue;
             }
@@ -896,21 +1233,21 @@ int main(int argc, char *argv[])
             }
 
             /* Put the response on the ring */
-            put_response(&xenaccess->vm_event, &rsp);
-        }
+            if (req.flags & VM_EVENT_FLAG_VCPU_PAUSED)
+                put_response(&xenaccess->vm_event, &rsp, ports[current_index]);
 
-        /* Tell Xen page is ready */
-        rc = xenevtchn_notify(xenaccess->vm_event.xce_handle,
-                              xenaccess->vm_event.port);
-
-        if ( rc != 0 )
-        {
-            ERROR("Error resuming page");
-            interrupted = -1;
+            if ( current_index != next_index )
+                if ( xenaccess_notify(&xenaccess->vm_event, ports[current_index]) )
+                    interrupted = -1;
         }
 
+        free(ports);
+
         if ( shutting_down )
+        {
+            DPRINTF("Shutting down xenaccess\n");
             break;
+        }
     }
     DPRINTF("xenaccess shut down on signal %d\n", interrupted);
 
-- 
2.7.4



* [RFC PATCH 6/6] xc_version: add vm_event interface version
  2018-12-19 18:52 [PATCH RFC 0/6] Slotted channels for sync vm_events Petre Pircalabu
                   ` (4 preceding siblings ...)
  2018-12-19 18:52 ` [RFC PATCH 5/6] xen-access: add support for slotted channel vm_events Petre Pircalabu
@ 2018-12-19 18:52 ` Petre Pircalabu
  2019-01-08 16:27   ` Jan Beulich
  2018-12-19 22:33 ` [PATCH RFC 0/6] Slotted channels for sync vm_events Tamas K Lengyel
  2019-02-06 14:26 ` Petre Ovidiu PIRCALABU
  7 siblings, 1 reply; 50+ messages in thread
From: Petre Pircalabu @ 2018-12-19 18:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
---
 tools/libxc/xc_private.c     | 3 +++
 xen/common/kernel.c          | 3 +++
 xen/include/public/version.h | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 90974d5..9b983e0 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -497,6 +497,9 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
             HYPERCALL_BOUNCE_SET_DIR(arg, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
             break;
         }
+    case XENVER_vm_event_version:
+        sz = 0;
+        break;
     default:
         ERROR("xc_version: unknown command %d\n", cmd);
         return -EINVAL;
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 5766a0f..667552c 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -516,6 +516,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         return sz;
     }
+
+    case XENVER_vm_event_version:
+        return VM_EVENT_INTERFACE_VERSION;
     }
 
     return -ENOSYS;
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 7063e8c..b962386 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -103,6 +103,9 @@ struct xen_build_id {
 };
 typedef struct xen_build_id xen_build_id_t;
 
+/* arg == NULL; returns the vm_event interface version */
+#define XENVER_vm_event_version 11
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
-- 
2.7.4
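
For illustration, this is roughly how a monitor application could use the
new subop at startup to make sure it was built against the hypervisor's
vm_event interface version (check_vm_event_version is just a sketch, not
part of the series):

    #include <stdio.h>
    #include <xenctrl.h>
    #include <xen/vm_event.h>

    static int check_vm_event_version(xc_interface *xch)
    {
        /* arg == NULL: the return value is the interface version itself. */
        int ver = xc_version(xch, XENVER_vm_event_version, NULL);

        if ( ver < 0 )
            return -1;    /* older Xen: subop not implemented */

        if ( ver != VM_EVENT_INTERFACE_VERSION )
        {
            fprintf(stderr, "vm_event version mismatch: Xen %d, tool %d\n",
                    ver, VM_EVENT_INTERFACE_VERSION);
            return -1;
        }

        return 0;
    }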



* Re: [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
  2018-12-19 18:52 ` [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type Petre Pircalabu
@ 2018-12-19 22:13   ` Tamas K Lengyel
  2019-01-02 11:11   ` Wei Liu
  2019-01-08 16:25   ` Jan Beulich
  2 siblings, 0 replies; 50+ messages in thread
From: Tamas K Lengyel @ 2018-12-19 22:13 UTC (permalink / raw)
  To: Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Xen-devel

On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
<ppircalabu@bitdefender.com> wrote:
>
> Define the type for each of the supported vm_event rings (paging,
> monitor and sharing) and replace the ring param field with this type.
>
> Replace XEN_DOMCTL_VM_EVENT_OP_ occurrences with their corresponding
> XEN_VM_EVENT_TYPE_ counterpart.
>

This patch looks fine to me as-is, mostly just mechanical renaming/cleanup.

Tamas


* Re: [RFC PATCH 3/6] vm_event: Refactor vm_event_domain implementation
  2018-12-19 18:52 ` [RFC PATCH 3/6] vm_event: Refactor vm_event_domain implementation Petre Pircalabu
@ 2018-12-19 22:26   ` Tamas K Lengyel
  2018-12-20 12:39     ` Petre Ovidiu PIRCALABU
  0 siblings, 1 reply; 50+ messages in thread
From: Tamas K Lengyel @ 2018-12-19 22:26 UTC (permalink / raw)
  To: Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
<ppircalabu@bitdefender.com> wrote:
>
> Decouple the VM Event interface from the ring implementation.

This will need a much better description. There is also a lot of
mostly mechanical churn in this patch, which makes reviewing it hard.
Perhaps the functional changes and the mechanical changes could be
split into two patches?

> +struct vm_event_domain
> +{
> +    /* VM event ops */
> +    bool (*check)(struct vm_event_domain *ved);
> +    int (*claim_slot)(struct vm_event_domain *ved, bool allow_sleep);
> +    void (*release_slot)(struct vm_event_domain *ved);
> +    void (*put_request)(struct vm_event_domain *ved, vm_event_request_t *req);
> +    int (*get_response)(struct vm_event_domain *ved, vm_event_response_t *rsp);
> +    int (*disable)(struct vm_event_domain **_ved);

I don't see (yet) the reason why having these pointers stored in the
struct is needed. Are there going to be different implementations for
these? If so, need to explain that in the commit message.

> +
> +    /* The domain associated with the VM event */
> +    struct domain *d;
> +
> +    /* ring lock */
> +    spinlock_t lock;
> +};
> +
> +bool vm_event_check(struct vm_event_domain *ved)
> +{
> +    return (ved && ved->check(ved));
> +}
>
> -    return rc;
> +/* VM event domain ring implementation */
> +struct vm_event_domain_ring
> +{
> +    /* VM event domain */
> +    struct vm_event_domain ved;

Why is this not a pointer instead? Does each vm_event_domain_ring
really need a separate copy of vm_event_domain?

> +    /* The ring has 64 entries */
> +    unsigned char foreign_producers;
> +    unsigned char target_producers;
> +    /* shared ring page */
> +    void *ring_page;
> +    struct page_info *ring_pg_struct;
> +    /* front-end ring */
> +    vm_event_front_ring_t front_ring;
> +    /* event channel port (vcpu0 only) */
> +    int xen_port;
> +    /* vm_event bit for vcpu->pause_flags */
> +    int pause_flag;
> +    /* list of vcpus waiting for room in the ring */
> +    struct waitqueue_head wq;
> +    /* the number of vCPUs blocked */
> +    unsigned int blocked;
> +    /* The last vcpu woken up */
> +    unsigned int last_vcpu_wake_up;
> +};
> +


* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2018-12-19 18:52 [PATCH RFC 0/6] Slotted channels for sync vm_events Petre Pircalabu
                   ` (5 preceding siblings ...)
  2018-12-19 18:52 ` [RFC PATCH 6/6] xc_version: add vm_event interface version Petre Pircalabu
@ 2018-12-19 22:33 ` Tamas K Lengyel
  2018-12-19 23:30   ` Andrew Cooper
  2018-12-20 10:48   ` Petre Ovidiu PIRCALABU
  2019-02-06 14:26 ` Petre Ovidiu PIRCALABU
  7 siblings, 2 replies; 50+ messages in thread
From: Tamas K Lengyel @ 2018-12-19 22:33 UTC (permalink / raw)
  To: Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
<ppircalabu@bitdefender.com> wrote:
>
> This patchset is a rework of the "multi-page ring buffer" for vm_events
> patch based on Andrew Cooper's comments.
> For synchronous vm_events the ring waitqueue logic was unnecessary as the
> vcpu sending the request was blocked until a response was received.
> To simplify the request/response mechanism, an array of slotted channels
> was created, one per vcpu. Each vcpu puts the request in the
> corresponding slot and blocks until the response is received.
>
> I'm sending this patch as a RFC because, while I'm still working on way to
> measure the overall performance improvement, your feedback would be a great
> assistance.

Generally speaking this approach is OK, but I'm concerned that we will
eventually run into the same problem that brought up the idea of using
multi-page rings: vm_event structures that are larger than a page.
Right now this series adds a ring for each vCPU, which does mitigate
some of the bottleneck, but it does not really address the root cause.
It also adds significant complexity as the userspace side now has to
map in multiple rings, each with its own event channel and polling
requirements.

Tamas


* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2018-12-19 22:33 ` [PATCH RFC 0/6] Slotted channels for sync vm_events Tamas K Lengyel
@ 2018-12-19 23:30   ` Andrew Cooper
  2018-12-20 10:48   ` Petre Ovidiu PIRCALABU
  1 sibling, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2018-12-19 23:30 UTC (permalink / raw)
  To: Tamas K Lengyel, Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Ian Jackson,
	Julien Grall, Jan Beulich, Xen-devel, Roger Pau Monné

On 19/12/2018 22:33, Tamas K Lengyel wrote:
> On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
> <ppircalabu@bitdefender.com> wrote:
>> This patchset is a rework of the "multi-page ring buffer" for vm_events
>> patch based on Andrew Cooper's comments.
>> For synchronous vm_events the ring waitqueue logic was unnecessary as the
>> vcpu sending the request was blocked until a response was received.
>> To simplify the request/response mechanism, an array of slotted channels
>> was created, one per vcpu. Each vcpu puts the request in the
>> corresponding slot and blocks until the response is received.
>>
>> I'm sending this patch as a RFC because, while I'm still working on way to
>> measure the overall performance improvement, your feedback would be a great
>> assistance.
> Generally speaking this approach is OK, but I'm concerned that we will
> eventually run into the same problem that brought up the idea of using
> multi-page rings: vm_event structures that are larger then a page.
> Right now this series adds a ring for each vCPU, which does mitigate
> some of the bottleneck, but it does not really address the root cause.
> It also adds significant complexity as the userspace side now has to
> map in multiple rings, each with its own event channel and polling
> requirements.

I haven't looked at the series in detail yet, but there should
explicitly be no issue if/when sizeof(vm_event) exceeds 4k.  In practice
there are many reasons why letting it get that large will be a problem.

The size of the sync "ring" (slotted mapping?) is exactly
sizeof(vm_event) * d->max_vcpus (which is a function of the interface
version), and should be mapped as a single contiguous block of pages.
The resource foreign map interface was designed with this case in mind.
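
(For illustration, a minimal sketch of how a consumer could size and map
such a contiguous block; XENMEM_resource_vm_event, XEN_VM_EVENT_TYPE_MONITOR
and struct vm_event_slot are this series' proposed additions, and
map_sync_slots is only a placeholder name:)

    #include <sys/mman.h>
    #include <xenctrl.h>
    #include <xenforeignmemory.h>
    #include <xen/vm_event.h>

    /* Map the domain's per-vcpu slot array; error handling elided. */
    static void *map_sync_slots(xenforeignmemory_handle *fmem, uint32_t domid,
                                unsigned int max_vcpus,
                                xenforeignmemory_resource_handle **fres)
    {
        void *buf = NULL;
        unsigned long bytes = max_vcpus * sizeof(struct vm_event_slot);
        unsigned long nr_frames = (bytes + XC_PAGE_SIZE - 1) / XC_PAGE_SIZE;

        *fres = xenforeignmemory_map_resource(fmem, domid,
                                              XENMEM_resource_vm_event,
                                              XEN_VM_EVENT_TYPE_MONITOR,
                                              0 /* frame */, nr_frames, &buf,
                                              PROT_READ | PROT_WRITE, 0);
        return *fres ? buf : NULL;
    }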

The async ring is a traditional ring, and (eventually?) wants to become
a caller-specified size to make it large enough for any reasonable
quantity of queued requests, at which point it can switch to lossy
semantics.

There are three angles for doing this work.
1) Avoid cases where the guests balloon driver can interfere with
attaching the monitor ring.
2) Deal more scalably with large number of vcpus/events.
3) Remove the final case where Xen needs to wait for the queue to drain,
which in turn lets us delete the waitqueue infrastructure (which is
horrible in its own right) and breaks one of the safety mechanisms that
live-patching relies on.

Frankly, option 3 is the one I care most about (because I can't safely
livepatch a system using introspection, and XenServer supports both of
these things), whereas the first two are concrete improvements for
userspace using vm_event APIs.

~Andrew


* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2018-12-19 22:33 ` [PATCH RFC 0/6] Slotted channels for sync vm_events Tamas K Lengyel
  2018-12-19 23:30   ` Andrew Cooper
@ 2018-12-20 10:48   ` Petre Ovidiu PIRCALABU
  2018-12-20 14:08     ` Tamas K Lengyel
  1 sibling, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2018-12-20 10:48 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Wed, 2018-12-19 at 15:33 -0700, Tamas K Lengyel wrote:
> On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
> <ppircalabu@bitdefender.com> wrote:
> > 
> > This patchset is a rework of the "multi-page ring buffer" for
> > vm_events
> > patch based on Andrew Cooper's comments.
> > For synchronous vm_events the ring waitqueue logic was unnecessary
> > as the
> > vcpu sending the request was blocked until a response was received.
> > To simplify the request/response mechanism, an array of slotted
> > channels
> > was created, one per vcpu. Each vcpu puts the request in the
> > corresponding slot and blocks until the response is received.
> > 
> > I'm sending this patch as a RFC because, while I'm still working on
> > way to
> > measure the overall performance improvement, your feedback would be
> > a great
> > assistance.
> 
> Generally speaking this approach is OK, but I'm concerned that we
> will
> eventually run into the same problem that brought up the idea of
> using
> multi-page rings: vm_event structures that are larger then a page.
> Right now this series adds a ring for each vCPU, which does mitigate
> some of the bottleneck, but it does not really address the root
> cause.
> It also adds significant complexity as the userspace side now has to
> map in multiple rings, each with its own event channel and polling
> requirements.
> 
> Tamas
The memory for the vm_event "rings" (for synchronous vm_events it is
actually just an array of vm_event_slot structures, i.e. state +
vm_event_request / vm_event_response) is allocated directly from the
domheap and spans as many pages as necessary.
Regarding the userspace complexity, unfortunately I haven't found a
better approach (but I'm open to suggestions).
In order to have a lock-free mechanism to access the vm_event data,
each vcpu should access only its own slot (referenced by vcpu_id).
I have used the "one event channel per slot + one for the async ring"
approach because, to my understanding, the only additional information
an event channel can carry is the vcpu on which it was triggered.
//Petre
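
For illustration, a minimal sketch of the slot layout and of the consumer
side handling described above (field names follow this RFC's proposed
struct vm_event_slot and may differ; channel_buffer, xce_handles, ports and
handle_request() are placeholders for the application's own state/handler):

    /* One slot per vcpu, shared between Xen and the monitor application. */
    struct vm_event_slot {
        uint32_t state;              /* VM_EVENT_SLOT_STATE_{IDLE,SUBMIT,FINISH} */
        union {
            vm_event_request_t  req; /* written by Xen, state -> SUBMIT */
            vm_event_response_t rsp; /* written by the monitor, state -> FINISH */
        } u;
    };

    /* Consumer side, for a vcpu whose event channel fired: */
    struct vm_event_slot *slot =
        &((struct vm_event_slot *)channel_buffer)[vcpu_id];

    if ( slot->state == VM_EVENT_SLOT_STATE_SUBMIT )
    {
        handle_request(&slot->u.req, &slot->u.rsp);
        slot->state = VM_EVENT_SLOT_STATE_FINISH;
        xenevtchn_notify(xce_handles[vcpu_id], ports[vcpu_id]);
    }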



* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-19 18:52 ` [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests Petre Pircalabu
@ 2018-12-20 12:05   ` Paul Durrant
  2018-12-20 14:25     ` Petre Ovidiu PIRCALABU
                       ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Paul Durrant @ 2018-12-20 12:05 UTC (permalink / raw)
  To: 'Petre Pircalabu', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Petre Pircalabu
> Sent: 19 December 2018 18:52
> To: xen-devel@lists.xenproject.org
> Cc: Petre Pircalabu <ppircalabu@bitdefender.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; Razvan Cojocaru
> <rcojocaru@bitdefender.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; George Dunlap <George.Dunlap@citrix.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
> Tim (Xen.org) <tim@xen.org>; Julien Grall <julien.grall@arm.com>; Tamas K
> Lengyel <tamas@tklengyel.com>; Jan Beulich <jbeulich@suse.com>; Roger Pau
> Monne <roger.pau@citrix.com>
> Subject: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels for
> sync requests.
> 
> In high throughput introspection scenarios where lots of monitor
> vm_events are generated, the ring buffer can fill up before the monitor
> application gets a chance to handle all the requests thus blocking
> other vcpus which will have to wait for a slot to become available.
> 
> This patch adds support for a different mechanism to handle synchronous
> vm_event requests / responses. As each synchronous request pauses the
> vcpu until the corresponding response is handled, it can be stored in
> a slotted memory buffer (one per vcpu) shared between the hypervisor and
> the controlling domain. The asynchronous vm_event requests will be sent
> to the controlling domain using a ring buffer, but without blocking the
> vcpu as no response is required.
> 
> The memory for the asynchronous ring and the synchronous channels will
> be allocated from domheap and mapped to the controlling domain using the
> foreignmemory_map_resource interface. Unlike the current implementation,
> the allocated pages are not part of the target DomU, so they will not be
> reclaimed when the vm_event domain is disabled.

Why re-invent the wheel here? The ioreq infrastructure already does pretty much everything you need AFAICT.

  Paul

> 
> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> ---
>  tools/libxc/include/xenctrl.h |  11 +
>  tools/libxc/xc_monitor.c      |  36 +++
>  tools/libxc/xc_private.h      |  14 ++
>  tools/libxc/xc_vm_event.c     |  74 +++++-
>  xen/arch/x86/mm.c             |   7 +
>  xen/common/vm_event.c         | 515
> ++++++++++++++++++++++++++++++++++++++----
>  xen/include/public/domctl.h   |  25 +-
>  xen/include/public/memory.h   |   2 +
>  xen/include/public/vm_event.h |  15 ++
>  xen/include/xen/vm_event.h    |   4 +
>  10 files changed, 660 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index de0b990..fad8bc4 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2012,6 +2012,17 @@ int xc_get_mem_access(xc_interface *xch, uint32_t
> domain_id,
>   * Caller has to unmap this page when done.
>   */
>  void *xc_monitor_enable(xc_interface *xch, uint32_t domain_id, uint32_t
> *port);
> +
> +struct xenforeignmemory_resource_handle *xc_monitor_enable_ex(
> +    xc_interface *xch,
> +    uint32_t domain_id,
> +    void **_ring_buffer,
> +    uint32_t ring_frames,
> +    uint32_t *ring_port,
> +    void **_sync_buffer,
> +    uint32_t *sync_ports,
> +    uint32_t nr_sync_channels);
> +
>  int xc_monitor_disable(xc_interface *xch, uint32_t domain_id);
>  int xc_monitor_resume(xc_interface *xch, uint32_t domain_id);
>  /*
> diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
> index 718fe8b..4ceb528 100644
> --- a/tools/libxc/xc_monitor.c
> +++ b/tools/libxc/xc_monitor.c
> @@ -49,6 +49,42 @@ void *xc_monitor_enable(xc_interface *xch, uint32_t
> domain_id, uint32_t *port)
>      return buffer;
>  }
> 
> +struct xenforeignmemory_resource_handle *xc_monitor_enable_ex(
> +    xc_interface *xch,
> +    uint32_t domain_id,
> +    void **_ring_buffer,
> +    uint32_t ring_frames,
> +    uint32_t *ring_port,
> +    void **_sync_buffer,
> +    uint32_t *sync_ports,
> +    uint32_t nr_sync_channels)
> +{
> +    xenforeignmemory_resource_handle *fres;
> +    int saved_errno;
> +
> +    /* Pause the domain for ring page setup */
> +    if ( xc_domain_pause(xch, domain_id) )
> +    {
> +        PERROR("Unable to pause domain\n");
> +        return NULL;
> +    }
> +
> +    fres = xc_vm_event_enable_ex(xch, domain_id,
> XEN_VM_EVENT_TYPE_MONITOR,
> +                                _ring_buffer, ring_frames, ring_port,
> +                                _sync_buffer, sync_ports,
> nr_sync_channels);
> +
> +    saved_errno = errno;
> +    if ( xc_domain_unpause(xch, domain_id) )
> +    {
> +        if ( fres )
> +            saved_errno = errno;
> +        PERROR("Unable to unpause domain");
> +    }
> +
> +    errno = saved_errno;
> +    return fres;
> +}
> +
>  int xc_monitor_disable(xc_interface *xch, uint32_t domain_id)
>  {
>      return xc_vm_event_control(xch, domain_id,
> diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
> index 482451c..1f70223 100644
> --- a/tools/libxc/xc_private.h
> +++ b/tools/libxc/xc_private.h
> @@ -420,6 +420,20 @@ int xc_vm_event_control(xc_interface *xch, uint32_t
> domain_id, unsigned int op,
>  void *xc_vm_event_enable(xc_interface *xch, uint32_t domain_id, int type,
>                           uint32_t *port);
> 
> +/*
> + * Enables vm_event for using the xenforeignmemory_map_resource
> interface.
> + * The vm_event type can be XEN_VM_EVENT_TYPE_(PAGING/MONITOR/SHARING).
> + *
> + * The function returns:
> + *  - A ring for asynchronous vm_events.
> + *  - A slotted buffer for synchronous vm_events (one slot per vcpu)
> + *  - xenforeignmemory_resource_handle used exclusively for resource
> cleanup
> + */
> +xenforeignmemory_resource_handle *xc_vm_event_enable_ex(xc_interface
> *xch,
> +    uint32_t domain_id, int type,
> +    void **_ring_buffer, uint32_t ring_frames, uint32_t *ring_port,
> +    void **_sync_buffer, uint32_t *sync_ports, uint32_t
> nr_sync_channels);
> +
>  int do_dm_op(xc_interface *xch, uint32_t domid, unsigned int nr_bufs,
> ...);
> 
>  #endif /* __XC_PRIVATE_H__ */
> diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c
> index 4fc2548..0a976b4 100644
> --- a/tools/libxc/xc_vm_event.c
> +++ b/tools/libxc/xc_vm_event.c
> @@ -22,6 +22,12 @@
> 
>  #include "xc_private.h"
> 
> +#include <xen/vm_event.h>
> +
> +#ifndef PFN_UP
> +#define PFN_UP(x)     (((x) + PAGE_SIZE-1) >> PAGE_SHIFT)
> +#endif /* PFN_UP */
> +
>  int xc_vm_event_control(xc_interface *xch, uint32_t domain_id, unsigned
> int op,
>                          unsigned int type)
>  {
> @@ -120,7 +126,7 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t
> domain_id, int type,
>          goto out;
>      }
> 
> -    *port = domctl.u.vm_event_op.port;
> +    *port = domctl.u.vm_event_op.u.enable.port;
> 
>      /* Remove the ring_pfn from the guest's physmap */
>      rc = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0,
> &ring_pfn);
> @@ -138,6 +144,72 @@ void *xc_vm_event_enable(xc_interface *xch, uint32_t
> domain_id, int type,
>      return ring_page;
>  }
> 
> +xenforeignmemory_resource_handle *xc_vm_event_enable_ex(xc_interface
> *xch,
> +    uint32_t domain_id, int type,
> +    void **_ring_buffer, uint32_t ring_frames, uint32_t *ring_port,
> +    void **_sync_buffer, uint32_t *sync_ports, uint32_t nr_sync_channels)
> +{
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BOUNCE(sync_ports, nr_sync_channels *
> sizeof(uint32_t),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    xenforeignmemory_resource_handle *fres;
> +    unsigned long nr_frames;
> +    void *buffer;
> +
> +    if ( !_ring_buffer || !ring_port || !_sync_buffer || !sync_ports )
> +    {
> +        errno = EINVAL;
> +        return NULL;
> +    }
> +
> +    nr_frames = ring_frames + PFN_UP(nr_sync_channels * sizeof(struct
> vm_event_slot));
> +
> +    fres = xenforeignmemory_map_resource(xch->fmem, domain_id,
> +                                         XENMEM_resource_vm_event, type,
> 0,
> +                                         nr_frames, &buffer,
> +                                         PROT_READ | PROT_WRITE, 0);
> +    if ( !fres )
> +    {
> +        PERROR("Could not map the vm_event pages\n");
> +        return NULL;
> +    }
> +
> +    domctl.cmd = XEN_DOMCTL_vm_event_op;
> +    domctl.domain = domain_id;
> +    domctl.u.vm_event_op.op = XEN_VM_EVENT_GET_PORTS;
> +    domctl.u.vm_event_op.type = type;
> +
> +    if ( xc_hypercall_bounce_pre(xch, sync_ports) )
> +    {
> +        PERROR("Could not bounce memory for XEN_DOMCTL_vm_event_op");
> +        errno = ENOMEM;
> +        return NULL;
> +    }
> +
> +    set_xen_guest_handle(domctl.u.vm_event_op.u.get_ports.sync,
> sync_ports);
> +
> +    if ( do_domctl(xch, &domctl) )
> +    {
> +        PERROR("Failed to get vm_event ports\n");
> +        goto out;
> +    }
> +
> +    xc_hypercall_bounce_post(xch, sync_ports);
> +    *ring_port = domctl.u.vm_event_op.u.get_ports.async;
> +
> +    *_sync_buffer = buffer + ring_frames * PAGE_SIZE;
> +    *_ring_buffer = buffer;
> +
> +    return fres;
> +
> +out:
> +    xc_hypercall_bounce_post(xch, sync_ports);
> +    if ( fres )
> +        xenforeignmemory_unmap_resource(xch->fmem, fres);
> +    return NULL;
> +}
> +
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 1431f34..256c63b 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -103,6 +103,7 @@
>  #include <xen/efi.h>
>  #include <xen/grant_table.h>
>  #include <xen/hypercall.h>
> +#include <xen/vm_event.h>
>  #include <asm/paging.h>
>  #include <asm/shadow.h>
>  #include <asm/page.h>
> @@ -4469,6 +4470,12 @@ int arch_acquire_resource(struct domain *d,
> unsigned int type,
>      }
>  #endif
> 
> +    case XENMEM_resource_vm_event:
> +    {
> +        rc = vm_event_get_frames(d, id, frame, nr_frames, mfn_list);
> +        break;
> +    }
> +
>      default:
>          rc = -EOPNOTSUPP;
>          break;
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index 77da41b..a2712a0 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -28,6 +28,8 @@
>  #include <asm/p2m.h>
>  #include <asm/monitor.h>
>  #include <asm/vm_event.h>
> +#include <xen/guest_access.h>
> +#include <xen/vmap.h>
>  #include <xsm/xsm.h>
> 
>  /* for public/io/ring.h macros */
> @@ -40,6 +42,7 @@
>  #define vm_event_unlock(_ved)     spin_unlock(&(_ved)->lock)
> 
>  #define to_vm_event_domain_ring(_ved) container_of(_ved, struct
> vm_event_domain_ring, ved)
> +#define to_vm_event_domain_channel(_ved) container_of(_ved, struct
> vm_event_domain_channel, ved)
> 
>  struct vm_event_domain
>  {
> @@ -48,7 +51,8 @@ struct vm_event_domain
>      int (*claim_slot)(struct vm_event_domain *ved, bool allow_sleep);
>      void (*release_slot)(struct vm_event_domain *ved);
>      void (*put_request)(struct vm_event_domain *ved, vm_event_request_t
> *req);
> -    int (*get_response)(struct vm_event_domain *ved, vm_event_response_t
> *rsp);
> +    int (*get_response)(struct vm_event_domain *ved, struct vcpu *v,
> +                        unsigned int port, vm_event_response_t *rsp);
>      int (*disable)(struct vm_event_domain **_ved);
> 
>      /* The domain associated with the VM event */
> @@ -58,11 +62,6 @@ struct vm_event_domain
>      spinlock_t lock;
>  };
> 
> -bool vm_event_check(struct vm_event_domain *ved)
> -{
> -    return (ved && ved->check(ved));
> -}
> -
>  /* VM event domain ring implementation */
>  struct vm_event_domain_ring
>  {
> @@ -78,22 +77,57 @@ struct vm_event_domain_ring
>      vm_event_front_ring_t front_ring;
>      /* event channel port (vcpu0 only) */
>      int xen_port;
> -    /* vm_event bit for vcpu->pause_flags */
> -    int pause_flag;
>      /* list of vcpus waiting for room in the ring */
>      struct waitqueue_head wq;
>      /* the number of vCPUs blocked */
>      unsigned int blocked;
> +    /* vm_event bit for vcpu->pause_flags */
> +    int pause_flag;
>      /* The last vcpu woken up */
>      unsigned int last_vcpu_wake_up;
>  };
> 
> +struct vm_event_buffer
> +{
> +    void *va;
> +    unsigned int nr_frames;
> +    mfn_t mfn[0];
> +};
> +
> +struct vm_event_domain_channel
> +{
> +    /* VM event domain */
> +    struct vm_event_domain ved;
> +    /* ring for asynchronous vm events */
> +    struct vm_event_buffer *ring;
> +    /* front-end ring */
> +    vm_event_front_ring_t front_ring;
> +    /* per vcpu channels for synchronous vm events */
> +    struct vm_event_buffer *channels;
> +    /*
> +     * event channels ports
> +     * - one per vcpu for the synchronous channels.
> +     * - one for the asynchronous ring.
> +     */
> +    uint32_t xen_ports[0];
> +};
> +
> +bool vm_event_check(struct vm_event_domain *ved)
> +{
> +    return (ved && ved->check(ved));
> +}
> +
>  static bool vm_event_ring_check(struct vm_event_domain *ved)
>  {
>      struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
>      return impl->ring_page != NULL;
>  }
> 
> +static bool is_vm_event_domain_ring(struct vm_event_domain *ved)
> +{
> +    return ved->check == vm_event_ring_check;
> +}
> +
>  static unsigned int vm_event_ring_available(struct vm_event_domain_ring
> *ved)
>  {
>      int avail_req = RING_FREE_REQUESTS(&ved->front_ring);
> @@ -317,12 +351,15 @@ static void vm_event_ring_put_request(struct
> vm_event_domain *ved,
>      notify_via_xen_event_channel(d, impl->xen_port);
>  }
> 
> -static int vm_event_ring_get_response(struct vm_event_domain *ved,
> -                                      vm_event_response_t *rsp)
> +static int vm_event_ring_get_response(
> +    struct vm_event_domain *ved,
> +    struct vcpu *v,
> +    unsigned int port,
> +    vm_event_response_t *rsp)
>  {
>      vm_event_front_ring_t *front_ring;
>      RING_IDX rsp_cons;
> -    struct vm_event_domain_ring *impl = (struct vm_event_domain_ring
> *)ved;
> +    struct vm_event_domain_ring *impl = to_vm_event_domain_ring(ved);
> 
>      vm_event_lock(ved);
> 
> @@ -332,7 +369,7 @@ static int vm_event_ring_get_response(struct
> vm_event_domain *ved,
>      if ( !RING_HAS_UNCONSUMED_RESPONSES(front_ring) )
>      {
>          vm_event_unlock(ved);
> -        return 0;
> +        return -1;
>      }
> 
>      /* Copy response */
> @@ -353,6 +390,35 @@ static int vm_event_ring_get_response(struct
> vm_event_domain *ved,
>  }
> 
>  /*
> + * The response is received only from the sync channels
> + */
> +static int vm_event_channel_get_response(
> +    struct vm_event_domain *ved,
> +    struct vcpu *v,
> +    unsigned int port,
> +    vm_event_response_t *rsp)
> +{
> +    struct vm_event_domain_channel *impl =
> to_vm_event_domain_channel(ved);
> +    struct vm_event_slot *slot = impl->channels->va + v->vcpu_id *
> sizeof(struct vm_event_slot);
> +
> +    vm_event_lock(ved);
> +
> +    if ( slot->state != VM_EVENT_SLOT_STATE_FINISH )
> +    {
> +        gdprintk(XENLOG_G_WARNING, "The VM event slot state for d%dv%d is
> invalid.\n",
> +                 ved->d->domain_id, v->vcpu_id);
> +        vm_event_unlock(ved);
> +        return -1;
> +    }
> +
> +    memcpy(rsp, &slot->u.rsp, sizeof(*rsp));
> +    slot->state = VM_EVENT_SLOT_STATE_IDLE;
> +
> +    vm_event_unlock(ved);
> +    return 0;
> +}
> +
> +/*
>   * Pull all responses from the given ring and unpause the corresponding
> vCPU
>   * if required. Based on the response type, here we can also call custom
>   * handlers.
> @@ -360,10 +426,11 @@ static int vm_event_ring_get_response(struct
> vm_event_domain *ved,
>   * Note: responses are handled the same way regardless of which ring they
>   * arrive on.
>   */
> -static int vm_event_resume(struct vm_event_domain *ved)
> +static int vm_event_resume(struct vm_event_domain *ved, struct vcpu *v,
> unsigned int port)
>  {
>      vm_event_response_t rsp;
>      struct domain *d;
> +    int rc;
> 
>      if (! vm_event_check(ved))
>          return -ENODEV;
> @@ -380,22 +447,25 @@ static int vm_event_resume(struct vm_event_domain
> *ved)
>       */
>      ASSERT(d != current->domain);
> 
> -    /* Pull all responses off the ring. */
> -    while ( ved->get_response(ved, &rsp) )
> +    /* Loop until all available responses are read. */
> +    do
>      {
> -        struct vcpu *v;
> +        struct vcpu *rsp_v;
> +        rc = ved->get_response(ved, v, port, &rsp);
> +        if ( rc < 0 )
> +            break;
> 
>          if ( rsp.version != VM_EVENT_INTERFACE_VERSION )
>          {
>              printk(XENLOG_G_WARNING "vm_event interface version
> mismatch\n");
> -            continue;
> +            goto end_loop;
>          }
> 
>          /* Validate the vcpu_id in the response. */
>          if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
> -            continue;
> +            goto end_loop;
> 
> -        v = d->vcpu[rsp.vcpu_id];
> +        rsp_v = d->vcpu[rsp.vcpu_id];
> 
>          /*
>           * In some cases the response type needs extra handling, so here
> @@ -403,7 +473,7 @@ static int vm_event_resume(struct vm_event_domain
> *ved)
>           */
> 
>          /* Check flags which apply only when the vCPU is paused */
> -        if ( atomic_read(&v->vm_event_pause_count) )
> +        if ( atomic_read(&rsp_v->vm_event_pause_count) )
>          {
>  #ifdef CONFIG_HAS_MEM_PAGING
>              if ( rsp.reason == VM_EVENT_REASON_MEM_PAGING )
> @@ -415,34 +485,36 @@ static int vm_event_resume(struct vm_event_domain
> *ved)
>               * has to set arch-specific flags when supported, and to
> avoid
>               * bitmask overhead when it isn't supported.
>               */
> -            vm_event_emulate_check(v, &rsp);
> +            vm_event_emulate_check(rsp_v, &rsp);
> 
>              /*
>               * Check in arch-specific handler to avoid bitmask overhead
> when
>               * not supported.
>               */
> -            vm_event_register_write_resume(v, &rsp);
> +            vm_event_register_write_resume(rsp_v, &rsp);
> 
>              /*
>               * Check in arch-specific handler to avoid bitmask overhead
> when
>               * not supported.
>               */
> -            vm_event_toggle_singlestep(d, v, &rsp);
> +            vm_event_toggle_singlestep(d, rsp_v, &rsp);
> 
>              /* Check for altp2m switch */
>              if ( rsp.flags & VM_EVENT_FLAG_ALTERNATE_P2M )
> -                p2m_altp2m_check(v, rsp.altp2m_idx);
> +                p2m_altp2m_check(rsp_v, rsp.altp2m_idx);
> 
>              if ( rsp.flags & VM_EVENT_FLAG_SET_REGISTERS )
> -                vm_event_set_registers(v, &rsp);
> +                vm_event_set_registers(rsp_v, &rsp);
> 
>              if ( rsp.flags & VM_EVENT_FLAG_GET_NEXT_INTERRUPT )
> -                vm_event_monitor_next_interrupt(v);
> +                vm_event_monitor_next_interrupt(rsp_v);
> 
>              if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
> -                vm_event_vcpu_unpause(v);
> +                vm_event_vcpu_unpause(rsp_v);
>          }
> +end_loop: ;
>      }
> +    while ( rc > 0 );
> 
>      return 0;
>  }
> @@ -527,28 +599,28 @@ int __vm_event_claim_slot(struct vm_event_domain
> *ved, bool allow_sleep)
>      if ( !vm_event_check(ved) )
>          return -EOPNOTSUPP;
> 
> -    return ved->claim_slot(ved, allow_sleep);
> +    return (ved->claim_slot) ? ved->claim_slot(ved, allow_sleep) : 0;
>  }
> 
>  #ifdef CONFIG_HAS_MEM_PAGING
>  /* Registered with Xen-bound event channel for incoming notifications. */
>  static void mem_paging_notification(struct vcpu *v, unsigned int port)
>  {
> -    vm_event_resume(v->domain->vm_event_paging);
> +    vm_event_resume(v->domain->vm_event_paging, v, port);
>  }
>  #endif
> 
>  /* Registered with Xen-bound event channel for incoming notifications. */
>  static void monitor_notification(struct vcpu *v, unsigned int port)
>  {
> -    vm_event_resume(v->domain->vm_event_monitor);
> +    vm_event_resume(v->domain->vm_event_monitor, v, port);
>  }
> 
>  #ifdef CONFIG_HAS_MEM_SHARING
>  /* Registered with Xen-bound event channel for incoming notifications. */
>  static void mem_sharing_notification(struct vcpu *v, unsigned int port)
>  {
> -    vm_event_resume(v->domain->vm_event_share);
> +    vm_event_resume(v->domain->vm_event_share, v, port);
>  }
>  #endif
> 
> @@ -565,19 +637,24 @@ void vm_event_cleanup(struct domain *d)
>           * Finally, because this code path involves previously
>           * pausing the domain (domain_kill), unpausing the
>           * vcpus causes no harm. */
> -        destroy_waitqueue_head(&to_vm_event_domain_ring(d-
> >vm_event_paging)->wq);
> +        if ( is_vm_event_domain_ring(d->vm_event_paging) )
> +            destroy_waitqueue_head(&to_vm_event_domain_ring(d-
> >vm_event_paging)->wq);
>          (void)vm_event_disable(&d->vm_event_paging);
>      }
>  #endif
> +
>      if ( vm_event_check(d->vm_event_monitor) )
>      {
> -        destroy_waitqueue_head(&to_vm_event_domain_ring(d-
> >vm_event_monitor)->wq);
> +        if ( is_vm_event_domain_ring(d->vm_event_monitor) )
> +            destroy_waitqueue_head(&to_vm_event_domain_ring(d-
> >vm_event_monitor)->wq);
>          (void)vm_event_disable(&d->vm_event_monitor);
>      }
> +
>  #ifdef CONFIG_HAS_MEM_SHARING
>      if ( vm_event_check(d->vm_event_share) )
>      {
> -        destroy_waitqueue_head(&to_vm_event_domain_ring(d-
> >vm_event_share)->wq);
> +        if ( is_vm_event_domain_ring(d->vm_event_share) )
> +            destroy_waitqueue_head(&to_vm_event_domain_ring(d-
> >vm_event_share)->wq);
>          (void)vm_event_disable(&d->vm_event_share);
>      }
>  #endif
> @@ -641,7 +718,7 @@ static int vm_event_ring_enable(
>      if ( rc < 0 )
>          goto err;
> 
> -    impl->xen_port = vec->port = rc;
> +    impl->xen_port = vec->u.enable.port = rc;
> 
>      /* Prepare ring buffer */
>      FRONT_RING_INIT(&impl->front_ring,
> @@ -668,6 +745,294 @@ static int vm_event_ring_enable(
>      return rc;
>  }
> 
> +/*
> + * Helper functions for allocating / freeing vm_event buffers
> + */
> +static int vm_event_alloc_buffer(struct domain *d, unsigned int
> nr_frames,
> +                                 struct vm_event_buffer **_veb)
> +{
> +    struct vm_event_buffer *veb;
> +    int i = 0, rc;
> +
> +    veb = _xzalloc(sizeof(struct vm_event_buffer) + nr_frames *
> sizeof(mfn_t),
> +                   __alignof__(struct vm_event_buffer));
> +    if ( unlikely(!veb) )
> +    {
> +        rc = -ENOMEM;
> +        goto err;
> +    }
> +
> +    veb->nr_frames = nr_frames;
> +
> +    for ( i = 0; i < nr_frames; i++ )
> +    {
> +        struct page_info *page = alloc_domheap_page(d, 0);
> +
> +        if ( !page )
> +        {
> +            rc = -ENOMEM;
> +            goto err;
> +        }
> +
> +        if ( !get_page_and_type(page, d, PGT_writable_page) )
> +        {
> +            domain_crash(d);
> +            rc = -ENODATA;
> +            goto err;
> +        }
> +
> +        veb->mfn[i] = page_to_mfn(page);
> +    }
> +
> +    veb->va = vmap(veb->mfn, nr_frames);
> +    if ( !veb->va )
> +    {
> +        rc = -ENOMEM;
> +        goto err;
> +    }
> +
> +    for( i = 0; i < nr_frames; i++ )
> +        clear_page(veb->va + i * PAGE_SIZE);
> +
> +    *_veb = veb;
> +    return 0;
> +
> +err:
> +    while ( --i >= 0 )
> +    {
> +        struct page_info *page = mfn_to_page(veb->mfn[i]);
> +
> +        if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
> +            put_page(page);
> +        put_page_and_type(page);
> +    }
> +
> +    xfree(veb);
> +    return rc;
> +}
> +
> +static void vm_event_free_buffer(struct vm_event_buffer **_veb)
> +{
> +    struct vm_event_buffer *veb = *_veb;
> +
> +    if ( !veb )
> +        return;
> +
> +    if ( veb->va )
> +    {
> +        int i;
> +
> +        vunmap(veb->va);
> +        for ( i = 0; i < veb->nr_frames; i++ )
> +        {
> +            struct page_info *page = mfn_to_page(veb->mfn[i]);
> +
> +            if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
> +                put_page(page);
> +            put_page_and_type(page);
> +        }
> +    }
> +    XFREE(*_veb);
> +}
> +
> +static bool vm_event_channel_check(struct vm_event_domain *ved)
> +{
> +    struct vm_event_domain_channel *impl =
> to_vm_event_domain_channel(ved);
> +    return impl->ring->va != NULL && impl->channels->va != NULL;
> +}
> +
> +static void vm_event_channel_put_request(struct vm_event_domain *ved,
> +                                         vm_event_request_t *req)
> +{
> +    struct vcpu *curr = current;
> +    struct vm_event_domain_channel *impl =
> to_vm_event_domain_channel(ved);
> +    struct domain *d;
> +    struct vm_event_slot *slot;
> +    bool sync;
> +
> +    if ( !vm_event_check(ved) )
> +        return;
> +
> +    d = ved->d;
> +    slot = impl->channels->va + req->vcpu_id * sizeof(struct
> vm_event_slot);
> +
> +    if ( curr->domain != d )
> +    {
> +        req->flags |= VM_EVENT_FLAG_FOREIGN;
> +#ifndef NDEBUG
> +        if ( !(req->flags & VM_EVENT_FLAG_VCPU_PAUSED) )
> +            gdprintk(XENLOG_G_WARNING, "d%dv%d was not paused.\n",
> +                     d->domain_id, req->vcpu_id);
> +#endif
> +    }
> +
> +    req->version = VM_EVENT_INTERFACE_VERSION;
> +
> +    sync = req->flags & VM_EVENT_FLAG_VCPU_PAUSED;
> +
> +    vm_event_lock(ved);
> +
> +    if ( sync )
> +    {
> +        if ( slot->state != VM_EVENT_SLOT_STATE_IDLE )
> +        {
> +            gdprintk(XENLOG_G_WARNING, "The VM event slot for d%dv%d is
> not IDLE.\n",
> +                     d->domain_id, req->vcpu_id);
> +            vm_event_unlock(ved);
> +            return;
> +        }
> +        memcpy( &slot->u.req, req, sizeof(*req) );
> +        slot->state = VM_EVENT_SLOT_STATE_SUBMIT;
> +    }
> +    else
> +    {
> +        vm_event_front_ring_t *front_ring;
> +        RING_IDX req_prod;
> +
> +        /* Due to the reservations, this step must succeed. */
> +        front_ring = &impl->front_ring;
> +
> +        /* Copy request */
> +        req_prod = front_ring->req_prod_pvt;
> +        memcpy(RING_GET_REQUEST(front_ring, req_prod), req,
> sizeof(*req));
> +        req_prod++;
> +
> +        /* Update ring */
> +        front_ring->req_prod_pvt = req_prod;
> +        RING_PUSH_REQUESTS(front_ring);
> +    }
> +
> +    vm_event_unlock(ved);
> +
> +    notify_via_xen_event_channel(d, impl->xen_ports[(sync) ? req->vcpu_id
> : d->max_vcpus]);
> +}
> +
> +static int vm_event_channel_disable(struct vm_event_domain **_ved)
> +{
> +    struct vm_event_domain_channel *ved =
> to_vm_event_domain_channel(*_ved);
> +    struct domain *d = ved->ved.d;
> +    struct vcpu *v;
> +    int i;
> +
> +    vm_event_lock(&ved->ved);
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( atomic_read(&v->vm_event_pause_count) )
> +            vm_event_vcpu_unpause(v);
> +        /*
> +        if ( test_and_clear_bit(ved->ved.pause_flag, &v->pause_flags) )
> +        {
> +            vcpu_unpause(v);
> +        }
> +        */
> +    }
> +
> +    /* Free domU's event channels and leave the other one unbound */
> +    for ( i = 0; i < d->max_vcpus; i++ )
> +        evtchn_close(d, ved->xen_ports[i], 0);
> +    evtchn_close(d, ved->xen_ports[d->max_vcpus], 0);
> +
> +    vm_event_free_buffer(&ved->ring);
> +    vm_event_free_buffer(&ved->channels);
> +
> +    vm_event_cleanup_domain(d);
> +
> +    vm_event_unlock(&ved->ved);
> +
> +    XFREE(*_ved);
> +
> +    return 0;
> +}
> +
> +static int vm_event_channel_enable(
> +    struct domain *d,
> +    struct vm_event_domain **_ved,
> +    unsigned int nr_frames,
> +    xen_event_channel_notification_t notification_fn)
> +{
> +    int i = 0, rc;
> +    struct vm_event_domain_channel *impl;
> +    unsigned int nr_ring_frames, nr_channel_frames;
> +
> +    if ( *_ved )
> +        return -EBUSY;
> +
> +    if ( nr_frames <= PFN_UP(d->max_vcpus * sizeof(struct vm_event_slot))
> )
> +        return -EINVAL;
> +
> +    impl = _xzalloc(sizeof(struct vm_event_domain_channel) +
> +                        ( d->max_vcpus + 1 ) * sizeof(uint32_t),
> +                    __alignof__(struct vm_event_domain_channel));
> +    if ( !impl )
> +        return -ENOMEM;
> +
> +    impl->ved.d = d;
> +    impl->ved.check = vm_event_channel_check;
> +    impl->ved.claim_slot = NULL;
> +    impl->ved.release_slot = NULL;
> +    impl->ved.put_request = vm_event_channel_put_request;
> +    impl->ved.get_response = vm_event_channel_get_response;
> +    impl->ved.disable = vm_event_channel_disable;
> +
> +    nr_channel_frames = PFN_UP(d->max_vcpus *
> sizeof(vm_event_request_t));
> +    nr_ring_frames = nr_frames - nr_channel_frames;
> +
> +    vm_event_lock_init(&impl->ved);
> +    vm_event_lock(&impl->ved);
> +
> +    rc = vm_event_init_domain(d);
> +    if ( rc < 0 )
> +        goto err;
> +
> +    rc = vm_event_alloc_buffer(d, nr_ring_frames, &impl->ring);
> +    if ( rc )
> +        goto err;
> +
> +    /* Allocate event channel for the async ring*/
> +    rc = alloc_unbound_xen_event_channel(d, 0, current->domain-
> >domain_id,
> +                                         notification_fn);
> +    if ( rc < 0 )
> +        goto err;
> +
> +    impl->xen_ports[d->max_vcpus] = rc;
> +
> +    /* Prepare ring buffer */
> +    FRONT_RING_INIT(&impl->front_ring,
> +                    (vm_event_sring_t *)impl->ring->va,
> +                    impl->ring->nr_frames * PAGE_SIZE);
> +
> +    rc = vm_event_alloc_buffer(d, nr_channel_frames, &impl->channels);
> +    if ( rc != 0)
> +        goto err;
> +
> +    for ( i = 0; i < d->max_vcpus; i++)
> +    {
> +        rc = alloc_unbound_xen_event_channel(d, i, current->domain-
> >domain_id,
> +                                             notification_fn);
> +        if ( rc < 0 )
> +            goto err;
> +
> +        impl->xen_ports[i] = rc;
> +    }
> +
> +    *_ved = &impl->ved;
> +
> +    vm_event_unlock(&impl->ved);
> +    return 0;
> +
> +err:
> +    while (i--)
> +        evtchn_close(d, impl->xen_ports[i], 0);
> +    evtchn_close(d, impl->xen_ports[d->max_vcpus], 0);
> +    vm_event_free_buffer(&impl->ring);
> +    vm_event_free_buffer(&impl->channels);
> +    vm_event_cleanup_domain(d);
> +    vm_event_unlock(&impl->ved);
> +    xfree(impl);
> +    return rc;
> +}
> +
>  int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
>                      XEN_GUEST_HANDLE_PARAM(void) u_domctl)
>  {
> @@ -748,7 +1113,9 @@ int vm_event_domctl(struct domain *d, struct
> xen_domctl_vm_event_op *vec,
>              break;
> 
>          case XEN_VM_EVENT_RESUME:
> -            rc = vm_event_resume(d->vm_event_paging);
> +            if ( vm_event_check(d->vm_event_paging) &&
> +                 is_vm_event_domain_ring(d->vm_event_paging) )
> +                rc = vm_event_resume(d->vm_event_paging, NULL, 0);
>              break;
> 
>          default:
> @@ -786,7 +1153,30 @@ int vm_event_domctl(struct domain *d, struct
> xen_domctl_vm_event_op *vec,
>              break;
> 
>          case XEN_VM_EVENT_RESUME:
> -            rc = vm_event_resume(d->vm_event_monitor);
> +            if ( vm_event_check(d->vm_event_monitor) &&
> +                 is_vm_event_domain_ring(d->vm_event_monitor) )
> +                rc = vm_event_resume(d->vm_event_monitor, NULL, 0);
> +            break;
> +
> +        case XEN_VM_EVENT_GET_PORTS:
> +            if ( !vm_event_check(d->vm_event_monitor) )
> +                break;
> +
> +            if ( !is_vm_event_domain_ring(d->vm_event_monitor) )
> +            {
> +                struct vm_event_domain_channel *impl =
> to_vm_event_domain_channel(d->vm_event_monitor);
> +
> +                if ( copy_to_guest(vec->u.get_ports.sync,
> +                                   impl->xen_ports,
> +                                   d->max_vcpus) != 0 )
> +                {
> +                    rc = -EFAULT;
> +                    break;
> +                }
> +
> +                vec->u.get_ports.async = impl->xen_ports[d->max_vcpus];
> +                rc = 0;
> +            }
>              break;
> 
>          default:
> @@ -830,7 +1220,10 @@ int vm_event_domctl(struct domain *d, struct
> xen_domctl_vm_event_op *vec,
>              break;
> 
>          case XEN_VM_EVENT_RESUME:
> -            rc = vm_event_resume(d->vm_event_share);
> +            if ( vm_event_check(d->vm_event_monitor) &&
> +                 is_vm_event_domain_ring(d->vm_event_monitor) )
> +                rc = vm_event_resume(d->vm_event_share, NULL, 0);
> +            break;
> 
>          default:
>              rc = -ENOSYS;
> @@ -847,6 +1240,52 @@ int vm_event_domctl(struct domain *d, struct
> xen_domctl_vm_event_op *vec,
>      return rc;
>  }
> 
> +int vm_event_get_frames(struct domain *d, unsigned int id,
> +                        unsigned long frame, unsigned int nr_frames,
> +                        xen_pfn_t mfn_list[])
> +{
> +    int rc = 0, i, j;
> +    struct vm_event_domain **_ved;
> +    struct vm_event_domain_channel *impl;
> +    xen_event_channel_notification_t fn;
> +
> +    switch ( id )
> +    {
> +    case XEN_VM_EVENT_TYPE_MONITOR:
> +        /* domain_pause() not required here, see XSA-99 */
> +        rc = arch_monitor_init_domain(d);
> +        if ( rc )
> +            return rc;
> +        _ved = &d->vm_event_monitor;
> +        fn = monitor_notification;
> +        break;
> +
> +    default:
> +        return -ENOSYS;
> +    }
> +
> +    rc = vm_event_channel_enable(d, _ved, nr_frames, fn);
> +    if ( rc )
> +    {
> +        switch ( id )
> +        {
> +            case XEN_VM_EVENT_TYPE_MONITOR:
> +                arch_monitor_cleanup_domain(d);
> +                break;
> +        }
> +        return rc;
> +    }
> +
> +    impl = to_vm_event_domain_channel(*_ved);
> +    j = 0;
> +    for ( i = 0; i < impl->ring->nr_frames; i++ )
> +        mfn_list[j++] = mfn_x(impl->ring->mfn[i]);
> +    for ( i = 0; i < impl->channels->nr_frames; i++ )
> +        mfn_list[j++] = mfn_x(impl->channels->mfn[i]);
> +
> +    return rc;
> +}
> +
>  void vm_event_vcpu_pause(struct vcpu *v)
>  {
>      ASSERT(v == current);
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 26b1a55..78262a1 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -38,7 +38,7 @@
>  #include "hvm/save.h"
>  #include "memory.h"
> 
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000011
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> 
>  /*
>   * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> @@ -836,6 +836,7 @@ struct xen_domctl_gdbsx_domstatus {
>  #define XEN_VM_EVENT_ENABLE               0
>  #define XEN_VM_EVENT_DISABLE              1
>  #define XEN_VM_EVENT_RESUME               2
> +#define XEN_VM_EVENT_GET_PORTS            3
> 
>  /*
>   * Use for teardown/setup of helper<->hypervisor interface for paging,
> @@ -843,10 +844,26 @@ struct xen_domctl_gdbsx_domstatus {
>   */
>  /* XEN_DOMCTL_vm_event_op */
>  struct xen_domctl_vm_event_op {
> -    uint32_t        op;           /* XEN_VM_EVENT_* */
> -    uint32_t        type;         /* XEN_VM_EVENT_TYPE_* */
> +    /* IN: Xen vm_event opcode (XEN_VM_EVENT_*) */
> +    uint32_t            op;
> +    /* IN: Xen vm event ring type (XEN_VM_EVENT_TYPE_*) */
> +    uint32_t            type;
> 
> -    uint32_t        port;         /* OUT: event channel for ring */
> +    union {
> +        struct {
> +            /* OUT: remote port for event channel ring */
> +            uint32_t    port;
> +        } enable;
> +        struct {
> +            /* OUT: remote port for the async event channel ring */
> +            uint32_t    async;
> +            /*
> +             * OUT: remote ports for the sync event vm_event channels
> +             * The number for ports will be equal with the vcpu count.
> +             */
> +            XEN_GUEST_HANDLE_64(uint32) sync;
> +        } get_ports;
> +    } u;
>  };
> 
>  /*
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 8638023..cfd280d 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -612,6 +612,7 @@ struct xen_mem_acquire_resource {
> 
>  #define XENMEM_resource_ioreq_server 0
>  #define XENMEM_resource_grant_table 1
> +#define XENMEM_resource_vm_event 2
> 
>      /*
>       * IN - a type-specific resource identifier, which must be zero
> @@ -619,6 +620,7 @@ struct xen_mem_acquire_resource {
>       *
>       * type == XENMEM_resource_ioreq_server -> id == ioreq server id
>       * type == XENMEM_resource_grant_table -> id defined below
> +     * type == XENMEM_resource_vm_event -> id == vm_event type
>       */
>      uint32_t id;
> 
> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
> index b2bafc0..499fbbc 100644
> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -388,6 +388,21 @@ typedef struct vm_event_st {
> 
>  DEFINE_RING_TYPES(vm_event, vm_event_request_t, vm_event_response_t);
> 
> +struct vm_event_slot
> +{
> +    uint32_t state;
> +    union {
> +        vm_event_request_t req;
> +        vm_event_response_t rsp;
> +    } u;
> +};
> +
> +enum vm_event_slot_state {
> +    VM_EVENT_SLOT_STATE_IDLE,   /* no contents */
> +    VM_EVENT_SLOT_STATE_SUBMIT, /* request ready */
> +    VM_EVENT_SLOT_STATE_FINISH, /* response ready */
> +};
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  #endif /* _XEN_PUBLIC_VM_EVENT_H */
> 
> diff --git a/xen/include/xen/vm_event.h b/xen/include/xen/vm_event.h
> index a5c82d6..d4bd184 100644
> --- a/xen/include/xen/vm_event.h
> +++ b/xen/include/xen/vm_event.h
> @@ -64,6 +64,10 @@ void vm_event_put_request(struct vm_event_domain *ved,
>  int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec,
>                      XEN_GUEST_HANDLE_PARAM(void) u_domctl);
> 
> +int vm_event_get_frames(struct domain *d, unsigned int id,
> +                        unsigned long frame, unsigned int nr_frames,
> +                        xen_pfn_t mfn_list[]);
> +
>  void vm_event_vcpu_pause(struct vcpu *v);
>  void vm_event_vcpu_unpause(struct vcpu *v);
> 
> --
> 2.7.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 3/6] vm_event: Refactor vm_event_domain implementation
  2018-12-19 22:26   ` Tamas K Lengyel
@ 2018-12-20 12:39     ` Petre Ovidiu PIRCALABU
  0 siblings, 0 replies; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2018-12-20 12:39 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Wed, 2018-12-19 at 15:26 -0700, Tamas K Lengyel wrote:
> On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
> <ppircalabu@bitdefender.com> wrote:
> > 
> > Decouple the VM Event interface from the ring implementation.
> 
> This will need a much better description. There is also a lot of
> churn
> that is mostly just mechanical in this patch but makes reviewing it
> hard. Perhaps functional changes and mechanical changes could be
> split
> to two patches?
This was an auxiliary patch to help introduce the new
vm_event_domain_channel by making the vm_event interface
implementation-agnostic. I will try splitting it in order to make it more
readable.

> 
> > +struct vm_event_domain
> > +{
> > +    /* VM event ops */
> > +    bool (*check)(struct vm_event_domain *ved);
> > +    int (*claim_slot)(struct vm_event_domain *ved, bool
> > allow_sleep);
> > +    void (*release_slot)(struct vm_event_domain *ved);
> > +    void (*put_request)(struct vm_event_domain *ved,
> > vm_event_request_t *req);
> > +    int (*get_response)(struct vm_event_domain *ved,
> > vm_event_response_t *rsp);
> > +    int (*disable)(struct vm_event_domain **_ved);
> 
> I don't see (yet) the reason why having these pointers stored in the
> struct is needed. Are there going to be different implementations for
> these? If so, need to explain that in the commit message.
Yes, these functions will be re-implemented for the
vm_event_domain_channel. I will add an explanation in the commit
message.
> 
> > +
> > +    /* The domain associated with the VM event */
> > +    struct domain *d;
> > +
> > +    /* ring lock */
> > +    spinlock_t lock;
> > +};
> > +
> > +bool vm_event_check(struct vm_event_domain *ved)
> > +{
> > +    return (ved && ved->check(ved));
> > +}
> > 
> > -    return rc;
> > +/* VM event domain ring implementation */
> > +struct vm_event_domain_ring
> > +{
> > +    /* VM event domain */
> > +    struct vm_event_domain ved;
> 
> Why is this not a pointer instead? Does each vm_event_domain_ring
> really need a separate copy of vm_event_domain?

The vm_event_domain structure contains the attributes common to every
domain implementation and acts more like a "base class".
It must be the first member of the "derived" structure, so that a pointer
to it can be passed to the implementation-specific functions and cast
accordingly. Besides the function pointers, the domain reference and the
lock are also kept separately for each domain.

Also, in order to support legacy applications it is better to have the
function interface on a per-domain basis. The only optimization I can
think of is grouping them in a separate "ops" structure:
   struct vm_event_ops ring_ops = {...}
   struct vm_event_ops channel_ops = {...}
and changing the way the functions are called:
   ved->check(ved) ==> ved->ops.check(ved)
Do you favor this approach?
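
(For illustration, a minimal sketch of what that grouping could look like;
the callback names below are made up, not taken from the series:)

    /* One shared, const ops table per implementation. */
    struct vm_event_ops {
        bool (*check)(struct vm_event_domain *ved);
        void (*put_request)(struct vm_event_domain *ved,
                            vm_event_request_t *req);
        int  (*get_response)(struct vm_event_domain *ved,
                             vm_event_response_t *rsp);
        int  (*disable)(struct vm_event_domain **_ved);
    };

    struct vm_event_domain {
        const struct vm_event_ops *ops;  /* ring_ops or channel_ops */
        struct domain *d;                /* domain associated with the VM event */
        spinlock_t lock;
    };

    static const struct vm_event_ops ring_ops = {
        .check       = vm_event_ring_check,        /* hypothetical names for */
        .put_request = vm_event_ring_put_request,  /* the existing ring code */
        /* ... */
    };

Keeping a pointer to a shared const table means the callbacks exist in a
single copy per implementation; embedding the ops struct by value, as the
ved->ops.check(ved) syntax above suggests, also works but duplicates the
pointers in every vm_event_domain instance.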

//Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2018-12-20 10:48   ` Petre Ovidiu PIRCALABU
@ 2018-12-20 14:08     ` Tamas K Lengyel
  0 siblings, 0 replies; 50+ messages in thread
From: Tamas K Lengyel @ 2018-12-20 14:08 UTC (permalink / raw)
  To: Petre Ovidiu PIRCALABU
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Thu, Dec 20, 2018 at 3:48 AM Petre Ovidiu PIRCALABU
<ppircalabu@bitdefender.com> wrote:
>
> On Wed, 2018-12-19 at 15:33 -0700, Tamas K Lengyel wrote:
> > On Wed, Dec 19, 2018 at 11:52 AM Petre Pircalabu
> > <ppircalabu@bitdefender.com> wrote:
> > >
> > > This patchset is a rework of the "multi-page ring buffer" for
> > > vm_events
> > > patch based on Andrew Cooper's comments.
> > > For synchronous vm_events the ring waitqueue logic was unnecessary
> > > as the
> > > vcpu sending the request was blocked until a response was received.
> > > To simplify the request/response mechanism, an array of slotted
> > > channels
> > > was created, one per vcpu. Each vcpu puts the request in the
> > > corresponding slot and blocks until the response is received.
> > >
> > > I'm sending this patch as a RFC because, while I'm still working on
> > > way to
> > > measure the overall performance improvement, your feedback would be
> > > a great
> > > assistance.
> >
> > Generally speaking this approach is OK, but I'm concerned that we
> > will
> > eventually run into the same problem that brought up the idea of
> > using
> > multi-page rings: vm_event structures that are larger then a page.
> > Right now this series adds a ring for each vCPU, which does mitigate
> > some of the bottleneck, but it does not really address the root
> > cause.
> > It also adds significant complexity as the userspace side now has to
> > map in multiple rings, each with its own event channel and polling
> > requirements.
> >
> > Tamas
> The memory for the vm_event "rings" (for synchronous vm_events, actually
> an array of vm_event_slot structures, each holding a state plus a
> vm_event_request / vm_event_response) is allocated directly from the
> domheap and spans as many pages as necessary.

Ah, OK, I missed that. In that case that is fine :)

> Regarding the userspace complexity, unfortunately I haven't had a
> better idea (but I'm open to suggestions).
> In order to have a lock-free mechanism to access the vm_event data,
> each vcpu should access only its own slot (referenced by vcpu_id).
> I have used the "one event channel per slot + one for the async ring"
> approach, because, to my understanding, the only additional information
> an event channel can carry is the vcpu on which it is triggered.

Right, the alternative would be to have a single event channel, with
userspace then having to check each slot manually to see which one was
updated. Not really ideal either, so I would stick with the current
approach of having multiple event channels.
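
(To make that concrete, a rough sketch of the consumer loop under the
multi-channel scheme; the libxenevtchn calls are real, the slot layout is
the one from this series' public vm_event.h, and the port bookkeeping and
response policy are made up for illustration:)

    #include <poll.h>
    #include <stdint.h>
    #include <xenevtchn.h>
    #include <xen/vm_event.h>   /* the series' version, for struct vm_event_slot */

    /*
     * vcpu_to_port[] holds the local ports obtained by binding each remote
     * sync port returned by XEN_VM_EVENT_GET_PORTS; slots points at the
     * mapped per-vCPU slot array.  Error handling and memory barriers are
     * omitted for brevity.
     */
    static void monitor_loop(xenevtchn_handle *xce, struct vm_event_slot *slots,
                             const uint32_t *vcpu_to_port, unsigned int nr_vcpus)
    {
        struct pollfd fd = { .fd = xenevtchn_fd(xce), .events = POLLIN };

        for ( ;; )
        {
            xenevtchn_port_or_error_t port;
            unsigned int vcpu;

            if ( poll(&fd, 1, -1) <= 0 )
                continue;

            port = xenevtchn_pending(xce);      /* which channel fired? */
            if ( port < 0 )
                continue;

            /* Map the local port back to the vCPU whose slot it signals. */
            for ( vcpu = 0; vcpu < nr_vcpus; vcpu++ )
                if ( vcpu_to_port[vcpu] == (uint32_t)port )
                    break;

            if ( vcpu < nr_vcpus &&
                 slots[vcpu].state == VM_EVENT_SLOT_STATE_SUBMIT )
            {
                /* Copy the request out first: req and rsp share a union. */
                vm_event_request_t req = slots[vcpu].u.req;
                vm_event_response_t rsp = {
                    .version = req.version,
                    .vcpu_id = req.vcpu_id,
                    .reason  = req.reason,
                    .flags   = req.flags & VM_EVENT_FLAG_VCPU_PAUSED,
                };

                /* ... inspect req and fill in the rest of rsp here ... */

                slots[vcpu].u.rsp = rsp;
                slots[vcpu].state = VM_EVENT_SLOT_STATE_FINISH;
                xenevtchn_notify(xce, port);    /* tell Xen the response is ready */
            }

            xenevtchn_unmask(xce, port);
        }
    }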

Thanks!
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 12:05   ` Paul Durrant
@ 2018-12-20 14:25     ` Petre Ovidiu PIRCALABU
  2018-12-20 14:28       ` Paul Durrant
  2019-01-08 14:49     ` Petre Ovidiu PIRCALABU
  2019-01-10 15:30     ` Petre Ovidiu PIRCALABU
  2 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2018-12-20 14:25 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
> > The memory for the asynchronous ring and the synchronous channels
> > will
> > be allocated from domheap and mapped to the controlling domain
> > using the
> > foreignmemory_map_resource interface. Unlike the current
> > implementation,
> > the allocated pages are not part of the target DomU, so they will
> > not be
> > reclaimed when the vm_event domain is disabled.
> 
> Why re-invent the wheel here? The ioreq infrastructure already does
> pretty much everything you need AFAICT.
> 
>   Paul

I wanted to preserve as much as possible of the existing vm_event DOMCTL
interface and add only the necessary code to allocate and map the
vm_event_pages.
Also, to my knowledge, the ioreq server is only supported for x86 hvm
targets. I didn't want to add an extra limitation to the vm_event
system.
//Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 14:25     ` Petre Ovidiu PIRCALABU
@ 2018-12-20 14:28       ` Paul Durrant
  2018-12-20 15:03         ` Jan Beulich
                           ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Paul Durrant @ 2018-12-20 14:28 UTC (permalink / raw)
  To: 'Petre Ovidiu PIRCALABU', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
> Sent: 20 December 2018 14:26
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
> for sync requests.
> 
> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
> > > The memory for the asynchronous ring and the synchronous channels
> > > will
> > > be allocated from domheap and mapped to the controlling domain
> > > using the
> > > foreignmemory_map_resource interface. Unlike the current
> > > implementation,
> > > the allocated pages are not part of the target DomU, so they will
> > > not be
> > > reclaimed when the vm_event domain is disabled.
> >
> > Why re-invent the wheel here? The ioreq infrastructure already does
> > pretty much everything you need AFAICT.
> >
> >   Paul
> 
> I wanted preseve as much as possible from the existing vm_event DOMCTL
> interface and add only the necessary code to allocate and map the
> vm_event_pages.

That means we have two subsystems duplicating a lot of functionality though. It would be much better to use ioreq server if possible than provide a compatibility interface via DOMCTL.

> Also, to my knowledge, the ioreq server is only supported for x86 hvm
> targets. I didn't want to add an extra limitation to the vm_event
> system.

I believe Julien is already porting it to ARM.

  Paul

> //Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 14:28       ` Paul Durrant
@ 2018-12-20 15:03         ` Jan Beulich
  2018-12-24 10:37         ` Julien Grall
  2019-01-09 16:21         ` Razvan Cojocaru
  2 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2018-12-20 15:03 UTC (permalink / raw)
  To: Petre Pircalabu, Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim Deegan, george.dunlap,
	Julien Grall, Tamas K Lengyel, xen-devel, IanJackson,
	Roger Pau Monne

>>> On 20.12.18 at 15:28, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
>> Sent: 20 December 2018 14:26
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org 
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
>> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
>> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
>> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
>> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
>> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
>> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
>> for sync requests.
>> 
>> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
>> > > The memory for the asynchronous ring and the synchronous channels
>> > > will
>> > > be allocated from domheap and mapped to the controlling domain
>> > > using the
>> > > foreignmemory_map_resource interface. Unlike the current
>> > > implementation,
>> > > the allocated pages are not part of the target DomU, so they will
>> > > not be
>> > > reclaimed when the vm_event domain is disabled.
>> >
>> > Why re-invent the wheel here? The ioreq infrastructure already does
>> > pretty much everything you need AFAICT.
>> >
>> >   Paul
>> 
>> I wanted preseve as much as possible from the existing vm_event DOMCTL
>> interface and add only the necessary code to allocate and map the
>> vm_event_pages.
> 
> That means we have two subsystems duplicating a lot of functionality though. 
> It would be much better to use ioreq server if possible than provide a 
> compatibility interface via DOMCTL.

+1 from me, fwiw.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 14:28       ` Paul Durrant
  2018-12-20 15:03         ` Jan Beulich
@ 2018-12-24 10:37         ` Julien Grall
  2019-01-09 16:21         ` Razvan Cojocaru
  2 siblings, 0 replies; 50+ messages in thread
From: Julien Grall @ 2018-12-24 10:37 UTC (permalink / raw)
  To: Paul Durrant, 'Petre Ovidiu PIRCALABU', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Tamas K Lengyel, Jan Beulich, Ian Jackson,
	Roger Pau Monne

Hi Paul,

On 12/20/18 2:28 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
>> Sent: 20 December 2018 14:26
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
>> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
>> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
>> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
>> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
>> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
>> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
>> for sync requests.
>>
>> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
>>>> The memory for the asynchronous ring and the synchronous channels
>>>> will
>>>> be allocated from domheap and mapped to the controlling domain
>>>> using the
>>>> foreignmemory_map_resource interface. Unlike the current
>>>> implementation,
>>>> the allocated pages are not part of the target DomU, so they will
>>>> not be
>>>> reclaimed when the vm_event domain is disabled.
>>>
>>> Why re-invent the wheel here? The ioreq infrastructure already does
>>> pretty much everything you need AFAICT.
>>>
>>>    Paul
>>
>> I wanted preseve as much as possible from the existing vm_event DOMCTL
>> interface and add only the necessary code to allocate and map the
>> vm_event_pages.
> 
> That means we have two subsystems duplicating a lot of functionality though. It would be much better to use ioreq server if possible than provide a compatibility interface via DOMCTL.
> 
>> Also, to my knowledge, the ioreq server is only supported for x86 hvm
>> targets. I didn't want to add an extra limitation to the vm_event
>> system.
> 
> I believe Julien is already porting it to ARM.

FWIW, yes, I have a port of ioreq for Arm. Still cleaning up the code.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
  2018-12-19 18:52 ` [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type Petre Pircalabu
  2018-12-19 22:13   ` Tamas K Lengyel
@ 2019-01-02 11:11   ` Wei Liu
  2019-01-08 15:01     ` Petre Ovidiu PIRCALABU
  2019-01-08 16:25   ` Jan Beulich
  2 siblings, 1 reply; 50+ messages in thread
From: Wei Liu @ 2019-01-02 11:11 UTC (permalink / raw)
  To: Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	xen-devel

On Wed, Dec 19, 2018 at 08:52:05PM +0200, Petre Pircalabu wrote:
> Define the type for each of the supported vm_event rings (paging,
> monitor and sharing) and replace the ring param field with this type.
> 
> Replace XEN_DOMCTL_VM_EVENT_OP_ occurrences with their corresponding
> XEN_VM_EVENT_TYPE_ counterpart.
> 
> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> ---
>  tools/libxc/xc_mem_paging.c |  6 ++--
>  tools/libxc/xc_monitor.c    |  6 ++--
>  tools/libxc/xc_private.h    |  8 +++---
>  tools/libxc/xc_vm_event.c   | 68 ++++++++++++++++++++++-----------------------
>  xen/common/vm_event.c       | 12 ++++----
>  xen/include/public/domctl.h | 45 ++++++++++++++++--------------

You also need to change XEN_DOMCTL_INTERFACE_VERSION in this patch.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 12:05   ` Paul Durrant
  2018-12-20 14:25     ` Petre Ovidiu PIRCALABU
@ 2019-01-08 14:49     ` Petre Ovidiu PIRCALABU
  2019-01-08 15:08       ` Paul Durrant
  2019-01-10 15:30     ` Petre Ovidiu PIRCALABU
  2 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-01-08 14:49 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > 
> > The memory for the asynchronous ring and the synchronous channels
> > will
> > be allocated from domheap and mapped to the controlling domain
> > using the
> > foreignmemory_map_resource interface. Unlike the current
> > implementation,
> > the allocated pages are not part of the target DomU, so they will
> > not be
> > reclaimed when the vm_event domain is disabled.
> 
> Why re-invent the wheel here? The ioreq infrastructure already does
> pretty much everything you need AFAICT.
> 
>   Paul
> 

To my understanding, the current implementation of the ioreq server is
limited to just 2 allocated pages (ioreq and bufioreq).
The main goal of the new vm_event implementation proposal is to be more
flexible with respect to the number of pages needed for the
request/response buffers (the slotted structure which holds one
request/response per vcpu here, or the ring spanning multiple pages in the
previous proposal).
Is it feasible to extend the current ioreq server implementation to
allocate a specific number of pages dynamically?

Also, for the current vm_event implementation, other than using the
hvm_params to specify the ring page gfn, I couldn't see any reason why
it should be limited to HVM guests only. Is it feasible to assume the
vm_event mechanism will not ever be extended to PV guests?

Many thanks,
Petre


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
  2019-01-02 11:11   ` Wei Liu
@ 2019-01-08 15:01     ` Petre Ovidiu PIRCALABU
  2019-01-25 14:16       ` Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-01-08 15:01 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, Razvan Cojocaru, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Tamas K Lengyel, Jan Beulich, xen-devel

On Wed, 2019-01-02 at 11:11 +0000, Wei Liu wrote:
> On Wed, Dec 19, 2018 at 08:52:05PM +0200, Petre Pircalabu wrote:
> > Define the type for each of the supported vm_event rings (paging,
> > monitor and sharing) and replace the ring param field with this
> > type.
> > 
> > Replace XEN_DOMCTL_VM_EVENT_OP_ occurrences with their
> > corresponding
> > XEN_VM_EVENT_TYPE_ counterpart.
> > 
> > Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> > ---
> >  tools/libxc/xc_mem_paging.c |  6 ++--
> >  tools/libxc/xc_monitor.c    |  6 ++--
> >  tools/libxc/xc_private.h    |  8 +++---
> >  tools/libxc/xc_vm_event.c   | 68 ++++++++++++++++++++++-----------
> > ------------
> >  xen/common/vm_event.c       | 12 ++++----
> >  xen/include/public/domctl.h | 45 ++++++++++++++++--------------
> 
> You also need to change XEN_DOMCTL_INTERFACE_VERSION in this patch.
> 
> Wei.

XEN_DOMCTL_INTERFACE_VERSION was incremented in another patch of the
series (vm_event: Use slotted channels for sync requests). Is it
necessary to have it incremented twice? 

Many thanks,
Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-08 14:49     ` Petre Ovidiu PIRCALABU
@ 2019-01-08 15:08       ` Paul Durrant
  2019-01-08 16:13         ` Petre Ovidiu PIRCALABU
  0 siblings, 1 reply; 50+ messages in thread
From: Paul Durrant @ 2019-01-08 15:08 UTC (permalink / raw)
  To: 'Petre Ovidiu PIRCALABU', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
> Sent: 08 January 2019 14:50
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
> for sync requests.
> 
> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > >
> > > The memory for the asynchronous ring and the synchronous channels
> > > will
> > > be allocated from domheap and mapped to the controlling domain
> > > using the
> > > foreignmemory_map_resource interface. Unlike the current
> > > implementation,
> > > the allocated pages are not part of the target DomU, so they will
> > > not be
> > > reclaimed when the vm_event domain is disabled.
> >
> > Why re-invent the wheel here? The ioreq infrastructure already does
> > pretty much everything you need AFAICT.
> >
> >   Paul
> >
> 
> To my understanding, the current implementation of the ioreq server is
> limited to just 2 allocated pages (ioreq and bufioreq)

The current implementation is, but the direct resource mapping hypercall removed any limit from the API. It should be feasible to extend to as many pages as are needed, hence:

#define XENMEM_resource_ioreq_server_frame_ioreq(n) (1 + (n))

...in the public header.
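
(For reference, the mapping side is equally frame-count-agnostic; a rough
sketch using the real xenforeignmemory interface, but with the
XENMEM_resource_vm_event type and id layout that this series proposes:)

    #include <sys/mman.h>
    #include <xenctrl.h>            /* XEN_VM_EVENT_TYPE_* and
                                     * XENMEM_resource_vm_event come from
                                     * this series' public headers */
    #include <xenforeignmemory.h>

    /* Map nr_frames of the monitor vm_event resource into the tool. */
    static void *map_vm_event_frames(xenforeignmemory_handle *fmem,
                                     domid_t domid, unsigned int nr_frames,
                                     xenforeignmemory_resource_handle **fres)
    {
        void *addr = NULL;

        *fres = xenforeignmemory_map_resource(fmem, domid,
                                               XENMEM_resource_vm_event,
                                               XEN_VM_EVENT_TYPE_MONITOR /* id */,
                                               0 /* frame */, nr_frames,
                                               &addr, PROT_READ | PROT_WRITE, 0);
        return *fres ? addr : NULL;
    }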

> The main goal of the new vm_event implementation proposal is to be more
> flexible in respect of the number of pages necessary for the
> request/response buffers ( the slotted structure which holds one
> request/response per vcpu or the ring spanning multiple pages in the
> previous proposal).
> Is it feasible to extend the current ioreq server implementation
> allocate dynamically a specific number of pages?

Yes, absolutely. At the moment the single page for synchronous emulation requests limits HVM guests to 128 vcpus. When we want to go past this limit, multiple pages will be necessary... which is why the hypercall was designed the way it is.
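
(For context, that figure follows from the slot size: assuming the 32-byte
ioreq_t layout, a 4 KiB page holds 4096 / 32 = 128 synchronous slots, i.e.
one per vCPU for up to 128 vCPUs.)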

> 
> Also, for the current vm_event implementation, other than using the
> hvm_params to specify the ring page gfn, I couldn't see any reason why
> it should be limited to HVM guests only. Is it feasible to assume the
> vm_event mechanism will not ever be extended to PV guests?
> 

Unless you limit things to HVM (and PVH) guests then I guess you'll run into the same page ownership problems that ioreq server ran into (due to a PV guest being allowed to map any page assigned to it... including those that may be 'resources' it should not be able to see directly). Any particular reason why you'd definitely want to support pure PV guests?

  Paul

> Many thanks,
> Petre
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-08 15:08       ` Paul Durrant
@ 2019-01-08 16:13         ` Petre Ovidiu PIRCALABU
  2019-01-08 16:25           ` Paul Durrant
  0 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-01-08 16:13 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

On Tue, 2019-01-08 at 15:08 +0000, Paul Durrant wrote:
> > 
> > 
> > Also, for the current vm_event implementation, other than using the
> > hvm_params to specify the ring page gfn, I couldn't see any reason
> > why
> > it should be limited to HVM guests only. Is it feasible to assume
> > the
> > vm_event mechanism will not ever be extended to PV guests?
> > 
> 
> Unless you limit things to HVM (and PVH) guests then I guess you'll
> run into the same page ownership problems that ioreq server ran into
> (due to a PV guest being allowed to map any page assigned to it...
> including those that may be 'resources' it should not be able to see
> directly). Any particular reason why you'd definitely want to support
> pure PV guests?
> 
>   Paul

No, but at this point I just want to make sure I'm not limiting the
vm_events usage.

Many thanks,
Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-08 16:13         ` Petre Ovidiu PIRCALABU
@ 2019-01-08 16:25           ` Paul Durrant
  0 siblings, 0 replies; 50+ messages in thread
From: Paul Durrant @ 2019-01-08 16:25 UTC (permalink / raw)
  To: 'Petre Ovidiu PIRCALABU', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
> Sent: 08 January 2019 16:14
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
> for sync requests.
> 
> On Tue, 2019-01-08 at 15:08 +0000, Paul Durrant wrote:
> > >
> > >
> > > Also, for the current vm_event implementation, other than using the
> > > hvm_params to specify the ring page gfn, I couldn't see any reason
> > > why
> > > it should be limited to HVM guests only. Is it feasible to assume
> > > the
> > > vm_event mechanism will not ever be extended to PV guests?
> > >
> >
> > Unless you limit things to HVM (and PVH) guests then I guess you'll
> > run into the same page ownership problems that ioreq server ran into
> > (due to a PV guest being allowed to map any page assigned to it...
> > including those that may be 'resources' it should not be able to see
> > directly). Any particular reason why you'd definitely want to support
> > pure PV guests?
> >
> >   Paul
> 
> No, but at this point I just want to make sure I'm not limiting the
> vm_events usage.

Ok, but given that a framework (i.e. ioreq) exists for HVM/PVH guests then IMO it makes sense to target those guests first and then figure out how to make things work for PV later if need be.

  Paul

> 
> Many thanks,
> Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
  2018-12-19 18:52 ` [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type Petre Pircalabu
  2018-12-19 22:13   ` Tamas K Lengyel
  2019-01-02 11:11   ` Wei Liu
@ 2019-01-08 16:25   ` Jan Beulich
  2019-02-11 12:30     ` Petre Ovidiu PIRCALABU
  2 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2019-01-08 16:25 UTC (permalink / raw)
  To: Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, xen-devel

>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
> @@ -796,7 +787,7 @@ struct xen_domctl_gdbsx_domstatus {
>   * EXDEV  - guest has PoD enabled
>   * EBUSY  - guest has or had paging enabled, ring buffer still active
>   */
> -#define XEN_DOMCTL_VM_EVENT_OP_PAGING            1
> +#define XEN_VM_EVENT_TYPE_PAGING         1

Assuming the renaming is for the purpose of re-use elsewhere,
I think these should be moved to vm_event.h then.

> + * This hypercall allows one to control the vm_event rings (enable/disable),
> + * as well as to signal to the hypervisor to pull responses (resume) and
> + * retrieve the event channel from the given ring.
> + */
> +#define XEN_VM_EVENT_ENABLE               0
> +#define XEN_VM_EVENT_DISABLE              1
> +#define XEN_VM_EVENT_RESUME               2

Same perhaps for these, albeit here you just move them down a
few lines.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2018-12-19 18:52 ` [RFC PATCH 6/6] xc_version: add vm_event interface version Petre Pircalabu
@ 2019-01-08 16:27   ` Jan Beulich
  2019-01-08 16:37     ` Razvan Cojocaru
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2019-01-08 16:27 UTC (permalink / raw)
  To: Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>

An empty description is not helpful. The immediate question is: Why?
We don't do this for other interface versions. I'm unconvinced a
special purpose piece of information like this one belongs into the
rather generic version hypercall.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-01-08 16:27   ` Jan Beulich
@ 2019-01-08 16:37     ` Razvan Cojocaru
  2019-01-08 16:47       ` Jan Beulich
  2019-02-12 18:13       ` Tamas K Lengyel
  0 siblings, 2 replies; 50+ messages in thread
From: Razvan Cojocaru @ 2019-01-08 16:37 UTC (permalink / raw)
  To: Jan Beulich, Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

On 1/8/19 6:27 PM, Jan Beulich wrote:
>>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
>> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> 
> An empty description is not helpful. The immediate question is: Why?
> We don't do this for other interface versions. I'm unconvinced a
> special purpose piece of information like this one belongs into the
> rather generic version hypercall.

For an introspection application meant to be deployed on several Xen 
versions without recompiling, it is important to be able to decide at 
runtime what size and layout the vm_event struct has.

Currently this can somewhat be done by associating the running Xen version 
with a known vm_event version, but that is not ideal for obvious reasons. 
Reading the vm_event version from an actual vm_event is also out of the 
question, because in order to be able to receive the first vm_event we 
have to set the ring buffer up, and that requires knowledge of the size 
of the vm_event. So a run-time mechanism for querying the vm_event 
version is needed.

We just thought that this was the most flexible place to add it.
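
(As a sketch of how an introspection tool might consume this at startup;
XENVER_vm_event_version is a placeholder for whatever constant the patch
actually introduces, and the call convention is assumed, not verified:)

    #include <stdio.h>
    #include <xenctrl.h>
    #include <xen/vm_event.h>   /* VM_EVENT_INTERFACE_VERSION */

    static int check_vm_event_abi(xc_interface *xch)
    {
        /* Placeholder command name; error handling elided. */
        int ver = xc_version(xch, XENVER_vm_event_version, NULL);

        if ( ver != VM_EVENT_INTERFACE_VERSION )
        {
            fprintf(stderr, "vm_event ABI mismatch: Xen reports %d, tool built for %d\n",
                    ver, VM_EVENT_INTERFACE_VERSION);
            return -1;
        }
        return 0;
    }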


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-01-08 16:37     ` Razvan Cojocaru
@ 2019-01-08 16:47       ` Jan Beulich
  2019-01-09  9:11         ` Razvan Cojocaru
  2019-02-12 18:13       ` Tamas K Lengyel
  1 sibling, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2019-01-08 16:47 UTC (permalink / raw)
  To: Petre Pircalabu, Razvan Cojocaru
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 08.01.19 at 17:37, <rcojocaru@bitdefender.com> wrote:
> On 1/8/19 6:27 PM, Jan Beulich wrote:
>>>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
>>> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
>> 
>> An empty description is not helpful. The immediate question is: Why?
>> We don't do this for other interface versions. I'm unconvinced a
>> special purpose piece of information like this one belongs into the
>> rather generic version hypercall.
> 
> For an introspection application meant to be deployed on several Xen 
> versions without recompiling, it is important to be able to decide at 
> runtime what size and layout the vm_event struct has.
> 
> Currently this can somewhat be done by associating the current version 
> with the vm_event version, but that is not ideal for obvious reasons. 
> Reading the vm_event version from an actual vm_event is also out of the 
> question, because in order to be able to receive the first vm_event we 
> have to set the ring buffer up, and that requires knowledge of the size 
> of the vm_event. So a run-time mechanism for querying the vm_event 
> version is needed.
> 
> We just thought that this was the most flexible place to add it.

How about a new XEN_DOMCTL_VM_EVENT_GET_VERSION?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-01-08 16:47       ` Jan Beulich
@ 2019-01-09  9:11         ` Razvan Cojocaru
  2019-02-12 16:57           ` Petre Ovidiu PIRCALABU
  0 siblings, 1 reply; 50+ messages in thread
From: Razvan Cojocaru @ 2019-01-09  9:11 UTC (permalink / raw)
  To: Jan Beulich, Petre Pircalabu
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel



On 1/8/19 6:47 PM, Jan Beulich wrote:
>>>> On 08.01.19 at 17:37, <rcojocaru@bitdefender.com> wrote:
>> On 1/8/19 6:27 PM, Jan Beulich wrote:
>>>>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
>>>> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
>>>
>>> An empty description is not helpful. The immediate question is: Why?
>>> We don't do this for other interface versions. I'm unconvinced a
>>> special purpose piece of information like this one belongs into the
>>> rather generic version hypercall.
>>
>> For an introspection application meant to be deployed on several Xen
>> versions without recompiling, it is important to be able to decide at
>> runtime what size and layout the vm_event struct has.
>>
>> Currently this can somewhat be done by associating the current version
>> with the vm_event version, but that is not ideal for obvious reasons.
>> Reading the vm_event version from an actual vm_event is also out of the
>> question, because in order to be able to receive the first vm_event we
>> have to set the ring buffer up, and that requires knowledge of the size
>> of the vm_event. So a run-time mechanism for querying the vm_event
>> version is needed.
>>
>> We just thought that this was the most flexible place to add it.
> 
> How about a new XEN_DOMCTL_VM_EVENT_GET_VERSION?

That would work as well; we just thought this was the least intrusive 
and most extensible way to do it (other queries could be added similarly 
in the future, without needing a new DOMCTL or further libxc toolstack 
modifications).


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 14:28       ` Paul Durrant
  2018-12-20 15:03         ` Jan Beulich
  2018-12-24 10:37         ` Julien Grall
@ 2019-01-09 16:21         ` Razvan Cojocaru
  2019-01-10  9:58           ` Paul Durrant
  2 siblings, 1 reply; 50+ messages in thread
From: Razvan Cojocaru @ 2019-01-09 16:21 UTC (permalink / raw)
  To: Paul Durrant, 'Petre Ovidiu PIRCALABU',
	xen-devel, Andrew Cooper, Tamas K Lengyel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, Tim (Xen.org),
	George Dunlap, Julien Grall, Jan Beulich, Ian Jackson,
	Roger Pau Monne

On 12/20/18 4:28 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
>> Sent: 20 December 2018 14:26
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
>> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
>> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
>> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
>> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
>> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
>> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
>> for sync requests.
>>
>> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
>>>> The memory for the asynchronous ring and the synchronous channels
>>>> will
>>>> be allocated from domheap and mapped to the controlling domain
>>>> using the
>>>> foreignmemory_map_resource interface. Unlike the current
>>>> implementation,
>>>> the allocated pages are not part of the target DomU, so they will
>>>> not be
>>>> reclaimed when the vm_event domain is disabled.
>>>
>>> Why re-invent the wheel here? The ioreq infrastructure already does
>>> pretty much everything you need AFAICT.
>>>
>>>    Paul
>>
>> I wanted to preserve as much as possible from the existing vm_event DOMCTL
>> interface and add only the necessary code to allocate and map the
>> vm_event_pages.
> 
> That means we have two subsystems duplicating a lot of functionality though. It would be much better to use ioreq server if possible than provide a compatibility interface via DOMCTL.

Just to clarify the compatibility issue: there's a third element between 
Xen and the introspection application, the Linux kernel, which needs to 
be fairly recent for the whole ioreq machinery to work. The qemu code 
also seems to fall back to the old way of working when the kernel is too old.

This means that there's a choice to be made here: either we keep 
backwards compatibility with the old vm_event interface (in which case 
we can't drop the waitqueue code), or we switch to the new one and leave 
older setups in the dust (but there's less code duplication and we can 
get rid of the waitqueue code).

In any event, it's not very clear (to me, at least) how the envisioned 
ioreq replacement should work. I assume we're meant to use the whole 
infrastructure (as opposed to what we're now doing, which is to only use 
the map-hypervisor-memory part), i.e. both mapping and signaling. Could 
we discuss this in more detail? Are there any docs on this or ioreq 
minimal clients (like xen-access.c is for vm_event) we might use?


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-09 16:21         ` Razvan Cojocaru
@ 2019-01-10  9:58           ` Paul Durrant
  2019-01-10 15:28             ` Razvan Cojocaru
  0 siblings, 1 reply; 50+ messages in thread
From: Paul Durrant @ 2019-01-10  9:58 UTC (permalink / raw)
  To: 'Razvan Cojocaru', 'Petre Ovidiu PIRCALABU',
	xen-devel, Andrew Cooper, Tamas K Lengyel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, Tim (Xen.org),
	George Dunlap, Julien Grall, Jan Beulich, Ian Jackson,
	Roger Pau Monne

> -----Original Message-----

> >>>
> >>> Why re-invent the wheel here? The ioreq infrastructure already does
> >>> pretty much everything you need AFAICT.
> >>>
> >>>    Paul
> >>
> >> I wanted to preserve as much as possible from the existing vm_event DOMCTL
> >> interface and add only the necessary code to allocate and map the
> >> vm_event_pages.
> >
> > That means we have two subsystems duplicating a lot of functionality
> though. It would be much better to use ioreq server if possible than
> provide a compatibility interface via DOMCTL.
> 
> Just to clarify the compatibility issue: there's a third element between
> Xen and the introspection application, the Linux kernel which needs to
> be fairly recent for the whole ioreq machinery to work. The qemu code
> also seems to fallback to the old way of working if that's the case.
> 

That's correct. For IOREQ server there is a fall-back mechanism when privcmd doesn't support resource mapping.

> This means that there's a choice to be made here: either we keep
> backwards compatibility with the old vm_event interface (in which case
> we can't drop the waitqueue code), or we switch to the new one and leave
> older setups in the dust (but there's less code duplication and we can
> get rid of the waitqueue code).
> 

I don't know what your compatibility model is. QEMU needs to maintain compatibility across various different versions of Xen and Linux so there are many shims and much compat code. You may not need this.

> In any event, it's not very clear (to me, at least) how the envisioned
> ioreq replacement should work. I assume we're meant to use the whole
> infrastructure (as opposed to what we're now doing, which is to only use
> the map-hypervisor-memory part), i.e. both mapping and signaling. Could
> we discuss this in more detail? Are there any docs on this or ioreq
> minimal clients (like xen-access.c is for vm_event) we might use?
> 

I don't know how much of the infrastructure will be re-usable for you. Resource mapping itself is supposed to be generic, not specific to IOREQ server. Indeed it already supports grant table mapping too. So IMO you should at least expose your shared pages using this mechanism.

It would be nice if you could also re-use ioreqs (and bufioreqs) for sending your data, but they may well be a poor fit... still, you could probably cut'n'paste some of the init and teardown code to set up your shared pages.
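
For the mapping part, what I have in mind is roughly the below, seen
from the consumer's side (a sketch only: XENMEM_resource_vm_event is a
hypothetical resource type that your series would have to introduce;
the rest is the existing libxenforeignmemory API):

#include <sys/mman.h>
#include <xenforeignmemory.h>

/* Sketch: map the (to-be-introduced) vm_event buffers of a domain. */
static void *map_vm_event_pages(xenforeignmemory_handle *fmem,
                                domid_t domid, unsigned int nr_frames)
{
    void *addr = NULL;
    xenforeignmemory_resource_handle *res;

    res = xenforeignmemory_map_resource(
        fmem, domid,
        XENMEM_resource_vm_event,     /* hypothetical resource type */
        0,                            /* id, e.g. which vm_event ring */
        0, nr_frames,                 /* frame, nr_frames */
        &addr, PROT_READ | PROT_WRITE, 0);

    return res ? addr : NULL;
}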

  Paul

> 
> Thanks,
> Razvan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-10  9:58           ` Paul Durrant
@ 2019-01-10 15:28             ` Razvan Cojocaru
  0 siblings, 0 replies; 50+ messages in thread
From: Razvan Cojocaru @ 2019-01-10 15:28 UTC (permalink / raw)
  To: Paul Durrant, 'Petre Ovidiu PIRCALABU',
	xen-devel, Andrew Cooper, Tamas K Lengyel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, Tim (Xen.org),
	George Dunlap, Julien Grall, Jan Beulich, Ian Jackson,
	Roger Pau Monne

On 1/10/19 11:58 AM, Paul Durrant wrote:
>> -----Original Message-----
> 
>>>>>
>>>>> Why re-invent the wheel here? The ioreq infrastructure already does
>>>>> pretty much everything you need AFAICT.
>>>>>
>>>>>     Paul
>>>>
>>>> I wanted to preserve as much as possible from the existing vm_event DOMCTL
>>>> interface and add only the necessary code to allocate and map the
>>>> vm_event_pages.
>>>
>>> That means we have two subsystems duplicating a lot of functionality
>> though. It would be much better to use ioreq server if possible than
>> provide a compatibility interface via DOMCTL.
>>
>> Just to clarify the compatibility issue: there's a third element between
>> Xen and the introspection application, the Linux kernel which needs to
>> be fairly recent for the whole ioreq machinery to work. The qemu code
>> also seems to fallback to the old way of working if that's the case.
>>
> 
> Tht'a corrent. For IOREQ server there is a fall-back mechanism when privcmd doesn't support resource mapping.
> 
>> This means that there's a choice to be made here: either we keep
>> backwards compatibility with the old vm_event interface (in which case
>> we can't drop the waitqueue code), or we switch to the new one and leave
>> older setups in the dust (but there's less code duplication and we can
>> get rid of the waitqueue code).
>>
> 
> I don't know what your compatibility model is. QEMU needs to maintain compatibility across various different versions of Xen and Linux so there are many shims and much compat code. You may not need this.

Our current model is: deploy a special guest (that we call an SVA, short 
for security virtual appliance), with its own kernel and applications, 
that for all intents and purposes will act dom0-like.

In that scenario we control the guest kernel, so backwards 
compatibility for the case where the kernel does not support the proper 
ioctl is not a priority. That said, it might very well be an issue for 
someone, and we'd like to be well-behaved citizens and not inconvenience 
other vm_event consumers. Tamas, is this something you'd be concerned about?

What we do care about is being able to fall back in the case where the 
host hypervisor does not know anything about the new ioreq 
infrastructure. IOW, nobody can stop a client from running a Xen 
4.7-based XenServer on top of which our introspection guest will not be 
able to use the new ioreq code even if it's using the latest kernel. But 
that can be done at application level and would not require 
hypervisor-level backwards compatibility support (whereas in the first 
case - an old kernel - it would).

On top of all of this there's Andrew's concern of being able to get rid 
of the current vm_event waitqueue code that's making migration brittle.

So, if I understand the situation correctly, we need to negotiate the 
following:

1. Should we try to switch to the ioreq infrastructure for vm_event or 
use our custom one? If I'm remembering things correctly, Paul and Jan 
are for it, Andrew is somewhat against it, Tamas has not expressed a 
preference.

2. However we approach the new code, should we or should we not also 
provide a backwards compatibility layer in the hypervisor? We don't need 
it, but somebody might and it's probably not a good idea to design based 
entirely on the needs of one use-case. Tamas may have different needs 
here, and maybe other members of the xen-devel community as well. Andrew 
prefers that we don't since that removes the waitqueue code.

To reiterate how this got started: we want to move the ring buffer 
memory from the guest to the hypervisor (we've had cases of OSes 
reclaiming that page after the first introspection application exit), 
and we want to make that memory bigger (so that more events will fit 
into it, carrying more information (bigger events)). That's essentially 
all we're after.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2018-12-20 12:05   ` Paul Durrant
  2018-12-20 14:25     ` Petre Ovidiu PIRCALABU
  2019-01-08 14:49     ` Petre Ovidiu PIRCALABU
@ 2019-01-10 15:30     ` Petre Ovidiu PIRCALABU
  2019-01-10 15:46       ` Paul Durrant
  2 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-01-10 15:30 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > 
> > The memory for the asynchronous ring and the synchronous channels
> > will
> > be allocated from domheap and mapped to the controlling domain
> > using the
> > foreignmemory_map_resource interface. Unlike the current
> > implementation,
> > the allocated pages are not part of the target DomU, so they will
> > not be
> > reclaimed when the vm_event domain is disabled.
> 
> Why re-invent the wheel here? The ioreq infrastructure already does
> pretty much everything you need AFAICT.
> 
>   Paul
> 

Hi Paul,

I'm still struggling to understand how the vm_event subsystem could be
integrated with an IOREQ server.

An IOREQ server shares with the emulator 2 pages, one for ioreqs and
one for buffered_ioreqs. For vm_event we also need to share one or
more pages for the async ring and a few pages for the slotted
synchronous vm_events.
So, to my understanding, your idea to use the ioreq infrastructure for
vm_events is basically to replace the custom signalling (event channels
+ ring / custom states) with ioreqs. Since the
vm_event_request/response structures are larger than 8 bytes, the
"data_is_ptr" flag should be used in conjunction with the addresses
(indexes) from the shared vm_event buffers. 
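
Roughly, I imagine the Xen side filling in something like this (just a
sketch: IOREQ_TYPE_VM_EVENT does not exist today, and the slot index
refers to the shared vm_event buffer proposed by this series):

#include <string.h>
#include <xen/hvm/ioreq.h>
#include <xen/vm_event.h>

/* Sketch: describe one vm_event via an ioreq, using data_is_ptr. */
static void fill_vm_event_ioreq(ioreq_t *p, unsigned int slot_idx)
{
    memset(p, 0, sizeof(*p));
    p->type        = IOREQ_TYPE_VM_EVENT;   /* hypothetical new type */
    p->data_is_ptr = 1;
    p->data        = slot_idx;  /* index into the shared vm_event buffer */
    p->size        = sizeof(vm_event_request_t);
    p->state       = STATE_IOREQ_READY;
}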

Is this the mechanism you had in mind?

Many thanks,
Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-10 15:30     ` Petre Ovidiu PIRCALABU
@ 2019-01-10 15:46       ` Paul Durrant
  2019-04-02 14:47         ` Andrew Cooper
  0 siblings, 1 reply; 50+ messages in thread
From: Paul Durrant @ 2019-01-10 15:46 UTC (permalink / raw)
  To: 'Petre Ovidiu PIRCALABU', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
> Sent: 10 January 2019 15:31
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
> for sync requests.
> 
> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > >
> > > The memory for the asynchronous ring and the synchronous channels
> > > will
> > > be allocated from domheap and mapped to the controlling domain
> > > using the
> > > foreignmemory_map_resource interface. Unlike the current
> > > implementation,
> > > the allocated pages are not part of the target DomU, so they will
> > > not be
> > > reclaimed when the vm_event domain is disabled.
> >
> > Why re-invent the wheel here? The ioreq infrastructure already does
> > pretty much everything you need AFAICT.
> >
> >   Paul
> >
> 
> Hi Paul,
> 
> I'm still struggling to understand how the vm_event subsystem could be
> integrated with an IOREQ server.
> 
> An IOREQ server shares with the emulator 2 pages, one for ioreqs and
> one for buffered_ioreqs. For vm_event we need to share also one or more
> pages for the async ring and a few pages for the slotted synchronous
> vm_events.
> So, to my understanding, your idea to use the ioreq infrastructure for
> vm_events is basically to replace the custom signalling (event channels
> + ring / custom states) with ioreqs. Since the
> vm_event_request/response structures are larger than 8 bytes, the
> "data_is_ptr" flag should be used in conjunction with the addresses
> (indexes) from the shared vm_event buffers.
> 
> Is this the mechanism you had in mind?
> 

Yes, that's roughly what I hoped might be possible. If that is too cumbersome though then it should at least be feasible to mimic the ioreq code's page allocation functions and code up vm_event buffers as another type of mappable resource.

  Paul

> Many thanks,
> Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
  2019-01-08 15:01     ` Petre Ovidiu PIRCALABU
@ 2019-01-25 14:16       ` Wei Liu
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Liu @ 2019-01-25 14:16 UTC (permalink / raw)
  To: Petre Ovidiu PIRCALABU
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	xen-devel

On Tue, Jan 08, 2019 at 03:01:34PM +0000, Petre Ovidiu PIRCALABU wrote:
> On Wed, 2019-01-02 at 11:11 +0000, Wei Liu wrote:
> > On Wed, Dec 19, 2018 at 08:52:05PM +0200, Petre Pircalabu wrote:
> > > Define the type for each of the supported vm_event rings (paging,
> > > monitor and sharing) and replace the ring param field with this
> > > type.
> > > 
> > > Replace XEN_DOMCTL_VM_EVENT_OP_ occurrences with their
> > > corresponding
> > > XEN_VM_EVENT_TYPE_ counterpart.
> > > 
> > > Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> > > ---
> > >  tools/libxc/xc_mem_paging.c |  6 ++--
> > >  tools/libxc/xc_monitor.c    |  6 ++--
> > >  tools/libxc/xc_private.h    |  8 +++---
> > >  tools/libxc/xc_vm_event.c   | 68 ++++++++++++++++++++++-----------
> > > ------------
> > >  xen/common/vm_event.c       | 12 ++++----
> > >  xen/include/public/domctl.h | 45 ++++++++++++++++--------------
> > 
> > You also need to change XEN_DOMCTL_INTERFACE_VERSION in this patch.
> > 
> > Wei.
> 
> XEN_DOMCTL_INTERFACE_VERSION was incremented in another patch of the
> series (vm_event: Use slotted channels for sync requests). Is it
> necessary to have it incremented twice? 

No it isn't. Sorry.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2018-12-19 18:52 [PATCH RFC 0/6] Slotted channels for sync vm_events Petre Pircalabu
                   ` (6 preceding siblings ...)
  2018-12-19 22:33 ` [PATCH RFC 0/6] Slotted channels for sync vm_events Tamas K Lengyel
@ 2019-02-06 14:26 ` Petre Ovidiu PIRCALABU
  2019-02-07 11:46   ` George Dunlap
  7 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-02-06 14:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
> This patchset is a rework of the "multi-page ring buffer" for
> vm_events
> patch based on Andrew Cooper's comments.
> For synchronous vm_events the ring waitqueue logic was unnecessary as
> the
> vcpu sending the request was blocked until a response was received.
> To simplify the request/response mechanism, an array of slotted
> channels
> was created, one per vcpu. Each vcpu puts the request in the
> corresponding slot and blocks until the response is received.
> 
> I'm sending this patch as a RFC because, while I'm still working on
> way to
> measure the overall performance improvement, your feedback would be a
> great
> assistance.
> 

Is anyone still using asynchronous vm_event requests (i.e. the vcpu is
not blocked and no response is expected)?
If not, I suggest that the feature should be removed as it
(significantly) increases the complexity of the current implementation.

Many thanks,
Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2019-02-06 14:26 ` Petre Ovidiu PIRCALABU
@ 2019-02-07 11:46   ` George Dunlap
  2019-02-07 16:06     ` Petre Ovidiu PIRCALABU
  0 siblings, 1 reply; 50+ messages in thread
From: George Dunlap @ 2019-02-07 11:46 UTC (permalink / raw)
  To: Petre Ovidiu PIRCALABU, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
> On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
>> This patchset is a rework of the "multi-page ring buffer" for
>> vm_events
>> patch based on Andrew Cooper's comments.
>> For synchronous vm_events the ring waitqueue logic was unnecessary as
>> the
>> vcpu sending the request was blocked until a response was received.
>> To simplify the request/response mechanism, an array of slotted
>> channels
>> was created, one per vcpu. Each vcpu puts the request in the
>> corresponding slot and blocks until the response is received.
>>
>> I'm sending this patch as a RFC because, while I'm still working on
>> way to
>> measure the overall performance improvement, your feedback would be a
>> great
>> assistance.
>>
> 
> Is anyone still using asynchronous vm_event requests? (the vcpu is not
> blocked and no response is expected).
> If not, I suggest that the feature should be removed as it
> (significantly) increases the complexity of the current implementation.

Could you describe in a bit more detail what the situation is?  What's
the current state of affairs with vm_events, what you're trying to
change, why async vm_events is more difficult?

I certainly think it would be better if you could write the new vm_event
interface without having to spend a lot of effort supporting modes that
you think nobody uses.

On the other hand, getting into the habit of breaking stuff, even for
people we don't know about, will be a hindrance to community growth; a
commitment to keeping it working will be a benefit to growth.

But of course, we haven't declared the vm_event interface 'supported'
(it's not even mentioned in the SUPPORT.md document yet).

Just for the sake of discussion, would it be possible / reasonable, for
instance, to create a new interface, vm_events2, instead?  Then you
could write it to share the ioreq interface without having legacy
baggage you're not using; we could deprecate and eventually remove
vm_events1, and if anyone shouts, we can put it back.

Thoughts?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2019-02-07 11:46   ` George Dunlap
@ 2019-02-07 16:06     ` Petre Ovidiu PIRCALABU
  2019-02-12 17:01       ` Tamas K Lengyel
  0 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-02-07 16:06 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

On Thu, 2019-02-07 at 11:46 +0000, George Dunlap wrote:
> On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
> > On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
> > > This patchset is a rework of the "multi-page ring buffer" for
> > > vm_events
> > > patch based on Andrew Cooper's comments.
> > > For synchronous vm_events the ring waitqueue logic was
> > > unnecessary as
> > > the
> > > vcpu sending the request was blocked until a response was
> > > received.
> > > To simplify the request/response mechanism, an array of slotted
> > > channels
> > > was created, one per vcpu. Each vcpu puts the request in the
> > > corresponding slot and blocks until the response is received.
> > > 
> > > I'm sending this patch as a RFC because, while I'm still working
> > > on
> > > way to
> > > measure the overall performance improvement, your feedback would
> > > be a
> > > great
> > > assistance.
> > > 
> > 
> > Is anyone still using asynchronous vm_event requests? (the vcpu is
> > not
> > blocked and no response is expected).
> > If not, I suggest that the feature should be removed as it
> > (significantly) increases the complexity of the current
> > implementation.
> 
> Could you describe in a bit more detail what the situation
> is?  What's
> the current state fo affairs with vm_events, what you're trying to
> change, why async vm_events is more difficult?
> 
The main reason for the vm_events modification was to improve the
overall performance in high throughput introspection scenarios. For
domUs with a higher vcpu count, a vcpu could sleep for a certain period
of time while waiting for a ring slot to become available
(__vm_event_claim_slot).

The first patchset only increased the ring size, and the second
iteration, based on Andrew Cooper's comments, proposed a separate path
to handle synchronous events (a slotted buffer for each vcpu) in order
to have the events handled independently of one another. To handle
asynchronous events, a dynamically allocated vm_event ring is used.
While the implementation is not exactly an exercise in simplicity, it
preserves all the needed functionality and offers a fallback if the
Linux domain running the monitor application doesn't support
IOCTL_PRIVCMD_MMAP_RESOURCE.

However, the problem got a little bit more complicated when I tried
implementing the vm_events using an IOREQ server (based on Paul
Durrant's comments). For synchronous vm_events, it simplified things a
little, eliminating both the need for a special structure to hold the
processing state and the evtchns for each vcpu.

The asynchronous events were a little more tricky to handle. The
buffered ioreqs were a good candidate, but the only thing usable is the
corresponding evtchn in conjunction with an existing ring. In order to
use them, a mock buffered ioreq would have to be created and
transmitted, with the only meaningful field being the ioreq type.
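
To make the synchronous side a bit more concrete, the per-vcpu slot
layout I'm aiming for looks roughly like this (a sketch only; the
struct name and the state values are illustrative, while
vm_event_request_t/vm_event_response_t are the existing public
structures):

#include <stdint.h>
#include <xen/vm_event.h>

/* One slot per vcpu; all slots live in the same mappable buffer. */
struct vm_event_slot {
    uint32_t port;     /* per-vcpu event channel */
    uint32_t state;    /* e.g. IDLE / SUBMITTED / FINISHED */
    union {
        vm_event_request_t  req;   /* written by the vcpu */
        vm_event_response_t rsp;   /* written by the monitor */
    } u;
};

/*
 * vcpu side (in short): write the request into its own slot, signal
 * the per-vcpu event channel, then block until the response arrives.
 */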

> I certainly think it would be better if you could write the new
> vm_event
> interface without having to spend a lot of effort supporting modes
> that
> you think nobody uses.
> 
> On the other hand, getting into the habit of breaking stuff, even for
> people we don't know about, will be a hindrance to community growth;
> a
> commitment to keeping it working will be a benefit to growth.
> 
> But of course, we haven't declared the vm_event interface 'supported'
> (it's not even mentioned in the SUPPORT.md document yet).
> 
> Just for the sake of discussion, would it be possible / reasonble,
> for
> instance, to create a new interface, vm_events2, instead?  Then you
> could write it to share the ioreq interface without having legacy
> baggage you're not using; we could deprecate and eventually remove
> vm_events1, and if anyone shouts, we can put it back.
> 
> Thoughts?
> 
>  -George
Yes, it's possible and it will GREATLY simplify the implementation. I
just have to make sure the interfaces are mutually exclusive.

Many thanks for your support,
Petre


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type
  2019-01-08 16:25   ` Jan Beulich
@ 2019-02-11 12:30     ` Petre Ovidiu PIRCALABU
  0 siblings, 0 replies; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-02-11 12:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Tamas K Lengyel, xen-devel

On Tue, 2019-01-08 at 09:25 -0700, Jan Beulich wrote:
> > > > On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
> > 
> > @@ -796,7 +787,7 @@ struct xen_domctl_gdbsx_domstatus {
> >   * EXDEV  - guest has PoD enabled
> >   * EBUSY  - guest has or had paging enabled, ring buffer still
> > active
> >   */
> > -#define XEN_DOMCTL_VM_EVENT_OP_PAGING            1
> > +#define XEN_VM_EVENT_TYPE_PAGING         1
> 
> Assuming the renaming is for the purpose of re-use elsewhere,
> I think these should be moved to vm_event.h then.

I will move the definitions to vm_event.h but part of the corresponding
comments will have to remain in domctl.h (e.g. XEN_VM_EVENT_ENABLE
error codes for each vm_event type).
> 
> > + * This hypercall allows one to control the vm_event rings
> > (enable/disable),
> > + * as well as to signal to the hypervisor to pull responses
> > (resume) and
> > + * retrieve the event channel from the given ring.
> > + */
> > +#define XEN_VM_EVENT_ENABLE               0
> > +#define XEN_VM_EVENT_DISABLE              1
> > +#define XEN_VM_EVENT_RESUME               2
> 
> Same perhaps for these, albeit here you just move them down a
> few lines.
These are the actual XEN_DOMCTL_vm_event_op opcodes. They are used only
in conjunction with this domctl so I would prefer not to move them.
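
In other words, the intended split would look roughly like this (a
sketch only; the type values simply mirror the existing
XEN_DOMCTL_VM_EVENT_OP_* ones):

/* xen/include/public/vm_event.h - the vm_event ring/type identifiers: */
#define XEN_VM_EVENT_TYPE_PAGING         1
#define XEN_VM_EVENT_TYPE_MONITOR        2
#define XEN_VM_EVENT_TYPE_SHARING        3

/* xen/include/public/domctl.h - the XEN_DOMCTL_vm_event_op opcodes
 * (and the per-type error-code comments) stay here: */
#define XEN_VM_EVENT_ENABLE              0
#define XEN_VM_EVENT_DISABLE             1
#define XEN_VM_EVENT_RESUME              2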

Many thanks for your comments,
Petre


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-01-09  9:11         ` Razvan Cojocaru
@ 2019-02-12 16:57           ` Petre Ovidiu PIRCALABU
  2019-02-12 17:14             ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Petre Ovidiu PIRCALABU @ 2019-02-12 16:57 UTC (permalink / raw)
  To: Razvan Cojocaru, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

On Wed, 2019-01-09 at 11:11 +0200, Razvan Cojocaru wrote:
> 
> On 1/8/19 6:47 PM, Jan Beulich wrote:
> > > > > On 08.01.19 at 17:37, <rcojocaru@bitdefender.com> wrote:
> > > 
> > > On 1/8/19 6:27 PM, Jan Beulich wrote:
> > > > > > > On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
> > > > > 
> > > > > Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> > > > 
> > > > An empty description is not helpful. The immediate question is:
> > > > Why?
> > > > We don't do this for other interface versions. I'm unconvinced
> > > > a
> > > > special purpose piece of information like this one belongs into
> > > > the
> > > > rather generic version hypercall.
> > > 
> > > For an introspection application meant to be deployed on several
> > > Xen
> > > versions without recompiling, it is important to be able to
> > > decide at
> > > runtime what size and layout the vm_event struct has.
> > > 
> > > Currently this can somewhat be done by associating the current
> > > version
> > > with the vm_event version, but that is not ideal for obvious
> > > reasons.
> > > Reading the vm_event version from an actual vm_event is also out
> > > of the
> > > question, because in order to be able to receive the first
> > > vm_event we
> > > have to set the ring buffer up, and that requires knowledge of
> > > the size
> > > of the vm_event. So a run-time mechanism for querying the
> > > vm_event
> > > version is needed.
> > > 
> > > We just thought that this was the most flexible place to add it.
> > 
> > How about a new XEN_DOMCTL_VM_EVENT_GET_VERSION?
> 
> That would work as well, we just thought this was the least
> intrusive 
> and most extensible way to do it (other queries could be added
> similarly 
> in the future, without needing a new DOMCTL / libxc toolstack 
> modifications).
> 
Personally, I would prefer the xc_version approach because it has a
number of advantages over creating a specific domctl:

- First, it's a version. In my opinion, if an interface is so strongly
coupled with Xen that it cannot be disabled at configure time, it's
generic enough to be queried by the common version functions. An
example of getting specialized information from Xen is
XENVER_get_features, which is also handled using xc_version (a short
usage sketch follows below).

- This interface version is hypervisor specific. A client application
should be able to query this version at startup, even before the
monitor domain is available, and a domctl requires a domain id. The
DOM0 id or DOMID_INVALID can be used, but I find it rather confusing to
query something hypervisor specific and pass a domain related param.

- It's simple and it can be easily extended.
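
For illustration, the application-side usage I have in mind would be
along these lines (XENVER_vm_event_version stands in for whatever
constant this patch ends up adding; xc_version() itself is unchanged):

#include <xenctrl.h>

/* Query the vm_event interface version before setting up the ring,
 * so the size/layout of vm_event_request_t is known up front. */
static int get_vm_event_version(xc_interface *xch)
{
    return xc_version(xch, XENVER_vm_event_version, NULL);
}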

Many thanks,
Petre

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2019-02-07 16:06     ` Petre Ovidiu PIRCALABU
@ 2019-02-12 17:01       ` Tamas K Lengyel
  2019-02-19 11:48         ` Razvan Cojocaru
  0 siblings, 1 reply; 50+ messages in thread
From: Tamas K Lengyel @ 2019-02-12 17:01 UTC (permalink / raw)
  To: Petre Ovidiu PIRCALABU
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	George Dunlap, Tim Deegan, Julien Grall, Jan Beulich, xen-devel,
	Roger Pau Monné

On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU
<ppircalabu@bitdefender.com> wrote:
>
> On Thu, 2019-02-07 at 11:46 +0000, George Dunlap wrote:
> > On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
> > > On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
> > > > This patchset is a rework of the "multi-page ring buffer" for
> > > > vm_events
> > > > patch based on Andrew Cooper's comments.
> > > > For synchronous vm_events the ring waitqueue logic was
> > > > unnecessary as
> > > > the
> > > > vcpu sending the request was blocked until a response was
> > > > received.
> > > > To simplify the request/response mechanism, an array of slotted
> > > > channels
> > > > was created, one per vcpu. Each vcpu puts the request in the
> > > > corresponding slot and blocks until the response is received.
> > > >
> > > > I'm sending this patch as a RFC because, while I'm still working
> > > > on
> > > > way to
> > > > measure the overall performance improvement, your feedback would
> > > > be a
> > > > great
> > > > assistance.
> > > >
> > >
> > > Is anyone still using asynchronous vm_event requests? (the vcpu is
> > > not
> > > blocked and no response is expected).
> > > If not, I suggest that the feature should be removed as it
> > > (significantly) increases the complexity of the current
> > > implementation.
> >
> > Could you describe in a bit more detail what the situation
> > is?  What's
> > the current state fo affairs with vm_events, what you're trying to
> > change, why async vm_events is more difficult?
> >
> The main reason for the vm_events modification was to improve the
> overall performance in high throughput introspection scenarios. For
> domus with a higher vcpu count, a vcpu could sleep for a certain period
> of time while waiting for a ring slot to become available
> (__vm_event_claim_slot)
> The first patchset only increased the ring size, and the second
> iteraton, based on Andrew Copper's comments, proposed a separate path
> to handle synchronous events ( a slotted buffer for each vcpu) in order
> to have the events handled independently of one another. To handle
> asynchronous events, a dynamically allocated vm_event ring is used.
> While the implementation is not exactly an exercise in simplicity, it
> preserves all the needed functionality and offers fallback if the Linux
> domain running the monitor application doesn't support
> IOCTL_PRIVCMD_MMAP_RESOURCE.
> However, the problem got a little bit more complicated when I tried
> implementing the vm_events using an IOREQ server (based on Paul
> Durrant's comments). For synchronous vm_events, it simplified things a
> little, eliminating both the need for a special structure to hold the
> processing state and the evtchns for each vcpu.
> The asynchronous events were a little more tricky to handle. The
> buffered ioreqs were a good candidate, but the only thing usable is the
> corresponding evtchn in conjunction with an existing ring. In order to
> use them, a mock buffered ioreq should be created and transmitted, with
> the only meaningful field being the ioreq type.
>
> > I certainly think it would be better if you could write the new
> > vm_event
> > interface without having to spend a lot of effort supporting modes
> > that
> > you think nobody uses.
> >
> > On the other hand, getting into the habit of breaking stuff, even for
> > people we don't know about, will be a hindrance to community growth;
> > a
> > commitment to keeping it working will be a benefit to growth.
> >
> > But of course, we haven't declared the vm_event interface 'supported'
> > (it's not even mentioned in the SUPPORT.md document yet).
> >
> > Just for the sake of discussion, would it be possible / reasonble,
> > for
> > instance, to create a new interface, vm_events2, instead?  Then you
> > could write it to share the ioreq interface without having legacy
> > baggage you're not using; we could deprecate and eventually remove
> > vm_events1, and if anyone shouts, we can put it back.
> >
> > Thoughts?
> >
> >  -George
> Yes, it's possible and it will GREATLY simplify the implementation. I
> just have to make sure the interfaces are mutually exclusive.

I'm for removing features from the vm_event interface that are no
longer in use, especially if they block more advantageous changes like
this one. We don't know what the use-case was for async events, nor
have we seen anyone even mention them since I've been working with Xen.
Creating a new interface, as mentioned above, would make sense if
there was a disagreement with retiring this feature. I don't think
that's the case. I certainly would prefer not having to maintain two
separate interfaces going forward without a clear justification and
documented use-case explaining why we keep the old interface around.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-02-12 16:57           ` Petre Ovidiu PIRCALABU
@ 2019-02-12 17:14             ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2019-02-12 17:14 UTC (permalink / raw)
  To: Petre Pircalabu, Razvan Cojocaru
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 12.02.19 at 17:57, <ppircalabu@bitdefender.com> wrote:
> On Wed, 2019-01-09 at 11:11 +0200, Razvan Cojocaru wrote:
>> 
>> On 1/8/19 6:47 PM, Jan Beulich wrote:
>> > > > > On 08.01.19 at 17:37, <rcojocaru@bitdefender.com> wrote:
>> > > 
>> > > On 1/8/19 6:27 PM, Jan Beulich wrote:
>> > > > > > > On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
>> > > > > 
>> > > > > Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
>> > > > 
>> > > > An empty description is not helpful. The immediate question is:
>> > > > Why?
>> > > > We don't do this for other interface versions. I'm unconvinced
>> > > > a
>> > > > special purpose piece of information like this one belongs into
>> > > > the
>> > > > rather generic version hypercall.
>> > > 
>> > > For an introspection application meant to be deployed on several
>> > > Xen
>> > > versions without recompiling, it is important to be able to
>> > > decide at
>> > > runtime what size and layout the vm_event struct has.
>> > > 
>> > > Currently this can somewhat be done by associating the current
>> > > version
>> > > with the vm_event version, but that is not ideal for obvious
>> > > reasons.
>> > > Reading the vm_event version from an actual vm_event is also out
>> > > of the
>> > > question, because in order to be able to receive the first
>> > > vm_event we
>> > > have to set the ring buffer up, and that requires knowledge of
>> > > the size
>> > > of the vm_event. So a run-time mechanism for querying the
>> > > vm_event
>> > > version is needed.
>> > > 
>> > > We just thought that this was the most flexible place to add it.
>> > 
>> > How about a new XEN_DOMCTL_VM_EVENT_GET_VERSION?
>> 
>> That would work as well, we just thought this was the least
>> intrusive 
>> and most extensible way to do it (other queries could be added
>> similarly 
>> in the future, without needing a new DOMCTL / libxc toolstack 
>> modifications).
>> 
> Personally, would prefer the xc_version approach because it has a
> number of advantages over a creating specific domctl:
> 
> - First, it's a version. In my opinion, if an interface too strongly
> coupled with XEN that it cannot be disabled at configure-time, it's
> generic enough to be queried by the common version functions. An 
> example of getting specialized information from XEN is
> XENVER_get_features, which is also handled using xc_version.

Whether XENVER_get_features is a good fit there is questionable,
but it's been there for too long to be (re)moved.

> - This interface version is hypervisor specific. A client application
> should be able to query this version at startup, even before the
> monitor domain is available, and a domctl requires a domain id. The
> DOM0 id or DOMID_INVALID can be used, but I find it rather confusing to
> query something hypervisor specific and pass a domain related param.

Well, I did suggest a domctl because there already is
XEN_DOMCTL_vm_event_op, and it could be a sub-op there.
Of course you could also add a sysctl just for this, or a whole
new major hypercall ...

Note how XEN_DOMCTL_createdomain (obviously) also can't
possibly be handed a domain ID.

> - It's simple and it can be easily extended.

As are any other (simple) hypercalls.

As you can see I continue to be opposed to adding special
purpose subsystem information to a general hypercall (of
whatever sort). However, as so often, I'm not opposed
enough to actively nack such an addition, so if other REST
maintainers think this is an appropriate thing to do, so be it.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-01-08 16:37     ` Razvan Cojocaru
  2019-01-08 16:47       ` Jan Beulich
@ 2019-02-12 18:13       ` Tamas K Lengyel
  2019-02-12 18:19         ` Razvan Cojocaru
  1 sibling, 1 reply; 50+ messages in thread
From: Tamas K Lengyel @ 2019-02-12 18:13 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
>
> On 1/8/19 6:27 PM, Jan Beulich wrote:
> >>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
> >> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> >
> > An empty description is not helpful. The immediate question is: Why?
> > We don't do this for other interface versions. I'm unconvinced a
> > special purpose piece of information like this one belongs into the
> > rather generic version hypercall.
>
> For an introspection application meant to be deployed on several Xen
> versions without recompiling, it is important to be able to decide at
> runtime what size and layout the vm_event struct has.
>
> Currently this can somewhat be done by associating the current version
> with the vm_event version, but that is not ideal for obvious reasons.

We do exactly that in LibVMI and it works fine - care to elaborate
what problem you have with doing that? There is a 1:1 match between
any stable Xen version and a vm_event interface version. I don't think
we will ever have a situation where we bump the vm_event interface
version but not the Xen release version.
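
(For reference, by "associating" I mean nothing more than a small table
kept by the application, along these lines - illustrative only, not
LibVMI's actual code, with the version numbers left as placeholders:)

/* Map a Xen release to the vm_event interface version it shipped with. */
struct xen_to_vm_event_ver {
    int major, minor;
    unsigned int vm_event_version;
};

static const struct xen_to_vm_event_ver ver_map[] = {
    { 4, 11, 0 /* placeholder - fill in per stable release */ },
    { 4, 12, 0 /* placeholder */ },
};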

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-02-12 18:13       ` Tamas K Lengyel
@ 2019-02-12 18:19         ` Razvan Cojocaru
  2019-02-12 18:25           ` Tamas K Lengyel
  0 siblings, 1 reply; 50+ messages in thread
From: Razvan Cojocaru @ 2019-02-12 18:19 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On 2/12/19 8:13 PM, Tamas K Lengyel wrote:
> On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>>
>> On 1/8/19 6:27 PM, Jan Beulich wrote:
>>>>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
>>>> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
>>>
>>> An empty description is not helpful. The immediate question is: Why?
>>> We don't do this for other interface versions. I'm unconvinced a
>>> special purpose piece of information like this one belongs into the
>>> rather generic version hypercall.
>>
>> For an introspection application meant to be deployed on several Xen
>> versions without recompiling, it is important to be able to decide at
>> runtime what size and layout the vm_event struct has.
>>
>> Currently this can somewhat be done by associating the current version
>> with the vm_event version, but that is not ideal for obvious reasons.
> 
> We do exactly that in LibVMI and it works fine - care to elaborate
> what problem you have with doing that? There is a 1:1 match between
> any stable Xen version and a vm_event interface version. I don't think
> we will ever have a situation where we bump the vm_event interface
> version but not the Xen release version.

XenServer. Any given version of XenServer may or may not fit the matrix
you're talking about (because some patches are backported, some are not,
etc.), in which case it all becomes very complicated.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 6/6] xc_version: add vm_event interface version
  2019-02-12 18:19         ` Razvan Cojocaru
@ 2019-02-12 18:25           ` Tamas K Lengyel
  0 siblings, 0 replies; 50+ messages in thread
From: Tamas K Lengyel @ 2019-02-12 18:25 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Petre Pircalabu, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Feb 12, 2019 at 11:20 AM Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
>
> On 2/12/19 8:13 PM, Tamas K Lengyel wrote:
> > On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru
> > <rcojocaru@bitdefender.com> wrote:
> >>
> >> On 1/8/19 6:27 PM, Jan Beulich wrote:
> >>>>>> On 19.12.18 at 19:52, <ppircalabu@bitdefender.com> wrote:
> >>>> Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
> >>>
> >>> An empty description is not helpful. The immediate question is: Why?
> >>> We don't do this for other interface versions. I'm unconvinced a
> >>> special purpose piece of information like this one belongs into the
> >>> rather generic version hypercall.
> >>
> >> For an introspection application meant to be deployed on several Xen
> >> versions without recompiling, it is important to be able to decide at
> >> runtime what size and layout the vm_event struct has.
> >>
> >> Currently this can somewhat be done by associating the current version
> >> with the vm_event version, but that is not ideal for obvious reasons.
> >
> > We do exactly that in LibVMI and it works fine - care to elaborate
> > what problem you have with doing that? There is a 1:1 match between
> > any stable Xen version and a vm_event interface version. I don't think
> > we will ever have a situation where we bump the vm_event interface
> > version but not the Xen release version.
>
> XenServer. Any given version of XenServer may or may not fit the matrix
> you're talking about (because some patches are backported, some are not,
> etc.). In which case it all becomes very complicated.

Ah yes, if custom patches are applied on top I can see how that could
become a problem.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2019-02-12 17:01       ` Tamas K Lengyel
@ 2019-02-19 11:48         ` Razvan Cojocaru
  2019-03-04 16:01           ` George Dunlap
  0 siblings, 1 reply; 50+ messages in thread
From: Razvan Cojocaru @ 2019-02-19 11:48 UTC (permalink / raw)
  To: Tamas K Lengyel, Petre Ovidiu PIRCALABU
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Julien Grall, Jan Beulich, xen-devel,
	Roger Pau Monné

On 2/12/19 7:01 PM, Tamas K Lengyel wrote:
> On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU
> <ppircalabu@bitdefender.com> wrote:
>>
>> On Thu, 2019-02-07 at 11:46 +0000, George Dunlap wrote:
>>> On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
>>>> On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
>>>>> This patchset is a rework of the "multi-page ring buffer" for
>>>>> vm_events
>>>>> patch based on Andrew Cooper's comments.
>>>>> For synchronous vm_events the ring waitqueue logic was
>>>>> unnecessary as
>>>>> the
>>>>> vcpu sending the request was blocked until a response was
>>>>> received.
>>>>> To simplify the request/response mechanism, an array of slotted
>>>>> channels
>>>>> was created, one per vcpu. Each vcpu puts the request in the
>>>>> corresponding slot and blocks until the response is received.
>>>>>
>>>>> I'm sending this patch as a RFC because, while I'm still working
>>>>> on
>>>>> way to
>>>>> measure the overall performance improvement, your feedback would
>>>>> be a
>>>>> great
>>>>> assistance.
>>>>>
>>>>
>>>> Is anyone still using asynchronous vm_event requests? (the vcpu is
>>>> not
>>>> blocked and no response is expected).
>>>> If not, I suggest that the feature should be removed as it
>>>> (significantly) increases the complexity of the current
>>>> implementation.
>>>
>>> Could you describe in a bit more detail what the situation
>>> is?  What's
>>> the current state fo affairs with vm_events, what you're trying to
>>> change, why async vm_events is more difficult?
>>>
>> The main reason for the vm_events modification was to improve the
>> overall performance in high throughput introspection scenarios. For
>> domus with a higher vcpu count, a vcpu could sleep for a certain period
>> of time while waiting for a ring slot to become available
>> (__vm_event_claim_slot)
>> The first patchset only increased the ring size, and the second
>> iteraton, based on Andrew Copper's comments, proposed a separate path
>> to handle synchronous events ( a slotted buffer for each vcpu) in order
>> to have the events handled independently of one another. To handle
>> asynchronous events, a dynamically allocated vm_event ring is used.
>> While the implementation is not exactly an exercise in simplicity, it
>> preserves all the needed functionality and offers fallback if the Linux
>> domain running the monitor application doesn't support
>> IOCTL_PRIVCMD_MMAP_RESOURCE.
>> However, the problem got a little bit more complicated when I tried
>> implementing the vm_events using an IOREQ server (based on Paul
>> Durrant's comments). For synchronous vm_events, it simplified things a
>> little, eliminating both the need for a special structure to hold the
>> processing state and the evtchns for each vcpu.
>> The asynchronous events were a little more tricky to handle. The
>> buffered ioreqs were a good candidate, but the only thing usable is the
>> corresponding evtchn in conjunction with an existing ring. In order to
>> use them, a mock buffered ioreq should be created and transmitted, with
>> the only meaningful field being the ioreq type.
>>
>>> I certainly think it would be better if you could write the new
>>> vm_event
>>> interface without having to spend a lot of effort supporting modes
>>> that
>>> you think nobody uses.
>>>
>>> On the other hand, getting into the habit of breaking stuff, even for
>>> people we don't know about, will be a hindrance to community growth;
>>> a
>>> commitment to keeping it working will be a benefit to growth.
>>>
>>> But of course, we haven't declared the vm_event interface 'supported'
>>> (it's not even mentioned in the SUPPORT.md document yet).
>>>
>>> Just for the sake of discussion, would it be possible / reasonble,
>>> for
>>> instance, to create a new interface, vm_events2, instead?  Then you
>>> could write it to share the ioreq interface without having legacy
>>> baggage you're not using; we could deprecate and eventually remove
>>> vm_events1, and if anyone shouts, we can put it back.
>>>
>>> Thoughts?
>>>
>>>   -George
>> Yes, it's possible and it will GREATLY simplify the implementation. I
>> just have to make sure the interfaces are mutually exclusive.
> 
> I'm for removing features from the vm_event interface that are no
> longer in use, especially if they block more advantageous changes like
> this one. We don't know what the use-case was for async events nor
> have seen anyone even mention them since I've been working with Xen.
> Creating a new interface, as mentioned above, would make sense if
> there was a disagreement with retiring this feature. I don't think
> that's the case. I certainly would prefer not having to maintain two
> separate interfaces going forward without a clear justification and
> documented use-case explaining why we keep the old interface around.

AFAICT, the async model is broken conceptually as well, so it makes no 
sense. It would make sense, IMHO, if it were lossy (i.e. we just write 
into the ring buffer; if somebody manages to "catch" an event while it 
flies by then so be it, and if not it gets overwritten). If it were 
truly lossy, I'd see a use case for it, for gathering statistics maybe.

However, as the code is now, a VCPU is left unpaused only if there's 
still space in the ring buffer. If there's no more space in the ring 
buffer, the VCPU trying to put an event in still gets paused by the 
vm_event logic, which means that these events are only "half-async".
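
In code terms, the behaviour described above boils down to something
like this (a pseudocode paraphrase, not the actual Xen implementation;
ring_has_space()/copy_request_to_ring()/notify_monitor() are
illustrative stand-ins):

void put_async_event(struct vcpu *v, vm_event_request_t *req)
{
    if ( ring_has_space() )
    {
        copy_request_to_ring(req);
        notify_monitor();           /* the vcpu keeps running */
        return;
    }

    /* Ring full: even an "async" event ends up pausing the vcpu,
     * hence "half-async". */
    vm_event_vcpu_pause(v);
}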

FWIW, I'm with Tamas on this one: if nobody cares about async events / 
comes forward with valid use cases or applications using them, I see no 
reason why we should have this extra code to maintain, find bugs in, and 
trip over other components (migration, in Andrew's example).


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2019-02-19 11:48         ` Razvan Cojocaru
@ 2019-03-04 16:01           ` George Dunlap
  2019-03-04 16:20             ` Tamas K Lengyel
  0 siblings, 1 reply; 50+ messages in thread
From: George Dunlap @ 2019-03-04 16:01 UTC (permalink / raw)
  To: Razvan Cojocaru, Tamas K Lengyel, Petre Ovidiu PIRCALABU
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, xen-devel, Roger Pau Monné

On 2/19/19 11:48 AM, Razvan Cojocaru wrote:
> On 2/12/19 7:01 PM, Tamas K Lengyel wrote:
>> On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU
>> <ppircalabu@bitdefender.com> wrote:
>>>
>>> On Thu, 2019-02-07 at 11:46 +0000, George Dunlap wrote:
>>>> On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
>>>>> On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
>>>>>> This patchset is a rework of the "multi-page ring buffer" for
>>>>>> vm_events
>>>>>> patch based on Andrew Cooper's comments.
>>>>>> For synchronous vm_events the ring waitqueue logic was
>>>>>> unnecessary as
>>>>>> the
>>>>>> vcpu sending the request was blocked until a response was
>>>>>> received.
>>>>>> To simplify the request/response mechanism, an array of slotted
>>>>>> channels
>>>>>> was created, one per vcpu. Each vcpu puts the request in the
>>>>>> corresponding slot and blocks until the response is received.
>>>>>>
>>>>>> I'm sending this patch as a RFC because, while I'm still working
>>>>>> on
>>>>>> way to
>>>>>> measure the overall performance improvement, your feedback would
>>>>>> be a
>>>>>> great
>>>>>> assistance.
>>>>>>
>>>>>
>>>>> Is anyone still using asynchronous vm_event requests? (the vcpu is
>>>>> not
>>>>> blocked and no response is expected).
>>>>> If not, I suggest that the feature should be removed as it
>>>>> (significantly) increases the complexity of the current
>>>>> implementation.
>>>>
>>>> Could you describe in a bit more detail what the situation
>>>> is?  What's
>>>> the current state fo affairs with vm_events, what you're trying to
>>>> change, why async vm_events is more difficult?
>>>>
>>> The main reason for the vm_events modification was to improve the
>>> overall performance in high throughput introspection scenarios. For
>>> domus with a higher vcpu count, a vcpu could sleep for a certain period
>>> of time while waiting for a ring slot to become available
>>> (__vm_event_claim_slot).
>>> The first patchset only increased the ring size, and the second
>>> iteration, based on Andrew Cooper's comments, proposed a separate path
>>> to handle synchronous events (a slotted buffer for each vcpu) in order
>>> to have the events handled independently of one another. To handle
>>> asynchronous events, a dynamically allocated vm_event ring is used.
>>> While the implementation is not exactly an exercise in simplicity, it
>>> preserves all the needed functionality and offers fallback if the Linux
>>> domain running the monitor application doesn't support
>>> IOCTL_PRIVCMD_MMAP_RESOURCE.
>>> However, the problem got a little bit more complicated when I tried
>>> implementing the vm_events using an IOREQ server (based on Paul
>>> Durrant's comments). For synchronous vm_events, it simplified things a
>>> little, eliminating both the need for a special structure to hold the
>>> processing state and the evtchns for each vcpu.
>>> The asynchronous events were a little more tricky to handle. The
>>> buffered ioreqs were a good candidate, but the only thing usable is the
>>> corresponding evtchn in conjunction with an existing ring. In order to
>>> use them, a mock buffered ioreq should be created and transmitted, with
>>> the only meaningful field being the ioreq type.
>>>
>>>> I certainly think it would be better if you could write the new
>>>> vm_event
>>>> interface without having to spend a lot of effort supporting modes
>>>> that
>>>> you think nobody uses.
>>>>
>>>> On the other hand, getting into the habit of breaking stuff, even for
>>>> people we don't know about, will be a hindrance to community growth;
>>>> a
>>>> commitment to keeping it working will be a benefit to growth.
>>>>
>>>> But of course, we haven't declared the vm_event interface 'supported'
>>>> (it's not even mentioned in the SUPPORT.md document yet).
>>>>
>>>> Just for the sake of discussion, would it be possible / reasonable,
>>>> for
>>>> instance, to create a new interface, vm_events2, instead?  Then you
>>>> could write it to share the ioreq interface without having legacy
>>>> baggage you're not using; we could deprecate and eventually remove
>>>> vm_events1, and if anyone shouts, we can put it back.
>>>>
>>>> Thoughts?
>>>>
>>>>   -George
>>> Yes, it's possible and it will GREATLY simplify the implementation. I
>>> just have to make sure the interfaces are mutually exclusive.
>>
>> I'm for removing features from the vm_event interface that are no
>> longer in use, especially if they block more advantageous changes like
>> this one. We don't know what the use-case was for async events nor
>> have seen anyone even mention them since I've been working with Xen.
>> Creating a new interface, as mentioned above, would make sense if
>> there was a disagreement with retiring this feature. I don't think
>> that's the case. I certainly would prefer not having to maintain two
>> separate interfaces going forward without a clear justification and
>> documented use-case explaining why we keep the old interface around.
> 
> AFAICT, the async model is broken conceptually as well, so it makes no
> sense. It would make sense, IMHO, if it were lossy (i.e. we just write
> into the ring buffer; if somebody manages to "catch" an event while it
> flies by then so be it, if not it gets overwritten). If it were truly
> lossy, I'd see a use-case for it, for gathering statistics maybe.
> 
> However, as the code is now, the VCPUs are left unpaused only while
> there's still space in the ring buffer. If there's no more space in the
> ring buffer, the VCPU trying to put an event in still gets paused by the
> vm_event logic, which means that these events are only "half-async".
> 
> FWIW, I'm with Tamas on this one: if nobody cares about async events /
> comes forward with valid use cases or applications using them, I see no
> reason why we should have this extra code to maintain, find bugs in, and
> trip over other components (migration, in Andrew's example).

The whole idea would be that we *don't* maintain "v1", unless someone
shows up and says they want to use it.

To let you know where I'm coming from:  It's actually fairly uncommon
for people using something to participate in the community that
generates it.  For example, a few years ago I spent some time with a
GCoC student making libxl bindings for Golang.  In spite of the fact
that they're still at 'early prototype' stage, apparently there's a
community of people that have picked those up and are using them.

Such people often:
* Start using without getting involved with the community
* Don't read the mailing list to hear what's going to happen
* Won't complain if something breaks, but will just go find a different
platform.

If you have something that already works, switching to something else is
a huge effort; if what you used to have is broken and you have to switch
everything over anyway, then you're much more likely to try something else.

There are many reasons for the success of both Linux and Linux
containers, but one major one is the fairly fanatical commitment they
have to backwards compatibility and avoiding breaking userspace
applications.  By contrast, it seems to me that the xend -> xl
transition, while necessary, has been an inflection point that caused a
lot of people to consider moving away from Xen (since a lot of their old
tooling stopped working).

So; at the moment, we don't know if anyone's using the async
functionality or not.  We have three choices:

A. Try to implement both sync / async in the existing interface.
B. Re-write the current interface to be sync-only.
C. Create a new interface ('v2') from scratch, deleting 'v1' when 'v2'
is ready.

Option A *probably* keeps silent users.  But Petre spends a lot of time
designing around and debugging something that he doesn't use or care
about, and that he's not sure *anyone* uses.  Not great.

Option B is more efficient for Petre.  But if there are users, we
immediately lose any who don't complain.  If any users do complain, then
again we have the choice: Try to retrofit the async functionality, or
tell them we're sorry, they'll have to port their thing to v2 or go away.

In the case of option C, we can leave 'v1' there as long as we want.  If
we delete it, and people complain, it won't be terribly difficult to
reinstate 'v1' without affecting 'v2'.

I mean, a part of me completely agrees with you: get rid of cruft
nobody's using; if you want to depend on something someone else is
developing, at least show up and get involved in the community, or don't
complain when it goes away.

But another part of me is just worried about the long-term effects of this
kind of behavior.

Anyway, I won't insist on having a v2 if nobody else says anything; I
just wanted to make sure the "invisible" effects got proper consideration.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/6] Slotted channels for sync vm_events
  2019-03-04 16:01           ` George Dunlap
@ 2019-03-04 16:20             ` Tamas K Lengyel
  0 siblings, 0 replies; 50+ messages in thread
From: Tamas K Lengyel @ 2019-03-04 16:20 UTC (permalink / raw)
  To: George Dunlap
  Cc: Petre Ovidiu PIRCALABU, Stefano Stabellini, Wei Liu,
	Razvan Cojocaru, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Jan Beulich, xen-devel, Roger Pau Monné

On Mon, Mar 4, 2019 at 9:01 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 2/19/19 11:48 AM, Razvan Cojocaru wrote:
> > On 2/12/19 7:01 PM, Tamas K Lengyel wrote:
> >> On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU
> >> <ppircalabu@bitdefender.com> wrote:
> >>>
> >>> On Thu, 2019-02-07 at 11:46 +0000, George Dunlap wrote:
> >>>> On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
> >>>>> On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
> >>>>>> This patchset is a rework of the "multi-page ring buffer" for
> >>>>>> vm_events
> >>>>>> patch based on Andrew Cooper's comments.
> >>>>>> For synchronous vm_events the ring waitqueue logic was
> >>>>>> unnecessary as
> >>>>>> the
> >>>>>> vcpu sending the request was blocked until a response was
> >>>>>> received.
> >>>>>> To simplify the request/response mechanism, an array of slotted
> >>>>>> channels
> >>>>>> was created, one per vcpu. Each vcpu puts the request in the
> >>>>>> corresponding slot and blocks until the response is received.
> >>>>>>
> >>>>>> I'm sending this patch as a RFC because, while I'm still working
> >>>>>> on
> >>>>>> way to
> >>>>>> measure the overall performance improvement, your feedback would
> >>>>>> be a
> >>>>>> great
> >>>>>> assistance.
> >>>>>>
> >>>>>
> >>>>> Is anyone still using asynchronous vm_event requests? (the vcpu is
> >>>>> not
> >>>>> blocked and no response is expected).
> >>>>> If not, I suggest that the feature should be removed as it
> >>>>> (significantly) increases the complexity of the current
> >>>>> implementation.
> >>>>
> >>>> Could you describe in a bit more detail what the situation
> >>>> is?  What's
> >>>> the current state fo affairs with vm_events, what you're trying to
> >>>> change, why async vm_events is more difficult?
> >>>>
> >>> The main reason for the vm_events modification was to improve the
> >>> overall performance in high throughput introspection scenarios. For
> >>> domus with a higher vcpu count, a vcpu could sleep for a certain period
> >>> of time while waiting for a ring slot to become available
> >>> (__vm_event_claim_slot).
> >>> The first patchset only increased the ring size, and the second
> >>> iteration, based on Andrew Cooper's comments, proposed a separate path
> >>> to handle synchronous events (a slotted buffer for each vcpu) in order
> >>> to have the events handled independently of one another. To handle
> >>> asynchronous events, a dynamically allocated vm_event ring is used.
> >>> While the implementation is not exactly an exercise in simplicity, it
> >>> preserves all the needed functionality and offers fallback if the Linux
> >>> domain running the monitor application doesn't support
> >>> IOCTL_PRIVCMD_MMAP_RESOURCE.
> >>> However, the problem got a little bit more complicated when I tried
> >>> implementing the vm_events using an IOREQ server (based on Paul
> >>> Durrant's comments). For synchronous vm_events, it simplified things a
> >>> little, eliminating both the need for a special structure to hold the
> >>> processing state and the evtchns for each vcpu.
> >>> The asynchronous events were a little more tricky to handle. The
> >>> buffered ioreqs were a good candidate, but the only thing usable is the
> >>> corresponding evtchn in conjunction with an existing ring. In order to
> >>> use them, a mock buffered ioreq should be created and transmitted, with
> >>> the only meaningful field being the ioreq type.
> >>>
> >>>> I certainly think it would be better if you could write the new
> >>>> vm_event
> >>>> interface without having to spend a lot of effort supporting modes
> >>>> that
> >>>> you think nobody uses.
> >>>>
> >>>> On the other hand, getting into the habit of breaking stuff, even for
> >>>> people we don't know about, will be a hindrance to community growth;
> >>>> a
> >>>> commitment to keeping it working will be a benefit to growth.
> >>>>
> >>>> But of course, we haven't declared the vm_event interface 'supported'
> >>>> (it's not even mentioned in the SUPPORT.md document yet).
> >>>>
> >>>> Just for the sake of discussion, would it be possible / reasonable,
> >>>> for
> >>>> instance, to create a new interface, vm_events2, instead?  Then you
> >>>> could write it to share the ioreq interface without having legacy
> >>>> baggage you're not using; we could deprecate and eventually remove
> >>>> vm_events1, and if anyone shouts, we can put it back.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>   -George
> >>> Yes, it's possible and it will GREATLY simplify the implementation. I
> >>> just have to make sure the interfaces are mutually exclusive.
> >>
> >> I'm for removing features from the vm_event interface that are no
> >> longer in use, especially if they block more advantageous changes like
> >> this one. We don't know what the use-case was for async events nor
> >> have seen anyone even mention them since I've been working with Xen.
> >> Creating a new interface, as mentioned above, would make sense if
> >> there was a disagreement with retiring this feature. I don't think
> >> that's the case. I certainly would prefer not having to maintain two
> >> separate interfaces going forward without a clear justification and
> >> documented use-case explaining why we keep the old interface around.
> >
> > AFAICT, the async model is broken conceptually as well, so it makes no
> > sense. It would make sense, IMHO, if it were lossy (i.e. we just write
> > into the ring buffer; if somebody manages to "catch" an event while it
> > flies by then so be it, if not it gets overwritten). If it were truly
> > lossy, I'd see a use-case for it, for gathering statistics maybe.
> >
> > However, as the code is now, the VCPUs are left unpaused only while
> > there's still space in the ring buffer. If there's no more space in the
> > ring buffer, the VCPU trying to put an event in still gets paused by the
> > vm_event logic, which means that these events are only "half-async".
> >
> > FWIW, I'm with Tamas on this one: if nobody cares about async events /
> > comes forward with valid use cases or applications using them, I see no
> > reason why we should have this extra code to maintain, find bugs in, and
> > trip over other components (migration, in Andrew's example).
>
> The whole idea would be that we *don't* maintain "v1", unless someone
> shows up and says they want to use it.
>
> To let you know where I'm coming from:  It's actually fairly uncommon
> for people using something to participate in the community that
> generates it.  For example, a few years ago I spent some time with a
> GSoC student making libxl bindings for Golang.  In spite of the fact
> that they're still at 'early prototype' stage, apparently there's a
> community of people that have picked those up and are using them.
>
> Such people often:
> * Start using without getting involved with the community
> * Don't read the mailing list to hear what's going to happen
> * Won't complain if something breaks, but will just go find a different
> platform.
>
> If you have something that already works, switching to something else is
> a huge effort; if what you used to have is broken and you have to switch
> everything over anyway, then you're much more likely to try something else.
>
> There are many reasons for the success of both Linux and Linux
> containers, but one major one is the fairly fanatical commitment they
> have to backwards compatibility and avoiding breaking userspace
> applications.  By contrast, it seems to me that the xend -> xl
> transition, while necessary, has been an inflection point that caused a
> lot of people to consider moving away from Xen (since a lot of their old
> tooling stopped working).
>
> So; at the moment, we don't know if anyone's using the async
> functionality or not.  We have three choices:
>
> A. Try to implement both sync / async in the existing interface.
> B. Re-write the current interface to be sync-only.
> C. Create a new interface ('v2') from scratch, deleting 'v1' when 'v2'
> is ready.
>
> Option A *probably* keeps silent users.  But Petre spends a lot of time
> designing around and debugging something that he doesn't use or care
> about, and that he's not sure *anyone* uses.  Not great.
>
> Option B is more efficient for Petre.  But if there are users, we
> immediately lose any who don't complain.  If any users do complain, then
> again we have the choice: Try to retrofit the async functionality, or
> tell them we're sorry, they'll have to port their thing to v2 or go away.
>
> In the case of option C, we can leave 'v1' there as long as we want.  If
> we delete it, and people complain, it won't be terribly difficult to
> reinstate 'v1' without affecting 'v2'.
>
> I mean, a part of me completely agrees with you: get rid of cruft
> nobody's using; if you want to depend on something someone else is
> developing, at least show up and get involved in the community, or don't
> complain when it goes away.
>
> But another part of me is just worried about the long-term effects of this
> kind of behavior.
>
> Anyway, I won't insist on having a v2 if nobody else says anything; I
> just wanted to make sure the "invisible" effects got proper consideration.

I agree that we shouldn't break things just for convenience's sake, but
that only applies to stable interfaces. This is not a stable interface,
however, and never was - it's experimental at this point. So we should
treat it as such and not keep dead code around just to make it appear to
be a stable interface.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
  2019-01-10 15:46       ` Paul Durrant
@ 2019-04-02 14:47         ` Andrew Cooper
  0 siblings, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2019-04-02 14:47 UTC (permalink / raw)
  To: Paul Durrant, 'Petre Ovidiu PIRCALABU', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Konrad Rzeszutek Wilk, Tim (Xen.org),
	George Dunlap, Julien Grall, Tamas K Lengyel, Jan Beulich,
	Ian Jackson, Roger Pau Monne

On 10/01/2019 15:46, Paul Durrant wrote:
>> -----Original Message-----
>> From: Petre Ovidiu PIRCALABU [mailto:ppircalabu@bitdefender.com]
>> Sent: 10 January 2019 15:31
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Razvan Cojocaru <rcojocaru@bitdefender.com>; Konrad
>> Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
>> <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
>> Jackson <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Julien
>> Grall <julien.grall@arm.com>; Tamas K Lengyel <tamas@tklengyel.com>; Jan
>> Beulich <jbeulich@suse.com>; Roger Pau Monne <roger.pau@citrix.com>
>> Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
>> for sync requests.
>>
>> On Thu, 2018-12-20 at 12:05 +0000, Paul Durrant wrote:
>>>> -----Original Message-----
>>>>
>>>> The memory for the asynchronous ring and the synchronous channels
>>>> will
>>>> be allocated from domheap and mapped to the controlling domain
>>>> using the
>>>> foreignmemory_map_resource interface. Unlike the current
>>>> implementation,
>>>> the allocated pages are not part of the target DomU, so they will
>>>> not be
>>>> reclaimed when the vm_event domain is disabled.
>>> Why re-invent the wheel here? The ioreq infrastructure already does
>>> pretty much everything you need AFAICT.
>>>
>>>   Paul
>>>
>> Hi Paul,
>>
>> I'm still struggling to understand how the vm_event subsystem could be
>> integrated with an IOREQ server.
>>
>> An IOREQ server shares with the emulator 2 pages, one for ioreqs and
>> one for buffered_ioreqs. For vm_event we need to share also one or more
>> pages for the async ring and a few pages for the slotted synchronous
>> vm_events.
>> So, to my understanding, your idea to use the ioreq infrastructure for
>> vm_events is basically to replace the custom signalling (event channels
>> + ring / custom states) with ioreqs. Since the
>> vm_event_request/response structures are larger than 8 bytes, the
>> "data_is_ptr" flag should be used in conjunction with the addresses
>> (indexes) from the shared vm_event buffers.
>>
>> Is this the mechanism you had in mind?
>>
> Yes, that's roughly what I hoped might be possible. If that is too cumbersome though then it should at least be feasible to mimic the ioreq code's page allocation functions and code up vm_event buffers as another type of mappable resource.
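
For reference, the encoding Petre describes above would amount to
roughly the following (a hypothetical sketch: the struct only mirrors
the ioreq_t fields relevant here, and IOREQ_TYPE_VM_EVENT is an assumed
new type, not an existing Xen constant):

#include <stdint.h>
#include <string.h>

#define IOREQ_TYPE_VM_EVENT  0x20       /* assumed, not a real Xen value */

struct vm_event_ioreq {                 /* mirrors the relevant ioreq_t fields */
    uint64_t addr;                      /* index of the per-vCPU slot */
    uint64_t data;                      /* unused for this encoding */
    uint32_t size;                      /* size of the vm_event request */
    uint8_t  data_is_ptr;               /* payload lives in the shared buffer */
    uint8_t  type;
};

static void encode_vm_event(struct vm_event_ioreq *p,
                            unsigned int slot_idx, size_t req_size)
{
    memset(p, 0, sizeof(*p));
    p->type = IOREQ_TYPE_VM_EVENT;
    p->data_is_ptr = 1;                 /* request is not inline in the ioreq */
    p->addr = slot_idx;                 /* which slot holds the full request */
    p->size = req_size;
}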

So, I've finally realised what has been subtly nagging at me for a while
from the suggestion to use ioreqs.  vm_event and ioreq have completely
different operations and semantics as far as the code in Xen is concerned.

The semantics for ioreq servers are "given a specific MMIO/PIO/CFG
action, which one of $N emulators should handle it".

vm_event on the other hand behaves just like the VT-x/SVM vmexit
intercepts.  It is "tell me when the guest does $X".  There isn't a
sensible case for having multiple vm_event consumers for a domain.

There is no overlap in the format of data used, or the cases where an
event would be sent.  Therefore, I think trying to implement vm_event in
terms of the ioreq server infrastructure is a short-sighted move.

Beyond that, the only similarity is the slotted ring setup, which can be
entirely abstracted away behind resource mapping.  This actually comes
with a bonus in that vm_event will no longer strictly be tied to HVM
guests by virtue of its ring living in an HVMPARAM.
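
On the consumer side, mapping the slotted buffer as a resource could
look roughly like the sketch below.  xenforeignmemory_map_resource() is
the existing interface; XENMEM_resource_vm_event (and its value) is an
assumed resource type that this series would still have to introduce:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <xenforeignmemory.h>

#define XENMEM_resource_vm_event 3      /* assumed value, illustration only */

static void *map_vm_event_slots(xenforeignmemory_handle *fmem,
                                uint32_t domid, unsigned long nr_frames,
                                xenforeignmemory_resource_handle **res)
{
    void *addr = NULL;

    /* Map the per-vCPU slotted buffer from domheap, no HVMPARAM needed. */
    *res = xenforeignmemory_map_resource(fmem, domid,
                                         XENMEM_resource_vm_event,
                                         0 /* id */, 0 /* frame */,
                                         nr_frames, &addr,
                                         PROT_READ | PROT_WRITE, 0);
    if ( !*res )
    {
        perror("xenforeignmemory_map_resource");
        return NULL;
    }

    return addr;    /* unmap with xenforeignmemory_unmap_resource() later */
}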

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2019-04-02 15:01 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-19 18:52 [PATCH RFC 0/6] Slotted channels for sync vm_events Petre Pircalabu
2018-12-19 18:52 ` [RFC PATCH 1/6] tools/libxc: Consistent usage of xc_vm_event_* interface Petre Pircalabu
2018-12-19 18:52 ` [RFC PATCH 2/6] tools/libxc: Define VM_EVENT type Petre Pircalabu
2018-12-19 22:13   ` Tamas K Lengyel
2019-01-02 11:11   ` Wei Liu
2019-01-08 15:01     ` Petre Ovidiu PIRCALABU
2019-01-25 14:16       ` Wei Liu
2019-01-08 16:25   ` Jan Beulich
2019-02-11 12:30     ` Petre Ovidiu PIRCALABU
2018-12-19 18:52 ` [RFC PATCH 3/6] vm_event: Refactor vm_event_domain implementation Petre Pircalabu
2018-12-19 22:26   ` Tamas K Lengyel
2018-12-20 12:39     ` Petre Ovidiu PIRCALABU
2018-12-19 18:52 ` [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests Petre Pircalabu
2018-12-20 12:05   ` Paul Durrant
2018-12-20 14:25     ` Petre Ovidiu PIRCALABU
2018-12-20 14:28       ` Paul Durrant
2018-12-20 15:03         ` Jan Beulich
2018-12-24 10:37         ` Julien Grall
2019-01-09 16:21         ` Razvan Cojocaru
2019-01-10  9:58           ` Paul Durrant
2019-01-10 15:28             ` Razvan Cojocaru
2019-01-08 14:49     ` Petre Ovidiu PIRCALABU
2019-01-08 15:08       ` Paul Durrant
2019-01-08 16:13         ` Petre Ovidiu PIRCALABU
2019-01-08 16:25           ` Paul Durrant
2019-01-10 15:30     ` Petre Ovidiu PIRCALABU
2019-01-10 15:46       ` Paul Durrant
2019-04-02 14:47         ` Andrew Cooper
2018-12-19 18:52 ` [RFC PATCH 5/6] xen-access: add support for slotted channel vm_events Petre Pircalabu
2018-12-19 18:52 ` [RFC PATCH 6/6] xc_version: add vm_event interface version Petre Pircalabu
2019-01-08 16:27   ` Jan Beulich
2019-01-08 16:37     ` Razvan Cojocaru
2019-01-08 16:47       ` Jan Beulich
2019-01-09  9:11         ` Razvan Cojocaru
2019-02-12 16:57           ` Petre Ovidiu PIRCALABU
2019-02-12 17:14             ` Jan Beulich
2019-02-12 18:13       ` Tamas K Lengyel
2019-02-12 18:19         ` Razvan Cojocaru
2019-02-12 18:25           ` Tamas K Lengyel
2018-12-19 22:33 ` [PATCH RFC 0/6] Slotted channels for sync vm_events Tamas K Lengyel
2018-12-19 23:30   ` Andrew Cooper
2018-12-20 10:48   ` Petre Ovidiu PIRCALABU
2018-12-20 14:08     ` Tamas K Lengyel
2019-02-06 14:26 ` Petre Ovidiu PIRCALABU
2019-02-07 11:46   ` George Dunlap
2019-02-07 16:06     ` Petre Ovidiu PIRCALABU
2019-02-12 17:01       ` Tamas K Lengyel
2019-02-19 11:48         ` Razvan Cojocaru
2019-03-04 16:01           ` George Dunlap
2019-03-04 16:20             ` Tamas K Lengyel
