All of lore.kernel.org
* [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
@ 2019-03-05 13:14 Andrii Anisov
  2019-03-05 13:14 ` [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall Andrii Anisov
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Andrii Anisov @ 2019-03-05 13:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Julien Grall, Andrii Anisov, Roger Pau Monné

From: Andrii Anisov <andrii_anisov@epam.com>

Following the discussion [1], this series introduces and implements a runstate
registration interface which uses the guest's physical address instead of a
virtual one. The new hypercall employs the same data structures as its
predecessor, but expects the vcpu_runstate_info structure not to cross a page
boundary. The interface is implemented so that the vcpu_runstate_info structure
is mapped into the hypervisor during hypercall processing and is accessed
directly during updates. This runstate area mapping follows the vcpu_info
structure registration.

The series is tested on ARM64 and build-tested for x86. I'd appreciate it if
someone could check it on x86.
The Linux kernel patch is here [2], though it is for 4.14.

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
[2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
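
For illustration, the only new obligation this interface places on a guest is
that the registered area fit within a single page. A minimal sketch of how a
guest might pick a suitable physical address before invoking the new hypercall
(the helper name and the fixed 4 KiB page size are assumptions for the example,
not part of the series):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Hypothetical helper: given a candidate guest physical address, return
 * an address at which a 'size'-byte runstate area does not cross a page
 * boundary, rounding up to the next page boundary when it would. */
static uint64_t place_runstate_area(uint64_t gpa, size_t size)
{
    uint64_t offset = gpa & (PAGE_SIZE - 1);

    if ( offset > PAGE_SIZE - size )            /* would straddle a page */
        gpa = (gpa + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

    return gpa;
}
```

The resulting address is what the guest would put into the addr.p field of the
vcpu_register_runstate_memory_area argument.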

Andrii Anisov (2):
  xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
  xen: implement VCPUOP_register_runstate_phys_memory_area

 xen/arch/arm/domain.c        | 59 +++++++++++++++++++--------
 xen/arch/x86/domain.c        | 87 ++++++++++++++++++++++++++++------------
 xen/common/domain.c          | 95 +++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/domain.h |  2 +
 xen/include/public/vcpu.h    | 16 ++++++++
 xen/include/xen/domain.h     |  5 +++
 xen/include/xen/sched.h      |  7 ++++
 7 files changed, 227 insertions(+), 44 deletions(-)

-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
  2019-03-05 13:14 [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Andrii Anisov
@ 2019-03-05 13:14 ` Andrii Anisov
  2019-03-14  8:45   ` Jan Beulich
  2019-03-05 13:14 ` [PATCH 2/2 for-4.12] xen: implement VCPUOP_register_runstate_phys_memory_area Andrii Anisov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-05 13:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, xen-devel, Wei Liu,
	Roger Pau Monné

From: Andrii Anisov <andrii_anisov@epam.com>

The hypercall employs the same vcpu_register_runstate_memory_area
structure for the interface, but requires the registered area not to
cross a page boundary.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/domain.c     | 39 ++++++++++++++++++---------------
 xen/arch/x86/domain.c     | 55 ++++++++++++++++++++++++++---------------------
 xen/common/domain.c       |  7 ++++++
 xen/include/public/vcpu.h | 16 ++++++++++++++
 xen/include/xen/sched.h   |  7 ++++++
 5 files changed, 83 insertions(+), 41 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6dc633e..ec9bdbd 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -277,29 +277,33 @@ static void ctxt_switch_to(struct vcpu *n)
 /* Update per-VCPU guest runstate shared memory area (if registered). */
 static void update_runstate_area(struct vcpu *v)
 {
-    void __user *guest_handle = NULL;
-
     if ( guest_handle_is_null(runstate_guest(v)) )
         return;
 
-    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    if ( v->runstate_guest_type == RUNSTATE_VADDR )
     {
-        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
-        guest_handle--;
-        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
-        smp_wmb();
-    }
+        void __user *guest_handle = NULL;
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            guest_handle = &v->runstate_guest.p->state_entry_time + 1;
+            guest_handle--;
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1,
+                                1);
+            smp_wmb();
+        }
 
-    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
+        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
 
-    if ( guest_handle )
-    {
-        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
-        smp_wmb();
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+        if ( guest_handle )
+        {
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1,
+                                1);
+        }
     }
 }
 
@@ -998,6 +1002,7 @@ long do_arm_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) a
     {
         case VCPUOP_register_vcpu_info:
         case VCPUOP_register_runstate_memory_area:
+        case VCPUOP_register_runstate_phys_memory_area:
             return do_vcpu_op(cmd, vcpuid, arg);
         default:
             return -EINVAL;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b5febd6..2acffba 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1614,36 +1614,43 @@ bool update_runstate_area(struct vcpu *v)
 
     update_guest_memory_policy(v, &policy);
 
-    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    if ( v->runstate_guest_type == RUNSTATE_VADDR )
     {
-        guest_handle = has_32bit_shinfo(v->domain)
-            ? &v->runstate_guest.compat.p->state_entry_time + 1
-            : &v->runstate_guest.native.p->state_entry_time + 1;
-        guest_handle--;
-        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
-        smp_wmb();
-    }
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            guest_handle = has_32bit_shinfo(v->domain)
+                ? &v->runstate_guest.compat.p->state_entry_time + 1
+                : &v->runstate_guest.native.p->state_entry_time + 1;
+            guest_handle--;
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+            smp_wmb();
+        }
 
-    if ( has_32bit_shinfo(v->domain) )
-    {
-        struct compat_vcpu_runstate_info info;
+        if ( has_32bit_shinfo(v->domain) )
+        {
+            struct compat_vcpu_runstate_info info;
 
-        XLAT_vcpu_runstate_info(&info, &v->runstate);
-        __copy_to_guest(v->runstate_guest.compat, &info, 1);
-        rc = true;
+            XLAT_vcpu_runstate_info(&info, &v->runstate);
+            __copy_to_guest(v->runstate_guest.compat, &info, 1);
+            rc = true;
+        }
+        else
+            rc = __copy_to_guest(runstate_guest(v), &v->runstate, 1) !=
+                 sizeof(v->runstate);
+
+        if ( guest_handle )
+        {
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+        }
     }
     else
-        rc = __copy_to_guest(runstate_guest(v), &v->runstate, 1) !=
-             sizeof(v->runstate);
-
-    if ( guest_handle )
     {
-        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
-        smp_wmb();
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+        rc = true;
     }
 
     update_guest_memory_policy(v, &policy);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 32bca8d..2c83ede 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1521,6 +1521,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         rc = 0;
         runstate_guest(v) = area.addr.h;
+        v->runstate_guest_type = RUNSTATE_VADDR;
 
         if ( v == current )
         {
@@ -1535,6 +1536,12 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case VCPUOP_register_runstate_phys_memory_area:
+    {
+        rc = -ENOSYS;
+        break;
+    }
+
 #ifdef VCPU_TRAP_NMI
     case VCPUOP_send_nmi:
         if ( !guest_handle_is_null(arg) )
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 3623af9..0722892 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -235,6 +235,22 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Register a shared memory area from which the guest may obtain its own
+ * runstate information without needing to execute a hypercall.
+ * Notes:
+ *  1. The registered address must be a guest physical address.
+ *  2. The registered runstate area must not cross a page boundary.
+ *  3. Only one shared area may be registered per VCPU. The shared area is
+ *     updated by the hypervisor each time the VCPU is scheduled. Thus
+ *     runstate.state will always be RUNSTATE_running and
+ *     runstate.state_entry_time will indicate the system time at which the
+ *     VCPU was last scheduled to run.
+ * @extra_arg == pointer to vcpu_register_runstate_memory_area structure.
+ */
+#define VCPUOP_register_runstate_phys_memory_area 14
+
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index edee52d..9ea76ec 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -163,6 +163,13 @@ struct vcpu
     void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
+
+    enum {
+        RUNSTATE_NONE = 0,
+        RUNSTATE_PADDR = 1,
+        RUNSTATE_VADDR = 2,
+    } runstate_guest_type;
+
 #ifndef CONFIG_COMPAT
 # define runstate_guest(v) ((v)->runstate_guest)
     XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
-- 
2.7.4


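
For context, the runstate_update_flag handling that this patch preserves acts
like a one-bit seqlock: the hypervisor sets XEN_RUNSTATE_UPDATE in
state_entry_time, writes the runstate, then clears the bit, with write barriers
in between. A guest-side reader consistent with that protocol might look like
the following sketch (the struct layout is abbreviated and read_runstate is an
illustrative name, not a Xen API):

```c
#include <stdint.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Abbreviated runstate layout; the real vcpu_runstate_info also carries
 * the current state value and per-state times. */
struct runstate_info {
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Guest-side reader: retry while the hypervisor signals an update in
 * progress via the XEN_RUNSTATE_UPDATE bit, or while state_entry_time
 * changed under our feet during the copy. */
static struct runstate_info
read_runstate(const volatile struct runstate_info *shared)
{
    struct runstate_info snap;
    uint64_t before;

    do {
        before = shared->state_entry_time;
        snap = *(const struct runstate_info *)shared;
        /* A real guest needs read barriers around the copy. */
    } while ( (before & XEN_RUNSTATE_UPDATE) ||
              before != snap.state_entry_time );

    return snap;
}
```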

* [PATCH 2/2 for-4.12] xen: implement VCPUOP_register_runstate_phys_memory_area
  2019-03-05 13:14 [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Andrii Anisov
  2019-03-05 13:14 ` [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall Andrii Anisov
@ 2019-03-05 13:14 ` Andrii Anisov
  2019-03-14  9:05   ` Jan Beulich
  2019-03-05 13:20 ` [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Juergen Gross
  2019-03-05 13:56 ` Julien Grall
  3 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-05 13:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, xen-devel, Wei Liu,
	Roger Pau Monné

From: Andrii Anisov <andrii_anisov@epam.com>

VCPUOP_register_runstate_phys_memory_area is implemented via runstate
area mapping.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/domain.c        | 22 ++++++++++-
 xen/arch/x86/domain.c        | 34 ++++++++++++++--
 xen/common/domain.c          | 92 ++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/domain.h |  2 +
 xen/include/xen/domain.h     |  5 +++
 5 files changed, 148 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index ec9bdbd..afc2d48 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -275,7 +275,7 @@ static void ctxt_switch_to(struct vcpu *n)
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
-static void update_runstate_area(struct vcpu *v)
+void update_runstate_area(struct vcpu *v)
 {
     if ( guest_handle_is_null(runstate_guest(v)) )
         return;
@@ -305,6 +305,26 @@ static void update_runstate_area(struct vcpu *v)
                                 1);
         }
     }
+    else if ( v->runstate_guest_type == RUNSTATE_PADDR )
+    {
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            runstate_guest(v).p->state_entry_time |= XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+        }
+
+        memcpy(runstate_guest(v).p, &v->runstate, sizeof(v->runstate));
+
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            runstate_guest(v).p->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        }
+    }
+    else
+    { /* No actions required */ }
 }
 
 static void schedule_tail(struct vcpu *prev)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 2acffba..6598bbb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1605,7 +1605,7 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
 /* Update per-VCPU guest runstate shared memory area (if registered). */
 bool update_runstate_area(struct vcpu *v)
 {
-    bool rc;
+    bool rc = true;
     struct guest_memory_policy policy = { .nested_guest_mode = false };
     void __user *guest_handle = NULL;
 
@@ -1648,9 +1648,37 @@ bool update_runstate_area(struct vcpu *v)
                                 (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
         }
     }
-    else
+    else if ( v->runstate_guest_type == RUNSTATE_PADDR )
     {
-        rc = true;
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            if ( has_32bit_shinfo((v)->domain) )
+                v->runstate_guest.compat.p->state_entry_time |= XEN_RUNSTATE_UPDATE;
+            else
+                runstate_guest(v).p->state_entry_time |= XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+        }
+
+        if ( has_32bit_shinfo(v->domain) )
+        {
+            struct compat_vcpu_runstate_info info;
+
+            XLAT_vcpu_runstate_info(&info, &v->runstate);
+            memcpy(v->runstate_guest.compat.p, &info, sizeof(info));
+        }
+        else
+            memcpy(runstate_guest(v).p, &v->runstate, sizeof(v->runstate));
+
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            if ( has_32bit_shinfo((v)->domain) )
+                v->runstate_guest.compat.p->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            else
+                runstate_guest(v).p->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+        }
     }
 
     update_guest_memory_policy(v, &policy);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2c83ede..cb9c788 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -738,7 +738,14 @@ int domain_kill(struct domain *d)
         if ( cpupool_move_domain(d, cpupool0) )
             return -ERESTART;
         for_each_vcpu ( d, v )
+        {
+            if ( v->runstate_guest_type == RUNSTATE_VADDR )
+                set_xen_guest_handle(runstate_guest(v), NULL);
+            else
+                unmap_runstate_area(v);
+
             unmap_vcpu_info(v);
+        }
         d->is_dying = DOMDYING_dead;
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
@@ -1192,7 +1199,11 @@ int domain_soft_reset(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        set_xen_guest_handle(runstate_guest(v), NULL);
+        if ( v->runstate_guest_type == RUNSTATE_VADDR )
+            set_xen_guest_handle(runstate_guest(v), NULL);
+        else
+            unmap_runstate_area(v);
+
         unmap_vcpu_info(v);
     }
 
@@ -1333,6 +1344,65 @@ void unmap_vcpu_info(struct vcpu *v)
     put_page_and_type(mfn_to_page(mfn));
 }
 
+int map_runstate_area(struct vcpu *v,
+                      struct vcpu_register_runstate_memory_area *area)
+{
+    unsigned long offset = area->addr.p & ~PAGE_MASK;
+    gfn_t gfn = gaddr_to_gfn(area->addr.p);
+    struct domain *d = v->domain;
+    void *mapping;
+    struct page_info *page;
+    size_t size = sizeof(struct vcpu_runstate_info);
+
+    ASSERT(v->runstate_guest_type == RUNSTATE_PADDR);
+
+    if ( offset > (PAGE_SIZE - size) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    mapping = __map_domain_page_global(page);
+
+    if ( mapping == NULL )
+    {
+        put_page_and_type(page);
+        return -ENOMEM;
+    }
+
+    runstate_guest(v).p = mapping + offset;
+
+    return 0;
+}
+
+void unmap_runstate_area(struct vcpu *v)
+{
+    mfn_t mfn;
+
+    if ( v->runstate_guest_type != RUNSTATE_PADDR )
+        return;
+
+    if ( guest_handle_is_null(runstate_guest(v)) )
+        return;
+
+    mfn = _mfn(virt_to_mfn(runstate_guest(v).p));
+
+    unmap_domain_page_global((void *)
+                             ((unsigned long)runstate_guest(v).p &
+                              PAGE_MASK));
+
+    v->runstate_guest_type = RUNSTATE_NONE;
+    runstate_guest(v).p = NULL;
+    put_page_and_type(mfn_to_page(mfn));
+}
+
 int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct vcpu_guest_context *ctxt;
@@ -1532,13 +1602,29 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
             vcpu_runstate_get(v, &runstate);
             __copy_to_guest(runstate_guest(v), &runstate, 1);
         }
-
         break;
     }
 
     case VCPUOP_register_runstate_phys_memory_area:
     {
-        rc = -ENOSYS;
+        struct vcpu_register_runstate_memory_area area;
+
+        rc = -EFAULT;
+        if ( copy_from_guest(&area, arg, 1) )
+            break;
+
+        unmap_runstate_area(v);
+        v->runstate_guest_type = RUNSTATE_PADDR;
+        rc = map_runstate_area(v, &area);
+
+        if ( rc )
+        {
+            v->runstate_guest_type = RUNSTATE_NONE;
+            break;
+        }
+
+        update_runstate_area(v);
+
         break;
     }
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 312fec8..3fb6ea2 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -217,6 +217,8 @@ void vcpu_show_execution_state(struct vcpu *);
 void vcpu_show_registers(const struct vcpu *);
 void vcpu_switch_to_aarch64_mode(struct vcpu *);
 
+void update_runstate_area(struct vcpu *);
+
 /*
  * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
  * limited to 16, thus only the first 4 bits of AFF0 are legal. We will
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index d1bfc82..090a54d 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -118,4 +118,9 @@ struct vnuma_info {
 
 void vnuma_destroy(struct vnuma_info *vnuma);
 
+struct vcpu_register_runstate_memory_area;
+int map_runstate_area(struct vcpu *v,
+                      struct vcpu_register_runstate_memory_area *area);
+void unmap_runstate_area(struct vcpu *v);
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
2.7.4


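
A note on the pointer bookkeeping in map_runstate_area()/unmap_runstate_area():
the map side stores mapping + offset, and the unmap side recovers the
page-aligned mapping by masking with PAGE_MASK. That round trip is only valid
because the area may not cross a page boundary, which is exactly what the
-EINVAL check at the top of map_runstate_area() enforces. A toy model of just
that arithmetic (helper names are illustrative, not from the patch):

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Model of what map_runstate_area() stores: the page mapping plus the
 * in-page offset of the guest physical address. */
static uintptr_t stored_runstate_ptr(uintptr_t mapping, uint64_t gpa)
{
    return mapping + (gpa & ~PAGE_MASK);   /* keep the in-page offset */
}

/* Model of what unmap_runstate_area() recovers before unmapping: the
 * page-aligned mapping base. */
static uintptr_t mapping_base(uintptr_t stored_ptr)
{
    return stored_ptr & PAGE_MASK;         /* back to the page start */
}
```

This assumes the mapping returned by __map_domain_page_global() is page
aligned, which is why masking alone suffices.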

* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:14 [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Andrii Anisov
  2019-03-05 13:14 ` [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall Andrii Anisov
  2019-03-05 13:14 ` [PATCH 2/2 for-4.12] xen: implement VCPUOP_register_runstate_phys_memory_area Andrii Anisov
@ 2019-03-05 13:20 ` Juergen Gross
  2019-03-05 13:32   ` Andrii Anisov
  2019-03-05 13:56 ` Julien Grall
  3 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2019-03-05 13:20 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Julien Grall, Andrii Anisov, Roger Pau Monné

On 05/03/2019 14:14, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> Following the discussion [1], this series introduces and implements a runstate
> registration interface which uses the guest's physical address instead of a
> virtual one. The new hypercall employs the same data structures as its
> predecessor, but expects the vcpu_runstate_info structure not to cross a page
> boundary. The interface is implemented so that the vcpu_runstate_info structure
> is mapped into the hypervisor during hypercall processing and is accessed
> directly during updates. This runstate area mapping follows the vcpu_info
> structure registration.
> 
> The series is tested on ARM64 and build-tested for x86. I'd appreciate it if
> someone could check it on x86.
> The Linux kernel patch is here [2], though it is for 4.14.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
> [2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
> 
> Andrii Anisov (2):
>   xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
>   xen: implement VCPUOP_register_runstate_phys_memory_area
> 
>  xen/arch/arm/domain.c        | 59 +++++++++++++++++++--------
>  xen/arch/x86/domain.c        | 87 ++++++++++++++++++++++++++++------------
>  xen/common/domain.c          | 95 +++++++++++++++++++++++++++++++++++++++++++-
>  xen/include/asm-arm/domain.h |  2 +
>  xen/include/public/vcpu.h    | 16 ++++++++
>  xen/include/xen/domain.h     |  5 +++
>  xen/include/xen/sched.h      |  7 ++++
>  7 files changed, 227 insertions(+), 44 deletions(-)
> 

No new features for 4.12. This series will have to wait until 4.13.


Juergen


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:20 ` [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Juergen Gross
@ 2019-03-05 13:32   ` Andrii Anisov
  2019-03-05 13:39     ` Julien Grall
  2019-03-05 13:44     ` Juergen Gross
  0 siblings, 2 replies; 31+ messages in thread
From: Andrii Anisov @ 2019-03-05 13:32 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Julien Grall, Andrii Anisov, Roger Pau Monné

Hello Juergen,

On 05.03.19 15:20, Juergen Gross wrote:
> No new features for 4.12. This series will have to wait until 4.13.

This is rather a complex fix for [1].

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02379.html

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:32   ` Andrii Anisov
@ 2019-03-05 13:39     ` Julien Grall
  2019-03-05 14:11       ` Andrii Anisov
  2019-03-05 13:44     ` Juergen Gross
  1 sibling, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-05 13:39 UTC (permalink / raw)
  To: Andrii Anisov, Juergen Gross, xen-devel
  Cc: Andrii Anisov, Roger Pau Monné

On 3/5/19 1:32 PM, Andrii Anisov wrote:
> Hello Juergen,

Hi,

> On 05.03.19 15:20, Juergen Gross wrote:
>> No new features for 4.12. This series will have to wait until 4.13.
> 
> This is rather a complex fix for [1].

I will back Juergen here. As I said in that thread, I don't consider it 
a critical issue because of the type of guests we support. This is 
just an annoyance in debug builds because of the number of messages printed.

Cheers,

> 
> [1] 
> https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02379.html
> 

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:32   ` Andrii Anisov
  2019-03-05 13:39     ` Julien Grall
@ 2019-03-05 13:44     ` Juergen Gross
  2019-03-05 13:50       ` Andrii Anisov
  1 sibling, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2019-03-05 13:44 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Julien Grall, Andrii Anisov, Roger Pau Monné

On 05/03/2019 14:32, Andrii Anisov wrote:
> Hello Juergen,
> 
> On 05.03.19 15:20, Juergen Gross wrote:
>> No new features for 4.12. This series will have to wait until 4.13.
> 
> This is rather a complex fix for [1].
> 
> [1]
> https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02379.html
> 

With the problem existing for about a year now, I still don't see the
urgency to rush it into 4.12 "at the last minute".

And as you are saying: this is a rather complex fix. I'd like to have
only one further RC before branching off 4.12, so I'd like to see this
series to go in later.

It might still be considered for 4.12.1 (you'd have to negotiate that
with the stable maintainer, of course).


Juergen


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:44     ` Juergen Gross
@ 2019-03-05 13:50       ` Andrii Anisov
  0 siblings, 0 replies; 31+ messages in thread
From: Andrii Anisov @ 2019-03-05 13:50 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Julien Grall, Andrii Anisov, Roger Pau Monné



On 05.03.19 15:44, Juergen Gross wrote:
> On 05/03/2019 14:32, Andrii Anisov wrote:
>> Hello Juergen,
>>
>> On 05.03.19 15:20, Juergen Gross wrote:
>>> No new features for 4.12. This series will have to wait until 4.13.
>>
>> This is rather a complex fix for [1].
>>
>> [1]
>> https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02379.html
>>
> 
> With the problem existing for about 1 year now I still don't see the
> urgency to rush it into 4.12 "at the last minute".
I hope that problem will be listed in the release notes as a known issue.

> 
> And as you are saying: this is a rather complex fix. I'd like to have
> only one further RC before branching off 4.12, so I'd like to see this
> series to go in later.
> 
> It might still be considered for 4.12.1 (you'd have to negotiate that
> with the stable maintainer, of course).

Should I resend the series without the for-4.12 mark?

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:14 [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Andrii Anisov
                   ` (2 preceding siblings ...)
  2019-03-05 13:20 ` [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Juergen Gross
@ 2019-03-05 13:56 ` Julien Grall
  2019-03-07 13:01   ` Andrii Anisov
  3 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-05 13:56 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Juergen Gross, Andrii Anisov, Roger Pau Monné

Hi,

On 3/5/19 1:14 PM, Andrii Anisov wrote:
> The interface is implemented so that the vcpu_runstate_info structure is
> mapped into the hypervisor during hypercall processing and is accessed
> directly during updates.

I had some concerns about this solution in [1] that have not been 
addressed, nor even mentioned, in the series.

This is not the first time this has happened, so I am going to ignore 
this series until you finally address the problems.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:39     ` Julien Grall
@ 2019-03-05 14:11       ` Andrii Anisov
  2019-03-05 14:30         ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-05 14:11 UTC (permalink / raw)
  To: Julien Grall, Juergen Gross, xen-devel
  Cc: Andrii Anisov, Roger Pau Monné

Hello Julien,

On 05.03.19 15:39, Julien Grall wrote:
> This is just an annoyance in debug build because of the number of message printed.
It is not an annoyance, but inaccurate runstate info being passed (actually not passed) to KPTI-enabled guests.
A debug build only makes the problem visible.


-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 14:11       ` Andrii Anisov
@ 2019-03-05 14:30         ` Julien Grall
  2019-03-07 13:07           ` Andrii Anisov
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-05 14:30 UTC (permalink / raw)
  To: Andrii Anisov, Juergen Gross, xen-devel
  Cc: Andrii Anisov, Roger Pau Monné

Hi,

On 3/5/19 2:11 PM, Andrii Anisov wrote:
> Hello Julien,
> 
> On 05.03.19 15:39, Julien Grall wrote:
>> This is just an annoyance in debug build because of the number of 
>> message printed.
> It is not an annoyance, but inaccurate runstate info passed (actually 
> not passed) to KPTI enabled guests.
The runstate is actually updated, just not for the guest. That will happen 
at the next context switch. But I am not convinced you will actually see 
a major difference in the numbers here. Do you have figures?

Lastly, as you can see, I do not have the same opinion as you about these 
issues. It might have helped if you had provided a descriptive cover letter 
explaining the issues rather than assuming we are all on the same page.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 13:56 ` Julien Grall
@ 2019-03-07 13:01   ` Andrii Anisov
  2019-03-07 14:02     ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-07 13:01 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Juergen Gross, Andrii Anisov, Roger Pau Monné

Dear Julien,

On 05.03.19 15:56, Julien Grall wrote:
> I had some concern about this solution in [1] that have not been addressed nor even mentioned in the series.

You missed the link.
So I assume you are referring to your preference not to have the runstate area mapped because of consuming vmap space on arm64. Also, along that thread you mentioned that the guest might change the gva mapping, which is irrelevant to registration with a physical address.

My reasons to have that runstate mapped are the following:
  - Introducing a new interface, we are not burdened with legacy, so we are free to impose requirements. In this case - that the runstate area must not cross a page boundary.
  - The global mapping used here does not consume vmap space on arm64. It seems to me the x86 guys are ok with mapping as well; at least Roger suggested it from the beginning. So it should be ok for them too.
  - In case the domain registers the runstate area by physical address, it cannot change the mapping.
  - I suppose that, normally, the runstate area would be registered once per vcpu. While it is updated on each context switch, I estimate it more optimal to have it mapped once on registration. This saves two mappings per context switch in the worst case (one for the prev vcpu, one for the next).
  - IMHO, this implementation is simpler and cleaner than what I have for runstate mapping on access.

So, taking into account all of the above, I really do not see how a runstate area registered with a physical address differs from vcpu_info and does not deserve to be permanently mapped.
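
To illustrate the page-boundary requirement mentioned above, the check amounts to something like this (a sketch in plain C with made-up names, not the actual patch code):

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Return non-zero if [gaddr, gaddr + size) spans a page boundary,
 * in which case the registration would be rejected. */
static int runstate_crosses_page(uint64_t gaddr, uint64_t size)
{
    return (gaddr & PAGE_MASK) != ((gaddr + size - 1) & PAGE_MASK);
}
```

Because the structure fits in one page, the hypervisor can then keep a single mapping for it.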

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-05 14:30         ` Julien Grall
@ 2019-03-07 13:07           ` Andrii Anisov
  0 siblings, 0 replies; 31+ messages in thread
From: Andrii Anisov @ 2019-03-07 13:07 UTC (permalink / raw)
  To: Julien Grall, Juergen Gross, xen-devel
  Cc: Andrii Anisov, Roger Pau Monné



On 05.03.19 16:30, Julien Grall wrote:
> On 3/5/19 2:11 PM, Andrii Anisov wrote:
>> It is not an annoyance, but inaccurate runstate info passed (actually not passed) to KPTI enabled guests.
> The runstate is actually updated just not for the guest.
That is what I said.

> It will be done at the next context switch.
It *may* be done at one of the next context switches.

> But I am not convinced you will actually see a major differences in number here. Do you have figures?
No, I have no figures to measure the inaccuracy. Yet its existence is obvious.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 13:01   ` Andrii Anisov
@ 2019-03-07 14:02     ` Julien Grall
  2019-03-07 14:34       ` Andrii Anisov
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-07 14:02 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Juergen Gross, Andrii Anisov, Roger Pau Monné

On 07/03/2019 13:01, Andrii Anisov wrote:
> Dear Julien,

Hello,

> 
> On 05.03.19 15:56, Julien Grall wrote:
>> I had some concern about this solution in [1] that have not been addressed nor 
>> even mentioned in the series.
> 
> You missed the link.

I was referring to your [1].

> So I assume you say about you preferences to not have runstate area mapped 
> because of consuming vmap space for arm64. Also, along that thread you mentioned 
> you that guest might change gva mapping, what is irrelevant to registration with 
> physical address.
> 
> My reasons to have that runstate mapped are following:
>   - Introducing the new interface we are not burden with legacy, so in charge to 
> impose requirements. In this case - to have runstate area not crossing a page 
> boundary
>   - Global mapping used here does not consume vmap space on arm64. It seems to 
> me x86 guys are ok with mapping as well, at least Roger suggested it from the 
> beginning. So it should be ok for them as well.
You left arm32 out of your equations here...

>   - In case domain is mapping runstate with physical address, it can not change 
> the mapping.

This is not entirely correct. The domain cannot change the mapping under our 
feet, but it can still change it via the hypercall. There is nothing preventing 
that with the current hypercall or the one you propose.

AFAICT, the vCPU operation can happen from a different vCPU. So now you need to 
add some kind of synchronization to avoid the page disappearing under your feet 
if the mapping is replaced from another vCPU. The risk is writing to a page that 
has been allocated for another purpose and corrupting it.

I am not entirely sure we want to deny the call being done twice, as this may have 
an impact on the way the OS is written. So the only way would be to introduce 
either reference counting or a lock (if the former is not possible).

This is not an issue with the current code because we only store the guest 
virtual address. The worst thing that can happen is we are writing to the wrong 
guest virtual address, but at that point this is a guest problem. Although we 
may want to use ACCESS_ONCE() in both load and store to ensure atomicity.
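
For reference, ACCESS_ONCE() is conventionally defined with a volatile cast, roughly as follows (a sketch; the exact definition in the tree may differ):

```c
/* Force the compiler to perform exactly one load or store of x,
 * so a runstate field cannot be torn, merged or re-read mid-update. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
```

Using it on both sides (hypervisor store, guest load) is what gives the single-word atomicity mentioned above.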

>   - I suppose, normally, runstate area would be registered once per vcpu. While 
> it is updated each context switch, I do estimate it more optimal to have it 
> mapped once on registration. This reduces two mappings per context switch in the 
> worst case (one for prev vcpu, one for the next).

Well, the numbers you showed in the other thread didn't show any improvement at 
all... So please explain why we should call map_domain_page_global() here and 
use more vmap on arm32.

>   - IMHO, this implementation is simpler and cleaner than what I have for 
> runstate mapping on access.

Did you implement it using access_guest_memory_by_ipa?

But I don't think the implementation you suggest will be that much simpler once 
you deal with the problem above.

> 
> So, taking in account all above, I really do not see, how runstate area 
> registered with physical address differs from vcpu_info and does not deserve to 
> be permanently mapped.

As I said to the other thread, we should switch all the hypercalls to use 
physical address. However, I am not convinced that keep the runstate mapped in 
Xen is a good idea.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 14:02     ` Julien Grall
@ 2019-03-07 14:34       ` Andrii Anisov
  2019-03-07 15:17         ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-07 14:34 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Juergen Gross, Andrii Anisov, Roger Pau Monné



On 07.03.19 16:02, Julien Grall wrote:
>> So I assume you say about you preferences to not have runstate area mapped because of consuming vmap space for arm64. Also, along that thread you mentioned you that guest might change gva mapping, what is irrelevant to registration with physical address.
>>
>> My reasons to have that runstate mapped are following:
>>   - Introducing the new interface we are not burden with legacy, so in charge to impose requirements. In this case - to have runstate area not crossing a page boundary
>>   - Global mapping used here does not consume vmap space on arm64. It seems to me x86 guys are ok with mapping as well, at least Roger suggested it from the beginning. So it should be ok for them as well.
> You left arm32 out of your equations here...
Yes, I left arm32 aside.

>>   - In case domain is mapping runstate with physical address, it can not change the mapping.
> 
> This is not entirely correct. The domain can not change the mapping under our feet, but it can still change via the hypercall. There are nothing preventing that with current hypercall and the one your propose.
Could you please describe the scenario in more detail and the interface used for it?
Also vcpu_info needs protection from this. Do you agree?

> Well the number you showed in the other thread didn't show any improvement at all... So please explain why we should call map_domain_page_global() here and using more vmap on arm32
I'm not expecting vmap to be a practical problem for an arm32-based system.
With the current implementation the numbers are equal to those I have for runstate mapping on access.
But I'm not sure my test setup is able to distinguish the difference.

>>   - IMHO, this implementation is simpler and cleaner than what I have for runstate mapping on access.
> 
> Did you implement it using access_guest_memory_by_ipa?
Not exactly; access_guest_memory_by_ipa() has no implementation for x86. But it is built around that code.
  
> But I don't think the implementation you suggest will be that simpler once you deal with the problem above.
I missed that problem. Will look at it.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 14:34       ` Andrii Anisov
@ 2019-03-07 15:17         ` Julien Grall
  2019-03-07 15:20           ` Julien Grall
                             ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Julien Grall @ 2019-03-07 15:17 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov, Roger Pau Monné

Hi Andrii,

On 07/03/2019 14:34, Andrii Anisov wrote:
> On 07.03.19 16:02, Julien Grall wrote:
>>> So I assume you say about you preferences to not have runstate area mapped 
>>> because of consuming vmap space for arm64. Also, along that thread you 
>>> mentioned you that guest might change gva mapping, what is irrelevant to 
>>> registration with physical address.
>>>
>>> My reasons to have that runstate mapped are following:
>>>   - Introducing the new interface we are not burden with legacy, so in charge 
>>> to impose requirements. In this case - to have runstate area not crossing a 
>>> page boundary
>>>   - Global mapping used here does not consume vmap space on arm64. It seems 
>>> to me x86 guys are ok with mapping as well, at least Roger suggested it from 
>>> the beginning. So it should be ok for them as well.
>> You left arm32 out of your equations here...
> Yes, I left arm32 aside.

Why? Arm32 is as equally supported as Arm64.

> 
>>>   - In case domain is mapping runstate with physical address, it can not 
>>> change the mapping.
>>
>> This is not entirely correct. The domain can not change the mapping under our 
>> feet, but it can still change via the hypercall. There are nothing preventing 
>> that with current hypercall and the one your propose.
> Could you please describe the scenario with more details and the interface used 
> for it?

What scenario? You just need to read the implementation of the current hypercall 
and see there is nothing preventing the call from being done twice.

When you are designing a new hypercall you have to think about how a guest can 
misuse it (yes I said misuse, not use!). Think about a guest with two vCPUs. 
vCPU A is constantly modifying the runstate for vCPU B. What could happen if 
the hypervisor is in the middle of context switching vCPU B?

As I pointed

> Also vcpu_info needs protections from it. Do you agree?

The vcpu_info registration cannot be done twice thanks to the following check in map_vcpu_info:
     if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) )
         return -EINVAL;
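
A similar guard for the runstate registration could look roughly like this (a userspace sketch with made-up names, modelling the idea rather than quoting Xen code):

```c
#include <stdatomic.h>
#include <stdint.h>

#define INVALID_MFN UINT64_MAX

struct vcpu_sketch {
    _Atomic uint64_t runstate_mfn;   /* INVALID_MFN while unregistered */
};

/* Refuse a second registration, in the spirit of the map_vcpu_info()
 * check; the compare-and-swap also closes the race where two vCPUs
 * try to register concurrently for the same vCPU. */
static int register_runstate_area(struct vcpu_sketch *v, uint64_t mfn)
{
    uint64_t expected = INVALID_MFN;

    if (!atomic_compare_exchange_strong(&v->runstate_mfn, &expected, mfn))
        return -1;                   /* -EINVAL in Xen terms */
    return 0;
}
```

Whether denying re-registration is acceptable for guest OSes is exactly the open question above.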

> 
>> Well the number you showed in the other thread didn't show any improvement at 
>> all... So please explain why we should call map_domain_page_global() here and 
>> using more vmap on arm32
> I'm not expecting vmap might be a practical problem for arm32 based system.

Well vmap is quite small on Arm. So why should we use more of it if...

> With the current implementation numbers are equal to those I have for runstate 
> mapping on access.

it does not make a real improvement to the context switch? But I recall you 
said the interrupt latency was worse with keeping the runstate mapped (7000ns 
vs 7900ns). You also saw a performance drop when using the glmark2 benchmark. 
So how come you can say they are equal? What did change?

In other words, I don't want to keep things mapped in Xen if we can achieve 
similar performance with "mapping"/"unmapping" at context switch.

> But I'm not sure my test setup able to distinguish the difference.

Well, there is a lot of stuff happening during context switch. So the benefit 
of keeping the mapping is probably lost in the noise.

> 
>>>   - IMHO, this implementation is simpler and cleaner than what I have for 
>>> runstate mapping on access.
>>
>> Did you implement it using access_guest_memory_by_ipa?
> Not exactly, access_guest_memory_by_ipa() has no implementation for x86. But it 
> is made around that code.

For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know 
what would be the interface for PV. Roger, any idea?

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 15:17         ` Julien Grall
@ 2019-03-07 15:20           ` Julien Grall
  2019-03-07 16:16           ` Roger Pau Monné
  2019-03-18 11:31           ` Andrii Anisov
  2 siblings, 0 replies; 31+ messages in thread
From: Julien Grall @ 2019-03-07 15:20 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov, Roger Pau Monné



On 07/03/2019 15:17, Julien Grall wrote:
> Hi Andrii,
> 
> On 07/03/2019 14:34, Andrii Anisov wrote:
>> On 07.03.19 16:02, Julien Grall wrote:
>>>> So I assume you say about you preferences to not have runstate area mapped 
>>>> because of consuming vmap space for arm64. Also, along that thread you 
>>>> mentioned you that guest might change gva mapping, what is irrelevant to 
>>>> registration with physical address.
>>>>
>>>> My reasons to have that runstate mapped are following:
>>>>   - Introducing the new interface we are not burden with legacy, so in 
>>>> charge to impose requirements. In this case - to have runstate area not 
>>>> crossing a page boundary
>>>>   - Global mapping used here does not consume vmap space on arm64. It seems 
>>>> to me x86 guys are ok with mapping as well, at least Roger suggested it from 
>>>> the beginning. So it should be ok for them as well.
>>> You left arm32 out of your equations here...
>> Yes, I left arm32 aside.
> 
> Why? Arm32 is as equally supported as Arm64.
> 
>>
>>>>   - In case domain is mapping runstate with physical address, it can not 
>>>> change the mapping.
>>>
>>> This is not entirely correct. The domain can not change the mapping under our 
>>> feet, but it can still change via the hypercall. There are nothing preventing 
>>> that with current hypercall and the one your propose.
>> Could you please describe the scenario with more details and the interface 
>> used for it?
> 
> What scenario? You just need to read the implementation of the current hypercall 
> and see there are nothing preventing the call to be done twice.
> 
> When you are designing a new hypercall you have to think how a guest can misuse 
> it (yes I said misuse not use!). Think about a guest with two vCPUs. vCPU A is 
> constantly modifying the runstate for vCPU B. What could happen if the 
> hypervisor is in the middle of context switch vCPU B?
> 
> As I pointed

Hmmm this is a left-over of a sentence I was thinking to add but dropped.

Another use case to think about is multiple vCPUs trying to register a runstate 
concurrently for the same vCPU.

> 
>> Also vcpu_info needs protections from it. Do you agree?
> 
> vcpu_info cannot be called twice thanks to the following check in map_vcpu_info:
>      if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) )
>          return -EINVAL;
> 
>>
>>> Well the number you showed in the other thread didn't show any improvement at 
>>> all... So please explain why we should call map_domain_page_global() here and 
>>> using more vmap on arm32
>> I'm not expecting vmap might be a practical problem for arm32 based system.
> 
> Well vmap is quite small on Arm. So why should we use more of it if...
> 
>> With the current implementation numbers are equal to those I have for runstate 
>> mapping on access.
> 
> it does not make an real improvement to the context switch? But I recall you 
> said the interrupt latency were worst with keeping the runstate mapped (7000ns 
> vs 7900ns). You also saw a performance drop when using glmark2 benchmark. So how 
> come you can say they are equal? What did change?
> 
> In other words, I don't want to keep things mapped in Xen if we can achieve 
> similar performance with "mapping"/"unmapping" at context switch.
> 
>> But I'm not sure my test setup able to distinguish the difference.
> 
> Well, there are a lot of stuff happening during context switch. So the benefit 
> of keeping the mapping is probably lost in the noise.
> 
>>
>>>>   - IMHO, this implementation is simpler and cleaner than what I have for 
>>>> runstate mapping on access.
>>>
>>> Did you implement it using access_guest_memory_by_ipa?
>> Not exactly, access_guest_memory_by_ipa() has no implementation for x86. But 
>> it is made around that code.
> 
> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know 
> what would be the interface for PV. Roger, any idea?
> 
> Cheers,
> 

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 15:17         ` Julien Grall
  2019-03-07 15:20           ` Julien Grall
@ 2019-03-07 16:16           ` Roger Pau Monné
  2019-03-07 16:36             ` Julien Grall
  2019-03-18 11:31           ` Andrii Anisov
  2 siblings, 1 reply; 31+ messages in thread
From: Roger Pau Monné @ 2019-03-07 16:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, xen-devel, Andrii Anisov, Stefano Stabellini,
	Andrii Anisov

On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
> Hi Andrii,
> 
> On 07/03/2019 14:34, Andrii Anisov wrote:
> > On 07.03.19 16:02, Julien Grall wrote:
> > > >   - IMHO, this implementation is simpler and cleaner than what I
> > > > have for runstate mapping on access.
> > > 
> > > Did you implement it using access_guest_memory_by_ipa?
> > Not exactly, access_guest_memory_by_ipa() has no implementation for x86.
> > But it is made around that code.
> 
> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know
> what would be the interface for PV. Roger, any idea?

For PV I think you will have to use get_page_from_gfn, check the
permissions, map it, write and unmap it. The same flow would also work
for HVM, so I'm not sure if there's much point in using
hvm_copy_to_guest_phys. Or you can implement a generic
copy_to_guest_phys helper that works for both PV and HVM.
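
As a rough model of that flow (toy code, not Xen internals - a flat buffer stands in for get_page_from_gfn() plus the map/write/unmap steps):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Toy model: one flat buffer stands in for the guest's physical memory,
 * so page lookup and mapping collapse into bounds checks + memcpy. */
static uint8_t guest_mem[2 * PAGE_SIZE];

static int copy_to_guest_phys(uint64_t gpa, const void *src, size_t len)
{
    uint64_t offset = gpa & (PAGE_SIZE - 1);

    if (offset + len > PAGE_SIZE)
        return -1;    /* crosses a page boundary: reject */
    if (gpa + len > sizeof(guest_mem))
        return -1;    /* no such page ("get_page_from_gfn" failed) */

    /* Real code would map the page here and unmap it after the copy. */
    memcpy(guest_mem + gpa, src, len);
    return 0;
}
```

The permission check and the actual map/unmap are elided; the point is that the whole flow is per-write, done at each context switch.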

Note that for translated guests you will have to walk the HAP page
tables for each vCPU for each context switch, which I think will be
expensive in terms of performance (I might be wrong however, since I
have no proof of this).

Thanks, Roger.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 16:16           ` Roger Pau Monné
@ 2019-03-07 16:36             ` Julien Grall
  2019-03-07 17:15               ` Roger Pau Monné
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-07 16:36 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Juergen Gross, xen-devel, Andrii Anisov, Stefano Stabellini,
	Andrii Anisov

Hi Roger,

Thank you for the answer.

On 07/03/2019 16:16, Roger Pau Monné wrote:
> On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
>> Hi Andrii,
>>
>> On 07/03/2019 14:34, Andrii Anisov wrote:
>>> On 07.03.19 16:02, Julien Grall wrote:
>>>>>    - IMHO, this implementation is simpler and cleaner than what I
>>>>> have for runstate mapping on access.
>>>>
>>>> Did you implement it using access_guest_memory_by_ipa?
>>> Not exactly, access_guest_memory_by_ipa() has no implementation for x86.
>>> But it is made around that code.
>>
>> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know
>> what would be the interface for PV. Roger, any idea?
> 
> For PV I think you will have to use get_page_from_gfn, check the
> permissions, map it, write and unmap it. The same flow would also work
> for HVM, so I'm not sure if there's much point in using
> hvm_copy_to_guest_phys. Or you can implement a generic
> copy_to_guest_phys helper that works for both PV and HVM.
> 
> Note that for translated guests you will have to walk the HAP page
> tables for each vCPU for each context switch, which I think will be
> expensive in terms of performance (I might be wrong however, since I
> have no proof of this).

AFAICT, we already walk the page-table with the current implementation. So this 
should be no different here, except we will not need to walk the guest-PT here. No?

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 16:36             ` Julien Grall
@ 2019-03-07 17:15               ` Roger Pau Monné
  2019-03-07 18:00                 ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Roger Pau Monné @ 2019-03-07 17:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, xen-devel, Andrii Anisov, Stefano Stabellini,
	Andrii Anisov

On Thu, Mar 07, 2019 at 04:36:59PM +0000, Julien Grall wrote:
> Hi Roger,
> 
> Thank you for the answer.
> 
> On 07/03/2019 16:16, Roger Pau Monné wrote:
> > On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
> > > Hi Andrii,
> > > 
> > > On 07/03/2019 14:34, Andrii Anisov wrote:
> > > > On 07.03.19 16:02, Julien Grall wrote:
> > > > > >    - IMHO, this implementation is simpler and cleaner than what I
> > > > > > have for runstate mapping on access.
> > > > > 
> > > > > Did you implement it using access_guest_memory_by_ipa?
> > > > Not exactly, access_guest_memory_by_ipa() has no implementation for x86.
> > > > But it is made around that code.
> > > 
> > > For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know
> > > what would be the interface for PV. Roger, any idea?
> > 
> > For PV I think you will have to use get_page_from_gfn, check the
> > permissions, map it, write and unmap it. The same flow would also work
> > for HVM, so I'm not sure if there's much point in using
> > hvm_copy_to_guest_phys. Or you can implement a generic
> > copy_to_guest_phys helper that works for both PV and HVM.
> > 
> > Note that for translated guests you will have to walk the HAP page
> > tables for each vCPU for each context switch, which I think will be
> > expensive in terms of performance (I might be wrong however, since I
> > have no proof of this).
> 
> AFAICT, we already walk the page-table with the current implementation. So
> this should be no different here, except we will not need to walk the
> guest-PT here. No?

Yes, the current implementation is even worse because it walks both the
guest page tables and the HAP page tables in the HVM case. It would be
interesting IMO if we could avoid walking any of those page tables.

I see you have concerns about permanently mapping the runstate area,
so I'm not going to oppose, albeit even with only 1G of VA space you
can map plenty of runstate areas, and taking into account this is
32bit hardware I'm not sure you will ever have that many vCPUs that
you will run out of VA space to map runstate areas.

That being said, if the implementation turns out to be more
complicated because of this permanent mapping, walking the guest HAP
page tables is certainly no worse than what's done ATM.

Thanks, Roger.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 17:15               ` Roger Pau Monné
@ 2019-03-07 18:00                 ` Julien Grall
  2019-03-08  6:28                   ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-07 18:00 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Juergen Gross, xen-devel, Andrii Anisov, Stefano Stabellini,
	Andrii Anisov

Hi Roger,

On 07/03/2019 17:15, Roger Pau Monné wrote:
> On Thu, Mar 07, 2019 at 04:36:59PM +0000, Julien Grall wrote:
>> Hi Roger,
>>
>> Thank you for the answer.
>>
>> On 07/03/2019 16:16, Roger Pau Monné wrote:
>>> On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
>>>> Hi Andrii,
>>>>
>>>> On 07/03/2019 14:34, Andrii Anisov wrote:
>>>>> On 07.03.19 16:02, Julien Grall wrote:
>>>>>>>     - IMHO, this implementation is simpler and cleaner than what I
>>>>>>> have for runstate mapping on access.
>>>>>>
>>>>>> Did you implement it using access_guest_memory_by_ipa?
>>>>> Not exactly, access_guest_memory_by_ipa() has no implementation for x86.
>>>>> But it is made around that code.
>>>>
>>>> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know
>>>> what would be the interface for PV. Roger, any idea?
>>>
>>> For PV I think you will have to use get_page_from_gfn, check the
>>> permissions, map it, write and unmap it. The same flow would also work
>>> for HVM, so I'm not sure if there's much point in using
>>> hvm_copy_to_guest_phys. Or you can implement a generic
>>> copy_to_guest_phys helper that works for both PV and HVM.
>>>
>>> Note that for translated guests you will have to walk the HAP page
>>> tables for each vCPU for each context switch, which I think will be
>>> expensive in terms of performance (I might be wrong however, since I
>>> have no proof of this).
>>
>> AFAICT, we already walk the page-table with the current implementation. So
>> this should be no different here, except we will not need to walk the
>> guest-PT here. No?
> 
> Yes, current implementation is even worse because it walks both the
> guest page tables and the HAP page tables in the HVM case. It would be
> interesting IMO if we could avoid walking any of those page tables.
> 
> I see you have concerns about permanently mapping the runstate area,
> so I'm not going to oppose, albeit even with only 1G of VA space you
> can map plenty of runstate areas, and taking into account this is
> 32bit hardware I'm not sure you will ever have that many vCPUs that
> you will run out of VA space to map runstate areas.

Actually the vmap is only 768MB. The vmap is at the moment used for mapping:
	- MMIO devices (through ioremap)
	- event channel pages

As the runstate is far smaller than a page, this sounds like a waste of memory 
for a benefit that hasn't yet been shown. Indeed, the numbers provided by 
Andrii show either worse performance or similar.

But TBH, I am not expecting a really clear performance improvement on Arm, as 
there is a lot to do in the context switch.

> 
> That being said, if the implementation turns out to be more
> complicated because of this permanent mapping, walking the guest HAP
> page tables is certainly no worse than what's done ATM.

To be honest I am not fully against always mapping the runstate in Xen. But I 
need data to show this is worth it. So far, the promised performance is not 
there and the implementation is not foolproof yet.

If we want to keep the runstate mapped permanently, then we need to add either 
a lock or refcounting, so the page does not disappear during context switch if 
we happen to update the runstate concurrently (via the hypercall).

This may increase the complexity of the implementation (not sure by how much, 
though).

Another solution is to prevent the runstate from being updated. But I think 
that would just add a bit more burden on the guest OS.
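
To make the refcounting option concrete, a sketch (hypothetical, plain C with made-up names) of the get/put pattern the context-switch path would need:

```c
#include <stdatomic.h>

struct page_sketch {
    _Atomic int refcount;          /* > 0 while the page is live */
};

/* Context-switch path: pin the page before writing the runstate, so a
 * concurrent re-registration cannot free it under our feet. */
static int get_page_sketch(struct page_sketch *pg)
{
    int old = atomic_load(&pg->refcount);

    while (old > 0)
        if (atomic_compare_exchange_weak(&pg->refcount, &old, old + 1))
            return 1;              /* pinned */
    return 0;                      /* page already gone: skip the write */
}

static void put_page_sketch(struct page_sketch *pg)
{
    atomic_fetch_sub(&pg->refcount, 1);
}
```

A lock would achieve the same, at the cost of serializing the hypercall against every context switch.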

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 18:00                 ` Julien Grall
@ 2019-03-08  6:28                   ` Juergen Gross
  2019-03-08 10:15                     ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2019-03-08  6:28 UTC (permalink / raw)
  To: Julien Grall, Roger Pau Monné
  Cc: Andrii Anisov, Stefano Stabellini, Andrii Anisov, xen-devel

On 07/03/2019 19:00, Julien Grall wrote:
> Hi Roger,
> 
> On 07/03/2019 17:15, Roger Pau Monné wrote:
>> On Thu, Mar 07, 2019 at 04:36:59PM +0000, Julien Grall wrote:
>>> Hi Roger,
>>>
>>> Thank you for the answer.
>>>
>>> On 07/03/2019 16:16, Roger Pau Monné wrote:
>>>> On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
>>>>> Hi Andrii,
>>>>>
>>>>> On 07/03/2019 14:34, Andrii Anisov wrote:
>>>>>> On 07.03.19 16:02, Julien Grall wrote:
>>>>>>>>     - IMHO, this implementation is simpler and cleaner than what I
>>>>>>>> have for runstate mapping on access.
>>>>>>>
>>>>>>> Did you implement it using access_guest_memory_by_ipa?
>>>>>> Not exactly, access_guest_memory_by_ipa() has no implementation
>>>>>> for x86.
>>>>>> But it is made around that code.
>>>>>
>>>>> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I
>>>>> don't know
>>>>> what would be the interface for PV. Roger, any idea?
>>>>
>>>> For PV I think you will have to use get_page_from_gfn, check the
>>>> permissions, map it, write and unmap it. The same flow would also work
>>>> for HVM, so I'm not sure if there's much point in using
>>>> hvm_copy_to_guest_phys. Or you can implement a generic
>>>> copy_to_guest_phys helper that works for both PV and HVM.
>>>>
>>>> Note that for translated guests you will have to walk the HAP page
>>>> tables for each vCPU for each context switch, which I think will be
>>>> expensive in terms of performance (I might be wrong however, since I
>>>> have no proof of this).
>>>
>>> AFAICT, we already walk the page-table with the current
>>> implementation. So
>>> this should be no different here, except we will not need to walk the
>>> guest-PT here. No?
>>
>> Yes, current implementation is even worse because it walks both the
>> guest page tables and the HAP page tables in the HVM case. It would be
>> interesting IMO if we could avoid walking any of those page tables.
>>
>> I see you have concerns about permanently mapping the runstate area,
>> so I'm not going to oppose, albeit even with only 1G of VA space you
>> can map plenty of runstate areas, and taking into account this is
>> 32bit hardware I'm not sure you will ever have that many vCPUs that
>> you will run out of VA space to map runstate areas.
> 
> Actually the vmap is only 768MB. The vmap is at the moment used for
> mapping:
>     - MMIO devices (through ioremap)
>         - event channel pages
> 
> As the runstate is far smaller than a page, this sounds like a waste of
> memory for a benefits that haven't not yet been shown. Indeed, number
> provided by Andrii either show worst performance or similar one.
> 
> But TBH, I am not expecting that a really clear performance improvement
> on Arm as there are a lot to do in the context switch.
> 
>>
>> That being said, if the implementation turns out to be more
>> complicated because of this permanent mapping, walking the guest HAP
>> page tables is certainly no worse than what's done ATM.
> 
> To be honest I am not fully against always mapping the runstate in Xen.
> But I need data to show this is worth it. So far, the performance
> promised are not there and the implementation is not foolproof yet.
> 
> If we want to keep the runstate mapped permanently, then we need to add
> either a lock or a refcounting. So the page does not disappear during
> context switch if we happen to update the runstate concurrently (via the
> hypercall).
> 
> This may increase the complexity of the implementation (not sure by how
> much thought).
> 
> Another solution is to prevent the runstate to be updated. But I think
> we will just add a bit more burden in the guest OS.

Not sure about other systems, but current Linux kernel registers the
runstate area for other cpus only if those are not up. So there is no
way the runstate of that foreign vcpu could be updated while it is
being registered. Of course this would need to be tested (e.g. -EBUSY
for registering runstate of an active foreign vcpu).
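
The rule Juergen describes could be sketched as below. This is purely illustrative, with stand-in types instead of the real Xen structures (the actual check would live in the hypercall handler and test the vCPU's down flag):

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Minimal stand-in for struct vcpu; the real structure is far richer. */
struct vcpu {
    int vcpu_id;
    bool is_up;            /* stand-in for "the vCPU has been brought up" */
    void *runstate_area;   /* stand-in for the registered area            */
};

/*
 * Sketch of the proposed rule: registering the runstate area of a
 * foreign vCPU is only allowed while that vCPU is down, so the
 * hypervisor can never be updating the area mid-registration.
 */
static int register_runstate_area(const struct vcpu *curr,
                                  struct vcpu *target, void *area)
{
    if ( target != curr && target->is_up )
        return -EBUSY;     /* active foreign vcpu: refuse */

    target->runstate_area = area;
    return 0;
}
```

Registering for oneself (or for a foreign vCPU that is not yet up) succeeds; registering for an active foreign vCPU is refused, which matches the Linux usage pattern described above.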


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-08  6:28                   ` Juergen Gross
@ 2019-03-08 10:15                     ` Julien Grall
  2019-03-08 10:18                       ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-08 10:15 UTC (permalink / raw)
  To: Juergen Gross, Roger Pau Monné
  Cc: Andrii Anisov, Stefano Stabellini, Andrii Anisov, xen-devel

Hi Juergen,

On 3/8/19 6:28 AM, Juergen Gross wrote:
> On 07/03/2019 19:00, Julien Grall wrote:
>> Hi Roger,
>>
>> On 07/03/2019 17:15, Roger Pau Monné wrote:
>>> On Thu, Mar 07, 2019 at 04:36:59PM +0000, Julien Grall wrote:
>>>> Hi Roger,
>>>>
>>>> Thank you for the answer.
>>>>
>>>> On 07/03/2019 16:16, Roger Pau Monné wrote:
>>>>> On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
>>>>>> Hi Andrii,
>>>>>>
>>>>>> On 07/03/2019 14:34, Andrii Anisov wrote:
>>>>>>> On 07.03.19 16:02, Julien Grall wrote:
>>>>>>>>>      - IMHO, this implementation is simpler and cleaner than what I
>>>>>>>>> have for runstate mapping on access.
>>>>>>>>
>>>>>>>> Did you implement it using access_guest_memory_by_ipa?
>>>>>>> Not exactly, access_guest_memory_by_ipa() has no implementation
>>>>>>> for x86.
>>>>>>> But it is made around that code.
>>>>>>
>>>>>> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I
>>>>>> don't know
>>>>>> what would be the interface for PV. Roger, any idea?
>>>>>
>>>>> For PV I think you will have to use get_page_from_gfn, check the
>>>>> permissions, map it, write and unmap it. The same flow would also work
>>>>> for HVM, so I'm not sure if there's much point in using
>>>>> hvm_copy_to_guest_phys. Or you can implement a generic
>>>>> copy_to_guest_phys helper that works for both PV and HVM.
>>>>>
>>>>> Note that for translated guests you will have to walk the HAP page
>>>>> tables for each vCPU for each context switch, which I think will be
>>>>> expensive in terms of performance (I might be wrong however, since I
>>>>> have no proof of this).
>>>>
>>>> AFAICT, we already walk the page-table with the current
>>>> implementation. So
>>>> this should be no different here, except we will not need to walk the
>>>> guest-PT here. No?
>>>
>>> Yes, current implementation is even worse because it walks both the
>>> guest page tables and the HAP page tables in the HVM case. It would be
>>> interesting IMO if we could avoid walking any of those page tables.
>>>
>>> I see you have concerns about permanently mapping the runstate area,
>>> so I'm not going to oppose, albeit even with only 1G of VA space you
>>> can map plenty of runstate areas, and taking into account this is
>>> 32bit hardware I'm not sure you will ever have that many vCPUs that
>>> you will run out of VA space to map runstate areas.
>>
>> Actually the vmap is only 768MB. The vmap is at the moment used for
>> mapping:
>>      - MMIO devices (through ioremap)
>>          - event channel pages
>>
>> As the runstate is far smaller than a page, this sounds like a waste of
>> memory for a benefits that haven't not yet been shown. Indeed, number
>> provided by Andrii either show worst performance or similar one.
>>
>> But TBH, I am not expecting that a really clear performance improvement
>> on Arm as there are a lot to do in the context switch.
>>
>>>
>>> That being said, if the implementation turns out to be more
>>> complicated because of this permanent mapping, walking the guest HAP
>>> page tables is certainly no worse than what's done ATM.
>>
>> To be honest I am not fully against always mapping the runstate in Xen.
>> But I need data to show this is worth it. So far, the performance
>> promised are not there and the implementation is not foolproof yet.
>>
>> If we want to keep the runstate mapped permanently, then we need to add
>> either a lock or a refcounting. So the page does not disappear during
>> context switch if we happen to update the runstate concurrently (via the
>> hypercall).
>>
>> This may increase the complexity of the implementation (not sure by how
>> much thought).
>>
>> Another solution is to prevent the runstate to be updated. But I think
>> we will just add a bit more burden in the guest OS.
> 
> Not sure about other systems, but current Linux kernel registers the
> runstate area for other cpus only if those are not up. So there is no
> way the runstate of that foreign vcpu could be updated during
> registering it. Of course this would need to be tested (e.g. -EBUSY
> for registering runstate of an active foreign vcpu).

I was under the impression that xen_vcpu_restore() (see 
arch/x86/xen/enlighten.c) may update the runstate while the vCPU is up 
(at least on pre-Xen 4.5). Did I miss anything?

Cheers,

-- 
Julien Grall



* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-08 10:15                     ` Julien Grall
@ 2019-03-08 10:18                       ` Juergen Gross
  2019-03-08 10:31                         ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2019-03-08 10:18 UTC (permalink / raw)
  To: Julien Grall, Roger Pau Monné
  Cc: Andrii Anisov, Stefano Stabellini, Andrii Anisov, xen-devel

On 08/03/2019 11:15, Julien Grall wrote:
> Hi Juergen,
> 
> On 3/8/19 6:28 AM, Juergen Gross wrote:
>> On 07/03/2019 19:00, Julien Grall wrote:
>>> Hi Roger,
>>>
>>> On 07/03/2019 17:15, Roger Pau Monné wrote:
>>>> On Thu, Mar 07, 2019 at 04:36:59PM +0000, Julien Grall wrote:
>>>>> Hi Roger,
>>>>>
>>>>> Thank you for the answer.
>>>>>
>>>>> On 07/03/2019 16:16, Roger Pau Monné wrote:
>>>>>> On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
>>>>>>> Hi Andrii,
>>>>>>>
>>>>>>> On 07/03/2019 14:34, Andrii Anisov wrote:
>>>>>>>> On 07.03.19 16:02, Julien Grall wrote:
>>>>>>>>>>      - IMHO, this implementation is simpler and cleaner than
>>>>>>>>>> what I
>>>>>>>>>> have for runstate mapping on access.
>>>>>>>>>
>>>>>>>>> Did you implement it using access_guest_memory_by_ipa?
>>>>>>>> Not exactly, access_guest_memory_by_ipa() has no implementation
>>>>>>>> for x86.
>>>>>>>> But it is made around that code.
>>>>>>>
>>>>>>> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I
>>>>>>> don't know
>>>>>>> what would be the interface for PV. Roger, any idea?
>>>>>>
>>>>>> For PV I think you will have to use get_page_from_gfn, check the
>>>>>> permissions, map it, write and unmap it. The same flow would also
>>>>>> work
>>>>>> for HVM, so I'm not sure if there's much point in using
>>>>>> hvm_copy_to_guest_phys. Or you can implement a generic
>>>>>> copy_to_guest_phys helper that works for both PV and HVM.
>>>>>>
>>>>>> Note that for translated guests you will have to walk the HAP page
>>>>>> tables for each vCPU for each context switch, which I think will be
>>>>>> expensive in terms of performance (I might be wrong however, since I
>>>>>> have no proof of this).
>>>>>
>>>>> AFAICT, we already walk the page-table with the current
>>>>> implementation. So
>>>>> this should be no different here, except we will not need to walk the
>>>>> guest-PT here. No?
>>>>
>>>> Yes, current implementation is even worse because it walks both the
>>>> guest page tables and the HAP page tables in the HVM case. It would be
>>>> interesting IMO if we could avoid walking any of those page tables.
>>>>
>>>> I see you have concerns about permanently mapping the runstate area,
>>>> so I'm not going to oppose, albeit even with only 1G of VA space you
>>>> can map plenty of runstate areas, and taking into account this is
>>>> 32bit hardware I'm not sure you will ever have that many vCPUs that
>>>> you will run out of VA space to map runstate areas.
>>>
>>> Actually the vmap is only 768MB. The vmap is at the moment used for
>>> mapping:
>>>      - MMIO devices (through ioremap)
>>>          - event channel pages
>>>
>>> As the runstate is far smaller than a page, this sounds like a waste of
>>> memory for a benefits that haven't not yet been shown. Indeed, number
>>> provided by Andrii either show worst performance or similar one.
>>>
>>> But TBH, I am not expecting that a really clear performance improvement
>>> on Arm as there are a lot to do in the context switch.
>>>
>>>>
>>>> That being said, if the implementation turns out to be more
>>>> complicated because of this permanent mapping, walking the guest HAP
>>>> page tables is certainly no worse than what's done ATM.
>>>
>>> To be honest I am not fully against always mapping the runstate in Xen.
>>> But I need data to show this is worth it. So far, the performance
>>> promised are not there and the implementation is not foolproof yet.
>>>
>>> If we want to keep the runstate mapped permanently, then we need to add
>>> either a lock or a refcounting. So the page does not disappear during
>>> context switch if we happen to update the runstate concurrently (via the
>>> hypercall).
>>>
>>> This may increase the complexity of the implementation (not sure by how
>>> much thought).
>>>
>>> Another solution is to prevent the runstate to be updated. But I think
>>> we will just add a bit more burden in the guest OS.
>>
>> Not sure about other systems, but current Linux kernel registers the
>> runstate area for other cpus only if those are not up. So there is no
>> way the runstate of that foreign vcpu could be updated during
>> registering it. Of course this would need to be tested (e.g. -EBUSY
>> for registering runstate of an active foreign vcpu).
> 
> I was under the impression that xen_vcpu_restore() (see
> arch/x86/xen/enlighten.c) may update the runstate while the vCPU is up
> (at least on pre-Xen 4.5). Did I miss anything?

Pre Xen 4.5: yes. But I don't think you want to introduce the new
semantics in Xen 4.4 and older?


Juergen



* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-08 10:18                       ` Juergen Gross
@ 2019-03-08 10:31                         ` Julien Grall
  0 siblings, 0 replies; 31+ messages in thread
From: Julien Grall @ 2019-03-08 10:31 UTC (permalink / raw)
  To: Juergen Gross, Roger Pau Monné
  Cc: Andrii Anisov, Stefano Stabellini, Andrii Anisov, xen-devel

Hi Juergen,

On 3/8/19 10:18 AM, Juergen Gross wrote:
> On 08/03/2019 11:15, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 3/8/19 6:28 AM, Juergen Gross wrote:
>>> On 07/03/2019 19:00, Julien Grall wrote:
>>>> Hi Roger,
>>>>
>>>> On 07/03/2019 17:15, Roger Pau Monné wrote:
>>>>> On Thu, Mar 07, 2019 at 04:36:59PM +0000, Julien Grall wrote:
>>>>>> Hi Roger,
>>>>>>
>>>>>> Thank you for the answer.
>>>>>>
>>>>>> On 07/03/2019 16:16, Roger Pau Monné wrote:
>>>>>>> On Thu, Mar 07, 2019 at 03:17:54PM +0000, Julien Grall wrote:
>>>>>>>> Hi Andrii,
>>>>>>>>
>>>>>>>> On 07/03/2019 14:34, Andrii Anisov wrote:
>>>>>>>>> On 07.03.19 16:02, Julien Grall wrote:
>>>>>>>>>>>       - IMHO, this implementation is simpler and cleaner than
>>>>>>>>>>> what I
>>>>>>>>>>> have for runstate mapping on access.
>>>>>>>>>>
>>>>>>>>>> Did you implement it using access_guest_memory_by_ipa?
>>>>>>>>> Not exactly, access_guest_memory_by_ipa() has no implementation
>>>>>>>>> for x86.
>>>>>>>>> But it is made around that code.
>>>>>>>>
>>>>>>>> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I
>>>>>>>> don't know
>>>>>>>> what would be the interface for PV. Roger, any idea?
>>>>>>>
>>>>>>> For PV I think you will have to use get_page_from_gfn, check the
>>>>>>> permissions, map it, write and unmap it. The same flow would also
>>>>>>> work
>>>>>>> for HVM, so I'm not sure if there's much point in using
>>>>>>> hvm_copy_to_guest_phys. Or you can implement a generic
>>>>>>> copy_to_guest_phys helper that works for both PV and HVM.
>>>>>>>
>>>>>>> Note that for translated guests you will have to walk the HAP page
>>>>>>> tables for each vCPU for each context switch, which I think will be
>>>>>>> expensive in terms of performance (I might be wrong however, since I
>>>>>>> have no proof of this).
>>>>>>
>>>>>> AFAICT, we already walk the page-table with the current
>>>>>> implementation. So
>>>>>> this should be no different here, except we will not need to walk the
>>>>>> guest-PT here. No?
>>>>>
>>>>> Yes, current implementation is even worse because it walks both the
>>>>> guest page tables and the HAP page tables in the HVM case. It would be
>>>>> interesting IMO if we could avoid walking any of those page tables.
>>>>>
>>>>> I see you have concerns about permanently mapping the runstate area,
>>>>> so I'm not going to oppose, albeit even with only 1G of VA space you
>>>>> can map plenty of runstate areas, and taking into account this is
>>>>> 32bit hardware I'm not sure you will ever have that many vCPUs that
>>>>> you will run out of VA space to map runstate areas.
>>>>
>>>> Actually the vmap is only 768MB. The vmap is at the moment used for
>>>> mapping:
>>>>       - MMIO devices (through ioremap)
>>>>           - event channel pages
>>>>
>>>> As the runstate is far smaller than a page, this sounds like a waste of
>>>> memory for a benefits that haven't not yet been shown. Indeed, number
>>>> provided by Andrii either show worst performance or similar one.
>>>>
>>>> But TBH, I am not expecting that a really clear performance improvement
>>>> on Arm as there are a lot to do in the context switch.
>>>>
>>>>>
>>>>> That being said, if the implementation turns out to be more
>>>>> complicated because of this permanent mapping, walking the guest HAP
>>>>> page tables is certainly no worse than what's done ATM.
>>>>
>>>> To be honest I am not fully against always mapping the runstate in Xen.
>>>> But I need data to show this is worth it. So far, the performance
>>>> promised are not there and the implementation is not foolproof yet.
>>>>
>>>> If we want to keep the runstate mapped permanently, then we need to add
>>>> either a lock or a refcounting. So the page does not disappear during
>>>> context switch if we happen to update the runstate concurrently (via the
>>>> hypercall).
>>>>
>>>> This may increase the complexity of the implementation (not sure by how
>>>> much thought).
>>>>
>>>> Another solution is to prevent the runstate to be updated. But I think
>>>> we will just add a bit more burden in the guest OS.
>>>
>>> Not sure about other systems, but current Linux kernel registers the
>>> runstate area for other cpus only if those are not up. So there is no
>>> way the runstate of that foreign vcpu could be updated during
>>> registering it. Of course this would need to be tested (e.g. -EBUSY
>>> for registering runstate of an active foreign vcpu).
>>
>> I was under the impression that xen_vcpu_restore() (see
>> arch/x86/xen/enlighten.c) may update the runstate while the vCPU is up
>> (at least on pre-Xen 4.5). Did I miss anything?
> 
> Pre Xen 4.5: yes. But I don't think you want to introduce the new
> semantics in Xen 4.4 and older?

That's correct, the hypercall will only be added for future Xen releases.

Cheers,

-- 
Julien Grall



* Re: [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
  2019-03-05 13:14 ` [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall Andrii Anisov
@ 2019-03-14  8:45   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2019-03-14  8:45 UTC (permalink / raw)
  To: andrii.anisov
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Julien Grall, xen-devel, andrii_anisov,
	Roger Pau Monne

>>> On 05.03.19 at 14:14, <andrii.anisov@gmail.com> wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> The hypercall employs the same vcpu_register_runstate_memory_area
> structure for the interface, but requires registered area to not
> cross a page boundary.
> 
> Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>

While first of all you'll need to settle the dispute with Julien, a
couple of comments on the actual changes you do in case the
approach is to be pursued.

> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -277,29 +277,33 @@ static void ctxt_switch_to(struct vcpu *n)
>  /* Update per-VCPU guest runstate shared memory area (if registered). */
>  static void update_runstate_area(struct vcpu *v)
>  {
> -    void __user *guest_handle = NULL;
> -
>      if ( guest_handle_is_null(runstate_guest(v)) )
>          return;

I think (seeing also patch 2) re-using the handle representation
is a bad idea. If there's a Xen-internal mapping use a plain
pointer. (All comments on Arm code respectively apply to the
x86 version, also [if any] in patch 2.)

> -    if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +    if ( v->runstate_guest_type == RUNSTATE_VADDR )

Patch 2 shows that this wants to be switch().

>      {
> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
> -        guest_handle--;
> -        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> -        __raw_copy_to_guest(guest_handle,
> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
> -        smp_wmb();
> -    }
> +        void __user *guest_handle = NULL;
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            guest_handle = &v->runstate_guest.p->state_entry_time + 1;
> +            guest_handle--;
> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            __raw_copy_to_guest(guest_handle,
> +                                (void *)(&v->runstate.state_entry_time + 1) - 1,
> +                                1);
> +            smp_wmb();
> +        }
>  
> -    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>  
> -    if ( guest_handle )
> -    {
> -        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> -        smp_wmb();
> -        __raw_copy_to_guest(guest_handle,
> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
> +        if ( guest_handle )
> +        {
> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +            smp_wmb();
> +            __raw_copy_to_guest(guest_handle,
> +                                (void *)(&v->runstate.state_entry_time + 1) - 1,
> +                                1);
> +        }
>      }

There looks to be an "else" missing here (or "default:" once changed
to switch()), which you have on the x86 side. I'm unconvinced it
should be "return true" there though (in which case having nothing
here would indeed have been correct) - ASSERT_UNREACHABLE()
would much rather seem appropriate.
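
The shape Jan is asking for could look roughly like this. It is only a sketch with stand-in return codes (the real function copies the runstate into guest memory rather than returning a tag), using a plain assert(0) where Xen would use ASSERT_UNREACHABLE():

```c
#include <assert.h>

/* Stand-ins for the guest-type tags introduced by patch 1. */
enum runstate_guest_type { RUNSTATE_NONE, RUNSTATE_PADDR, RUNSTATE_VADDR };

/*
 * One switch() instead of an if/else-if chain, with the impossible
 * case caught loudly instead of being silently ignored.
 */
static int update_runstate_area(enum runstate_guest_type t)
{
    switch ( t )
    {
    case RUNSTATE_NONE:
        return 0;          /* nothing registered: nothing to do      */
    case RUNSTATE_VADDR:
        return 1;          /* copy via the guest virtual address     */
    case RUNSTATE_PADDR:
        return 2;          /* copy via the mapped physical page      */
    default:
        assert(0);         /* stand-in for ASSERT_UNREACHABLE()      */
        return -1;
    }
}
```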

> @@ -1535,6 +1536,12 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case VCPUOP_register_runstate_phys_memory_area:
> +    {
> +        rc = -ENOSYS;
> +        break;
> +    }

Stray braces and bad use of -ENOSYS (should be e.g. -EOPNOTSUPP).

> --- a/xen/include/public/vcpu.h
> +++ b/xen/include/public/vcpu.h
> @@ -235,6 +235,22 @@ struct vcpu_register_time_memory_area {
>  typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>  
> +/*
> + * Register a shared memory area from which the guest may obtain its own
> + * runstate information without needing to execute a hypercall.
> + * Notes:
> + *  1. The registered address must be guest's physical address.
> + *  2. The registered runstate area should not cross page boundary.
> + *  3. Only one shared area may be registered per VCPU. The shared area is
> + *     updated by the hypervisor each time the VCPU is scheduled. Thus
> + *     runstate.state will always be RUNSTATE_running and
> + *     runstate.state_entry_time will indicate the system time at which the
> + *     VCPU was last scheduled to run.
> + * @extra_arg == pointer to vcpu_register_runstate_memory_area structure.

I don't think re-use of that structure is appropriate to represent
a physical address.

> + */
> +#define VCPUOP_register_runstate_phys_memory_area 14
> +
> +
>  #endif /* __XEN_PUBLIC_VCPU_H__ */

Stray double blank lines.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -163,6 +163,13 @@ struct vcpu
>      void            *sched_priv;    /* scheduler-specific data */
>  
>      struct vcpu_runstate_info runstate;
> +
> +    enum {
> +        RUNSTATE_NONE = 0,
> +        RUNSTATE_PADDR = 1,
> +        RUNSTATE_VADDR = 2,
> +    } runstate_guest_type;

I can see that putting the new field here nicely groups with the
other related fields. But this should be weighed against the
introduction of a new padding hole, as opposed to fitting the
field into an existing padding space. An abbreviated version of
the reasoning should be put in the description.

Jan




* Re: [PATCH 2/2 for-4.12] xen: implement VCPUOP_register_runstate_phys_memory_area
  2019-03-05 13:14 ` [PATCH 2/2 for-4.12] xen: implement VCPUOP_register_runstate_phys_memory_area Andrii Anisov
@ 2019-03-14  9:05   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2019-03-14  9:05 UTC (permalink / raw)
  To: andrii.anisov
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Julien Grall, xen-devel, andrii_anisov,
	Roger Pau Monne

>>> On 05.03.19 at 14:14, <andrii.anisov@gmail.com> wrote:
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -305,6 +305,26 @@ static void update_runstate_area(struct vcpu *v)
>                                  1);
>          }
>      }
> +    else if ( v->runstate_guest_type == RUNSTATE_PADDR )
> +    {
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            runstate_guest(v).p->state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            smp_wmb();
> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +        }
> +
> +        memcpy(runstate_guest(v).p, &v->runstate, sizeof(v->runstate));
> +
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            runstate_guest(v).p->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +            smp_wmb();
> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +        }
> +    }
> +    else
> +    { /* No actions required */ }

See my respective comment on patch 1.

> @@ -1648,9 +1648,37 @@ bool update_runstate_area(struct vcpu *v)
>                                  (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
>          }
>      }
> -    else
> +    else if ( v->runstate_guest_type == RUNSTATE_PADDR )
>      {
> -        rc = true;
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            if ( has_32bit_shinfo((v)->domain) )

Stray inner parentheses (at least one more instance further down).

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -738,7 +738,14 @@ int domain_kill(struct domain *d)
>          if ( cpupool_move_domain(d, cpupool0) )
>              return -ERESTART;
>          for_each_vcpu ( d, v )
> +        {
> +            if ( v->runstate_guest_type == RUNSTATE_VADDR )
> +                set_xen_guest_handle(runstate_guest(v), NULL);
> +            else
> +                unmap_runstate_area(v);

Since this recurs further down, can't the VADDR case also be
taken care of by unmap_runstate_area()?

> @@ -1333,6 +1344,65 @@ void unmap_vcpu_info(struct vcpu *v)
>      put_page_and_type(mfn_to_page(mfn));
>  }
>  
> +int map_runstate_area(struct vcpu *v,
> +                      struct vcpu_register_runstate_memory_area *area)

Neither this nor the unmap function look to be used outside this
file, so should be static. It also looks as if the second parameter
could be constified.

> +{
> +    unsigned long offset = area->addr.p & ~PAGE_MASK;
> +    gfn_t gfn = gaddr_to_gfn(area->addr.p);
> +    struct domain *d = v->domain;
> +    void *mapping;
> +    struct page_info *page;
> +    size_t size = sizeof (struct vcpu_runstate_info );

Stray blanks.

> +    ASSERT(v->runstate_guest_type == RUNSTATE_PADDR );

Another one.

> +    if ( offset > (PAGE_SIZE - size) )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);

Don't you also need P2M_UNSHARE? And is the p2m type of the
page really of no interest here?

> +void unmap_runstate_area(struct vcpu *v)
> +{
> +    mfn_t mfn;
> +
> +    if ( v->runstate_guest_type != RUNSTATE_PADDR )
> +        return;
> +
> +    if ( guest_handle_is_null(runstate_guest(v)) )
> +        return;
> +
> +    mfn = _mfn(virt_to_mfn(runstate_guest(v).p));

The pointer is the result of __map_domain_page_global() - I
don't think you can legitimately use virt_to_mfn() on it, at
least not on x86; domain_page_map_to_mfn() is what you
want to use here.

> @@ -1532,13 +1602,29 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>              vcpu_runstate_get(v, &runstate);
>              __copy_to_guest(runstate_guest(v), &runstate, 1);
>          }
> -
>          break;
>      }

Undue removal of a blank line.

>      case VCPUOP_register_runstate_phys_memory_area:
>      {
> -        rc = -ENOSYS;
> +        struct vcpu_register_runstate_memory_area area;
> +
> +        rc = -EFAULT;
> +        if ( copy_from_guest(&area, arg, 1) )
> +            break;
> +
> +        unmap_runstate_area(v);
> +        v->runstate_guest_type = RUNSTATE_PADDR;
> +        rc = map_runstate_area(v, &area);
> +
> +        if ( rc )
> +        {
> +            v->runstate_guest_type = RUNSTATE_NONE;

All of this updating of type and mapping are racy, and I guess
that's also what Julien has already pointed out. As long as the
mapping can be updated by remote CPUs, you have to make
sure that the consuming side can't see partially updated values.
One option _might_ be to switch to NONE first, and to PADDR
upon success (with suitable barriers in between). The logic
then of course needs mirroring to the VADDR path.
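
The publication order Jan suggests can be modelled with C11 atomics as below. This is a sketch, not Xen code: field names are stand-ins, and Xen would use its own barrier primitives (e.g. smp_wmb()) rather than `<stdatomic.h>`. The point is only the ordering — drop to NONE before touching the mapping, and advertise PADDR last, so a concurrent consumer never sees PADDR with a half-installed mapping:

```c
#include <stdatomic.h>
#include <stddef.h>

enum { RS_NONE, RS_PADDR };

/* Stand-in for the relevant slice of struct vcpu. */
struct vcpu_sketch {
    _Atomic int runstate_guest_type;
    void *_Atomic runstate_mapping;
};

static int register_phys_area(struct vcpu_sketch *v, void *new_map)
{
    /* Consumers stop using the mapping before we change anything. */
    atomic_store(&v->runstate_guest_type, RS_NONE);

    /* ...the old area would be unmapped here... */

    if ( new_map == NULL )
        return -1;                                   /* mapping failed: stay NONE */

    atomic_store(&v->runstate_mapping, new_map);
    atomic_store(&v->runstate_guest_type, RS_PADDR); /* publish last */
    return 0;
}
```

On failure the type remains NONE, which is the "switch to NONE first" property the paragraph above asks for.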

Jan




* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-07 15:17         ` Julien Grall
  2019-03-07 15:20           ` Julien Grall
  2019-03-07 16:16           ` Roger Pau Monné
@ 2019-03-18 11:31           ` Andrii Anisov
  2019-03-18 12:25             ` Julien Grall
  2 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-18 11:31 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov, Roger Pau Monné

Hello Julien, Guys,

Sorry for the delayed answer. I caught a pretty nasty flu after the last long weekend, which made me completely unavailable last week :(

On 07.03.19 17:17, Julien Grall wrote:
> Why? Arm32 is as equally supported as Arm64.
Yep, I believe that.
But I do not expect anyone would build arm32-based systems with many vcpus.
I have the impression arm32 does not target server applications. What's left? Embedded with 4, or, OK, up to 8 VMs, 8 vcpus each. How much would the runstate mappings cost?
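
A back-of-envelope figure for that question (the 8x8 upper bound is the hypothetical worst case given above, and one pinned 4 KiB page per area is an assumption following from the rule that the runstate area may not cross a page boundary):

```c
/*
 * Rough vmap cost of keeping every runstate area mapped, assuming
 * (not taken from the patches) one 4 KiB page pinned per area.
 */
static unsigned long runstate_vmap_bytes(unsigned int vms,
                                         unsigned int vcpus_per_vm)
{
    return (unsigned long)vms * vcpus_per_vm * 4096UL;
}
```

With 8 VMs of 8 vCPUs each that is 64 pages, i.e. 256 KiB — small next to the 768MB vmap quoted earlier in the thread, though each mapping still burns a whole page for a sub-page structure.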

> What scenario? You just need to read the implementation of the current hypercall and see there are nothing preventing the call to be done twice.
Ah, OK, you mean that kind of race. Yes, it looks like I've overlooked that scenario.
> 
> When you are designing a new hypercall you have to think how a guest can misuse it (yes I said misuse not use!). Think about a guest with two vCPUs. vCPU A is constantly modifying the runstate for vCPU B. What could happen if the hypervisor is in the middle of context switch vCPU B?
The effects I can imagine differ:
  - The new runstate area might be updated on Arm64, maybe partially and concurrently (IIRC, we have all the RAM permanently mapped into Xen)
  - A paging fault might happen on Arm32
  - Something similar or different might happen on x86 PV or HVM

Still, all of these are outside the design and quite unexpected.
> As I pointed
> 
>> Also vcpu_info needs protections from it. Do you agree?
> 
> vcpu_info cannot be called twice thanks to the following check in map_vcpu_info:
>      if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) )
>          return -EINVAL;
Right you are.

>>> Well the number you showed in the other thread didn't show any improvement at all... So please explain why we should call map_domain_page_global() here and using more vmap on arm32
>> I'm not expecting vmap might be a practical problem for arm32 based system.
> 
> Well vmap is quite small on Arm. So why should we use more of it if...
> 
>> With the current implementation numbers are equal to those I have for runstate mapping on access.
> 
> it does not make an real improvement to the context switch? But I recall you said the interrupt latency were worst with keeping the runstate mapped (7000ns vs 7900ns).
Yes, for Roger's patch.

> You also saw a performance drop when using glmark2 benchmark.
Yes, I did see it with Roger's patch. But with mine, numbers are slightly better (~1%) with the runstate kept mapped.
Also, introducing more race-prevention code will have its own impact.

> So how come you can say they are equal? What did change?
Nothing seems to have changed in the context switch part, but numbers with TBM and glmark2 differ. I do not understand why that happens, just as I do not understand why TBM showed me a latency increase where I expected a noticeable reduction.

> In other words, I don't want to keep things mapped in Xen if we can achieve similar performance with "mapping"/"unmapping" at context switch.
I understand that clearly.

>> But I'm not sure my test setup able to distinguish the difference.
> 
> Well, there are a lot of stuff happening during context switch. So the benefit of keeping the mapping is probably lost in the noise.
Finally we received the whole set for the Lauterbach tracer. The last adapter to get things together arrived last week. So I really hope I will be able to get trustworthy measurements of those subtle things nested inside the context switch.

>>>>   - IMHO, this implementation is simpler and cleaner than what I have for runstate mapping on access.
>>>
>>> Did you implement it using access_guest_memory_by_ipa?
>> Not exactly; access_guest_memory_by_ipa() has no implementation for x86. But it is built around that code.
> 
> For the HVM, the equivalent function is hvm_copy_to_guest_phys. I don't know what would be the interface for PV. Roger, any idea?
I will turn to a map-on-access implementation as well, to have options and be able to compare.

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-18 11:31           ` Andrii Anisov
@ 2019-03-18 12:25             ` Julien Grall
  2019-03-18 13:38               ` Andrii Anisov
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2019-03-18 12:25 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov, Roger Pau Monné

On 18/03/2019 11:31, Andrii Anisov wrote:
> Hello Julien, Guys,

Hi,

> Sorry for the delayed answer. I caught a pretty nasty flu after the last long
> weekend, which made me completely unavailable last week :(

Sorry to hear that.

> On 07.03.19 17:17, Julien Grall wrote:
>> Why? Arm32 is as equally supported as Arm64.
> Yep, I believe that.
> But I do not expect anyone would build arm32-based systems with many vcpus.
> I have the impression arm32 does not target server applications. What's left?
> Embedded with 4, no, OK, up to 8 VMs, 8 vcpus each. How much would the
> runstate mappings cost?

As I already said multiple times before, please try to explain everything in 
your first e-mail...

The limitation you mention applies only to GICv2. If you use GICv3, the number of
CPUs can go much higher. Whether this is going to be used in the future is
another question. However, I would rather not discard 32-bit from any
discussion, as you don't know how it is going to be used in the future.

The Arm32 port is interesting because not all the memory is mapped in Xen. There
is a chance that Arm64 will go the same way in the future.

I have been thinking a bit more about arm32. I don't think we ever map 2GB of
on-demand paging in one go, so this could probably be reduced to 1GB. The other
1GB could be used to increase the vmap. This would give us up to 1792MB of vmap.

Pending the performance results, the global mapping could be a solution for
arm32 as well.

> 
>> What scenario? You just need to read the implementation of the current 
>> hypercall and see there is nothing preventing the call to be done twice.
> Ah, OK, you are about those kind of races. Yes, it looks I've overlooked that 
> scenario.
>>
>> When you are designing a new hypercall you have to think how a guest can 
>> misuse it (yes I said misuse not use!). Think about a guest with two vCPUs. 
>> vCPU A is constantly modifying the runstate for vCPU B. What could happen if 
>> the hypervisor is in the middle of context switch vCPU B?
> Effects I can imagine, might be different:
>   - New runstate area might be updated on Arm64, maybe partially and 
> concurrently (IIRC, we have all the RAM permanently mapped to XEN)

Today the RAM is always permanently mapped, but I can't promise this is going to 
be the case in the future.

>   - Paging fault might happen on Arm32

What do you mean?

>   - Smth. similar or different might happen on x86 PV or HVM
> 
> Yet, all of them are out of design and are quite unexpected.

We *must* protect the hypervisor against any guest behavior, particularly the 
unexpected kind. If the Android VM hits itself, then I pretty much don't care, 
assuming the VM was misbehaving. However, I don't think anyone would be happy if 
the Android VM were able to take down the whole platform. At least, I would not 
want to be the passenger of that car...

>> As I pointed
>>
>>> Also vcpu_info needs protections from it. Do you agree?
>>
>> vcpu_info cannot be called twice thanks to the following check in map_vcpu_info:
>>      if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) )
>>          return -EINVAL;
> Right you are.
> 
>>>> Well the number you showed in the other thread didn't show any improvement 
>>>> at all... So please explain why we should call map_domain_page_global() here 
>>>> and using more vmap on arm32
>>> I'm not expecting vmap might be a practical problem for arm32 based system.
>>
>> Well vmap is quite small on Arm. So why should we use more of it if...
>>
>>> With the current implementation numbers are equal to those I have for 
>>> runstate mapping on access.
>>
>> it does not make an real improvement to the context switch? But I recall you 
>> said the interrupt latency were worst with keeping the runstate mapped (7000ns 
>> vs 7900ns).
> Yes, for Roger's patch.
> 
>> You also saw a performance drop when using glmark2 benchmark.
> Yes, I did see it with Roger's patch. But with mine - numbers are slightly 
> better (~1%) for runstate being mapped.
> Also introducing more races preventing code will introduce its impact.

Please provide the numbers once you fixed the race.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-18 12:25             ` Julien Grall
@ 2019-03-18 13:38               ` Andrii Anisov
  2019-03-21 19:05                 ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Andrii Anisov @ 2019-03-18 13:38 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov, Roger Pau Monné



On 18.03.19 14:25, Julien Grall wrote:
> As I already said multiple times before, please try to explain everything in your first e-mail...
I know. I'm trying to provide enough info in the cover letter, but it seems I do not succeed.
Putting in all the thoughts might lead to overburdened text, but it looks like it should be done. I'm going to do that next time.

> The limitations you mention is only for GICv2. If you use GICv3, the number of CPUs can go much higher. Whether this is going to be use in the future is another question. However, I would rather try to not discard 32-bit from any discussion as you don't know how this is going to be used in the future.
> 
> The Arm32 port is interesting because all the memory is not mapped in Xen. There are chance that the Arm64 will go towards the same in the future.
> 
> I have been thinking a bit more about arm32. I don't think we ever map 2GB of on-demand-paging in one go, so this could probably be reduced to 1GB. The other 1GB could be used to increase the vmap. This would give us up to 1792MB of vmap.
> 
> Pending to the performance result, the global mapping could be a solution for arm32 as well.
Well.. OK.

>> Effects I can imagine, might be different:
>>   - New runstate area might be updated on Arm64, maybe partially and concurrently (IIRC, we have all the RAM permanently mapped to XEN)
> 
> Today the RAM is always permanently mapped, I can't promise this is going to be the case in the future.
> 
>>   - Paging fault might happen on Arm32
> 
> What do you mean?
Ouch... I did mean translation fault.

> 
>>   - Smth. similar or different might happen on x86 PV or HVM

>> Yet, all of them are out of design and are quite unexpected.
> We *must* protect hypervisor against any guest behavior.
Totally agree.

> Particularly the unexpected one. If the Android VM hit itself, then I pretty much don't care assuming the VM was misbehaving. However, I don't think anyone would be happy if the Android VM is able to take down the whole platform. At least, I would not want to be the passenger of that car...
Neither do I.

>>> You also saw a performance drop when using glmark2 benchmark.
>> Yes, I did see it with Roger's patch. But with mine - numbers are slightly better (~1%) for runstate being mapped.
>> Also introducing more races preventing code will introduce its impact.
> 
> Please provide the numbers once you fixed the race.
I'm laying my hands on the tracer now. I want to get numbers from it as well.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address
  2019-03-18 13:38               ` Andrii Anisov
@ 2019-03-21 19:05                 ` Julien Grall
  0 siblings, 0 replies; 31+ messages in thread
From: Julien Grall @ 2019-03-21 19:05 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Andrii Anisov, Roger Pau Monné

Hi Andrii,

On 3/18/19 1:38 PM, Andrii Anisov wrote:
> 
> 
> On 18.03.19 14:25, Julien Grall wrote:
>> As I already said multiple times before, please try to explain 
>> everything in your first e-mail...
> I know. I'm trying to provide enough info in the cover letter. But it 
> seems I do not succeed.
> Putting all the thoughts might lead into overburdened text. But it looks 
> it should be done. Going to do that next time.

I realize my point was maybe misleading. Overburdened text is indeed not 
very great. Missing something in the cover letter is not entirely a big 
deal; the reviewers can ask questions for clarification afterwards.

However, when I pointed out that arm32 was left out of the text, you 
should have provided more details on why, rather than just confirming you 
left it out. Maybe I should have been clearer in my review and asked a proper 
question.

> 
>>
>>>   - Smth. similar or different might happen on x86 PV or HVM
> 
>>> Yet, all of them are out of design and are quite unexpected.
>> We *must* protect hypervisor against any guest behavior.
> Totally agree.
> 
>> Particularly the unexpected one. If the Android VM hit itself, then I 
>> pretty much don't care assuming the VM was misbehaving. However, I 
>> don't think anyone would be happy if the Android VM is able to take 
>> down the whole platform. At least, I would not want to be the 
>> passenger of that car...
> Neither do I.
> 
>>>> You also saw a performance drop when using glmark2 benchmark.
>>> Yes, I did see it with Roger's patch. But with mine - numbers are 
>>> slightly better (~1%) for runstate being mapped.
>>> Also introducing more races preventing code will introduce its impact.
>>
>> Please provide the numbers once you fixed the race.
> I'm laying my hands on the tracer now. I want to get numbers from it as well.

Thank you for doing the benchmark.

Cheers,

-- 
Julien Grall


end of thread, other threads:[~2019-03-21 19:05 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-05 13:14 [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Andrii Anisov
2019-03-05 13:14 ` [PATCH 1/2 for-4.12] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall Andrii Anisov
2019-03-14  8:45   ` Jan Beulich
2019-03-05 13:14 ` [PATCH 2/2 for-4.12] xen: implement VCPUOP_register_runstate_phys_memory_area Andrii Anisov
2019-03-14  9:05   ` Jan Beulich
2019-03-05 13:20 ` [PATCH 0/2 for-4.12] Introduce runstate area registration with phys address Juergen Gross
2019-03-05 13:32   ` Andrii Anisov
2019-03-05 13:39     ` Julien Grall
2019-03-05 14:11       ` Andrii Anisov
2019-03-05 14:30         ` Julien Grall
2019-03-07 13:07           ` Andrii Anisov
2019-03-05 13:44     ` Juergen Gross
2019-03-05 13:50       ` Andrii Anisov
2019-03-05 13:56 ` Julien Grall
2019-03-07 13:01   ` Andrii Anisov
2019-03-07 14:02     ` Julien Grall
2019-03-07 14:34       ` Andrii Anisov
2019-03-07 15:17         ` Julien Grall
2019-03-07 15:20           ` Julien Grall
2019-03-07 16:16           ` Roger Pau Monné
2019-03-07 16:36             ` Julien Grall
2019-03-07 17:15               ` Roger Pau Monné
2019-03-07 18:00                 ` Julien Grall
2019-03-08  6:28                   ` Juergen Gross
2019-03-08 10:15                     ` Julien Grall
2019-03-08 10:18                       ` Juergen Gross
2019-03-08 10:31                         ` Julien Grall
2019-03-18 11:31           ` Andrii Anisov
2019-03-18 12:25             ` Julien Grall
2019-03-18 13:38               ` Andrii Anisov
2019-03-21 19:05                 ` Julien Grall
